For my master’s thesis, I looked at how to get better throughput over 5G mmWave, which has really unpredictable bandwidth. Just using a big BFIFO buffer (sized to the BDP) gave us twice the throughput of FQ-CoDel, but at the cost of high jitter and bad latency for interactive traffic. To fix that, I wrote a non-transparent Performance Enhancing Proxy (PEP) as a Linux kernel module in C. It mixes big buffers with FQ-CoDel and manages to get both high, stable throughput and low latency impact.
Few others have brought up Apache Modules, and they are incredibly similar to my idea. :D Did not know about them while I was developing it. The main difference as far as I could see was the fact that you had to recompile / restart the server. Which I try to avoid, so little changes require almost no recompiling.
In apache 1.3 it was far from unusual to do a complete bundled compile and restart the entire thing every time because there were gremlins that showed up when you dynamically loaded the more complex modules often enough that it was operationally less aggravating overall to take the brute force approach (I did quite a bit of that a couple decades back, for my sins).
apache 2+ is a very different (and rather more robust) beast, and also has the 'graceful restart' system - see https://httpd.apache.org/docs/2.4/stopping.html - which makes the parent tell its worker processes to drain their request queues, -then- exit, after which each one is replaced in turn until you've fully upgraded to the new configuration+code.
This approach has its disadvantages, of course, but not that morally different from how erlang processes hot reload into new code, and once you knew what you were doing the end result was simple, predictable, and nicely transparent to end users.
You might also want to look into ISAPI extensions[1,2] in Microsoft’s IIS, those are also just DLLs that the web server loads into itself, and were once advertised as the most performant way to serve dynamic stuff from it. It doesn’t look like there’s a way to request that extensions be reloaded, though: the server either unloads them at its discretion (once no in-flight requests are using them?) or not at all (if “extension caching” is enabled). But there’s an advert[3] from somebody who shimmed that capability onto it back in 2006.
(You wouldn’t have had a good day debugging these things, mind you. But it’s something that people experimented with back in the day, alongside Web servers programmable in Java[4] or Tcl[5].)
You might not necessarily need even shared memory - it's possible to pass a file descriptor over a socket on all modern unices (and albeit with a completely different API also on Win32) so your control process could do an accept(), maybe read the headers in there, then talk to module processes to determine what the desired approach is, and then hand over the http socket so the module process can do whatever it needs to with it.
When I imagine how I'd use this in my head the imagined design rapidly gets much more complicated than what you're currently doing, and I'm not at all arguing that that complexity is necessarily worth it ... but there's also all sorts of cool+weird things you could implement that way that would be exceedingly tricky otherwise, so I figured I'd point it out anyway :D
(the example of code that does the relevant magic to get fds across a socket that immediately springs to mind is https://fastapi.metacpan.org/source/MLEHMANN/IO-FDPass-1.3/F... - yes, warning, it's inside a perl extension, but I see no reason that would impede you borrowing the C parts if it was useful ;)
Initially I did use fork a lot to allow each handler in its own “process” mainly because I wanted to isolate it as well in a container fashion. However the overhead became to much at the start when I just wanted it to work. Will be looking into it again!
I think some stuff will be happier in the "director" process and some happier in its own "worker" process - and fd passing is the magic trick that lets you accept() in the former and still do the bulk of the work in an appropriate instance of the latter.
I 100% get the 'overhead' part - but at some point hopefully you'll have enough other stuff already running that the 'fun' factor of enabling that will win out :D
"Because I can" remains an entirely legitimate reason for a hobby project.
If anything, you've gone further along the "also (at least sort of) practical" scale than I expected.
Given as mentioned elsewhere a per-request arena + bump allocator system, it might actually be -genuinely- practical (to the extent that writing application logic in C is at all ;)
The second point is absolutely fair and your project is very cool and impressive, but the speed one is misleading. I am fairly sure you actually leave a fair bit of performance on the table simply by how convoluted parallelism and async IO are in C, and something like Java might easily outperform it in standard CRUD backend use cases.
You’re absolutely right, I have not tried very hard to optimize for speed either yet. To comment was more directed at the fact most just say that “speed” is the main reason to use C, but for me it’s almost exclusively for the fun and “cool” factor.
This hobby project is inspired by kernel modules and AWS Lambda. It lets you write raw C modules, upload them, and have the server compile and run only the module itself at runtime—no need to recompile the entire application or restart the server.
You can update routes and modules dynamically. Even WebSocket handlers can be updated without dropping existing connections.