This was a really good introduction to both libc++ and libc++abi for me as someone who has worked mostly in C, in particular his thread-safe initialization example. If you look at the PR description, it's clear that a lot of care went into explaining his reasoning. I appreciate that he's thoughtful enough to understand that merging this means adopting a restriction on future development, and that he offers to maintain a fork.
On the latest episode of 'Security Cryptography Whatever' [0] they mention that time spent on improving the harness (at the moment) ends up being outperformed by the strategy of "wait for the next model". I doubt that will continue, but it broke my intuition about how to improve them.
This is basically how you should treat all AI dev. Working around AI model limits on something that will take 3-6 months has very little ROI compared to building what works today and then, tomorrow, building what works tomorrow.
This is the hard part - especially with larger initiatives, it takes quite a bit of work to evaluate what the current combination of harness + LLM is good at. Running experiments yourself is cumbersome and expensive, public benchmarks are flawed. I wish providers would release at least a set of blessed example trajectories alongside new models.
As it is, we're stuck with "yeah it seems this works well for bootstrapping a Next.js UI"...
This assumes AI model improvements will be predictable, which they won’t.
There are several simultaneous moving targets: the different models available at any point in time, the model complexity/capability, the model price per token, the number of tokens used by the model for that query, the context size capabilities and prices, and even the evolution of the codebase. You can't calculate comparative ROIs of model A today vs. model B next year unless these are far more predictable than they currently are.
Chinese AI vendors specifically pointed out that even a few gens ago there was maybe 5-15% more capability to squeeze out via training, but that the cost for this is extremely prohibitive and only US vendors have the capex to have enough compute for both inference and that level of training.
I'd take their word over someone who has a vested interest in pushing Anthropic's latest and greatest.
The real improvements are going to be in tooling and harnessing.
It's wild to me that a paragraph or 7 of plain English that amounts to "be good at things" is enough to make a material difference in the LLM's performance.
As the base is an auto-regressive model that is capable of generating more or less any kind of text, it kind of makes sense though. It always has the capabilities, but you might want it to emulate a stupid analysis as well. So you're leading in with a text that describes what the rest of the text will be in a pretty real sense.
I think you took away the wrong lesson from that podcast:
> I think there is work to be done on scaffolding the models better. This exponential right now reminds me of the exponential from CPU speeds going up until let's say 2000 or something, where you had these game developers who would develop really impressive games on the current generation of hardware, and they'd do it by writing really detailed, intricate x86 instruction sequences for exactly whatever this, like, you know, whatever a 486 can do, knowing full well that in 2 years, you know, the Pentium is gonna be able to do this much faster and they didn't need to do it. But you need to do it now because you wanna sell your game today and, yeah, you can't just wait and have everyone be able to do this. And so I do think that there definitely is value in squeezing out all of the last little juice that you can from the current model.
Everything you can do today will eventually be obsoleted by some future technology, but if you need better results today, you actually have to do the work. If you just drop everything and wait for the singularity, you're just going to unnecessarily cap your potential in the meantime.
There is a premium on risk reduction. I believe this is one of the reasons why companies like to incorporate in Delaware as the courts there are notoriously fast (I'm going off my memory of a Planet Money episode so could be wrong here).
The sqlite project actually benefited from this dogfooding. Interestingly recursive CTEs [0] were added to sqlite due to wanting to trace commit history [1]
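As a minimal sketch of what that feature looks like in practice (using a made-up `ancestry` table to stand in for Fossil's actual commit graph, not the project's real schema), a recursive CTE lets you walk a parent/child chain in one query:

```python
import sqlite3

# Hypothetical table linking each commit to its parent, loosely
# mimicking the commit-ancestry use case that motivated the feature.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ancestry(commit_id TEXT, parent_id TEXT);
INSERT INTO ancestry VALUES
  ('c3', 'c2'),
  ('c2', 'c1'),
  ('c1', NULL);
""")

# Recursive CTE: start at a tip commit and repeatedly join back
# through ancestry until there is no parent left.
rows = conn.execute("""
WITH RECURSIVE history(commit_id) AS (
  SELECT 'c3'
  UNION ALL
  SELECT a.parent_id
  FROM ancestry a JOIN history h ON a.commit_id = h.commit_id
  WHERE a.parent_id IS NOT NULL
)
SELECT commit_id FROM history;
""").fetchall()

print([r[0] for r in rows])  # ['c3', 'c2', 'c1']
```

Without `WITH RECURSIVE` you'd need one round-trip per generation (or a fixed number of self-joins), which is exactly the pain point history traversal runs into.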
This is an interesting de-obfuscation tool that Trail of Bits has built. I'd never come across this technique of hiding logical/arithmetic operations so it was interesting to learn about it and how they've attempted to de-obfuscate it.
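For anyone else who hadn't seen the trick: the obfuscation family in question is usually called mixed boolean-arithmetic (MBA), where a plain arithmetic operation is rewritten as an equivalent tangle of bitwise and arithmetic terms. A classic example identity (my illustration, not taken from the tool itself):

```python
def obfuscated_add(x: int, y: int) -> int:
    # MBA identity: x + y == (x ^ y) + 2 * (x & y)
    # XOR gives the sum without carries; AND picks out the carry
    # bits, which are shifted left one place by the multiply.
    return (x ^ y) + 2 * (x & y)

# The identity holds for arbitrary integers (two's complement).
for x, y in [(0, 0), (7, 9), (123, 456), (-5, 12)]:
    assert obfuscated_add(x, y) == x + y
print("identity holds")
```

Layering a few of these rewrites on top of each other quickly produces expressions that are painful to read but trivial for the CPU, which is why automated simplification tools are needed to undo them.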
> Using these more sophisticated data structures, g++ is able to compute the prime numbers below 10000 in only 8 seconds, using a modest 3.1 GiB of memory.
This is a concerning read. I'm not quite sure what the driving motivation for Artemis is, but the following answered at least part of my question:
> That context is a moon program that has spent close to $100 billion and 25 years with nothing to show for itself, at an agency that has just experienced mass firings and been through a near-death experience with its science budget
Worded provocatively, but with a $200B Iran war bill being pushed and DHS funding in the OBBA increased by over $300B from baseline, it's not necessarily wrong.