This was a really good introduction to both libc++ and libc++abi for me as someone who has worked mostly in C, in particular his thread-safe initialization example. If you look at the PR description, it's clear that a lot of care went into explaining his reasoning. I appreciate that he's thoughtful enough to understand that merging this means adopting a restriction on future development, and that he offers to maintain a fork.
On the latest episode of 'Security Cryptography Whatever' [0] they mention that time spent on improving the harness (at the moment) ends up being outperformed by the strategy of "wait for the next model". I doubt that will continue, but it broke my intuition about how to improve them.
This is basically how you should treat all AI dev. Working around AI model limits on something that will take 3-6 months has very little ROI compared to building what works today and then, tomorrow, building what works tomorrow.
This is the hard part - especially with larger initiatives, it takes quite a bit of work to evaluate what the current combination of harness + LLM is good at. Running experiments yourself is cumbersome and expensive, public benchmarks are flawed. I wish providers would release at least a set of blessed example trajectories alongside new models.
As it is, we're stuck with "yeah it seems this works well for bootstrapping a Next.js UI"...
This assumes AI model improvements will be predictable, which they won’t.
There are several simultaneous moving targets: the different models available at any point in time, the model complexity/capability, the model price per token, the number of tokens used by the model for that query, the context size capabilities and prices, and even the evolution of the codebase. You can't calculate comparative ROIs of model A today vs. model B next year unless these are far more predictable than they currently are.
Chinese AI vendors specifically pointed out that even a few gens ago there was maybe 5-15% more capability to squeeze out via training, but that the cost for this is extremely prohibitive and only US vendors have the capex to have enough compute for both inference and that level of training.
I'd take their word over someone who has a vested interest in pushing Anthropic's latest and greatest.
The real improvements are going to be in tooling and harnessing.
It's wild to me that a paragraph or 7 of plain English that amounts to "be good at things" is enough to make a material difference in the LLM's performance.
As the base is an auto-regressive model that is capable of generating more or less any kind of text, it kind of makes sense though. It always has the capabilities, but you might want it to emulate a stupid analysis as well. So you're leading in with a text that describes what the rest of the text will be in a pretty real sense.
I think you took away the wrong lesson from that podcast:
> I think there is work to be done on scaffolding the models better. This exponential right now reminds me of the exponential from CPU speeds going up until let's say 2000 or something, where you had these game developers who would develop really impressive games on the current generation of hardware, and they'd do it by writing really detailed, intricate x86 instruction sequences for exactly whatever this, like, you know, whatever a 486 can do, knowing full well that in 2 years, you know, the Pentium is gonna be able to do this much faster and they didn't need to do it. But you need to do it now because you wanna sell your game today and, yeah, you can't just wait and have everyone be able to do this. And so I do think that there definitely is value in squeezing out all of the last little juice that you can from the current model.
Everything you can do today will eventually be obsoleted by some future technology, but if you need better results today, you actually have to do the work. If you just drop everything and wait for the singularity, you're just going to unnecessarily cap your potential in the meantime.
There is a premium on risk reduction. I believe this is one of the reasons why companies like to incorporate in Delaware as the courts there are notoriously fast (I'm going off my memory of a Planet Money episode so could be wrong here).
The sqlite project actually benefited from this dogfooding. Interestingly recursive CTEs [0] were added to sqlite due to wanting to trace commit history [1]
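As a minimal sketch of what that feature looks like in practice (using a made-up `ancestry` table to stand in for Fossil's actual commit graph, not the project's real schema), a recursive CTE lets you walk a parent/child chain in one query:

```python
import sqlite3

# Hypothetical table linking each commit to its parent, loosely
# mimicking the commit-ancestry use case that motivated the feature.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ancestry(commit_id TEXT, parent_id TEXT);
INSERT INTO ancestry VALUES
  ('c3', 'c2'),
  ('c2', 'c1'),
  ('c1', NULL);
""")

# Recursive CTE: start at a tip commit and repeatedly join back
# through ancestry until there is no parent left.
rows = conn.execute("""
WITH RECURSIVE history(commit_id) AS (
  SELECT 'c3'
  UNION ALL
  SELECT a.parent_id
  FROM ancestry a JOIN history h ON a.commit_id = h.commit_id
  WHERE a.parent_id IS NOT NULL
)
SELECT commit_id FROM history;
""").fetchall()

print([r[0] for r in rows])  # ['c3', 'c2', 'c1']
```

Without `WITH RECURSIVE` you'd need one round-trip per generation (or a fixed number of self-joins), which is exactly the pain point history traversal runs into.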
This is an interesting de-obfuscation tool that Trail of Bits has built. I'd never come across this technique of hiding logical/arithmetic operations so it was interesting to learn about it and how they've attempted to de-obfuscate it.
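For anyone else who hadn't seen the trick: the obfuscation family in question is usually called mixed boolean-arithmetic (MBA), where a plain arithmetic operation is rewritten as an equivalent tangle of bitwise and arithmetic terms. A classic example identity (my illustration, not taken from the tool itself):

```python
def obfuscated_add(x: int, y: int) -> int:
    # MBA identity: x + y == (x ^ y) + 2 * (x & y)
    # XOR gives the sum without carries; AND picks out the carry
    # bits, which are shifted left one place by the multiply.
    return (x ^ y) + 2 * (x & y)

# The identity holds for arbitrary integers (two's complement).
for x, y in [(0, 0), (7, 9), (123, 456), (-5, 12)]:
    assert obfuscated_add(x, y) == x + y
print("identity holds")
```

Layering a few of these rewrites on top of each other quickly produces expressions that are painful to read but trivial for the CPU, which is why automated simplification tools are needed to undo them.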
> Using these more sophisticated data structures, g++ is able to compute the prime numbers below 10000 in only 8 seconds, using a modest 3.1 GiB of memory.
This is a concerning read. I'm not quite sure what the driving motivation for Artemis is, but the following answered at least part of my question:
> That context is a moon program that has spent close to $100 billion and 25 years with nothing to show for itself, at an agency that has just experienced mass firings and been through a near-death experience with its science budget
Worded provocatively, but with a $200B Iran war bill being pushed and DHS funding in the OBBA increased by over $300B from baseline, it's not necessarily wrong.