Hacker News | anitil's comments

This was a really good introduction to both libc++ and libc++abi for me as someone who has worked mostly in C, in particular his thread-safe initialization example. If you look at the PR description it's clear that a lot of care went into explaining his reasoning. I appreciate that he's thoughtful about understanding that merging this means adopting a restriction on future development, and offers to maintain a fork.

On the latest episode of 'Security Cryptography Whatever' [0] they mention that time spent improving the harness (at the moment) ends up being outperformed by the strategy of "wait for the next model". I doubt that will continue, but it broke my intuition about how to improve them

[0] https://securitycryptographywhatever.com/2026/03/25/ai-bug-f...


This is basically how you should treat all AI dev. Working around AI model limits for something that will take 3-6 months of work has very little ROI compared to building what works today and just waiting, then building what works tomorrow, tomorrow.

This is the hard part - especially with larger initiatives, it takes quite a bit of work to evaluate what the current combination of harness + LLM is good at. Running experiments yourself is cumbersome and expensive; public benchmarks are flawed. I wish providers would release at least a set of blessed example trajectories alongside new models.

As it is, we're stuck with "yeah it seems this works well for bootstrapping a Next.js UI"...


This assumes AI model improvements will be predictable, which they won’t.

There are several simultaneous moving targets: the different models available at any point in time, the model complexity/capability, the model price per token, the number of tokens used by the model for that query, the context size capabilities and prices, and even the evolution of the codebase. You can't calculate comparative ROIs of model A today vs. model B next year unless these are far more predictable than they currently are.


That seems very unlikely.

Chinese AI vendors specifically pointed out that even a few gens ago there was maybe 5-15% more capability to squeeze out via training, but that the cost for this is extremely prohibitive and only US vendors have the capex to have enough compute for both inference and that level of training.

I'd take their word over someone who has a vested interest in pushing Anthropic's latest and greatest.

The real improvements are going to be in tooling and harnessing.


That only applies to workarounds for current limitations, no? Some things a harness can do will apply in the same way to future models.

It’s a good thing to keep in mind, but LLM + scaffolding is clearly superior. So if you just use vanilla LLMs you will always be behind.

I think the important thing is to avoid over-optimizing your scaffold, not to avoid building one altogether.


It's wild to me that a paragraph or 7 of plain English that amounts to "be good at things" is enough to make a material difference in the LLM's performance.

It kind of makes sense, though, since the base is an auto-regressive model capable of generating more or less any kind of text. It always has the capability, but without direction it might just as well emulate a sloppy analysis. So you're leading in with text that describes what the rest of the text will be, in a pretty real sense.

There will always be bosses who/which think telling workers to work well works well.

They have no values of their own, so you have to direct their attention that way.

I think you took away the wrong lesson from that podcast:

> I think there is work to be done on scaffolding the models better. This exponential right now reminds me of the exponential from CPU speeds going up until let's say 2000 or something, where you had these game developers who would develop really impressive games on the current generation of hardware, and they'd do it by writing really detailed, intricate x86 instruction sequences for exactly whatever, like, you know, whatever a 486 can do, knowing full well that in 2 years the Pentium is gonna be able to do this much faster and they didn't need to do it. But you need to do it now because you wanna sell your game today, and, yeah, you can't just wait and have everyone be able to do this. And so I do think that there definitely is value in squeezing out all of the last little juice that you can from the current model.

Everything you can do today will eventually be obsoleted by some future technology, but if you need better results today, you actually have to do the work. If you just drop everything and wait for the singularity, you're just going to unnecessarily cap your potential in the meantime.


> it broke my intuition about how to improve them

Here we go again.

http://www.incompleteideas.net/IncIdeas/BitterLesson.html


And if you have the better harness and the next model?

I would _hope_ that the double combo would be better, but honestly I have no idea

Thank you for the recommendation on that video! I've already adopted DuckDB for my ad-hoc analytics work, but I didn't know the background

I had a look at your github and blog but couldn't find the game, is it public? Or do I need to watch your streams to see it?

There is a premium on risk reduction. I believe this is one of the reasons why companies like to incorporate in Delaware as the courts there are notoriously fast (I'm going off my memory of a Planet Money episode so could be wrong here).

The sqlite project actually benefited from this dogfooding. Interestingly, recursive CTEs [0] were added to sqlite due to wanting to trace commit history [1].

[0] https://sqlite.org/lang_with.html#recursive_query_examples

[1] https://fossil-scm.org/forum/forumpost/5631123d66d96486 - My memory was roughly correct, the title of the discussion is 'Is it possible to see the entire history of a renamed file?'
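For anyone who hasn't used them, a recursive CTE of this flavour can be sketched with Python's built-in sqlite3 module. The schema and file names below are made up for illustration (they are not fossil's actual tables); the query walks a chain of renames backwards from a file's current name:

```python
import sqlite3

# Hypothetical schema: each row records one rename in one commit.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE renames (commit_id INTEGER, old_name TEXT, new_name TEXT);
INSERT INTO renames VALUES
  (1, 'util.c',    'helpers.c'),
  (2, 'helpers.c', 'support.c'),
  (3, 'support.c', 'core.c');
""")

# Recursive CTE: start from the rename that produced the current name,
# then repeatedly join to find what each earlier name was renamed from.
rows = conn.execute("""
WITH RECURSIVE history(commit_id, name) AS (
    SELECT commit_id, old_name FROM renames WHERE new_name = 'core.c'
    UNION ALL
    SELECT r.commit_id, r.old_name
    FROM renames r JOIN history h ON r.new_name = h.name
)
SELECT name FROM history ORDER BY commit_id;
""").fetchall()

names = [n for (n,) in rows]
print(names)  # ['util.c', 'helpers.c', 'support.c']
```

The `UNION ALL` between the anchor query and the self-referencing query is what lets the traversal follow arbitrarily long rename chains, which is exactly the shape of the "entire history of a renamed file" problem in [1].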


Oh, and of course, the discussion board is itself hosted in a sqlite file!

> Silicon sampling removes the messy, costly part of asking people what they think

Sometimes I wonder if I'm too cynical, and at other times I wonder if I'm not cynical enough


This is an interesting de-obfuscation tool that Trail of Bits has built. I'd never come across this technique of hiding logical/arithmetic operations so it was interesting to learn about it and how they've attempted to de-obfuscate it.
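For anyone else unfamiliar: assuming the technique in question is mixed boolean-arithmetic (MBA) obfuscation, the classic move is to replace a simple operation with an equivalent tangle of arithmetic and bitwise identities. A minimal sketch (the function name is mine, not from the tool):

```python
def obfuscated_add(x: int, y: int) -> int:
    # MBA identity: x + y == (x ^ y) + 2 * (x & y).
    # x ^ y is the carry-less sum; x & y marks the carry bits,
    # which doubling shifts into the right position.
    return (x ^ y) + 2 * (x & y)

assert all(obfuscated_add(x, y) == x + y
           for x in range(64) for y in range(64))
```

Real obfuscators nest many such rewrites so the result is opaque to both humans and compilers, which is what makes automated simplification interesting.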

> Using these more sophisticated data structures, g++ is able to compute the prime numbers below 10000 in only 8 seconds, using a modest 3.1 GiB of memory.

Finally, I can get some primes on my laptop!


This is a concerning read. I'm not quite sure what the driving motivation is for Artemis, but the following answered at least part of my question -

> That context is a moon program that has spent close to $100 billion and 25 years with nothing to show for itself, at an agency that has just experienced mass firings and been through a near-death experience with its science budget


I understand why NASA might be a little antsy, but $100B over 25 years doesn't seem like a lot for America on a long-horizon project.


It's 100b just to begin - the full bill would be multiples of that.

And there are options now.


[flagged]


Yikes.


Worded provocatively but with a $200B Iran war bill being pushed and DHS funding in the OBBA being increased by over $300B from baseline, it’s not necessarily wrong.

