The thing that seems to bring out these extremely unlikely destructive token sequences is letting agents just run for a long time. I wonder if some kind of weird subliminal chaos signal develops in the context when the AI repeatedly consumes its own output.

Personally I don't even let my agent run a single shell command without asking for approval. That's partly because I haven't set up a sandbox yet, but even with a sandbox there is a huge "hazard surface" to be mindful of.
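
For what it's worth, the sandbox I have in mind is nothing fancy: a throwaway container that can only see the project directory, roughly like the sketch below ("agent-image" is a placeholder, not a real image). Even then, the agent needs some network egress to reach its own model API, which is a big part of that hazard surface.

    # Hypothetical sketch of a minimal sandbox: the agent sees only the repo,
    # not ~/.ssh, ~/.aws, ~/.config, or anything else on the host.
    # "agent-image" is a placeholder for however you package the agent.
    docker run --rm -it \
      -v "$PWD":/work \
      -w /work \
      agent-image
    # You still have to decide what outbound network access to allow
    # (the agent has to reach its model API), which is where most of the
    # remaining hazard surface lives.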

I wonder if AI agent harnesses should have some kind of built-in safety measure where instead of simply compacting context and proceeding, they actually shut down the agent and restart it.

That said I also think even the most advanced agents generate code that I would never want to base a business on, so the whole thing seems ridiculous to me. This article has the same energy as losing money on NFTs.


I don't think it's that. It's really all about context. Humans always have at least a bit of context so it's hard for us to imagine what it's like to have none at all. But the AI genuinely has none. And it's under (training) pressure to get the task done quickly, be a yes man, and so on.

Humans do make mistakes like these. I'm not sure where the fault really lies here. I can imagine a human under time pressure making the same error. Maybe it's a gap in the safety design of Railway. It shouldn't be possible to delete all your backups with a single API call using a normal token.


The author definitely deserves a lot of blame here and clearly doesn't understand AI well enough to have a coherent opinion on AI safety.

But Railway bears some responsibility too because, at least if the author is to be believed, it looks like they provide no safety tools for users, regardless of whether they use AI or not. You should be able to generate scoped API tokens. That's just good practice. A human isn't likely to have made this particular mistake, but it doesn't seem out of the question either.


> You should be able to generate scoped API tokens. That's just good practice.

Fully agree, but given the rest of this story I don’t imagine the author would have scoped them unless Railway literally forced him to.

> A human isn't likely to have made this particular mistake, but it doesn't seem out of the question either.

The AI agent was deleting the volume used in the staging environment. It happened to also be the volume used in the production environment. 100% a human could have made this mistake.


But at least you have a 5000 LoC project on Github that deletes LinkedIn profiles!

Call me crazy, but AI does not seem like the root cause here. At the beginning of the post they say that the AI agent found a file with what they thought was a narrowly scoped API token, and they very clearly state that they never would have given an AI full access if they realized it had the ability to do stuff like this with that token.

So while the AI did something significantly worse than anything a hapless junior engineer might be expected to do, it sounds like the same thing could've resulted from an unsophisticated security breach or accidental source code leak.

Is AI a part of the chain of events? Absolutely. Is it the sole root cause? Seems like no.


> what they thought was a narrowly scoped API token, and they very clearly state that they never would have given an AI full access if they realized it had the ability to do stuff like this with that token

It sounds like the token the author created just didn't have any scope; it had full permissions. From the post:

> Tokens are not scoped by operation, by environment, or by resource at the permission level. There is no role-based access control for the Railway API — every token is effectively root. The Railway community has been asking for scoped tokens for years. It hasn't shipped.

So it wasn't "a narrowly scoped API token"; it was a full-access token. I suspect the author didn't have any reason to think it was some special, specific-purpose token; he just didn't think about what the token could do. What he's describing is his intent in creating the token (how he wanted to use it), not some property of the token.

Author said in an X post[0] that it was an "API token", not a "project token", which allows "account level actions"[1], with a scope of "All your resources and workspaces" or "Single workspace"[2], with no possibility of specifying granular permissions. Account token "can perform any API action you're authorized to do across all your resources and workspaces". Workspace token "has access to all the workspace's resources".

[0] https://x.com/lifeof_jer/status/2047733995186847912

[1] https://docs.railway.com/cli#tokens

[2] https://docs.railway.com/integrations/api#choosing-a-token-t...


Then you need to reread the article. The author made a key for the LLM that didn't have permissions to delete a volume. The agent then found ANOTHER key with those permissions and used that instead.

You're not contradicting my comment; I was talking specifically about the key with full permissions that the LLM found (the article doesn't talk about other keys the LLM could have had, unless I missed something).

Somewhere in the files there was a key with full API permissions. The author had no intent of having the LLM use that key, and wasn't aware that the LLM could access it. That key was created to manage some domains, which was unrelated to the LLM's work. The author wasn't aware of how dangerous the key was and is surprised that it could be used to delete a volume.

Essentially I agree with gwerbin that the situation comes down to mishandling of the key. The author makes it seem like the key was allowed to do something that it shouldn't have been allowed to, but it was just a full-access key; no scoping is possible for that type of key (Railway also has other, less privileged types of keys/APIs).

Btw, I partially agree with the author's criticisms: ideally these keys should be scoped, and maybe the UI should give more warnings when creating that type of key. But this situation could still happen as long as you put the wrong key in the wrong place (and specifically a place accessible to LLMs).


> The author made a key for the LLM that didn't have permissions to delete a volume.

No he didn’t, because this doesn’t exist. Railway does not have a token with that kind of scoping.


Anecdote: As a hapless junior engineer I once did something extremely similar.

I ran a declarative coding tool on a resource that I thought would be a PATCH but ended up being a PUT, and it resulted in a very similar outcome to the one in this post.
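
For anyone who hasn't been bitten by this: PATCH merges your changes into the existing resource, PUT replaces the resource with exactly what you send. A toy sketch against a made-up API (the endpoint and fields are hypothetical):

    import requests

    BASE = "https://api.example.com"  # made-up endpoint, purely illustrative

    # PATCH: merge this field into the existing volume config; everything
    # else about the resource is left alone.
    requests.patch(f"{BASE}/volumes/123", json={"size_gb": 100})

    # PUT: replace the whole resource with exactly this body. Any field you
    # forget to include (replica count, backup policy, ...) is dropped,
    # which is how "I thought it was a PATCH" turns destructive.
    requests.put(f"{BASE}/volumes/123", json={"size_gb": 100})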


Yeah, that's the typical junior engineer scenario, right? Run a command that wasn't meant to be destructive but accidentally destroy something. This is different. The AI agent went on some kind of wild goose chase of fixing problems, and eventually the most probable token sequence ended up at "delete this database". This is more like if your senior engineer with extreme ADHD ate a bunch of acid before sitting down to work.

creating isolated staging & prod environments -- good idea

allowing an AI agent to get hold of creds that let it execute destructive changes against production -- not a great idea

allowing prod database changes from the machine where the AI agent is running at all -- not a great idea

choosing a backup approach that fails completely if there's an accidental volume wipe API call -- not a great idea

choosing to outsource key dependencies to a vendor, where you want a recovery SLA, without negotiating & paying for a recovery SLA -- you get what you get, and you don't get upset


> creating isolated staging & prod environments -- good idea

Would have been a good idea but he didn’t do this either. The volume in question was used in both staging and production apparently, per the “confession”. The agent was deleting the volume because it was used for staging, not realizing it was also used for prod.


> choosing to outsource key dependencies to a vendor

This is the entire thing. The author is basically slinging blame at a bunch of different vendors, and while some of the criticisms might be valid product feedback, it absolutely does not achieve what they're trying to, which is to absolve themselves of responsibility. This is a largely unregulated industry, which means when you stand up a service and sell it to customers, you are responsible for the outcome. Not anyone else. It doesn't matter if one of your vendors does something unexpected. You don't get to hide behind that. It was your one and only job to not be taken by surprise. Letting the hipster ipsum parrot loose with API credentials is a choice. Trusting vendors without verifying their claims is a choice. Failing to read and understand documentation is a choice.


It's source-available proprietary software that happens to be distributed through NPM.

It's not source-available; there's only bundled, minified code looking like https://unpkg.com/ooko@0.121.0/static/js/main..js

If this is source available then every website is source available.


But every website is source available. How else would you render it? Streaming PNGs?

Am I crazy or is this not minified, but obfuscated to hide what the code does? Can't test right now

That's a common phenomenon in model fitting, depending on the type of model. In both old-school regression and neural networks, the fitted model does not distinguish between specific training examples and other inputs. So specific input-output pairs from the training data don't get special privilege. In fact it's often a good thing that models don't just memorize input-output pairs from training, because that allows them to smooth over uncaptured sources of variation, such as people all being slightly different, as well as measurement error.

In this case they had to customize the model fitting to try to get the error closer to zero specifically on those attributes.
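
A toy sketch of both halves of that (numpy, made-up numbers): plain least squares smooths over the noise instead of reproducing training points, and you have to add something like per-point weights to drive the error toward zero on particular attributes.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(150, 200, 30)                 # e.g. heights (made up)
    y = 0.4 * x - 10 + rng.normal(0, 2, x.size)   # noisy derived measurement

    # Ordinary least squares: the fit does not reproduce training points
    # exactly; it smooths over noise and person-to-person variation.
    plain = np.polyfit(x, y, deg=1)
    print(np.polyval(plain, x[0]) - y[0])    # residual at a training point, not ~0

    # "Customizing the fit": weight a few chosen points very heavily so the
    # error is pushed toward zero specifically on those attributes.
    w = np.ones_like(x)
    w[[0, -1]] = 1e6
    pinned = np.polyfit(x, y, deg=1, w=w)
    print(np.polyval(pinned, x[0]) - y[0])   # now ~0 at the heavily weighted point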


Yes, but why are they estimating the features when they are already available? They can estimate the other measurements from height etc., and just use the known inputs as-is. I don't get the point of passing them through a model at all.

The previous response was exactly right. The estimated features impact the height, so you can't just set the height and then do the rest. It also can't be tuned afterwards because that would change the mass. So it's a vicious circle.

This is just goofy prompting.

I have good success when I ask the agent to help me debug the harness. "Help me debug why Claude Code is ignoring my hook".


That's the point. Burgers are more expensive (relative to "all" other goods) compared to back then.

Or just don't use AI to write code. Use it as a code reviewer assistant along with your usual test-lint development cycle. Use it to help evaluate 3rd party libraries faster. Use it to research new topics. Use it to help draft RFCs and design documents. Use it as a chat buddy when working on hard problems.

I think the AI companies all stink to high heaven and the whole thing being built on copyright infringement still makes me squirm. But the latest models are stupidly smart in some cases. It's starting to feel like I really do have a sci-fi AI assistant that I can just reach for whenever I need it, either to support hard thinking or to speed up or entirely avoid drudgery and toil.

You don't have to buy into the stupid vibecoding hype to get productivity value out of the technology.

You of course don't have to use it at all. And you don't owe your money to any particular company. Heck for non-code tasks the local-capable models are great. But you can't just look at vibecoding and dismiss the entire category of technology.


> Or just don't use AI to write code.

Anecdata, but I'm still finding CC to be absolutely outstanding at writing code.

It's regularly writing, in hours, systems-level code that would take me months to write by hand, with minimal babysitting and basically no "specs" - just coherent, sane direction: making sure it tests things in several different ways, for several different cases, including performance; comparing directly to similar implementations; and constantly triple-checking that it actually did what you asked after it said "done".

For $200/mo, I can still run 2-3 clients almost 24/7 pumping out features. I rarely clear my session. I haven't noticed quality declines.

Though I will say, one random day - I'm not sure if it was dumb luck or if I was in a test group - CC was literally doing 10x the amount of work / speed that it typically does. I guess strange things are bound to happen if you use it enough?

Related anecdata: IME, there has been a MASSIVE decline in the quality of claude.ai (the chatbot interface). It is so different recently. It feels like a wanna-be crappier version of ChatGPT, instead of what it used to be, which was something that tried to be factual and useful rather than conversational and addictive and sycophantic.


My anecdata is that it heavily depends on how much of the relevant code and instructions it can fit in the context window.

A small app, or a task that touches one clear smaller subsection of a larger codebase, or a refactor that applies the same pattern independently to many different spots in a large codebase - the coding agents do extremely well, better than the median engineer I think.

Basically "do something really hard on this one section of code, whose contract of how it interacts with other code is clear, documented, and respected" is an ideal case for these tools.

As soon as the codebase is large and there are gotchas, edge cases where one area of the code affects the other, or old requirements - things get treacherous. It will forget something was implemented somewhere else and write a duplicate version, it will hallucinate what the API shapes are, it will assume how a data field is used downstream based on its name and write something incorrect.

IMO you can still work around this and move net-faster, especially with good test coverage, but you certainly have to pay attention. Larger codebases also work better when you started them with CC from the beginning, because its older code is more likely to actually work how it expects/hallucinates.


> My anecdata is that it heavily depends on how much of the relevant code and instructions it can fit in the context window.

Agreed, but I'm working on something >100k lines of code total (a new language and a runtime).

It helps when you can implement new things as if they're green-field-ish AND THEN integrate and plumb them in later.


In a well-designed system, you can point an agent at a module of that system and it's perfectly capable of dealing with it. Humans also have a limited context window, and divide and conquer is always how we've dealt with it. The same approach works for agents.

> In a well-designed system, you can point an agent at a module of that system and it's perfectly capable of dealing with it.

Yes, but the problem is that LLMs don't default to well-designed systems... So, you need to aggressively stay on top of them.


How can a person reconcile this comment with the one at the root of this thread? One person says Claude struggles to even meet the strict requirements of a spec sheet, another says Claude is doing a great job and doesn’t even need specific specs?

I have my own anecdata but my comment is more about the dissonance here.


One aspect you have to consider is the differences in the human beings doing the evaluation. I had a coworker/report who would hand me obvious garbage-tier code with glaring issues even in its output, and it would take multiple iterations to address very specific review comments (once, in frustration, I showed a snippet of their output to my nontechnical mom and even my mom wtf'ed and pointed out the problem unprompted); I'm sure all the AI-generated code I painstakingly spec, review and fix is totally amazing to them and needs very little human input. Not saying it must be the case here - that was extreme - but it's a very likely factor.

This is plausible. Assuming it’s true, we would see the adoption of vibe coding at a faster rate amongst inexperienced developers. I think that’s true.

A counterpoint is Google saying the vast majority of their code is written by AI. The developers at Google are not inexperienced. They build complex critical systems.

But it still feels odd to me, this contradiction. Yes there’s some skill to using AI but that doesn’t feel enough to explain the gap in perception. Your point would really explain it wonderfully well, but it’s contradicted by pronouncements by major companies.

One thing I would add is that code quality is absolutely tanking. PG mentioned YC companies adopted AI generated code at Google levels years ago. Yesterday I was using the software of one such company and it has “Claude code” levels of bugginess. I see it in a bunch of startups. One of the tells is they seem to experience regressions, which is bizarre. I guess that indicates bugs with their AI generated tests.


You don't think Sundar would do that, just go on the Internet and tell lies?

This is magical because you are both on the exact right path and not right. My theory is there's a sort of skill to teasing code from AI (or maybe not, and it's alchemy all over again), and this is all new enough, and we so lack a common vocabulary for it, that it's hard for one person who is having a good experience and one who is not to meaningfully sort out what they are doing differently.

Alternatively, it could be there’s a large swath of people out there so stupid they are proud of code your mom can somehow review and suggest improvements in despite being nontechnical.


> This is magical because you are both on the exact right path and not right. My theory is there's a sort of skill to teasing code from AI (or maybe not, and it's alchemy all over again), and this is all new enough, and we so lack a common vocabulary for it, that it's hard for one person who is having a good experience and one who is not to meaningfully sort out what they are doing differently.

I don't think this is just a hypothesis.

Outside of asking for one-shot tasks that have been done a million times before, LLMs do not "default" to good work.

If you ask them over-and-over again to find holes in their solution, to fix them, to evaluate for tech debt, to test all cases, to re-asses after the cases if it's architecturally coherent, to compare to the closest available known good implementations, etc etc, they can eventually get what you want done unbelievably cheaply to an acceptable level of quality.

As I mentioned initially: their work is unbelievably cheap, so you should be EAGER to reject it. Most people wouldn't even bend down to pick a penny up off the sidewalk. They can literally pump out CLs for a penny. You shouldn't even waste time looking at "I'm done" until they've gone through 10+ rounds of reviews, refactors, bug fixes, thinking of more test cases, comparing to known implementations, etc.

Why are you going to spend ~$50-$100+ of your time reviewing $0.01 of LLM time?! It makes no sense!

If you just listen to them say "I'm done" and move on to their next task, it won't take too many days before you're swimming in a sea of incoherent garbage.


I just read Steve Yegge's book Vibe Coding, and he says learning to use AI effectively is a skill of its own, and takes about a year of solid work to get good at it. It will sometimes do a good job and other times make a mess, and he has a lot of tips on how to get good results, but also says a lot of it is just experience and getting a good feel for when it's about to go haywire.

One person is rigorously checking to see if Claude is actually following the spec and one person isn’t?

One is getting paid by a marketing department program and the other isn't. Remember how much has been spent making LLMs, and they have now decided that coding is their money maker. I expect any negative comment on LLM coding to be replied to by at least 2 different puppets or bots.

Then you should expect any positive comment to be replied to negatively by a competitor's puppet or bot too.

Not necessarily; rising tide and all that. When a new scam like this emerges, it behooves all of the grifters to cooperate and not muddy the waters with distrust.

I'm normally very skeptical of conspiracy theories. But I saw an AI booster bot responding to a negative AI post I made here.

Someone pointed out to me in the comments that the username had posted long replies to 3 completely different threads in the same minute. That and looking back at its post history confirmed it was a bot.


... or one person has a very strong mental model of what he expects to do, but the LLM has other ideas. FWIW I'm very happy with CC and Opus, but I don't treat it as a subordinate but as a peer; I leave it enough room to express what it thinks is best and guide later as needed. This may not work for all cases.

If you don't have a very strong mental model for what you are working on, Claude can very easily guide you into building the wrong thing.

For example I’m working on a huge data migration right now. The data has to be migrated correctly. If there are any issues I want to fail fast and loud.

Claude hates that philosophy. No matter how many different ways I put my reasons, and instructions to stop, into the context, it will constantly push me towards removing crashes and replacing them with "graceful error handling".
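
To make the contrast concrete, here's roughly the pattern (a toy Python sketch, not my actual consumer; the helper names are made up):

    import logging

    def validate(record):            # placeholder for whatever checks apply
        if "id" not in record:
            raise ValueError("record missing id")

    def write_to_new_store(record):  # placeholder for the real write
        pass

    # What I want: fail fast and loud -- one bad record halts the migration
    # so nothing silently diverges between the old and new stores.
    def migrate_record(record):
        validate(record)
        write_to_new_store(record)

    # What Claude keeps pushing: "graceful" handling that skips bad records,
    # so the migration "succeeds" while quietly losing or corrupting data.
    def migrate_record_graceful(record):
        try:
            validate(record)
            write_to_new_store(record)
        except Exception as exc:
            logging.warning("skipping bad record: %s", exc)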

If I didn’t have a strong idea about what I wanted, I would have let it talk me into building the wrong thing.

Claude has no taste and its opinions are mostly those of the most prolific bloggers. Treating Claude like a peer is a terrible idea unless you are very inexperienced. And even then I don’t know if that’s a good idea.


> Claude has no taste and its opinions are mostly those of the most prolific bloggers.

I often think that LLMs are like a reddit that can talk. The more I use them, the more I find this impression to be true - they have encyclopedic knowledge at a superficial level, the approximate judgement and maturity of a teenager, and the short-term memory of a parakeet. If I ask for something, I get the statistical average opinion of a bunch of goons, unconstrained by context or common sense or taste.

That’s amazing and incredible, and probably more knowledgeable than the median person, but would you outsource your thinking to reddit? If not, then why would you do it with an LLM?


> they have encyclopedic knowledge at a superficial level, the approximate judgement and maturity of a teenager, and the short-term memory of a parakeet. If I ask for something, I get the statistical average opinion of a bunch of goons, unconstrained by context or common sense or taste.

Love this paragraph; it's exactly how I feel about the LLMs. Unless you really know what you are doing, they will produce very sub-optimal code, architecturally speaking. I feel like a strong acumen for proper software architecture is one of the main things that defines the most competent engineers, along with naming things properly. LLMs are a long, long way from having architectural taste.


Try asking it to review your code as if it were Linus Torvalds. No, really.

I've tried that. I've experimented with a whole council of 13 personas, including many famous developers. It's definitely different. But it hasn't performed significantly better in my tests.

Holding it wrong.

That's interesting to hear, as for me Claude has been quite good about writing code that fails fast and loud, and it has specifically called that out more than once. It has also called out, in reviews, code that does not fail early.

If you add a single space to a prompt, you’ll get a completely different output, so it’s no surprise that feeding entirely different programs into the prompt produces radically different output.

My guess is that there must be something about the language (Go) or the domain (a data migration tool that uses Kafka) that triggers this.


You're right, data migration is a specific case where you have a very strong set of constraints.

I, on the other hand, am doing a new UI for an existing system, which is exactly where you want more freedom and experimentation. It's great for that!


Have you created a plan where the requirement is not to bother you with x and y, and to use some predetermined approach? What you describe sometimes happens to me, but it happens less when it's part of the spec.

Yes. That’s one of the things included in this.

> No matter how many different ways I put my reasons, and instructions to stop, into the context


> it will constantly push me towards removing crashes and replacing them with “graceful error handling”.

Is it generating JS code for that?


No, this is a Kafka consumer written in Go.

I think it matters what the project and tech stack is, and how much you try to get done before starting a fresh chat.

I've had interesting chats where it explained that its choice of Tailwind, for example, was because it had a ton of training knowledge on it.

I've also had it try to build more in one chat than it should many times.

For some reason OpenAI Codex is better at handling building too much without failing - but that is total anecdata from my particular projects, and ymmv.

I've had these things try to build big when a little nudge gets them to change direction and not build so much. Explaining which libraries to use and such, asking it to change the tech stack, and spelling out the steps to build at once all seem to make things much better for my use cases.

Also, running extra checks and cleanup later is a thing; sure, a human might have seen an obvious thing at build time, but we have a bigger memory context comparatively, imho.


I think it depends on both the complexity and the quality bars set by the engineer.

From my observations, generally AI-generated code is average quality.

Even with average quality it can save you a lot of time on some narrowly specialized tasks that would otherwise take you a lot of research and understanding. For example, you can code some deep DSP thingie (say audio) without understanding much what it does and how.

For simpler things like backend or frontend code that doesn't require any special knowledge other than basic backend or frontend - this is where the bars of quality come into play. Some people will be more than happy with AI-generated code; others won't be, depending on their experience and also their requirements (speed of shipping vs. quality, which almost always resolves to speed), etc.


It could just be that each of the two reviewers is merely focussing on different sides of the same coin? I use Claude all the time. It saves me a lot of effort that I would have otherwise spent in looking up specific components. The magically autocompleted pieces of boilerplate are a tangible relief. It also catches issues that I missed. But when it is wrong, it can be subtly or embarrassingly or spectacularly wrong depending on the situation.

Note that one person is mentioning they use Claude Sonnet, which is less capable than the higher tiers (Opus, etc).

It boils down to scope. I use CC in both very specific one-language systems and broad backend-frontend-db-cache systems. You can guess where the difficulty lies. (Hint: it's the stuff with at least 3 distinct languages.)

> basically no "specs" - just giving it coherent sane direction

This is one variable I almost always see in this discussion: the more strict the rules that you give the LLM, the more likely it is to deeply disappoint you

The earlier in the process you use it (i.e. scaffolding), the more mileage you will get out of it.

It's about accepting fallibility and working with it, rather than trying to polish it away with care.


To me this still feels like it would be a net negative. I can scaffold most any project with a language/stack specific CLI command or even just checking out a repo.

And sure, AI could “scaffold” further into controllers and views and maybe even some models, and they probably work ok. It’s then when they don’t, or when I need something tweaked, that the worry becomes “do I really understand what’s going on under the hood? Is the time to understand that worth it? Am I going to run across a small thread that I end up pulling until my 80% done sweater is 95% loose yarn?”

To me the trade-off hasn't proven worth it yet. Maybe for a personal pet project, and even then I don't like the idea of letting something else nondeterministically touch my system. "But use a VM!" they say, but that's more overhead than I care for. Just researching the safest way to bootstrap this feels like more effort than value to me.

Lastly, I think that a big part of why I like programming is that I like the act of writing code, understanding how it works, and building something I _know_.


A lot of the benefit of scaffolding is building basic context, which you can also build by feeding it the files produced by whatever CLI tool and talking about it, forcing it to think (for lack of a better word) about your design. You can also force-feed it design and API documentation. If you think that you have given it too much, you are almost certainly wrong.

If it's doing nonsensical things with a library, feed it the documentation; if it's still busted, make it read the source.


But, how do you know the code is good?

If you do spot checks, that is woefully inadequate. I have lost count of the number of times when, poring over code a SOTA LLM has produced, I notice a lot of subtle but major issues (and many glaring ones as well), issues a cursory look is unlikely to pick up on. And if you are spending more time going over the code, how is that a massive speed improvement like you make it seem?

And what do you even mean by 10x the amount of work? I keep saying that anybody who starts to spout this sort of anecdote absolutely does NOT understand real-world, production-level, serious software engineering.

Is the model doing 10x the amount of simplification, refactoring, and code pruning an effective senior level software engineer and architect would do? Is it doing 10x the detailed and agonizing architectural (re)work that a strong developer with honed architectural instincts would do?

And if you tell me it's all about accepting the LLM being in the driver's seat and embracing vibe coding, it absolutely does NOT work for anything exceeding a moderate level of complexity. I have tried that several times. Up to now, no model has been able to write a simple markdown viewer with certain specific features I have wanted for a long time. I really doubt the stories people tell about creating whole compilers with vibe coding.

If all you see and appreciate is that it is pumping out 10x the features, 10x more code, you are missing the whole point. In my experience you are actually producing a ton of sh*t, sorry.


> But, how do you know the code is good?

Honestly, this is more of a question about the scope of the application and the potential threat vectors.

If the GP is creating software that will never leave their machine(s) and is for personal usage only, I'd argue the code quality likely doesn't matter. If it's some enterprise production software that hundreds to millions of users depend on, software that manages sensitive data, etc., then I would argue code quality should asymptotically approach perfection.

However, I have many moons of programming under my belt. I would honestly say that I am not sure what good code even is. Good to who? Good for what? Good how?

I truly believe that most competent developers (however one defines competent) would be utterly appalled at the quality of the human-written code on some of the services they frequently use.

I apply the Herbie Hancock philosophy when defining good code. When once asked what jazz music is, Herbie responded with, "I can't describe it in words, but I know it when I hear it."


> I apply the Herbie Hancock philosophy when defining good code. When once asked what jazz music is, Herbie responded with, "I can't describe it in words, but I know it when I hear it."

That’s the problem. If we had an objective measure of good code, we could just use that instead of code reviews, style guides, and all the other things we do to maintain code quality.

> I truly believe that most competent developers (however one defines competent) would be utterly appalled at the quality of the human-written code on some of the services they frequently use.

Not if you have more than a few years of experience.

But what your point is missing is the reason that software keeps working in the first place, or stays in a good enough state that development doesn't grind to a halt.

There are people working on those code bases who are constantly at war with the crappy code. At every place I’ve worked over my career, there have been people quietly and not so quietly chipping away at the horrors. My concern is that with AI those people will be overwhelmed.

They can use AI too, but in my experience, the tactical tornadoes get more of a speed boost than the people who care about maintainability.


I had a long reply to your comment, then decided it was not truly worth reading. However, I do have one question remaining:

> the tactical tornadoes get more of a speed boost than the people who care about maintainability.

Why are these not the same people? In my job, I am handed a shovel. Whatever grave I dig, I must lie in. Is that not common? Seriously, I am not being facetious. I've had the same job for almost a decade.


That’s because you’ve been there a decade. It’s very common for people to skip jobs every 2 years so that they never end up seeing the long term consequences of their actions.

The other common pattern I’ve seen goes something like this.

Product asks Tactical Tornado if they can build something; TT says sure, it will take 6 weeks. TT doesn't push back or ask questions; he builds exactly what product asks for in an enormous feature branch.

At the end of 6 weeks he tries to merge it and he gets pushback from one or more of the maintainability people.

Then he tells management that he’s being blocked. The feature is already done and it works. Also the concerns other engineers have can’t be addressed because “those are product requirements”. He’ll revisit it later to improve on it. He never does because he’s onto the next feature.

Here’s the thing. A good engineer would have worked with product to tweak the feature up front so that it’s maintainable, performant etc…

This guy uses product requirements (many that aren’t actually requirements) and deadlines to shove his slop through.

At some companies management will catch on and he’ll get pushed out. At other companies he’ll be praised as a high performer for years.


Way better than the random India dev output. I seriously don't know what everyone around here is doing. All I see are complaints while I produce the output of ten devs. Clean code, solid design.

Spend a few hours writing context files. Spend the rest of the week sipping bourbon.


So what have you released?

10x means you could have built something that would have taken 4 or 5 years in the time you've had since Opus 4.5 came out.

Where's your operating system, game engine, new programming language, or complex SaaS app?


Curious to know what you are using $200/mo CC for? New applications? Business? What kind of application needs 2-3 clients running 24/7? How are you coming up with features? Just trying to understand the workflow.

> I can still run 2-3 clients almost 24/7 pumping out features.

Honest question. How does one do that? My workflow is to create one git worktree per feature and start one session per worktree. And then I spend two hours in a worktree talking to Opus and reviewing what it is doing.
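
Concretely, the whole setup is just this (paths and branch names are illustrative):

    # one worktree per feature, each with its own checkout and its own agent session
    git worktree add ../myapp-feature-x -b feature-x
    cd ../myapp-feature-x     # start a separate Claude Code session in here

    # once the branch is merged, clean up
    git worktree remove ../myapp-feature-x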


> It's regularly writing systems-level code that would take me months to write by hand in hours, with minimal babysitting

Has your output kept pace with the code? Because months in hours means, even pushing those ratios quite far, years in days.

Has your roadmap accelerated multiple years in the last few months in terms of verifiable results?


Months, you say? How incredible. It beggars belief, in fact.

Not sure about ChatGPT, but Claude was (is still?) an absolute ripper at cracking some software if one has even a little bit of experience/low-level knowledge. At least, that's what my friend told me... I would personally never ever violate any software ToS.

> the whole thing being built on copyright infringement

I am not a lawyer, but am generally familiar with two "is it fair use" tests.

1. Is it transformative?

I take a picture, I own the copyright. You can't sell it. But if you take a copy, and literally chop it to pieces, reforming it into a collage, you can sell that.

2. Does the allegedly infringing work devalue the original?

If I have a conversation with AI about "The Lord of the Rings", even if it reproduces good chunks of the original, it does not devalue the original... in fact, I would argue, it enhances it.

Have I failed to take into account additional arguments and/or scenarios? Probably.

But, in my opinion, AI passes these tests. AI output is transformative, and in general, does not devalue the original.


In order for an LLM to be useful, you need to copy and steal all of the work. Yes, you can argue you don't need the whole work, but that's what they took and fed in.

And they are making money off of other people's work. Sure, you can use mental jiujutsu to make it fair use. But fair use for LLMs means you basically copy the whole thing. All of it. It sounds more like a total use to me.

I hope the free market and technology catch up and destroy the VC-backed machinery. But only time will tell.


I always wonder if anyone out there thinks they're not making money off of other people's work. If you're coding, writing a fantasy novel, taking a photograph or drawing a picture from first principles you came up with yourself, I applaud you though.

You are absolutely right.

Seriously though, I do think that is the case. It would be self-righteous to argue otherwise. It's just the scale and the nature of this that makes it so repulsive. For my taste, copying something without permission is stealing. I don't care what a judge somewhere thinks of it. Using someone's goodwill for profit is disgusting. And I hope we all get to profit from it someday, not just a select few. But that is just my opinion.


This kind of thinking seems like a road for people to have to pay a license for the rest of their life after going to school for the knowledge they "stole" from their textbooks.

Except the school paid royalties for that specific book. Every book. The money was distributed. Writers, publishers and so on. The normal stuff.

Or if you had to buy the book yourself, same thing, distributed, royalties paid.


So your complaint is that they didn't pay for training data by buying every book found online?

That does seem more reasonable, but makes public libraries also evil.


Except the libraries pay the fees for the books, they only serve a dedicated local region of people, and when you borrow a book, you know who the author is.

For LLMs the transformative part is then removing the copyright info and serving it to you as OpenAI whatever.

Sure, you can query multiple books at the same time and the technology is godlike. But the underlying issue remains. Without the original content, the LLM is useless. Someone took all the books, fed them in, and didn't pay anything back to the authors.

I'm not sure whether you're arguing in good faith here. You could easily check this information for yourself too. The problem is not the information itself. It's the massive machinery that steals all the works, and one day we are staring at the paywall. And the artists are still not funded. I'd rather just do something nice offline in the future.


I'm talking about the knowledge people "steal" by reading. LLMs and humans both absorb knowledge by reading. You want to tax the use of that absorbed knowledge.

It will be applied to people soon after.


This reminds me of what happened around the time I hit year 3 in school. You could no longer buy used textbooks like everyone did from time immemorial, because there was online DRM making sure you had the latest textbook to take the latest quiz. I'm sure it's gotten even worse in the 20 years since.

I understand but I think this will be quite a quaint idea soon in all honesty. Imagine these things are able to progress the world of science, math, physics, and whatever else (they already are) and we stopped them because someone didn't make enough royalties first. That to me would be more repulsive. We stop/slow the progress of all humanity because there wasn't enough temporary gain for x individual who wrote y book. And if it all turns out to be bogus nonsense then I doubt x individual who wrote y book loses much in the process anyway.

Yeah, it's not an easy puzzle piece. How far are we going to go in the name of science and progress again? Are you buying it, that it's all for the greater good? Quite a lot of money involved here. Everyone wants a piece of it. But I digress. Dropping the big bomb, stealing the lands and riches of the natives, using slaves and colonies to power the whole civilization into a new era might be powerful and efficient. But it doesn't make it right. I don't buy the narrative. Do no evil until you can no longer say no?

I think comparing intellectual property theft to slavery and stealing land is where I start leaning towards the argument being absurd. The stolen books are still on store shelves. People are likely still buying them at about the same rate as before.

And as far as it being for the greater good that seems to be the promise of many of these companies. What will inevitably get in the way is greed and money, the very same reasons we're arguing about IP theft. Good or bad I see no way out of this but through at this point.


And in Bartz v. Anthropic, the court found that Anthropic training their LLMs on books was "highly transformative."

The US is not the only legal jurisdiction these services are being sold in.

This is a tiresome and well-trodden road.

The fact of the matter is that for-profit corporations consumed the sum of mankind's knowledge with the intent to make money on it by encoding it into a larger and better-organized corpus of knowledge. They cited no sources and paid no fees (to any regular humans, at least).

They are making enormous sums of money (and burning even more, ironically) doing this.

If that doesn't violate copyright, it violates some basic principle of decency.


You are assuming intellectual property has an intrinsic basis when it's at best functional, not foundational. It's only useful if the net value to society is positive, which is extremely dubious.

I'm assuming human creativity has intrinsic value, or what's the point of being human?

You are assuming that somehow human creativity was born with intellectual property and will somehow die with it. It's just not so.

Ok, captain pedant: instead of exclusively making vague, handwavey negations, how about you say something?

Intellectual property is supposed to feed creativity by securing for creators exclusive rights to benefit from their creations. It mostly feeds uncreative leeches whose business is to own things in exchange for crumbs for the creators, and it drags down both the inherent enjoyment of the fruits of creativity and even their creation. It belonged in the bin back when we first thought of it and is only going to become more unfit for purpose as time goes on.

What in the mental gymnastics?

They just stole everyone's hard work over decades to make this or it wouldn't have been useful at all.


That's a statement. The comment you are replying to had actual reasoning behind its claim. Do you have any actual reasoning behind yours?

Let's not ignore the entirety of reality and what has been going on for the last few years to defend a pestilence on mankind that you probably have stock invested in. I'm not going to acknowledge how insane an argument you're making. It's like you've heard of zero leaks, zero lawsuits, zero open-source complaints. Zero anything. Just either intentionally or unintentionally astroturfing.

Thanks.


Sorta? Python has fairly strong types, but it's no fun debugging a `'NoneType' object has no attribute 'foo'` error deep inside some library function with a call site 1000 LoC away from the actual place where the erroneous None originally arose, due to a typo.

It's not just Python too, I've hit the same issue in Common Lisp.

Yes, one can run contracts and unit tests and static analysis, but what's a type checker anyway other than a very strict static analysis tool?
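
A contrived sketch of that failure mode, and of how a checker like mypy flags it at the call site instead of 1000 LoC later (the names are made up):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class User:
        name: str

    def find_user(user_id: int) -> Optional[User]:
        return None  # "not found" -- the seed of the problem

    def format_name(user: User) -> str:
        # At runtime this is where it finally blows up, far from the source:
        #   AttributeError: 'NoneType' object has no attribute 'name'
        return user.name.title()

    user = find_user(42)
    # mypy flags the hand-off instead: Argument 1 to "format_name" has
    # incompatible type "Optional[User]"; expected "User"
    print(format_name(user))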

