Hacker News | gck1's comments

I sit next to my 4U server with all enterprise components apart from the fans, which are consumer grade.

I had to mod the chassis slightly (with just pliers, tape and random inserts) to fit these fans in there, and add fans in front to push the air in. The PSU that came with it was obnoxiously loud, but thankfully Supermicro has a quiet version that I can't even hear. Even if SM didn't have this PSU, I could have easily modified the PSU and fit some Noctuas in there without any issue or safety concerns, like I did with my enterprise grade Mikrotik switch that also had obnoxious fans by default.

I even have an enterprise grade UPS that is dead silent when it's not running on battery power (I swapped the fans there too).

I essentially try to buy enterprise gear whenever possible. Not only is it usually much better than the consumer alternative, it's also frequently much cheaper thanks to the second-hand market. Before AI sucked the soul out of the hardware market in general, you could buy enterprise SSDs with a life expectancy (TBW) measured in petabytes, and an MTBF of practically never, for half the price of a top consumer SSD with TBW measured in tens of terabytes.

And the entire rack is just slightly louder than the PC I was using.

The only consumer grade computers at my home are my MacBook and my phone.


Enterprise SSDs are all that. Just make sure you keep them powered: the unpowered data-retention requirement is 3 months for enterprise vs 1 year for consumer grade.

As someone who went full circle prompt-enforcement > deterministic flow > prompt-enforcement, I disagree.

The reason "DO NOT SKIP" fails is that your agent is responsible for too many things, and there are things in context pulling attention away from this guidance.

But nobody said the agent that does enforcement must be the same agent that builds. While you can likely encode some smart decision-making logic in your deterministic control flow, you either make it too rigid to work well, or so complex that you might as well just use the agent; it will be cheaper to set up and maintain.

You essentially need 3 base agents:

- Supervisor that manages the loop and kicks the right things into gear if things break down

- Orchestrator that delegates things to appropriate agents and enforces guardrails where appropriate

- Workers that execute units of work. These may take many shapes.
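The three-agent split above can be sketched as a plain control loop. This is a hypothetical illustration, not a real framework: `call_model` is a stub standing in for any LLM API call, and all the names are made up.

```python
# Hypothetical sketch of the supervisor/orchestrator/worker split.
# `call_model` stands in for a real LLM API call; stubbed here so
# the control flow itself is runnable.

def call_model(role: str, prompt: str) -> str:
    """Stub for an LLM call; a real version would hit a model API."""
    return f"[{role}] ok: {prompt[:40]}"

def worker(task: str) -> str:
    # Executes one unit of work with a narrow, task-only context.
    return call_model("worker", f"Do exactly this task: {task}")

def orchestrator(tasks: list[str]) -> list[str]:
    # Delegates tasks and enforces guardrails on each result.
    results = []
    for task in tasks:
        result = worker(task)
        check = call_model("guardrail", f"Flag 'violation' if rules broken: {result}")
        if "violation" in check.lower():
            result = worker(f"Redo, previous attempt broke the rules: {task}")
        results.append(result)
    return results

def supervisor(tasks: list[str], max_rounds: int = 3) -> list[str]:
    # Manages the loop and restarts the orchestrator if things break down.
    for _ in range(max_rounds):
        try:
            return orchestrator(tasks)
        except Exception:
            continue  # kick things back into gear
    raise RuntimeError("gave up after repeated failures")

print(supervisor(["add a linter", "fix failing test"]))
```

The point of the split is that each layer keeps a small, focused context instead of one agent juggling building, checking and recovery at once.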


Exactly, just keep adding more agents

I can't tell if this is satire or not. Well done!

It's a Heisenberg satire: more agents going wild is indeed horrible, but agents restricting and counterbalancing each other can be useful (token cost ignored!).

I think the key question is: How can you be sure the supervisor/orchestrator agents are reliable? You are just pushing the complexity down into another layer.

You can't be sure, but the point is you can be more sure, since agent 2 ("agent" really just being a fancy way of saying some code that calls Anthropic's API) has only the context to look for a violation of a single rule.
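That "agent 2" can be tiny. A minimal sketch, assuming a stand-in `ask_model` function in place of a real SDK call (stubbed here so the example runs on its own):

```python
# A single-rule verifier: one model call whose entire context is
# one rule plus the diff to check. `ask_model` is a hypothetical
# stand-in for a real API call, stubbed for this sketch.

def ask_model(prompt: str) -> str:
    # Stub: pretend the model flags any lint-suppression directive.
    return "VIOLATION" if "# noqa" in prompt else "OK"

def check_single_rule(rule: str, diff: str) -> bool:
    """Return True if the diff passes this one rule."""
    prompt = (
        "You check exactly one rule and nothing else.\n"
        f"Rule: {rule}\n"
        f"Diff:\n{diff}\n"
        "Answer OK or VIOLATION."
    )
    return ask_model(prompt) == "OK"

print(check_single_rule("Never silence the linter.", "x = 1  # noqa"))  # False
```

Because the verifier sees nothing but the rule and the diff, there's no competing context to distract it the way a builder agent gets distracted.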

They'll need a lot of google-certified phones then. And each phone will only be able to do so many verifications until the unique, cryptographically secure ID gets banned by Google.

Google already killed the SMS verification market specifically for Google accounts by reversing the verification from receiving to sending the SMS. Almost a year later, no SMS verification service that made a killing on this is offering an alternative.

So yes, this will definitely affect the captcha solving services.


I live in a small European country. It's not a shithole, but not on everyone's radar either (we got Google Pay just 3 years ago) and I tried to create a new Google account recently.

It asked me to scan the QR code for verification and I'm guessing it tied that account to my device ID because it opened the Google app and added that new account to my device without my approval.

As a fallback (i.e. no attestation or Play services), the QR code flow has you send an SMS to some short code. Well, it turns out that for my country of a few million people, that number simply does not work on 3/3 mobile providers.

I guess Google just doesn't care anymore if it blocks access to their services (or, in the OP's case, all services that depend on Google's) for millions of people who don't fit a particular profile, own a particular device, and agree to have all their internet browsing tied to a static ID that Google controls.

How will this work for iPhone? Doesn't Apple restrict such behavior?


This happened on every single greenfield project that I've started with AI, no matter how rigorous a process I had defined.

And it's not just easier because it's cheap, it's easier because you're not emotionally attached to that code. Just let it produce slop, log what worked, what didn't, nuke the project and start over.

It just gets incredibly boring.


People will get attached to code that works just right and they don’t want to mess with it too much.

Privacy and security from government overreach is not enough?

What privacy? Enough drug dealers have already been busted with solid evidence from trailing the paths on public blockchains.

> The first thing I do with new agent driven project is set up quality checks. Linters, test frameworks, static analysis, etc

I do this too, but then I sit and observe how the agent gets very creative about going around all of these layers just to reach the finish line faster.

Say, for example, I needlessly pass a mutable reference and the linter screams at me. I know either the linter is wrong in this case, or I should listen to it and change the signature. If I make the lazy choice, I will be dissatisfied with myself; I might even get scolded, or fired if I keep making lazy choices.

An LLM doesn't get these feelings.

An LLM will almost always go for silencing it, because it stands between the LLM and the 'reward'. If you put up guardrails so that the LLM isn't allowed to silence anything, then you get things like 'ok, I'll just do foo.accessed = 1 to satisfy the linter'.
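One crude deterministic backstop is scanning the agent's diff for suppression markers before accepting it. A hypothetical sketch (the patterns and function names are made up, and it won't catch creative workarounds like the `foo.accessed = 1` trick):

```python
import re

# Hypothetical pre-merge check: reject a diff if the agent tried to
# silence tooling instead of fixing the finding. Only catches blunt
# cases like '# noqa' or '# type: ignore'.

SUPPRESSION_PATTERNS = [
    r"#\s*noqa",
    r"#\s*type:\s*ignore",
    r"//\s*eslint-disable",
    r"#\[allow\(",  # Rust lint-allow attributes
]

def diff_silences_tooling(diff: str) -> list[str]:
    """Return added diff lines that contain a suppression marker."""
    hits = []
    for line in diff.splitlines():
        if not line.startswith("+"):
            continue  # only inspect lines the agent added
        for pat in SUPPRESSION_PATTERNS:
            if re.search(pat, line):
                hits.append(line)
                break
    return hits

diff = "+x = call()  # noqa\n-y = 2"
print(diff_silences_tooling(diff))  # ['+x = call()  # noqa']
```

A check like this only moves the problem, of course: the agent can still satisfy the letter of the rule while violating its spirit, which is the whole point of the comment above.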

Same story with tests. Who decides when it's the test that should be changed/deleted or the implementation?


> Same story with tests. Who decides when it's the test that should be changed/deleted or the implementation?

Claude is remarkably good at figuring this out. I asked it to look at a failing test in a large and messy Python codebase. It found the root cause, asked whether the failure was a regression or an insufficiently specified test, performed its own investigation, and found that the test harness was missing mocks that were exposed by the bug fix.

It has become amazingly good at investigating.


If you point it at a specific thing and ask a specific question, yes, it will figure it out.

But I never have "fix this test" as a task. What happens when you task it with a feature implementation and a test breaks in the middle of the session? It will not behave the same way.


You have to not "stress" the agents out over testing. If the gate is "no failing tests", they cheat. If the gate is "triage failing tests, quantify the risk of each failure, prioritize them in the next work cycles", agents behave amazingly better and cheat far less.
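The reframed gate described above can be made concrete: instead of a binary "zero failures" check, require a triage entry for every failing test. A minimal sketch, with all names hypothetical:

```python
from dataclasses import dataclass

# Hypothetical gate: rather than "zero failures or reject", demand a
# triage entry per failing test. The agent can't win by deleting or
# skipping tests; it has to classify and prioritize each failure.

@dataclass
class Triage:
    test: str
    cause: str   # e.g. "regression", "underspecified test", "missing mock"
    risk: str    # "low" | "medium" | "high"

def gate(failing: list[str], triage: list[Triage]) -> bool:
    """Pass if every failing test has a triage entry, regardless of count."""
    triaged = {t.test for t in triage}
    return all(name in triaged for name in failing)

report = [Triage("test_login", "missing mock", "low")]
print(gate(["test_login"], report))               # True
print(gate(["test_login", "test_pay"], report))   # False
```

The design choice is that the gate rewards honest accounting rather than a green checkmark, which removes the incentive to cheat the test suite.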

Limits were the last straw that made me cancel my subscription and make my workflow completely model agnostic with pi.

While this is good news, I'm not coming back. Anthropic just lost me with too many wrongs in too short a time period.

Opus has been replaced with GPT 5.5, DeepSeek, Kimi and Qwen, and they all allow me to use my own single harness and switch models easily if any of them starts treating me the same way.


I wouldn't make any grandstanding declarations like this, honestly. The models themselves are all hot-swappable with minimum effort. The AI labs, American or Chinese, don't really have a moat. Today Anthropic is bad and OpenAI is good. Last month it was the other way around. Next month it may be Google.

The only certainty is that you can swap models quickly and painlessly.


Same, though I'm reconsidering, in light of the recent bugs (which can happen to any provider) and the increased limits. I guess that's at least 3x more Opus for my use case.

IIRC they're also doing integrity checks on the binary, so this could theoretically get your account banned.

I don't use my M3 Max in public, but running local models on it makes me uncomfortable because of the fans too.

I also have a Dell laptop that spins up its fans if I open a text editor, and that feels normal; but on the Mac, fans spinning up feels like I'm somehow abusing it and shortening its lifespan.


Reminds me of a previous-generation (plastic shell) MBP drawing from both AC and battery for sustained peak performance, fans spinning.
