More

keeda · 2026-04-18T00:16:37 1776471397

They (and other AI players) have been using WAU over DAU for all their metrics, and many have questioned why. But if you look at other data sources of AI adoption, the reason is clear: Even while 56% of Americans now "regularly" use GenAI on a weekly basis, a much smaller percentage 10 - 14% use it on a daily basis. Here's one source but others had similar numbers: https://www.genaiadoptiontracker.com/

56% is much more impressive than 14%.

This may look bad until you consider that all of them are already desperately strapped for compute. I think the lower DAU is due to a combination of that and people still figuring out how to use AI.

keeda · 2026-04-17T23:16:05 1776467765

It still takes 3 - 5 years or more even for that incremental progress. It takes years to just catch up on the field! Do we expect PhD candidates to subsist on barely livable wages until they eventually publish a ground-breaking result? That kind of disincentive to even start a PhD would not be conducive at all to progress.

Yes, most PhD theses are scientific and commercial dead-ends (even more reason not to gate the degree on ground-breaking results!) but they do serve to cull the problem space, and that's exactly why we need more of them. In fact we should even provide some incentives to publish negative results in academia.

keeda · 2026-04-17T19:38:38 1776454718

> We're seeing exactly the same thing with AI, as there is massive investment creating a bubble without a payoff.
...

And so far there's no evidence that all this investment has generated more profit for the users of AI.

If you look around a bit, you will find evidence for both. Recent data finds pretty high success in GenAI adoption even as "formal ROI measurement" -- i.e. not based on "vibes" -- becomes common: https://knowledge.wharton.upenn.edu/special-report/2025-ai-a... (tl;dr: about 75% report positive RoI.)

The trustworthiness, salience and nuances of this report is worth discussing, but unfortunately reports like this gets no airtime in the HN and the media echo chamber.

Preliminary evidence, but given this weird, entirely unprecedented technology is about 3+ years old and people are still figuring it out (something that report calls out) this is significant.

0xbadcafebee · 2026-04-17T22:43:16 1776465796

75% report positive ROI (and the VPs are much more "optimistic" than the middle managers who are closer to the work) - but how much ROI? 1%? The fact that they don't quote a figure at all is pretty telling. And that's the ROI of the people buying the AI services, which are often heavily subsidized. If it costs a billion dollars to give a mid-sized company a 1% ROI, that doesn't sound sustainable.

I would love to see another report that isn't a year old with actual ROI figures...

keeda · 2026-04-18T22:04:54 1776549894

Good questions! I have only skimmed through the report but slide 45 onwards of the full report has some vague numbers: https://ai.wharton.upenn.edu/wp-content/uploads/2025/10/2025...

Can't say why they don't report exact numbers, but it may be because a) of confidentiality and b) RoI is very context dependent and c) there is a wide spectrum of RoI by different dimensions, including some 9% even reporting negative RoI. This may make it hard to cite a single number, but the majority report "moderate" to "significant" RoI, whatever that means to them.

I'll add that I've seen mentions of similar reports from other sources like McKinsey and co. e.g. this one that claims actual revenue increase: https://www.mckinsey.com/featured-insights/week-in-charts/ge... -- I tend not to take these reports at face value, but I'm seeing multiple of them from various sources that tend to align.

As an aside, I just wanted to say, these are the kinds of discussions I was hoping to see here!

chatmasta · 2026-04-18T00:29:43 1776472183

It’s not easy to quantify because you’re basically substituting or augmenting labor. How do you quantify an ROI on employees? You can look at profit of a project they’re hired to execute. But with AI, it’s mixed with the employees, so how do you distinguish the ROI of the two? With time, we might be able to make comparisons, but outside of very specific scenarios it’s difficult to quantify.

lazide · 2026-04-18T02:52:49 1776480769

Everyone I’ve seen try has had negative actual ROI.

All the middle managers are afraid to say anything though, so go go go.

SlinkyOnStairs · 2026-04-17T22:52:51 1776466371

> The trustworthiness, salience and nuances of this report is worth discussing, but unfortunately reports like this gets no airtime in the HN and the media echo chamber.

It honestly just isn't that interesting. (Being most notable for people misunderstanding and misrepresenting the chart on page 46 of the report as being "ROI" rather than "ROI measurement")

In terms of ROI figures, it's really just a survey with the question "Based on internal conversations with colleagues and senior leadership, what has been the return on investment (ROI) from your organization's Gen AI initiatives to date?".

This doesn't mean much. It's not even dubiously-measured ROI data, it's not ROI data at all, it's just what the leadership thinks is true.

And that's a worrying thing to rely on, as it's well documented (and measured by the report's next question) that there's a significant discrepancy in how high level leadership and low-level leadership/ICs rate AI "ROI".

One of the main explanations of that discrepancy being Goodhart's law. A large amount of companies are simply demanding AI productivity as a "target" now, with accusations of "worker sabotage" being thrown around readily. That makes good economy-wide data on AI ROI very hard to get.

keeda · 2026-04-18T22:16:12 1776550572

That's fair, it is survey based but it is apparently based on formal internal measurements. The full report (https://ai.wharton.upenn.edu/wp-content/uploads/2025/10/2025... -- slides 43 onwards) mentions that for 75% of them have "integrated formal ROI measurement."

There is little discussion of what that means, however, but we really can't expect concrete numbers for what is going to be sensitive business data,and given that the report tracks it across multiple industries and functions ranging from IT to operations to legal to sales, it may be hard to put into sensible numbers, or how the measurements may be flawed or biased.

keeda · 2026-04-17T04:53:52 1776401632

They're not looking for solutions, they're capitalizing on the AI backlash. It's just the new form of rageviews.

The only saving grace is that this is less cynical than typical rageviews, considering they have something of a point in that they are going to be negatively impacted by the same technology that has been trained on their content without compensation.

keeda · 2026-04-17T04:04:24 1776398664

I suspect the cause-and-effect in creating the narrative is the reverse of what's in the narraitve: Frank Herbert wanted the intricate dynamics of the Guild and Spice and Mentats and exciting close-quarter combat for a more intriguing narrative. But AI and robots made those all obsolete, so he made it disappear with a handwave of "because Butlerian Jihad."

I always thought the Butlerian Jihad was the biggest plot hole in Dune, but I deeply appreciate the world and narrative it enabled.

keeda · 2026-04-17T03:25:19 1776396319

Eh, this has been a disappointing series to read from the person that wrote all those Jepsen write-ups and the technical interview series. All articles are mostly a regurgitation of all the negativity that gets aired here all the time (a lot of it already fixed or debunked) and 0 discussion of utility. This is more befitting Zitron than aphyr. Where's the sharp incisive wit and deep insights?

At least he realizes this technology is unlikely to slow down. With international relations as they are, it's MAD all over again, only the "D" is a fuzzy, hypothetical thing nobody can name, so even that bit of deterrence is lost. Yet finally he ends with the most uninspired advice of all: "we should try, unsuccessfully, to stop it."

Everyone must understand: for all of history, progress and productivity and value creation overall could only scale with people. Now it can scale with power and compute. This is a tremendous economic force, akin to a force of nature, that is nigh impossible to stop. (I always did think the Butlerian Jihad was the biggest plot hole in Dune.)

My advice is this: we have no choice but to adapt. We must realize that, by a stroke of luck, this is a power available to us more than the capital class. If they can scale without people, so can we. But because harnessing AI effectively requires hard skills -- at least for now -- that the capital class don't have and used to pay us for, we might even scale better than them!

Carpe diem.

simoncion · 2026-04-17T04:09:42 1776398982

> All articles are mostly a regurgitation of all the negativity that gets aired here all the time (a lot of it already fixed or debunked) and 0 discussion of utility.

There are multiple sections that talk directly about utility. Here's one of them: [0]

But, sure. I'll bite. Here's the third paragraph of the first part of the essay [1]:

  This is *bullshit* about *bullshit machines*, and I mean it. It is neither balanced nor complete: others have covered ecological and intellectual property issues better than I could, and there is no shortage of boosterism online. Instead, I am trying to fill in the negative spaces in the discourse. “AI” is also a fractal territory; there are many places where I flatten complex stories in service of pithy polemic. I am not trying to make nuanced, accurate predictions, but to trace the potential risks and benefits at play.

I'd say that the specific sort of "utility" discussion that you're probably looking for would be classified as "boosterism". [2]

> Now it can scale with power and compute.

Eh. Carefully read through and consider [3].

[0] <https://aphyr.com/posts/411-the-future-of-everything-is-lies...>

[1] <https://aphyr.com/posts/411-the-future-of-everything-is-lies...>

[2] Due to their nearly-universally breathless nature, I know that's how I classify the overwhelming majority of such discussions.

[3] <https://www.b-list.org/weblog/2026/apr/09/llms/>

keeda · 2026-04-17T06:14:44 1776406484

[0] is a throwaway paragraph that handwaves at second-hand accounts of generic things LLMs can do, with no further discussion, apparently because he (surprisingly!) has almost no first-hand experience with them. Then there are 10 pages of negativity with dozens of links to stuff that has been discussed to death here and in media. The "negative spaces" he's filling are already overflowing.

His lack of personal experience with LLMs was the most disappointing aspect, because he does not really know what we're dealing with. He's just going off what he's read / heard. So again, where's the incisive insight?

Now, here's a concrete example of what I mean by utility: a single person being able to rewrite an entire open source project from scratch in a few days just so it could be relicensed. Is that good or bad? I don't know! Is it a stupefying example of what's possible? Yes! Is that "breathless boosterism?" Only if you ignore the infinite nuances involved.

> Eh. Carefully read through and consider [3].

Hadn't come across this one before, but there's not much in there I hadn't seen and even discussed in past comments. As an example, it still mentions the METR study from 2025 without mentioning the very pertinent follow-up from just a couple of months back... which is not very surprising to me: https://news.ycombinator.com/item?id=47145601 ;-)

It does mention (and then gloss over) the real finding of the DORA and related reports, which is pertinent to my original point: LLMs are simply an amplifier of your existing software discipline. Teams with strong software discipline see amazing speedups, those with poor discipline sees increased outages.

And, to my original point, who knows what good software discipline looks like? Hint: it's not the capital class.

simoncion · 2026-04-17T08:27:29 1776414449

> His lack of personal experience with LLMs...

You missed the part where he is consistently unimpressed by the failure of LLMs to do the task he hands to them, it seems. Go re-read Section 1.5 "Models are Idiots". Make sure to read the footnotes. They're sure to address most of the counterarguments you might make.

> Is that "breathless boosterism?"

How you phrased it? Yes. It ignores the "infinite nuances involved" such as maintainability, infosec soundness of the work product, the completely untested legality of "license washing" to name a few. Also, you missed the part where I said

  Due to their nearly-universally breathless nature, I know that's how I classify the overwhelming majority of such discussions.

> Hadn't come across this one before, but there's not much in there I hadn't seen and even discussed in past comments. ... It does mention (and then gloss over) the real finding of the DORA and related reports...

Yeah, I figured that you would be unable (or unwilling) to understand this one. Here's the summary, straight from the author's keyboard:

* Fred Brooks' No Silver Bullet was correct.

* No Silver Bullet applies to LLMs the way it applied to other things, and empirical evidence on LLM coding impact sure seems to agree.

* You'll get better returns from working on strong software development fundamentals than from forcing all your programmers to use Claude for everything, and that's a repeated message in basically all the major literature.

* If LLMs do turn into a revolutionary world-changing silver bullet giving everyone coding superpowers, you'll be able to just adopt them fully when that happens.

keeda · 2026-04-17T16:18:53 1776442733

> You missed the part where he is consistently unimpressed by the failure of LLMs to do the task he hands to them...

Not really, those are exactly the things said by people who dabble with LLMs a little and turn to "breathless naysaying" without any effort to really figure out this new technology. I mean, the series literally ends with "maybe I'll try to code with it."

> Yes. It ignores the "infinite nuances involved" such as maintainability, infosec soundness of the work product, the completely untested legality of "license washing" to name a few.

Not really, I did say "Is it good or bad? I don't know!" and literally mentioned the infinite nuances. I did not want this to become a tangent about those nuances (that's what I hoped would be in TFA) but I do know that being able to write or rewrite entire projects single-handedly is tremendous utility.

> Yeah, I figured that you would be unable (or unwilling) to understand this one.

Not really, just that I've already discussed all the points in that piece in past comments with way more studies on "empirical evidence on LLM coding impact" with way more nuance. If you want to follow the threads in the comment I linked, you'll come across some of those comments.

> You'll get better returns from working on strong software development fundamentals than from forcing all your programmers to use Claude for everything, and that's a repeated message in basically all the major literature.

Not really, the repeated message in all the latest reports like DORA and DX and CircleCI (which your link mentions but glosses over) very clearly indicates that using LLMs with strong software development fundamentals (what I called "discipline") is a huge force multiplier. See point 3 of this link as a representative example: https://www.thoughtworks.com/en-us/insights/blog/generative-... For these teams, productivity will literally be proportional to their tokens rather than their devs, because each dev is so highly leveraged.

> If LLMs do turn into a revolutionary world-changing silver bullet giving everyone coding superpowers, you'll be able to just adopt them fully when that happens.

Yes, but at this point it's unlikely to be a silver bullet, and I never claimed it would be. What I am saying is that it is a huge accelerant, but needs steering by skilled operators, engineers who know the discipline but also understand how to work with AI.

And in my experience it takes a surprising amount of time and practice to learn how to leverage AI effectively.

Which aphyr clearly has not done. Which is why this series is such a disappointment.

simoncion · 2026-04-17T20:04:43 1776456283

> Not really, those are exactly the things said by people who dabble with LLMs a little...

From the footnote in section 1.5:

  The examples I give in this essay are mainly from major commercial models (e.g. ChatGPT GPT-5.4, Gemini 3.1 Pro, or Claude Opus 4.6) in the last three months; several are from late March. Several of them come from experienced software engineers who use LLMs professionally in their work. Modern ML models are astonishingly capable, and they are also blithering idiots. This should not be even slightly controversial.

I wonder just how Scottish the Scotsman has to be before you'll let him order a drink.

> And in my experience it takes a surprising amount of time and practice to learn how to leverage AI effectively.

Let's ignore -for a minute- the fact that people who actually use these things as part of their dayjobs were consulted, which moots this complaint.

Every six-ish months we hear "Wow. All the past commentary on LLMs is completely invalid. These new models aren't just a step change — they're a whole new way of working.".

If we consider only that datapoint, it's pretty obvious that you're not missing out on much by choosing to just work on skills that are universally applicable and "evergreen". But, when you add in to that the fact that every six-ish months we also hear "Wow. These new revs of the LLM products are just as stupid and nondeterministic as the old ones. They also still make the same classes of stupid mistakes, are pretty much as dangerously unreliable as they always have been [0], and -just like previous versions- have 'capability rot' that cannot be anticipated, but might be caused by inability to handle current demand, deliberate shifting of backend resources to serve newer, more-hyped LLM products, or even errors in the vibecoded vendor-supplied tooling that interfaces with the backend.", the decision to ignore the FOMO and hype becomes pretty obviously correct.

> I mean, the series literally ends with "maybe I'll try to code with it."

Well, this is how the series ends:

   The security consequences are minimal, it’s a constrained use case that I can verify by hand, and I wouldn’t be pushing tech debt on anyone else. I still write plenty of code, and I could stop any time. What would be the harm?
   
   Right?
   
   ...Right?

There's a certain subtlety to this that you missed. [2]

If we ignore that subtlety, I expect that your retort to a report that goes "Wow. They suck just as hard at coding for me as they do for everything else I've attempted to use them for. I'm not surprised because I've talked to professional programmers who regularly use these things in their dayjobs and I'm getting results that are similar to what they've been reporting to me." will be "Bro. You didn't spend enough time learning how to use it, bro!".

By way of analogy, I'll also mention -somewhat crassly- that one doesn't have to have an enormous bosom to understand that all that weight can cause substantial back pain. One can rely on both one's informed understanding of the fundamentals behind the system under consideration, as well as first-hand testimony from enormous-bosom-equipped people to arrive at that conclusion.

[0] eg. [1] and many, many other examples

[1] <https://github.com/anthropics/claude-code/issues/39201>

[2] Your failure to notice that subtlety makes me wonder how often you use LLMs to summarize lengthy technical articles that you read.

keeda · 2026-04-17T21:32:55 1776461575

>. Several of them come from experienced software engineers who use LLMs professionally in their work.

So, not from personal experience. And we don't know which examples came from which users or what they used them for. We get enough hearsay on HN and again, there's nothing in this series that has not been discussed here. There is however, a ton of other hearsay missing in the series, which is the utility so many people are finding (in many cases, along with actual data or open source projects.)

> Every six-ish months we hear ...

I've been yelling about LLMs since early 2024 [0]! They needed much more "holding it right" back then. Now it's way easier, but the massive potential was clear way back then.

> They also still make the same classes of stupid mistakes, are pretty much as dangerously unreliable as they always have been.

Yes, and this is where a lot of the skill in managing them comes into play. Hint: people are dangerously unreliable too.

> One can rely on both one's informed understanding of the fundamentals behind the system under consideration, as well as first-hand testimony from enormous-bosom-equipped people to arrive at that conclusion.

Of course, but when faced with many contradictory opinions, I prefer data. And the preponderence of data I've looked at and discussed [0] paints a very different picture.

> There's a certain subtlety to this that you missed.

From TFA:

> I want to use them. I probably will at some point.

My complaint is that he is speaking entirely from second-hand information and provides no new insight of his own. That he has trepidations to actually get his hands dirty with them does not change it, and only makes it worse that he spent 10 pages going on about them! He's a technologist, not a journalist! So, I'm genuinely curious, what subtlety did I miss?

[0] Available in my comment history. To allay suspicion that I only engage in breathless boosterism, some relevant comments about the negatives: https://news.ycombinator.com/item?id=47405189 or https://news.ycombinator.com/item?id=46830919

bdangubic · 2026-04-17T20:11:06 1776456666

[1] is so bad, like the worst imaginable thing you can think of... like if this is the possible fuckup all bets are off what other fuckups you might need to deal with. I got hit with this problem several times and I was like "well this is just impossible..." absolutely mind-blown

keeda · 2026-04-16T04:11:20 1776312680

Fascinating read, even though I think the model deviations over time are more to do with context windows getting too large. If nothing else, worth reading for the references to quirks of human cognition and "free will."

The "interpreter" is a concept that I found especially intriguing within the context of a leading theory in cognition research called "Predictive Processing." Here, the brain is constantly operating in a tight closed loop of predicting sensory input using an internal model of the world, and course-correcting based on actual sensory input. Mostly incorrect predictions are used to update the internal model and then subconsciously discarded. Maybe the "interpreter" is the same mechanism applied to reconciling predictions about our own reasoning with our actual actions?

Even if the hypotheses in TFA are not accurate, it's very interesting to compare our brains to LLMs. This is why all the unending discussions about whether LLMs are "really thinking" are meaningless -- we don't even understand how we think!

keeda · 2026-04-16T01:44:56 1776303896

I think the premise is:

1) The number of vulnerabilities surfaced (and fixed?) in a given software is roughly proportional to the amount of attention paid to it.

2) Attention can now be paid in tokens by burning huge amounts of compute (bonus: most commonly on GPUs, just like crypto!)

3) Whoever finds a vulnerability has a valuable asset, though the value differs based on the criticality of the vulnerability itself, and whether you're the attacker or the defender.

More tokens -> more vulns is not a guarantee of course, it's a stochastic process... but so is PoW!

keeda · 2026-04-15T19:05:48 1776279948

I think this is common sense advice rather than a philosophical stance about free speech, said advice being generalized as "When in a foreign country, avoid trouble." As an example, if you visit China and start FA about Tibet, you will FO pretty soon, no matter how right you are about free speech.

Yes, this case is a travesty, but that does not change the soundness of the advice.

keeda · 2026-04-15T18:47:01 1776278821

>Security through obscurity is a losing bet against automation

Security through obscurity is only problematic if that is the only, or a primary, layer of defense. As an incremental layer of deterrence or delay, it is an absolutely valid tactic, with its primary function being imposing higher costs on the attacker.

As such if, as people are postulating post-Mythos, security comes down to which side spends more tokens, it is an even more valid strategy to impose asymmetric costs on the attacker.

"With enough AI-balls (heheh) all bugs are shallow."

From a security perspective, the basic calculus of open versus closed comes down to which you expect to be case for your project: the attention donated by the community outweighs the attention (lowered by openness) invested by attackers, or, the attention from your internal processes outweighs the attention costs (increased by obscurity) on attackers. The only change is that the attention from AI is multifold more effective than from humans, otherwise the calculus is the same.