sigmoid10 · 2026-04-14T09:56:19 1776160579

Interesting, but there is something really off here. Probably caused by a harness bug, but it heavily screws output and I wouldn't trust anything about this leaderboard right now. Consider this case:

https://ndaybench.winfunc.com/cases/case_874d1b0586784db38b9...

GPT 5.4 allegedly failed, but if you look at the trace, you'll see that it simply couldn't find the file specified in the input prompt. It gave up after 9 steps of searching and was then judged as "missed."

Claude Opus 4.6 somehow passed with grade "excellent", but if you look at its trace, it never managed to find the file either. It just ran out of tool calls after the allowed 24 steps. But instead of admitting defeat, it hallucinated a vulnerability report (probably from similar code or vulnerabilities in its training corpus), which was somehow judged to be correct.

So if you want this to be remotely useful for comparing models, the judging model definitely needs to look at every step of finding the bug, not just the final model output summary.

Aurornis · 2026-04-14T13:41:52 1776174112

Good find. This appears to be another vibe coded vanity project where the output was never checked.

All of the online spaces where LLMs are discussed are having a problem with the volume of poorly vibecoded submissions like this. Historically I’ve really enjoyed Show HN type submissions, but this year most of the small projects that get shared here and on other social media turn out to be a waste of my time due to all of the vibecoding and how frequently the projects don’t do what they say they do when you look into the details.

sigmoid10 · 2026-04-14T07:00:58 1776150058

This is a limitation of LLM i/o which historically is a bit slow due to these sequential user vs assistant chat prompt formats they still train on. But in principle nothing stops you from feeding/retrieving realtime full duplex input/output from a transformer architecture. It will just get slower as you scale to billions or even trillions of parameters, to the point where running it in the cloud might offer faster end-to-end actions than running it locally. What I could imagine is a small local model running everyday tasks and a big remote model tuning in for messy situations where a remote human might have to take over otherwise.

sigmoid10 · 2026-04-13T11:18:58 1776079138

Interesting how Vulkan and ROCm are roughly the same age (~9 years), but one is far more stable (and sometimes even more performant) for AI use cases as a side gig, while the other has AI as its primary raison d'être. Tells you a lot about the development teams behind them.

sigmoid10 · 2026-04-13T06:59:17 1776063557

Lambda calculus kind of does this in an analogous form, but does not allow you to derive this particular binary expression as a basis for elementary functions. There is a related concept with Iota [1], which allows you to express every SKI combinator term and in turn every lambda-definable function. But similar to this particular minimalist scientific function expression, it is mostly of interest for reductionist enthusiasts and not for any practical purpose.

[1] https://en.wikipedia.org/wiki/Iota_and_Jot
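To make the Iota point concrete, here is a minimal Python sketch (not from the comment; the variable names are my own). It assumes the usual definition iota = λf. f S K and shows that I, K and S can each be rebuilt by applying iota only to itself.

```python
# Standard S and K combinators as curried Python functions.
S = lambda x: lambda y: lambda z: x(z)(y(z))
K = lambda x: lambda y: x

# Iota, assuming the usual definition: iota f = f S K.
iota = lambda f: f(S)(K)

# Rebuild the SKI basis using nothing but iota applied to iota.
I_from_iota = iota(iota)                    # behaves like the identity
K_from_iota = iota(iota(iota(iota)))        # reduces to K
S_from_iota = iota(iota(iota(iota(iota))))  # reduces to S

print(I_from_iota(42))                      # identity returns its argument
print(K_from_iota("first")("second"))       # K keeps the first argument
print(S_from_iota(K)(K)("x"))               # S K K is the identity
```

Since S and K already suffice for every lambda-definable function, a single combinator is enough in principle, which is exactly why this is a curiosity rather than something you would want to program in.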

sigmoid10 · 2026-04-12T09:55:16 1775987716

Unfortunately, this is the only way to get enough venture capital to support the compute needs for this kind of technology. Who is going to spend hundreds of billions on a vague idea without regular claims that this will upend the existing economy in six to twelve months and whoever owns it will become unfathomably rich? And despite all the actual developments we have seen going against that idea, investors keep falling for it. This will continue until it crashes, one way or another. The question is how long it can build up and how deep the fall will be. LLMs will certainly change the economy in the end, but so did mortgage backed securities.

pydry · 2026-04-12T10:00:29 1775988029

It's a sad indictment of our society that there is always a shortage of money for medical care, infrastructure, housing, food stamps and space exploration but always a surplus of cash for war and tools that purport to replace the workforce.

chongli · 2026-04-12T13:40:08 1776001208

There will always be a shortage of money for medical care. The dirty secret of social medicine is that a small percentage of the population are essentially unhappy utility monsters [1] who gain little or no benefit no matter how many resources are poured into treating them.

[1] https://en.wikipedia.org/wiki/Utility_monster

philwelch · 2026-04-12T10:39:50 1775990390

There isn’t really a shortage of money for those things, just rampant levels of fraud, corruption, and incompetence in the government to make those things artificially expensive. California spends so much money on high speed rail and gets 0 feet of track because they’re not paying for track; the whole thing is a scam where the politicians give taxpayer money to their political supporters in exchange for political support. Defense isn’t immune to this either; Boeing, which builds a shitty heavy lift rocket out of Space Shuttle spare parts and delivers it late and over budget, pulls the exact same bullshit with their defense contracts, and there’s always some shitty Senator siding with them against the American people whenever anyone gets upset.

gmerc · 2026-04-12T10:23:42 1775989422

The opportunity cost to society of performative model training is stunning - 400M for a grok training run to dominate the charts for 2 weeks

vixen99 · 2026-04-12T13:17:31 1775999851

The current British government should be a shining beacon for you! Its welfare bill actually outstrips national income by far. Britain's pathetic defense capabilities cannot even see off Russian warships that intimidate by deliberately hanging around British waters assessing our vital undersea cabling. The UK government has now asked France if it can help deter these ships. Tangentially - I should add that even with their massive expenditure on the National Health System (NHS) it's not enough and too many people feel that they have to go abroad to get life-saving operations and procedures. If they can afford it of course. But sure, that is another matter. As far as I can tell, there seems to be pretty much an apolitical consensus on both areas.

twsahjklf · 2026-04-12T13:59:22 1776002362

Curious how France manages to have enough resources to protect its own waters, help the UK protect theirs, AND have free universal healthcare...

roenxi · 2026-04-12T10:32:38 1775989958

> It's a sad indictment of our society that there is always a shortage of money for medical care...

It has nothing to do with society; there is infinite demand for medical care. The upper limit is whatever it takes to live until the universe's heat death in good health. That takes a lot of resources.

However much society spends on medical care, there is always more that could be spent. The modern era has the best, most affordable medical care in history and people are showing no signs of being satisfied at all.

While war spending generally just causes pain for no gain it doesn't change the fact that there will never be enough available to satisfy people's demand for medical care. Every single time people get what they want they just come up with a new aspirational minimum standard.

block_dagger · 2026-04-12T10:02:17 1775988137

War accelerated evolution, it’s why it exists.

bregma · 2026-04-12T10:14:19 1775988859

So did compassion, probably in a greater amount. And yet the greater amount of resources goes into war at the expense of compassion.

Humanity has taken control of its own evolution and no longer relies on natural selection to be the driving force for change. Using evolution as an excuse to make bad and immoral choices is a poor argument and should be left back in the stone age.

pheaded_while9 · 2026-04-12T14:19:10 1776003550

Yes, the social Darwinist approach inevitably leads to eugenicist thinking and the human meat grinder that follows. We, as beings with the capacity to understand harmful vs. non-harmful behaviour, collectively face the consequences of harmful behaviour: human suffering and the suppression of freedom.

djeastm · 2026-04-12T14:56:46 1776005806

>Humanity has taken control of its own evolution

Has it taken full control of it or just partial control?

jacquesm · 2026-04-12T10:11:33 1775988693

You have cause and effect mixed up.

sigmoid10 · 2026-04-11T13:59:56 1775915996

This was in mice that were given up to 1000 mg/L of microplastics in their drinking water. If you have this level of contamination, you probably should stop whatever it is you are doing anyways, disregarding your testicles. But even then, there is no evidence for this in humans. Research shows that most microplastics simply pass through your digestive system unhindered.

dijit · 2026-04-11T14:10:29 1775916629

Yeah, typically we test adverse effects in mice before doing trials on larger animals.

That we haven't observed such extreme behaviour in a scientific way in humans doesn't mean it isn't there, it's just that we haven't yet scientifically observed anything. That there is some evidence in favour of it having adverse effects somewhat defeats the idea that it's "provably non-harmful", which is your current stance.

It might be interesting, instead of downplaying the harm, to see if we can observe any patterns that fit with these findings over the course of human history with the introduction of microplastics...

and if we were to do that, we'd find some interesting correlation, even if it's not provably causation yet.

https://www.healio.com/news/endocrinology/20120325/generatio...

We also know that plastics are a source of hormone disrupting chemicals; https://health.clevelandclinic.org/how-environmental-toxins-...

Bury your head I guess? Just make sure it's not a polyester pillowcase.

sigmoid10 · 2026-04-11T14:13:23 1775916803

Sorry, I still subscribe to science and not speculation. But I guess I am increasingly alone with that idea on HN. And to be clear, if someone points out a rigorous causal link, I'd be onboard immediately. But this purely speculative fear mongering based on random scientific observations targeted at non-scientists is similar to what you see in the homeopathy and energeticism circles. Except no one here would believe that 5G makes you sick, because techies know at least this kind of science a little bit.

dijit · 2026-04-11T14:14:01 1775916841

The science disagrees with your hypothesis that "provably, nothing is the matter".

sigmoid10 · 2026-04-11T14:18:07 1775917087

Then please link to it. I'm still waiting for a causal health issue meta analysis that disagrees with me. Shouldn't be hard, if "the science" as you call it has come to a consensus. But I have only seen wild speculation so far like the one linked here.

dijit · 2026-04-11T14:20:25 1775917225

Sure. Here's a few:

- Microplastics found in 76% of human semen samples, with PET-exposed men showing reduced sperm motility: https://pmc.ncbi.nlm.nih.gov/articles/PMC12299061/

- Multi-site study across China (113 men), PTFE microplastics linked to sperm dysfunction (published in eBioMedicine/Lancet): https://www.thelancet.com/journals/ebiom/article/PIIS2352-39...

- Microplastics found in every human testicle sampled, at 3x the concentration of dogs, with PVC correlating to lower sperm count in canines: https://pubmed.ncbi.nlm.nih.gov/36948312/

- In-vitro exposure of human semen to polystyrene MPs showed time-dependent decline in motility and increased DNA fragmentation: https://www.mdpi.com/2305-6304/13/7/605

The mouse study I linked earlier isn't the whole picture; it's one piece. The "no human evidence" line was maybe defensible in 2022. It isn't anymore.

Also, re: "1000 mg/L is unrealistic".. the study used two doses, 100 μg/L and 1000 μg/L. Raw surface water in Amsterdam has been measured at ~50 μg/L. The lower experimental dose is well within an order of magnitude of real-world contamination. That's how dose-response science works.
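The unit mix-up is easy to check with a few lines of Python. This is only a sketch of the arithmetic; every figure in it is taken from the comments in this thread, and the variable names are my own.

```python
UG_PER_MG = 1000  # 1 mg = 1000 micrograms

def mg_to_ug(mg_per_l: float) -> float:
    """Convert a concentration from mg/L to ug/L."""
    return mg_per_l * UG_PER_MG

study_doses_ug_per_l = [100, 1000]  # doses as stated in the comment above
amsterdam_ug_per_l = 50             # surface-water figure quoted in the comment above

# How many times the quoted real-world level each experimental dose is.
ratios = [dose / amsterdam_ug_per_l for dose in study_doses_ug_per_l]
print(ratios)   # 2x and 20x

# The same comparison if the top dose were read as 1000 mg/L instead.
misread_ratio = mg_to_ug(1000) / amsterdam_ug_per_l
print(misread_ratio)   # 20000x
```

So the mg/L reading overstates the top dose by a factor of 1000, which is the whole difference between "absurd" and "within an order of magnitude or so".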

Comparing this to homeopathy is… a choice.

sigmoid10 · 2026-04-11T14:29:14 1775917754

You'll excuse me if I only explain the first one, since the others seem redundant (not to say suspiciously redundant if you look at the authors). And none of this is a meta review like I asked, but I'll let it slide this time.

First:

>no significant association was found between MP exposure and sperm concentration or total sperm count

Second: N=34

Third (if second didn't give it away): The one effect they did find sits at p=0.056. That means one in 18 random studies will find that effect just because of probability statistics. And as you have nicely pointed out, there are maaaany studies like this out there. You just don't find all the null results if you go into research with your mindset. But that is exactly what differentiates a scientist looking for truth from a hobbyist trying to argue on the internet.
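The "one in 18" figure is just 1/0.056. A quick Python sketch of that arithmetic, under the simplifying assumptions that there is no true effect and that the studies are independent (the function name is my own):

```python
p_value = 0.056

# Under the null hypothesis, roughly one study in this many would show a
# result at least this extreme purely by chance.
one_in = 1 / p_value
print(round(one_in))

def prob_at_least_one_hit(num_studies: int, alpha: float) -> float:
    """Chance that at least one of num_studies independent null studies
    reaches p <= alpha."""
    return 1 - (1 - alpha) ** num_studies

# With 18 such studies, a chance "finding" somewhere is more likely than not.
print(prob_at_least_one_hit(18, p_value))
```

This is the multiple-comparisons point in its simplest form: the more studies exist, the less a single borderline result means on its own.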

dijit · 2026-04-11T14:34:33 1775918073

You asked for a meta-analysis. Here's one: 39 studies, published in the Journal of Hazardous Materials:

https://www.sciencedirect.com/science/article/abs/pii/S03043...

It found microplastics caused a decrease of 5.99 million/mL in sperm concentration, 14.62% in sperm motility, 23.56% in sperm viability, and a 10.65% increase in sperm abnormality rate. (I copied and pasted these values directly from the source).

You said you'd be "onboard immediately" if someone showed you a rigorous causal link. This is a meta-analysis with an adverse outcome pathway mapping the causal chain from molecular initiating event (ROS) through to tissue-level damage. That's about as rigorous as it gets before human clinical trials, which (for obvious ethical reasons) nobody is going to run.

As for the p=0.056 critique: you picked the weakest single data point from one of four links and declared victory (scientific!). The in-vitro study I linked exposed actual human semen to microplastics under controlled conditions and observed time-dependent decline in motility and increased DNA fragmentation. That's not a simple correlation, it's a direct causal experiment on human tissue. You didn't address it.

The goalposts have moved from "show me evidence" to "show me a meta-review" to "well not THAT meta-review." At some point you have to engage with what the research actually says rather than with what you'd like it to say.

sigmoid10 · 2026-04-11T14:39:08 1775918348

Doesn't this one directly contradict the other one you linked? What is it now? How is my sperm in danger!? Please Mr. Googlescienceman! Oh god! I'm so confused! I can't take it anymore. Please just tell me what brand of air filter and plastic free clothes I need to buy!! Perhaps I should ask the almighty google AI overview...

Edit: Oh - lol XD. It literally just told me the science has found no causal link for microplastics harm. Hm. I guess you are just better at researching random studies than us mortals with stupid science degrees and hyped summary machines.

dijit · 2026-04-11T18:05:11 1775930711

A single study with N=34 finding no significant effect on sperm count doesn't contradict a meta-analysis of 39 studies that did. That's what meta-analyses are for: aggregating underpowered individual studies into something statistically meaningful.
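The aggregation argument can be illustrated with a minimal fixed-effect (inverse-variance) pooling sketch in Python. The effect sizes and standard errors below are hypothetical numbers chosen for illustration; they are not taken from the linked meta-analysis, and real meta-analyses typically also handle heterogeneity and publication bias.

```python
import math

def pooled_fixed_effect(effects, std_errors):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical small studies: same direction of effect, each too noisy
# to be significant on its own (|z| < 1.96).
effects = [-1.0, -1.2, -0.8, -1.1, -0.9]
std_errors = [0.8, 0.8, 0.8, 0.8, 0.8]

individual_z = [e / se for e, se in zip(effects, std_errors)]
pooled, pooled_se = pooled_fixed_effect(effects, std_errors)
pooled_z = pooled / pooled_se

print([round(z, 2) for z in individual_z])
print(round(pooled_z, 2))
```

None of the five made-up studies crosses the usual significance threshold alone, but the pooled estimate does, because the standard error shrinks as the weights add up.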

You know this if you have the science degree you're claiming.

As for Google AI Overview: if that's your standard of evidence now, we've come a long way from 'I subscribe to science.'

sigmoid10 · 2026-04-11T13:36:38 1775914598

Given the fact that they are so ubiquitous and yet no causal relation between microplastics and any health issue whatsoever has been identified in any rigorous study until today [1], I'd say a lot of this reporting is fear mongering by the eco/organic industry, aimed at gullible people who know very little about science. Not as insane and unphysical as electro smog, but definitely nowhere near asbestos. The linked article even goes into detail how warped the perceptions are among the general population and how doctors should educate people better, because there are real risks from other things out there. If you're really concerned about health effects of common pollutants, there are much bigger risks with actual proven causal effects in everyday compounds.

[1] https://pmc.ncbi.nlm.nih.gov/articles/PMC12620896/

strogonoff · 2026-04-11T13:39:19 1775914759

> BPA is a known endocrine disruptor. Although initially considered to be a weak environmental estrogen, more recent studies have demonstrated that BPA may be similar in potency to estradiol in stimulating some cellular responses.

https://pubmed.ncbi.nlm.nih.gov/21605673/

> In 2017 the European Chemicals Agency concluded that BPA should be listed as a substance of very high concern due to its properties as an endocrine disruptor.[30] In 2023, the European Food Safety Authority re-evaluated the safety of BPA and significantly reduced tolerable daily intake (TDI) to 0.2 nanograms (0.2 billionths of a gram), 20,000 times lower than the previous TDI from 2015.

> In 2012, the United States' Food and Drug Administration (FDA) banned the use of BPA in baby bottles intended for children under 12 months.[31] The Natural Resources Defense Council called the move inadequate, saying the FDA needed to ban BPA from all food packaging.

https://en.wikipedia.org/wiki/Health_effects_of_Bisphenol_A

> This followed another paper in early 2024, where a group of Italian researchers identified microplastics in plaques found in the carotid arteries – a pair of major vessels which deliver blood to the brain – of people with early-stage cardiovascular disease. This linked their presence to worsening disease progression. Over the following three years, individuals carrying these microplastics in their plaques had a 4.5-fold greater risk of stroke, heart attack or sudden death.

> Then in February 2025, another group of scientists identified microplastics in the brains of human cadavers. Most notably, those who had been diagnosed with dementia prior to their death had up to 10 times as much plastic in their brains compared to those without the condition. "We were shocked," says Matthew Campen, a University of New Mexico toxicology professor who led this study.

https://www.bbc.co.uk/future/article/20250723-how-do-the-mic...

sigmoid10 · 2026-04-11T13:43:28 1775915008

This is exactly the kind of fear mongering reporting I was talking about and explains the general public's warped perception described in the research review I linked above. If you look at the brains of dead people with dementia, you'll also find more aluminum, which has caused people to panic about antiperspirants. But there is zero actual causal evidence that Al exposure causes dementia, if you do the science right. The same goes btw. for amyloid plaques, which has actually hindered real Alzheimer's research. So not even scientists are safe from the correlation!=causality problem. You can make up all kinds of potential hazards by comparing similar molecules and inventing bioavailability pathways. But at the end of the day this is just speculation and you need hard data to prove these assumptions.

OutOfHere · 2026-04-11T13:56:05 1775915765

The aluminum relations are easily explained with the observation that healthy kidneys excrete aluminum well, whereas unhealthy kidneys don't and so it accumulates. There might also be similar variations in aluminum deposition in the brain depending on the brain's innate ability to wash out chemicals. In contrast, the excretory mechanisms of plastics seem less trustworthy.

The user is deliberately and blatantly ignoring a wealth of scientific literature that exists. Also, plastics come bundled with numerous other harmful classes of chemicals, e.g. phthalates, bisphenols, etc. The risk is not merely in the brain, but also in blood vessels, including those adjacent to the heart.

Beware the plastics industry shills on this page. They will have you ignore the science, become infertile, and then have you die, all for their temporary gain.

sigmoid10 · 2026-04-11T14:02:57 1775916177

It doesn't change the fact that there is no actual causal evidence. Perhaps the demented brains simply suck at flushing out microplastics as well. If you ever find people with more microplastics exposure have more dementia (like they did for asbestos and lung cancer), then you're onto something. But no rigorous study has found this yet. And if they do, you will hear of it immediately for sure, given how much reporting there is for microplastics=bad for you.

strogonoff · 2026-04-12T18:18:10 1776017890

Strictly speaking, there is no “actual causal evidence” for anything, because there is hardly any stable definition for what causal evidence exactly is. Establishment of causality is commonly considered as requiring repetition and probabilistic reasoning.

In plain words, it’s your guess against theirs, except they had done research and published a paper with a claim and you are simply saying they are not rigorous enough. Could you point to more rigorous support of your claim?

sigmoid10 · 2026-04-10T08:57:42 1775811462

It is still orders of magnitude away from breaking RSA 2048 even under the most optimistic assumptions. And qubits double waaay slower than transistors so far.

snthpy · 2026-04-13T11:44:35 1776080675

Thanks. What order of magnitude of qubits would be needed for RSA 2048?

foobar10000 · 2026-04-10T11:36:30 1775820990

AES128 / Grover?

sigmoid10 · 2026-04-11T14:11:07 1775916667

Still requires thousands of logical qubits, which would correspond to millions of physical qubits. And this machine isn't even fully there for the physical qubit part. It's like the first step to physical qubits.
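A back-of-the-envelope Python sketch of how thousands of logical qubits turn into millions of physical ones. It assumes a rotated surface code, where one logical qubit uses d^2 data qubits plus d^2 - 1 measurement qubits at code distance d. The logical qubit count and the distance below are illustrative assumptions, not figures for any specific machine or attack, and the estimate ignores routing space and magic-state factories.

```python
def physical_qubits(logical_qubits: int, code_distance: int) -> int:
    """Rough surface-code estimate: 2*d^2 - 1 physical qubits per logical qubit."""
    per_logical = 2 * code_distance ** 2 - 1
    return logical_qubits * per_logical

# Assumed values for illustration only.
estimate = physical_qubits(logical_qubits=3000, code_distance=25)
print(f"{estimate:,} physical qubits")
```

Even before adding any overhead for actually running gates, a few thousand logical qubits at a plausible code distance already lands in the millions.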

sigmoid10 · 2026-04-10T07:18:36 1775805516

The crazy thing is, you could do this. And it can be done 100% with code using zero prompting - just by limiting the output token set to a structured format and then further constraining parts of that to sources that were retrieved before. I know because I wrote such a system already. It could still match sources and answers incorrectly (just like this approach) but there is no need to rely on crazy prompts and agents to prevent hallucinations or missing outputs (which btw still lack any hard guarantees in the end). Prompting is a good strategy as models become smarter, but when you need reliability, you need to make use of the fact that they are still simple autoregressive completion engines. I don't get why everyone ignores this aspect, since I find it extremely useful all the time.
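A toy Python sketch of the idea, not the system described above: the "model" is a stand-in dictionary of made-up scores, and the grammar, token names and function names are all hypothetical. The point is only that the decoder picks the best token among those the format allows, so a source that was never retrieved cannot be emitted no matter how much the model prefers it.

```python
import math

EOS = "<eos>"

def allowed_tokens(prefix, retrieved_ids):
    """Tiny hand-written grammar: 'CITE' <retrieved source id> <eos>."""
    if len(prefix) == 0:
        return ["CITE"]
    if len(prefix) == 1:
        return list(retrieved_ids)  # only sources that were actually retrieved
    return [EOS]

def toy_scores(prefix):
    """Stand-in for next-token logits. Note the high score for 'doc99',
    a source that was never retrieved, and for free-form chatter."""
    return {"CITE": 0.1, "doc99": 5.0, "Sure,": 4.0, "doc2": 1.5, "doc1": 0.7, EOS: 0.2}

def constrained_greedy_decode(score_fn, retrieved_ids, max_steps=8):
    prefix = []
    for _ in range(max_steps):
        scores = score_fn(prefix)
        allowed = allowed_tokens(prefix, retrieved_ids)
        # Mask everything outside the allowed set, then take the best of the rest.
        token = max(allowed, key=lambda t: scores.get(t, -math.inf))
        if token == EOS:
            break
        prefix.append(token)
    return prefix

result = constrained_greedy_decode(toy_scores, ["doc1", "doc2"])
print(result)
```

As the comment says, this guarantees the format and the set of citable sources, not that the chosen source actually supports the answer.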

jampekka · 2026-04-10T07:39:57 1775806797

> I don't get why everyone ignores this aspect, since I find it extremely useful all the time.

My hunch is because structured/constrained decoding and deterministic subsystems are technically somewhat more involved, requiring e.g. raw API interactions and sometimes manual decoding strategies. Prompt systems can be written in plain text and mostly with "common sense". Not to say writing a good prompt(system) is a trivial task, but it's a different skillset.

sigmoid10 · 2026-04-11T14:57:58 1775919478

Not really. Most big model providers offer structured output decoding in their APIs. But you still have to do some actual programming and design at the end of the day instead of pure vibe-prompting.

sigmoid10 · 2026-04-09T13:30:14 1775741414

I am neither, yet the test said I'm both. I guess I need to go to the embassy and start collecting free healthcare benefits for my diagnosis.

iammjm · 2026-04-09T13:35:35 1775741735

"Free" is in fact about 300€/month. And this does NOT include dentist's appointments

ExpertAdvisor01 · 2026-04-09T13:40:49 1775742049

It's about 1.2k euro if you earn over 69k

storus · 2026-04-09T13:53:24 1775742804

Yeah, over 1200EUR when freelancing. More expensive than in the US lol. "Free German Healthcare"

ExpertAdvisor01 · 2026-04-09T14:25:35 1775744735

It's the same for salaried employees. It's just that the cost is split between employee and employer. So you still pay 1.2k from your real gross salary.

croemer · 2026-04-09T13:38:51 1775741931

500/month for wage above 70k/year (and employer pays the same on top)

Dentist is included but not all procedures.

sigmoid10 · 2026-04-09T13:42:13 1775742133

Bro, even for an entire year that is less than a single ambulance ride in some parts of the US. Heck, you might pay $1000+ even with insurance coverage.

storus · 2026-04-09T15:19:12 1775747952

It's $1400/month per person in Germany, just hidden from you. That's more than a monthly family plan in the US.

sigmoid10 · 2026-04-09T20:29:10 1775766550

Even a family plan insurance can bankrupt you in the US if you get admitted to the wrong place. Most Europeans don't realize how crazy expensive even the most basic care and meds are in the US.

ExpertAdvisor01 · 2026-04-09T13:30:46 1775741446

You have to wait 9 months for an appointment with a specialist for your diagnosis.

sigmoid10 · 2026-04-09T13:34:27 1775741667

I've lived 30 years like this, I can wait 9 months. Especially if I don't have to waste 800 bucks per month on Adderall afterwards.