
By the time these academic studies get published, they are usually already several months out of date. o3-mini was released yesterday, and if one wants to know about the limitations of current technology, they are much better off checking Twitter than some research paper.


I think the breathless hype train of Twitter is probably the worst place to get an actually grounded take on what the real-world implications of the technology are.

Seeing the 100th example of an LLM generating some toy code, for which there are a vast number of approximately similar examples in the training corpus, doesn't give you a clearer view of what is or isn't possible.


I think that most of the developers who advocate for AI coding have never worked all by themselves on projects with over 500-1000 files. Because if they had, they would not advocate for AI coding.


I posted this earlier, but I wanted a java port of sed for ... Reasons, and despite the existence of man pages and source code it couldn't do anything but the most basic flags.

Imo this should be low-hanging fruit. Porting a small but non-trivial set of 3-4 core code files that are already debugged and interface-specified should be exactly what an LLM excels at.
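For context, a sketch of roughly the level the parent describes the model reaching — the basic `s/pattern/replacement/[g]` command over stdin, with none of sed's addressing, hold space, scripts, or other flags. This is my own toy illustration, not the commenter's code, and it leans on Java regex syntax rather than sed's BRE:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Toy "mini sed": supports only s/regex/replacement/ and s/regex/replacement/g.
// Note: Java regex syntax differs from sed's basic regular expressions, which
// is one of the details a real port would have to handle.
public class MiniSed {
    public static String applySubst(String command, String line) {
        // Expect commands of the form s/regex/replacement/ or s/regex/replacement/g.
        String[] parts = command.split("/", -1);
        if (parts.length < 4 || !parts[0].equals("s")) {
            throw new IllegalArgumentException("unsupported command: " + command);
        }
        boolean global = parts[3].contains("g");
        return global ? line.replaceAll(parts[1], parts[2])
                      : line.replaceFirst(parts[1], parts[2]);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) return; // usage: MiniSed 's/foo/bar/g' < input
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (String line; (line = in.readLine()) != null; ) {
            System.out.println(applySubst(args[0], line));
        }
    }
}
```

Everything beyond this — addresses, the hold space, `y`, `d`, branching — is where the real porting work lives.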


I tried this with Microsoft's Copilot + the Think Deeper button. That allegedly uses the new o1 model. It goes into a lot of fancy talk about...pretty much what you said older models did. Then it said "here's some other stuff you could extend this with!" and a list of all the important sed functionality.

It's possible it could do it if prompted to finish the code with those things, but I don't know the secret number of fancy o1 uses I get and I don't want to burn them on something that's not for me.

You should be able to access it here if you have a Microsoft account and want to try the button: https://copilot.microsoft.com/


Or neither. Try it yourself.

For me, LLMs still don’t meet basic usefulness and are a net negative when I try to use them. I push code daily for my job.


I have a good use case for them: communication with the bureaucracy of my country. I tell my LLM of choice to write a letter to $whoever about $whatever, then I print it out (yes, we still have to do this, as emails don't get accepted) and send it off. I don't even need to proofread it, because if there's a mistake the bureaucracy will tell me in another letter. So the burden of correctness checking is on some bureaucrat, which saves me time and mental resources.

I wouldn't ever use an LLM for anything where correctness matters (code), because I'd spend the same amount of time checking the generated code as writing it myself. But a letter to my tax office? Heck, why not. If something goes really wrong I can always say, "gee, I made a mistake, let's try it again".


So what, you use it to spam and waste other people's time? I know, dealing with government bureaucracy and corruption is soul-leeching, but spam was always one of the golden use cases for generative AI.


Sending official letters to the local government isn't spam, and generally not a waste of time.

For people with cognitive issues, typing difficulties, or language or presentation issues, LLMs provide a massive improvement in how they are perceived and received by the other side. Also, immigrants or people with language issues aren't quite as disadvantaged and don't need to spend excess time translating or risking an embarrassing misstatement. It's a night-and-day accommodation tool in the right circumstances.


No, I don't just send them random letters. I reply to mail I get from them or when I need them to do something (like adjust my tax pre-pay).

Also one could argue that bureaucracies only exist to create bullshit jobs and waste citizens' time. So I wouldn't even feel bad about spamming those assholes.


The paper is recent and being discussed here: https://news.ycombinator.com/item?id=42889786


It fundamentally does not matter. Matrix multiplication does not erase the truth of Gödel and Turing.


Gödel and Turing just proved that there are some true things that can't be proved, and things that cannot be computed. They didn't show where those boundaries are.

They certainly didn't show those boundaries to be below human cognition level.


Gödel proved that there are unprovable statements. Turing showed that certain classes of problems can only be solved by machines with infinite tapes. Thus no bounded LLM can possibly solve every Turing-computable problem. Only a theoretically infinite chain of thought could possibly get us that power.

Gödel then tells us that, if we have such a system, there are things where this system may get stuck.

Indeed this is what we see in chain of thought models. If you give them an impossible problem they either give up or produce a seemingly infinite series of tokens before emitting the </think> tag.

Turing tells us that deciding whether any set of matrices modeling a finite-state machine over an infinite token stream ever halts is the halting problem.


Theoretical computability is of dubious practical relevance.

Consider two problems:

Problem A is not computable

Problem B is computable in principle, but, even for trivially sized inputs, the best possible algorithm requires time and/or space we’ll never have in practice, orders of magnitude too large for our physical universe

From a theoretical computer science perspective, there is a huge difference between A and B. From a practical perspective, there is none whatsoever.
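One classic illustration of Problem B (my example, not the commenter's) is the Ackermann function: perfectly computable in principle, but ack(4, 2) already equals 2^65536 - 3, a number far too large to ever write down, let alone compute step by step:

```java
// The Ackermann function: total and computable, but its values (and the
// naive recursion's running time) grow faster than any primitive recursive
// function. ack(3, n) = 2^(n+3) - 3, so small inputs are fine; ack(4, 2)
// is already physically out of reach.
public class Ack {
    public static long ack(long m, long n) {
        if (m == 0) return n + 1;
        if (n == 0) return ack(m - 1, 1);
        return ack(m - 1, ack(m, n - 1));
    }

    public static void main(String[] args) {
        System.out.println(ack(3, 3));  // 61
        System.out.println(ack(3, 10)); // 8189 -- but don't try ack(4, 2)
    }
}
```

Both ack(4, 2) and an uncomputable problem are equally off-limits in practice, which is the point being made.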

The real question is "can AIs do anything humans can do?" And appealing to what Turing machines can or can't do is irrelevant, because there are literally an infinite number of problems which a Turing machine can solve but no human or AI ever could.


So the article is about what humans vs. LLMs can do, except in the article, LLM is taken to mean just a single-output autoregressive model (no chain of thought). Since an LLM has a constant number of steps at each token generation, no, it cannot do everything a human can. Humans can choose when to think and can ponder the next action interminably. That's my point. When we force LLMs to commit to a particular answer by forcing an output at each token generation, the class of problems they can solve is trivially smaller than that of an equivalent human.


I agree that a raw autoregressive LLM model with just a single output is (almost necessarily) less capable than humans. Not only can we ponder (chain of thought style), we also have various means available to us to check our work – e.g. for a coding problem, we can write the code, see if it compiles and runs and passes our tests, and if it doesn't, we can look at the error messages, add debugging, try some changes, and do that iteratively until we hopefully reach a solution–or else we give up – which the constraint "single output" denies.
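The generate-test-repair loop described above can be sketched schematically. This is my own toy illustration, with a canned list of "candidate" implementations standing in for successive model outputs:

```java
import java.util.List;
import java.util.function.IntUnaryOperator;

// Toy sketch of an iterate-until-the-tests-pass loop. In a real agent the
// candidates would come from a model, and each failure (compile error, test
// output) would be fed back as context for the next attempt.
public class RepairLoop {
    static final List<IntUnaryOperator> CANDIDATES = List.of(
        n -> n,         // buggy attempt 1
        n -> n + 2,     // buggy attempt 2
        n -> n * n      // this one passes the tests
    );

    public static IntUnaryOperator solve() {
        for (IntUnaryOperator candidate : CANDIDATES) {
            // the "run the tests" step
            if (candidate.applyAsInt(3) == 9 && candidate.applyAsInt(5) == 25) {
                return candidate;
            }
        }
        // the "or else we give up" branch
        throw new IllegalStateException("gave up");
    }
}
```

The single-output constraint amounts to deleting the loop: one candidate, no tests, no retries.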

I don't think anyone is actually expecting "AGI" to be achieved by a model labouring under such extreme limitations as a single output autoregressive LLM is. If instead we are talking about an AI agent with not just chain of thought, but also function calling to invoke various tools (including to write and run code), the ability to store and retrieve information with a RAG, etc – well, current versions of that aren't "AGI" either, but it seems much more plausible that they might eventually evolve into it.

I don't think we need to invoke Turing or Gödel in order to make the point I just made, and I think doing so is more distracting with irrelevancies than actually enlightening.


Yeah, the grounded take is that Turing and Gödel apply just as much to human intelligence. If not, someone please go ahead and use this to physically prove the existence of an immortal, hypercomputational soul.


Who is trying to “erase the truth of Gödel and Turing”? (Well, some cranks are, but I don’t think that’s who you are talking about.)

Gödel and Turing’s results do not appear to give any reason that a computer program can’t do what a person can do.


That's not the point. A computer program with a finite number of steps (an autoregressive LLM without chain of thought) has a limit on what it can reason in one step. This article does a lot of wordcelling to show this obvious point.


That seems irrelevant to Gödel? If that was your point, you should have said so, rather than the things about Turing and Gödel (which lead people to expect you are talking about the halting problem and incompleteness, not the limitations that come from a limited-depth circuit).



