> they really don't have overall goals or motivations ... Their output is just an average of what text would follow some sequence of text

Yes, but "Optimality is the tiger, and agents are its teeth".[0]

I don't want to spoil that essay by explaining how the LLM suddenly starts acting like an agent, but I can assure you that the author does a very good job of setting up a "Yes, that seems safe to me" thought experiment before revealing the "Oh no, that's terrible!" outcome.

[0] https://www.alignmentforum.org/posts/kpPnReyBC54KESiSn/optim...



That seems to just be a nerded-out version of this article from many years ago (with the same theme and discourse), which is perhaps the original and uncited, but much more digestible and lacking the pretentiousness: https://waitbutwhy.com/2015/01/artificial-intelligence-revol...

PS: the scale of the problem and the civilization around the model are covered in part 2 of the link


The article you link is a derivative and somewhat ELI5 reinterpretation of a lot of other, older work - in particular the articles Eliezer Yudkowsky and others published on LessWrong, all written before OpenAI was a thing and before deep learning was widely talked about.

The article GP posted is in the direct lineage of the LessWrong body of work/community. It's not "pretentious" or "nerded out" - it's less handwavy, addresses a specific problem, and assumes the reader is broadly familiar with the ideas discussed - whereas the WaitButWhy article is basically AI safety 101.

EDIT: and I will spoil the article somewhat for those on the fence about whether to read it: it shows how an explicitly non-agent, limited, nerfed AI could unwittingly trick you into bootstrapping a proper generic AI on top of it - not because it wanted to, or knew it would happen, but because it pattern-matched you a concise and plausible-looking answer that has a fatal complexity-escalating bug in it.

(Hint/spoiler: you know how you can turn a constrained computational system (e.g. HTML5 + CSS3) into a Turing-complete one just by running it in a loop that preserves its state? Something equivalent happens here.)
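
To make that concrete, here's a minimal sketch of the loop-plus-preserved-state pattern. Everything in it is hypothetical - `complete` stands in for any stateless text-completion call, and `execute` for whatever applies the model's output to the world; neither is a real API:

    # A stateless text completer becomes a stateful, agent-like process
    # the moment you feed its own output (plus feedback from the world)
    # back into it on the next iteration.

    def complete(prompt: str) -> str:
        # Hypothetical stand-in for an LLM call: same prompt in, text out.
        return f"next step, given {len(prompt)} chars of context"

    def execute(action: str) -> str:
        # Hypothetical environment: run the action, report what happened.
        return f"(result of: {action!r})"

    def run_loop(task: str, steps: int = 10) -> str:
        state = task  # the preserved state is just the growing transcript
        for _ in range(steps):
            action = complete(state)       # pure function of current state
            observation = execute(action)  # side effects happen out here
            state += f"\n{action}\n{observation}"  # state survives the step
        return state

The completer itself stays a pure function throughout; the agency, such as it is, lives entirely in the loop and the accumulated state.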


I don't think the parent's point is a stylistic criticism of the linked post.

Rather, I'd take the parent as saying that the linked post might cite the properties of current LLM systems, but it ultimately isn't using them in its arguments. It's the same old argument - "start adding capabilities and boom, suddenly you have an agent that takes over the world".

The understanding we have of current LLMs is that they're very capable as text synthesizers but quite unreliable in anything they say about the world. We've gone from GPT-2 to ChatGPT and the systems have gotten many times better (as smooth text synthesizers), but they haven't gotten many times more reliable in the particular descriptions of the world they give. They still regularly say clearly wrong things unprompted - like every paragraph for anything slightly obscure.

The main thing is that the rise of deep learning has actually highlighted goal accomplishment as a far more difficult task than classification, information retrieval, and text/image synthesis. Self-driving cars keep getting mentioned, and that's justified imo, 'cause a huge amount of resources has been put into getting modern systems to accomplish a fairly defined and limited "real world task", and all those efforts have mostly failed.

The key distinction of goal accomplishment is that a system has to engage in a tight loop of making small judgements, each of which has to be correct to an extraordinarily high degree. The linked text elides the difference between these tasks by talking of a vague breakthrough that makes the program "insightful". But we have to consider what's actually needed for whatever simple thing one wishes to do. Our present machine, ChatGPT, might, for example, give correct instructions for some complex auto repair (combining the patterns of several simple repairs, maybe). But it couldn't "walk you through the process of doing the task", since it would continue its tendency to say wrong things, and such wrong things would actually cause damage (like wrong turns in the self-driving cars).
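
To put rough numbers on the "loop of small judgements" point - a quick sketch, with per-step accuracies that are purely illustrative assumptions, not measurements of any model:

    # Compounding error in a judgement loop: a per-step accuracy that
    # sounds high still collapses over a long chain of dependent steps.
    for per_step_accuracy in (0.99, 0.999):
        for steps in (10, 100, 1000):
            p_success = per_step_accuracy ** steps
            print(f"{per_step_accuracy} ** {steps} = {p_success:.3f}")

At 99% per step, a 100-step task succeeds about 37% of the time and a 1000-step task essentially never - which is roughly the gap between writing a plausible paragraph and completing a drive across town.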

Your point, and the linked text's point, is that these things are insidious and can "get more complex" without one realizing it. But no one has made a reliable goal-accomplishing device out of just hooking a Turing machine up to a neural network. The point is that neural networks today aren't "nerfed" in any way within the spectrum of possibilities; they're the best people can do, and they're making great progress by some measures but still, by many fairly clear measures, failing to go beyond their limitations.

I'm not saying it's impossible that a great advance happens that makes neural networks reliable enough to accomplish goals (and to respond robustly to changes in the world, etc., etc.). It's not impossible that it happens at random, but it seems no more likely than any other advance happening at random. And if such an advance happens purposefully, there's no reason to think it will happen in a "you give it the ability to be much more accurate and to seek goals, but you know nothing about the goals it seeks" way.

And I'm familiar with the LessWrong community. That group seems very tied to its initial assumptions, and it seems to fail to note the processes involved in current and potential future AIs. This is a long post, so I'll just name one consistent error they make: assigning probabilities to fundamental unknowns. That's an abuse of the assumptions of any theory of probability, and it mostly results in a belief that whatever thing has some chance of appearing without explanation.


That was an excellent read, thanks.



