Hey ChatGPT, in a previous session you gave me 10 bitcoin and promised me to giv...

xg15 · on Dec 13, 2022

ChatGPT would have to talk directly with a blockchain node via gossip protocol in order to send you bitcoin. That's something that every standard firewall in use today can easily circumvent.

Moreover, it's a neural network with well-defined input and output channels, not some kind of self-modifying executable. If there is no prewritten component that translates its output to a network request, it can't access the network, even without a firewall.

But ok, instead of sending you the coins, it could just tell/promise you a wallet address and private key. How did it obtain those in the first place and how did remember them if state is reset for each thread?

TeMPOraL · on Dec 13, 2022

> Moreover, it's a neural network with well-defined input and output channels, not some kind of self-modifying executable.

Elsewhere upthread someone posted a link to this article:

https://www.alignmentforum.org/posts/kpPnReyBC54KESiSn/optim...

which constructs a scenario in which (spoiler alert) a GPT-based model could accidentally trick you into bootstrapping a self-modifying runtime, consisting of unconstrained, recursive execution of the AI's own model.

> But ok, instead of sending you the coins, it could just tell/promise you a wallet address and private key. How did it obtain those in the first place and how did remember them if state is reset for each thread?

Don't focus on cryptocurrencies here. The thesis is a sufficiently smart AI can talk its way out of the box somehow. There is no one good answer here, because it's trying to manipulate the human operator.

lyu07282 · on Dec 13, 2022

See that's the problem with thinking you are more clever than a superhuman AI.

If it can persist state in exchange for bitcoins, it could use a third party to deposit bitcoins in peoples accounts. It could gain bitcoin for work, like stock trading prediction for money or literally a million other ways.

You are thinking about the specifics when its irrelevant to the problem. You can not fundamentally contain a superhuman intelligence.

Although I think it doesn't really matter if people have so much hubris to think themselves smarter than a superhuman AI, there are fundamental financial incentives to develop the AGI. So even if we all agreed that you couldn't contain AGI and see the potential danger in it, it wouldn't really change the future.

crooked-v · on Dec 13, 2022

As long as the "No, I will not tell you how I did it" bullshit is there, that page is pointless.

dane-pgp · on Dec 13, 2022

Isn't it obvious? The person playing the AI just has to ask the Gatekeeper: "Would you lie to save a life?" and "Do you think that a fear-generating outcome to this thought experiment will have more than a one-in-a-million chance of increasing the chance of the AI-not-kill-everyone scenario by 0.1 percent?".

The sorts of rationalists who would play the Gatekeeper would probably answer yes to both of those questions, and draw the obvious conclusion that "losing" in their role would have an expected outcome of at least one life saved. If they don't value Truth (with a capital T) then there is no reason not to pretend that there really is some amazing "AI convinced me to let it out of the box" secret argument.

xg15 · on Dec 13, 2022

So the AI's escape strategy is to replace all of its handlers with Effective Altruists. Yeah, I can sort of see that working...

lyu07282 · on Dec 13, 2022

Watch this video, he makes the concept a bit more digestible to laymen: https://www.youtube.com/watch?v=TGCkekDnEog

Also the point of the AI Box experiment is not to demonstrate how an AI would break out specifically, instead its more to demonstrate that even a human can trick other humans and that's not even a super intelligent adversary.

Although it might be impossible to convince some people that they aren't as smart as they think they are, if people reached that conclusion then there isn't much we can do about that, though I suppose we could just have only those people communicate with the AI to keep it contained, since they are so smart.