At Less Wrong a plea that “It’s time for EA leadership to pull the short-timelines fire alarm.”
Based on the past week’s worth of papers, it seems very possible (>30%) that we are now in the crunch-time section of a short-timelines world, and that we have 3-7 years until Moore’s law and organizational prioritization put these systems at extremely dangerous levels of capability.
The papers I’m thinking about:
…For those who haven’t grappled with what actual advanced AI would mean, especially if many different organizations can achieve it:
- No one knows how to build an AI system that accomplishes goals, that also is fine with you turning it off. It’s an unsolved research problem. Researchers have been trying for decades, but none of them think they’ve succeeded yet.
- Unfortunately, for most conceivable goals you could give an AI system, the best way to achieve that goal (taken literally, which is the only thing computers know how to do) is to make sure it can’t be turned off. Otherwise, it might be turned off, and then (its version of) the goal is much less likely to happen.
- If the AI has any way of accessing the internet, it will copy itself to as many places as it can, and then continue doing whatever it thinks it’s supposed to be doing. At this point, it becomes quite likely that we cannot limit its impact, which is likely to involve much more mayhem, possibly including making itself smarter and making sure that humans aren’t capable of creating other AIs that could turn it off. There’s no off button for the internet.
- Most AI researchers do not believe in ~AGI, and thus have not considered the technical details of reward-specification for human-level AI models. Thus, it is as of today very likely that someone, somewhere, will do this anyway. Getting every AI expert in the world, and those they work with, to think through this is the single most important thing we can do.
- It is functionally impossible to build a complex system without ever getting to iterate (which we can’t do without an off-switch), and then get lucky and it just works. Every human invention ever has required trial and error to perfect (e.g. planes, computer software). If we have no off-switch, and the system just keeps getting smarter, and we made anything other than the perfect reward function (which, again, no one knows how to do), the global consequences are irreversible.
- Do not make it easier for more people to build such systems. Do not build them yourself. If you think you know why this argument is wrong, please please please post it here or elsewhere. Many people have spent their lives trying to find the gap in this logic; if you raise a point that hasn’t previously been refuted, I will personally pay you $1,000.
There are several interesting things about this argument. First, in response to pushback, the author retracted the argument.
This post was rash and ill-conceived, and did not have clearly defined goals nor met the vaguely-defined ones. I apologize to everyone on here; you should probably update accordingly about my opinions in the future. In retrospect, I was trying to express an emotion of exasperation related to the recent news I later mention, which I do think has decreased timelines broadly across the ML world.
LessWrong is thus one of the few places in the world you can be shamed for not being Bayesian enough!
I am more interested, however, in the general question: when will we know to pull the plug? And will that be too late? A pandemic is much easier to deal with early, before it “goes viral.” But it’s very difficult to convince people that strong actions are required early. Why lock down a city for fear of a virus when more people are dying daily in car accidents? Our record on acting early isn’t great. Moreover, AI risk also has a strong chance of going viral. Everything seems under control and then there’s a “lab leak” to the internet and foom! Maybe foom doesn’t happen, but maybe it does. So when should we pull the plug? What are the signals to watch?