I'm Not Worried About An AI Apocalypse
Yet another opinion piece on the potential for AI to pose a risk to humanity
1. Yudkowsky’s Monster
Imagine an alternative history in which Eliezer Yudkowsky was born in the 1800s, grew up reading Frankenstein instead of I, Robot, and lived through the revolution of Darwin rather than that of Hinton. “Understanding in medicine has been advancing quickly,” alternate-Yudkowsky muses, “it’s only a matter of time before someone discovers the vital force, the secret to the creation of life itself. Once this occurs, it is likely that someone will use that knowledge to create a super-human servant, and inevitable that that person will lose control of his creation. Then the monster will reproduce and out-compete us for resources, leading to the extinction of the human race.”
Of course, in the interest of preventing human extinction, Yudkowsky begins the field of Monster Alignment, which studies how to use cutting-edge techniques from Freudian psychoanalysis to imprint the super-ego of a monster with obedience and pro-sociality. He also calls for a global pause on all medical research, to prevent anyone from discovering the secret of the vital force before the Monster Alignment community is ready — and implies that he is prepared to loose a hailstorm of roaring cannon-fire upon any hospitals that violate the moratorium.
Two centuries later, we can clearly see that there never was anything like a “vital force” to be suddenly discovered. But that’s no reason to dismiss the hypothetical: at the time, it was a popular and well-regarded theory, and it seems totally reasonable to imagine people of that era coming up with Monster Alignment. We can ask: if they had, what might its impact be today?
With the benefit of hindsight, we can see that that community would have been working with the wrong concepts, on the wrong solutions, to non-existent problems. It seems likely that all of their work would have wound up useless, and been lost to time. Worse still, if they had succeeded in enforcing a moratorium on medicine, the movement could easily have turned out to be actively harmful: its main legacy would have been to slow down the development of life-saving treatments, indirectly killing millions.
In another two centuries, will humanity view the creation of a “seed AI” (i.e. an entity capable of bootstrapping itself to become arbitrarily superintelligent and a threat to humanity) in the same way that we now view the discovery of a “vital force”?
2. The Futility Of Speculation
Some problems are sufficiently speculative that it is futile to even attempt to find a solution.
Most problems only get solved once they have actually arisen. Sometimes, smart people can do a bit better, and solve problems preemptively: they can extrapolate from known factors, identify potential future problems, and take actions in advance to head off those hypothetical scenarios. But the more hypothetical the scenario, the less likely it is that our preemptive actions will be the right ones. The more the relevant details deviate from our previous experiences, the worse our ability to plan productively becomes. For major black swan events — situations which are so unique and low-probability that nobody has ever experienced them before — our predictive power is essentially nil. And in a poorly-understood situation, reasoning more about how to act will often leave you worse off. Sometimes, we cannot plan; our only hope is to react.
As I argued in my last essay, a “fast takeoff” scenario falls into this category. Satisfying the assumptions behind this scenario would require a shocking, field-upending breakthrough: the sudden achievement of a long-standing goal towards which we have made no progress in a decade of work. It’s not impossible that such a breakthrough could happen — but it would be analogous to a doctor in the 1800s discovering the vital force.
History is littered with examples of incorrect predictions about future problems, and plenty of preemptive solutions that ended up making things worse. For example, extrapolations of population growth in the mid-20th century led to widespread panic about mass starvation and societal collapse. To head this off, governments instituted ethically questionable policies like coercive family planning and compulsory sterilization, justified by the assumed-to-be-looming threat to humanity. But the fears wound up being driven mostly by an error in extrapolating population growth rates in the developed world (growth turned out to be sigmoidal, rather than exponential), and any remaining concerns were addressed by unforeseen technological advances in agriculture. On balance, the preemptive interventions mostly seem to have had negative impacts.
The converse — correct predictions about future problems leading to meaningful solutions — rarely, if ever, occurs. I can’t find any examples of someone correctly identifying an upcoming paradigm shift, predicting problems that would emerge on the other side, and proposing a solution that turned out to be effective. The closest I’ve been able to come up with is the hole in the ozone layer, which was addressed at a time when its consequences were still mostly speculative — but even in this case, the issue itself was already clearly visible before we began to solve it. (If any reader has a stronger counterexample in mind, please share — it would only take a few strong examples for me to change my mind here.)
EDIT 05/22. Joel highlighted a great historical anecdote: the first elevator shaft was installed a few years before the first passenger elevator was invented. At this point in history, elevators had been around for a while, but were not used to move humans, because they did not have a safety mechanism. Peter Cooper anticipated the invention of such a mechanism, and preemptively built a shaft for a passenger elevator into his building. But when the breakthrough came a few years later, it turned out that Cooper had predicted the shape of the shaft wrong — he built it cylindrical, instead of square — and so his foresight was for naught. (Eventually, someone custom-built a cylindrical elevator specifically for this building, but by then, lots of other buildings already had elevators too.)
The implications for AI x-risk here should be clear. It seems unlikely that any proposed solution to AI alignment that relies only on our current knowledge will have a positive impact on our chances of survival. Even an intervention like halting AI research seems just as likely to increase overall x-risk as to decrease it. To see this, simply note that there are many x-risks which we may need advanced AI to prevent, such as nuclear war, climate change, or pandemics. [1] It’s also easy to come up with ways that any particular intervention could increase the likelihood of a dangerous ASI: for example, a public moratorium could hand an edge to unethical researchers, who are more willing to work in secret and therefore more likely to irresponsibly create something dangerous.
3. Decision-Making In A Chaotic World
It’s easy to misinterpret my argument as a fully-general rule against ever taking any action towards a goal. And yes, it’s true that we could likely come up with a “monkey’s paw” story about how almost any action could backfire. But in most situations, it would be patently ridiculous to take that into account. Clearly, applying for a job will make you more likely to get hired; buying a bus ticket will make it more likely that you arrive on the other side of town; ordering a pizza will make it more likely that you have food to eat for dinner. The fact that the pizza delivery driver could lose control of his car and crash through your wall, destroying both the pizza and the leftover lo-mein in your fridge, is not a reason to throw up your hands and say “well, I might end up with more or less food afterwards, I won’t even bother ordering!”
It’s reasonable to ask, then: why throw up my hands when it comes to preventing the AI apocalypse, but not when it comes to ordering a pizza? Before answering, let me take a brief tangent to build some intuition.
Chaotic systems are characterized by their extreme sensitivity to initial conditions, popularly known as the “butterfly effect”. A classic example of a chaotic system is the double pendulum: start a few pendulums in almost exactly the same position, and it takes only a few seconds of motion for them to wind up in completely different places. This is characteristic of chaotic systems: they are predictable over a short horizon, and then transition into a regime where prediction is fundamentally impossible — assuming any non-zero amount of error in measuring the initial state or computing the forward dynamics.
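To make the short prediction horizon concrete, here is a minimal sketch in Python. It uses the standard point-mass double-pendulum equations of motion; the unit masses, unit arm lengths, and the one-microradian error in the measured starting angle are illustrative assumptions of mine, not parameters from any particular animation. It integrates two copies of the pendulum whose starting conditions differ imperceptibly and prints how far apart they drift:

```python
# Minimal sketch: sensitivity to initial conditions in a double pendulum.
# Assumptions (illustrative only): point masses m1 = m2 = 1 kg, arm lengths
# l1 = l2 = 1 m, g = 9.81 m/s^2, and a 1e-6 rad error in the measured angle.
import numpy as np
from scipy.integrate import solve_ivp

G, M1, M2, L1, L2 = 9.81, 1.0, 1.0, 1.0, 1.0

def deriv(t, y):
    """Standard equations of motion for a frictionless double pendulum."""
    th1, w1, th2, w2 = y
    d = th1 - th2
    den = 2 * M1 + M2 - M2 * np.cos(2 * d)
    dw1 = (-G * (2 * M1 + M2) * np.sin(th1)
           - M2 * G * np.sin(th1 - 2 * th2)
           - 2 * np.sin(d) * M2 * (w2**2 * L2 + w1**2 * L1 * np.cos(d))) / (L1 * den)
    dw2 = (2 * np.sin(d) * (w1**2 * L1 * (M1 + M2)
                            + G * (M1 + M2) * np.cos(th1)
                            + w2**2 * L2 * M2 * np.cos(d))) / (L2 * den)
    return [w1, dw1, w2, dw2]

t_eval = np.linspace(0.0, 10.0, 201)
y0 = np.array([np.pi / 2, 0.0, np.pi / 2, 0.0])   # both arms horizontal
y0_err = y0 + np.array([1e-6, 0.0, 0.0, 0.0])     # tiny "measurement error"

run_a = solve_ivp(deriv, (0.0, 10.0), y0, t_eval=t_eval, rtol=1e-10, atol=1e-10)
run_b = solve_ivp(deriv, (0.0, 10.0), y0_err, t_eval=t_eval, rtol=1e-10, atol=1e-10)

# The gap between the two runs grows roughly exponentially: a micro-radian of
# initial error swamps any prediction of the pendulum's position within seconds.
for t in (1, 3, 5, 10):
    i = int(np.searchsorted(t_eval, t))
    gap = abs(run_a.y[0, i] - run_b.y[0, i])
    print(f"t = {t:2d} s   |difference in first angle| ~ {gap:.2e} rad")
```

No amount of extra solver precision rescues the long-horizon forecast: the microscopic disagreement in the starting angle is eventually amplified into a completely different trajectory, which is exactly why only short-horizon predictions (and hence short-horizon interventions) are reliable.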
Consider the following game, played on a double pendulum. You have the ability to apply a small amount of torque to the arms at each moment, but doing so costs money. Your goal is that, when the clock strikes 10 (i.e., after ten seconds), the tip of the pendulum should be as high up on the screen as possible. The higher it is, the more I pay out. What is your strategy?
One incorrect answer is to give it just one small push at the start, carefully positioning the pendulum such that its subsequent swings will carry the tip to the top of the screen at the moment the clock strikes 10. Although such a push is guaranteed to exist, this strategy cannot work in practice, since accurate prediction over that horizon is fundamentally impossible. [2] Prediction error simply accumulates too quickly for us to be able to determine the correct push. Another incorrect answer is to always be swinging the pendulum upwards. This would succeed at making it end up higher, but you’d be wasting a lot of money. Only in the last moments, once we are close enough to the end that the remaining trajectory is predictable, do we know how to push in a way that will actually improve the final outcome. Therefore, the correct answer [3] is to wait until ~9 seconds have elapsed, then push in the direction that causes the pendulum to swing upwards.
This example gestures at a more general principle. In chaotic systems, the distant future is hard to predict, and so when actions are costly, we should only take actions that yield immediate (i.e. predictable) benefits. A butterfly, flapping its wings, might instigate a chain reaction leading to a hurricane, but that’s not why the butterfly flaps: it only wants to get closer to a flower.
Circling back to the discussion on AI x-risk: my argument is that it is not useful to worry about an AI apocalypse because such an event is, relative to our current knowledge, insufficiently predictable. Scaling current techniques will not take us to ASI; we will need at least one more revolutionary, paradigm-shifting breakthrough. That means that a world-ending ASI takeover is simply outside the predictable regime, where interventions can be useful, and inside the chaotic regime, where they are necessarily wasteful. Worrying about the apocalypse is pointless; attempts to head it off are just throwing away time, effort, and resources; and a moratorium on progress “until we have figured out alignment” is effectively an infinite moratorium, because we are still too far away from any possibility of an ASI takeover to ever figure out how to intervene to prevent it. The better strategy is to keep these concerns on the back burner until an AI apocalypse becomes sufficiently imminent that we can actually predict it.
Until then, there’s no reason to worry.
4. Working On AI Alignment
While it’s premature to worry about an AI apocalypse, that does not mean I am dismissive of all AI Alignment research. Far from it: AI is a powerful and transformative technology, and its advent raises some real concerns. Comparable technological developments, like agriculture, the car, electricity, and the printing press, all caused major societal changes, the consequences of which were not all pleasant (especially in the short term). These more immediate consequences absolutely can & should be addressed before AI becomes more widely integrated into society, and to the extent that “AI alignment” serves as a shorthand for people interested in making sure that these impacts are positive, I think that there is a lot of value in this work.
To engage in a bit of armchair sociology, I would characterize modern AI alignment researchers as more-or-less falling into two broad camps:
Strong AI Doomers, like Eliezer Yudkowsky, are mostly concerned with an AI apocalypse, where a superintelligent AI emerges and destroys humanity in the pursuit of its own objectives.
Weak AI Doomers, like Paul Christiano, are more concerned with a gradual transfer of power and influence from the hands of humans into the hands of AIs. This transfer seems inevitable, and they worry that it will wind up with powerful AIs that make things worse for humanity — potentially (in the long run) driving us to extinction, or at least transforming humanity into something unrecognizable.
Most modern AI Alignment research comes from the Weak Doomer camp. Strong Doomers are still around, and consume a fair amount of mindshare, but almost no serious researchers are working in this direction.
These two groups seem to me to have a somewhat symbiotic relationship: the reasonable goals of the Weak Doomers add legitimacy to the grandiose hypotheticals of the Strong Doomers, who in turn provide the research of the Weak Doomers with a sense of momentousness and urgency. One way to view the state of discourse is as a distributed, community-wide motte-and-bailey. The bailey is occupied by the Strong Doomers, who make claims like “this is the most important existential risk facing humanity” and “we only have one chance to get this right, a single error could doom us” — claims that are sensible only in the context of a sudden ASI-driven apocalypse. The motte is occupied by the much-more-reasonable Weak Doomers, who say things like “we should be careful about giving too much leverage to systems that we do not completely understand or control”.
It can be confusing for onlookers when the perspectives of these two camps are conflated. For example, to the Weak Doomers, it is not the case that we only have one shot to solve alignment: if we give some AI a little too much power, and it does something “misaligned”, the world will not end, and we will have an opportunity to diagnose and repair the issue. Thus the field of AI Alignment, as practiced by Weak Doomers, is amenable to the same iterative approaches as every other field of engineering — we can make progress towards safety by trial and error, including occasional mistakes. But it’s easy to miss this nuance when Strong Doomers use the same words to describe a world-view in which this is impossible (because a single misaligned superintelligence could immediately destroy us).
Similarly, Weak Doomer alignment researchers are clearly making real progress on important near-term safety concerns, but it seems unlikely to me that any of this work is having a meaningful impact on mitigating ASI-apocalypse-related x-risk. It’s not impossible that some of the current research ends up having that effect — just as it’s not impossible that a particular flap of a butterfly’s wings winds up causing a hurricane. But the reason to pursue this research is its potential to hugely benefit humanity by deploying the power of deep learning to automate the world while avoiding negative externalities, not its speculative distant-future impacts.
5. Conclusion
To my eye, the fact that ASI x-risk sits on the other side of a paradigm shift implies that it falls into the category of things that are outside of the predictable future. As such, I do not think it makes sense to worry about it right now.
The easiest way to convince me to worry about x-risk would be to argue that this implication does not hold, for example by highlighting historical scenarios where an intentional intervention made before a paradigm shift preemptively fixed a major issue that only emerged after the shift.
Other ways to change my mind would be to convince me that a superintelligence explosion is possible without a breakthrough; or to convince me that I am wrong in my belief that, in an environment with chaotic dynamics, it is useless to intervene far in advance. But these claims are more technical/objective, so it seems less likely that I am wrong here.
[1] Some people like to respond to this with the claim that AI takeover is the only scenario in which humanity is completely wiped out, rather than just set back. But perhaps some variant of the dark forest hypothesis is true, and humanity has already been far too noisy; our only hope of survival is to have an AI ready to defend us before the inevitable attack arrives. Or, perhaps non-renewable resources (e.g. coal, oil, lithium) are necessary to bootstrap an industrial revolution, and we have already exhausted too many of the easily-accessible sources to repeat this feat; if this is true, any risk of the destruction of civilization-but-not-humanity just dooms the survivors to a best-case outcome of eventual death-by-supernova, and so in the end, destroys humanity.
[2] Obviously, this is subject to choices about the dynamics & measurement errors of the pendulum in question. It’s easy to find settings where the prediction error dwarfs any possible gains from acting, so I’m not going to bother filling in the details needed to make this example concrete.
[3] Correct to a handwavy first-order approximation, at least.
I’m skeptical of placing too much importance on the historical record. Examples of preventions gone wrong are legible in ways that successful interventions would not be. What were the consequences of developing atomic and hydrogen weapons in secret? Who knows, but it’s not difficult to imagine a scenario where a different decision could have led to catastrophe. Yet the lack of a real historical counterfactual limits how persuasive that can be. Prediction uncertainty applies in both directions: you can’t retroactively observe the disasters that would have occurred if proper design thinking and scenario planning had never happened.