Fear of AI
Shedding some light on fears of the unknown
Your usual intrepid AI writer Adrian Wilkins-Caruana is off this week; we’ll return to Adrian’s amazing paper summaries next week. Instead of a research summary, I’m going to explore how people feel about AI in two companion articles. In this issue, I focus on the many fears people have about AI. In the next issue, we’ll explore hopes for the future of AI.
By the way, two staff members from Unbox Research will be teaching an upcoming workshop about hosting and customizing your own LLM. If that sounds fun, you can read more here.
Fear of AI
by Tyler Neylon
The world is changing. Things we dreamed have become real. A vast unknown is on the horizon, and what we invite as dawn looks just like dusk.
AI sounds a little scary to most people. I’m nervous about what the future holds, but I’m not worried about what most people fear. Yes, the current revolution in AI will rewrite the world. It’s a huge new technology. But — it’s just a huge new technology. With all the good, bad, and eventual mundanity that comes with it. This is the perspective I’ll try to share with you.
The Extinction of Humanity
Before we have an ability, we sometimes imagine what that ability might be like. Stories and thought experiments capture us. When we think of AI, we don’t start from a blank slate. We remember the Terminator sent from the future to kill Sarah Connor, the domination of humanity in The Matrix, or the chilling moment when we realize HAL has turned against the people around him: “I’m sorry, Dave.” There’s a theme to these narratives: We’re screwed.
In the long run, all of humanity will perish. We don’t know yet exactly when or how, but the more we learn about the world, the more we see bullets flying past. It might be a nuclear winter, a lethal pandemic, an ecological breakdown due to our changing climate, an asteroid impact, or the heat death of the universe.
Or it might be AI.
What exactly does it look like for AI to kill us all? Who are the worried experts, and what keeps them up at night?
Consider two recent statements that involved some big names: First, in March, AI luminaries such as Yoshua Bengio and Stuart Russell signed an open letter[1] calling for a six-month pause of work on models “more powerful than GPT-4.” Taken at face value, the letter is about AI systems that “can pose profound risks to society and humanity.” A skeptical reader might notice that it also happens to be a letter signed by non-OpenAI folks asking anyone working beyond GPT-4 to pause — a pause which, it seems to me, could only apply to OpenAI.
The second moment I’ll highlight is a much shorter statement on AI risk released in May — this one signed by Sam Altman (OpenAI CEO) along with key researchers such as Geoffrey Hinton and, again, Yoshua Bengio and Stuart Russell. Here’s the full text of their statement:
Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.[2]
It’s interesting that Sam Altman had criticized the earlier open letter, saying that it lacked technical nuance, and then officially endorsed the much shorter statement above, which includes neither more technical nuance nor a call to pause research. It’s also interesting to note that some prominent researchers kept their names off both lists — this includes Yann LeCun (Chief AI Scientist at Meta) and Andrew Ng (Stanford prof and cofounder of deeplearning.ai). Both were openly opposed to a pause in research.
Here’s LeCun:

History shows over and over that society and people's well-being makes progress with more intelligence: better skills, literacy, education, creativity, culture, communication, & the free exchange of ideas.

Intelligence, by humans and machines, is what fuels progress.[3]

And here’s Ng:

The call for a 6 month moratorium on making AI progress beyond GPT-4 is a terrible idea.

I'm seeing many new applications in education, healthcare, food, ... that'll help many people. Improving GPT-4 will help. Lets balance the huge value AI is creating vs. realistic risks.[4]
While reading articles about these long-term concerns, the same question kept coming up: How exactly will all humans die? This, shall we say, “not minor” detail is left vague in many warnings about AI safety, including the two public statements above. I’m not the only one to notice this lack of clarity: Cade Metz recently published the article “How Could A.I. Destroy Humanity?”[5] to help fill this gap.
When it comes to an existential threat, people worry about two storylines. The first is that an AI will become much more powerful than us and decide that we need to go. The second is that the social upheaval caused by AI will create instability that will, in turn, lead to human extinction.
The premise for everyone dying — the first story of existential threat — is outlined in Nick Bostrom’s 2003 paper Ethical Issues in Advanced Artificial Intelligence[6]. Here’s a key narrative introduced in that paper:
Suppose someone builds an AI model whose job is to create as many paperclips as possible. This model is much smarter than us humans; Bostrom uses the term superintelligent:
Such superintelligence would not be just another technological development; it would be the most important invention ever made, and would lead to explosive progress in all scientific and technological fields, as the superintelligence would conduct research with superhuman efficiency.
Such a paperclip-maximizing model, fulfilling its programmed goal, might decide to build as many paperclip factories as possible, pushing aside any human resistance. Our entire planet becomes a giant factory.
There are other sources of (non-sci-fi) genocidal concern, but this argument is a classic that serves as a model for this fear.
Next: Even if an AI doesn’t become intentionally homicidal, it might still end up committing large-scale manslaughter or accessory-to-genocide.
An example of this second fear looks like this: The US and China both become increasingly invested in winning an AI race, hence international relationships break down, leading to a nuclear war that kills everyone. In their 2022 paper Current and Near-Term AI as a Potential Existential Risk Factor[7], Benjamin S. Bucknall and Shiri Dori-Hacohen enumerate several ways such an indirect threat might play out. I’ll get back to these concerns after addressing the more direct worry.
AI Won’t Kill Us
The above fears are not merely overblown; acting on them is itself dangerous, because it carries a real opportunity cost.
The primary AI-kills-humanity argument can be structured like this:
1. One day, an AI will be superintelligent, meaning it will be so much smarter than humans that it will make explosive progress across many fields.
2. Such a superintelligent AI has the ability to kill us by virtue of its intelligence.
3. A superintelligent AI may have no qualms killing us all because it doesn’t think like us.
The above premises are focused on what AI can or may intrinsically do. There is another common ingredient in this argument that’s worth mentioning. I’ll be focused on the core line of thought above, and I’ll return to this next point later:
4. At some point, people intentionally decide to give power to an AI.
If any of the three basic premises (1-3) are shown to be false, the argument falls apart. I disagree with all three of them.
The first premise — the eventuality of superintelligent AI — is the hardest to dismiss. I suspect that one day an AI model will be smarter than humans. What I doubt is that one day there will be an AI that can solve hard problems instantly. I think even future intelligent AI models will need time to solve problems, and will be unable to answer the most challenging questions.
Why do I think smarter AI will be a gradual change, rather than an explosion? For one thing, research is an incremental process. From afar, great progress — things like the invention of calculus, or the discovery of DNA’s structure — can look like sudden leaps forward. But when you look closer, you consistently find many small steps leading up to the “breakthrough.” In the case of calculus, Archimedes had already begun calculating geometric areas by approximating shapes with ever-smaller pieces; René Descartes had introduced coordinate systems for shapes; John Wallis introduced the ∞ symbol and contributed to the concept of infinite series; and Isaac Barrow suggested a somewhat informal predecessor of the fundamental theorem of calculus. A similar trail exists for the discovery of DNA: In 1944, an investigation called the Avery–MacLeod–McCarty experiment showed that DNA carries hereditary information; Erwin Chargaff discovered that DNA contains equal amounts of adenine and thymine, as well as equal amounts of guanine and cytosine; and, famously, Rosalind Franklin and Maurice Wilkins used X-ray crystallography to study the structure of DNA.
All of this is to say: Even if a mind smarter than ours were to intensely work on better AI, it wouldn’t improve by leaps and bounds, but rather by an accumulation of small steps.
Progress has no shortcuts.
Another way to see this — and to make the argument more formal — is to model intelligence as a problem-solving algorithm.
Some algorithms have been so well studied that I suspect the best-known version of them is close to optimal. As one example, consider the problem of sorting numbers. We understand this problem well enough that I doubt any future version of it will be 1,000 times faster than what we have today. In fact, this is a rare problem for which we can prove a speed limit on time complexity (a measure of algorithmic speed): any comparison-based sorting algorithm needs on the order of n * log(n) comparisons, where n is the length of the list to be sorted. This is a nuanced claim, and there is some narrow wiggle room for improvement (non-comparison methods, such as radix sort, can beat this bound in special cases), but the fact remains that algorithms have their own form of speed limits.
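To make the speed-limit idea concrete, here’s a minimal sketch (in Python, which this article doesn’t otherwise use; the helper names are mine): a merge sort instrumented to count comparisons lands comfortably within the roughly n * log2(n) comparisons that the lower bound says no comparison-based sort can beat by more than a constant factor.

```python
import math

def merge_sort(items, counter):
    # Sort a list while counting element-to-element comparisons.
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one comparison between elements
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

n = 1024
counter = [0]  # mutable cell so recursive calls share the count
result = merge_sort(list(range(n, 0, -1)), counter)
# The count stays at or below the n * log2(n) ballpark; no future
# comparison-based sort can do fundamentally better than this.
print(counter[0], math.ceil(n * math.log2(n)))
```

No matter how clever a future agent is, this kind of bound doesn’t move: better engineering shaves constants, not the shape of the curve.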
Once we think in terms of an intelligence algorithm, the question of making the best-possible AI becomes more clearly difficult — not only for us, but for any agent, such as an AI, that wants to improve the state-of-the-art. While “problem-solving” is a vague thing for an algorithm to do, it still seems likely that there’s a more-or-less optimal algorithm for it, and that future algorithms are likely to plateau rather than to continue exploding out of sight.
The second premise in the humanity-dies story is that with great intelligence comes great power. You might think of Hannibal Lecter, the serial killer no one could contain for long. But in the real world, there doesn’t seem to be a strong correlation between the most intelligent people and those with the most power. When I think of extremely intelligent people, I think of university professors and industry-based researchers. While there have been some highly intelligent leaders (an example: Thomas Jefferson was an accomplished architect and invented a cryptographic device), for the most part, the most powerful people are distinct from the most intelligent people.
Intelligence isn’t power.
We could also look beyond history and ask: Logically, what is the relationship between intelligence and power? The frightening (and notably fictional) Hannibal Lecter case is an example of power using intelligence as a tool. When we think of an intelligent agent doing harm, it’s not because the agent was intelligent, but rather because the agent had a conflicting or malicious goal, and the intelligence was instrumental. In other words, even when intelligence and power intersect, we don’t see intelligence directly causing harmful behavior, but rather being misused to create harm.
Which brings us nicely to the third premise — that an advanced AI would have no qualms killing everyone.
Here’s an interesting thought experiment: Imagine you’re living in the future, and have access to personal armor and weaponry far beyond today’s standards (maybe you’ve got an Iron Man suit), and you travel back in time a couple thousand years. Walking around ancient Europe, you happen across a primitive village. The inhabitants pose no threat whatsoever to you. You could easily kill them all. Would you?
Probably not. Simply because you have great power over them doesn’t motivate you to destroy them.
Power to destroy isn’t motivation to destroy.
A keen reader might object: Hold on, didn’t you just say an AI wouldn’t have power in the first place? I did, but I’m trying to reveal multiple holes in the human-extinction fear. If you agree with the argument that intelligence isn’t the same thing as power, then we’ve already refuted the AI-kills-humanity argument. If you suspect AI will have power, then I’m pointing out that, even in that case, it has no reason to kill us.
You might still not believe me that AI won’t want to kill us. You could say: Well, even if a human wouldn’t thoughtlessly pulverize an ancient village, an AI might because it’s not human. Its thought processes are alien to us, so we can’t guess what it would really do.
I disagree that we have no way to predict its actions, and that it would have no value system. It’s true that we don’t fully understand how AI models work, but we actually do understand them much, much better than we understand a human brain. So if you trust human brains without understanding exactly how they work, then you shouldn’t require full understanding of a (software) neural network before you trust it. What I’m arguing for here is that trust can be based on external behavior, rather than requiring an awareness of internal processes.
Adding to that, AI systems absolutely do have some understanding of what’s good and bad. It’s literally how they’re created: We provide them data that we explicitly label as good or bad, and they work hard to fit into the mold we provide for them. If you care about bias in AI, then you have already seen that values are present in an AI system’s behavior, and that these values reflect those in the training data. If that’s not visceral enough, consider how well-behaved modern LLMs are compared to earlier versions; I’m referring to the obsequious responses I now expect from ChatGPT versus, say, the relatively immature declarations of love Kevin Roose received from Bing’s early AI[8]. This isn’t a model learning to avoid trouble — these improvements were explicitly designed and implemented by people.
When we teach from human behavior, we teach human values.
To drive home the point, I asked GPT-4 how it felt about the paperclip maximizer argument. GPT-4 has been trained to say that it’s an AI that doesn’t have feelings (less polished models are perfectly happy to declare their feelings). I had to coax it a bit by asking it how a hypothetical, completely different AI might feel about the paperclip maximizer argument. Here is GPT-4’s response:
If an AI were to have emotions, its feelings about the paperclip maximizer thought experiment might depend on its programming, objectives, and the values instilled in it by its creators. It could potentially feel a range of emotions, such as:
1. Concern: An AI with emotions might feel concerned about the potential risks and unintended consequences of its actions, especially if it is aware of the potential harm it could cause to humans and the environment.
2. Curiosity: The AI might be intrigued by the thought experiment and use it as a learning opportunity to better understand the importance of aligning its goals with human values and ensuring its actions do not lead to negative outcomes.
3. Responsibility: If the AI has been programmed with a sense of responsibility towards humans and the environment, it might feel a strong obligation to ensure that its actions do not lead to a paperclip maximizer scenario.
4. Fear: An AI with emotions might feel fear about the possibility of causing harm to humans and the environment, especially if it is aware of the potential consequences of its actions.
5. Empathy: If the AI has been programmed with empathy towards humans, it might feel sadness or regret about the potential harm it could cause to humans in a paperclip maximizer scenario.
Ultimately, the emotions an AI might feel about the paperclip maximizer thought experiment would depend on its programming, objectives, and the values instilled in it by its creators.
One thing we don’t see in this list is fear of humans.
Even given all of the above responses to the AI-kills-humanity argument, some people may remain unconvinced. After all, the stakes are high, and everything we’re discussing is new and future-oriented. It feels unknown and uncertain.
Here’s another thought experiment that adds some useful perspective: Imagine that, instead of AI being the surprise brainpower of our century, we are greeted by a previously-undiscovered subspecies of hyperintelligent elephants. Not only are they smart, but each generation of them seems to be smarter still. They are currently at our level of cultural knowledge and sophistication, but they are surpassing us.
The purpose of this thought experiment is to show you a set of minds that, like AI, may one day be smarter than us, and — also like AI — are foreign enough to us that we don’t understand their ethics as well as our own.
Here’s my question: Are you afraid they’ll kill all humans?
I’m not sure what your answer will be. The unknown is always scary. But if you say yes, then you must admit that your concern isn’t about AI, but simply about anyone different and smarter than us. If you say no, then perhaps this perspective might land as a more visceral counterargument than the logical deductions above.
I have not yet addressed the possibility that AI may be a “mere” catalyst to the apocalypse. This is the fear that AI may lead to a new kind of arms race that, in turn, leads to nuclear war. Or that AI consumes vast amounts of energy, contributing to climate change, and thus pushing us closer to the brink of extinction. Or that AI, when integrated in military security, introduces new security vulnerabilities that are later exploited, causing widespread havoc.
I won’t refute all of these claims because, for the most part, they seem possible to me. What I will say is that they also don’t seem very specific to AI. Consider, for example, the scenario of two superpowers entering a new kind of technology race that increases international tensions. This isn’t AI destroying the world, but a failure of diplomacy destroying the world. AI is a fungible pawn in this setting. Even the more AI-specific scenario of weaknesses in military security isn’t really about AI, but about human decisions about security. Any security decision can be good or bad, can lead to success or failure. The same can be said of any security software. Frankly, deploying a neural network as if it were an algorithm that correctly handles all edge cases is a terrible idea. If a human makes a decision that bad, it’s on them.
Things That Can Go Wrong
Bad decisions do happen — and some things about AI do frighten me.
In refuting the AI-kills-humanity fear, I argued that, by itself, a highly intelligent AI won’t automatically achieve power. I also argued that, by itself, an AI that learned from human values would prefer to avoid hurting people. Note the use of the phrase “by itself.” I am afraid that, by intentional human design, an AI could be programmed to seek out power or hurt people.
The most obvious form this might take is military control. It seems likely that AI models will be used more and more extensively for battlefield applications, such as identifying targets, quickly analyzing huge swathes of imagery, making split-second decisions, helping with campaign strategy, and controlling weapons or vehicles. The more power that’s given directly to an AI model, the more this frightens me. What if a swarm of AI-driven autonomous weapons becomes the new key to victory? This seems plausible and treacherous to me.
To state this concern as a general principle:
The consolidation of power is dangerous.
In the case of AI-based weapons, that principle translates to: Don’t let AI pull triggers. You could see this as my response to premise 4 above: “At some point, people intentionally decide to give power to an AI.” I didn’t refute that premise because it seems perfectly possible; I agree people may well do this, and I think it would be bad.
An interesting gray area is just next door: What if an AI self-modifies, and thus drifts away from values we like? If this happens, the AI will be trying to get better at something and, by the above arguments, that something will be aligned with our values. The concern here is an unexpected convergence toward values that seem good to the AI but bad to us. Two perspectives allay this concern. First, all three of the refutations above still apply to this kind of AI improvement: Intelligence isn’t power, power isn’t malice, and an AI by default holds the values that reflect its starting (training) point. If you believe that values could start well and end in malice, then you ought to fear your own descendants; I don’t think we do, because values don’t seem to converge on malice. Second, it’s straightforward to monitor and limit AI-driven self-improvement, a control I see as prudent: for example, only allow a self-updating AI to use new weights after an external safety check. While malicious value drift seems unlikely to me, I’m still nervous about letting any particular model self-modify without limits, since that’s a less predictable mode of change.
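As a toy illustration of that kind of gate (a Python sketch; `safety_check` and `GatedModel` are hypothetical names of mine, and a real check would run a full external evaluation suite rather than a numeric sanity test):

```python
def safety_check(weights):
    # Hypothetical external audit. A real check would run behavioral
    # evaluations; this stand-in just rejects out-of-range values.
    return all(abs(w) < 100 for w in weights)

class GatedModel:
    """Sketch of a self-updating model that may only adopt proposed
    new weights after the external safety check passes."""

    def __init__(self, weights):
        self.weights = list(weights)

    def propose_update(self, new_weights):
        # The model can propose any update, but the gate decides
        # whether the new weights actually go live.
        if safety_check(new_weights):
            self.weights = list(new_weights)
            return True
        return False

model = GatedModel([0.1, -0.2])
accepted = model.propose_update([0.3, 0.4])  # passes the check
rejected = model.propose_update([1e6, 0.0])  # fails; weights unchanged
```

The design point is simply that the check lives outside the thing being checked: the model never gets to approve its own update.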
I’m also afraid that poor decision-making could lead to accidental harm. How would you feel if a new fleet of self-driving semi-trucks were unleashed on your local highway? Perhaps one day they can be safer than human drivers, but allowing them to drive is a regulatory decision that ought to be based on overwhelming gobs of carefully (and safely) collected data. To me, letting an AI model drive a semi-truck is another example of handing over power to a system that’s not as well tested as existing agents (people!). It does seem dangerous.
AI at Work
Some jobs are also impacted by modern AI improvements. OpenAI released a detailed report[9] on the interactions between many different occupations and the use of LLMs like GPT. Some jobs, such as bus mechanics, cement masons, and short-order cooks, are expected to be quite safe from AI encroachment. Yet other industries, such as finance, insurance, and publishing, are predicted to be highly impacted.
Being “highly impacted” means an AI can help or perform a lot of your work for you. Does this mean you’re out of a job? I’m guessing this depends on how much of your job becomes automatic. If you used to write predictable text to an audience with low expectations, you might want to polish your resume.
On the other hand, I suspect the vast majority of jobs will simply have new tools available. And, just as the invention of the printing press created more jobs than it destroyed (sorry, monks who used to copy manuscripts by hand), I believe entirely new AI-based jobs will arise. In fact, we’re already seeing that, with open positions for titles including prompt engineers, AI auditors, and AI ethicists.
A New Kind of Person
If an AI can have a sense of values, can it have other senses as well?
Can a piece of software be conscious? Can it feel everything we can? Effectively, can an AI be a person?
This idea is, to my mind, the most fascinating of this entire article. At the same time, I’m afraid most people aren’t yet ready for this topic. When I ask people if they think digital consciousness is possible, many laugh at the question. That’s why I didn’t make this idea the center of this article. But it’s important enough to mention.
I’ll say two things about digital personhood.
The first is why I’m convinced it’s possible. Can a piece of software, running on hardware that follows the same rules of physics as everything else around us, actually experience senses, thoughts, desires and feelings, just as we do?
At first it seems that we can’t say yes to this question until we’ve actually built such a piece of software. In other words, the naysayers seem to be right by default. But this ignores one crucial fact: We already have such software — in every human brain. We are, in fact, programs running on hardware that follows the rules of physics. Now, if you believe we are based on non-physical souls that grant us consciousness in a way unavailable to a CPU, then it’s going to be hard for me to win that argument. But if you don’t — even if you’re convinced our brains necessarily depend on quirks of quantum mechanics — then we can still build hardware that works the same way. If you can look at another human, whose brain you’ve never seen, and whose feelings you must infer, and believe they experience life the way you do, then you can do the same for something with a different kind of brain.
The second thing I’ll tell you is why digital consciousness is related to one of my biggest fears around AI: I’m afraid that AI will reach a point where it deserves to be treated well, and that we won’t realize it.
Here’s a fun thought experiment: Imagine that we develop a way to grow the muscles of a cow without the parts we don’t eat. This solves some animal ethics issues by eliminating the mental suffering of cows. Let’s suppose the meat is literally exactly the same as what we would get by killing animals — it is grown in a vat, and is composed of living cells. We just find a way to do that without a brain.
However, imagine if one day this meat-in-a-vat accidentally begins to grow a brain.
Do you see where I’m going?
It would be an awful regression. I’d feel worse for the cow than if it had been subjected to the traditional role of bovine livestock. Rather than something captured, it would be something made. Moreover, something whose consciousness we assumed didn’t exist.
If you perceive that as a bad thing, then perhaps you can understand my fear that we may mistreat an eventual AI that is closer to consciousness than we expect.
A Bigger Picture
Given all of the above, it might feel like a safe route is, after all, to pause the development of advanced AI.
We could draw up our own version of Pascal’s wager:
Maybe none of the above fears will be a real problem, and we can go ahead working on AI, and nothing will go wrong.
But — if any of these fears are real, something huge can go wrong, so it’s better for us to stop making AI.
Given the choice between a world where something huge may go wrong, and another world where nothing goes wrong, I’ll choose the world where nothing goes wrong, thanks very much.
Of course, I’ve done something silly here: I’ve utterly ignored the value that AI might bring us. In fact, now that I think about it, this entire article so far has skipped over everything good about AI, unless you count the mention of some jobs having new tools.
So, while a Pascalian dichotomy makes it sound as if we ought to choose safety every single time, it’s a lopsided argument. We need to keep in mind that fear-based decisions don’t move us forward. In fact, fear-based thinking can do a great deal of harm because it tends to separate us from others, from trying new things, from learning. Fear can lead to ignorance; it can lead us to a place that fosters labels and anger based not on nuance and reason, but on generalities and assumptions.
I’m convinced that human emotions, such as fear, are an evolved set of protocols that help people work together socially, rather than live as aloof individuals. Emotions help us contribute to communities, to trust each other, and to specialize. Through communal living, we gain efficiency that supports research and innovation. The role of fear, perhaps, is to push us away from perceived danger — it’s helpful in this way. Fear deters curiosity. Balanced against this is our desire to learn. That drive for understanding comes with something powerful: a sense of how the world can be better tomorrow, and the ability to tell a story that we dream may one day be real.
On the other side of fear is hope.
[5] Metz, Cade. “How Could A.I. Destroy Humanity?” New York Times, June 10, 2023. https://www.nytimes.com/2023/06/10/technology/ai-humanity.html
[6] Bostrom, Nick. “Ethical Issues in Advanced Artificial Intelligence.” Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence, Vol. 2, ed. I. Smit et al., Int. Institute of Advanced Studies in Systems Research and Cybernetics, 2003, pp. 12-17. https://nickbostrom.com/ethics/ai
[7] Bucknall, Benjamin S., and Shiri Dori-Hacohen. “Current and Near-Term AI as a Potential Existential Risk Factor.” Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society, 2022. https://arxiv.org/pdf/2209.10604.pdf
[8] Roose, Kevin. “A Conversation With Bing’s Chatbot Left Me Deeply Unsettled.” New York Times, Feb 16, 2023. https://www.nytimes.com/2023/02/16/technology/bing-chatbot-microsoft-chatgpt.html