2026-06-30 04:00 UTCIn-site rewrite6 min readUpdated: 2026-06-30 04:29 UTC

‘There’s this deep mystery of what, actually, is this thing?’: the philosopher inside Google DeepMind

Since 2017, Iason Gabriel has worked at the tech giant, trying to anticipate – and think through – the impact of AI. But as commercial and geopolitical pressures escalate, can ethicists make any difference?

SourceThe Guardian AIAuthor: Robert P Baird

Illustration: Deena So'Oteh/The Guardian

In 2017, a 33-year-old political philosopher named Iason Gabriel was told by a friend that he ought to apply for a job at DeepMind, the London-based subsidiary of Google where much of its AI research was concentrated. The suggestion was not an obvious one.

Gabriel was a cheerful but intense junior academic with a passion for Vipassana meditation and what his brother calls “enthusiastic” rock climbing. The eldest son of a Greek management professor and a British documentary maker, Gabriel split his time between teaching and international development work. At the University of Oxford, where he was a fellow at St John’s College, Gabriel taught courses on political theory and wrote papers on the moral contortions of “yuppie ethics” and the ethical blind spots of effective altruism. When he wasn’t there, he did crisis work for the United Nations Development Programme in Sudan and Lebanon.

DeepMind, meanwhile, was the world’s leading AI research lab. In part, this was because it had the financial and computational backing of Google, which had bought the company in 2014 for $650m. In part, it was because DeepMind had recently shown it could put those resources to stunning use. In Seoul, in 2016, a DeepMind system called AlphaGo defeated Lee Sedol, a South Korean Go champion, in a five-game match. The victory was significant not least because of Go’s legendary complexity; the game has more possible configurations than there are atoms in the universe.

Thanks to the fuss around AlphaGo, Gabriel was aware of DeepMind. Still, he found his friend’s suggestion puzzling: why did a company that made game-playing robots need an ethicist? The answer, as he soon learned, was that the company had its sights set much higher than Go. DeepMind was founded in 2010 by three men – Demis Hassabis, Shane Legg and Mustafa Suleyman – who believed that it must be possible to develop artificial general intelligence, or AGI. By this they meant computer systems that could match, and maybe surpass, human cognitive capabilities. When they started the company, this was not a popular view: to speak of AI, let alone AGI, was considered by many a sign of fatal unseriousness. Hassabis, Legg and Suleyman were undeterred. Their ambition, as they liked to say, was to “solve intelligence, and then solve everything else”.

For the DeepMind founders, it was clear that such an achievement would have widespread consequences. In 1999, when Legg was fresh out of university, he estimated that AGI would arrive somewhere between 2025 and 2028, a prediction he maintained in the face of much mockery for three decades. In his dissertation, completed in 2008, he insisted that society could not afford to wait until AGI was technically feasible to consider its effects: “We need to be seriously working on these things now.” More recently, Legg told me it was “obvious” why the company needed people like Gabriel on staff: “If you’re making some widget, and it’s probably not going to change the world, then maybe you don’t need a moral philosopher. But if you take AGI seriously, then I can’t really see how you wouldn’t consider this sort of thing as important.”

Lee Sedol, bottom right, reviews one of his matches against AlphaGo with fellow professional Go players in March 2016. Photograph: Lee Jin-man/AP

After starting at DeepMind in 2017, Gabriel was, for a time, the only active philosopher working at a frontier AI lab. He quickly discovered that his background in moral philosophy and political theory gave him an unusual perspective in an industry dominated by engineers. Over the past decade, he has assembled a body of work that tracked, and in many cases predicted, the ethical challenges created by the surprising success of large language models (LLMs).

As Dylan Hadfield-Menell, who leads the Algorithmic Alignment Group at MIT, told me, Gabriel was “the right person meeting the moment. As the field was ready to mature and move into prime time, he figured out a way to broaden the horizons without attacking or denigrating the work that came before.”

More generally, Gabriel has been a leading advocate for the idea that the current wave of AI development demands not just new technical vocabularies but also new ways of thinking about our relationship to technology, and even to ourselves. As he put it to me recently, in one of several long conversations we’ve had over the past few months, “I can take any technological artefact and ask: is it wise? Is it just? Is it caring? And the answer is no. But the depth of the question when it comes to AI – including what kind of ethics is appropriate to it – is hard to overstate. I sometimes feel like it’s very hard to look at AI directly. There’s this deep mystery there, which is: but what actually is this thing? We have a very literal answer, but the literal answer doesn’t seem to necessarily provide a moral answer.”

2 [1341]***

By the time Gabriel joined DeepMind, there were, roughly speaking, two distinct and often antagonistic approaches to questions about the social and ethical implications of AI. These approaches, sometimes classed under the headings of AI safety and AI ethics, were divided by a disagreement about the feasibility of the technology.

Like the DeepMind founders, the AI safety contingent believed that human-grade machine intelligence was not only possible but imminent. The urgent task, as they saw things, was to make sure that AI systems didn’t go awry. They took inspiration from a 1960 essay by Norbert Wiener, an American mathematician and computer scientist, who argued that humans and computers are “essentially foreign to each other”. Because machines can operate much faster than people, Wiener said, “we had better be quite sure that the purpose put into the machine is the purpose which we really desire and not merely a colourful imitation of it”.

The challenge Wiener described – getting a machine to act in the way its users intended – became known as the alignment problem. At some level, alignment is an issue for every technology, but as Wiener recognised, it was particularly pressing for machines designed to act autonomously. It was also particularly difficult for AI systems trained to mathematically optimise some reward signal, a process known as reinforcement learning.

A classic example was reported in 2016 by Dario Amodei and Jack Clark, who worked at OpenAI and later founded Anthropic with five others. Amodei and Clark described an AI system designed to play a boat-racing video game. The developers wanted the AI to learn to beat the game, so they programmed it to maximise its score. Instead of working its way through each successive level, however, the AI racked up a high score by looping endlessly around a lagoon where it found a trio of regenerating targets. The basic trouble was the one Wiener had predicted: the machine’s goal was imperfectly aligned with the developers’.

More dire versions of the problem were also contemplated. On forums such as LessWrong, which was started by the autodidact AI researcher Eliezer Yudkowsky, and in books such as Superintelligence, which was published in 2014 by the philosopher Nick Bostrom, there was speculation that a machine-intelligence explosion could result in an uncontrollable AI. If such an agent were even slightly misaligned, the consequences could be disastrous. In one imaginary example cited by Bostrom, a superintelligent AI is asked to evaluate the Riemann hypothesis, one of the most important unsolved problems in mathematics. In the course of trying to accomplish this task, the AI decides to rearrange the solar system – “including the atoms in the bodies of whomever once cared about the answer” – to maximise the resources it needs to attack the problem.

Bostrom’s insistence that aligning superintelligent AI was “quite possibly the most important and most daunting challenge humanity has ever faced” captivated technofuturists in Silicon Valley. (Sam Altman praised the book, as did Elon Musk.) His fears were also shared by a small but loquacious community of effective altruists and self-described rationalists who saw statistics as the proper measure of morality. Many people in this community held a “long-termist” perspective that factored the wellbeing of humans born in the future – even thousands of years into the future – into their moral equations. For them it was simple maths that even a small chance of a species-ending disaster was more urgent than any number of likelier, but less catastrophic, dangers.

By contrast with the AI safety crowd, the academics and technologists associated with the AI ethics tendency saw the spectre of rogue robots and existential risk as a distraction from present-day harms. Drawing inspiration from the critical race theorist Kimberlé Crenshaw and the political theorist (and former rock critic) Langdon Winner, among others, they took fairness, accountability and transparency as their watchwords and insisted that the dangers of technology could not be avoided by merely technical means. What was needed, they argued, were social, cultural and political solutions.

A central concern of this latter tendency was algorithmic bias, of the sort that affected facial-recognition and predictive-policing software. In 2017, a team led by Joy Buolamwini, of the MIT Media Lab, launched Gender Shades, a project that demonstrated systemic biases in commercial facial-recognition software. “Automated systems are not inherently neutral,” Buolamwini wrote in the online introduction. “They reflect the priorities, preferences and prejudices – the coded gaze – of those who have the power to mould artificial intelligence.”

The division between the safety and ethics camps was often pronounced. “You’d meet up with people and they’d ask: ‘Are you worried about near-term problems or long-term problems?’” Hadfield-Menell says. “The long-term concern was a euphemism for existential risk – essentially superhuman systems. Near-term meant you’re worried about biased facial recognition and the things studied within the AI ethics community.”

He noted, too, that the conflicts between the two groups often seemed to have as much to do with sociology as they did with ideas. “You can’t really separate AI safety from its origins among LessWrong and some of those communities, which were often openly disdainful of a lot of the more ‘woke’ academics, for lack of a better term. At the same time, the fairness, accountability and transparency community had a lot of open disdain for people who were worried about advanced AI. The reason why it was being talked about on LessWrong, and not at academic conferences, is because if you were an academic researcher in 2010 and you talked about AI systems getting smarter than humans and becoming catastrophically misaligned, you were a crank who didn’t actually understand the technology.”

Gabriel’s first major research project at DeepMind was a 2020 paper that straddled the concerns of both camps. The paper took the alignment problem seriously, but it also insisted that alignment had ethical and political implications that went beyond the technical challenges. As difficult as it might be to get a machine to act in accordance with some set of values, Gabriel argued, it was much harder to choose those values in the first place. “Given that we live in a pluralistic world that is full of competing conceptions of value,” he asked, “how are we to decide which principles or objectives to encode in AI – and who has the right to make these decisions?”

Hannah Rose Kirk, an A

[truncated for AI cost control]