The race for an AI-powered personal assistant

Google and OpenAI unveiled new tools to bring ‘intelligent systems’ a step closer. Will this be a milestone for generative AI?

Madhumita Murgia in London, 17 May
At Google’s Mountain View headquarters this week, a man clad in a rainbow-hued dressing gown emerged from a giant coffee cup to give a vibrant if somewhat surreal demonstration of the company’s latest achievements in generative AI. At the I/O event, electronic musician and YouTuber Marc Rebillet tinkered with an AI music tool that can generate synced tracks based on prompts like “viola” and “808 hip-hop beat”. The AI, he told developers, came up with ways to “fill in the sparser elements of my loops . . . It’s like having this weird friend that’s just like ‘try this, try that’.”

What Rebillet was describing is an AI assistant, a personalised bot that is supposed to help you work, create or communicate better, and interface with the digital world on your behalf. This new class of products has stolen the limelight this week among a flurry of new AI developments from Google and its AI division DeepMind, as well as Microsoft-backed OpenAI.

The companies simultaneously announced a series of upgraded AI tools that are “multimodal”, meaning they can interpret voice, video, images and code in a single interface, and can also carry out complex tasks such as live translation or planning a family holiday.

In a video demonstration, Google’s prototype AI assistant Astra, powered by its Gemini model, responded to voice commands based on an analysis of what it sees through a phone camera or a pair of smart glasses. It successfully identified sequences of code, suggested improvements to electrical circuit diagrams, recognised the King’s Cross area of London through the camera lens, and reminded the user where they had left their glasses.
Meanwhile, at OpenAI’s product launch on Monday, chief technology officer Mira Murati and her colleagues demonstrated how their new AI model, GPT-4o, can perform voice translation in live conversation, and similarly interact with the user in an anthropomorphised tone and voice to parse text, images, video and code. “This is incredibly important because we’re looking at the future of interaction between ourselves and the machines,” Murati tells the FT.

While smart assistants powered by AI have been in train for nearly a decade, these latest advances allow for smoother and more rapid voice interactions, and superior levels of understanding thanks to the large language models (LLMs) that power new AI systems. Now, a fresh scramble is under way among tech groups to bring so-called AI agents to consumers. These are best understood as “intelligent systems”, said Google chief executive Sundar Pichai this week, “that show reasoning, planning and memory, are able to ‘think’ multiple steps ahead, and work across software and systems, all to get something done on your behalf”.

As well as Google and OpenAI, Apple is expected to be a major player in this race. Industry insiders anticipate that a significant upgrade to Apple’s voice assistant, Siri, is on the horizon, as the company rolls out new AI chips, designed in-house and capable of powering generative models on-device. Meta, meanwhile, launched an AI assistant on its platforms Facebook, Instagram and WhatsApp across more than a dozen countries in April. Start-ups like Rabbit and Humane are also attempting to enter the space by designing products that act as standalone AI helpers.

Although analysts point out that this week’s big announcements remained largely “vapourware” — concepts rather than real products — it is clear to industry watchers that AI assistants or agents will be key to bringing the latest AI technology to the masses.
“It’s unquestionable, this is the moment for personal [artificial] intelligence,” says Mustafa Suleyman, CEO of Microsoft AI, who was not involved with either release this week. Suleyman previously founded Inflection, a start-up building a consumer-focused AI assistant known as Pi, which he left in March. “Silicon Valley has always framed tech as a functional utility — getting things done efficiently and fast. But it’s kind of incredible — these tools are now in the creative domain of the product makers,” he says. “The tech has matured enough that it’s a new kind of clay that we can all invent with and . . . we are seeing that coming to bear now.”
For nearly a decade, tech groups have been competing to bring AI to consumers through virtual assistants such as Apple’s Siri, Microsoft’s Cortana and Amazon’s Alexa, which is now embedded across a range of devices. Google, for instance, unveiled an AI Assistant back in 2016, with Pichai painting a picture of a post-smartphone world where intelligence is embedded in everything from speakers to glasses. But eight years on, the smartphone is still the primary consumer interface to the web.

The big challenges to mass adoption have been latency, or slow responses from AI agents, as well as errors in their understanding and execution of human instructions and needs. The emergence in 2017 of the technology at the core of chatbots like ChatGPT, Gemini and Claude, known as the transformer, has vastly improved the technologies underpinning AI assistants, such as natural language processing.

But to build AI assistants that the public wants to use, “the killer feature is speed”, according to technology analyst Ben Thompson, who writes the influential industry newsletter Stratechery.
“When you cross the threshold of speed and latency, that’s when it’s fun. The delight . . . and playfulness when you’re getting that immediate feedback is so different than sitting around waiting . . . then it’s like a parlour trick,” he said on the podcast Sharp Tech this week.

Thompson said he had noticed this in the context of Google and its AI search mode, known as the Search Generative Experience, which provides AI-generated answers to queries alongside the traditional list of links. “It’s getting so fast and so consistent that I’m using it more, and frankly using ChatGPT less, not even on purpose,” he said. “Google knows this better than anyone — they know every millisecond makes a difference in how engaged people are.”

But OpenAI’s flagship bot is no slouch. A version of its GPT-4o model was able to fluidly translate between Italian and English in real-time conversation. The model also displayed a conversational, albeit slightly flirtatious, tone when chatting with the male engineers on stage. With OpenAI, “the real improvements are in the user experience and the actual ChatGPT product”, Thompson said. “That is what it takes to win in consumer [technology], to a much greater extent than enterprise.”

Waiting in the wings, however, is Apple. Investors have been eager to learn more about the company’s plans for AI, as its share price has declined this year compared with Alphabet and Amazon. This week, OpenAI announced it had sealed a deal with Apple to create a desktop app for Macs. The iPhone maker is also said to be exploring further potential partnerships with both OpenAI and Google Gemini, while hiring experts and publishing research papers that give a rare insight into its work behind the scenes building AI models.
Insiders say Apple’s advantage lies in its massive existing user base, with more than 2.2bn active devices around the world, positioning it to shape how people integrate generative tools like virtual assistants into their daily lives.

Apple is likely to build out a “next level Siri technology” in partnership with OpenAI, predicts Wedbush analyst Dan Ives. An assistant capable of carrying out complex tasks for iPhone users could eventually be turned into a paid subscription service, he said in a note — similar to how the company currently monetises other services like iCloud.

After OpenAI’s demo on Monday, Bank of America analysts reiterated their buy rating on Apple stock, saying it underlined the potential that virtual assistants and AI features present for app developers in its App Store ecosystem, which already nets Apple between $6bn and $7bn in commission fees every quarter, according to Sensor Tower estimates.

Google’s edge, however, is in the suite of consumer apps it offers, from email to calendar tools, where AI agents could be integrated. “We’ve always wanted to build a universal agent that will be useful in everyday life. Our work making this vision a reality goes back many, many years. It’s why we made [the chatbot] Gemini multimodal from the very beginning,” Demis Hassabis, CEO of Google DeepMind, told reporters this week.

“At any given moment, we are processing a stream of different sensory information, making sense of it and making decisions. Imagine agents that can see and hear what we do, better understand the context we’re in, and respond quickly in conversation, making the pace and quality of interaction feel much more natural.”

Despite the AI companies jostling to create consumer bots that can assist in day-to-day tasks, it might be some time before they become everyday reality.
AI-generated content is still in its infancy, and the technology remains prone to errors and “hallucinations”, or the fabrication of false information. This could become a big problem if an assistant is completing work-related tasks where accuracy, rather than creativity, is crucial.

Scaling up is also a huge challenge, says Suleyman. “It’s a hypercompetitive market . . . distribution matters and brand matters — Apple and Google . . . have big advantages in that sense.”
Suleyman moved to Microsoft in March after his start-up Inflection pivoted from a consumer focus to an enterprise model. “[Pi] was a deeply engaged product but getting to major scale like Gemini is super challenging.”

But Bret Taylor, chair of OpenAI’s board and chief executive of new AI agent start-up Sierra, says the displacement of existing consumer interfaces offers opportunities for a range of companies. “In big tech shifts, start-ups can stand out and succeed because there’s not necessarily a market leader right now,” he says.

While the Big Tech companies and their partners might be best positioned to take advantage of the current moment, Meta’s chief AI scientist Yann LeCun says that they will need to open up their models to scale AI assistants beyond individual countries in the west. “In the near future every single interaction with the digital world will be through an AI assistant of some kind. We will be talking to these AI assistants all the time. Our entire digital diet will be mediated by AI systems,” he said at a Meta event in London last month. “This can’t be done by companies on the west coast of the US. We need them to be diverse.”