The much-needed reinvention of the voice assistant is almost here

With the powers of context and conversation, Siri and Alexa could one day be totally amazing instead of vaguely annoying. But only if Big Tech can get it right.

Apple showing off Siri’s updated logo at WWDC 2024.
Apple’s Siri voice assistant has a new look and a new brain.
Image: Apple

Voice assistants hold so much promise, but in the decade-plus since Apple’s Siri and Amazon’s Alexa first wormed their ways into our lives, their most compelling use is still setting timers. Competition from Google’s Assistant (and if we’re being charitable, Samsung’s Bixby) failed to light the spark of innovation in this space, and in many ways, voice control has regressed. These assistants regularly misunderstand, mishear, and sometimes just don’t listen at all. They’re a far cry from the proactive, actually smart digital assistants they were originally pitched as.

Enter generative AI: the technology voice assistants need to transform them from novel to necessary. This week at its Worldwide Developers Conference, Apple announced plans to infuse its long-neglected assistant with the emerging tech, providing Siri with two crucial skills: context and conversation. It’s the recipe for delivering on that original promise, or at least getting us much closer.

Apple says its Apple Intelligence will bring Siri “all-new superpowers” gleaned from improved language understanding, an awareness of personal context, and the ability to take action across apps on your phone.

Where the current Siri needs explicit instructions on what to do and how to do it, Apple promises that this new version will let you say something like, “Siri, what time does Mom’s flight land?” and the assistant will know to look through your Mail and Messages and pull out the information. You could then say, “How long will it take me to get there?” and it should know you mean the airport and pull up a route and ETA via Maps.

You also won’t have to phrase commands precisely. Instead of saying, “Siri, set a timer for 10 minutes,” you should be able to bumble through with a phrase like, “Siri, set an alarm for — oh, wait, no, set a timer for 10 minutes. Actually, make that 5,” and the assistant will get it right.

These seemingly minor improvements address some of the fundamental issues of voice assistants — not understanding enough about you and requiring you to speak in unnaturally precise ways to get them to do anything — that turned these promising pieces of technology into barely more than glorified alarm clocks.

Siri, Alexa, et al. are already artificially intelligent voice assistants: machines that mimic human-like intelligence through a combination of command-and-response programming and machine learning. But with the power of generative AI and LLMs, voice assistants could have the ability to generate a response based on what they’ve learned, rather than just reacting with existing knowledge.

This should provide the tools to create that more conversational, smarter voice assistant, one that promises to be much more useful than those we have today. But all we’ve seen to date are demos of this potential; none of this exists in real life yet.

This is because making a superintelligent voice assistant is a huge challenge, with equally huge potential ramifications if a company gets it wrong. It’s also not as simple as giving Siri and Alexa a ChatGPT-style lobotomy.

Voice assistants, especially ones connected to devices and services in our phones and homes, are a different beast than a chatbot in a browser. They have the ability to take action in the real world: doing things like controlling our thermostats and lights and sending emails and messages. This is not where you want a potentially hallucinatory AI in control, and it speaks to why Apple has carefully sandboxed its ChatGPT integration with Siri. 

Amazon is also working on a new and improved voice assistant, and while the company says it’s already integrated generative AI into components of Alexa, according to a report from Fortune, the new Alexa isn’t even close to ready.

The company announced an “all-new, smarter and more conversational Alexa” powered by a new Alexa LLM last fall with an impressive demo. It touted an Alexa that should understand conversational phrases for more human-like interactions, interpret context more effectively, and complete multiple requests from one command — like “Alexa, call Mom, turn on the living room lights, and lock the front door.” 

But we’ve seen no sign of this superpowered Alexa since, just vague assurances that it’s in a limited preview. This may be because, according to Fortune, the company is struggling to merge the old Alexa and its capabilities with its vision for the next-gen voice assistant. 

Similarly, Apple is taking a slow and steady approach. The new Siri won’t launch until the fall and, even then, will be labeled a beta. It also won’t have a place in the smart home at first: it’s not supported on any of Apple’s voice-forward, home-based devices such as the HomePod smart speakers and the Apple TV. It’s also not coming to the Apple Watch yet. 

While these devices likely don’t have enough processing power to run generative models, many of which Apple wants to operate locally for privacy purposes, this feels like a big gap. The smart home is a key space for a more intelligent voice assistant: not only can it help bridge the personal and home spaces, but it could also help make running a smart home much easier.

Amazon’s former head of devices and services, Dave Limp, told me last year that the new Alexa LLM the company is building has been trained on hundreds of smart home APIs. This could give Alexa the context needed to proactively manage smart home devices like lights, locks, and thermostats, making them easier to set up and use. You could give a command like, “Alexa, it’s dark in here and I’m cold,” and the voice assistant would know what to do.

In contrast to Apple, Amazon has said its new Alexa will come to all of its Echo smart speakers, including the very first Echo released in 2014. (It can do this by offloading processing to the cloud.) Although, as the HomePod Mini is now four years old, it’s my guess we’ll see a new model with updated hardware designed for AI very soon. Apple cannot afford to cede the home to Alexa any further.

Three of Apple’s HomePod Mini smart speakers standing on a desk in various shades including orange, yellow, and blue.
The new Siri doesn’t work on the HomePod Mini, which means maybe there’s a new Mini on the way.
Photo by Jennifer Pattison Tuohy / The Verge

While the stage is set for the second coming of the voice assistant, there’s still a long way to go until we see act 1. It’s also possible the show will open with some entirely new characters if these companies can’t find a way to effectively build the new technology onto the foundations of the old. 

That appears to be the road Google is taking. Its Google Assistant voice assistant has yet to undergo a big AI overhaul, with the company reportedly putting all its resources into the new AI-powered Gemini assistant. While a symbiosis there seems the natural move, given Google’s penchant for abandoning the old, it’s entirely possible the company will launch a completely new voice assistant built from the ground up on generative AI.

However they get there, the promise of these smart voice assistants is exciting, especially for whichever company can effectively merge the personal assistant with the home. Imagine if your HomePod could welcome you home with personalized updates, tell you that you need to leave for your daughter’s school play 15 minutes early because of traffic, and have your EV charged with enough range to get you there by the time you walk out the door. That’s a lot more like what we were promised — and it’s a whole lot smarter than setting a timer.