Talk show – the rise of voice-based discovery
With the advent of Amazon Echo and the launch of next-generation boxes such as Sky Q, there are signs that using your voice to control the TV is finally filtering into the mainstream. Stuart Thomson looks at some of the latest moves.
Using your voice to control the TV has been possible for some time, but only the growing popularity of digital assistants such as Amazon Echo has begun to make speaking to the TV seem like something the average person would do.
Novelty is not enough to sustain a business case, however. Voice is unlikely to take off unless it fills a real need. TV platforms have been opening up to multiple content sources, including the likes of Netflix. For consumers, this means that finding the content they want to watch is becoming more complex. On a PC, it is relatively easy to use the keyboard to search for content. On TV, in your living room, using a remote control to navigate a text-based interface is much more of a hassle and less likely to produce satisfactory results.
For this reason, TV operators are beginning to look more seriously at voice-based search as the next frontier of the user experience.
The use of voice has been given a strong impetus by the success of devices marketed by Amazon, Google and Microsoft. With voice technology becoming more sophisticated, and rich metadata available to support increasingly complex variations in search terms, it seems that voice-based systems are here to stay.
For TiVo, a pioneer in this area, developing voice-based search and navigation has been central to its research and development activity for some time. Following its acquisition of TiVo, Rovi combined its existing Rovi Conversation Services activity with the former Digitalsmiths content discovery technology, enabling more sophisticated recommendations to be served in response to voice searches. This work has given TiVo unique insights into some of the challenges, and possible solutions.
“Text-based search is quite simple because people have a limited way of entering text, particularly in the entertainment space. They don’t have a full keyboard,” says Charles Dawes, senior director, international marketing at TiVo. “They use a remote or TV on-screen keyboard or maybe their phone. So they write much shorter things like actors’ names and programme titles, and there are lots of technologies you can use to help guide them through. When it comes to voice search, the way people search for the content becomes very different. It becomes a lot deeper and more complicated. You might say the name of an actor or something like that.” For the content navigation system provider, there is a much stronger need to “have an in-depth understanding of the content”, he says.
Dawes says that the technology is constantly improving, with machine learning enabling systems to recognise and decode regional and foreign accents, and interpret different ways of saying the same thing. For languages other than English, however, more work is required.
Sylvain Thevenot, managing director at TV technology specialist Netgem, points out that, in addition to decoding regional accents, voice systems have to work out the context in which users are employing a particular word, such as ‘home’. This could mean a request to go back to the home screen, or a request to play a piece of content “at home”. Individuals also have their own personal ways of referring to things, which a machine may struggle to decode.
Voice recognition is just one part of a complex equation. For TV viewers, using a voice input also carries an expectation that the system is going to deliver a personalised response to a general question, in a way that isn’t true of text-based search. “When someone says ‘what is on TV tonight?’ they are not asking a general question. What they are saying is, ‘what is on TV tonight for me?’,” says Dawes.
Responding to this means not only building up profiles of individuals – or family groups – but also using voice recognition technology to identify particular individuals. This is a step beyond simply using voice for search. The technology to do this is available: banks and other organisations already use voice to identify users and log them in to their accounts. “It is something we will definitely see in the entertainment space,” says Dawes.
Enabling this kind of voice search in a meaningful way also means having a supply of rich metadata, including things such as well-known and not-so-well-known film quotations and characters’ names.
“You can’t just have a basic set of metadata. It is like in the music space with lyrics. People remember a lyric, but not the title of the tune. It is similar with video. You remember specific pieces of information and maybe the name of the character an actor plays but not the name of the actor, or you may know the same actor was in something else,” says Dawes.
Ferdinand Maier, managing director of remote control specialist Ruwido, says that TV operators will typically look to deploy voice-based search in a phased way, starting with basic requests such as the ability to search for specific titles, followed by more complex, conversational searches with commands such as ‘show me something funny’. However, the wider the attempted reach, the more room there is for misunderstanding and for presenting inappropriate content.
Supplying metadata and content descriptors that can be matched with voice requests to deliver appropriate suggestions remains challenging, despite the promise of machine learning. However, Rich Cusich, chief product officer at Nielsen-owned content recognition specialist Gracenote, believes that most voice requests will be relatively simple, easy-to-decipher commands. “Eighty per cent of things will be pretty straightforward,” he says.
Nevertheless, systems need to be aware of complex links between pieces of content that could help users find what they are looking for. Integration with social media may also be important, as viewers could be looking for something that is trending at a particular moment. Cusich says that providing meaningful search results will require a combination of collaborative filtering, cross-referencing with user data and awareness of context, including the time of day and the device that a viewer is using.
TiVo’s Dawes nevertheless highlights how quickly voice technology is becoming mainstream, citing a recent survey that found 43% of millennials use voice every day to interact with devices. He believes that older TV viewers will be attracted to the technology because it is easier and more appealing than “working through complex content trees that a lot of the time are not dictated by logic but by commercial agreements”.
“Being able to sit down and say ‘show me the last EastEnders’ or ‘show me what was on at 9pm that I missed’ will open up content that they haven’t been able to explore,” he says. According to Dawes, people with access to voice technology are increasingly using it to make more than eight searches a week, meaning that voice quickly becomes “the default way to do search for content”.
YouView and EE TV
While the ways in which voice can most effectively be used are still a work in progress, some service providers are beginning to take the plunge.
Sky enabled the use of voice on touch remotes for its Sky Q advanced TV service earlier this year, focusing on a range of basic search commands, including the use of some well-known movie quotations. Other UK providers have looked to Amazon’s digital assistants as their first avenue to enable voice. BT-owned telco EE recently launched voice control for TV through the Amazon Echo and Echo Dot devices. The company said it was the first UK provider to support an Amazon Alexa ‘skill’ on its set-tops.
EE TV has integrated Alexa with its Netgem-supplied TV platform. Users can ask Alexa what is on TV tonight, with Alexa providing three recommendations from the same source that supplies Freeview Picks – the UK digital-terrestrial service’s recommendation feature.
Users have the option to record any or all of the recommended shows using a voice command. The skill works whether EE TV is switched on or in standby mode.
Netgem’s Thevenot says there are three levels at which voice can be used to interact with TV services. At the most basic level, voice can replace direct remote control commands such as ‘channel up’ or ‘down’ and ‘tune into BBC1’. This has limited appeal and adds little to the functionality of a remote control. The next level up is using voice to search for content by obvious keywords, replacing a virtual on-screen keyboard. Beyond that, says Thevenot, voice requests can trigger recommendations based, for example, on what’s trending on social media. This third level is essentially what EE is focusing on.
YouView, the advanced TV platform used by BT and TalkTalk, has meanwhile also launched its own Amazon Alexa pilot.
“We are still exploring the proposition. We want to take something to the larger user base. We have to have something usable in their homes. That is our goal,” says Morgan Henry, head of the emerging technologies team at YouView. “This was championed by people who were passionate about voice within YouView. All of our partners were very supportive. Right now we are getting positive responses but our goal is to put it in front of users, and learn what people are going to use it for.”
The starting place for YouView is to enable viewers to “avoid reaching for their remote as they explore” content and to enable them to carry out basic functions such as to record content via voice command. As a free-to-view platform, YouView does not require a user ID or password. However, Henry says voice ID could be used to deliver personalisation at an individual or household level. “Context is really important. Are there other people in the room? If there are kids in the room, interacting with the device, you should offer family recommendations,” he says.
Ruwido’s Maier, whose company by contrast focuses on the high-end pay TV market, says that voice biometric recognition could be used not only to check entitlements and apply parental controls but also to profile users, allowing the system to know who is making requests and where they are in the home. Maier says that work on this is ongoing.
The EE and YouView trials are still at an early stage, but service providers are only likely to embrace voice fully if they can be sure it will have a broad appeal. According to Chloe Davies, chief product manager at UK free-to-view satellite platform Freesat, a survey of users found that around a quarter listed voice control as a must-have or a ‘like-to-have’ feature. However, detailed questioning placed it lower down the list of priorities, below more down-to-earth features like the availability of catch-up players and the ability to record content.
Personalised recommendation also ranks higher on users’ list of priorities. Content recommendation can to some extent substitute for active search, putting content that users might like in front of them without them having to lift a finger – or raise their voice.
“If you want to form a habit with people on an everyday basis you need to make recommendation prominent. Expecting people to go off and find things is always going to be of limited appeal. TV is more about giving an upfront experience on a landing or home page,” says Davies.
Personalising recommendations and enabling viewers to find content they may want to watch across live, on-demand and catch-up services may therefore deliver more value in the medium term than voice search. Nevertheless, she concedes, the popularity of Amazon Alexa could see voice pass into the mainstream. “It cuts out the hassle of using a remote control. The aim is to reduce the amount of time people spend looking for stuff,” she says.
Voice-based search seems most likely to supplant awkward text-based search if enough people are ‘voice-enabled’, but are digital assistants the best platform to use?
Davies enumerates three basic choices: using Alexa or another digital assistant such as Google Home or Microsoft Cortana; using a mobile phone or tablet; and direct input via a remote control to the set-top box. Starting with Amazon Echo makes sense, she says, possibly in combination with smartphone apps. “I think the smartphone and Amazon Echo are things you could do together. You could use the Freesat app as a companion device experience but also use it for content discovery,” she says.
Enabling voice directly via the set-top, on the other hand, means a new box and a new remote, pushing up the bill of materials for the user or the platform operator. However, using a box does have the advantage of giving control back to the service provider. “If you have an Amazon Echo, a YouView box and a Freesat box in different rooms, and give the Echo control of what gets played where, that could be challenging from a business perspective.”
Nevertheless, EE and YouView are focusing on Amazon Alexa for now. For Netgem’s Thevenot, Alexa is an obvious starting place, providing an agile API that enables the implementation of multiple features. Amazon’s Video Manager is designed to enable easy communication between its devices and the set-top box. The Amazon voice recognition system can also enable personalisation based on the profile of the user. Thevenot says that the Alexa APIs do not currently share user profiles with Netgem’s platform but that this will come within a year, enabling the system to present relevant recommendations based on whoever is speaking to the device.
However, the main attraction of deploying voice commands via a digital assistant – and Amazon in particular – is that these devices already have significant penetration.
“Our goal is to explore the real value of voice to our users and how it can integrate with their household. There have been previous attempts to do voice, but people are not using those direct input mechanisms. The Alexa or Google Home [platforms] are more prevalent and pervasive. That is where it has real value,” says YouView’s Henry. “I think the direct input device will have a place, but the main thing is to make it pervasive. The input mechanism and how you trigger that voice input are secondary issues. We haven’t ruled out using boxes or remote controls, and the other obvious device is mobile phones.”
YouView’s partners already have mobile apps to enable interaction with their on-demand players, all of which are present on YouView, and YouView has its own app, which enables functions such as remote recording.
“You could have Alexa in the living room and then also use your phone when you are on the move. Voice entry has been around on mobiles, and also on wearables, for some time and we have experimented with that,” says Henry. “We have put ourselves in a good position with the rollout of our next-generation app, which gives us a good foundation. Where it gets interesting is how to make content relevant and how to do a federated user interface, bringing in our content partners’ apps on our platforms. We want to keep the user journey seamless.”
Control of the experience
Digital assistants do have some technical disadvantages. The device may not be situated next to the viewer, so there may be a clash between the voice of the viewer and the TV soundtrack, for example. For this reason, the use of a microphone in the remote control could provide a better experience.
Ruwido’s Maier additionally argues that voice works best in combination with other means of inputting commands, and sets forth a vision of the ‘multimodal’ input device.
“Analogue functions will be carried out by mechanisms other than voice. We’ve spent a lot of effort on this,” says Maier. He argues that basic navigation functions – such as left or right on a grid, or surfing through a channel listing – are best carried out via touch and haptic feedback from a remote, for example.
“There is a huge interest in the market in organic haptic surfing and more contextual navigation in combination with profiling,” he says. “With our remote, if you pick it up, the system knows where you are in the home. If you talk, the system can converse with you.”
Maier says that voice recognition can be used not only for parental control but also to deliver recommendations based on personal profiles. But while search will be voice-based, channel surfing and other functions will remain based on touch. For all of these reasons, and because it is more complex to hook up third-party consumer devices to the TV, he believes that the remote control has a future.
“We are not concerned that the remote control will be replaced,” he says. “With a mobile you have to connect it with the box and get an SMS to confirm and so on. The mobile is a private device that you won’t use for public or shared applications. People like to have a remote control to talk with the system. They can press a button and know when it is closed down. We are convinced that the remote will become more important.”
While building voice functionality directly into the remote and set-top carries a cost to the operator, Maier believes this will be something pay TV operators will be happy to accept because the technology provides “a calling card” that ties the user to the service.
The main advantage for service providers of using the remote and set-top is that it enables them to stay in control of the TV experience. Of course, in the set-top box space, implementation of voice also requires upgrading the hardware in the home.
“Where we see voice being implemented is as part of a hardware renewal cycle,” says TiVo’s Dawes, who adds that the decision to deploy a 4K UHD box could provide an obvious point at which voice could be enabled in a new generation of devices.
The key to the appeal of voice control is that it could enable users to untether their active search for content from the current restrictions of the EPG, enabling them ideally to jump to the content they are looking for and to search across multiple content sources on their own terms. The popularity of voice is growing, particularly among younger age groups, and sources of content are proliferating, all of which augurs well for the future of voice search. But how voice search develops alongside the delivery of increasingly sophisticated personalised recommendations, whether it functions as a standalone mode of input or as a complement to other ways of navigating via a remote control, and where it sits in viewers’ overall priorities, remain to be worked out.