Smart objects are pervading the home and becoming more intelligent by the day.

The Internet of Eyes and Ears comprises the next wave of everyday objects that integrate visual recognition technology and voice technology. As costs decline and homes become increasingly smart, the Internet of Things is undergoing a revolution and, for the first time, becoming truly mainstream.

The technologies that fit under this umbrella term seem to advance and multiply every day: facial recognition, image recognition, gesture recognition, speech recognition, natural language processing and emotion recognition are just some of the major technologies we are seeing develop.

Face ID on iPhone X

The rise of voice tech

In voice technology, we have seen Alexa penetrate the market to become a household name, with approximately 22 million devices sold in the United States in 2017, according to Forrester Research estimates. Google Home is on its way to replicating Amazon’s success, while Apple is hoping its new HomePod will get a slice of the pie too. Audio companies are also launching their own voice technology products. Sonos recently launched a platform-agnostic smart speaker that can integrate multiple digital assistants. Inspired by the growing market and increasing demand for voice-activated products, Bang & Olufsen, Anker, JBL and LG all revealed Google Assistant-powered smart speakers at CES 2018.


In the past few months, we have seen voice move outside of the living room and kitchen and integrate into other rooms in the house. Kohler, with its voice-activated bathroom, is one of the brands leading this shift. A smart mirror that integrates Amazon Alexa acts as the control center, and the taps, bathtub and toilet can all be controlled by voice. While this might sound excessive to some, research conducted for our “Speak Easy” report found that roughly two thirds of global smartphone users are interested in the prospect of voice-activated televisions (69%) and light switches (66%), while almost half (45%) are interested in the idea of chatting to their fridges, showing the desire to integrate voice in diverse products and settings around the home.

Kohler Konnect

The reason behind this preference for voice may relate to our innate need to communicate. “For most of our cultural evolution as a species, humans have transmitted knowledge and ideas from one generation to another through oral tradition,” says Nick Ryan, composer, sound designer, artist and audio specialist. “The voice is therefore perhaps the most innate and intuitive way for us to communicate.”

Smooth operator

Simply put, voice is natural and easy, saving consumers time and effort. In our research we found that the main reason for using voice was efficiency, with our respondents describing the tech as “convenient” and “simple to use.” In fact, 76% of all regular voice tech users say that “using voice technology feels really natural now and I don’t even think about it.” Duncan Anderson, CEO and cofounder of Humanise.AI and former CTO of IBM Watson Europe, explains, “It’s about being super-helpful, super-efficient, not getting in the way, building something that allows me to get the job done with minimal fuss.”

With developments in natural language processing, and with speech-recognition error rates now at human parity (5%), we’re seeing voice become more than just a time-efficient shortcut. Voice assistants are starting to play an advisory role, acting as a digital butler and seemingly forming relationships with consumers. Globally, almost half (43%) of regular voice technology users say that they love their voice assistant so much they wish it were a real person.

Brands are going to have to make sure they craft their own voices and personalities to build a deeper emotional connection with consumers. “Companies will now need to think about the actual voice of their brand,” says Martin Reddy, cofounder and CTO of the PullString voice technology development company. “They have to think about how their brand sounds, and the words and language that their brand uses when communicating with customers—the personality of their brand as it’s presented to users.”

Voice tech goes global

It’s important to note the global nature of this phenomenon. It’s not just North American and European markets embracing voice: our research shows that Asian markets are also welcoming this tech. Japan is at the forefront, with the likes of Musio, Gatebox and Line’s Clova. Similarly, in China, consumers are very familiar with voice technology and often find it easier than typing Chinese phonetic characters. In fact, a majority of our Chinese respondents believe voice technology will encourage communication: 78% of smartphone users think voice technology will help people interact more with each other, as they won’t always be looking at a screen. Globally, this figure decreases to 53%.


Major brands are recognizing the importance of catering to different international markets. Siri now speaks over 20 languages, while Google Assistant will be able to speak 30 languages by the end of 2018, covering 95% of Android users. Not only this, but Google Assistant will be able to recognize the language you’re speaking and switch to it automatically, meaning that multilingual users won’t have to stick to one language.

Visual recognition tech

Voice is just the beginning. We also live in an image-driven culture and the focus on the visual is growing stronger. We are seeing the rise of visual recognition technology and advanced home products that integrate smart cameras. “All inanimate objects will have at least one camera,” says Evan Nisselson, general partner at visual technology venture fund LDV Capital. “And the key is not just that it has a camera, but what computer vision and artificial intelligence can analyze out of that visual data.”

Pinterest Lens

Following impressive developments in image recognition, Pinterest launched Lens, a feature that has been hailed as a “Shazam for objects.” All users have to do is point their phone camera at an item and the app will analyze the image and return relevant results. This could have uses across various categories. Consumers could shop online for products instantly by simply hovering their phone over an item. In the food industry, there are already several tools, such as Google’s Im2Calories and Pic2Recipe, that can help users determine the calories in a meal or the recipe behind a dish just from a photo.

Unlocking facial recognition

One of the most useful and prominent visual technologies is facial recognition, which is being harnessed by many brands for security purposes. Face ID on the iPhone X is a popular example. The device recognizes a user’s face and unlocks the phone if it matches the saved profile. Nest also announced Nest Cam IQ, which differentiates between the faces of family members and strangers in the home.

Similarly, the automotive industry is starting to integrate facial recognition. Chinese startup Byton announced an electric, self-driving concept car at CES that can recognize passengers and drivers and automatically unlock doors when it detects them. The seat position can also be adjusted when the driver is recognized.

Byton self-driving car

As accuracy improves, facial recognition has the potential to revolutionize security and a range of other areas. “In the real world, most of the commercial facial recognition algorithms will only detect the front-row people in a crowd. The partial faces won’t be detected; the ones with headgear, medical masks, large caps, scarves aren’t being detected. So, how do you solve it? That’s something we have solved,” says Marios Savvides, founder and director of the Biometrics Center at Carnegie Mellon University and research professor in the university’s electrical and computer engineering department. His research focuses on making facial recognition as robust, unconstrained and bulletproof as possible, finding ways around the limitations of the tech.

Privacy concerns

Nevertheless, there is definitely a stigma when it comes to safety and privacy issues. “I think Hollywood has done a great job of scaring everyone from biometrics. You get fingerprinted and you assign it with criminal activity. Minority Report, all these movies, people are worried that they’ll be tracked. Is that possible? Yes, of course,” says Savvides. “Any technology can be used for good or bad. I think there is so much good this can do. It can be used to find missing children, find criminals. There’s so much it can do to make society safer.”

The future

We are also seeing the rise of products that combine the Internet of Eyes and the Internet of Ears. Amazon’s Echo Look, launched in April 2017, was the first smart speaker to feature a built-in camera. Its Style Check feature combines machine learning with advice from fashion specialists to offer opinions on a user’s outfit. The Echo Spot, which followed in January 2018, features a screen and offers everything available via the Echo Dot, but also integrates a camera and allows users to make video calls. The Lenovo Smart Display, which integrates Google Assistant, also has a screen to combine voice and visual tech.

Echo Look, Model
Amazon's Echo Look
Echo Look, App

Advancements in artificial intelligence mean that cameras and speakers can recognize consumers, understand and respond to them, learn their habits and even measure their emotions. This has important implications for brands and advertisers. All of these technologies promise to provide brands with richer, deeper engagement with consumers. As long as security concerns are addressed, brands can use such technologies to build engagement and add genuine value to the customer experience.

For more on the future of voice tech, download our “Speak Easy” trend report.