Microsoft has just announced its Custom Speech Service platform, which will be part of Microsoft Cognitive Services, a collection of 25 tools for machine learning and IoT applications. The new tool is designed to assist users in their daily tasks and platforms by overcoming speech recognition barriers such as speaking style, vocabulary and background noise. Microsoft believes the new speech service platform can also be an asset to VR technology.
For Mejia, who spent recent years working as a creative director in the gaming industry, the buzz from new experiences had fizzled – a doubling of computing power no longer meant a doubling of gaming excitement. “What is the next thing?” he asked. “What is the technology leap that is going to allow for new experiences that will wow the gamers?”
That questioning led him to a demonstration of the latest generation of virtual reality technology. He strapped on the headgear and was taken for a wild ride on a roller coaster. The adrenaline rush returned. The experience, he said, was visceral.
“You believe that things are real when you’re in the virtual world,” he said. “What would happen if we put a person in front of you? Would you try to talk to him?”
The idea blossomed into a business plan. Mejia founded his own company, Human Interact, to develop virtual reality storytelling experiences. The company’s premier title, Starship Commander, gives players control over the narrative as they zip around space at faster-than-light speed and speak with virtual characters at every turn.
To achieve realistic, fast-paced action, Mejia and his colleagues required accurate and responsive speech recognition.
“You’ve got to make it so that anytime anybody says anything, [the speech recognition engine] is going to understand them and run them down the right path in the script,” he explained. “And that,” he added, “is the magic of Microsoft Cognitive Services.”
The Custom Speech Service platform essentially lets developers achieve what Austin Wilson did with Alexa for the VR game Elite Dangerous, where he used Amazon’s voice assistant to help him navigate the galaxy. Above all, it pushes toward the immersive experience that virtual reality was always intended to deliver. The Custom Speech Service has two layers.
The first layer and tool is called the Language Understanding Intelligent Service (LUIS). LUIS handles the natural language processing side, mapping what a user says to what the user means, which ultimately cuts down on errors when interpreting voice commands.
“There are a lot of different ways to say ‘let’s go.’ There’s ‘let’s go,’ ‘autopilot,’ ‘get me out of here,’ ‘let’s go faster than light,’ ‘engage the hyper-drive.’ These are all different things people say to get going in our game, especially in the heat of the moment, because sometimes you don’t have very much time before something bad happens.”
The Language Understanding Intelligent Service lets developers train a machine-learned classifier to understand the intent behind natural language: they upload a subset of the kinds of things users might say and tag each of those utterances with an intention.
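To make the idea concrete, here is a minimal, hypothetical sketch of what tagging utterances to intents looks like. The intent names and example phrases (beyond those quoted above) are illustrative, and the toy string-similarity classifier stands in for the statistical model a service like LUIS would actually train – it is not the LUIS API.

```python
# Toy illustration of intent classification: example utterances are tagged
# with an intent, and new input is matched against them. The intent names
# and the "HailShip" examples are assumptions for illustration only.
from difflib import SequenceMatcher

TRAINING_DATA = {
    "EngageHyperdrive": [
        "let's go",
        "autopilot",
        "get me out of here",
        "let's go faster than light",
        "engage the hyper-drive",
    ],
    "HailShip": [
        "open a channel",
        "hail the ship",
        "contact them",
    ],
}

def classify(utterance: str) -> str:
    """Return the intent whose tagged examples best match the input.

    A real service trains a statistical classifier; this sketch just
    scores string similarity against each tagged example utterance.
    """
    best_intent, best_score = "None", 0.0
    for intent, examples in TRAINING_DATA.items():
        for example in examples:
            score = SequenceMatcher(None, utterance.lower(), example).ratio()
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent
```

The point of the design is the one Mejia describes: many surface phrasings funnel into one intent, so the game script only has to branch on a handful of intents rather than every possible sentence.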
The second layer is called the Custom Recognition Intelligent Service (CRIS). It lets developers upload sample audio files along with their transcriptions, so the recognizer can better understand specific voice commands even in difficult acoustic conditions. CRIS “provides companies with the ability to deploy customized speech recognition.”
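The raw material for that kind of acoustic customization is simply audio paired with verbatim transcripts. The sketch below shows one plausible shape for such a dataset – the file names and the JSON manifest format are assumptions for illustration, not the actual CRIS upload format.

```python
# Illustrative sketch of acoustic-adaptation training data: sample audio
# recorded in the target environment (e.g. with background noise) paired
# with exact transcriptions. File names and format are hypothetical.
import json

samples = [
    {"audio": "cockpit_noise_01.wav", "transcript": "engage the hyper-drive"},
    {"audio": "cockpit_noise_02.wav", "transcript": "get me out of here"},
]

def build_manifest(samples):
    """Serialize audio/transcript pairs into a JSON manifest string.

    Transcripts are checked to be normalized (lowercase, no stray
    whitespace) so they match the audio word for word.
    """
    for s in samples:
        assert s["transcript"] == s["transcript"].strip().lower(), \
            "transcripts should be normalized before upload"
    return json.dumps(samples, indent=2)
```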
Together, the two platforms (both part of Microsoft Cognitive Services) allow VR experiences to be enhanced with voice commands and voice recognition, regardless of a user’s tone and vocabulary. Look for VR companies to further improve their experiences with the Custom Speech Service and the Language Understanding Intelligent Service.