The R&D Tax Credit Aspects of Voice-Activated Software
“There have been more improvements in speech recognition over the past
three years than there have been over the past 30 years combined.”
This statement by Tim Tuttle, CEO of smart voice start-up Expect Labs,
points to
the fact that recent technological advancements are
revolutionizing the way humans and computers interact. The
present article will assess the rise of a new generation of
voice-activated applications as well as the wealth of
opportunities they bring about. It will also discuss how
R&D tax credits can help companies succeed in the emerging
voice-driven world.
The R&D Tax Credit
Enacted in 1981, the Federal Research and
Development (R&D) Tax Credit allows a credit of up to 13
percent of eligible spending for new and improved products and
processes. Qualified research must meet the following four
criteria:
- New or improved products,
processes, or software
- Technological in nature
- Elimination of uncertainty
- Process of experimentation
Eligible costs include
employee wages, cost of supplies, cost of testing, contract
research expenses, and costs associated with developing a
patent. On December 18, 2015, President Obama signed the
bill making the R&D Tax Credit permanent. Beginning
in 2016, the R&D credit can be used to offset the Alternative
Minimum Tax (AMT), and start-up businesses can utilize the credit
against payroll taxes.
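For a back-of-the-envelope sense of how these eligible costs translate into a credit, the hedged Python sketch below applies the up-to-13-percent rate cited above to hypothetical qualified research expenses. All dollar figures and function names are illustrative assumptions, not tax advice; the actual computation depends on the method elected.

```python
# Hypothetical illustration of the R&D credit math described above.
# The 13 percent rate is the upper bound cited in this article, and all
# dollar figures are assumptions for the sketch, not tax advice.

CREDIT_RATE = 0.13

def estimated_credit(wages, supplies, testing, contract_research):
    """Sum qualified research expenses (QREs) and apply the credit rate.

    Contract research is typically includable at 65 percent under
    IRC Section 41, which this sketch reflects.
    """
    qres = wages + supplies + testing + 0.65 * contract_research
    return CREDIT_RATE * qres

# Example: a small software firm's hypothetical eligible spending.
print(f"${estimated_credit(800_000, 50_000, 30_000, 100_000):,.0f}")
# -> $122,850 of potential credit on $945,000 of QREs
```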
Voice-Activated Technology
Voice and speech recognition can be
generally defined as “the ability to convert and decode
human voice into speech easily understood by a
computer.” In other words, it enables machines to
transform natural language into actionable data.
There is, however, an
important distinction between the two categories. While voice
recognition, or speaker recognition, allows for identity verification
and therefore focuses on who is speaking, speech recognition is
used for issuing operating commands, with the focus on what is
being said.
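To make the distinction concrete, here is a toy Python sketch contrasting the two tasks; the names, voiceprint bytes, and stubbed transcription are illustrative assumptions, not a real speech API.

```python
# Toy illustration of the distinction drawn above; everything here is
# hypothetical, not an actual speech library.

ENROLLED_VOICEPRINTS = {b"\x01\x02": "alice", b"\x03\x04": "bob"}

def speaker_recognition(voiceprint):
    """WHO is speaking: match a voiceprint against enrolled users."""
    return ENROLLED_VOICEPRINTS.get(voiceprint, "unknown")

def speech_recognition(audio):
    """WHAT is being said: transcribe audio to text (stubbed here)."""
    return "turn on the lights"

# Authentication cares about identity; command handling cares about content.
print(speaker_recognition(b"\x01\x02"))  # -> alice
print(speech_recognition(b"..."))        # -> turn on the lights
```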
Recent leaps in
voice-driven technology were made possible by machine learning
capabilities and cognitive computing technology along with
unprecedented access to massive amounts of data. Even though
advancements so far are beyond impressive, projections for the
future are even more exciting.
Data from artificial
intelligence company MindMeld shows that while voice search
traffic was negligible before 2015, it currently represents 10
percent of all search traffic. By 2020, 200 billion voice
searches will take place every month.
According to a recent
report by research firm Tractica, the voice and speech
recognition market will experience a 40 percent compound
annual growth rate between 2015 and 2024, rising from $249
million to $5.1 billion. The report also predicts that
licenses for speech and voice recognition software will grow
from 49 million in 2015 to 565.8 million in 2024.
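As a sanity check, a 40 percent compound annual growth rate applied to $249 million over the nine years from 2015 to 2024 does land near $5.1 billion; the short sketch below reproduces the arithmetic.

```python
# Reproduce the Tractica projection cited above: $249M growing at a
# 40 percent compound annual rate over the nine years from 2015 to 2024.
start_m, cagr, years = 249.0, 0.40, 2024 - 2015
end_m = start_m * (1 + cagr) ** years
print(f"${end_m / 1000:.1f}B")  # -> $5.1B, matching the report
```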
Promising Areas
There are myriad potential
applications for voice-activated technology. Consumer-facing
markets have significant potential for growth, particularly
mobile device authentication, control of wearable devices, and
the Internet of Things. Healthcare, call centers, and
government and enterprise IT are also seen as promising areas.
Mobile Devices and Applications
Mobile devices have
represented a breakthrough for voice-activated technology. By
introducing built-in speech recognition capabilities, Google,
Apple, and Microsoft’s mobile operating systems have shed
light on the actual usefulness, convenience, and flexibility
of such functionalities, which are increasingly central to the
mobile experience. The numbers for Apple’s Siri are a
staggering example: over 500 million people globally have
access to the virtual assistant, more than 200 million use it
monthly, and 100 million use it daily.
Many experts consider
conversational interfaces the future of mobile applications.
While conventional applications offer a limited number of
interactions, which are restricted to the possibilities
featured on a screen, voice interfaces offer an open-ended model
through which users can ask questions and issue voice commands
that best suit their needs.
Biometrics
Similar to a fingerprint
or iris, voice is unique to each individual and can,
therefore, be used as an authentication method. The voice
biometric, a mathematical representation of the inimitable
sound, pattern, and rhythm of an individual’s voice, is
extremely difficult to forge. By measuring qualities ranging from
dialect and speaking style to pitch, spectral magnitudes, and
formant frequencies, innovative voice authentication technology
offers exceptional reliability, accurately pinpointing
attempts to impersonate a voice or to provide voice
recordings.
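At its core, voice authentication compares a feature vector extracted from live audio against a voiceprint stored at enrollment. The hedged sketch below illustrates that matching step; real systems use far richer embeddings, and the toy feature values and 0.95 threshold are assumptions for the example.

```python
import numpy as np

# Hedged sketch of voiceprint matching: compare a live feature vector
# against the one stored at enrollment. Feature values and the accept
# threshold are illustrative assumptions, not a production system.

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for measured qualities such as pitch,
# spectral magnitudes, and formant frequencies.
enrolled = np.array([118.0, 0.42, 730.0, 1090.0, 2440.0])  # stored voiceprint
claimed = np.array([121.0, 0.40, 742.0, 1100.0, 2425.0])   # live sample

ACCEPT_THRESHOLD = 0.95  # assumed operating point
score = cosine_similarity(enrolled, claimed)
print(f"similarity={score:.4f}:",
      "accept" if score >= ACCEPT_THRESHOLD else "reject")
```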
The rise in identity
theft as well as the increasing demand for rapid and
secure access to information has steered companies towards
voice biometrics. According to research firm Tractica, some
vendors are building voice print databases for call centers,
specially designed to identify repeat callers or known
fraudsters.
Banking is a
particularly promising market for voice biometrics. For
instance, Capital One recently partnered with Amazon to enable
Amazon Echo users to pay bills and access account information
through voice commands. After suffering an online cyberattack,
HSBC will implement voice recognition services in both
mobile apps and telephone banking. The rollout is expected to
reach 15 million customers by the summer. Barclays already
uses voice authentication for 300,000 of its wealthiest
clients in the UK and plans to expand it to 12 million
retail-banking customers. The experience has been extremely
successful, with the time taken to verify an identity falling
from 1.5 minutes to less than 10 seconds.
Smart Cars and the IoT
The Internet of Things
is undoubtedly a major field for speech recognition. The
multiplication of connected, smart objects creates an
unprecedented need for interaction between humans and machines.
In this context, voice has gained ground as a privileged
avenue of communication. Even though home automation is the
most striking example of how voice commands can facilitate the
control of IoT systems, smart cars are also a promising area
for voice-driven innovation.
Research firm TechNavio
predicted a compound annual growth rate of 10.59 percent for
the global automotive voice recognition market between 2013
and 2018. An interesting example is Bellevue,
Washington-based VoiceBox Technologies, which recently
unveiled an embedded automatic speech recognition product for
advanced automotive applications. Combining deep neural
networks and natural language understanding technology, the
solution is capable of processing complex, contextual
conversations. VoiceBox’s flexible speech recognition system
also enables enhanced safety since drivers do not have to
‘think about’ how to structure a command and are thus less
distracted.
A Voice-Driven Smart Home Strategy
Over recent years, Amazon has established
itself as a major player in the speech recognition market.
This is largely due to Amazon Echo, a smart speaker equipped
with always-on microphones that are capable of receiving voice
commands from anywhere in the room. Behind the wonders of the
cylinder-shaped Echo is Alexa, Amazon’s speech recognition
system.
Similar to other virtual
assistants, Alexa gives hands-free access to a variety of
functionalities, such as weather, calendar, and search. The
difference is that, thanks to Echo’s seven embedded
microphones and constant power supply, Alexa is always
listening, no matter where you are in the room and with no
buttons pressed.
Amazon Echo is
experiencing unprecedented demand. In many instances, it has
become the center of Amazon’s smart home strategy. Echo works
as a control hub that connects various third-party home
automation applications. Examples of compatible devices
include smart lighting solutions, such as Philips Hue, Wink,
and Samsung SmartThings, as well as WeMo’s and TP-Link’s
smart outlets and switches. Compatibility with Honeywell and
Nest thermostats should be introduced in the near future.
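A control hub of this kind essentially routes each parsed voice command to the right third-party device integration. The sketch below shows that dispatch pattern in miniature; the device names and handler functions are hypothetical, not Amazon’s actual Smart Home API.

```python
# Hedged sketch of the hub pattern described above: one voice front end
# dispatching parsed commands to device handlers. All names here are
# hypothetical, not Amazon's actual API.

def set_light(on):
    print("lights", "on" if on else "off")

def set_thermostat(temp_f):
    print(f"thermostat set to {temp_f}F")

HANDLERS = {
    "lights": lambda args: set_light(args == "on"),
    "thermostat": lambda args: set_thermostat(int(args)),
}

def dispatch(utterance):
    """Route a command like 'lights on' or 'thermostat 68' to a handler."""
    device, _, args = utterance.partition(" ")
    HANDLERS[device](args)

dispatch("lights on")      # -> lights on
dispatch("thermostat 68")  # -> thermostat set to 68F
```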
Amazon recently unveiled
two additions to its Alexa-based lineup: Amazon Tap, a
portable version of Echo, and Echo Dot, a compact extension of
Alexa that augments existing Echo installations. This range of
speech recognition products undoubtedly points to the fact
that Amazon’s voice-activated platform is at the heart of its
strategy moving forward.
Conversation as a Platform
An Internet or web bot is a software
application that performs online, automated tasks. Microsoft
is combining bots and speech recognition as the basis of its
“conversation as a platform” strategy. The company recently
unveiled a bot development framework at the Build 2016
developer conference, where CEO Satya Nadella also underlined
the importance of speech recognition. In his words, "human
language is the new user interface layer."
Microsoft aims to offer
developers the necessary code and machine-learning tools to
build intelligent bots and link them to voice-driven digital
assistants, like Cortana. Simplified developer experience and
the possibility of establishing natural language communication
with users are the focal points of the company’s innovative
strategy. The vision behind “conversation as a platform” is to
make bots that understand natural language the new way of
using computers. In Nadella’s words, “bots are like new
applications that you can converse with.”
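To give a feel for what “conversing with an application” means in practice, here is a minimal rule-based bot loop. This is a generic sketch, not Microsoft’s Bot Framework, and the intents and responses are assumptions for the example.

```python
# Minimal rule-based bot illustrating "conversation as a platform";
# a generic sketch, not the Microsoft Bot Framework. The intents and
# canned responses below are assumptions.

INTENTS = {
    "weather": "It looks sunny today.",
    "calendar": "Your next meeting is at 3 pm.",
}

def reply(utterance):
    """Match the utterance against known intents by keyword."""
    for keyword, response in INTENTS.items():
        if keyword in utterance.lower():
            return response
    return "Sorry, I didn't catch that."

print(reply("What's the weather like?"))    # -> It looks sunny today.
print(reply("Check my calendar, please."))  # -> Your next meeting is at 3 pm.
```

Real frameworks add natural language understanding in place of keyword matching, but the conversational loop, mapping utterances to intents and intents to actions, is the same.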
Crossing the Offline Frontier
Though offering significant speed and
accuracy, Android’s speech recognition system, like many others,
has one weakness: it depends on an Internet connection. It
works by recording the user’s voice and transmitting it to a
server, where it is analyzed, converted into text and sent
back to the device. With limited vocabulary, the offline
system currently available is rudimentary, slow, and less
powerful than the online version.
Google is working to
change this scenario. The search giant recently unveiled a
research paper on an innovative system that is seven times
faster than real time and works without a data connection. The
company describes it as “a large vocabulary speech recognition
system that is accurate, has low latency, and yet has a small
enough memory and computational footprint to run faster than
real-time on a Nexus 5 Android smartphone."
The embedded system uses
various machine-learning techniques and features an advanced
acoustic model, based on approximately 3 million anonymous
utterances extracted from voice search traffic (approximately
2,000 hours). In order to achieve enhanced accuracy, it was
exposed to noise and reverberation using samples extracted
from YouTube videos and environmental recordings.
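Mixing noise into clean utterances at a controlled signal-to-noise ratio is the standard augmentation technique behind this kind of training regime. The sketch below shows the basic math; the 10 dB target and the synthetic signals are assumptions for the example.

```python
import numpy as np

# Standard noise-augmentation technique consistent with the training
# regime described above: mix a noise sample into a clean utterance at
# a chosen signal-to-noise ratio. The 10 dB target is an assumption.

def add_noise(clean, noise, snr_db):
    """Scale `noise` so the mixture has approximately `snr_db` dB SNR."""
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_noise_power / noise_power)
    return clean + scaled_noise

rng = np.random.default_rng(0)
utterance = np.sin(np.linspace(0, 100, 16000))  # stand-in for clean speech
noise = rng.normal(size=16000)                  # stand-in for room/video noise
noisy = add_noise(utterance, noise, snr_db=10.0)
```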
The application is
remarkably small, weighing in at a little over 20 MB. Besides
multiple compression techniques, the system incorporates both
dictation and voice commands into a single module, an
innovative configuration that enables a reduced computational
footprint.
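One common compression technique of the kind alluded to above is uniform 8-bit quantization, which stores each 32-bit float weight in a single byte for roughly a 4x size reduction. Whether Google used exactly this scheme is an assumption of the sketch below.

```python
import numpy as np

# Uniform 8-bit quantization: map float weights onto 256 evenly spaced
# levels. One plausible compression technique, not necessarily the one
# Google used.

def quantize(weights):
    lo, hi = weights.min(), weights.max()
    scale = (hi - lo) / 255.0
    q = np.round((weights - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize(q, lo, scale):
    return q.astype(np.float32) * scale + lo

w = np.random.default_rng(1).normal(size=1000).astype(np.float32)
q, lo, scale = quantize(w)
err = np.abs(dequantize(q, lo, scale) - w).max()
print(f"bytes: {w.nbytes} -> {q.nbytes}, max error {err:.4f}")  # 4000 -> 1000
```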
Google’s offline system
has already been tested on a Nexus 5, and experts believe that
improvements will be incorporated in the not-so-distant
future. Crossing the offline frontier is, therefore, an
important trend in voice-activated software innovation.
Start-Up Innovation
The wealth of opportunities surrounding
speech and voice recognition has triggered major activity in the
start-up scene. Palo Alto, California-based Wit.ai, for
instance, helps developers build a speech interface for their
applications or devices. Acquired by Facebook in January 2015,
Wit.ai’s open platform is at the forefront of intelligent
language processing. With over 10,000 users, its software is
constantly learning and expanding human language capabilities.
Founded by developers of
Apple’s Siri, San Jose, California-based Viv Labs is the
creator of Viv, a personal artificial intelligence assistant
that can not only understand voice commands but also automate
users’ requests. Intended to become a ubiquitous brainpower
behind apps, devices, and machines, Viv’s remote artificial
intelligence service aims to enable developers “to create an
intelligent, conversational interface to anything.” The idea
is to develop a type of artificial intelligence app store upon
which third-party developers can build a personalized layer.
Viv is designed to go
further than other virtual assistants and overcome their
limitations. For instance, while Siri is only capable of
performing tasks that are explicitly programmed, Viv can teach
itself, combining the users’ personal preferences and its web
of connections to perform virtually any task.
Also focused on powering
a new generation of voice-driven applications, San Francisco,
California-based MindMeld is yet another example of innovation
in combining speech recognition and artificial intelligence
technology. Founded in 2011, the company has developed a
platform that provides infrastructure and customization
options for users to create unique and intelligent voice
interfaces for their apps and devices. Named one of MIT
Technology Review’s 50 smartest companies of 2014, MindMeld is
currently working
with major online services, including music streaming service
Spotify.
Start-ups working with
voice-activated technologies can greatly benefit from R&D
tax credits, particularly due to the burden of payroll costs,
which are typically their largest expense. The R&D Credit
has now been expanded to apply credits in excess of income
taxes to FICA tax liability. Notably, even if a company
is unprofitable and has no tax liability, the credit can be
taken against payroll taxes for start-up businesses with less
than $5,000,000 in gross receipts. This offset is capped
at $250,000 per year, over a five-year period.
As such, the R&D Tax
Credit can now be taken directly against FICA payroll taxes
and does not require general income tax liability for the
company to utilize the credit amounts. A company can therefore
realize significant tax benefits even without generating a
profit. Most importantly, this credit directly offsets the
payroll costs incurred by the start-up.
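A worked illustration of the payroll offset described above: a pre-profit start-up with under $5,000,000 in gross receipts can apply the credit against FICA liability, capped at $250,000 per year for up to five years. The dollar amounts in the sketch are hypothetical.

```python
# Worked illustration of the payroll-tax offset rules described above;
# the example amounts are hypothetical, not tax advice.

PAYROLL_OFFSET_CAP = 250_000          # annual cap, up to five years
GROSS_RECEIPTS_LIMIT = 5_000_000      # eligibility threshold

def payroll_offset(credit, gross_receipts):
    """How much of the credit a qualifying start-up can take against FICA."""
    if gross_receipts >= GROSS_RECEIPTS_LIMIT:
        return 0.0  # not an eligible small business for the offset
    return min(credit, PAYROLL_OFFSET_CAP)

# A start-up with $310,000 of credit and $2M in receipts offsets
# $250,000 of FICA taxes this year; the remainder carries forward.
print(payroll_offset(310_000, 2_000_000))  # -> 250000
```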
University Research
Numerous U.S. academic institutions are
also fueling natural language processing innovation. The
following examples shed light on current trends in
voice-activated technology R&D.
Carnegie Mellon University
As a result of over 20
years of work, researchers at Carnegie Mellon University have
developed CMUSphinx, an open source toolkit for speech
recognition. Designed specifically for low-resource platforms,
it uses state-of-the-art algorithms that allow for highly
efficient speech recognition. In addition to speech decoders,
Sphinx encompasses software for acoustic model training,
language model compilation, and a public domain pronunciation
dictionary. Sphinx works as the basis for a wealth of speech
recognition research. Examples include OpenEars, which
supports not only English, but also Spanish, Mandarin, French,
German, and Dutch, as well as ILA, a fully customizable,
teachable voice assistant.
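For a feel of how the toolkit is used in practice, CMUSphinx’s Python bindings (installable as the pocketsphinx package) have historically exposed a simple live-decoding loop. Exact APIs vary by version and a microphone plus bundled acoustic models are required, so treat this as a hedged sketch.

```python
# Hedged sketch of live decoding with CMUSphinx's Python bindings
# (pip install pocketsphinx); the LiveSpeech helper shown here comes
# from the 0.1.x-era bindings and may differ in other versions.
from pocketsphinx import LiveSpeech

# Continuously decode microphone audio with the default US English
# acoustic model and print each recognized phrase.
for phrase in LiveSpeech():
    print(phrase)
```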
University of Texas at Dallas
Researchers from the
Erik Jonsson School of Engineering and Computer Science at UT
Dallas are currently developing speech recognition tools
capable of understanding human emotion. The idea is to create
innovative algorithms that identify spontaneous behaviors in
speech and interpret the underlying process by which emotions
are externalized. Associate professor of electrical
engineering Dr. Carlos Busso anticipates the use of
emotion-aware technology in several areas, including
security and defense, next-generation advanced user interfaces,
health behavior informatics, and education.
Johns Hopkins University
Established in 1992, the Johns Hopkins Center for Language and
Speech Processing (CLSP) is an interdisciplinary initiative
aimed at advancing knowledge and technologies for language
processing. Examples of ongoing research include automatic
speech recognition in reverberant environments, which focuses
on creating accurate systems capable of working in noisy
environments with varying acoustic conditions and microphone
configurations.
Massachusetts Institute of Technology
Mobile applications are a highly promising field for speech
recognition technology. An interesting example is the work of
researchers from MIT’s Computer Science and Artificial
Intelligence Laboratory (CSAIL), who have applied advanced
language-processing technology to develop a nutrition-logging
system. Though a proven way to lose weight, logging
nutritional information at every meal is a time-consuming and
tedious task. Aimed at helping people who struggle with
obesity, the innovative speech-controlled application allows
users to verbally describe the contents of a meal. It then
reviews the description and automatically retrieves pertinent
nutritional data from the U.S. Department of Agriculture
database.
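The pipeline behind such a system is easy to picture: a transcribed meal description is scanned for known foods, which are then looked up in a nutrient table. The sketch below illustrates the idea; the tiny table and calorie values are assumptions for the example, not the actual USDA database or MIT’s system.

```python
# Hedged sketch of the pipeline described above: scan a transcript for
# known foods and total their nutrients. The table is illustrative,
# not the USDA database or MIT's actual system.

NUTRIENT_TABLE = {  # calories per typical serving (assumed values)
    "oatmeal": 150,
    "banana": 105,
    "coffee": 2,
}

def log_meal(transcript):
    """Extract known foods from a spoken description and total calories."""
    words = transcript.lower().replace(",", "").split()
    foods = [w for w in words if w in NUTRIENT_TABLE]
    for food in foods:
        print(f"logged {food}: {NUTRIENT_TABLE[food]} kcal")
    return sum(NUTRIENT_TABLE[f] for f in foods)

print(log_meal("I had oatmeal, a banana, and coffee"))  # -> 257
```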
Conclusion
Voice is bound to become the premier means
of communication between humans and machines. Recent advancements
in voice and speech recognition systems are only a glimpse
into the wealth of opportunities that lie ahead. Director of
the Johns Hopkins Center for Language and Speech Processing
Hynek Hermansky addresses this promising future by saying:
“Just as the early successes of the Wright brothers in flight
by machines heavier than air immediately spurred a frenzy of
research in relevant aeronautic technologies that gave rise to
the current air travel industry, the early successful
deployments of language technologies only suggest enormous
possibilities of truly human-like language interactions with
machines.” R&D tax credits can play a strategic role in
helping innovative companies succeed in the emerging
voice-driven world.