The R&D Tax Credit Aspects of AI Vision Technology
There is no doubt about it: it is an increasingly
visual world. In a time when everyone has a camera in their pocket,
access to visual data has reached unprecedented levels. Recent
technological developments promise to use the wealth of visual
information currently available to enable groundbreaking applications.
Deep-learning techniques are revolutionizing image processing as
machine vision moves beyond a rules-based, linear approach and into the
new era of artificial intelligence. The present article will discuss
the revolutionary power of artificially intelligent vision systems,
highlighting current trends and challenges ahead. It will also present
the R&D tax credit opportunity available to support companies
engaged in visual intelligence innovation.
The Research & Development Tax Credit
Enacted in 1981, the now permanent Federal Research
and Development (R&D) Tax Credit allows a credit that typically
ranges from 4%-7% of eligible spending for new and improved products
and processes. Qualified research must meet the following four criteria:
- Must be technological in nature
- Must be a component of the taxpayer's business
- Must represent R&D in the
experimental sense and generally includes all such costs related to the
development or improvement of a product or process
- Must eliminate uncertainty
through a process of experimentation that considers one or more
alternatives
Eligible costs include U.S. employee wages, cost of
supplies consumed in the R&D process, cost of pre-production
testing, U.S. contract research expenses, and certain costs associated
with developing a patent.
On December 18, 2015, President Obama signed the
PATH Act, making the R&D Tax Credit permanent. Beginning in 2016,
the R&D credit can be used to offset Alternative Minimum Tax (AMT) for companies with revenue below $50 million and, for the first time,
pre-profitable and pre-revenue startup businesses can utilize the
credit against $250,000 per year in payroll taxes.
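As a rough, purely illustrative sketch of the arithmetic described above (the rates, the $2 million spend, and the function names below are hypothetical, and the actual credit depends on the calculation method used):

```python
def estimated_credit_range(eligible_spend, low_rate=0.04, high_rate=0.07):
    # Rough range implied by the typical 4%-7% figure cited above.
    return eligible_spend * low_rate, eligible_spend * high_rate

def payroll_tax_offset(credit, annual_cap=250_000):
    # Portion of the credit a qualifying pre-revenue startup could apply
    # against payroll taxes in a year, capped at $250,000 under the PATH Act.
    return min(credit, annual_cap)

low, high = estimated_credit_range(2_000_000)  # hypothetical $2M of eligible R&D spend
print(low, high)                               # 80000.0 140000.0
print(payroll_tax_offset(high))                # 140000.0
```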
45 Billion Digital Eyes
An August 2017 study by LDV Capital predicts that
the number of cameras in the world will at least triple by 2022, adding
up to a staggering 45 billion. This massive surge in vision-based technology will come from the emergence of new “camera-hungry” products, such as autonomous cars and augmented reality glasses, as well as from the addition of new functionalities to already widespread devices. For instance, LDV predicts that, in five years, smartphones could have as many as thirteen cameras, allowing for 360-degree, 3D video capture as well as augmented reality imagery. Smart home products and security systems will also contribute to the growing number of cameras, a trend made possible by a steep drop in unit prices.
Vision-based artificial intelligence (AI) will
undoubtedly be a major trend in the new era of all-seeing digital eyes.
Access to massive visual data combined with groundbreaking
machine-learning techniques will allow algorithms to learn and evolve,
becoming the basis of a growing number of innovative AI services.
Neural Networks and Beyond
For years, images were just too complex for
algorithms to reliably work on. The uniqueness of millions of pixels
combined into singular patterns seemed to be an unsolvable riddle,
which was beyond the capacity of computer-vision technology. However,
improvements in processing power combined with access to large amounts
of visual data allowed for the emergence of deep-learning techniques,
which inaugurated a new era in image processing.
Inspired by our brains’ interconnected neurons, the
mathematical functions of so-called deep neural networks perform
remarkably well when working with complex images. The algorithms are
able to learn discrete tasks by spotting patterns in large sets of
data. For instance, by analyzing thousands of dog photos, they learn to
recognize a dog. The University of Washington's MegaFace Project is an
interesting example of recent advancements in neural network image
processing. When asked to match two images of the same person among 1
million face images, the system achieved 75 percent accuracy in
first-time guesses and more than 90 percent when 10 options were
allowed.
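As an illustration of the pattern-spotting described above, the sketch below trains a deliberately small convolutional network on a hypothetical folder of labeled photos (dog vs. not dog). It is a minimal PyTorch example, not the architecture used by any particular system:

```python
import torch
import torch.nn as nn
from torchvision import datasets, transforms

# Hypothetical folder layout: photos/dog/*.jpg and photos/not_dog/*.jpg
data = datasets.ImageFolder(
    "photos",
    transform=transforms.Compose([transforms.Resize((64, 64)),
                                  transforms.ToTensor()]))
loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

# A small convolutional network: it learns visual patterns (edges,
# textures, shapes) from thousands of labeled examples rather than
# from hand-written rules.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 2))  # two classes: dog / not dog

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```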
In addition to achieving greater accuracy than
traditional computer vision, deep-learning techniques offer superior
versatility. Their algorithms are less purpose-specific, and their frameworks can be reused across a variety of use cases. As neural networks grow in complexity and scale, the range of vision-based AI services grows impressively wide. However, there is still a long way to go on the path toward truly intelligent vision systems. Some specialists believe that neural networks and related techniques will soon be considered small advances compared to the innovation that is yet to come.
The Issue of Generalization
Despite considerable improvements in
image-processing performance, generalization remains an issue to be
tackled. In most vision-based AI systems, object identification depends on the angle from which the object is portrayed. In other words, there is a recurrent inability to recognize familiar objects seen from unfamiliar angles. Aiming to overcome the shortcomings of existing systems, Geoffrey Hinton, a pioneer of the neural network approach who now works at Google, is proposing a new mathematical technique called the capsule network. The idea is to make vision systems more human-like, allowing them to “see” not only in two dimensions, as conventional neural networks do, but in three.
Equipped with three-dimensional perspective, AI systems would be able
to accurately recognize familiar objects from any angle and thus allow
for considerably better generalization.
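In the capsule-network formulation published by Hinton's team, each capsule outputs a vector whose length encodes the probability that an entity is present and whose orientation encodes pose information such as viewing angle. A minimal sketch of the "squash" nonlinearity from that formulation is shown below (PyTorch, for illustration only):

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # Capsule "squash" nonlinearity: short vectors shrink toward zero and
    # long vectors approach unit length, so a capsule's output length can
    # be read as the probability that its entity is present.
    sq_norm = (s * s).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)
```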
The Issue of Motion
Despite remarkable advances made so far, the ability
to correctly identify dynamic activities also remains a challenge. Most
AI video-processing applications do not interpret actions but rather
rely on recognizing objects in static frames. This is the case with the
recently launched Google Cloud Video Intelligence, a machine-learning
application programming interface designed to detect and classify
objects in videos.
Though enabling major gains in productivity,
particularly when it comes to searching through vast libraries of video
content, the reliance on static frames is a limitation. Truly
intelligent video processing must be able to go beyond the
identification of video content and into what is actually happening on
screen. Aiming to advance toward this goal, the Massachusetts Institute of Technology and IBM released, in December 2017, the Moments in Time Dataset, a major compilation of videos annotated with details of the activities being performed. This is the most recent of several efforts to increase access to tagged videos, which include Google’s release of 8 million YouTube videos and Facebook’s ongoing Scenes, Actions, and Objects project. Temporal context and transfer learning
are two of the main research focuses moving forward.
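To make the static-frame limitation concrete, the sketch below samples frames from a video, classifies each frame independently with an off-the-shelf image model (torchvision's ResNet-18, chosen purely as an example), and averages the scores. It can label the objects that appear but has no notion of the action unfolding across frames:

```python
import cv2
import torch
from torchvision import models, transforms

model = models.resnet18(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def classify_video(path, sample_every=30):
    # Classify one frame out of every `sample_every`, then average the scores:
    # a purely static-frame view of the video, with no temporal reasoning.
    cap = cv2.VideoCapture(path)
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % sample_every == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            with torch.no_grad():
                scores.append(model(preprocess(rgb).unsqueeze(0)).softmax(dim=1))
        idx += 1
    cap.release()
    return torch.cat(scores).mean(dim=0)  # averaged class probabilities
```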
The Issue of Capabilities
Over recent years, significant improvements in
device capabilities, including computing power, memory capacity, power
consumption, image sensor resolution, and optics, have enhanced the
performance and cost-effectiveness of computer vision. Even so, further
accuracy gains will require enormous amounts of computing resources in both the training and inference stages. Raj Talluri, senior VP of product management at Qualcomm Technologies, points out that “going from 75% to 80% accuracy in a vision-based application could require nothing less than billions of additional math operations.” He further
underlines that vision-processing results are dependent on image
resolution, which is a particularly crucial aspect in applications
designed to detect and classify objects in the distance, such as
security cameras. Higher resolution means an increase in the amount of
data being processed, stored, and transferred.
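A quick back-of-the-envelope calculation shows how fast data volumes grow with resolution (assuming uncompressed 8-bit RGB frames at 30 frames per second):

```python
def raw_video_rate_mb_per_s(width, height, fps=30, bytes_per_pixel=3):
    # Uncompressed data rate for 8-bit RGB video, in megabytes per second.
    return width * height * bytes_per_pixel * fps / 1e6

print(raw_video_rate_mb_per_s(1920, 1080))  # ~186.6 MB/s for 1080p
print(raw_video_rate_mb_per_s(3840, 2160))  # ~746.5 MB/s for 4K: four times the data
```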
Aiming to overcome existing limitations of
capability, Qualcomm Technologies is pioneering a hybrid
vision-processing implementation. The idea is to combine classic
computer vision algorithms – considered “mature, proven, and optimized
for performance and power efficiency” – with accurate and versatile
deep-learning techniques. Security cameras, for instance, could rely on
computer vision to detect faces or objects and apply deep learning to
process only the smaller segment of the image in which the face or
object was detected. The hybrid implementation uses about half the memory bandwidth and significantly less CPU than pure deep-learning solutions.
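A minimal sketch of this hybrid pattern, assuming OpenCV's classic Haar-cascade face detector as the "mature" front end and a generic, hypothetical deep model applied only to the cropped region:

```python
import cv2
import torch

# Stage 1: classic computer vision locates candidate faces cheaply.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def analyze_frame(frame_bgr, deep_model):
    # Stage 2: run the (hypothetical) deep model only on the detected face
    # crops, instead of on every pixel of the full frame.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        crop = cv2.cvtColor(frame_bgr[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
        crop = cv2.resize(crop, (224, 224))
        tensor = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            results.append(((x, y, w, h), deep_model(tensor)))
    return results
```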
Innovative compute architecture can also contribute
to greater processing performance and power efficiency. For instance,
executing deep-learning inferences on a DSP can yield considerable
latency reductions in object detection when compared to a CPU.
Similarly, edge computing, or running algorithms and neural networks on
the device itself, can also help lower latency and bandwidth
requirements while offering greater privacy and security as compared to
cloud-based implementations.
Public Safety and Surveillance Applications
Recent technological advances have shed light on the
possibility of using advanced software to overcome human limitations
and supplement human judgment in public safety and surveillance
applications. Artificially intelligent vision systems can enable
unprecedented levels of detail and personalization in addition to
making security and law-enforcement work considerably more efficient.
Facial recognition is perhaps the most widespread application of vision-based AI so far. The New York Department of Motor Vehicles recently announced that facial-recognition technology has been used in the arrest of over 4,000 people charged with identity
theft or fraud since 2010. The software, which compares new drivers’
license application photos to images on a database, illustrates how law
enforcement and public safety can benefit from AI innovation.
Police body cameras are also a potential field for
visual AI applications. Axon, formerly Taser International and
headquartered in Scottsdale, Arizona, has signaled its intention to
incorporate AI into its products. The largest distributor of police
body cameras in the U.S. acquired two AI companies in early 2017,
envisioning ambitious new functionalities, which include an automated
system for police reports. Axon CEO Rick Smith points out that the
ability to process video and audio will allow police to spend more time
doing their jobs rather than performing menial tasks, such as
note-taking and report-writing.
Another important supplier of body-worn cameras, Motorola, is working with deep-learning startup Neurala to integrate AI
capabilities that will help police officers in their search for objects
and persons of interest. These groundbreaking applications are expected
to significantly reduce the time and effort necessary to perform
recurring tasks, such as finding a missing child or identifying a
suspicious object. Based in Boston, Massachusetts, Neurala has
developed “at the edge” learning capabilities that enable real-time
applications of AI. Their patent-pending technology differs from
traditional learning processes that require lengthy training for the AI
engine. Built upon an incremental learning logic, the Lifelong Deep Neural Network (L-DNN) improves accuracy, reduces latency, and eliminates the risk of “catastrophic forgetting,” the most significant limitation to
real-time AI so far.
Security cameras are yet another promising field of
AI innovation. In April, Intel Movidius and Dahua Technology USA, a
subsidiary of Chinese Dahua Technology, announced a new line of cameras
that will offer advanced video analysis features, including crowd
density monitoring, stereoscopic vision, facial recognition, people
counting, behavior analysis, and detection of illegally parked
vehicles. The groundbreaking solution represents an important step in
bringing AI and machine learning into real-world products, particularly due to its low power demands. The Movidius Myriad 2 Vision Processing Unit (VPU) delivers substantial deep neural network performance while requiring less than a single watt of power. This radically
low-powered computer vision allows for natively intelligent cameras
that do not require cloud-based resources.
Located in San Mateo, California, computer-vision
startup Movidius was acquired by Intel in 2016. The company has worked
with Chinese video Internet of Things (IoT) firm Hikvision, which uses
the innovative Myriad 2 VPU to run deep neural networks and perform
high-accuracy video analytics on the cameras themselves. By processing data on edge devices, the innovative cameras can detect anomalies in real time and thus allow for groundbreaking advances in creating safer
communities, better transit hubs, and more efficient operations.
The Smart Home
Vision-based Internet of Things (IoT) applications
are also an important area for innovation. When it comes to the
smart home, in particular, artificially intelligent vision systems can
offer new levels of security, safety, comfort, and entertainment.
Safety and Security
Potential safety and security applications include
front doors that recognize authorized people and unlock for them while
remaining locked for unfamiliar faces. Vision-based alarm systems could
similarly distinguish members of the family from unauthorized
strangers. AI indoor security cameras could go even further in
protecting users and preventing accidents by issuing alerts when an
elderly person falls or a child is approaching a hot stove.
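A minimal sketch of the face-matching logic such a door lock could use, assuming the open-source face_recognition library and hypothetical reference photos of the authorized residents:

```python
import face_recognition

# Encodings of the people the door has been "introduced" to (hypothetical files).
authorized = [
    face_recognition.face_encodings(
        face_recognition.load_image_file(path))[0]
    for path in ["alice.jpg", "bob.jpg"]
]

def should_unlock(snapshot_path, tolerance=0.6):
    # Unlock only if every face seen at the door matches an authorized resident.
    image = face_recognition.load_image_file(snapshot_path)
    encodings = face_recognition.face_encodings(image)
    if not encodings:
        return False  # no face found: stay locked
    return all(
        any(face_recognition.compare_faces(authorized, enc, tolerance=tolerance))
        for enc in encodings
    )
```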
IoT pioneer Nest has recently launched a smart
security camera with facial recognition. The Nest Cam IQ identifies
people it has been introduced to and sends alerts to the owner’s
smartphone when it sees someone it doesn’t recognize. Potential future applications may include “seeing” how many people are in a room and adjusting temperature and lighting accordingly, and even “noticing” when the fridge is low on certain items and adding them to a
grocery list.
Various companies around the globe are vying to get ahead in the vision-based smart home market. Amazon has recently unveiled its AWS DeepLens camera, the world’s first deep-learning video camera for developers. Expected to reach the market in April 2018, the camera is positioned primarily as a developer training device that uses the AWS machine-learning service SageMaker to perform tasks such as object and face detection and activity recognition.
Entertainment
On the entertainment front, vision-based systems
could allow for unprecedented, seamless personalization. A TV that
“sees” and recognizes who is watching could turn on a tailored
interface and even block inappropriate content when a kid is in the
room. Emotion recognition could take customization even further by adapting content to the user’s mood. Even though this kind of
application is still in its early days, various companies are beginning
to incorporate vision-based AI into their IoT entertainment products.
There is growing expectation that Apple may include face recognition in
the HomePod smart speaker by 2019.
Located in Boston, Massachusetts, Affectiva has
developed technology that allows for standard webcams to recognize
different human emotions. The emotion measurement solution is largely
visual-based; it uses facial cues, gestures, and psychophysical
responses to identify the users’ mood. Affectiva points out that
existing technology has lots of IQ, but no “emotional intelligence”.
The startup envisions, however, that IoT devices will soon become mood-aware as AI and optical sensors become widespread.
Autonomous Vehicles
As autonomous-driving capabilities grow in number
and complexity, the crucial role of visual intelligence technology
becomes ever more clear. Cameras are used to monitor the road as well
as to collect information on the occupant’s behavior. In addition to
mapping and localization, smart cameras estimate distances to
surrounding objects, read traffic signs, and detect pedestrians. A much
less expensive alternative to LiDAR, cameras are a preeminent feature of the driverless cars being tested so far. By incorporating cameras at all angles, these innovative vehicles can maintain a 360-degree, real-time view of their surroundings. For instance, Tesla’s “Full Self-Driving
Hardware” includes eight cameras while Uber’s autonomous vehicles have
20 of these devices.
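As one concrete example of the camera tasks listed above, pedestrian detection can be sketched with OpenCV's bundled HOG-based people detector; this is a classic, pre-deep-learning technique used here purely for illustration, not the method used by any particular vehicle:

```python
import cv2

# Classic HOG + linear SVM pedestrian detector bundled with OpenCV.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_pedestrians(frame_bgr):
    # Return bounding boxes (x, y, w, h) of likely pedestrians in a frame.
    boxes, weights = hog.detectMultiScale(frame_bgr,
                                          winStride=(8, 8),
                                          padding=(8, 8),
                                          scale=1.05)
    return [tuple(box) for box in boxes]
```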
The work of Palo Alto, California-based Nauto has
shed light on the crucial role of visual intelligence in
autonomous-driving functionalities. Founded in 2015, the company has
developed a groundbreaking, affordable solution to make existing cars
safer and smarter. Equipped with a powerful AI engine, Nauto’s bi-directional dashcams are able to detect what is happening on the
road ahead and within the vehicle. Through a combination of image
recognition, motion sensors, and GPS, the system alerts drivers if
there is a problem on the way or a dangerous distraction. According to
Nauto, the innovative solution can generate up to a 37 percent reduction
in collision incidents.
Wide-Ranging Automation
A recent report by Markets and
Markets forecasts that the machine vision market will grow from $8.12
billion in 2015 to $14.43 billion in 2022, reflecting a compound annual growth rate of 8.15% between 2016 and 2022. Automation is the major
driver of this growth, with innovative efforts ranging from
manufacturing and healthcare to consumer goods and robotics. The
following sections present some outstanding initiatives in these fields.
I. Quality Inspection: Based
in Los Altos, California, Instrumental specializes in optical detection
systems for manufacturing. The company has recently added an AI
capability to its products, enabling them to automatically recognize
and triage abnormal units in the assembly line. The Monitor technology
uses vision-based AI to detect differences between products and learn
whether those are actual defects or not. Compared to traditional
computer vision approaches, the technology behind Instrumental’s
solution requires significantly less training and fewer golden samples or rules. Quality inspection is just one of many illustrations
of how vision-based AI can greatly contribute to more efficient,
automated manufacturing.
II. Medical Diagnosis: Using a
database of nearly 130,000 images, researchers at Stanford University
trained an algorithm to diagnose skin cancer. Inspired by the need for
universal access to healthcare, they built upon an existing algorithm
developed by Google to perform object identification. The team used machine-learning techniques to expand the identifiable categories to include melanomas and carcinomas, a transfer-learning step sketched after this list. The algorithm matched the performance
of board-certified dermatologists, showing great promise for widespread
application. Researchers envision making a smartphone-compatible
version of the software that would bring reliable diagnosis to
patients’ fingertips.
III. Robotics:
Founded in March 2016, PerceptIn has been at the forefront of visual
intelligence. Located in Santa Clara, California, the robotics startup
has recently unveiled Ironsides, a complete vision system that combines hardware and software for real-time tracking, mapping, and path planning for autonomous robots. The innovative
solution can be integrated into virtually any device, enabling a wide
array of autonomous applications. As the number of robotized products increases, visual intelligence and vision-based perception
systems become a major area for innovation.
IV. Consumer Goods:
Google recently unveiled a new concept for artificially intelligent
consumer cameras. Google Clips uses advanced machine-learning
algorithms to recognize people and decide when to take photos. Its
objective is to allow for hands-free, authentic pictures that capture
the “feeling” of each particular moment. The innovative device automatically captures seven-second clips at 15 frames per second, to be later filtered and selected by the user via a mobile app. The more the device is used, the better the algorithm performs, as it learns to recognize the people and moments that matter to the user.
V. Virtual Reality and Augmented Reality:
Recent advances in real-time image recognition, expanding network
bandwidth, and improvements in processing and storage are opening the
way for innovative solutions that combine vision-based AI and
VR/AR. The retail environment is a particularly promising field
for innovation. Founded in 2015, Oak Labs has created a smart
dressing-room mirror that allows customers to browse different products, toggle different types of lighting, and even receive suggestions for complementary pieces. Early adopters include Ralph Lauren and
Rebecca Minkoff.
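The Stanford skin-cancer work in item II illustrates transfer learning: a network pretrained for generic object identification is reused, and only its final classification layer is retrained for the new categories. A minimal sketch of that step is below, using torchvision's ResNet-18 as a stand-in for the actual model and a hypothetical three-class label set; a standard training loop then updates only the new layer's weights on the labeled medical images:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a network pretrained for generic object identification.
model = models.resnet18(pretrained=True)

# Freeze the pretrained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the final layer so it predicts the new categories,
# e.g. benign lesion / melanoma / carcinoma (hypothetical label set).
model.fc = nn.Linear(model.fc.in_features, 3)

# Only the new layer's weights are optimized during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```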
Conclusion
Visual
intelligence technology brings together three highly disruptive trends:
artificial intelligence, computer vision, and analytics. From public
safety and surveillance to smart homes and autonomous vehicles, there
is a wide range of potential applications for artificially intelligent
vision systems. Innovative companies engaged in visual intelligence
R&D should take advantage of the tax credit opportunity available
to support their efforts and increase their chances of success.