The R&D Tax Credit Aspects of Bioinformatics

By Charles R Goulding and Andressa Bonafé

        As biology becomes an increasingly data-intensive field, computers have become instrumental to a wide variety of life science domains. Bioinformatics has enabled the generation and processing of useful biological data, adding both depth and breadth to biological investigation. This emerging field of science has made the challenging task of reading complex biological data much faster and more efficient, enabling revolutionary initiatives such as gene-based drug discovery and development.

        This article discusses the twofold tax credit opportunity created by the tsunami of data that has washed over the life sciences. Federal R&D tax credits are available both for companies engaged in developing new and improved bioinformatics solutions and for businesses that incorporate bioinformatics into their innovative efforts.

The Research & Development Tax Credit

        Enacted in 1981, the Federal Research and Development (R&D) Tax Credit allows a credit of up to 13 percent of eligible spending for new and improved products and processes. Qualified research must meet the following four criteria:

•    New or improved products, processes, or software
•    Technological in nature
•    Elimination of uncertainty
•    Process of experimentation

        Eligible costs include employee wages, cost of supplies, cost of testing, contract research expenses, and costs associated with developing a patent. On January 2, 2013, President Obama signed the bill extending the R&D Tax Credit for the 2012 and 2013 tax years.
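        As a rough illustration of the "up to 13 percent" ceiling mentioned above, the sketch below totals the eligible cost categories and applies that rate. The dollar amounts, the class name, and the function name are hypothetical, and the result is only an upper bound: the actual credit depends on the computation method and the taxpayer's circumstances, which this sketch does not model.

```python
from dataclasses import dataclass

# Ceiling rate quoted in the article ("up to 13 percent of eligible spending").
MAX_CREDIT_RATE = 0.13


@dataclass
class EligibleCosts:
    """Eligible cost categories listed in the article (amounts in dollars)."""
    wages: float = 0.0
    supplies: float = 0.0
    testing: float = 0.0
    contract_research: float = 0.0
    patent_development: float = 0.0

    def total(self) -> float:
        return (self.wages + self.supplies + self.testing
                + self.contract_research + self.patent_development)


def credit_ceiling(costs: EligibleCosts) -> float:
    """Upper bound on the credit; NOT the actual statutory computation."""
    return MAX_CREDIT_RATE * costs.total()


# Hypothetical figures for a small bioinformatics company.
example = EligibleCosts(
    wages=400_000,
    supplies=50_000,
    testing=25_000,
    contract_research=20_000,
    patent_development=5_000,
)

print(f"Eligible spending: ${example.total():,.0f}")
print(f"Credit ceiling:    ${credit_ceiling(example):,.0f}")
```

        The point of the sketch is only that the ceiling scales linearly with qualifying spending; whether a given cost qualifies still turns on the four criteria listed above.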

Bioinformatics and the Challenge of Quantitative Biology

        Bioinformatics combines computer science, statistics, mathematics, and engineering in an effort to collect, classify, store, and analyze biochemical and biological information. This interdisciplinary science has gained increasing prominence for the development of methods and software tools that help manage and understand complex biological data.

        Noteworthy examples of biological studies that incorporate computer programming into their methodology include the fields of genetics and genomics. In fact, the consolidation of bioinformatics can be largely explained by the advent of publicly available genomic information, made possible by the Human Genome Project.

        The successful determination of the sequence of the entire human genome, which amounts to nearly three billion chemical base pairs, opened the way for using genomic information in the understanding of diseases and the identification of new molecular targets for drug discovery. Without bioinformatics and its ability to study biological data at a molecular scale, however, none of this would have been possible.

        Though originally intended for the analysis of biological sequences, bioinformatics has evolved to encompass various fields of study, such as gene expression and structural biology, among others. Common tools of bioinformatics include the application of analytical algorithms in soft computing, artificial intelligence, data mining, and image processing.

Put simply, the objectives of bioinformatics are threefold: 
  1. To manage data in a way that facilitates researchers’ access to existing data and the submission of new data as they appear.
  2. To create tools and systems that enable proper and efficient analysis of complex biological data. 
  3. To develop mechanisms that convert data into biologically meaningful information.

The Bioinformatics Market

        Prospects are very positive when it comes to the growth of the bioinformatics industry. According to a 2012 report by Transparency Market Research, the global bioinformatics market is expected to reach $9.1 billion in 2018.

        Estimated at $2.3 billion in 2012, the market was forecast to experience a compound annual growth rate of 25.4% through 2018. This dramatic upsurge is attributed to the increase in applications across various industries, including medical and clinical diagnosis, biotechnology, biopharmacology, pharmacology, and agriculture.

        The report further predicts that the bioinformatics platform segment will contribute approximately 54 percent of the total market growth during this period. Similarly, the bioinformatics services segment should register considerable growth. This trend can be largely explained by research outsourcing from large pharmaceutical companies, which increasingly try to reduce R&D costs and time by subcontracting bioinformatics knowledge, management tools, platforms, and services.
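        The market figures cited above ($2.3 billion in 2012, a 25.4% compound annual growth rate, and $9.1 billion in 2018) can be checked against one another with the standard compound-growth formula. The sketch below is only a consistency check; the function names are illustrative, and the six-year horizon is an assumption taken from the 2012 and 2018 endpoints.

```python
def project(value: float, rate: float, years: int) -> float:
    """Compound `value` at annual `rate` for `years` years."""
    return value * (1 + rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


START_2012 = 2.3    # market size in 2012, billions of dollars (from the article)
TARGET_2018 = 9.1   # forecast market size in 2018, billions of dollars
CAGR = 0.254        # forecast compound annual growth rate
YEARS = 2018 - 2012  # assumed horizon: 2012 through 2018

projected = project(START_2012, CAGR, YEARS)
rate = implied_cagr(START_2012, TARGET_2018, YEARS)

print(f"Projected 2018 market at 25.4% CAGR: ${projected:.2f} billion")
print(f"CAGR implied by $2.3B -> $9.1B over {YEARS} years: {rate:.1%}")
```

        Under these assumptions the two forecasts agree to within rounding: compounding at the quoted rate for six years lands a little under $9 billion, and the rate implied by the two endpoints is a little under 26%.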

University Research

        Leading U.S. universities and research institutions are engaged in bioinformatics R&D. The proliferation of academic efforts has laid the groundwork for unprecedented advancements in this field and paved the way for a wider application of bioinformatics tools and methods.

Columbia University
        Working within the Columbia University Department of Systems Biology, the Center for Computational Biology and Bioinformatics aims to catalyze research at the interface of biology and computational and physical sciences. Its overarching objective is to dissect, model, and interrogate the molecular interaction networks that give rise to physiological and pathological cellular phenotypes. The Center is engaged in developing improved methods for analyzing large amounts of biological data and has thus created numerous algorithms, including tools for predicting protein structure and interactions, for analyzing gene sequence and expression data, and for studying genetics and evolution.

        Within the extensive list of bioinformatics software and databases created at Columbia University is the innovative geWorkbench, a user-friendly, state-of-the-art platform that provides an integrated suite of tools for the analysis and visualization of genomic data.

        There are currently more than 70 geWorkbench modules available, including parsers for most common genomic data file formats; gene expression analysis algorithms for supervised and unsupervised learning; sequence homology, pattern discovery, and promoter region prediction; gene interaction network inference and visualization; and 3-D protein modeling, among others.

New York University
        New York University’s Center for Health Informatics and Bioinformatics (CHIBI) has worked on multiple domains, including microarray, proteomics, genetics-genomics, cancer, and next generation sequencing informatics. The CHIBI focuses particularly on converting the wealth of information generated by high-throughput technologies of molecular profiling into clinical progress. Main challenges include the design of experiments, the extraction of reliable de-noised biological signal, and the downstream biological interpretation of the data. 

        When it comes to cancer informatics, the Center is devoted to harnessing information from multiple types of molecular profiling for practical applications in the fight against cancer. Scientists work in close collaboration with the Perlmutter Cancer Center, contributing to translational and clinical research.

        In the field of proteomics, or the large-scale study of proteins, CHIBI’s efforts include the development of algorithms for the discovery of novel proteomic biomarkers; the development of algorithms and protocols for proteomic profiles for diagnosis and personalized medicine; and the creation of software for automated analysis of mass spectrometry (MS) data.

Harvard University
        The Center for Health Bioinformatics (CHB) at the Harvard School of Public Health supports interdisciplinary research involving the computational analysis of complex relationships between genes and their environment as well as basic biological and quantitative sciences. 
        As part of the Center for Stem Cell Bioinformatics, CHB provides analytical support and tools for storage, sharing and analysis of stem cell research data for the Harvard Stem Cell Institute (HSCI). HSCI is the largest collaboration of its kind and seeks to bring new treatments to the clinic and new life to patients suffering from a wide range of illnesses, including leukemia, lymphoma, diabetes, heart disease, fibrosis, and nervous system diseases such as Alzheimer’s and Parkinson’s.

        In this context, CHB has supported the development of the Stem Cell Discovery Engine (SCDE), which stands out as an interesting example of how bioinformatics can foster innovation and cutting edge research. The SCDE is an integrated platform that allows users to consistently describe, share, and compare cancer and tissue stem cell data. Envisioned as a community resource, SCDE aims to encourage contributions of tools and new data sets from researchers around the globe.

        Together with the University of Oxford, UK, HSCI has recently led an international standard-compliant data sharing effort that will facilitate the collection, curation, management, and reuse of datasets in fields ranging from genetics to stem cell science and environmental studies. This major advance for bioinformatics, known as the ISA Commons, will allow scientists from different fields to build on one another’s research, comparing results and finding relationships between otherwise incompatible data.

Stanford University
        The 2014 U.S. News Best Graduate Schools rankings place Stanford University as the number one program for computer science, biological sciences, genetics, genomics, and bioinformatics. Stanford’s Computer Science Department is devoted to developing innovative computational tools that will help rewrite the textbooks on biology and make personalized medicine a reality. Researchers are engaged in cutting-edge initiatives in fields such as machine learning, algorithms, data visualization, databases, and systems.

        Research efforts include comparative genomics, gene finding, networks of protein interactions, high-throughput sequencing and assembly, and population genetics, as well as the use of tools from digital systems design, computational photography, and VLSI circuits to analyze biological systems.

        Established within the Stanford University School of Medicine, the Biomedical Informatics Training Program has the objective of enabling researchers to design and implement novel quantitative and computational methods that solve challenging problems across the entire spectrum of biology and medicine.

        Specific fields of application include translational bioinformatics and the understanding of diseases from basic research; clinical informatics and the development of tools and methods applied directly to patient care; public health informatics and the challenges facing health systems and populations; and, finally, imaging bioinformatics, which addresses intelligent management, interpretation, and annotation of biomedical images.

Bioinformatics R&D

        "The introduction of mathematical methods into biology is without question going to revolutionize it just as it did revolutionize physics in the days of Isaac Newton." These were the words of mathematician and hedge-fund founder James Simons when commenting on his recent $50 million donation to the Cold Spring Harbor Laboratory, located on Long Island, NY.

        The gift was aimed at enlarging and institutionalizing the Simons Center for Quantitative Biology, an interdisciplinary collaboration among mathematicians, physicists, engineers, biologists, and computer scientists whose research focuses on population genetics, genetic disorders and diseases, and conditions such as cancer, autism, and depression.

        Dr. Simons’s donation is an indicator of the enormous potential of bioinformatics. Such potential can also be seen in the recent proliferation of corporate bioinformatics R&D efforts, particularly in the field of genetic analysis.

Bioinformatics and Genetic Analysis

        According to a 2011 Goldman Sachs report, “bioinformatics for genetic analysis will be one of the biggest areas of disruptive innovation in life science tools over the next few years.” Current studies show that, owing to progress in biochemistry, microfabrication, optics, and high-performance system design, DNA sequencing is advancing at an unprecedented rate of nearly 10-fold improvement in cost and throughput every 18 months, far outpacing the famous Moore’s Law. This avalanche of genomic information has encouraged the development of innovative and robust data analysis solutions and a rapid proliferation of specialized companies, including those profiled below.
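        To give a sense of scale for the sequencing rate quoted above, the sketch below compares a 10-fold gain every 18 months with a Moore's Law style doubling. The 24-month doubling period is an assumption (the article gives no figure for Moore's Law, and it is also commonly quoted as 18 months), and the function names are illustrative.

```python
import math


def improvement_factor(fold: float, period_months: float,
                       elapsed_months: float) -> float:
    """Cumulative gain after `elapsed_months`, given a `fold`-times gain
    every `period_months`."""
    return fold ** (elapsed_months / period_months)


def doubling_time_months(fold: float, period_months: float) -> float:
    """Months needed to double at a rate of `fold`-times per `period_months`."""
    return period_months * math.log(2) / math.log(fold)


SEQ_FOLD, SEQ_PERIOD = 10.0, 18.0      # rate quoted in the article
MOORE_FOLD, MOORE_PERIOD = 2.0, 24.0   # ASSUMPTION: doubling every 24 months

for years in (1.5, 3.0, 6.0):
    months = years * 12
    seq = improvement_factor(SEQ_FOLD, SEQ_PERIOD, months)
    moore = improvement_factor(MOORE_FOLD, MOORE_PERIOD, months)
    print(f"{years:>4} years: sequencing x{seq:,.0f}, Moore's Law x{moore:,.1f}")

seq_doubling = doubling_time_months(SEQ_FOLD, SEQ_PERIOD)
print(f"Equivalent sequencing doubling time: {seq_doubling:.1f} months")
```

        Under these assumptions, sequencing doubles in under six months, which is why the volume of data can outgrow the computing capacity available to analyze it.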

        Based in State College, PA, SoftGenetics is a provider of effective, biologist-friendly, easy-to-use genetic analysis software tools designed to meet the needs of today's genetic researcher and diagnostician.

        The company has created a range of leading edge software powertools for genetic analysis including the Mutation Surveyor, a DNA sequencing tool; the NextGENe, for the analysis of sequencing data; the Geneticist Assistant, a web-based tool designed to identify potentially pathogenic variants associated with certain diseases; the GeneMarker, a “biologist friendly” genotyping software; the GeneMarker HID, a human identity software for forensic profiling applications; and the JelMarker, an image reading and conversion software, developed in response to a growing demand for software that can analyze fluorescence, chemiluminescence, and autoradiography gel image files.

        With a 30-year history of meeting the needs of life scientists, DNAStar has considerably broadened its offerings of desktop software tools over the last few years in an effort to keep up with the ongoing sequencing revolution. The Madison, WI company has traditionally supported sequence assembly and analysis and has recently developed new products that support next-generation sequencing applications, such as RNA-Seq and ChIP-Seq, and microarray gene expression. DNAStar also offers cloud-based solutions, which increase user mobility, give access to powerful hardware, and foster remote collaboration.

        Founded in 2009 as a spinout from Stanford University, DNAnexus was created to address the need for computing infrastructure in DNA sequence analysis. The company has recently raised $15 million in a Series C venture funding round, which involved investors such as Claremont Creek Ventures, First Round Capital, Google Ventures, and TPG Biotech.

        Designed to be the DNA data platform of the future, the DNAnexus solution combines cloud-computing infrastructure with scalable systems design and advanced bioinformatics. In October 2013, in partnership with the Baylor College of Medicine and Amazon, DNAnexus performed the largest cloud-based genomics analysis to date. The effort processed 3,751 whole human genomes and 10,771 exomes, used 2.4 million core-hours of computational time, and generated 430 terabytes of data.
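        The totals reported for this analysis can be turned into rough per-sample averages, as sketched below. These averages pool whole genomes and exomes, which differ considerably in size, so they indicate an order of magnitude only; the variable names and the decimal terabyte-to-gigabyte conversion are assumptions.

```python
# Totals reported for the October 2013 analysis (from the article).
WHOLE_GENOMES = 3_751
EXOMES = 10_771
CORE_HOURS = 2_400_000
OUTPUT_TERABYTES = 430

GB_PER_TB = 1_000  # ASSUMPTION: decimal units (1 TB = 1,000 GB)

total_samples = WHOLE_GENOMES + EXOMES

# Pooled averages across genomes and exomes; a rough scale, not a benchmark.
core_hours_per_sample = CORE_HOURS / total_samples
gb_per_sample = OUTPUT_TERABYTES * GB_PER_TB / total_samples

print(f"Samples processed:       {total_samples:,}")
print(f"Core-hours per sample:   {core_hours_per_sample:.0f}")
print(f"Output per sample (GB):  {gb_per_sample:.1f}")
```

        Even as pooled averages, these figures help explain why elastic cloud infrastructure, rather than a fixed local cluster, was used for a project of this size.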

        While major progress had been made in previous years, 2014 marked the beginning of a new era: the $1,000 genome era. In January, San Diego-based Illumina launched its new sequencing machine, capable of delivering up to five genomes in a day at a cost of just under $1,000 per genome. The HiSeq X Ten is expected to enable the analysis of complete genomic information from massive sample populations, paving the way for an unprecedented understanding of the genetics of human disease.

Bioinformatics, Therapeutic Development, and Open Innovation

        According to the National Institutes of Health, transforming a promising molecule into an approved drug takes more than 14 years. Not surprisingly, this costly and time-consuming task has seen poor success rates. In an effort to accelerate the therapeutic development process, researchers have turned to drug repurposing strategies, which consist of applying known drugs and compounds to new indications.

        In this context, a computational, quantitative approach is instrumental to accelerating the identification of candidate compounds and helping extract valuable animal and human information in clinical trials, without the need for actual subjects. While drug repurposing initiatives are already underway for diseases like cancer, some suggest it may be the answer for emerging global threats such as the Ebola crisis.

        In an effort to foster the identification of compounds that may interact with one or more targets or pathways not anticipated by a single mechanism-driven hypothesis, Eli Lilly has opened access to its internal panel of disease-relevant phenotypic modules, which interrogate complex cellular systems instead of specific targets.

        The Phenotypic Drug Discovery (PD2) initiative enables the screening of multiple mechanisms and targets at the same time. According to the company, bioinformatics has been crucial to the functioning of this project, particularly through the development of advanced assay technologies and computational tools that provide participants with effective means of evaluating their compounds. 

        Also part of Eli Lilly’s Open Innovation Drug Discovery program is TargetD2, which focuses on one of the most anticipated applications of genetic analysis, namely, targeted therapeutic development. The project was created to evaluate disease hypotheses through the discovery and clinical testing of molecules designed to interact with certain genomic targets thought to be involved in the emergence of the pathological condition.

        Computational and informatics tools are at the heart of this and any other targeted drug discovery initiative, as they assist scientists in the design, selection, and optimization of molecules for specific enzymes, receptors, and other bioactive proteins.

        We are living in a new era of biological research, one in which biology generates so much data that it is virtually impossible to advance without incorporating methods and tools from other areas of science, such as statistics, computer science, and mathematics. Bioinformatics can potentially revolutionize the fields of medical diagnosis, drug discovery and development, and agriculture, among many others. Companies engaged in bioinformatics R&D should take advantage of federal tax credits to help them realize the countless benefits of quantitative biology.

Charles R Goulding, Attorney/CPA, is the President of R&D Tax Savers.

Andressa Bonafé is a Tax Analyst with R&D Tax Savers.
