The Human Genome Project, which aimed to map and sequence the entire human genome, began in 1990 and was completed in 2003, with a projected budget of about $3 billion. It gave us, for the first time, a way to read invaluable information from genes: evolutionary patterns, diseases and their treatments, gene mutations and their effects, anthropological information, and more. Today, powerful software and analysis tools can decode an entire genome in a matter of hours. Data analytics is quickly becoming one of the most important disciplines applied in the biotech industry.
DNA sequencing generates a huge amount of data that must be analyzed with care, because the conclusions drawn from it are applied across a wide range of fields, from medicine to forensic science. Data science is involved at several levels:
The first step is storing DNA sequencing data. If we were to sequence the genome of every living thing, from microbes to humans, we would need powerful data science tools to store, track, and retrieve the relevant information.
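To make the store, track, and retrieve idea concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is only an illustration: the table layout, the function names (`open_store`, `store_sequence`, `find_by_organism`), and the sample records are all assumptions made up for this example, and real genomic archives rely on specialized formats and far larger infrastructure.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small sequence store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "accession TEXT PRIMARY KEY, "
        "organism TEXT NOT NULL, "
        "sequence TEXT NOT NULL)"
    )
    # An index on organism keeps lookups fast as the table grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_organism ON sequences (organism)"
    )
    return conn


def store_sequence(conn, accession, organism, sequence):
    """Store one record, normalizing the bases to upper case."""
    conn.execute(
        "INSERT OR REPLACE INTO sequences VALUES (?, ?, ?)",
        (accession, organism, sequence.upper()),
    )
    conn.commit()


def find_by_organism(conn, organism):
    """Retrieve (accession, sequence) pairs for one organism."""
    rows = conn.execute(
        "SELECT accession, sequence FROM sequences "
        "WHERE organism = ? ORDER BY accession",
        (organism,),
    )
    return rows.fetchall()


# Placeholder records: the accessions and sequences are invented for the demo.
conn = open_store()
store_sequence(conn, "demo-001", "Escherichia coli", "atgaccatgatt")
store_sequence(conn, "demo-002", "Homo sapiens", "ATGGCCCTGTGG")
print(find_by_organism(conn, "Escherichia coli"))
```

Keying each record by an accession-style identifier and indexing on organism is one simple way to support the "track" and "retrieve" parts. A production system would also need to record where each sequence came from, which version it is, and how good the read quality was.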
Annotation is the process of adding notes to specific genes in the sequence. Tools are being built to put …