
A species goes extinct when there are none of its kind left. In other words, extinction is about small numbers, so how does big data help us study extinction? Luckily for us, each individual of a species carries in its genome signatures of its past, information on how connected or isolated it is today, and clues that may help predict its future. The last fifteen years have witnessed a major change in how we can read genomes, and information from the genomes of individuals and species can help us better plan their conservation.

All life on Earth harbours genetic material. Often called the blueprint of life, this genetic material could be DNA or RNA. We all know what DNA is, but another way to think of DNA is as data. All mammals, for example, harbour between 2 and 3.5 billion letters of data in every one of their cells. The entire string of DNA data is called the whole genome. Recent changes in technology allow us to read whole genomes. We read short, 151-letter-long pieces of information many, many times, and piece together the whole genome by comparing them to a known reference. This helps us figure out where each of these 151-letter-long pieces goes in the 3-billion-letter-long word. Once we have read each position an average of 10 to 20 times, we can be confident about it. If each genome is sequenced even ten times and only ten individuals are sampled, for mammals each dataset would consist of 200 to 350 billion letters of data!
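The arithmetic behind that last figure can be checked with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers quoted above; the function names are made up for illustration and do not come from any real sequencing pipeline.

```python
# Back-of-the-envelope estimate of dataset size, using the figures from the text.
# Function names are illustrative only, not part of any real sequencing tool.

READ_LENGTH = 151  # letters per short read, as quoted in the text


def dataset_letters(genome_size, coverage, individuals):
    """Total letters read when every position is covered `coverage` times on average."""
    return genome_size * coverage * individuals


def reads_needed(genome_size, coverage, read_length=READ_LENGTH):
    """Approximate number of short reads needed for one individual (rounded up)."""
    total_letters = genome_size * coverage
    return -(-total_letters // read_length)  # ceiling division with integers


# Smallest and largest mammal genomes mentioned in the text, ten reads deep, ten animals.
small = dataset_letters(2_000_000_000, coverage=10, individuals=10)
large = dataset_letters(3_500_000_000, coverage=10, individuals=10)
print(f"{small:,} to {large:,} letters")

# Rough count of 151-letter reads for a single 3-billion-letter genome at tenfold depth.
print(f"{reads_needed(3_000_000_000, coverage=10):,} reads")
```

The first line printed reproduces the 200 to 350 billion range. The second shows why this counts as big data: a single animal already needs on the order of two hundred million short reads.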

Over time, the genome changes because of mutation, or spelling errors that creep in. Such spelling errors create variation, or differences between individual genomes in a population (a set of animals or plants). Large populations with many individuals will hold a greater variety of spellings, or high genetic variation. Since DNA is the genetic blueprint, changes in the environment can also be reflected in these DNA spellings, with individuals carrying certain words in their genome surviving better than others under certain conditions. Changes in population size often change the variety of letters observed at a specific position in the genome. Migration, or the movement of animals into a population, adds new letters and variation. Taking all these together, the history of a population can be understood by comparing the DNA sequences of individuals. The challenge is that every population faces all of these effects at once: changes in population size, environmental selection, migration and mutation. This makes it difficult to separate the effects of the different factors. Here, big data comes to the rescue.
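To make the idea of "comparing the DNA sequences of individuals" concrete, here is a minimal Python sketch that measures variation as the average fraction of letters that differ between pairs of individuals. Population geneticists call this statistic nucleotide diversity. The sequences below are invented toy data, and this is an illustration under those assumptions rather than the method used in the research described here; real analyses work across billions of positions and must also untangle selection, migration and changes in population size.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count positions where two aligned sequences carry different letters."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def nucleotide_diversity(sequences):
    """Average per-position difference over every pair of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


# Invented toy data: each string stands for the same genomic stretch in one individual.
varied_population = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC", "ACGTACGTGC"]
uniform_population = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTTCGTAC"]

print(nucleotide_diversity(varied_population))
print(nucleotide_diversity(uniform_population))
```

In this toy example the first group, where almost every individual carries its own spelling, scores higher than the second, where three of the four individuals are identical. That is the kind of contrast one would expect between a large, connected population and a small, isolated one.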

Photo Credit: Dr Anubhab Khan

Genomic data has allowed us to understand how a population has been affected by changes in climate, whether it has the necessary genomic variation to survive in the face of ongoing climate change, and how specific human activities have impacted it in the past. We can understand more about the origins of a population, how susceptible it is to certain infections, and whether the individuals in it are related to each other. Some of these large datasets have helped identify whether certain populations are identical, and so whether they should be managed together or separately. Answering all of these questions helps in the management and conservation of a population.

We have worked on such big genomic datasets for tigers, and our research has helped us identify which populations of tigers have high genomic variation and are more connected to other populations. We have identified populations that are small and have low genomic variation, and that also seem to carry mis-spelled or badly spelled words, that is, a higher load of ‘bad’ mutations. We have identified previously unknown relationships between individuals within populations and have suggested strategies that could allow these isolated populations to recover their genomic variation. It has been amazing to peek into animals' lives through these big data approaches, and we hope these types of genomic datasets will contribute to understanding how biodiversity can continue to survive on this Earth.


Uma Ramakrishnan is fascinated by unravelling the mysteries of nature using DNA as a tool. Along with her lab colleagues, she has spent the last fifteen years studying endangered species in India. She hopes such understanding will contribute to their conservation. Uma is a professor at the National Centre for Biological Sciences.

Dr. Anubhab Khan is a wildlife genomics expert. He has been researching the genetics of small, isolated populations for the past several years and has created and analyzed large-scale genome sequencing data of tigers, elephants and small cats, among others. He is keen on population genetics, wildlife conservation and genome sequencing technologies. He is passionate about ending technology disparity in the world, either by making advanced technologies and expertise available or by developing techniques that are affordable and accessible to all.

This series is an initiative by the Nature Conservation Foundation (NCF), under their programme ‘Nature Communications’ to encourage nature content in all Indian languages. To know more about birds and nature, Join The Flock

