Evolution of sequencing: from its origins to the present day

20 June 2022

Nowadays we hear more and more about sequencing. Consider the characterization of SARS-CoV-2 variants, from Alpha to Omicron, which are assigned using the latest sequencing technologies. DNA sequencing reads the genetic code by identifying the order of the four nitrogenous bases (adenine, cytosine, thymine, guanine) that make up DNA and genes.

In 1866 Mendel hypothesized the existence of hereditary determinants, later called genes, after observing crosses of plants or animals whose distinctive phenotypic traits segregated in definite ratios.

What is the origin of these genes, and what are they made of? History teaches us that many scientific steps were necessary to answer these questions. In 1869 Friedrich Miescher isolated a material from the nuclei of leukocytes that he named nuclein. In 1919 Phoebus Levene isolated the four nitrogenous bases, bound to sugars, that constitute DNA. However, it was unclear how all genetic information could be encoded with only four nucleotides. Only in 1953, thanks to the discovery of the double helix structure of DNA by James Watson and Francis Crick, did it become clear that the order of these four nitrogenous bases could be the key to genetic information. Several years later, in 1966, two other scientists, Marshall Nirenberg and Philip Leder, deciphered all the codons used in translation, revealing the genetic code.
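How four bases can carry enough information becomes clearer with a little counting. The short Python sketch below is only an illustration: the function names and the six-codon table are assumptions chosen for this example, and the table is a small subset of the standard genetic code rather than the full one. It shows that pairs of bases give only 16 combinations, too few for the 20 amino acids, while triplets (codons) give 64, and it then reads a short made-up DNA string three bases at a time.

```python
from itertools import product

BASES = "ACGT"  # the four DNA bases


def words(length):
    """Return every possible base 'word' of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]


# Illustrative subset of the standard genetic code, written as DNA codons.
# "*" marks a stop codon. The full code has 64 entries.
PARTIAL_CODE = {
    "ATG": "Met",
    "TTT": "Phe",
    "GGC": "Gly",
    "AAA": "Lys",
    "TGG": "Trp",
    "TAA": "*",
}


def translate(dna):
    """Read dna three bases at a time; stop at a stop codon or at a
    codon missing from the illustrative table."""
    peptide = []
    for i in range(0, len(dna) - 2, 3):
        residue = PARTIAL_CODE.get(dna[i:i + 3])
        if residue is None or residue == "*":
            break
        peptide.append(residue)
    return peptide


print(len(words(2)), len(words(3)))   # number of doublets and triplets
print(translate("ATGTTTGGCAAATAA"))   # hypothetical example sequence
```

With 64 triplets available for 20 amino acids, several codons can specify the same amino acid, which is why the genetic code is described as redundant.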

The bacteriophage phi X 174 (or ΦX174) was the first organism to have its genome sequenced, by Frederick Sanger in 1977. His innovative sequencing method was able to sequence all of the roughly 5000 nucleotides that make up the organism’s genome. The publication of Sanger’s method started a sequencing revolution. New methods and instruments were developed to increase the number of bases sequenced and decrease the cost per base of each run. These improvements were then applied to obtain the complete sequences of more complex organisms. All of this was driven by competition between scientists who wanted to be the first to decipher the “human genome”.

In 2001, the 3 billion bases of the human genome were published by two groups of researchers: the HGP (Human Genome Project) consortium, which published a draft in Nature (February 15), and Celera, which published a draft in Science (February 16).

The “sequencing boom” had started, and new investments and technologies grew exponentially. In 2001 the cost of sequencing a single human genome was about 100 million dollars. Today it is possible to have one’s own DNA sequenced for about 1000 dollars.

Thanks to the reduction in sequencing costs, this technology is no longer tied to basic scientific research and has become suitable for applied research as well. In human health, sequencing is used in diagnostics and therapy, and we are moving towards personalized medicine. In the forensic field, sequencing is applied to the profiling of criminals. In food science, sequencing is used for DNA tracking and therefore for food traceability, food quality control and the detection of commercial fraud. DNA sequencing is also used to characterize the genetic variation occurring in animal and plant populations and to select more productive species able to adapt to environmental change. Finally, through massive sequencing of microbial communities it is possible to identify all the microorganisms present in different matrices such as soil, food, and animal or human biological samples.

Sequencing methods have also gradually evolved, and many protocols are now available to determine the sequence of an organism (genomics), to analyze transcript expression (transcriptomics), to evaluate regulatory elements capable of controlling gene expression, such as small non-coding RNAs, differential DNA methylation and histone modifications (epigenomics), or to analyze microbial communities (metagenomics).

Sequencing has produced large amounts of data, which have been analyzed and interpreted by developing ad hoc bioinformatics tools and computing resources.

Nowadays everything is “sequenceable” using specialized sequencing platforms. In the era of “Big Data” a huge amount of sequencing data is available, but it is not always informative. The role of a modern scientist is to study and correctly interpret sequencing results in order to design systematic experiments aimed at answering specific questions, drawing on the experience and knowledge acquired over the years.

Since 2012, with the “GenHome” project, many researchers from the Institute of Agricultural Biology and Biotechnology of the National Research Council (IBBA-CNR) have developed innovative protocols for sequencing and bioinformatics analysis to better characterize animal, plant and microbial communities.

The aim of the project was to acquire the skills, instrumentation and computing resources necessary to study different biological aspects of the agri-food sector. In these ten years we have studied polymorphisms in small ruminants and their influence on adaptability to climate change; epigenetic variation in small and large ruminants and its impact on animal climate adaptation and reproduction; and the microflora and microfauna of the soil and their interaction with the environment through metagenomic profiling. Metagenomic analysis was also employed to study the bacterial composition of the rumen and the possible contribution of the microbiota to animal health and to the functional and organoleptic characteristics of products of animal origin. Finally, genomic and transcriptomic characterization of different plants of interest was used to understand the biochemical, physiological and molecular bases of plant adaptation to environmental variation.

The evolution of sequencing has greatly expanded the amount of information available to the entire scientific community. It is now up to researchers to make sense of this information for a better understanding of the observed biological phenomena.

Author: Emanuele Capra

