
The Barcode and the QR-Code of Life

Every species has DNA, so it is reasonable to use DNA to identify every species.

The actual practice of using DNA sequences for systematics began to take shape in the 1970s and 1980s as DNA sequencing technology became more accessible to researchers.

A seminal study was conducted by Cann et al. (1987), who used mitochondrial DNA from 147 individuals across five geographic populations to trace human evolution. Their analysis posited that all these mitochondrial DNAs stemmed from a single woman, often referred to as Mitochondrial Eve, who lived approximately 200,000 years ago, likely in Africa. This study established a strong precedent for using DNA as a tool for molecular systematics.

The proposition for using the Cytochrome c Oxidase I (COI) gene as a universal DNA marker for species identification was first made by Paul Hebert, a Canadian molecular biologist, along with his colleagues in 2003.

They found that COI profiles, derived from a relatively small number of organisms from broad taxonomic groups such as phyla or orders, could effectively assign newly analyzed taxa to the appropriate phylum or order, indicating the gene's potential for species discrimination.

The conserved regions of the COI gene allow for universal primer design. Coupled with the gene's sequence variability across species, this set the stage for the establishment of DNA barcoding as a standardized method for identifying species.
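To make the idea of "conserved regions" more concrete, here is a minimal sketch that scans a small alignment for windows in which every sequence carries the same base, which is roughly where a universal primer could bind. The three sequences are invented toy fragments, not real COI data, and the function name `conserved_windows` is our own; real primer design also weighs melting temperature, degeneracy and other factors that are ignored here.

```python
def conserved_windows(alignment: list[str], window: int) -> list[int]:
    """Return start indices of windows where every column is identical
    across all sequences.

    Assumes the sequences are already aligned and of equal length.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")

    # A column is conserved when it contains exactly one distinct base.
    conserved = [len({seq[i] for seq in alignment}) == 1 for i in range(length)]

    return [
        start
        for start in range(length - window + 1)
        if all(conserved[start:start + window])
    ]


# Hypothetical 12-bp toy alignment, invented for illustration only.
toy_alignment = [
    "ACGTACGGTTCA",
    "ACGTACAGCTCA",
    "ACGTACTGATCA",
]

primer_sites = conserved_windows(toy_alignment, window=5)
print(primer_sites)
```

In this toy case only the first six columns are fully conserved, so a 5-bp window can start at index 0 or 1; the variable columns in between are the part that carries the species signal.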

Subsequent studies demonstrated the effectiveness of the COI barcode in accurately identifying species across a wide variety of animal taxa and helped build the case for its broader adoption as the standard DNA barcode (see this, this and this).

Several biological and practical characteristics of COI contribute to its effectiveness as a DNA barcode:

  • The standard COI barcode region is about 648 base pairs long, which is a manageable length for sequencing, while still providing ample information for species discrimination. It is important to remember that DNA sequencing and analysis are automatable and scalable, making it cost-effective and time-efficient.
  • COI exhibits variability among species while having conserved flanking regions where primers can bind to initiate PCR. This combination is key to enabling precise species discrimination and identification.
  • As a mitochondrial gene, COI is present in many copies per cell, which increases the likelihood of obtaining good-quality sequences, even from degraded or minute tissue samples. In addition, the maternal inheritance of mitochondrial DNA avoids the complexities associated with recombination in nuclear DNA and provides a clearer lineage.
  • Perhaps most importantly, what makes the COI gene distinctive is its rate of evolution. It is typically faster than that of nuclear genes, allowing for the accumulation of sufficient sequence divergence at the species level, while maintaining sequence conservation at higher taxonomic levels.
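As a rough illustration of what "sequence divergence at the species level" means in practice, the sketch below computes the uncorrected p-distance (the fraction of aligned sites that differ) between pairs of sequences. The sequences are short made-up fragments rather than real 648-bp barcodes, and the function name `p_distance` is our own.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned sites that differ (uncorrected p-distance).

    Assumes both sequences are already aligned and of equal length.
    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical 20-bp fragments, invented for illustration only.
species_a_ind1 = "ATGGCACTATTAGGAGCCTG"
species_a_ind2 = "ATGGCACTATTAGGAGCTTG"  # one difference from ind1
species_b_ind1 = "ATGGCTCTTTTAGGTGCATG"  # four differences from species A ind1

within = p_distance(species_a_ind1, species_a_ind2)
between = p_distance(species_a_ind1, species_b_ind1)
print(f"within species:  {within:.2f}")
print(f"between species: {between:.2f}")
```

With these toy fragments the within-species distance is 0.05 and the between-species distance is 0.20. A barcode works well when distances between species are consistently larger than distances within a species.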

Mitochondrial genes like COI evolve faster than nuclear genes due to several factors:

  1. mtDNA lacks protective histones, increasing susceptibility to mutations.
  2. mtDNA can replicate more frequently than nuclear DNA, raising the mutation rate.
  3. The effective population size of mtDNA is smaller, speeding up mutation fixation.
  4. mtDNA does not undergo recombination, allowing faster accumulation of deleterious mutations.
  5. Maternal inheritance causes a bottleneck effect, accelerating mitochondrial evolution.
  6. mtDNA mutations may undergo positive selection due to their role in cellular respiration.

This last point is important.

Like other mitochondrial protein-coding genes, COI is primarily under purifying selection, which weeds out harmful alleles to maintain the existing genetic makeup. This selection is reflected in the rarity of amino acid substitutions, indicating non-neutral evolutionary pressure to retain a specific sequence.

However, COI evolution patterns may vary across different taxa. The rates of substitutions, insertions, deletions, and particularly rare mutations vary among studies and taxa, and they play a crucial role in gauging genetic divergence between species, thereby aiding species identification.

The discrepancy between the fast mutation rate of mitochondrial genes and the rare amino acid substitutions in COI due to purifying selection is reconciled by mutation and selection dynamics. The higher mutation rate in mtDNA amasses genetic variation, while purifying selection removes harmful mutations, preserving crucial functional aspects of the COI gene across species. This balance ensures the COI gene’s stability as a DNA barcode, maintaining a relatively conserved sequence within species, yet allowing sufficient variability for accurate species discrimination.
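One way to see this balance in sequence data is to count where the differences between two coding sequences fall within each codon. Changes at the third codon position are often silent, so under purifying selection they tend to dominate. The sketch below is a simplified illustration: the two sequences are invented, the function name `differences_by_codon_position` is our own, and the code assumes a gap-free alignment that starts at the first base of a codon.

```python
from collections import Counter


def differences_by_codon_position(seq_a: str, seq_b: str) -> Counter:
    """Count mismatches at codon positions 1, 2 and 3.

    Assumes both sequences are aligned, gap-free, equal in length,
    and begin at the first base of a codon (reading frame 0).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter({1: 0, 2: 0, 3: 0})
    for index, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a != b:
            counts[index % 3 + 1] += 1
    return counts


# Hypothetical coding fragments, invented for illustration only.
# Codons used: GCN (Ala), GGN (Gly), CCN (Pro), ACN (Thr), GTN (Val).
# These are four-fold degenerate in the standard and vertebrate
# mitochondrial genetic codes, so the third base does not change the
# amino acid.
seq_x = "GCAGGACCAACAGTA"
seq_y = "GCTGGACCCACTGTA"

counts = differences_by_codon_position(seq_x, seq_y)
print(dict(counts))
```

Here all three differences fall on third positions, so the two fragments differ at 3 of 15 nucleotides yet encode the same five amino acids. Real COI data is messier, but the same pattern of nucleotide change with little protein change is what the paragraph above describes.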

The growth in COI adoption swiftly established it as the standard barcode, becoming indispensable for large-scale biodiversity studies and international collaborations. This standardization catalyzed a universal method for species identification, setting off a virtuous cycle. It led to the establishment of extensive databases like the Barcode of Life Data Systems (BOLD), which now houses COI sequences from a myriad of species, thereby streamlining the comparison and identification processes further.

The QR-Code of Life

While COI serves as a valuable barcode of life, it exhibits certain limitations due to evolutionary variances across different groups. To address these, researchers have explored other mitochondrial genes to provide a more robust and precise methodology for species identification.

Historically, rRNA genes, especially those grouped together, have been used in molecular systematics, providing a rich dataset for analyzing species relationships. Notably, mitochondrial 16S and 12S ribosomal RNA (rRNA) genes have been significant. Depending on the taxonomic group, the 12S rRNA gene has often been found to be more suitable than the 16S rRNA gene in supporting the monophyly of certain clades. Additionally, the 18S rRNA gene has been utilized as a key marker for molecular systematics in several groups (see Yang et al., 2014; Chan et al., 2020, 2022).

Like COI, these genes are chosen for their conserved regions, which allow for the development of universal primers, and for their variable regions, which help in distinguishing between different species or groups. Moreover, ribosomal RNA genes have a critical role in the ribosome and are present in all known life forms, which also makes them universal markers for molecular systematics.

This assembly of genes forms the ‘QR-Code of Life’. Much like a QR code encapsulates diverse data within a single matrix, each gene in the QR-Code of Life contributes unique informational facets, rendering a more nuanced portrayal of species identity and evolutionary relationships.
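A minimal sketch of how several markers could be combined is shown below. It pools mismatches over every marker shared by a query and a reference record and picks the closest reference. Everything here is an assumption made for illustration: the sequences are invented, the marker labels ("COI", "12S", "16S") are just dictionary keys, the function names are our own, and a simple length-weighted distance stands in for the more careful models a real analysis would use.

```python
def combined_distance(query: dict[str, str], reference: dict[str, str]) -> float:
    """Pooled p-distance over all markers present in both records.

    Assumes each shared marker is already aligned to equal length.
    """
    shared = sorted(set(query) & set(reference))
    if not shared:
        raise ValueError("no shared markers to compare")
    differences = 0
    sites = 0
    for marker in shared:
        a, b = query[marker], reference[marker]
        if len(a) != len(b):
            raise ValueError(f"marker {marker} is not aligned")
        differences += sum(x != y for x, y in zip(a, b))
        sites += len(a)
    return differences / sites


def best_match(query: dict[str, str], library: dict[str, dict[str, str]]) -> str:
    """Name of the library record with the smallest combined distance."""
    return min(library, key=lambda name: combined_distance(query, library[name]))


# Hypothetical reference library and query, invented for illustration only.
library = {
    "species_A": {"COI": "ATGGCACTAT", "12S": "CAAAGGTTTG", "16S": "GCCTGTTTAC"},
    "species_B": {"COI": "ATGGCTCTAT", "12S": "CAAAGCTTTA", "16S": "GCCTATTTAC"},
}
query = {"COI": "ATGGCCCTAT", "12S": "CAAAGCTTTA", "16S": "GCCTATTTAC"}

# With COI alone the query is equally distant from both species...
coi_only = {"COI": query["COI"]}
coi_a = combined_distance(coi_only, library["species_A"])
coi_b = combined_distance(coi_only, library["species_B"])
print(f"COI only: A={coi_a:.2f}  B={coi_b:.2f}")

# ...while the added rRNA markers break the tie.
print("all markers:", best_match(query, library))
```

In this toy case COI on its own cannot separate the two candidates, while the 12S and 16S fragments point to species_B. That is the intuition behind the QR-code analogy: each extra gene adds information that a single barcode may lack.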

The ultimate solution for biodiversity identification.