Common questions in molecular biology: What is DNA barcoding and why is it important?
DNA barcoding is used in two related but distinct ways in biology. While important to different areas of biology, both are modeled on the concept of the Universal Product Codes (UPCs) used to identify products in supermarkets, with specific DNA sequences acting as the unique identifier. The importance of each is discussed here, along with the NanoString adaptation of barcoding into a research tool that can be used for biomarker validation and biomarker development.
DNA barcodes based on naturally-occurring sequences
The original use for DNA barcoding referred to naturally-occurring DNA sequences from a signature region of the genome for making species-level identifications (Hebert). In this version of DNA barcoding, species-specific sequences from unidentified individuals are compared to reference sequence libraries compiled by experts (Hebert). The species-specific sequences act as DNA barcodes, analogous to UPC symbols on retail products.
Kress and Erickson identified three criteria a gene region must satisfy “to be practical as a DNA barcode: (i) contain significant species-level genetic variability and divergence, (ii) possess conserved flanking sites for developing universal PCR primers for wide taxonomic application, and (iii) have a short sequence length so as to facilitate current capabilities of DNA extraction and amplification” (Kress).
The initial genomic region that fit these criteria was the mitochondrial cytochrome c oxidase subunit 1 (COI) gene. The COI gene region was found to be conserved across species of fish while retaining enough species-specific sequence variation for identification purposes. The pertinent COI sequence is a 648-bp region that, once amplified with the appropriate PCR primer set, can be sequenced and compared to libraries of reference sequences (Ratnasingham and Hebert, 2007).
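The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular barcode database: it assumes the query and reference sequences are already aligned to the same length, scores them by simple percent identity, and uses toy 20-bp sequences rather than real 648-bp COI barcodes. The names `percent_identity`, `identify`, `REFERENCE_LIBRARY`, and the 0.9 threshold are all hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return matches / len(a)


def identify(query: str, library: dict, min_identity: float = 0.9):
    """Return (species, identity) for the best match in the library.

    The species is None when no reference reaches min_identity,
    an arbitrary cutoff chosen only for this toy example.
    """
    best_name, best_score = None, 0.0
    for name, reference in library.items():
        score = percent_identity(query, reference)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_identity:
        return None, best_score
    return best_name, best_score


# Toy reference library; the sequences are invented, not real COI data.
REFERENCE_LIBRARY = {
    "Species A": "ACGTACGTACGTACGTACGT",
    "Species B": "ACGTTCGTACCTACGAACGT",
}

if __name__ == "__main__":
    species, identity = identify("ACGTACGTACGTACGTACGA", REFERENCE_LIBRARY)
    print(species, round(identity, 2))
```

In practice, barcode identification relies on proper sequence alignment and curated reference libraries; the fixed-length comparison here only illustrates the idea of matching an unknown sequence to its closest reference.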
Subsequent genomic regions identified for use as DNA barcodes include the internal transcribed spacer rRNA (ITS1-2) regions in fungi (Druzhinina), the ribulose-bisphosphate carboxylase (rbcL) and maturase K (matK) regions in plants, and the 16S rRNA gene in prokaryotes (McGee). These naturally-occurring DNA barcodes allow users to efficiently categorize unidentified individuals using information from small genomic regions, a method which, theoretically, can be applied to all species of life (Kress). Barcoding therefore has the potential to “accelerate our discovery of new species, improve the quality of taxonomic information, and make this information readily available to non-taxonomists and researchers outside of major collection centers” (Miller 2007).
Engineered DNA barcodes
The idea of using DNA sequences as unique identifiers inspired scientists to use genetic engineering and nucleic acid sequencing technologies to develop high-throughput lineage tracking methods (Johnson).
Much as natural barcodes identify species, engineered DNA barcodes are used to identify signature-tagged mutants from libraries (Mazurkiewicz), the source of sequences in multiplexed high-throughput pyrosequencing (Parameswaran), and insertions from viral vectors (Chen). Engineered DNA barcodes are also used to track cell lineages within a population over multiple generations. Known as barcode lineage tracking (BLT), these techniques are used to characterize T-cell recruitment, trace cellular differentiation during organismal development, study the clonal history of metastasis in cancer, screen and characterize mutant libraries, and study evolutionary dynamics (Johnson).
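The core bookkeeping behind lineage tracking can be illustrated with a short Python sketch: count how often each barcode appears in the sequencing reads from each time point, then compare the relative frequencies. The sketch assumes, purely for illustration, that the barcode occupies the first 8 bases of every read; real protocols differ in barcode position and length, and they also correct for sequencing errors. The reads, `BARCODE_LENGTH`, and function names are hypothetical.

```python
from collections import Counter

BARCODE_LENGTH = 8  # hypothetical; assumes the barcode starts each read


def count_barcodes(reads, barcode_length=BARCODE_LENGTH):
    """Count reads per barcode, skipping reads too short to hold one."""
    return Counter(
        read[:barcode_length] for read in reads if len(read) >= barcode_length
    )


def lineage_frequencies(counts):
    """Convert raw barcode counts into relative lineage frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {barcode: n / total for barcode, n in counts.items()}


# Invented reads sampled at two time points from the same population.
generation_0 = ["AAAACCCCGGTT", "AAAACCCCTTAA", "GGGGTTTTACGT", "GGGGTTTTCCAA"]
generation_10 = ["AAAACCCCGGTT", "AAAACCCCTTAA", "AAAACCCCACGT", "GGGGTTTTCCAA"]

if __name__ == "__main__":
    for label, reads in [("gen 0", generation_0), ("gen 10", generation_10)]:
        print(label, lineage_frequencies(count_barcodes(reads)))
```

A lineage whose barcode frequency rises between time points, as `AAAACCCC` does in this toy data, is expanding relative to the rest of the population.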
Essentially, engineered DNA barcodes as unique identifiers are applicable to any technique using recombinant DNA.
NanoString expands the definition of DNA barcoding
Expanding on the idea of engineered DNA barcoding, technology from NanoString incorporates RNA and fluorescent molecules with DNA to form probes, more aptly called molecular barcodes, for the nCounter® Pro Analysis System.
NanoString’s molecular barcodes use fluorescently-labeled RNA as the identifier molecule, with a unique color sequence for each gene of interest. Annealed to a complementary backbone of single-stranded DNA that is ligated to a target-specific sequence, the fluorescent RNA-DNA hybrid molecule constitutes the reporter half of a probe pair (Geiss). The second half of the probe pair, the capture probe, consists of a gene-specific sequence adjacent to that of the reporter probe and a series of 3’ repeats (Geiss).
The reporter and capture probes, when hybridized to the RNA target molecule, create a tripartite structure, which can be identified by the specific fluorescent signal of the bound reporter. Designed to target specific gene subsets, NanoString’s probe panels enable quantification of target RNA transcripts by a direct count of the fluorescent signals (Geiss). The fluorescent molecular barcoded probes used in nCounter Pro eliminate several issues seen with previous gene expression analysis methods (Geiss).
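The idea of quantifying transcripts by direct counting can be illustrated with a small Python sketch. It is a conceptual toy, not NanoString's software or data format: each detected reporter is represented as a tuple of color labels, a lookup table maps each color sequence to a gene, and the transcript count for a gene is simply the number of reporters carrying its code. The codebook, color labels, code length, and gene names are all hypothetical.

```python
from collections import Counter

# Hypothetical codebook: one unique color sequence per gene of interest.
CODEBOOK = {
    ("green", "red", "blue", "yellow"): "GENE_A",
    ("red", "blue", "green", "red"): "GENE_B",
    ("yellow", "green", "red", "blue"): "GENE_C",
}


def count_transcripts(observed_codes, codebook=CODEBOOK):
    """Tally detected reporters per gene; unrecognized codes are ignored."""
    counts = Counter()
    for code in observed_codes:
        gene = codebook.get(tuple(code))
        if gene is not None:
            counts[gene] += 1
    return counts


# Invented list of color sequences read from imaged reporters.
observed = [
    ("green", "red", "blue", "yellow"),
    ("red", "blue", "green", "red"),
    ("green", "red", "blue", "yellow"),
    ("blue", "blue", "blue", "blue"),  # not in the codebook, so skipped
]

if __name__ == "__main__":
    print(dict(count_transcripts(observed)))
```

Because every bound reporter is counted individually, the result is a digital tally per gene rather than an analog signal intensity.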
NanoString’s molecular barcoding technology is highly adaptable. The original barcode concept, used for bulk gene expression analysis in nCounter Pro, has been modified to enable transcriptomic analysis in a spatial context. Molecular barcodes for both GeoMx® DSP and CosMx™ SMI use specific, unique sequences of colored fluorescent molecules to identify and quantitate target transcripts in three-dimensional tissue space, down to the subcellular level.
Molecular barcode technology has thus proven to be adaptable for use in multiple techniques, including gene expression profiling, for advancing biology research.