Quantity in Search of Quality: Los Alamos National Laboratory Makes a Case for Improving Algae Genome Data

Bioenergy Technologies Office

June 15, 2022

Author: Sheila Van Cuyk, Laboratory Relationship Manager for BETO Programs at Los Alamos National Laboratory

The potential for advancing algal biofuels and bioproducts relies on using algae strains that are best suited for industrial production. Genomic sequence data—the functional information in the DNA of a specific organism such as algae—can reveal the genes and regulatory mechanisms that control how a given strain grows and responds to stress. By screening genomes from a vast array of diverse algae, scientists can unlock the secrets of how to cultivate rapidly growing, high-quality strain compositions. High-quality genomes include no gaps in their sequence and accurately reflect all of the DNA in the strain.

The importance of genomic information for optimizing biomanufacturing, a process that uses the growth of plants and/or micro-organisms (e.g., algae, yeast, or bacteria) to create bioproducts, is well known and widely accepted. However, analyses by scientists from the Los Alamos National Laboratory (LANL) suggest that, as the availability of algae genomes expands, the data about novel algae genome sequences are becoming increasingly unreliable and may leave out critical information about the strains’ DNA. Although there are more algae sequences available now than ever before, the quality of the genomic data published in the literature is increasingly inconsistent, with more gaps and mistakes. This lack of quality can misrepresent which genes, and therefore which functions, are available in a given species.

Microalgae culture growing in a custom photobioreactor system designed and built at Los Alamos National Laboratory. Image courtesy of Los Alamos National Laboratory.

Data, Sequencing, and Databases Make Algae More Accessible

The natural variety of algae springs from its vast genetic diversity and complexity. The identification of genes and pathways in different strains presents many opportunities for the development of biofuels, bioproducts, and even therapeutics. By identifying key genes through genomic sequencing, new species of algae can be tapped for these applications. Over the last decade, genome sequencing has become “democratized” as sequencing machines have become faster and less expensive and the resulting data have become more readily shareable through publicly available databases. Such databases make it possible to quickly analyze a wide breadth of algal biodiversity.

“We hope to use this data to better understand algal biology and evolution,” said LANL evolutionary biologist Erik Hanschen. “We also hope to discover novel proteins, biochemical pathways, and untapped natural products by studying these algal genomes.”

Not All Genome Sequencing Is Created Equal

Hanschen, along with Blake Hovde, LANL computational biologist and Applied Genomics Team Leader, and Shawn Starkenburg, LANL deputy group leader, recently published a technical paper and a review article on the current state of public databases and the accepted methodology for assessing genomes. The researchers evaluated algal genomes for contiguity, meaning how completely the sequence is assembled into long, unbroken stretches, as well as for gene content.
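To make the idea of contiguity concrete, here is a minimal sketch of N50, one widely used contiguity statistic. The post does not say which contiguity measures the researchers used, so N50 is an assumption chosen for illustration, and the function name and the contig lengths are hypothetical. N50 is the length of the contig at which, counting from the longest contig down, at least half of the total assembled sequence has been covered.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths (in base pairs).

    Illustrative only: N50 is one common contiguity statistic, not
    necessarily the measure used in the papers discussed in this post.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")

    half_total = sum(lengths) / 2
    running_total = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length


# Two hypothetical assemblies with the same total length (300 bp):
contiguous = [100, 80, 50, 30, 20, 20]  # a few long contigs
fragmented = [10] * 30                  # many short contigs

print(n50(contiguous))
print(n50(fragmented))
```

Both toy assemblies contain the same amount of sequence, but the one built from longer pieces scores higher, which is why a statistic like this says something about gaps that a raw total length cannot.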

In addition to concerns about contiguity, the authors describe a benchmarking tool called Benchmarking Universal Single-Copy Orthologs (BUSCO) that helps evaluate gene content. This tool helps determine how many genes from a well-curated gene set exist in a given species’ sequence, demonstrating whether that sequence is high or low quality. By comparing against the known reference data, novel genomes can be assessed for quality by tracking how many of the known genes appear in the novel sequence data: high-quality genomes will include a significant number of known genes while lower quality genomes have fewer of these genes.
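The counting behind a gene-content score of this kind can be sketched in a few lines. This is not BUSCO's implementation, only an illustration of the arithmetic: each gene in the reference set is given one of the four status categories BUSCO reports (complete and single-copy, complete and duplicated, fragmented, or missing), and the score is the percentage in each category. The function name, the gene IDs, and the toy statuses below are hypothetical.

```python
from collections import Counter

# The four status categories reported by BUSCO.
STATUSES = ("single", "duplicated", "fragmented", "missing")


def summarize_marker_completeness(marker_status):
    """Summarize gene-content completeness as percentages.

    marker_status maps each reference gene ID to one of STATUSES.
    Illustrative sketch only; this is not how BUSCO itself is implemented.
    """
    total = len(marker_status)
    if total == 0:
        raise ValueError("the reference gene set is empty")

    counts = Counter(marker_status.values())
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unrecognized status labels: {sorted(unknown)}")

    def percent(n):
        return 100.0 * n / total

    return {
        # "Complete" covers both single-copy and duplicated genes.
        "complete": percent(counts["single"] + counts["duplicated"]),
        "single": percent(counts["single"]),
        "duplicated": percent(counts["duplicated"]),
        "fragmented": percent(counts["fragmented"]),
        "missing": percent(counts["missing"]),
        "n": total,
    }


# A hypothetical novel genome scored against a 10-gene reference set.
toy_genome = {f"gene_{i}": "single" for i in range(7)}
toy_genome.update(
    {"gene_7": "duplicated", "gene_8": "fragmented", "gene_9": "missing"}
)

print(summarize_marker_completeness(toy_genome))
```

In this toy case 8 of the 10 reference genes are found in full, so the genome scores 80 percent complete. A real lineage-specific reference set contains far more genes, but the interpretation is the same: the higher the share of expected genes that are found intact, the more confidence there is that the sequence captures the strain's gene content.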

“Algae strains of interest that have low-quality genomes are more likely to have overlooked genes and pathways, information which may be critical to the question we’re investigating,” said Hanschen. “A genome with an incomplete read of the genes gives a misleading view of the actual genetic pathways available and functioning in the strain.”

Improving Public Databases: When More Is Sometimes Less

BUSCO has proved to be a helpful quality assurance tool for specific algal lineages, such as Chlorophyta for green algae and Stramenopiles for brown algae, but it is not reliable for all algal genomes. To improve the quality of public databases, the researchers offered additional strategies, such as using specific sequencing technologies that assemble longer contigs and scaffolds whenever possible. Until additional lineage-specific databases are developed, the researchers promote the use of the Eukaryota database of BUSCO genes, as it is intended for all eukaryotes. However, it is clear that lineage-specific datasets are the most valuable and reliable.

Hanschen emphasized the need for high-quality data sets: “Such high-quality data sets, which could look like completed genomes or even provide missing annotations, will be really useful to our own work, but also provide a roadmap for others to produce similarly high-quality data sets.”

The priority for future algal genomics is clear: the quality of data is just as important as the quantity of data.

Funding and Mission

This research was funded by the U.S. Department of Energy Office of Energy Efficiency and Renewable Energy Bioenergy Technologies Office. The work supports the Complex Natural and Engineered Systems and Science of Signatures capability pillars.

References:

Erik R. Hanschen, Shawn R. Starkenburg. The State of Algal Genome Quality and Diversity. Algal Research 50 (2020) 101968. https://doi.org/10.1016/j.algal.2020.101968

Erik R. Hanschen, Blake T. Hovde, Shawn R. Starkenburg. An Evaluation of Methodology to Determine Algal Genome Completeness. Algal Research 51 (2020) 102019. https://doi.org/10.1016/j.algal.2020.102019

Dr. Sheila Van Cuyk

Dr. Sheila Van Cuyk is the Laboratory Relationship Manager for Bioenergy Technologies Office programs at Los Alamos National Laboratory (LANL). She is a scientist in the Bioscience Division, LANL’s Biofuels Program Manager in the Applied Energy Program Office, and a National Security and Defense Program Manager for Global Security.

Sheila has a background in molecular biology and environmental engineering and has over 15 years of experience developing innovative interdisciplinary solutions to complex problems. Prior to her current role, she fulfilled an Intergovernmental Personnel Act billet as a Program Manager at the Department of Homeland Security, Science and Technology Directorate working in the areas of biological threat detection and biosurveillance.

Sheila completed her M.S. and Ph.D. in environmental engineering from the Colorado School of Mines and her B.S. in biology from the College of William and Mary.