
The Systems Biology Knowledgebase (KBase) Advances Science as a PuRe Data Resource

KBase enables researchers to explore complex genomic and other biological data quickly and efficiently.

Office of Science

May 29, 2024
A group of more than 30 people of various genders and races, almost all wearing matching sweatshirts that have the KBase logo on them. They are standing in a stone plaza with a small wooden structure behind them and, further back, a sculpture of a praying mantis on a roof.
The KBase team works to provide researchers with tools to explore diverse and complex data rapidly and efficiently. At their annual All-Hands meeting, the group discusses past successes, current issues, and future projects.
Image courtesy of Thor Swift

As part of the Year of Open Science, the Department of Energy (DOE) Office of Science is highlighting our Public Reusable Research (PuRe) Data Resources. The PuRe Data Resources are authoritative sources that make data easier to find, access, and reuse across the broader scientific community. In this article, we highlight the Systems Biology Knowledgebase (KBase).

 

The Human Genome Project was a 13-year effort led by the U.S. government. It culminated in the sequencing of the first complete human genome. Completed in 2003, this effort spurred genome sequencing efforts for many other species, leading to an explosion of data in biological research. Suddenly, scientists were able to generate vast amounts of genetic data that hinted at what organisms were capable of. The availability of genomic information completely changed our view of how plants, animals, and microbes coexist. The Department of Energy’s (DOE) Systems Biology Knowledgebase (KBase) is helping scientists organize and use these data effectively.

Advances in genetic sequencing from the late 1990s to the early 2000s made clear that larger organisms (including plants and humans) truly live in a world dominated by microbes. At the same time, sampling and sequencing techniques advanced significantly. Now, scientists can better understand microbes in their natural environments.

Decoding these interactions between microbes and other organisms requires a holistic viewpoint. This viewpoint must connect an organism’s genes to its role in the ecosystem, individually and collectively. This approach requires many different types of data to capture the complexity of these ecosystems. This new understanding, combined with advancements in technology, has resulted in increasingly complex data streams of ever greater size. The complexity and volume of these data are challenging our ability to process, analyze, and make sense of them. How do we reflect these data back to the level of microbial communities? How can we understand what each microbe is doing among its neighbors and how it interacts with its environment?

KBase enables researchers to explore diverse and complex data rapidly and efficiently. It provides analysis tools that use DOE’s powerful computing resources. As researchers explore data and new insights, they can share their data and analyses with collaborators and the broader research community. KBase’s free, online interface reduces access barriers for anyone interested in biological data science. 

KBase has over 28,000 users from all over the globe. They have contributed over 500 terabytes of their own data in addition to 12 terabytes of reference data. (One terabyte, or TB, is one million megabytes.) All the analyses run by KBase users add up to over five million central processing unit (CPU) hours of total computing time. These analyses range in size from processing a few megabytes to hundreds of gigabytes. (One gigabyte, or GB, is one thousand megabytes.) They may describe a single genome or show the power of analyzing thousands of genomes.
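For readers who want to check the scale of these storage figures, here is a minimal sketch of the unit arithmetic. It assumes the decimal definitions given in the article (one TB is one million megabytes, one GB is one thousand megabytes). The function and variable names are illustrative only and are not part of any KBase tool.

```python
# Decimal unit sizes expressed in megabytes, as defined in the article.
MB_PER_GB = 1_000
MB_PER_TB = 1_000_000


def tb_to_mb(terabytes):
    """Convert terabytes to megabytes (decimal units)."""
    return terabytes * MB_PER_TB


def tb_to_gb(terabytes):
    """Convert terabytes to gigabytes (decimal units)."""
    return terabytes * MB_PER_TB / MB_PER_GB


# Figures quoted in the article; both are lower bounds or rounded values.
user_data_tb = 500
reference_data_tb = 12

print("User data in MB:", tb_to_mb(user_data_tb))
print("Reference data in GB:", tb_to_gb(reference_data_tb))
```

Under these definitions, the user-contributed data alone comes to more than 500 million megabytes.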

The KBase platform is designed to support the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles. It also has tools to track and report credit where credit is due. Any contributions to the platform that are made public are credited back to the data owner. This allows KBase to report on the impact of shared data and analyses, including research publications. KBase makes the entire data lifecycle public and transparent. This approach encourages scientists to draw conclusions that are supported by data and analyses.

KBase’s powerful analysis tools and transparent data systems are accelerating discovery and increasing trust in scientific results.

Michael Cooke

Michael Cooke is a senior technical advisor for the Office of the Deputy Director for Science Programs. He leads the coordination of efforts to develop and steward community open research data resources across the Office of Science.

Tags:
  • Biotechnology
  • Genomics
  • National Labs
  • Research, Technology, and Economic Security
  • Explore Biology at DOE (Biology)