Genome To Life Report

BRINGING THE GENOME TO LIFE

- Energy Related Biology in the New Genomic World -

A New Research Program
for the
Department of Energy’s
Office of Biological and Environmental Research
recommended by the
Biological and Environmental Research Advisory Committee
June 2000

In response to the charge letter of Dr. Martha Krebs, November 24, 1999

OVERVIEW

Mission:

To develop the capability of understanding and predicting the behavior of single cells and multi-cellular organisms in response to biological and environmental cues – the next major challenge of the Genome Project. In this knowledge lie solutions to key Department of Energy (DOE) missions related to energy, the environment and human susceptibilities.

Vision:

The remarkable successes of the Human Genome Project, the roots of which can be traced to an initiative by the DOE begun in 1986, provide the foundation upon which to build powerful new tools and resources that will lead to some of the most important biological discoveries in the history of life on Earth. For the first time we have the opportunity to understand cells, their components and the organisms they comprise, in enough detail to predict, test and understand responses of a biological system to its environment.

The DOE Microbial Cell Project has already launched the fundamental first step by pioneering the study of one-celled microbes. The Bringing the Genome to Life Project takes another big step in building upon the Microbial Cell Project – understanding the regulation and behavior of complex microbial communities and multicellular systems and the responses of biological systems to environmental cues, understanding that may well be essential for our future survival in this fragile world. In addition, this research promises unimaginable discoveries for biotechnology, pharmaceuticals and medicine. The succeeding steps of the project will lead ultimately to new tools for promotion of human health, for new therapies and for new predictive capability of human susceptibilities to a wide range of environmental insults.

BERAC Recommended Program Goals:

Characterize regulatory elements of genes by extensive, phylogenetic sequence comparisons and by cataloging their protein regulators;
Identify genes and their functions by contributing to global mutagenesis and sequence comparisons between organisms of different phylogenies;
Exploit structure for determining function;
Characterize gene expression networks by developing global-array technologies and other technologies for analysis of gene products;
Model gene expression networks, including response to developmental cues, physiological states and environmental insults;
Aggressively develop the informatics tools and computing power that are essential for each of these areas of investigation.

The goals are very ambitious and undoubtedly represent many decades of biological research to come. Attainment of these goals will require aggressive support of technology development through interdisciplinary efforts, especially in the area of bio-informatics.

A decade-long commitment to this project with an investment of $200 million per year will provide a substantial start, but progress will depend heavily on how effectively the scientific community can scale research and analysis technologies to a global genomic or system scale and on the level of resources made available. However, from the beginning, the vast amounts of data that will be publicly available will be of immense value, especially since they will dovetail with and be enriched by data from science funded by other agencies.

Program Strategies, Assumptions, Challenges and Principles:

Overriding Goal – The goal of this long-term research program is to understand biology well enough to be able to understand and predict the behavior and response of biological systems – from cells to organisms. This goal is one of the founding challenges made to the Biological and Environmental Research (BER) program in 1947, i.e., understanding (and, therefore, being able to predict) the health impacts of nuclear materials and other energy related insults.
Key Questions – What are the scientific opportunities and the technical challenges? What is the role of DOE relative to other agencies? How does this research agenda map to the DOE missions? What are the management requirements? What cross-discipline training is needed? How do we engage labs and academia to train these specialists? What are the ethical, legal and social issues associated with the new knowledge and biological capabilities that will arise from such research?
Key Requirements – This new program will require a new genomic or system level approach to biology. It will require new types of scientists, new technical capabilities, and remarkable computational capabilities that simply do not exist today. For years, biologists have very productively engaged in a one-at-a-time approach to studying how genes or small networks of genes work. To reach a predictive understanding we cannot afford this pace. We need a global effort to understand whole organism networks.
Building on Past Success – The genome sequencing effort has shown us that high throughput methods and interdisciplinary collaborations are key and multi-institution collaboration is critical. The success of the JGI shows the value of a focused team of scientists working toward achieving common goals. This approach should be pursued when there is clear delineation of suitable, specific problems.
Computation, a Great Unifier – Computational analysis and modeling are fundamental to understanding the complexity of biological systems. Greatly improved computational strategies, tools and resources will be needed. We need to ensure that there is an active linkage between modeling and experimentation.
Central Organizing Principles – The systems approach to biology with a focus on modeling is a powerful driver. Novel facilities are needed to generate and manipulate the new data - data sets that would never get produced by the cottage industry approach to biology.
Unique DOE Opportunities – The DOE should aim for activities that are out of reach of individual investigators or even small teams. There are unique opportunities in interagency coordination, integrated management, new technology innovation and transition to use in production-scale experimental approaches. The DOE needs to create an environment to attract and sustain new research teams by leveraging its existing strengths.

DETAILED DISCUSSION OF RESEARCH NEEDS AND OPPORTUNITIES

The following sections contain summaries of many needs and opportunities identified by BERAC for this new research program. These are not, in general, intended to represent DOE-only activities, but rather the spectrum of activities that will be undertaken and coordinated across federal agencies. A first priority for BER, following the approval of this program plan, is to develop a detailed action plan using these materials as a guide and with broad input from the scientific community.

Overview:

This project will develop a network of national resources that will drive the development of technology and produce extensive data for simulating and predicting the workings and responses of cells. Knowledge will be created to:

Understand the internal functions of cells through their genes and gene products;
Understand the regulation and interactions of genes and gene products;
Understand the interactions and communications between cells;
Predict biological responses and human susceptibility to environmental stimuli including the insults of chemicals and radiation;
Adapt the sophistication of biological systems to man-made systems and biological machines.

These goals push the envelope of the life sciences, requiring new talents and new interdisciplinary fertilization. They can be achieved in part through partnerships among the DOE National Laboratories, taking advantage of the combined power of the unique capabilities of each. But to realize the full power of the project, full collaboration with the academic and commercial communities will be essential.

Biological systems, through evolution, have achieved levels of complexity that are far beyond what has been achieved by engineers; for instance, the complexity of cellular systems and the human brain dwarfs our most sophisticated engineering feats.

In order to understand biological systems at this next level, we must engage the most advanced research talents and resources. The potential payoff is enormous: an understanding of the underlying concepts that have resulted from four billion years of evolutionary tinkering will provide tools to revolutionize the study of human health. Similarly, the design principles used by nature to evolve biological systems can be adapted for use by today’s engineers to develop a new generation of man-made biological machines tailored to carry out specific functions such as degrading foreign compounds or synthesizing new medicines.

Beyond Genome Sequencing:

Determining the complete DNA sequence of the human genome, the "Blueprint of Life," provides the first underpinnings of an effort to understand life processes. Comparisons of genomic blueprints for simple and complex organisms show a remarkable conservation of life functions from single cell organisms to complex integrated mammals including humans. This conservation not only demonstrates that all life on earth is related, but it also provides a powerful tool – insight gained into the workings of more simple organisms helps to understand the more complex.

With the sequence of the human genome and of a number of other organisms and many microbes soon to be completed, a full catalog of genes for each will be available for detailed analysis. However, a list of the few thousand genes in a microbe or of the 100,000 genes in humans tells only the first part of a highly complex story. In the case of humans, each of the thousands of unique cell types is distinguished from the others through the particular subset of genes that it expresses as proteins at carefully regulated levels. These levels are programmed to change in response to development, changing environment, aging, damage and threats. The result is a complex orchestration of proteins, RNA molecules, carbohydrates and small molecules that dynamically interact to create the fabric of all life on earth.

The challenge is to move systematically from genome information, the "blueprint" level, to a higher-level understanding of biological function and complexity that will enable us to predict biological responses. Attaining this higher level will require that we understand the complex web of biochemical reactions and interactions that are at the heart of cellular reproduction, energy metabolism, and biological response to both internal and external cues.

A New Commitment: Bringing the Genome to Life Project:

We are in the midst of a remarkable convergence of the biological and the physical sciences that signals the time for a major push to unlock the secrets of bringing the genome to life. Current technologies enable the analysis of biological molecules at a fundamental structural and physical chemical level. The data flow resulting from the application of these tools to the legion of newly revealed genes and gene products is increasing exponentially.

Science is now poised to make the key move from its focus on understanding one pathway at a time (although this approach continues to be very productive and is essential) to the integration of pathways that will lead towards an understanding of living systems. One ultimate test for our understanding of complex biological networks is to model their behavior through simulation and iterative testing. Another test is to directly test sets of genes and their ability to support responses. In the case of microbes, with their simpler genomes, first the minimal gene sets to support cell survival can be defined and tested biologically by the reconstruction of genomes in cells. Then the effect of adding or deleting sets of genes predicted to provide particular functions might be directly assessed. This powerful combination of approaches will lead to a vastly increased understanding of biological networks and to predictive capabilities for many health-related applications.

As this complexity of genes and regulatory information is revealed, science can move forward ever more rapidly with characterization of protein products, gene regulatory mechanisms and complex networks. This will require a continued DOE investment in development of key technologies.

Efficient, high-throughput DNA sequencing is well developed and the DOE has the very effective Joint Genome Institute (JGI) facility that is making major contributions to the genome sequencing effort. Microbial genomes can now be sequenced in a matter of days and larger genomes in tens of months. However, with this efficiency in sequencing comes a crucial need for ever-improved technology for efficient assembly and annotation of sequence – this need for software, databases and computing power will continue to grow with more and more complex searches and comparisons.

Technology for genomic scale analysis of gene expression at the level of RNA transcripts is just coming on line with the general capability to analyze tens of thousands of gene sequences on a single micro-array. Further development is needed in quantitation of expression signals so that small differences in expression of genes in different conditions can be measured reliably. In complex networks even very small perturbations can be very significant. Even more important is the need for improved information handling. Data from expression arrays are multidimensional, with perhaps 100,000 genes assayed in different cell or tissue types, different genotypic states, different physiological states, different developmental states and at different times after perturbation. Finding and correlating the significant differences in expression becomes a very large problem for which new algorithms and computing power are needed.
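
To give a rough sense of the scale described above, the following minimal sketch counts the expression values that a fully crossed study would produce. Only the figure of 100,000 genes comes from the text; every other dimension size, and the function name, is a hypothetical value chosen purely for illustration.

```python
from math import prod

# Only the gene count is taken from the report. Every other
# dimension size below is a hypothetical, illustrative value.
dimensions = {
    "genes": 100_000,
    "cell_or_tissue_types": 50,
    "genotypic_states": 10,
    "physiological_states": 10,
    "developmental_states": 10,
    "time_points": 10,
}

def total_measurements(dims):
    """Number of expression values if every combination is assayed once."""
    return prod(dims.values())

if __name__ == "__main__":
    n = total_measurements(dimensions)
    print(f"{n:,} expression values in a fully crossed design")
```

Even with these modest assumed sizes the count runs to tens of billions of values, which is why the correlation step, rather than the measurement itself, is likely to dominate the computing need.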

Equivalent high-throughput technology for global analysis of protein expression is badly needed. Standard biochemical characterizations of one protein at a time are not going to be sufficient. To reach the required scale, technology development is needed in at least two areas of proteome analysis. Emerging mass spectrometric methods and micro-arrays with protein-specific affinity tags hold great promise but need to be aggressively pursued.

Protein structures are being determined at a rate that in the last few years is vastly improved, thanks to advances in Nuclear Magnetic Resonance (NMR) and synchrotron radiation-based methods and instrumentation. DOE, in partnership with other funding agencies, especially the National Institutes of Health (NIH) and the National Science Foundation (NSF), has played a major role in the synchrotron structural biology revolution. However, there remain significant bottlenecks in overproduction of cloned proteins, their purification and crystal growth. Robotic, highly parallel methods must be found. Improved methodologies for high throughput structure solution and refinement are essential. Also, major developments are needed for analysis of a greater diversity of protein types, most notably membrane-bound proteins.

Determination of the patterns of protein-protein interactions for whole proteomes is key to understanding cellular pathways. In vivo approaches such as "two hybrid" methods are very powerful but suffer from a relatively high noise level from both false positive and false negative results. Rapidly advancing mass spectrometry technologies are able to directly detect complexes and identify the participants. Various micro-array methods using chips with protein-specific antibodies or aptamers are just beginning to be developed. It is likely that no one technique will provide all the answers and all three will be needed. All of these approaches need further development. Even with clean results from these methods there is no certainty that the observed interactions are biologically meaningful, so it is crucial to couple these technologies to biological systems such as microbes or the mouse.

Predictive understanding has to include determination of protein functions. Inference based on amino acid sequence is often a good start. Biochemical characterization is the standard but this is slow and still does not necessarily reveal the real function in vivo. One powerful approach is to test function by gene modification in the mouse. While the current technology is very sophisticated, permitting modification of any gene at will, it is slow and expensive. A complementary approach is through structure and structure-based comparisons. Investment in technology to improve efficiency is very important.

Modeling of biological networks is in its infancy. Modeling the interactions of even a few genes in a regulatory pathway pushes current algorithms and computing power to the limit. Aggressive support of modeling efforts right at the beginning is imperative. Already modelers cannot cope with the influx of gene expression data. This problem will get more severe very quickly with expansion of the breadth of different kinds of biological data. Every effort needs to be made to capture major computing power for this project.
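
One way to see why even small networks strain computation is to count what a model has to consider. The sketch below is illustrative only. It assumes a simplified Boolean (on/off) view of gene activity, which this report does not prescribe, and the function names are hypothetical.

```python
def boolean_state_count(n_genes):
    """Distinct on/off configurations of n genes (assumed Boolean model)."""
    return 2 ** n_genes

def pairwise_interactions(n_genes):
    """Number of unordered gene pairs that could, in principle, interact."""
    return n_genes * (n_genes - 1) // 2

if __name__ == "__main__":
    # Network sizes chosen for illustration, not taken from the report.
    for n in (5, 10, 20, 40):
        print(n, "genes:",
              boolean_state_count(n), "states,",
              pairwise_interactions(n), "possible pairs")
```

Under this assumption the number of pairs grows quadratically while the number of network states doubles with every added gene, so exhaustive exploration quickly becomes impractical and modeling must rely on experimental constraints.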

Research Opportunities and Challenges:

Microbial Ecosphere

Over half of the world's biomass is made up of unseen, simple organisms living below the surface of the earth - microbes in the soil and the oceans. The rest of life as we know it is critically dependent on this largely unknown sector of our living world that is fundamental to the Earth's carbon cycle and holds untold new opportunities for discovery. While we know some microbes intimately and now have genome sequences for a few dozen, we have only begun to explore this rich biodiversity. Only a few percent of the microbes can currently be grown in the laboratory, and these are the ones we know most about. Even among these well-studied microbes, 20% of the genes identified in any newly sequenced organism bear no recognizable resemblance to the known genes of other organisms! Given this fact, it is certain that there is much to be gained from sampling biodiversity in the genomes of additional microbes representing different branches of the evolutionary tree and living in diverse environments. In addition, the vast majority of microbes exist in complex communities, interdependent for survival, undoubtedly signaling back and forth in the sparring cycles of evolution. New technologies of PCR (polymerase chain reaction) and micro-arrays give the added power of being able to study these organisms in their natural environments rather than in the laboratory. These creatures and their communities have much in store for us to discover. One real benefit of this work will be a vast collection of new genes that will be a rich source for new antibiotics, therapeutic agents, energy, materials and industrial catalysts. There will also be abundant opportunities to find new genes that can be engineered for bioremediation and for other new biotechnology applications. Clearly it will be fruitful to obtain genome sequences from many representatives of the under-explored pool of microorganisms.

The microbial world is very important to the DOE missions. The DOE has a great opportunity to continue its leadership role in microbial genomics and to expand this effort into a more global effort. It is well positioned to take on the challenge of understanding the minimal set of genes necessary for microbial life and to model this biological system. Also the DOE is poised to understand the complexities and interactions of microbial communities that can play future crucial roles for DOE missions in areas like environmental cleanup and carbon sequestration.

Genetic Variation

The human genome program will deliver the draft human DNA sequence by summer 2000. For the first time, biologists will have the ability to locate and study all human genes. The mouse genome is not far behind. Its completion will provide crucial annotation to the human sequence and will provide resources needed to develop and evaluate new human disease models. Within our individual genomes lies hidden the constellation of gene variants that determines our individual characteristics. These include not only our obvious phenotypic appearance but also our inherited susceptibilities to common diseases, to chemicals, to drugs and to environmental insults. For each of our 100,000 genes there is a set of variants in the human population; some, but not all, of these are correlated with important biological differences. Understanding the variants of each important gene will lead to a much clearer understanding of human genetic diversity and to a whole new realm of genetic diagnostics and potential therapies. These issues are of particular importance to human health, and, thus, the NIH and private companies are mounting a large effort to assess gene variation and to identify specific sequence variants (SNPs – single nucleotide polymorphisms) that will be of great value for the discovery of the human genes that cause disease.
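
As a minimal illustration of what a single nucleotide variant is, the sketch below lists the positions at which a sample sequence differs from a reference by one base. It assumes the two sequences are already aligned, gap-free and of equal length. The function name and the toy sequences are hypothetical, and real variant discovery involves far more than this comparison.

```python
def single_base_differences(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Assumes both sequences are pre-aligned, gap-free and equal in length.
    Positions are 0-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]

if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    print(single_base_differences("ACGTAC", "ACCTAC"))
```

Whether such a difference counts as a polymorphism in the population sense depends on how often it occurs across many individuals, which this sketch does not address.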

However, the DOE has a special interest in human variation, in the areas of health effects of radiation, the low dose program and susceptibilities to other environmental stresses. New technologies now give DOE the opportunity to understand the genes involved in these exposures and especially the variation in these genes within the human population that accounts for individual, specific sensitivities.

Information in Genomes

All genomes contain two kinds of information: coded information for structures and instructions that orchestrate deployment of this coded information. The first kind of genome information codes for proteins – the molecular building blocks of all cells. Knowing the identities and structures of these building blocks is critical and, with genome sequence information in hand, is now approachable. The goal of the new NIH National Institute of General Medical Sciences (NIGMS) structural genomics effort is to greatly increase the number of protein structures determined. The DOE should continue and expand its support of infrastructure and pursue technological improvements to automate the front-end methods needed to increase throughput.

Gene Regulatory Networks

The protein-coding DNA sequences, which in mammals make up only 5% of the genome, have until now received the vast majority of attention. The second kind of information in genomes – the other 95% – concerns a major mystery of living systems: How is deployment of these proteins managed to create the particular shape, form, size and function of each cell in each living organism? This information is regulatory – sequences in DNA that control the specific expression of each gene in animals, plants and microbes. The challenge presented by complete genome sequences is to decode and understand this highly evolved, complex system of regulatory networks. We recommend that this new project focus heavily on the regulatory DNA sequences and on the proteins that act through these sequences to elaborate the networks.

Comparative Genomics

The method of choice for locating genes and their controlling sequences within the genome sequence is to take advantage of evolution. Although human and mouse diverged some 80 million years ago, the sequences that represent genes and regulatory regions are well-conserved, while the surrounding seas of DNA sequences are free to diverge with no apparent consequences. One very powerful method of finding the biologically important sequences within the human genome, or any organism, is to compare its DNA sequence with those of genomes of other organisms that are suitably distant in evolution. The computational overlay of all the genome sequences will provide a catalogue of conserved fragments that identify both the protein-coding information and the gene-regulatory regions. The more genomes that are sequenced, the finer the resolution and the greater the ability to define exact start-stop positions of the gene fragments. The key is to have sequence from a broad representation of species that will be evolutionarily important and provide the best comparisons with humans. In addition to the current list of model organisms (mouse, zebrafish, fruitfly, nematode, yeast and bacteria) further choices might strategically include another mammal such as the dog (an excellent medical model that is well placed on the mammalian branch of the evolutionary tree), the pufferfish (Fugu, with most of the same genes as humans but packaged in a much smaller genome) and Ciona intestinalis (the most primitive creature known with a notochord thus predating modern vertebrates). Discussions of the evolutionarily important organisms to sequence should be promoted. In any case, it is clear that sequencing costs continue to decrease as capacity increases so that comparative genome sequencing can continue to fuel the explosion in basic biomedical research.
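
The idea of overlaying genome sequences to find conserved fragments can be sketched in a few lines. The example below slides a fixed-size window along two sequences and reports the windows whose fraction of identical bases meets a threshold. It is a simplified illustration, not a description of any production pipeline: it assumes the two sequences are already aligned, gap-free and of equal length, and the function name, window size, threshold and toy sequences are all hypothetical.

```python
def conserved_windows(seq_a, seq_b, window=10, min_identity=0.8):
    """Return (start, identity) for each window meeting the identity threshold.

    Assumes seq_a and seq_b are pre-aligned, gap-free and equal in length.
    Identity is the fraction of positions in the window with matching bases.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        matches = sum(x == y for x, y in zip(a, b))
        identity = matches / window
        if identity >= min_identity:
            hits.append((start, identity))
    return hits

if __name__ == "__main__":
    # Toy sequences invented for illustration: the first eight bases are
    # shared and the remainder has diverged completely.
    species_1 = "ACGTACGTAAAA"
    species_2 = "ACGTACGTTTTT"
    print(conserved_windows(species_1, species_2, window=8, min_identity=1.0))
```

Adding further species would, in the same spirit, mean keeping only the windows that pass the threshold in every pairwise comparison, which is one way to picture why more genomes give finer resolution.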

Understanding Gene Function

Another powerful approach to understanding gene function and regulation is to study the biological effects of specific gene mutations in model organisms. For many basic gene functions, yeast, nematodes and fruitflies are good hosts for such studies, but mouse is the clear choice for traits relevant to human susceptibility and disease. Not only is the mouse a good mammalian model for humans, but well-established and powerful technologies are already in hand and form a crucial part of the "tool kit" for studying gene function. Genes or predicted regulatory regions in the mouse genome can be removed ("knocked out") or altered at will and the resulting effect can be studied. The corresponding human genes are generally sufficiently similar to mouse genes that they can be effectively studied in the manipulable model organism. Finally, whole genome mutagenesis of mouse allows identification of new genes important for specific aspects of health.

These vital technologies for studying gene function by manipulating mouse genes are already areas of strength for the DOE. They must be strengthened and improved to properly consider the large number of genes being uncovered by the human genome project. This approach is particularly important for understanding the biological responses to low-dose radiation and to other environmental insults - key DOE missions.

Bioinformatics and Computing

Informatics is the glue – the enabling technology – holding together the various components of the project. The data sets from these genomic-scale approaches will be immense and daunting in their complexity. We will need to bring together: gene expression patterns in different cell types in different developmental or physiological states; protein expression patterns in these same cells; protein structures and functions; kinetic parameters; small molecule concentrations; interactions of genes and gene products; responses to external stimuli; and other parameters. However, an even greater challenge is posed by the creation of a predictive capability that requires real time analysis of multi-component, interacting systems. These models will be among the most complicated ever created. This is a computational challenge that will push the state of the art in supercomputing. A continued exponential growth in computer and software power will be critical to the success of this undertaking.

Program Implementation:

The Bringing the Genome to Life Project will initially focus on relatively simple one-celled microorganisms. Microbes have 20-50 fold fewer genes than humans, many fewer complex biological interactions that need to be understood, many fewer proteins to analyze and more simple responses to their environment. However, even the more simple cellular systems are exceedingly complex, involving thousands of genes that have evolved to specify and organize the activities of thousands of gene products. Understanding and predicting the complex interactions of even a few thousand proteins and their roles in cellular responses to perturbations will require substantial extensions of current technologies and conceptual breakthroughs in systems analyses.

Even when we have a comprehensive understanding of a simple organism, the move to understanding human life processes, with the added complexity of differentiated cell types and cell-cell communication, is going to be an enormous stretch of technology and intellect that will require multidisciplinary teams, including talents in molecular biology, cell biology, physics, chemistry, engineering, computation and bio-informatics, evolution, ecology and genetics. This new commitment must foster training of a new generation of scientists who can deal with the complexities of biological-systems analysis.

These efforts will require the collection of vast amounts of different types of data, including:

New genome sequences - especially microbes and evolutionarily important organisms;
Gene sequence variability;
Gene function;
Regulatory sequences in DNA and their protein regulators;
mRNA expression patterns;
Global protein expression;
Protein and RNA structures;
Environmental responses.

In many of these areas new technological solutions will be needed for analysis at the global level.

Specific milestones have not yet been identified at this very early stage of project formulation. Obviously, the rate of progress will depend on the level of resources available. However, this BERAC subcommittee believes that, by the end of 10 years, the project should have produced:

Genome sequences from many hundreds of microbes that will give a substantial understanding of their biodiversity and provide the tools for understanding their evolution and community existence and for utilization of a large set of new genes for biotechnology;
Genome sequences from 10 or more higher organisms that will fill in the evolutionary tree of genome sequences and that will be coupled with genetic approaches to give greatly expanded understanding of gene function and gene regulatory sequences;
Characterization of regulatory DNA sequences of 1,000 genes together with their regulatory protein partners through detailed molecular structures;
Highly developed proteome technology permitting global analysis of protein-expression patterns from many organisms under different physiological and environmental conditions;
Informatics tools that can integrate these data sets and model-system responses in both time and space and, at least for microbes, have the capability to simulate cell responses to perturbations;
A "minimal cell" with the minimal set of genes compatible with life;
Algorithms and computing power for modeling the responses of the "minimal cell".

DOE Strengths and Opportunities:

The DOE is well positioned to undertake this enormous project. Through efforts at DOE National Laboratories and DOE-funded research at universities and research institutions, the DOE genome project has developed much of the new technology that will serve as the foundation for this grand challenge in biology. By combining its science and technology expertise and its ability to manage large projects requiring complex technical solutions, the DOE is ready to lead the development of the next generation of technologies needed to meet these demanding challenges.

The Bringing the Genome to Life Project is a superb fit with DOE missions. It brings together key elements: the Human Genome Project, the Microbial Genome Project, Structural Biology, Computational Biology, the Environmental Sciences and the Biomedical Sciences. Importantly, the advent of genome sequences is uniting these previously separate Office of Biological and Environmental Research (OBER) missions, making it possible to consider them as different applications with a single set of over-arching goals.

The unique capabilities of the DOE National Laboratories need to be brought to bear on this venture of the post-sequencing era. These include synchrotron light sources with their associated beam lines and diffraction facilities that are essential for the structural biology aspects of the project. These aspects of the effort should be closely coordinated with NIGMS. Additionally, the labs have high-field-strength NMR and advanced mass spectrometry facilities that are at the forefront of the proteomics effort. The labs also have high-resolution transmission electron, scanning electron and soft X-ray microscopy facilities. These experimental tools are often coupled with unique computing resources. The labs operate some of the nation’s largest and most advanced computers (such as those at the National Energy Research Scientific Computing Center). These computational resources are distinct from those of the defense programs. The JGI has stimulated programs in computational biology and genomics, particularly the modeling of biological systems that use those computational facilities. Engineering resources are also among the capabilities of the National Laboratories. National Laboratory engineers pioneered advances in capillary sequencing and microarray technologies well before they were available from the private sector; the current commercial instruments are based on designs that were first built and tested with funding from DOE/OBER. Expertise in robotics and microfabrication will contribute to new high-throughput aspects of genomics. The labs also have ongoing biological research programs that span prokaryotic and eukaryotic themes. Combined with cell biology, environmental microbiology, bioremediation and model-systems research, these capabilities give the labs a unique environment in which to carry out large-scale genomic programs.

These programs will be supported by the large and very effective DNA sequencing capacity at the JGI – a wonderful testimony to what can be accomplished if resources and expertise are brought together in an inter-laboratory venture. This capacity is essential for the new Bringing the Genome to Life Project and can be used strategically to determine the genome sequences of key, diverse species to vastly increase our knowledge of biology on Earth.

The National Laboratories have demonstrated their capabilities in many different technological areas that are crucial for this project. However, the current capabilities will not get us where we need to be. The National Laboratories will need to collaborate and bring new, diverse talents to bear on the problem. Their combined effectiveness will need to be greater than the sum of the parts.

Interfacing with other funded efforts:

Unraveling the complex networks that govern living systems is a monumental task, greatly surpassing the technical and intellectual goals of the genome project. As with the genome project, pooled efforts of scientists and funding agencies will be essential. The NIH and the NSF will play critical roles, as will industry and science agencies in other countries. The DOE has a clear mandate based on its success in genome sequencing and its potential for large-scale, team-based, multidisciplinary efforts. For maximum impact, OBER must define roles that are complementary to, and yet distinct from, the roles assumed by other partners. DOE should contribute its share of the basic biological data that will be required, playing a central role in cataloging gene expression, protein structures and protein functions. However, the DOE program should also take a lead in defining new avenues of research that are far-reaching, taking advantage of the technological strengths of the National Laboratories, but tailored to the needs of its own, distinct biological applications. We recommend that, while contributing to the national/international effort, OBER should look beyond the obvious present challenges and take on the more complex, long-term problems. One such forward-looking mission would be to begin now to develop the data and capabilities required to unravel gene regulation networks.

Program management:

This complex project will not be successful without rigorous OBER science management. Resources will need to be directed toward both immediate goals and long-term goals, based on demonstration of expertise and success in building the crucial interdisciplinary teams. The National Laboratories will need to collaborate to build the best possible teams. The best scientists from academic and commercial groups will need to be engaged to collaborate on the project. Solid and reliable peer review procedures need to be in place to evaluate responses to strategic science needs while helping to maximize the investment in future technologies.

Training:

This project will depend critically upon scientists trained in new interdisciplinary areas: life scientists who seriously embrace and use bioinformatics, and physicists and engineers who seriously embrace, study and understand the life sciences. These will be the scientists who will lead the new genomic era. The National Laboratories need to build specific programs with academia to provide training opportunities that expand the scope of training into technology-oriented, interdisciplinary areas that are less available at universities.

Ethical, Legal and Societal Issues:

The profound insights into fundamental biology that will result from this new program will raise ethical, legal and social issues (ELSI). As with the Human Genome Project, every effort must be made to anticipate these issues and promote societal education and discussion to understand, and focus in a positive manner, their implications. As the project advances, greater knowledge will be gained not only about the fundamental structures and interrelationships of gene products, but also the timing, circumstances, manner, and regulation of their expression and use by cells of different types. This in turn will provide scientists with new and powerful tools to manipulate and ultimately control the behavior of cells, tissues, organs and eventually whole organisms. Scientists will acquire greater knowledge of the way in which cells, tissues, organs, and whole organisms interact with, and respond to, environmental signals. This in turn will provide scientists with tools to predict individual sensitivity and responses to environmental exposures, to enhance the sensitivity of microbial cells to environmental signals and also to use microbes to alter, on a local level, the environment(s) around them. Thus, this greater biological knowledge will provide new information about individuals that could be used to discriminate against them in the workplace or through their insurance. It will add capacities to design living systems to promote beneficial environmental processes (waste cleanup, carbon sequestration, energy production, biotechnology, to name a few) but also, unless thoroughly understood and wisely used, to potentially cause harm to the environment. It will thus be very important to build on the successes of, and lessons learned from, the Genome ELSI studies, which have worked to educate society about the implications of genomics, and to establish a comparable subprogram that explores the implications of the Bringing the Genome to Life Project. This companion program should promote education of a range of communities about its implications, build strong bidirectional linkages between the scientific community funded by this program and the scholars of its societal implications, and actively participate in discussions of policy formulation so that the benefits of this exciting program are as wide as possible.

Funding:

This new, multiagency research effort far exceeds the scope and complexity of the Human Genome Project. It will require substantial investments across the federal government. An investment of $200 million per year will be required for DOE’s contribution to the effort described here, Bringing the Genome to Life: Energy Related Biology in the New Genomic World.

Participants in one or more of the Genome Strategic Planning Groups:

Raymond F. Gesteland, University of Utah (Chair)
Mina Bissell, Lawrence Berkeley National Laboratory
Elbert Branscomb, DOE Joint Genome Institute
David R. Burgess, Boston College
Claire M. Fraser, The Institute for Genomic Research
Marvin Frazier, Biological and Environmental Research/DOE
David Galas, Keck Graduate Institute
Harold R. (Skip) Garner, University of Texas Southwestern Medical Center
Richard Gibbs, Baylor College of Medicine
Jonathan Greer, Abbott Laboratories
Keith O. Hodgson, Stanford University
Michael L. Knotek, Sandia National Laboratories
Richard Lerner, The Scripps Research Institute
Roger O. McClellan, President Emeritus, Chemical Industry Institute of Toxicology
Miriam H. Meisler, University of Michigan
Ari Patrinos, Biological and Environmental Research/DOE
Eugene M. Rinchik, Oak Ridge National Laboratory
Gerry Rubin, University of California at Berkeley
Lloyd M. Smith, University of Wisconsin-Madison
Randall F. Smith, SmithKline Beecham Pharmaceuticals
Lisa Stubbs, Lawrence Livermore National Laboratory
David Thomassen, Biological and Environmental Research/DOE
Jim Tiedje, Michigan State University
Barbara Wold, California Institute of Technology

Non-Technical Summary – Impacts and Opportunities

Industrial technicians exposed to a chemical solvent. Nuclear workers exposed to trace amounts of radiation. Citizens living next to a DOE cleanup site or an oil refinery concerned about their risks from radiation or chemical exposures. What are the health risks? Who should be worried? Who can have peace of mind? All very real concerns that health officials, policy makers, workers and the general public have to wrestle with every day.

Today, science can only provide general answers. Risks are estimated for populations. Statistical likelihoods are calculated. What about the individual who was or might be exposed? What is his or her risk from exposure?

Equally real are DOE’s challenges of finding long-term supplies of clean energy, of cleaning up environments contaminated by years of nuclear weapons research, and of developing strategies that use nature’s own biology to reduce the levels of atmospheric CO2 that threaten the stability of Earth’s climate.

But science is changing – providing us with new opportunities previously only imagined. Opportunities once discussed in scientific journals or in science fiction movies are now front-page news – The New York Times, The Washington Post, Time Magazine. Even the President has joined the excitement by announcing that the draft sequence of the entire human genome is now available and ready for use. There is so much new data that even today’s computers aren’t powerful enough to study it all. New, more powerful computers once reserved for modeling atomic bombs, understanding global climate, or even playing chess are now being developed and used to study our DNA sequence and our proteins with the promise and hope of new, better, faster solutions to many of today’s medical challenges.

But the opportunities and needs don’t end with designer drugs and faster diagnoses. The DOE’s new research program, Bringing the Genome to Life: Energy Related Biology in the New Genomic World, will use this same information and technology to help us understand biology so well that we can actually predict the behavior and response of biological systems – from cells to organisms, from microbes to people – under a variety of environmental or normal physiological conditions. What are the health risks from an industrial or environmental exposure? Who should be worried? Who can have peace of mind? Can microbes be directed to give us clean energy, to clean up environmental contamination, to remove excess CO2 from the atmosphere? Very real questions in need of real answers that will come from this new DOE research program.

This new program is central to DOE’s mission and addresses one of the founding challenges made, in 1947, to the predecessor of DOE’s BER program: understanding and predicting the health impacts of nuclear materials.

Bringing the Genome to Life: Energy Related Biology in the New Genomic World, will challenge scientists to expand the broad research strategies that they used to determine the human DNA sequence to now develop predictive models for complex biological systems. Biological systems are not simply collections of many individual genes and proteins. They are complex, highly regulated networks much like our most sophisticated machines. Thus, we cannot understand how these biological machines work by studying them one gene or one protein at a time. In this new program, scientists will need to collect and use information on large numbers of genes and proteins to study, not the behaviors and properties of individual genes or proteins, but how entire groups or networks of genes and proteins work together to "make biology happen."

In addition to a new type of science, this new program will also require new types of scientists, new technical capabilities, and remarkable computational capabilities that simply do not exist today. As the biology that we study gets more and more complex, scientists will need to develop more sophisticated ways of understanding that biology. Computational modeling is key. We need to understand the complexity of biology across a range of systems. Biology happens from the DNA sequence, through the structure and function of proteins, through the interactions of DNA and proteins in simple pairs and, most importantly, as parts of complex networks involving the hundreds or thousands of genes and proteins that control complex biological responses.

DOE has already taken the necessary first steps toward this grand challenge with its Microbial Cell Project, to be initiated in FY 2001, aimed at understanding the complete workings of a single cell. Bringing the Genome to Life: Energy Related Biology in the New Genomic World, takes another big step forward, beginning where the Microbial Cell Project leaves off and moving toward understanding the regulation and behavior of complex multicellular systems and the responses of those biological systems to environmental cues.

Bringing the Genome to Life: Energy Related Biology in the New Genomic World, promises broad and unimaginable discoveries for biotechnology, pharmaceuticals and medicine in addition to those described above. The knowledge and tools developed in this program go far beyond the objective of being able to predict the behavior and response of a biological system. They will, ultimately, lead to new tools for promotion of human health, for new therapies, for new predictive capability of human susceptibilities, for new sources of energy, and for new strategies to monitor and clean up the environment.