United States of America
NIH Roadmap Epigenomics Program
The NIH Roadmap Epigenomics Program began in 2008. Under the umbrella of this program, the NIH Common Fund and NIH Institutes and Centers have supported a total of 68 grants in the areas of epigenetic technology development, identification of novel epigenetic marks, reference epigenome mapping, and disease epigenomics investigations. Details concerning the funded projects, resources, protocols generated, and scientific publications (around 200 to date) can be found at https://commonfund.nih.gov/epigenomics/.
Mapping the human genome: A community resource
Epigenetic modifications are chemical modifications to the genome that play a role in development, aging, health, and disease, and are therefore targets for therapeutic interventions. The Reference Epigenomic Mapping Consortium, funded through the Common Fund’s Roadmap Epigenomics Program, is generating genome-wide epigenomic maps for a variety of cell and tissue types.
The majority of the reference epigenomes generated will contain information on epigenetic modifications including a core set of histone marks, DNA methylation, chromatin accessibility, and gene expression information. A subset of reference epigenomes will also contain an expanded set of at least twenty additional histone modifications. For a description of the NIH Roadmap Epigenomics Program mapping efforts, please refer to: Bernstein et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol, 2010. 28(10):1045-1048.
Data for 52 complete epigenomes and many partial datasets for a diversity of “normal” human cells and tissues are currently available at http://www.roadmapepigenomics.org/. The cells and tissues mapped thus far include embryonic stem (ES) cells, ES-cell derivatives, induced pluripotent stem cells, multiple fetal tissues, several varieties of blood and immune cells, breast cell types, placenta, and solid tissues (e.g. adipose, gastrointestinal tract, skin, and brain). There are plans to complete 50-100 additional epigenomes by the end of the program. Assay protocols and recommended data standards are also available.
Analysis of these data will help us predict functional genomic elements, understand cross-talk between epigenetic regulatory mechanisms, understand cellular programming and reprogramming, and provide baseline information to help human disease researchers.
Additional mapping center discoveries or publications
- Development of the protocol and completion of the first human methylome datasets. Lister, R., et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 2009. 462(7271):315-322.
- Development of a base-resolution assay enabling genome-wide characterization of hydroxymethylation (hmC). hmC is enriched in embryonic stem cells and some types of neurons; its functions are still not clear. Yu et al. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Cell. 2012 Jun 8;149(6):1368-80.
- New protocol allowing whole-genome profiling from 10,000 cells using nano-ChIP-seq, resulting in an improvement in sensitivity of 2-3 orders of magnitude. Adli and Bernstein. Whole-genome chromatin profiling from limited numbers of cells using nano-ChIP-seq. Nat Protoc. 2011 Sep 29;6(10):1656-68.
- A comparison of the epigenomes of pluripotent and lineage-committed hESCs. Hawkins et al. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell, 2010. 6(5):479-491.
- Epigenomic data sets such as those mentioned above can be used to predict cell-type-specific enhancer elements. Recent evidence suggests that disease SNP variants identified by GWAS are frequently positioned in enhancer elements active in cell types relevant to the disease. Ernst et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011 May 5;473(7345):43-9.
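The observation in the last item, that GWAS variants often fall inside cell-type-specific enhancers, comes down to an interval-overlap test. The following is a minimal sketch of that test and not the method used in the cited paper. The enhancer coordinates and SNP identifiers are hypothetical, and the intervals are assumed to be half-open, 0-based, and non-overlapping within each chromosome.

```python
from bisect import bisect_right

# Hypothetical enhancer intervals: (chromosome, start, end), half-open, 0-based.
ENHANCERS = [
    ("chr1", 1000, 1500),
    ("chr1", 4000, 4600),
    ("chr2", 200, 900),
]

# Hypothetical GWAS SNPs: (chromosome, position, identifier).
SNPS = [
    ("chr1", 1200, "snp_a"),
    ("chr1", 3000, "snp_b"),
    ("chr2", 899, "snp_c"),
    ("chr2", 900, "snp_d"),
]


def build_index(intervals):
    """Group intervals by chromosome and sort each group by start position."""
    index = {}
    for chrom, start, end in intervals:
        index.setdefault(chrom, []).append((start, end))
    for regions in index.values():
        regions.sort()
    return index


def snps_in_enhancers(snps, intervals):
    """Return identifiers of SNPs located inside any interval.

    Assumes intervals on the same chromosome do not overlap, so only the
    interval with the greatest start <= position needs to be checked.
    """
    index = build_index(intervals)
    hits = []
    for chrom, pos, snp_id in snps:
        regions = index.get(chrom, [])
        starts = [start for start, _ in regions]
        i = bisect_right(starts, pos) - 1
        if i >= 0 and regions[i][0] <= pos < regions[i][1]:
            hits.append(snp_id)
    return hits


hits = snps_in_enhancers(SNPS, ENHANCERS)
fraction = len(hits) / len(SNPS)
```

With real data, the same comparison would be repeated for the enhancer set of each cell type, and the fractions compared against a background expectation before drawing any conclusion about disease relevance.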
Selected additional epigenomic program publications
- Identification of 67 new histone modifications and discovery that one of these, lysine crotonylation, marks active promoters and enhancers as well as testis-specific genes. The functions of many of these marks are completely unknown. Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification. Tan et al. Cell. 2011 Sep 16;146(6):1016-28.
- Development of a strategy for affinity pulldown of tagged nucleosomes containing newly synthesized histones, which allows the rates of histone turnover to be measured throughout the genome. Deal et al. Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science, 2010. May 28;328(5982):1161-4.
- Use of multiplex padlock probes to capture bisulfite-converted DNA, allowing efficient and inexpensive identification of DNA methylation sites only in genomic regions of interest to the investigator. Deng et al. Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. Nat Biotechnol, 2009. Apr;27(4):353-60.
- Development of a method for identification of proteins and histone posttranslational modifications at a single genomic locus. ChAP-MS: A Method for Identification of Proteins and Histone Posttranslational Modifications at a Single Genomic Locus. Cell Rep. 2012 Jul 26;2(1):198-205.
NHGRI ENCODE Project (ENCyclopedia Of DNA Elements)
Following the success of the Human Genome Project, in 2003 the National Human Genome Research Institute (NHGRI, part of the United States National Institutes of Health/NIH) launched a public research consortium named ENCODE (the Encyclopedia Of DNA Elements), to identify all candidate functional elements in the human genome sequence. ENCODE has expanded to incorporate mouse data, while modENCODE and modERN (sibling projects) collect data on fly and worm. All data and analyses generated by ENCODE are available through the project's portal (https://www.encodeproject.org) and at GEO. ENCODE is organized as an open consortium and includes investigators with diverse backgrounds and expertise in the production and analysis of data. Details concerning the funded projects and resources as well as tutorials can be found at the project site while software, protocols, data standards and quality metrics, ENCODE publications, and community publications using ENCODE data can be found at the project portal.
A critical step in moving from genome sequence to understanding the impact of genetic variation on biology, health and disease is the identification of the parts of the genome that contribute to function. To facilitate this understanding, the National Human Genome Research Institute (NHGRI) has been supporting the Encyclopedia of DNA Elements (ENCODE) Project (www.genome.gov/ENCODE). The long-term goals of ENCODE are to identify all the sequence-based functional elements in the human genome and to share catalogs of these elements freely with the research community. ENCODE aims to annotate both protein-coding and non-coding regions of the genome. Although fewer resources have been devoted to date to the study of non-coding variation, evidence for its important role in establishing healthy and disease phenotypes continues to grow. Genome-wide association studies (GWAS) have revealed that most disease-associated variants map to non-coding regions, and most of the heritability of common diseases has been attributed to non-coding regions. Specific examples support these trends: non-coding variation accounts for most of the heritability in the Mendelian disorder Fragile X Syndrome, and is a major source of heritability in polygenic disorders such as Amyotrophic Lateral Sclerosis. As whole genome sequencing is increasingly being undertaken to understand the genetic basis of disease, the importance of being able to interpret these data in their entirety (including non-coding regions) will increase. NHGRI and ENCODE are thus especially interested in creating resources to help researchers interpret non-coding genome variation.
The main strategy employed by ENCODE has been the identification of candidate functional elements using genome-wide biochemical assays associated with specific classes of DNA elements. This approach has been augmented by comparison of sequences and candidate elements across species. The term functional elements is used here to refer to genes (protein-coding and non-coding) and regulatory regions. Decades of gene regulation studies have identified mechanistic events (such as changes in chromatin structure and protein occupancy) that are being used to make predictions of regulatory elements; given that the accuracy of any untested prediction is unknown, the term candidate functional elements is used here. As many functional elements are manifest only in specific cellular contexts, multiple cell types have been interrogated to maximize discovery of candidate elements. A portion of ENCODE’s effort has been devoted to analysis of the mouse genome, as annotation of the mouse genome has facilitated understanding of the human genome.
Candidate functional elements that have been identified by ENCODE include genes, RNA transcripts, regulatory elements encoded in DNA (including enhancers, promoters, and insulators) and regulatory elements acting at the RNA level (including those that regulate splicing, translation, and RNA stability). ENCODE has used genomic methods (e.g., RNA-seq, ChIP-seq, DNase-seq) based on biochemical assays that have been developed and widely used by the research community to study gene regulation. These assays, which map features that have been mechanistically linked to gene regulation, have been used by epigenomics projects (such as ENCODE, the Roadmap Epigenomics Mapping Centers [REMC], and the International Human Epigenome Consortium [IHEC]), as well as many individual investigators in the research community to identify candidate functional regions of the genome.
The current phase of ENCODE includes these components:
- Mapping centers are conducting high-throughput data-generation experiments to map biochemical activities to identify candidate functional elements in the human and mouse genomes.
- Characterization centers are developing and applying generalizable approaches to characterize the role of candidate functional elements in specific biological contexts.
- Computational analysis projects are piloting new applications of ENCODE data.
- The Data Coordination Center (DCC) processes and shares metadata and data, and provides a portal for the community to visualize and download data.
- The Data Analysis Center (DAC) specifies and updates data processing pipelines and quality metrics for major data types, and designs and performs integrative analysis of ENCODE data to update and refine the Encyclopedia (a major ENCODE product for the research community).
- NHGRI continues to support technology development efforts (see: PAR-16-014, PAR-16-015, PAR-16-016, PAR-16-017).
ENCODE resource: Data, analyses, software, methods, and publications:
Any researcher may freely download, analyze and publish results based on any ENCODE data (without embargo or restrictions) as soon as the data are released (see: https://www.encodeproject.org/about/data-use-policy). The data and catalogs of candidate functional elements are intended to complement ongoing efforts to understand the functions resident in the genome and to serve as the basis for hypothesis generation and refinement for more focused studies conducted by others. The primary site to access ENCODE data and metadata is the ENCODE portal. As of June 2017, ENCODE has released more than 6,500 experiments, examining more than 300 human cell types (cell lines, explants, primary cells, and cells differentiated in culture), and more than 1,700 experiments in more than 150 mouse cell types. modENCODE and modERN have released more than 1,600 fly and worm experiments. In addition, the ENCODE portal hosts data and metadata from the NIH Common Fund Epigenomics REMC, and the NHGRI project Genomics of Gene Regulation.
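For readers who want to retrieve released experiments programmatically, the sketch below builds a search URL for the ENCODE portal and, optionally, fetches the result as JSON. It is an illustration under stated assumptions rather than an official client: the query fields (`type`, `assay_title`, `status`, `limit`, `format`) and the `@graph` and `accession` keys in the response are assumed from the portal's search interface and should be checked against its current REST API documentation. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENCODE_PORTAL = "https://www.encodeproject.org"


def build_search_url(assay_title, limit=5):
    """Build a portal search URL for released experiments of one assay type.

    The field names below are assumptions; verify them against the portal's
    REST API documentation before relying on them.
    """
    params = [
        ("type", "Experiment"),
        ("assay_title", assay_title),
        ("status", "released"),
        ("limit", str(limit)),
        ("format", "json"),
    ]
    return f"{ENCODE_PORTAL}/search/?{urlencode(params)}"


def fetch_accessions(url, timeout=30):
    """Fetch a search URL and return the accession of each listed result.

    Requires network access. Assumes the response is a JSON object whose
    "@graph" list holds the matching records.
    """
    request = Request(url, headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return [item.get("accession") for item in payload.get("@graph", [])]


url = build_search_url("Histone ChIP-seq")
# accessions = fetch_accessions(url)  # uncomment to query the live portal
```

Building the URL separately from fetching it keeps the query easy to inspect and to paste into a browser, where the same search can be refined interactively.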
To date, ENCODE and modENCODE data have been used in approximately 2,000 papers published by researchers outside of ENCODE, including investigations of the role of the genome in human disease. These publications are shared as examples of applications of ENCODE data and analyses.
ENCODE is integrating these data to produce an “Encyclopedia,” a compendium of candidate functional elements designed to enable exploration of the role of functional elements in disease mechanisms and basic biological processes. A developmental version of the Encyclopedia is available at https://www.encodeproject.org/data/annotations/.