United States of America

4DN

Launched in 2015, the 4DN program brings together researchers from various fields – several of whom have already been involved with other IHEC-related projects – to investigate the principles behind the three-dimensional organization of the nucleus in space and time (the 4th dimension), the role nuclear organization plays in gene expression and cellular function, and how changes in the nuclear organization affect normal development as well as various diseases.

NIH Roadmap Epigenomics Program

Overview

The NIH Roadmap Epigenomics Program began in 2008. Under the umbrella of this program, the NIH Common Fund and NIH Institutes and Centers have supported a total of 68 grants in the areas of epigenetic technology development, identification of novel epigenetic marks, reference epigenome mapping, and disease epigenomics investigations. Details concerning the funded projects, resources, protocols generated, and scientific publications (around 200 to date) can be found at https://commonfund.nih.gov/epigenomics/.

Mapping the human genome: A community resource

Epigenetic modifications are chemical modifications to the genome that play a role in development, aging, health, and disease, and are therefore targets for therapeutic interventions. The Reference Epigenomic Mapping Consortium, funded through the Common Fund’s Roadmap Epigenomics Program, is generating genome-wide epigenomic maps for a variety of cell and tissue types.

The majority of the reference epigenomes generated will contain information on epigenetic modifications including a core set of histone marks, DNA methylation, chromatin accessibility, and gene expression information. A subset of reference epigenomes will also contain an expanded set of at least twenty additional histone modifications. For a description of the NIH Roadmap Epigenomics Program mapping efforts please refer to The NIH Roadmap Epigenomics Mapping Consortium. Bernstein et al. Nat Biotechnol, 2010. 28(10):1045-1048.

Data for 52 complete epigenomes and many partial datasets for a diversity of “normal” human cells and tissues are currently available at http://www.roadmapepigenomics.org/. Some of the cells and tissues mapped thus far include embryonic stem (ES) cells, ES-cell derivatives, induced pluripotent stem cells, multiple fetal tissues, several varieties of blood and immune cells, breast cell types, placenta, and solid tissues (e.g. adipose, gastrointestinal tract, skin, and brain). There are plans to complete 50-100 additional epigenomes by the end of the program. Assay protocols and recommended data standards are also available.

Analysis of these data will help us predict functional genomic elements, understand cross-talk between epigenetic regulatory mechanisms, understand cellular programming and reprogramming, and provide baseline information to help human disease researchers.

Additional mapping center discoveries or publications

Development of the protocol and completion of the first human methylome datasets. Lister, R., et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature, 2009. 462(7271):315-322.
Development of a base-resolution assay enabling genome-wide characterization of hydroxymethylation (hmC). hmC is enriched in embryonic stem cells and some types of neurons; its functions are still not clear. Base-resolution analysis of 5-hydroxymethylcytosine in the mammalian genome. Yu et al. Cell. 2012 Jun 8;149(6):1368-80.
New protocol allowing whole-genome profiling from 10,000 cells using nano-ChIP-seq, improving sensitivity by two to three orders of magnitude. Whole-genome chromatin profiling from limited numbers of cells using nano-ChIP-seq. Adli and Bernstein. Nat Protoc. 2011 Sep 29;6(10):1656-68.
A comparison of the epigenomes of pluripotent and lineage-committed hESCs. Hawkins et al. Distinct epigenomic landscapes of pluripotent and lineage-committed human cells. Cell Stem Cell, 2010. 6(5):479-491.
Epigenomic data sets such as those mentioned above can be used to predict cell-type-specific enhancer elements. Recent work suggests that disease SNP variants identified by GWAS are frequently positioned in enhancer elements active in cell types relevant to the disease. Mapping and analysis of chromatin state dynamics in nine human cell types. Ernst et al. Nature. 2011 May 5;473(7345):43-9.

Selected additional epigenomic program publications

Identification of 67 new histone modifications and discovery that one of these, lysine crotonylation, marks active promoters and enhancers as well as testis-specific genes. The functions of many of these marks are completely unknown. Identification of 67 histone marks and histone lysine crotonylation as a new type of histone modification. Tan et al. Cell. 2011 Sep 16;146(6):1016-28.
Development of a strategy for affinity pulldown of tagged nucleosomes containing newly synthesized histones, which allows the rates of histone turnover to be measured throughout the genome. Deal et al. Genome-wide kinetics of nucleosome turnover determined by metabolic labeling of histones. Science, 2010. May 28;328(5982):1161-4.
Multiplex padlock probes used to capture bisulfite-converted DNA, allowing efficient and inexpensive identification of DNA methylation sites only in genomic regions of interest to the investigator. Deng et al. Targeted bisulfite sequencing reveals changes in DNA methylation associated with nuclear reprogramming. Nat Biotechnol, 2009. Apr;27(4):353-60.
Development of a method for identification of proteins and histone posttranslational modifications at a single genomic locus. ChAP-MS: A Method for Identification of Proteins and Histone Posttranslational Modifications at a Single Genomic Locus. Cell Rep. 2012 Jul 26;2(1):198-205.

NHGRI ENCODE Project (ENCyclopedia Of DNA Elements)

Overview

Following the success of the Human Genome Project, in 2003 the National Human Genome Research Institute (NHGRI, part of the United States National Institutes of Health/NIH) launched a public research consortium named ENCODE (the Encyclopedia Of DNA Elements), to identify all candidate functional elements in the human genome sequence. ENCODE has expanded to incorporate mouse data, while modENCODE and modERN (sibling projects) collect data on fly and worm. All data and analyses generated by ENCODE are available through the project's portal (https://www.encodeproject.org) and at GEO. ENCODE is organized as an open consortium and includes investigators with diverse backgrounds and expertise in the production and analysis of data. Details concerning the funded projects and resources as well as tutorials can be found at the project site while software, protocols, data standards and quality metrics, ENCODE publications, and community publications using ENCODE data can be found at the project portal.

Rationale

A critical step in moving from genome sequence to understanding the impact of genetic variation on biology, health and disease is the identification of the parts of the genome that contribute to function. To facilitate this understanding, the National Human Genome Research Institute (NHGRI) has been supporting the Encyclopedia of DNA Elements (ENCODE) Project (www.genome.gov/ENCODE). The long-term goals of ENCODE are to identify all the sequence-based functional elements in the human genome and to share catalogs of these elements freely with the research community. ENCODE aims to annotate both protein-coding and non-coding regions of the genome. Although fewer resources have been devoted to the study of non-coding variation to date, evidence for its important role in establishing healthy and disease phenotypes continues to grow. Genome-wide association studies (GWAS) have revealed that most disease-associated variants map to non-coding regions, and most of the heritability of common diseases has been imputed to non-coding regions. Specific examples support these trends: non-coding variation accounts for most of the heritability in the Mendelian disorder Fragile X Syndrome, and is a major source of heritability in polygenic disorders such as Amyotrophic Lateral Sclerosis. As whole genome sequencing is increasingly being undertaken to understand the genetic basis of disease, the importance of being able to interpret these data in their entirety (including non-coding regions) will increase. NHGRI and ENCODE are thus especially interested in creating resources to help researchers interpret non-coding genome variation.

Approach

The main strategy employed by ENCODE has been the identification of candidate functional elements using genome-wide biochemical assays associated with specific classes of DNA elements. This approach has been augmented by comparison of sequences and candidate elements across species. The term functional elements is used here to refer to genes (protein-coding and non-coding) and regulatory regions. Decades of gene regulation studies have identified mechanistic events (such as changes in chromatin structure and protein occupancy) that are being used to make predictions of regulatory elements; given that the accuracy of any untested prediction is unknown, the term candidate functional elements is used here. As many functional elements are manifest only in specific cellular contexts, multiple cell types have been interrogated to maximize discovery of candidate elements. A portion of ENCODE’s effort has been devoted to analysis of the mouse genome, as annotation of the mouse genome has facilitated understanding of the human genome.

Candidate functional elements that have been identified by ENCODE include genes, RNA transcripts, regulatory elements encoded in DNA (including enhancers, promoters, and insulators) and regulatory elements acting at the RNA level (including those that regulate splicing, translation, and RNA stability). ENCODE has used genomic methods (e.g., RNA-seq, ChIP-seq, DNase-seq) based on biochemical assays that have been developed and widely used by the research community to study gene regulation. These assays, which map features that have been mechanistically linked to gene regulation, have been used by epigenomics projects (such as ENCODE, the Roadmap Epigenomics Mapping Centers [REMC], and the International Human Epigenome Consortium [IHEC]), as well as many individual investigators in the research community to identify candidate functional regions of the genome.

The current phase of ENCODE includes these components:

Mapping centers are conducting high-throughput data-generation experiments to map biochemical activities to identify candidate functional elements in the human and mouse genomes.
Characterization centers are developing and applying generalizable approaches to characterize the role of candidate functional elements in specific biological contexts.
Computational analysis projects are piloting new applications of ENCODE data.
The Data Coordination Center (DCC) processes and shares metadata and data, and provides a portal for the community to visualize and download data.
The Data Analysis Center (DAC) specifies and updates data processing pipelines and quality metrics for major data types, and designs and performs integrative analyses of ENCODE data to update and refine the Encyclopedia (a major ENCODE product for the research community).
NHGRI continues to support technology development efforts (see: PAR-16-014, PAR-16-015, PAR-16-016, PAR-16-017).

ENCODE resource: Data, analyses, software, methods, and publications:

Any researcher may freely download, analyze and publish results based on any ENCODE data (without embargo or restrictions) as soon as the data are released (see: https://www.encodeproject.org/about/data-use-policy). The data and catalogs of candidate functional elements are intended to complement ongoing efforts to understand the functions resident in the genome and to serve as the basis for hypothesis generation and refinement for more focused studies conducted by others. The primary site to access ENCODE data and metadata is the ENCODE portal. As of June 2017, ENCODE has released more than 6,500 experiments, examining more than 300 human cell types (cell lines, explants, primary cells, and cells differentiated in culture), and more than 1,700 experiments in more than 150 mouse cell types. modENCODE and modERN have released more than 1,600 fly and worm experiments. In addition, the ENCODE portal hosts data and metadata from the NIH Common Fund Epigenomics REMC and the NHGRI project Genomics of Gene Regulation.
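As an illustration of how released experiments might be located programmatically, the following is a minimal sketch that only builds a search URL for the ENCODE portal and makes no network request. The helper name build_search_url is hypothetical, and the query parameters (type, status, limit, format) and the assay_title filter are assumptions about the portal's search interface that should be checked against the portal's own documentation before use.

```python
from urllib.parse import urlencode

# Portal address as given in the text above.
ENCODE_PORTAL = "https://www.encodeproject.org"


def build_search_url(filters, limit=25):
    """Build a portal search URL asking for released experiments as JSON.

    `filters` maps metadata field names to values. The field names are
    assumptions for illustration and are not validated here.
    """
    params = [("type", "Experiment"), ("status", "released")]
    # Sort the filters so the resulting URL is deterministic.
    params.extend(sorted(filters.items()))
    params.extend([("limit", str(limit)), ("format", "json")])
    return f"{ENCODE_PORTAL}/search/?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical filter: experiments whose assay title is ATAC-seq.
    print(build_search_url({"assay_title": "ATAC-seq"}, limit=5))
```

The resulting URL can be opened in a browser or passed to any HTTP client; keeping URL construction separate from the request makes the query easy to inspect before any data are downloaded.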

To date, ENCODE and modENCODE data have been used in approximately 2,000 papers published by researchers outside of ENCODE, including investigations of the role of the genome in human disease. These publications are shared as examples of applications of ENCODE data and analyses.

ENCODE is integrating these data to produce an “Encyclopedia,” a compendium of candidate functional elements designed to enable exploration of the role of functional elements in disease mechanisms and basic biological processes. A developmental version of the Encyclopedia is available at https://www.encodeproject.org/data/annotations/.

Epigenetics of Aging and Disease Initiative (EADI)

The Epigenetics of Aging and Disease Initiative (EADI), represented by Drs. Karen Conneely (Emory University School of Medicine), Andrea Baccarelli (Columbia University Mailman School of Public Health), and Joanne Murabito (Boston University School of Medicine and Framingham Heart Study), joined IHEC as an Associate Member in 2018.

The group is carrying out a National Institute on Aging-funded pilot sequencing project in peripheral blood mononuclear cells from 20 female participants at the high and low ends of the adult age distribution (10 women between 20 and 30 years of age, and 10 between 68 and 80). As part of this project, the researchers are generating data on DNA methylation via whole-genome bisulfite sequencing, hydroxymethylation via 5hmC capture-seq, chromatin accessibility via ATAC-seq, three histone modifications via ChIP-seq, and gene expression via RNA-seq. When the data are ready for dissemination, the multi-omic datasets will be made available as part of the IHEC human reference epigenome datasets.

The laboratory of Dr Conneely focuses on the application and development of statistical methods for genetic and epigenetic association studies, with a particular interest in the epigenetics of aging. Ongoing work in the Conneely lab uses computational approaches to understand the relationship between multiple epigenetic mechanisms, gene expression, and human aging, with particular interest in the evolutionary origins of this relationship and its contribution to risk for age-related disease.

The laboratory of Dr Baccarelli explores epigenetic and molecular mechanisms as potential functional pathways linking exposures to environmental pollutants to human disease. His laboratory research activities are specifically focused on epigenetics, mitochondriomics, and computational epigenomics. Recent and ongoing projects investigate health effects from environmental exposures, including particulate air pollution, metals, Bisphenol A, phthalates, and pesticides, and common risk factors, such as psychosocial violence, second-hand smoking, and maternal diet and metabolic alterations.

The laboratory of Dr Murabito focuses on identifying the determinants of healthy aging, longevity, and reproductive aging in the community, including the investigation of genetic and genomic factors. She leads highly productive multidisciplinary international consortia including the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Aging and Longevity working group and the ReproGen Consortium.

DNA Zoo

The DNA Zoo consortium, represented by Dr Erez Lieberman Aiden, joined IHEC as an Associate Member in September 2019.

The DNA Zoo consortium is focused on facilitating conservation efforts through collaborations between academic institutions, zoos, and conservation organizations, resulting in rapid generation and release of high-quality genomics resources. The team believes that these efforts will not only aid threatened nonhuman populations, but also enhance our understanding of life, its varieties, and its origins, and greatly facilitate our understanding of our own species – Homo sapiens. The mission of the DNA Zoo is to advance conservation genomics, comparative genomics, and epigenomics through the following specific goals, which harmonize with those of IHEC:

Development of cost-effective methods for genome assembly
Participation in sampling efforts to create high quality genomics, epigenomics, and 3D genomics resources for large numbers of species, cultivars, and populations
Using the resulting genomics resources to facilitate conservation efforts
Unrestricted sharing of data publicly once the genomic resources for any particular species are produced
Sharing all protocols and software produced as part of the collaboration via open source repositories

To date, DNA Zoo has released end-to-end assemblies – both generated de novo and upgraded from existing fragmentary genomes – for over 100 species, including over 70 mammals. In each case, the consortium has also generated deep Hi-C data and maps, which are likewise publicly available. Moreover, DNA Zoo is now beginning to generate additional epigenetic datasets, such as ATAC-seq.

The data generated by DNA Zoo will be of great use to IHEC because they will facilitate the study of human epigenomes by enabling the examination of conservation patterns of epigenomic features across mammals.