IHEC Recommendations for Epigenomic Analysis
The epigenome of a cell refers to the collection of DNA methylation, histone modifications, and chromatin accessibility present throughout its genome, together with the set of coding and non-coding RNA molecules it expresses (Bernstein et al., Nature Biotechnology, 2010).
Each cell type possesses a unique epigenome, which defines its gene regulatory program. The goal of the International Human Epigenome Consortium (IHEC) is to generate 1000 or more reference epigenomes for a broad spectrum of human cell types and a wide range of developmental stages, laying the foundation for studying the epigenetic mechanisms of human diseases.
A variety of experimental approaches for profiling the epigenome of a cell type have been developed in recent years.
After carefully considering the existing methods in terms of their resolution, comprehensiveness, accuracy, sensitivity and cost-effectiveness in detecting epigenomic features, the IHEC Scientific Steering Committee (SSC) makes the following recommendations for epigenomic analysis of human cells.
These recommendations provide a practical definition of the epigenome, suggest the most appropriate experimental approaches to attain it, and outline key parameters for experimental design and data analysis.
IHEC recommends the data and metadata models developed by the IHEC Metadata Standards Workgroup.
The assay standards developed or in use by the NIH Roadmap Reference Epigenome Mapping Centers are being considered for potential use by IHEC.
Definition of a Minimal Epigenome and Recommendations for Experimental Approaches
Methylation of cytosine (mC) is a stable epigenetic mark that contributes to cell fate determination, imprinting, and gene silencing. In the human genome mC is widespread, occurring predominantly in the CG context. Recently, it has been shown that in pluripotent stem cells mC also occurs in non-CG contexts. Additionally, mC may be converted to hydroxymethylcytosine (hmC) in certain human cell types, although the function of hmC has yet to be elucidated. The IHEC SSC recommends that, at the current stage, a whole-genome, nucleotide-resolution map of mC be obtained for each cell type under study. This can be accomplished by MethylC-seq, which involves bisulfite conversion of the cell’s genomic DNA followed by whole-genome shotgun sequencing (see Roadmap for protocol).
This approach is robust, sensitive and accurate, detecting mC and hmC in the genome at nucleotide resolution, although it reports the two together: at the current stage it is not yet feasible to distinguish mC from hmC, but investigators may be able to do so when appropriate methods become available. Investigators should plan to use 5 µg of genomic DNA as starting material and obtain 6 × 10^10 bp of sequencing reads (about 20x coverage of the roughly 3 × 10^9 bp human genome), so that the methylation status of at least 90% of cytosines in the human genome can be reliably determined. Two biological replicates should be conducted for each cell type, and the concordance between the two replicates should exceed 95%.
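The sequencing target above follows from simple arithmetic: 6 × 10^10 bp divided by a genome of roughly 3 × 10^9 bp gives 20x mean coverage. The Python sketch below shows that calculation together with a per-cytosine methylation level computed from bisulfite read counts. It is a minimal illustration, not the Roadmap pipeline; the 3.0 × 10^9 bp genome size, the minimum depth of 5 reads for a "reliable" call, and all function names are assumptions introduced here, and the counts are assumed to be already strand-aware (C versus T on the forward strand, G versus A on the reverse).

```python
# Minimal sketch. Assumptions (not from the IHEC text): a 3.0e9 bp genome and
# a minimum depth of 5 reads for a "reliable" methylation call.
GENOME_SIZE_BP = 3.0e9   # assumed haploid human genome size, implied by 6e10 / 20
MIN_DEPTH = 5            # hypothetical minimum read depth for a reliable call


def mean_coverage(sequenced_bp, genome_bp=GENOME_SIZE_BP):
    """Mean fold coverage: total sequenced bases divided by genome size."""
    return sequenced_bp / genome_bp


def methylation_level(unconverted, converted, min_depth=MIN_DEPTH):
    """Methylation level at one cytosine from strand-aware bisulfite counts.

    unconverted: reads still showing C (mC or hmC; bisulfite cannot separate them)
    converted:   reads showing T (unmethylated C converted by bisulfite)
    Returns None if the position is covered by fewer than min_depth reads.
    """
    depth = unconverted + converted
    if depth < min_depth:
        return None
    return unconverted / depth


def fraction_callable(depths, min_depth=MIN_DEPTH):
    """Fraction of cytosines with depth >= min_depth (IHEC target: at least 0.90)."""
    depths = list(depths)
    if not depths:
        return 0.0
    return sum(d >= min_depth for d in depths) / len(depths)


print(mean_coverage(6e10))                # 20.0
print(methylation_level(9, 1))            # 0.9
print(fraction_callable([0, 5, 10, 20]))  # 0.75
```

Because bisulfite leaves both mC and hmC unconverted, the level returned by `methylation_level` reflects their sum, which is why the text treats distinguishing them as future work.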
Histone proteins are subject to a large number of post-translational modifications, including acetylation, methylation, ubiquitylation, phosphorylation, and more. A subset of these modifications is clearly involved in regulating gene expression and cell fate determination.
The IHEC SSC recommends the generation of whole-genome, nucleosome-resolution (~200 bp) maps for at least six histone modifications: H3K4me3, H3K9me3, H3K27me3, H3K27ac, H3K4me1 and H3K36me3. The presence of these marks can indicate active promoters (H3K4me3, H3K27ac), active enhancers (H3K4me1, H3K27ac), actively transcribed genes (H3K36me3), or heterochromatic regions (H3K9me3, H3K27me3). The suggested method is ChIP-seq, which involves chromatin immunoprecipitation followed by next-generation DNA sequencing (see Roadmap for protocol).
Investigators should begin with at least 1 million formaldehyde-crosslinked cells for each chromatin mark. A minimum of 10 million uniquely mapped short sequencing reads should be obtained for each ChIP-seq experiment, with more than 5% of the total reads (tags) falling in enriched regions. Two biological replicates should be conducted, and the concordance between them should exceed 0.90. Antibodies must also be validated for specificity by Western blot and peptide blot assays.
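The two quantitative ChIP-seq targets (at least 10 million uniquely mapped reads, and more than 5% of reads in enriched regions) can be checked with a short script. The sketch below assumes reads are summarised as (chromosome, position) pairs and that enriched regions come from a peak caller as non-overlapping, half-open (chromosome, start, end) intervals; the function names and inputs are hypothetical, and the IHEC text does not prescribe a particular peak caller or counting rule.

```python
from bisect import bisect_right
from collections import defaultdict

MIN_UNIQUE_READS = 10_000_000
MIN_FRACTION_IN_REGIONS = 0.05


def fraction_in_regions(reads, regions):
    """Fraction of reads whose position falls inside an enriched region.

    reads:   iterable of (chrom, position) for uniquely mapped reads
    regions: iterable of half-open (chrom, start, end) intervals, assumed
             not to overlap one another on the same chromosome
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    starts = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()                         # sort intervals by start coordinate
        starts[chrom] = [s for s, _ in ivs]

    total = inside = 0
    for chrom, pos in reads:
        total += 1
        ivs = by_chrom.get(chrom)          # .get avoids creating empty entries
        if not ivs:
            continue
        # Index of the last region whose start is <= pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < ivs[i][1]:
            inside += 1
    return inside / total if total else 0.0


def passes_chip_qc(n_unique_reads, frac_in_regions):
    """True if both IHEC ChIP-seq read-depth and enrichment targets are met."""
    return (n_unique_reads >= MIN_UNIQUE_READS
            and frac_in_regions > MIN_FRACTION_IN_REGIONS)
```

Sorting the region starts once and using binary search keeps each per-read lookup logarithmic in the number of regions, which matters at tens of millions of reads.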
Non-coding RNA species include small non-coding RNAs such as microRNAs and long non-coding RNAs such as lincRNAs. Accumulating evidence indicates that both classes can regulate gene expression.
The IHEC SSC recommends that the identities and abundance of non-coding RNA species in each cell type be determined. The suggested method is RNA-seq, which involves isolation of the large or small RNA species followed by next-generation sequencing (see Roadmap for protocol).
The investigator should begin the experiment with at least 1 million cells or 5 µg of total RNA. Two biological replicates are recommended, and for each replicate at least 50 million uniquely mapped reads of 50 nt or longer should be obtained. The DNA strand from which each transcript originates should also be determined. The results should accurately represent transcript abundance across a range from 100 copies/cell down to 0.1 copies/cell.
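The RNA-seq requirements can likewise be expressed as a checklist. The sketch below assumes each replicate is summarised by a hypothetical record (uniquely mapped read count, read length, and whether the library is strand-specific); the record and function names are illustrative, not part of any IHEC tool. It reports every unmet target rather than stopping at the first, and records that the requested 0.1 to 100 copies/cell range spans three orders of magnitude.

```python
import math
from dataclasses import dataclass

MIN_REPLICATES = 2
MIN_UNIQUE_READS = 50_000_000
MIN_READ_LENGTH_NT = 50

# 0.1 to 100 copies/cell spans log10(100 / 0.1) = 3 orders of magnitude.
DYNAMIC_RANGE_DECADES = math.log10(100 / 0.1)


@dataclass
class RnaSeqReplicate:
    """Hypothetical per-replicate summary; field names are illustrative."""
    unique_reads: int
    read_length_nt: int
    stranded: bool


def replicate_problems(rep):
    """List the IHEC RNA-seq targets a single replicate fails to meet."""
    problems = []
    if rep.unique_reads < MIN_UNIQUE_READS:
        problems.append("fewer than 50 million uniquely mapped reads")
    if rep.read_length_nt < MIN_READ_LENGTH_NT:
        problems.append("reads shorter than 50 nt")
    if not rep.stranded:
        problems.append("library is not strand-specific")
    return problems


def submission_problems(reps):
    """Check the replicate count, then each replicate; [] means all targets met."""
    problems = [] if len(reps) >= MIN_REPLICATES else ["fewer than two biological replicates"]
    for i, rep in enumerate(reps, start=1):
        problems.extend(f"replicate {i}: {p}" for p in replicate_problems(rep))
    return problems
```

For example, two replicates of 60 million uniquely mapped, strand-specific 76 nt reads produce an empty problem list, while a single unstranded 36 nt replicate is flagged on all four counts.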
Chromatin accessibility is an excellent indicator of transcription factor binding and nucleosome dynamics at genomic regions participating in gene regulation or other nuclear processes. Accessible regions are identified by their greater propensity toward nuclease digestion.
The IHEC SSC recommends that a genome-wide, nucleotide-resolution map of chromatin accessibility be obtained for each cell type. The suggested method is DNase-seq, which involves treating isolated nuclei with DNase I followed by next-generation sequencing of the released short DNA fragments. A DNase-seq protocol is available separately.
At least two biological replicates should be performed for each cell type, each producing 20 million or more uniquely mapped short DNA reads (36 nt). More than 30% of the total sequence tags should fall in enriched regions, and the concordance between the two replicates should exceed 0.90.
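The text asks for replicate concordance above 0.90 but does not define the statistic. One common choice, assumed here rather than taken from IHEC, is the Pearson correlation of read counts in fixed-width genomic bins. The sketch below bins each replicate's uniquely mapped read positions into 200 bp bins (an illustrative width), applies a log(1 + count) transform to damp very high-count bins, and correlates the two replicates over bins covered in either one. The same function could be applied to ChIP-seq replicates.

```python
import math
from collections import Counter

BIN_WIDTH = 200          # hypothetical bin size in bp
MIN_CONCORDANCE = 0.90


def bin_counts(read_positions, bin_width=BIN_WIDTH):
    """Count reads per (chrom, bin index) for one replicate."""
    return Counter((chrom, pos // bin_width) for chrom, pos in read_positions)


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_concordance(rep1, rep2, bin_width=BIN_WIDTH):
    """Pearson r of log(1 + count) over bins with reads in either replicate."""
    c1, c2 = bin_counts(rep1, bin_width), bin_counts(rep2, bin_width)
    bins = sorted(set(c1) | set(c2))
    # Counter returns 0 for bins absent from one replicate.
    x = [math.log1p(c1[b]) for b in bins]
    y = [math.log1p(c2[b]) for b in bins]
    return pearson(x, y)
```

Restricting the comparison to bins with at least one read avoids inflating the correlation with the large number of empty bins that both replicates share.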
Although the objective of the current research is to generate epigenome profiles, it is recognized that variations in the epigenomic landscape may be closely linked to variations in genomic sequence. Therefore, it is recommended that an aliquot of 5 µg or more of genomic DNA be isolated from each cell sample and saved for genome sequencing experiments.
To ensure the accuracy and general utility of the reference epigenome generated from each cell type, the IHEC SSC recommends the use of pure cell populations in these studies. The homogeneity of the cell population should be 95% or higher, as indicated by staining for cell-type-specific biomarkers.
Reference cell types
To facilitate cross-laboratory comparisons of the experimental protocols used to generate the above epigenome maps, the IHEC SSC recommends that each participating laboratory analyze one of the following three reference cell types as outlined in the previous section (“Definition of a Minimal Epigenome”), and that the results be compared across laboratories to ensure that minimal experimental bias is introduced.
- The human H1 embryonic stem cell line. This cell line can be obtained from WiCell and cultured in individual laboratories. Alternatively, these cells can be purchased in bulk from Cellular Dynamics (Madison, Wisconsin, USA). Before epigenome analyses, the cells should be karyotyped to ensure that no gross chromosomal changes have occurred during the culturing process.
- Human fibroblast cell line IMR90. This primary human fetal lung fibroblast cell line can be obtained from ATCC and cultured under standard conditions.
- CD4+ T cells. These cells can be isolated from the peripheral blood of healthy donors using standard protocols.
Data release, format, software programs and bioinformatic tools
The IHEC SSC recommends the timely release of epigenomic data by each consortium member. The raw sequencing reads and associated metadata should be publicly available through one of several public databases, such as
The data to be made publicly available should include the raw sequence reads of each epigenomic assay, as well as metadata such as experimental design, protocols, parameters, and reagent information. Each data producer should also store the primary data (reads) for at least five years after generation.