Interpreting Hi-C Heatmaps: A Guide to Genomic Interactions

Unravel the complexities of genomic interactions with a guide on interpreting Hi-C sequencing heatmaps. Learn the Hi-C technology principle, how to read and analyze heatmaps, and their types. Discover real-world applications in cancer and bacterial genome research.

The rapid growth of genome-wide chromosome conformation capture data has brought great opportunities and challenges to the computational modeling and interpretation of the three-dimensional genome. High-throughput chromosome conformation capture (Hi-C) is a method for studying this three-dimensional structure. Hi-C takes the whole nucleus as its object of study and combines high-throughput sequencing with bioinformatic analysis to examine the spatial relationships of chromatin across the whole genome. It yields a high-resolution interaction map of chromatin regulatory elements and gives a more comprehensive picture of the three-dimensional structure of chromatin. Hi-C can also be combined with RNA-Seq, ChIP-Seq, and other omics data to explain the mechanisms behind biological traits in terms of gene regulatory and epigenetic networks.

Principles and steps of Hi-C technology

Formaldehyde fixation: The first step of Hi-C is to use formaldehyde to cross-link the proteins that mediate chromatin interactions to the DNA they bind. Generally, living samples are treated with 1-3% formaldehyde at room temperature for 10-30 minutes. This step is essential for preserving the spatial conformation of chromatin.

Enzymatic digestion: Next, the genome is cut with a restriction endonuclease; the size of the resulting fragments affects the achievable resolution. Commonly used restriction enzymes, such as EcoRI or HindIII, recognize 6-bp sites and therefore cut the genome on average about once every 4,000 bp, giving on the order of 1 million fragments in the human genome.

End repair: Digestion leaves fragments with blunt or sticky ends. The ends are filled in and marked with biotin.

Ligation and reversal of cross-links: T4 DNA ligase joins the ends of interacting fragments that are held together by the cross-links, forming a loop. The proteins attached to the ligated DNA are then digested to reverse the cross-links and release the ligated fragments.

Fragmentation: The ligated DNA is sheared again by sonication or other means.

Sequencing: The biotin-labeled fragments are captured with magnetic beads, a library is prepared, and the library is sequenced.
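The fragment count quoted in the digestion step can be checked with a little arithmetic. The sketch below assumes an idealized genome in which the four bases are equally frequent and independent, and a haploid human genome of roughly 3.1 billion bp; both are simplifying assumptions, and the function names are illustrative only. The 4-bp cutters in the comments are included purely for comparison and are not part of the protocol described above.

```python
def expected_cut_spacing(site_length: int) -> int:
    """Average distance (bp) between recognition sites of a given length,
    assuming equally frequent, independent bases (an idealization)."""
    return 4 ** site_length


def expected_fragments(genome_size: int, site_length: int) -> float:
    """Rough number of fragments produced by one restriction enzyme."""
    return genome_size / expected_cut_spacing(site_length)


# Assumption: approximate haploid human genome size.
HUMAN_GENOME_BP = 3_100_000_000

six_cutter = expected_fragments(HUMAN_GENOME_BP, 6)   # 6-bp sites, e.g. HindIII, EcoRI
four_cutter = expected_fragments(HUMAN_GENOME_BP, 4)  # 4-bp sites, e.g. MboI, DpnII

print(f"6-bp site: one cut per {expected_cut_spacing(6)} bp, about {six_cutter:,.0f} fragments")
print(f"4-bp site: one cut per {expected_cut_spacing(4)} bp, about {four_cutter:,.0f} fragments")
```

Under these assumptions a 6-bp cutter gives one site per 4,096 bp and several hundred thousand fragments, which is the same order of magnitude as the figure of about 1 million given above. Real genomes deviate from the idealization because base composition is uneven.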

Overview of the Hi-C method (Houda et al., 2017)

What Is a Hi-C Heatmap

A Hi-C heatmap is a graphical representation of the interactions between different regions of a chromosome or genome. It uses color coding to indicate the contact frequency or correlation between these regions, helping researchers identify highly interacting regions and their potential biological functions.

The heatmap is designed specifically for displaying Hi-C experimental data. A Hi-C experiment measures how frequently pairs of chromosomal regions come into physical contact in the nucleus and summarizes the result as a matrix, in which each element represents the interaction intensity between two genomic regions. These data are usually presented as a two-dimensional matrix, with different levels of interaction intensity represented by color.

The core of the Hi-C heatmap is this matrix, whose rows and columns both represent regions of the genome. Each element holds the interaction frequency or correlation value between two regions. Color coding is the key component of the heatmap and represents the values in the matrix. A common scheme runs from a cool color (such as blue) to a warm color (such as red), indicating interaction intensity from low to high. The axes of a heatmap usually represent physical positions along a genome or a specific region of a chromosome, and help users locate and interpret the data.
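To make the matrix concrete, here is a minimal sketch of how read pairs could be binned into a contact matrix. It assumes each contact is given as a pair of coordinates on a single region; the function name, the toy coordinates, and the 20 kb bin size are all hypothetical and chosen only for illustration.

```python
def build_contact_matrix(pairs, region_length, bin_size):
    """Count contacts between fixed-size bins of one genomic region.

    pairs: iterable of (position_a, position_b) in bp, both within the region.
    Returns a symmetric list-of-lists matrix of contact counts.
    """
    n_bins = -(-region_length // bin_size)  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count so the matrix stays symmetric
    return matrix


# Hypothetical read pairs on a 100 kb region, binned at 20 kb.
toy_pairs = [(5_000, 12_000), (15_000, 95_000), (18_000, 22_000),
             (41_000, 47_000), (90_000, 99_000), (91_000, 14_000)]
matrix = build_contact_matrix(toy_pairs, region_length=100_000, bin_size=20_000)

for row in matrix:
    print(row)
```

Row and column index k corresponds to the interval from k × bin_size to (k + 1) × bin_size on the axes of the heatmap, which is how a cell in the plot maps back to a pair of genomic positions.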

How to Read a Hi-C Heatmap

To read a Hi-C heatmap, it is necessary to understand its axes and color scale and to recognize significant genomic interactions, such as topologically associating domains (TADs), loops, and compartments.

Understand axis and color scale

Axis: A Hi-C heatmap has two axes, horizontal and vertical, both of which represent positions along the genome. Each point on an axis corresponds to a chromatin fragment or locus. In the heatmap below, for example, both axes represent positions along a chromosome, and each cell shows the interaction intensity between the corresponding pair of chromatin fragments.

Heatmaps generated from 100 kb binned Hi-C data for chromosome 14 (Houda et al., 2017)

Color scale: The color scale indicates interaction intensity. Generally, cooler or darker colors (such as blue or green) indicate lower interaction intensity, while warmer colors (such as yellow or red) indicate higher interaction intensity. In the heatmap below, for example, red marks high Hi-C interaction intensity.

Red and blue plaid patterns show the compartmentalization of chr1 in two types of chromosomal domains (Belton et al., 2012)
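As an illustration of how a color scale turns matrix values into colors, the sketch below maps a contact count to a blue-to-red RGB value. The log scaling is an assumption added here (contact counts typically span a wide range, so a log transform is a common choice), and the function name and two-color ramp are hypothetical rather than taken from any particular tool.

```python
import math


def count_to_rgb(count, max_count):
    """Map a contact count to an (R, G, B) tuple: blue for low, red for high.

    Assumption: counts are log-scaled before being mapped to color.
    """
    if max_count <= 0:
        return (0, 0, 255)
    # log1p compresses large counts and is defined at zero
    level = math.log1p(max(count, 0)) / math.log1p(max_count)
    level = min(level, 1.0)
    red = round(255 * level)
    return (red, 0, 255 - red)


for value in (0, 1, 10, 100):
    print(value, count_to_rgb(value, max_count=100))
```

With a linear scale, a count of 10 out of a maximum of 100 would sit near the blue end; with the log scale it lands around the middle of the ramp, which keeps weaker long-range contacts visible next to the strong diagonal.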

Identify significant genomic interactions

TADs: A TAD is a contiguous region of chromatin that displays high levels of self-association and is separated from adjacent regions by distinct boundaries, with comparatively little interaction across them. In heatmaps, TADs usually appear as blocks of high contact frequency along the diagonal. The locations of TADs can be determined when interaction data is binned at 40 kb or less.

An example of TADs on a Hi-C contact map (Mora et al., 2016)
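One way to see why TAD boundaries stand out is to measure how much contact crosses each position along the diagonal. The sketch below uses a simplified insulation-style score, a technique not named in the text above and introduced here only as an illustration: for each position it averages the contacts between the bins just upstream and the bins just downstream, and positions where this average dips are candidate boundaries. The function names, window size, and toy matrix are all hypothetical.

```python
def insulation_scores(matrix, window):
    """Mean contact count across each position of a square contact matrix.

    scores[i] averages matrix[a][b] for a in the `window` bins before bin i
    and b in the `window` bins starting at bin i. Positions where the
    window does not fit are left as None.
    """
    n = len(matrix)
    scores = [None] * n
    for i in range(window, n - window + 1):
        total = sum(
            matrix[a][b]
            for a in range(i - window, i)
            for b in range(i, i + window)
        )
        scores[i] = total / (window * window)
    return scores


def call_boundaries(scores):
    """Return positions that are strict local minima of the score profile."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        left, here, right = scores[i - 1], scores[i], scores[i + 1]
        if None in (left, here, right):
            continue
        if here < left and here < right:
            boundaries.append(i)
    return boundaries


# Toy matrix: two 4-bin domains with strong contacts inside, weak between.
tad_of_bin = [0, 0, 0, 0, 1, 1, 1, 1]
toy = [[10 if tad_of_bin[i] == tad_of_bin[j] else 1 for j in range(8)]
       for i in range(8)]

scores = insulation_scores(toy, window=2)
print(scores)                   # lowest value at the junction of the two domains
print(call_boundaries(scores))  # position where the second domain starts
```

On this toy input the score drops to its minimum at bin 4, where the second block begins. Real data is noisier, so practical callers smooth the profile and apply thresholds rather than taking every local minimum.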

Loops: Loops are strong point-to-point interactions between specific loci, within a region or across regions. In a heatmap, many loops appear as off-diagonal "dots". Identifying looping interactions typically requires a resolution of 10 kb or finer. Mapping to smaller bins resolves more specific interactions, but at the cost of fewer reads per bin. Specific interactions, for instance between pairs of CTCF sites, are expected to show up as increased signal compared with the surrounding area.
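The idea of a loop as "increased signal compared with the surrounding area" can be sketched as a simple local enrichment ratio. This is a deliberately simplified illustration, not the method of any specific loop caller: the function name, the square neighbourhood, and the radius are assumptions made for the example.

```python
def local_enrichment(matrix, i, j, radius=2):
    """Ratio of pixel (i, j) to the mean of its neighbours within `radius`.

    Assumes a square matrix. Values well above 1 suggest a dot-like peak.
    """
    n = len(matrix)
    neighbours = [
        matrix[a][b]
        for a in range(max(0, i - radius), min(n, i + radius + 1))
        for b in range(max(0, j - radius), min(n, j + radius + 1))
        if (a, b) != (i, j)
    ]
    if not neighbours:
        return float("nan")
    background = sum(neighbours) / len(neighbours)
    return matrix[i][j] / background if background > 0 else float("inf")


# Toy matrix: flat background of 2 with one symmetric "dot" of 12.
toy = [[2] * 10 for _ in range(10)]
toy[2][7] = toy[7][2] = 12

print(local_enrichment(toy, 2, 7))  # the dot itself
print(local_enrichment(toy, 3, 7))  # a neighbouring background pixel
```

The dot scores well above 1 while its neighbours score slightly below 1, because the dot raises their local background. With fewer reads per bin at fine resolution, such ratios become noisy, which is the trade-off described above.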

Compartments: Compartments are groups of domains, located on the same chromosome or on different chromosomes, that display increased interactions with each other. In heatmaps generated from 100 kb bins, they are visible as a characteristic plaid pattern. These alternating blocks of high and low interaction frequency represent the A and B compartments. Principal component analysis (PCA) readily identifies these compartments, which tend to be captured by the first component. The active "A" compartments are gene-dense euchromatic regions, whereas the inactive "B" compartments are gene-poor heterochromatic regions.
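The link between the plaid pattern and the first principal component can be sketched as follows. This is a simplified stand-in for a full PCA: it takes the dominant eigenvector of the row-correlation matrix, found by power iteration. The input is assumed to be already normalized for genomic distance (an observed/expected matrix), and the function names and toy values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(matrix):
    """Correlation between every pair of rows (contact profiles)."""
    return [[pearson(row_i, row_j) for row_j in matrix] for row_i in matrix]


def first_component(sym_matrix, iterations=200, seed=0):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    n = len(sym_matrix)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        nxt = [sum(row[k] * vec[k] for k in range(n)) for row in sym_matrix]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0:
            break
        vec = [v / norm for v in nxt]
    return vec


# Toy observed/expected matrix with a plaid pattern: bins sharing a label
# interact more (1.5) than bins with different labels (0.5).
labels = [0, 0, 1, 1, 0, 0, 1, 1]
toy_oe = [[1.5 if labels[i] == labels[j] else 0.5 for j in range(8)]
          for i in range(8)]

pc1 = first_component(correlation_matrix(toy_oe))
groups = [value > 0 for value in pc1]
print(groups)
```

Bins with the same sign fall into the same compartment. The sign of an eigenvector is arbitrary, so deciding which group is "A" requires outside information such as gene density; the sketch does not attempt that step.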

Types of Hi-C Heatmaps

Hi-C heatmaps are an important tool for analyzing chromatin interactions and are mainly used to visualize the interaction frequency between different regions of the genome. They can be divided into the following types.

Contact maps: These are the most common type of heatmap for Hi-C data and show the interaction frequency between all regions of the genome. They are usually presented as a two-dimensional matrix, in which each cell represents the number of interactions between two genomic locations. High values represent high-frequency interactions, while low values represent low-frequency interactions.

Imputed high-resolution contacts are close to experiment data (Zhang et al., 2017)

Interaction maps: These are a variant of the contact map, usually used to show chromatin interactions under specific experimental conditions. They can display interaction data from multiple experiments for comparison and analysis.

Heatmap transformation for analysis (Bkhetan et al., 2017)

Heatmaps for different genomic regions: This type of heatmap focuses on the interactions of specific genomic regions (such as promoters and enhancers). For example, promoter capture Hi-C can generate a heatmap for a set of specific genomic regions, which are usually key regions for the regulation of gene expression. In addition, the interaction patterns within specific chromatin domains (such as TADs) can be analyzed with such heatmaps.

Heatmaps for analyzing different genomic regions (Li et al., 2018)

High-resolution heatmaps: These show chromatin interactions using finer-grained bins (such as 25 kb or 1 kb), providing more detailed information on genome structure.

The Venn diagram and pile-up visualization of the stripes (Feng et al., 2022)
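Bin size is a choice made after sequencing: the same data can be viewed at fine or coarse resolution. The sketch below merges neighbouring bins of a fine matrix into a coarser one. It assumes a symmetric matrix in which entries [i][j] and [j][i] describe the same read pairs, so only the upper triangle is summed; the function name and toy counts are hypothetical.

```python
def coarsen(matrix, factor):
    """Merge every `factor` adjacent bins of a symmetric contact matrix.

    Only the upper triangle (i <= j) is summed so that each contact is
    counted once, then mirrored to keep the result symmetric.
    """
    n = len(matrix)
    m = -(-n // factor)  # ceiling division
    coarse = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(i, n):
            a, b = i // factor, j // factor
            coarse[a][b] += matrix[i][j]
            if a != b:
                coarse[b][a] += matrix[i][j]
    return coarse


# Toy 4-bin matrix (for example 25 kb bins) merged into 2 bins (50 kb).
fine = [[4, 2, 1, 0],
        [2, 5, 1, 1],
        [1, 1, 6, 3],
        [0, 1, 3, 4]]

for row in coarsen(fine, 2):
    print(row)
```

Each coarse cell pools the reads of several fine cells, so coarse maps are less noisy but lose detail. Going the other way is not possible without more reads, which is why high-resolution heatmaps need deeper sequencing.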

Multi-condition comparative heatmaps: These heatmaps compare chromatin interactions between different conditions or cell types. For example, Hi-C data from different cell types (neurons, astrocytes, and oligodendrocytes) can be displayed side by side, with their similarities and differences shown in heatmap form.

Application of scHi–C data to characterize cell-to-population and cell-to-cell variability of TADs (Hua et al., 2024)
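A common way to compare two conditions in a single heatmap is to plot a per-cell ratio rather than raw counts. The sketch below computes a log2 fold change between two contact matrices of the same shape. The depth scaling and the pseudocount are assumptions made for this example, and the function name and toy matrices are hypothetical.

```python
import math


def log2_fold_change(matrix_a, matrix_b, pseudocount=1.0):
    """Per-cell log2 ratio of condition A over condition B.

    Assumptions: both matrices use the same bins; B is rescaled to the
    total count of A so that sequencing depth does not dominate; a
    pseudocount avoids division by zero in empty cells.
    """
    total_a = sum(map(sum, matrix_a))
    total_b = sum(map(sum, matrix_b))
    if total_b == 0:
        raise ValueError("matrix_b contains no contacts")
    scale = total_a / total_b
    return [
        [math.log2((a + pseudocount) / (b * scale + pseudocount))
         for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(matrix_a, matrix_b)
    ]


# Toy matrices with equal totals: condition A has relatively more
# off-diagonal contact than condition B.
cond_a = [[6, 4], [4, 6]]
cond_b = [[8, 2], [2, 8]]

lfc = log2_fold_change(cond_a, cond_b)
for row in lfc:
    print([round(v, 2) for v in row])
```

Positive values mark cells with more contact in condition A and negative values mark cells with more contact in condition B, so a diverging color scale centered on zero suits this kind of map.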

Analysing Hi-C Heatmaps

Analysis of Hi-C heatmaps is an important means of studying chromatin interactions and genome structure. The following are some commonly used tools and techniques, as well as their applications in heatmap analysis, common patterns, and their biological significance.

Tools and techniques

Juicer: Juicer is a widely used tool for processing Hi-C data and generating heatmaps. It provides a one-click system for analyzing loop-resolution Hi-C experiments. Juicer stores contact matrices in the ".hic" format and supports several normalization methods. It can also be combined with other tools, such as Cooler, to extend the interactivity and visualization of heatmaps.

HiCExplorer: HiCExplorer is an open-source toolkit for processing, normalizing, analyzing, and visualizing Hi-C data. Its functions include extracting interactions, setting the matrix resolution, merging replicates, and calculating compartment scores. HiCExplorer also supports TAD boundary detection and differential analysis of TAD features.

Cooler: Cooler is a file format and software library for storing and querying large-scale Hi-C data. Its multi-resolution files support heatmap display at several zoom levels and across multiple files. It can be used in combination with Juicer to provide more flexible heatmap generation and analysis.

HiGlass: HiGlass is a web-based tool that supports the dynamic arrangement and visualization of multi-scale contact maps. It can load views of multiple Hi-C datasets and display them in sync with other genomic data (such as ChIP-seq).
