Abstract: Hi-C is a high-throughput chromosome conformation capture technology that is becoming routine in the literature. Although the price of sequencing has been dropping dramatically, high-resolution Hi-C data are not always an option for many studies, such as those in single cells. However, the performance of current computational methods for Hi-C data under ultra-sparse conditions has yet to be fully assessed. Therefore, in this paper, after briefly surveying the primary computational methods for Hi-C data analysis, we assess the performance of representative methods for data normalization and for the identification of compartments, Topologically Associating Domains (TADs) and chromatin loops at ultra-low resolution. We show that most state-of-the-art methods do not work properly under that condition. We then apply the three best-performing methods to real single-cell Hi-C data; their performance indicates that compartments may be a statistical feature emerging from the cell population, while TADs and chromatin loops may exist dynamically in single cells.

What problem does this paper attempt to address?

The core problem this paper addresses is how to evaluate and compare computational methods for Hi-C analysis when the data are low-resolution, in particular at the sparsity level of single-cell Hi-C. Specifically, the study focuses on the following aspects:

1. **Data Normalization**: The paper evaluates mainstream Hi-C normalization methods (such as ICE and HiCNorm) on ultra-sparse data. ICE is very sensitive to data resolution and may fail on single-cell Hi-C data, whereas HiCNorm is stable but computationally expensive, which makes it hard to apply in large-scale single-cell studies.
2. **Compartment Identification**: The study evaluates three compartment identification methods (Juicer, CscoreTool and GeSICA) on down-sampled data. None of them works properly at the sparsity level of single-cell Hi-C data, although CscoreTool is relatively the most stable.
3. **Topologically Associating Domain (TAD) Detection**: The authors evaluate several representative TAD callers (such as IS and deDoc). At ultra-low resolution, IS and deDoc outperform the other methods, but at the single-cell level all methods may fail.
4. **Chromatin Loop Detection**: The paper tests six representative loop callers (such as HiCCUPS and diffHiC). For most of them, performance declines exponentially as the amount of data is reduced; only fastHiC retains some functionality on single-cell Hi-C data.
5. **Performance on Real Single-Cell Hi-C Data**: Finally, the authors apply the better-performing IS and fastHiC to real single-cell Hi-C data. The results show that:
   - Compartments are difficult to identify clearly in single-cell data.
   - TADs can be meaningfully identified in single cells, and they differ between cells.
   - Chromatin loops are almost invisible in single-cell data; even the best-performing method, fastHiC, does not significantly outperform a simple baseline predictor.

### Summary

The main objective of the paper is to expose the limitations of existing computational methods on low-resolution (especially single-cell) Hi-C data and to provide a reference for developing new methods tailored to single-cell Hi-C analysis. Although some methods retain partial performance at ultra-low resolution, their applicability at the single-cell level remains limited. This suggests that existing algorithms need to be improved, or new ones developed, to analyze three-dimensional genome structure in single cells.

Formula Summary (for two partitions $T=\{T_1,\dots,T_n\}$ and $K=\{K_1,\dots,K_m\}$ of the same $N$ bins):

- **Adjusted Mutual Information (AMI)**:
  $$ AMI(T, K)=\frac{MI(T, K)-E\{MI(T, K)\}}{\max\{H(T), H(K)\}-E\{MI(T, K)\}} $$
  where $H(T)$ and $H(K)$ are the entropies of the two partitions, $E\{MI(T, K)\}$ is the expected mutual information between random partitions with the same cluster sizes, and $MI(T, K)$ is the mutual information, defined as
  $$ MI(T, K)=\sum_{i = 1}^{n}\sum_{j = 1}^{m}P(i, j)\log\left(\frac{P(i, j)}{P(i)P'(j)}\right) $$
  with $P(i)=\frac{|T_i|}{N}$, $P'(j)=\frac{|K_j|}{N}$ and $P(i, j)=\frac{|T_i\cap K_j|}{N}$.
- **Weight Similarity (WS)**:
  $$ WS(T, K)=\frac{\sum_{j = 1}^{m}S_{TK}(j)\cdot|K_j|}{\sum_{j = 1}^{m}|K_j|} $$
  where $S_{TK}(j)=\max_{1\le i\le n}\left\{\frac{|T_i\cap K_j|}{|T_i|\cdot|K_j|}\right\}$.
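Several of the evaluations above run on down-sampled Hi-C data. The summary does not say how the down-sampling was done; a common approach, assumed here, is binomial thinning, where every read is kept independently with probability `fraction`. The function name `downsample_contacts` is hypothetical, and this is a minimal sketch rather than the paper's procedure.

```python
import numpy as np


def downsample_contacts(contacts, fraction, seed=None):
    """Binomially thin a symmetric Hi-C count matrix to ~`fraction` of its reads.

    Each contact in the upper triangle (diagonal included) is kept independently
    with probability `fraction`; the strictly-upper part is then mirrored so the
    result stays symmetric and the diagonal is not counted twice.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    rng = np.random.default_rng(seed)
    upper = np.triu(np.asarray(contacts, dtype=np.int64))
    thinned = rng.binomial(upper, fraction)
    return thinned + np.triu(thinned, k=1).T


# Usage: keep about 1% of the reads of a dense toy matrix.
rng = np.random.default_rng(0)
raw = rng.poisson(20, size=(50, 50))
raw = raw + raw.T
sparse = downsample_contacts(raw, 0.01, seed=1)
```

Thinning the upper triangle only, rather than each entry independently, is what keeps the matrix symmetric.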
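Point 1 above reports that ICE is very sensitive to data resolution. ICE (iterative correction, Imakaev et al., 2012) balances a symmetric contact matrix by repeatedly dividing each row and column by its relative coverage until all bins have equal coverage. The following is a minimal NumPy sketch of that idea, not the implementation benchmarked in the paper; the function name `ice_balance`, the low-coverage filter `min_row_sum` and the tolerance are illustrative assumptions.

```python
import numpy as np


def ice_balance(contacts, max_iter=1000, tol=1e-6, min_row_sum=1.0):
    """Sketch of ICE-style iterative correction for a symmetric contact matrix.

    Returns (balanced matrix, per-bin bias, mask of kept bins).
    Low-coverage bins are masked out and excluded from balancing.
    """
    w = np.array(contacts, dtype=float)  # copy, so the input is not modified
    keep = w.sum(axis=1) >= min_row_sum
    w[~keep, :] = 0.0
    w[:, ~keep] = 0.0
    # Masking can leave a kept bin with no remaining contacts; drop those too.
    keep &= w.sum(axis=1) > 0
    if not keep.any():
        raise ValueError("no bins left after filtering; matrix is too sparse")
    bias = np.ones(w.shape[0])
    for _ in range(max_iter):
        s = w.sum(axis=1)
        s /= s[keep].mean()   # relative coverage; kept bins average to 1
        s[~keep] = 1.0        # masked bins are all-zero, leave them untouched
        w /= np.outer(s, s)   # divide row i and column i by s[i]
        bias *= s
        if np.abs(s[keep] - 1.0).max() < tol:
            break
    bias[~keep] = np.nan      # masked bins have no meaningful bias
    return w, bias, keep
```

After balancing, every kept bin has (approximately) the same total coverage. In a heavily down-sampled matrix many bins fall below any coverage filter, or retain so few nonzero entries that rescaling them amplifies noise, which gives one intuition for why balancing-based normalization becomes fragile at single-cell sparsity.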
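For compartments (point 2 above), the classic eigenvector approach, which Juicer's eigenvector tool follows, normalizes contacts by their expected value at each genomic distance, correlates the bins' interaction profiles, and takes the leading eigenvector as the A/B signal. The sketch below is written in that spirit only; it is not Juicer's code, the name `compartment_pc1` is hypothetical, and CscoreTool and GeSICA work differently. The sign of the eigenvector is arbitrary and must be oriented with external data such as gene density.

```python
import numpy as np


def compartment_pc1(contacts):
    """Sketch of eigenvector-based A/B compartment calling.

    1. observed/expected: divide each diagonal by its mean (distance decay),
    2. Pearson correlation between bin profiles,
    3. leading eigenvector of the correlation matrix.
    Bins whose O/E profile has zero variance get NaN.
    """
    m = np.asarray(contacts, dtype=float)
    n = m.shape[0]
    oe = np.zeros_like(m)
    for d in range(n):
        diag = np.diagonal(m, offset=d)
        expected = diag.mean()
        if expected > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / expected
            oe[idx + d, idx] = diag / expected
    pc1 = np.full(n, np.nan)
    valid = oe.std(axis=1) > 0
    if valid.sum() < 2:
        return pc1  # too sparse to correlate any pair of bins
    corr = np.corrcoef(oe[valid])
    _, vecs = np.linalg.eigh(corr)  # eigenvalues come in ascending order
    pc1[valid] = vecs[:, -1]        # leading eigenvector; sign is arbitrary
    return pc1
```

Because every all-zero or constant profile is set to NaN, a very sparse matrix quickly leaves few bins with a defined value, consistent with compartments being hard to call from single cells.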
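Point 3 above singles out IS (the insulation score, Crane et al., 2015) as one of the better TAD callers at ultra-low resolution. The idea is to slide a square window along the diagonal, average the contacts that cross each position, and call TAD boundaries at local minima of that track. The sketch below assumes a particular window geometry (the `window` bins before bin `i` against the `window` bins starting at `i`) and a plain strict-minimum boundary rule; both function names are hypothetical, and published implementations add further normalization and boundary-strength filtering.

```python
import numpy as np


def insulation_score(contacts, window=5):
    """Sketch of an insulation-score track.

    score[i] is the mean contact count between bins [i-window, i) and
    [i, i+window), i.e. the contacts crossing the boundary just before bin i.
    The track is log2-normalized by its mean; undefined positions are NaN.
    """
    m = np.asarray(contacts, dtype=float)
    n = m.shape[0]
    score = np.full(n, np.nan)
    for i in range(window, n - window + 1):
        score[i] = m[i - window:i, i:i + window].mean()
    norm = np.full(n, np.nan)
    valid = np.isfinite(score) & (score > 0)
    if valid.any():
        norm[valid] = np.log2(score[valid] / score[valid].mean())
    return norm


def tad_boundaries(norm):
    """Report bins whose insulation value is a strict local minimum."""
    return [
        i for i in range(1, len(norm) - 1)
        if np.isfinite(norm[i - 1:i + 2]).all()
        and norm[i] < norm[i - 1] and norm[i] < norm[i + 1]
    ]


# Usage: two toy TADs of 10 bins each; the boundary is at bin 10.
toy = np.ones((20, 20))
toy[:10, :10] = 10
toy[10:, 10:] = 10
print(tad_boundaries(insulation_score(toy, window=3)))
```

When most entries in the window are zero, the score is zero or undefined, so sparse single-cell matrices leave long stretches of the track without a value.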
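The AMI formula above can be computed directly once each partition is encoded as a list giving, for each of the $N$ bins, the label of the domain it belongs to (an encoding assumed here). $E\{MI(T, K)\}$ is evaluated under the standard hypergeometric model of random partitions with the same cluster sizes. The function names are hypothetical; scikit-learn's `adjusted_mutual_info_score(..., average_method="max")` computes the same quantity and can serve as a cross-check.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a partition given as per-bin labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(t, k):
    """MI(T, K) = sum_ij P(i,j) log(P(i,j) / (P(i) P'(j)))."""
    n = len(t)
    size_t, size_k = Counter(t), Counter(k)
    joint = Counter(zip(t, k))
    return sum(
        (nij / n) * math.log(nij * n / (size_t[i] * size_k[j]))
        for (i, j), nij in joint.items()
    )


def expected_mutual_information(t, k):
    """E{MI(T, K)} under the hypergeometric model, using log-factorials."""
    def lf(x):
        return math.lgamma(x + 1)  # log(x!)

    n = len(t)
    a, b = list(Counter(t).values()), list(Counter(k).values())
    emi = 0.0
    for ai in a:
        for bj in b:
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                log_p = (lf(ai) + lf(bj) + lf(n - ai) + lf(n - bj)
                         - lf(n) - lf(nij) - lf(ai - nij) - lf(bj - nij)
                         - lf(n - ai - bj + nij))
                emi += (nij / n) * math.log(n * nij / (ai * bj)) * math.exp(log_p)
    return emi


def adjusted_mutual_information(t, k):
    """AMI with the max{H(T), H(K)} normalization from the formula above."""
    if len(t) != len(k):
        raise ValueError("both partitions must label the same bins")
    emi = expected_mutual_information(t, k)
    denom = max(entropy(t), entropy(k)) - emi
    if denom == 0:
        return 1.0  # degenerate case, e.g. both partitions are a single domain
    return (mutual_information(t, k) - emi) / denom
```

Identical partitions score 1 regardless of how the domains are labeled, while a partition that is no better than chance scores around 0 (and can go negative, e.g. `[0, 0, 1, 1]` against `[0, 1, 0, 1]` gives -0.5).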
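The WS formula above can be transcribed in the same way, here with each partition given as a list of sets of bin indices (an assumed encoding; the name `weight_similarity` is hypothetical). The code follows the formula exactly as stated: $S_{TK}(j)$ divides the overlap by the product $|T_i|\cdot|K_j|$, so two identical partitions into domains of size 2 score 0.5 rather than 1. If the original paper normalizes the overlap differently, only the marked line needs to change.

```python
def weight_similarity(T, K):
    """WS(T, K): size-weighted average over K_j of the best overlap score S_TK(j).

    T and K are lists of sets of bin indices (one set per domain).
    """
    total = sum(len(kj) for kj in K)
    score = 0.0
    for kj in K:
        # S_TK(j) as stated in the formula: max_i |T_i & K_j| / (|T_i| * |K_j|)
        s = max(len(ti & kj) / (len(ti) * len(kj)) for ti in T)
        score += s * len(kj)
    return score / total


# Usage: compare a partition with itself.
T = [{0, 1}, {2, 3}]
print(weight_similarity(T, T))
```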

Comparison of computational methods for 3D genome analysis at single-cell Hi-C level
