SEM: size-based expectation maximization for characterizing nucleosome positions and subtypes

Jianyu Yang, Kuangyu Yen, Shaun Mahony
DOI: https://doi.org/10.1101/2023.10.17.562727
2023-10-20
bioRxiv
Abstract: Genome-wide nucleosome profiles are predominantly characterized using MNase-seq, which involves extensive MNase digestion and size selection to enrich for mono-nucleosome-sized fragments. Most available MNase-seq analysis packages assume that nucleosomes uniformly protect 147 bp DNA fragments. However, some nucleosomes with atypical histone or chemical compositions protect shorter lengths of DNA. The rigid assumptions imposed by current nucleosome analysis packages ignore variation in nucleosome lengths, potentially blinding investigators to regulatory roles played by atypical nucleosomes. To enable the characterization of different nucleosome types from MNase-seq data, we introduce the Size-based Expectation Maximization (SEM) nucleosome calling package. SEM employs a hierarchical Gaussian mixture model to estimate the positions and subtype identity of nucleosomes from MNase-seq fragments. Nucleosome subtypes are automatically identified based on the distribution of protected DNA fragment lengths at nucleosome positions. Benchmark analysis indicates that SEM is on par with existing packages in terms of standard nucleosome-calling accuracy metrics, while uniquely providing the ability to characterize nucleosome subtype identities. Using SEM on a low-dose MNase H2B MNase-ChIP-seq dataset from mouse embryonic stem cells, we identified three nucleosome types: short-fragment nucleosomes, canonical nucleosomes, and di-nucleosomes. The short-fragment nucleosomes can be divided further into two subtypes based on their chromatin accessibility. Interestingly, the subset of short-fragment nucleosomes in accessible regions exhibits high MNase sensitivity and displays distribution patterns around transcription start sites (TSSs) and CTCF peaks similar to those of the previously reported "fragile nucleosomes". These SEM-defined accessible short-fragment nucleosomes are found not just in promoters, but also in enhancers and other regulatory regions. Additional investigations reveal their co-localization with the chromatin remodelers Chd6, Chd8, and Ep400. In summary, SEM provides an effective platform for distinguishing various nucleosome subtypes, paving the way for future exploration of non-standard nucleosomes.
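To make the idea of identifying subtypes from fragment-length distributions concrete, the following is a minimal sketch of expectation maximization on a one-dimensional Gaussian mixture over fragment lengths. It is not the SEM implementation: SEM uses a hierarchical model that also estimates nucleosome positions, whereas this sketch only clusters lengths. The function names (`normal_pdf`, `fit_length_mixture`), the initial means, and the synthetic length distributions are all illustrative assumptions and are not taken from the paper.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_length_mixture(lengths, init_means, init_sd=10.0, n_iter=100):
    """Fit a 1D Gaussian mixture to fragment lengths with plain EM.

    Hypothetical helper for illustration only; returns (weights, means, sds).
    """
    k = len(init_means)
    means = list(init_means)
    sds = [init_sd] * k
    weights = [1.0 / k] * k
    n = len(lengths)

    for _ in range(n_iter):
        # E-step: responsibility of each component for each fragment length.
        resp = []
        for x in lengths:
            p = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against numerical underflow
            resp.append([pj / total for pj in p])

        # M-step: re-estimate weight, mean, and sd of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, lengths)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, lengths)) / nj
            sds[j] = max(math.sqrt(var), 1.0)  # floor the sd to avoid collapse

    return weights, means, sds


# Synthetic fragment lengths (arbitrary values, not data from the paper):
# a short-fragment group, a canonical group, and a di-nucleosome group.
rng = random.Random(0)
lengths = (
    [rng.gauss(100, 8) for _ in range(300)]
    + [rng.gauss(150, 8) for _ in range(300)]
    + [rng.gauss(300, 15) for _ in range(300)]
)

weights, means, sds = fit_length_mixture(lengths, init_means=[90, 160, 280])
for w, m, s in zip(weights, means, sds):
    print(f"weight={w:.2f} mean={m:.1f} sd={s:.1f}")
```

Each fitted component corresponds to one candidate length class. A full treatment along the lines described in the abstract would additionally tie these length distributions to per-position nucleosome estimates, which this sketch omits.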