research-article
Authors: Meiqin Gong, Yun Yu, Zixuan Wang, Junming Zhang, Xiongyi Wang, Cheng Fu, Yongqing Zhang, and Xiaodong Wang
Volume 171, Issue C
Published: 09 July 2024
Abstract
Interpreting single-cell chromatin accessibility data is crucial for understanding the regulation of intercellular heterogeneity. Despite progress in computational methods for analyzing these data, a comprehensive analytical framework and a user-friendly online analysis tool are still lacking. To fill this gap, we developed a pre-trained deep learning-based framework, single-cell auto-correlation transformers (scAuto). Following DNABERT’s methodology of pre-training and fine-tuning, scAuto first learns the general grammar of DNA sequences through self-supervised pre-training on the unlabeled human genome; it is then fine-tuned with supervision on the single-cell chromatin accessibility analysis task using scATAC-seq data. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating its superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis, which integrates tutorial-style interfaces for users with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. We expect this work to help analyze single-cell chromatin accessibility data and facilitate the development of precision medicine.
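The self-supervised pre-training step mentioned in the abstract can be illustrated with a small sketch of how a masked-token training example might be built from raw DNA. This is not the scAuto implementation: the abstract does not state the tokenization scheme, so the overlapping k-mer tokens (in the style of DNABERT), the value `k=6`, the 15% mask rate, and the function names `kmer_tokenize` and `mask_tokens` are all assumptions made for illustration.

```python
import random


def kmer_tokenize(sequence, k=6):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens and return (inputs, labels).

    labels[i] holds the original token where position i was masked,
    and None everywhere else. The mask rate is an assumed value.
    """
    if not tokens:
        return [], []
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))
    inputs = [mask_token if i in masked_positions else tok
              for i, tok in enumerate(tokens)]
    labels = [tok if i in masked_positions else None
              for i, tok in enumerate(tokens)]
    return inputs, labels


# Build one self-supervised example from an unlabeled sequence.
tokens = kmer_tokenize("ATGCGTACGTTAGC", k=6)
inputs, labels = mask_tokens(tokens)
for model_input, target in zip(inputs, labels):
    print(model_input, "->", target)
```

A model trained to recover the hidden tokens needs no labels, which is what makes pre-training on the unlabeled genome possible. Note that with overlapping k-mers a single masked token can largely be read off its neighbors; DNABERT addresses this by masking contiguous spans, which this simplified sketch does not do.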
Highlights
• Present a framework for single-cell chromatin accessibility analysis.
• Develop an online analysis platform, scAuto.
• Conduct extensive experiments and achieve state-of-the-art performance.
References
[1] Sinha Sarthak, Satpathy Ansuman T., Zhou Weiqiang, Ji Hongkai, Stratton Jo A., Jaffer Arzina, Bahlis Nizar, Morrissy Sorana, Biernaskie Jeff A., Profiling chromatin accessibility at single-cell resolution, Genomics, Proteomics & Bioinform. 19 (2) (2021) 172–190.
[2] Preissl Sebastian, Gaulton Kyle J., Ren Bing, Characterizing cis-regulatory elements using single-cell epigenomics, Nature Rev. Genet. 24 (1) (2023) 21–43.
[3] de Boer Carl G., Regev Aviv, BROCKMAN: deciphering variance in epigenomic regulators by k-mer factorization, BMC Bioinform. 19 (1) (2018) 1–13.
[4] Lal Avantika, Chiang Zachary D., Yakovenko Nikolai, Duarte Fabiana M., Israeli Johnny, Buenrostro Jason D., Deep learning-based enhancement of epigenomics data with AtacWorks, Nature Commun. 12 (1) (2021) 1507.
[5] Yuan Han, Kelley David R., scBasset: sequence-based modeling of single-cell ATAC-seq using convolutional neural networks, Nature Methods 19 (9) (2022) 1088–1096.
[6] Chen Xiaoyang, Chen Shengquan, Song Shuang, Gao Zijing, Hou Lin, Zhang Xuegong, Lv Hairong, Jiang Rui, Cell type annotation of single-cell chromatin accessibility data via supervised Bayesian embedding, Nat. Mach. Intell. 4 (2) (2022) 116–126.
[7] Xiong Lei, Xu Kui, Tian Kang, Shao Yanqiu, Tang Lei, Gao Ge, Zhang Michael, Jiang Tao, Zhang Qiangfeng Cliff, SCALE method for single-cell ATAC-seq analysis via latent feature extraction, Nature Commun. 10 (1) (2019) 4576.
[8] Cao Yingxin, Fu Laiyi, Wu Jie, Peng Qinke, Nie Qing, Zhang Jing, Xie Xiaohui, SAILER: scalable and accurate invariant representation learning for single-cell ATAC-seq processing and integration, Bioinformatics 37 (Supplement_1) (2021) i317–i326.
[9] Ashuach Tal, Reidenbach Daniel A., Gayoso Adam, Yosef Nir, PeakVI: A deep generative model for single-cell chromatin accessibility analysis, Cell Rep. Methods 2 (3) (2022).
[10] Hao Yuhan, Stuart Tim, Kowalski Madeline H., Choudhary Saket, Hoffman Paul, Hartman Austin, Srivastava Avi, Molla Gesmira, Madad Shaista, Fernandez-Granda Carlos, et al., Dictionary learning for integrative, multimodal and scalable single-cell analysis, Nature Biotechnol. (2023) 1–12.
[11] Stuart Tim, Butler Andrew, Hoffman Paul, Hafemeister Christoph, Papalexi Efthymia, Mauck William M., Hao Yuhan, Stoeckius Marlon, Smibert Peter, Satija Rahul, Comprehensive integration of single-cell data, Cell 177 (7) (2019) 1888–1902.
[12] Granja Jeffrey M., Corces M. Ryan, Pierce Sarah E., Bagdatli S. Tansu, Choudhry Hani, Chang Howard Y., Greenleaf William J., ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis, Nature Genetics 53 (3) (2021) 403–411.
[13] Ji Yanrong, Zhou Zhihan, Liu Han, Davuluri Ramana V., DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome, Bioinformatics 37 (15) (2021) 2112–2120.
[14] Hu Edward J., Shen Yelong, Wallis Phillip, Allen-Zhu Zeyuan, Li Yuanzhi, Wang Shean, Wang Lu, Chen Weizhu, LoRA: Low-rank adaptation of large language models, 2021, arXiv preprint arXiv:2106.09685.
[15] Li Xiang Lisa, Liang Percy, Prefix-tuning: Optimizing continuous prompts for generation, 2021, arXiv preprint arXiv:2101.00190.
[16] Buenrostro Jason D., Corces M. Ryan, Lareau Caleb A., Wu Beijing, Schep Alicia N., Aryee Martin J., Majeti Ravindra, Chang Howard Y., Greenleaf William J., Integrated single-cell analysis maps the continuous regulatory landscape of human hematopoietic differentiation, Cell 173 (6) (2018) 1535–1548.
[17] Developers Pysam, Pysam: a Python module for reading and manipulating files in the SAM/BAM format, 2018, Preprint at.
[18] Zhang Yongqing, Liu Yuhang, Wang Zixuan, Wang Maocheng, Xiong Shuwen, Huang Guo, Gong Meiqin, Uncovering the relationship between tissue-specific TF-DNA binding and chromatin features through a transformer-based model, Genes 13 (11) (2022) 1952.
[19] Wu Haixu, Xu Jiehui, Wang Jianmin, Long Mingsheng, Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting, Adv. Neural Inf. Process. Syst. 34 (2021) 22419–22430.
[20] De Kanter Jurrian K., Lijnzaad Philip, Candelli Tito, Margaritis Thanasis, Holstege Frank C.P., CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing, Nucleic Acids Res. 47 (16) (2019) e95.
[21] Tsuyuzaki Koki, Sato Hiroyuki, Sato Kenta, Nikaido Itoshi, Benchmarking principal component analysis for large-scale single-cell RNA-sequencing, Genome Biol. 21 (1) (2020) 1–17.
[22] Yu Tianwei, A new dynamic correlation algorithm reveals novel functional aspects in single cell and bulk RNA-seq data, PLoS Comput. Biol. 14 (8) (2018).
[23] Alquicira-Hernandez Jose, Sathe Anuja, Ji Hanlee P., Nguyen Quan, Powell Joseph E., scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data, Genome Biol. 20 (1) (2019) 1–17.
[24] Xiang Ruizhi, Wang Wencan, Yang Lei, Wang Shiyuan, Xu Chaohan, Chen Xiaowen, A comparison for dimensionality reduction methods of single-cell RNA-seq data, Front. Genet. 12 (2021).
[25] Peng Lihong, Tian Xiongfei, Tian Geng, Xu Junlin, Huang Xin, Weng Yanbin, Yang Jialiang, Zhou Liqian, Single-cell RNA-seq clustering: datasets, models, and algorithms, RNA Biol. 17 (6) (2020) 765–783.
[26] Traag Vincent A., Waltman Ludo, Van Eck Nees Jan, From Louvain to Leiden: guaranteeing well-connected communities, Sci. Rep. 9 (1) (2019) 5233.
[27] Zhu Xiaoshu, Zhang Jie, Xu Yunpei, Wang Jianxin, Peng Xiaoqing, Li Hong-Dong, Single-cell clustering based on shared nearest neighbor and graph partitioning, Interdiscip. Sci. Comput. Life Sci. 12 (2020) 117–130.
[28] Kobak Dmitry, Berens Philipp, The art of using t-SNE for single-cell transcriptomics, Nature Commun. 10 (1) (2019) 5416.
[29] Becht Etienne, McInnes Leland, Healy John, Dutertre Charles-Antoine, Kwok Immanuel WH, Ng Lai Guan, Ginhoux Florent, Newell Evan W., Dimensionality reduction for visualizing single-cell data using UMAP, Nature Biotechnol. 37 (1) (2019) 38–44.
[30] Sud Kanika, Understanding REST APIs, Practical hapi: Build Your Own hapi Apps and Learn from Industry Case Studies, Springer, 2020, pp. 1–11.
[31] Lotfollahi Mohammad, Naghipourfar Mohsen, Luecken Malte D., Khajavi Matin, Büttner Maren, Wagenstetter Marco, Avsec Žiga, Gayoso Adam, Yosef Nir, Interlandi Marta, et al., Mapping single-cell data to reference atlases by transfer learning, Nature Biotechnol. 40 (1) (2022) 121–130.
[32] Kelley David R., Reshef Yakir A., Bileschi Maxwell, Belanger David, McLean Cory Y., Snoek Jasper, Sequential regulatory activity prediction across chromosomes with convolutional neural networks, Genome Res. 28 (5) (2018) 739–750.
[33] Zhou Jian, Troyanskaya Olga G., Predicting effects of noncoding variants with deep learning–based sequence model, Nature Methods 12 (10) (2015) 931–934.
[34] Kelley David R., Snoek Jasper, Rinn John L., Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks, Genome Res. 26 (7) (2016) 990–999.
[35] Dey Rahul, Salem Fathi M., Gate-variants of gated recurrent unit (GRU) neural networks, in: 2017 IEEE 60th International Midwest Symposium on Circuits and Systems, MWSCAS, IEEE, 2017, pp. 1597–1600.
[36] Devlin Jacob, Chang Ming-Wei, Lee Kenton, Toutanova Kristina, BERT: Pre-training of deep bidirectional transformers for language understanding, 2018, arXiv preprint arXiv:1810.04805.
[37] Beltagy Iz, Peters Matthew E., Cohan Arman, Longformer: The long-document transformer, 2020, arXiv preprint arXiv:2004.05150.
[38] Wang Zixuan, Zhang Yongqing, Yu Yun, Zhang Junming, Liu Yuhang, Zou Quan, A unified deep learning framework for single-cell ATAC-seq analysis based on ProdDep transformer encoder, Int. J. Mol. Sci. 24 (5) (2023) 4784.
[39] Bravo González-Blas Carmen, Minnoye Liesbeth, Papasokrati Dafni, Aibar Sara, Hulselmans Gert, Christiaens Valerie, Davie Kristofer, Wouters Jasper, Aerts Stein, cisTopic: cis-regulatory topic modeling on single-cell ATAC-seq data, Nature Methods 16 (5) (2019) 397–400.
[40] Van Dijk David, Sharma Roshan, Nainys Juozas, Yim Kristina, Kathail Pooja, Carr Ambrose J., Burdziak Cassandra, Moon Kevin R., Chaffer Christine L, Pattabiraman Diwakar, et al., Recovering gene interactions from single-cell data using data diffusion, Cell 174 (3) (2018) 716–729.
[41] Li Zhijian, Kuppe Christoph, Ziegler Susanne, Cheng Mingbo, Kabgani Nazanin, Menzel Sylvia, Zenke Martin, Kramann Rafael, Costa Ivan G., Chromatin-accessibility estimation from single-cell ATAC-seq data with scOpen, Nature Commun. 12 (1) (2021) 6386.
[42] Xiong Lei, Tian Kang, Li Yuzhe, Ning Weixi, Gao Xin, Zhang Qiangfeng Cliff, Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space, Nature Commun. 13 (1) (2022) 6118.
[43] Cao Zhi-Jie, Gao Ge, Multi-omics single-cell data integration and regulatory inference with graph-linked embedding, Nature Biotechnol. 40 (10) (2022) 1458–1466.
[44] Liu Yuhang, Wang Zixuan, Yuan Hao, Zhu Guiquan, Zhang Yongqing, HEAP: a task adaptive-based explainable deep learning framework for enhancer activity prediction, Brief. Bioinform. 24 (5) (2023) bbad286.
Published In
Computers in Biology and Medicine, Volume 171, Issue C
Mar 2024
1547 pages
ISSN: 0010-4825
Elsevier Ltd.
Publisher
Pergamon Press, Inc.
United States
Author Tags
- Single-cell genomics
- Chromatin accessibility
- Data analysis tools
- Web server
- Deep learning