scAuto as a comprehensive framework for single-cell chromatin accessibility data analysis (2024)

research-article

Authors: Meiqin Gong, Yun Yu, Zixuan Wang, Junming Zhang, Xiongyi Wang, Cheng Fu, Yongqing Zhang, and Xiaodong Wang

Published: 09 July 2024


    Abstract

    Interpreting single-cell chromatin accessibility data is crucial for understanding the regulation of intercellular heterogeneity. Despite progress in computational methods for analyzing these data, a comprehensive analytical framework and a user-friendly online analysis tool are still lacking. To fill this gap, we developed single-cell auto-correlation transformers (scAuto), a pre-trained deep learning-based framework. Following DNABERT’s methodology of pre-training and fine-tuning, scAuto first learns a general understanding of DNA sequence grammar through self-supervised pre-training on the unlabeled human genome; it is then transferred to scATAC-seq data and fine-tuned with supervision for single-cell chromatin accessibility analysis. We extensively validated scAuto on the Buenrostro2018 dataset, demonstrating superior performance on chromatin accessibility prediction, single-cell clustering, and data denoising. Based on scAuto, we further developed an interactive web server for single-cell chromatin accessibility data analysis, with tutorial-style interfaces for users with limited programming skills. The platform is accessible at http://zhanglab.icaup.cn. We expect this work to help analyze single-cell chromatin accessibility data and to facilitate the development of precision medicine.
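The pre-training step described in the abstract can be illustrated with a minimal sketch of DNABERT-style input preparation: the sequence is split into overlapping k-mers, and some tokens are hidden so that a model can be trained to recover them. This is an assumption-laden illustration, not scAuto's actual pipeline; the function names `kmer_tokenize` and `mask_tokens`, the value `k=6`, and the 15% mask rate are hypothetical choices, and the independent per-token masking used here is a simplification.

```python
import random


def kmer_tokenize(sequence, k=6):
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens.

    Returns the masked token list and a dict mapping each masked
    position to the original token (the self-supervised target).
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets


tokens = kmer_tokenize("ACGTACGTAC", k=6)
masked, targets = mask_tokens(tokens)
```

During fine-tuning, the same tokenization would feed a supervised head (for example, predicting accessibility per cell); that part depends on model details the abstract does not give and is therefore left out.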

    Highlights

    Present a framework for single-cell chromatin accessibility analysis.

    Develop an online analysis platform, scAuto.

    Conduct extensive experiments and achieve state-of-the-art performance.



      Published In


      Computers in Biology and Medicine Volume 171, Issue C

      Mar 2024

      1547 pages

      ISSN:0010-4825


      Elsevier Ltd.

      Publisher: Pergamon Press, Inc., United States


      Author Tags

      1. Single-cell genomics
      2. Chromatin accessibility
      3. Data analysis tools
      4. Web server
      5. Deep learning


      FAQs

      What methods would you use to study chromatin accessibility?

      Chromatin accessibility profiling methods are used to assess the 'openness' of chromatin and to identify candidate regulatory regions in a tissue or cell type. These methods involve enzymatic cleavage, transposition or DNA methylation, and they can be applied to bulk samples or single cells.

      What is the single-cell ATAC-seq method?

      Single-cell technologies can profile features such as a cell's set of proteins, its metabolic state, or its epigenetic profile. Chief among the epigenetic methods is single-cell ATAC-seq (scATAC-seq), which captures the chromatin accessibility profiles of tissues at single-cell resolution.
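As a rough illustration of what "accessibility profiles at single-cell resolution" means in practice, scATAC-seq data are commonly summarized as a binary cell-by-peak matrix. The sketch below builds one from toy data. The fragment and peak tuples, the barcodes, and the function names are all hypothetical, and the nested loop is chosen for readability rather than speed.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """True if half-open intervals [start_a, end_a) and [start_b, end_b) overlap."""
    return start_a < end_b and start_b < end_a


def build_accessibility_matrix(fragments, peaks):
    """Build a binary cell-by-peak matrix.

    fragments: list of (barcode, chrom, start, end) tuples
    peaks:     list of (chrom, start, end) tuples
    Returns (sorted barcodes, matrix), where matrix[i][j] == 1 if cell i
    has at least one fragment overlapping peak j.
    """
    cells = sorted({barcode for barcode, _, _, _ in fragments})
    row_of = {barcode: i for i, barcode in enumerate(cells)}
    matrix = [[0] * len(peaks) for _ in cells]
    for barcode, chrom, start, end in fragments:
        for j, (p_chrom, p_start, p_end) in enumerate(peaks):
            if chrom == p_chrom and overlaps(start, end, p_start, p_end):
                matrix[row_of[barcode]][j] = 1
    return cells, matrix


# Toy data: three peaks and four fragments from two cells (all made up).
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
fragments = [
    ("cellA", "chr1", 120, 180),
    ("cellA", "chr2", 140, 220),
    ("cellB", "chr1", 590, 650),
    ("cellB", "chr1", 300, 400),  # falls outside every peak
]
cells, matrix = build_accessibility_matrix(fragments, peaks)
# cells  -> ["cellA", "cellB"]
# matrix -> [[1, 0, 1], [0, 1, 0]]
```

Real matrices are far larger and extremely sparse, which is one reason denoising and dimensionality reduction feature so prominently in scATAC-seq analysis.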

      What does chromatin accessibility tell you?

      Chromatin accessibility is the degree to which nuclear macromolecules are able to physically contact chromatinized DNA. It is determined by the occupancy and topological organization of nucleosomes, as well as by other chromatin-binding factors that occlude access to DNA.

      What does ChIP-Seq tell you?

      ChIP-Seq identifies the binding sites of DNA-associated proteins and can be used to map global binding sites for a given protein. ChIP-Seq typically starts with crosslinking of DNA-protein complexes. Samples are then fragmented and treated with an exonuclease to trim unbound oligonucleotides.

      What does ATAC-Seq tell you?

      The assay for transposase-accessible chromatin with sequencing (ATAC-Seq) is a popular method for determining chromatin accessibility across the genome.

      How is RNA-seq different from single-cell seq?

      Bulk RNA sequencing (bulk RNA-seq) provides an average gene expression profile for a population of cells, while single-cell RNA sequencing (scRNA-seq) allows for the study of gene expression in individual cells.
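The difference between a bulk average and per-cell measurements can be made concrete with a toy example. The sketch below uses made-up counts for two genes in four cells; `bulk_profile` is a hypothetical helper that simply averages each gene across cells, which is an idealization of what a bulk assay reports.

```python
def bulk_profile(cell_by_gene):
    """Average each gene's expression across all cells (idealized bulk readout)."""
    n_cells = len(cell_by_gene)
    n_genes = len(cell_by_gene[0])
    return [sum(cell[g] for cell in cell_by_gene) / n_cells for g in range(n_genes)]


# Rows are cells, columns are genes (toy counts).
# Two cells express only gene 0, two cells express only gene 1.
single_cell_counts = [
    [10, 0],
    [10, 0],
    [0, 10],
    [0, 10],
]

bulk = bulk_profile(single_cell_counts)
# bulk -> [5.0, 5.0]
```

The bulk profile suggests that both genes are moderately expressed everywhere, while the per-cell rows show two distinct subpopulations. Single-cell assays exist to recover exactly this kind of heterogeneity.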

      What are the advantages of single-cell ATAC-seq?

      The main advantages of ATAC-Seq compared to other techniques that investigate similar chromatin features, such as FAIRE-Seq or DNase-Seq, are the lower number of cells required for the assay and the relative simplicity of its two-step protocol.

      How to measure DNA accessibility?

      The quantitative DNA accessibility assay (qDA-seq) uses the restriction enzyme AluI to measure absolute accessibility and the rate at which accessible sites are cut.

      What type of microscopy is needed to see chromatin?

      Nucleosomes are the smallest repeating unit of chromatin fibers and have a dimension of ∼11 nm, well below the resolution limit of conventional fluorescence microscopy. For this reason, super-resolution microscopy techniques are extremely useful for studying chromatin structure.

      What techniques are used to study transcriptomics?

      There are two key contemporary techniques in the field: microarrays, which quantify a set of predetermined sequences, and RNA-Seq, which uses high-throughput sequencing to record all transcripts.

      What method is used to detect alien chromatin?

      A key method is in situ hybridization, which allows alien chromatin to be identified in chromosome preparations of plants derived from alien crosses.
