demuxSNP: supervised demultiplexing single-cell RNA sequencing using cell hashing and SNPs
Background: Multiplexing single-cell RNA sequencing experiments reduces sequencing cost and facilitates larger-scale studies. However, factors such as cell hashing quality and class size imbalance impact demultiplexing algorithm performance, reducing cost effectiveness. Findings: We propose a supervised algorithm, demuxSNP, which leverages both cell hashing and genetic variation between individuals (single-nucleotide polymorphisms [SNPs]). demuxSNP addresses fundamental limitations in demultiplexing methods that use only one data modality. Some cells may be confidently demultiplexed using probabilistic hashing methods. demuxSNP uses these data to infer the genotype of singlet and doublet clusters and predict on cells assigned as negative, uncertain, or doublet using a nearest-neighbor approach adapted for missing data. We benchmarked demuxSNP against hashing, genotype-free SNP, and hybrid methods on simulated and real data from renal cell cancer. demuxSNP outperformed standalone hashing methods on a low-quality hashing data benchmark, improved overall classification accuracy, and allowed more cells with high RNA quality to be recovered. By varying simulated doublet rates, we showed that genotype-free SNP methods, and hybrid methods that leverage them, were impacted by class size imbalance and doublet rate. demuxSNP’s supervised approach was more robust to doublet rate in experiments with class size imbalance. Conclusions: demuxSNP uses hashing and SNP data to demultiplex datasets with low hashing quality where biological samples are genetically distinct. Unassigned or negative cells with high RNA quality are recovered, making more cells available for analysis. Data simulation and benchmarking pipelines as well as processed benchmarking data for 5–50% doublets are publicly available. demuxSNP is available as an R/Bioconductor package (https://doi.org/doi:10.18129/B9.bioc.demuxSNP).
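The nearest-neighbor step described in the abstract can be illustrated with a minimal sketch. This is not the demuxSNP implementation or its API (the package itself is written for R/Bioconductor); it is a toy Python example under stated assumptions: each cell is a vector of SNP calls (1 = alternate allele observed, 0 = only reference observed, `None` = no reads covering the SNP), the distance is the fraction of mismatches among SNPs observed in both cells, and all names (`masked_distance`, `knn_predict`, the toy profiles) are hypothetical.

```python
from collections import Counter

MISSING = None  # assumption: None marks a SNP with no read coverage in a cell


def masked_distance(a, b):
    """Fraction of mismatching SNPs among those observed in both cells."""
    shared = [(x, y) for x, y in zip(a, b)
              if x is not MISSING and y is not MISSING]
    if not shared:
        return 1.0  # no co-observed SNPs: treat the pair as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def knn_predict(train_profiles, train_labels, query, k=3):
    """Majority vote among the k labelled cells closest to the query cell."""
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: masked_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set: cells that hashing assigned with high confidence.
train_profiles = [
    [1, 0, None, 1, 0],   # donor A
    [1, 0, 0, None, 0],   # donor A
    [1, None, 0, 1, 0],   # donor A
    [0, 1, 1, None, 1],   # donor B
    [0, 1, None, 0, 1],   # donor B
    [None, 1, 1, 0, 1],   # donor B
]
train_labels = ["A", "A", "A", "B", "B", "B"]

# A cell that hashing left as negative or uncertain.
unassigned_cell = [None, 1, 1, None, 1]
prediction = knn_predict(train_profiles, train_labels, unassigned_cell, k=3)
print(prediction)
```

Restricting the distance to co-observed SNPs is one simple way to keep sparse coverage from being read as genetic difference. In this sketch, doublet classes would enter as additional labels in the training set in the same way as singlet donors.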
Publication
GigaScience 13, pp. 1–12
Publisher
Oxford University Press GigaScience
Other Funding information
This project has been made possible in part by grant number CZF 2019-002,443 (Lead PI: Martin Morgan, Co-PI: A.C.C.) from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation of which A.C.C. and M.P.L. are grantees, as well as by startup funding from the School of Medicine, University of Limerick to A.C.C. In addition, this project was supported by the Assistant Secretary of Defense for Health Affairs endorsed by the US Department of Defense, Kidney Cancer Research Program (KCRP) through the FY21 Translational Research Partnership Award (W81XWH-21-1-0442, lead PI: Wayne A. Maraso) and FY21 Idea Development Award (W81XWH-21-1-0482, lead PI: Wayne A. Maraso) of which Y.W., A.C.C., and M.P.L. are grantees, and by the Wong Family Award and Kidney Cancer Association Trailblazer Award to Y.W.
Also affiliated with
- Health Research Institute (HRI)
Department or School
- School of Medicine