Recognition of CRISPR Off-Target Cleavage Sites with SeqGAN

Wen Li; Xiao-Bo Wang; Yan Xu

Abstract

Background: The CRISPR system can be retargeted to different gene loci quickly by changing a short sequence in the single guide RNA. However, off-target events limit the further development of the CRISPR system. Improving the efficiency and specificity of this technology while minimizing the risk of off-target cleavage has always been a challenge. For genome-wide prediction of CRISPR off-target cleavage sites (OTS), an important issue is data imbalance: the number of true OTS identified is far smaller than the number of all possible nucleotide mismatch loci.
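The imbalance described above, and the random over-sampling baseline that the Results compare against, can be illustrated with a minimal Python sketch. The function names, the toy site identifiers, and the 2-in-10 positive rate are hypothetical and chosen only for illustration; the abstract does not specify how over-sampling was carried out.

```python
import random

def imbalance_ratio(labels):
    """Return the number of negatives per positive for a list of 0/1 labels."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0:
        raise ValueError("no positive samples")
    return neg / pos

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen positives until both classes have equal size.

    Assumption: this is one common form of over-sampling, not necessarily
    the exact procedure used in the paper.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    if not pos:
        raise ValueError("no positive samples")
    extra = [rng.choice(pos) for _ in range(max(0, len(neg) - len(pos)))]
    balanced = [(s, 1) for s in pos + extra] + [(s, 0) for s in neg]
    rng.shuffle(balanced)
    return balanced

# Toy data: 2 hypothetical true off-target sites among 10 candidate loci.
toy_samples = [f"site_{i}" for i in range(10)]
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_ratio = imbalance_ratio(toy_labels)
toy_balanced = random_oversample(toy_samples, toy_labels)
```

Over-sampling of this kind only repeats existing positives, which is the limitation that generating new positive sequences is meant to address.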

Methods: In this work, a sequence generative adversarial network (SeqGAN) was used to generate positive off-target sequences and thereby augment the Cpf1 OTS dataset. A deep convolutional neural network (CNN) was then trained on the augmented data to obtain a predictor with stronger generalization ability and better performance.
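The data-preparation side of this pipeline might look roughly like the following Python sketch. It assumes a one-hot encoding of guide and candidate-site sequences as CNN input and a simple rule for merging generated positives into the training set; neither detail is stated in the abstract, and `one_hot`, `encode_pair`, and `add_generated_positives` are hypothetical helper names. The SeqGAN generator itself is not implemented here; its output is represented by a plain list of strings.

```python
BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a list of 4-element 0/1 rows (A, C, G, T order)."""
    rows = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        rows.append([1 if base == b else 0 for b in BASES])
    return rows

def encode_pair(guide, site):
    """Concatenate guide and site encodings position by position (8 channels).

    Assumption: pairing the two sequences channel-wise is one plausible
    CNN input layout, not necessarily the one used by the authors.
    """
    if len(guide) != len(site):
        raise ValueError("guide and site must have equal length")
    return [g + s for g, s in zip(one_hot(guide), one_hot(site))]

def add_generated_positives(real_pos, generated, n_negatives):
    """Append distinct generated sequences until positives match n_negatives.

    `generated` stands in for sequences sampled from a trained generator.
    """
    seen = set(real_pos)
    out = list(real_pos)
    for seq in generated:
        if len(out) >= n_negatives:
            break
        if seq not in seen:
            seen.add(seq)
            out.append(seq)
    return out

# Toy usage with made-up 4-nt sequences.
pair = encode_pair("ACGT", "ACGA")
positives = add_generated_positives(["AAAA"], ["AAAA", "CCCC", "GGGG", "TTTT"], 3)
```

Skipping generated sequences that duplicate existing positives is a design choice of this sketch, intended to distinguish generation from plain over-sampling.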

Results: In 10-fold cross-validation, the AUC of the CNN classifier after SeqGAN balancing was 0.941, higher than the 0.863 obtained with the original data and the 0.929 obtained with over-sampling. In independent testing, the AUC after SeqGAN balancing was 0.841, higher than that of the original data (0.833) and of over-sampling (0.836). The PR value after SeqGAN balancing was 0.722, about 0.16 higher than with the original data and about 0.03 higher than with over-sampling.
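For readers unfamiliar with these metrics, the sketch below shows one way to compute the ROC AUC and an area-under-PR-curve estimate (average precision) in plain Python, together with a simple stratified fold assignment for k-fold cross-validation. It assumes that the "PR value" above refers to the area under the precision-recall curve; the function names and toy scores are hypothetical, and the numbers are unrelated to the reported results.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.

    Tied scores are ranked in input order; no special tie handling.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("no positive samples")
    return total / hits

def stratified_folds(labels, k=10):
    """Assign indices to k folds, spreading each class round-robin."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

# Toy example: 2 positives and 3 negatives with made-up classifier scores.
toy_y = [1, 0, 1, 0, 0]
toy_s = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_auc = roc_auc(toy_y, toy_s)
toy_ap = average_precision(toy_y, toy_s)
toy_folds = stratified_folds([1] * 4 + [0] * 16, k=4)
```

On imbalanced data the PR-based score tends to be more sensitive than the AUC to how well the rare positive class is ranked, which is consistent with the larger PR differences reported above.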

Conclusion: The sequence generative adversarial network SeqGAN was used for the first time to address data imbalance in CRISPR data. All results showed that SeqGAN can effectively generate positive data for CRISPR off-target sites.

Keywords: SeqGAN, CRISPR, off-target, data imbalance, CNN, single-guide RNA.

Cite As

Current Bioinformatics
