Combinatorial Chemistry & High Throughput Screening

Author(s): Ning Li, Jialiang Yang, Wen Zhu* and Ying Liang*

DOI: 10.2174/1386207323666200317121136

MVSC: A Multi-variation Simulator of Cancer Genome

Page: [326 - 333] Pages: 8

Abstract

Background: Many forms of variation exist in the genome and are the main causes of individual phenotypic differences. Detecting variants, especially those located in tumor genomes, still faces many challenges because of the complexity of genome structure. Thus, a performance assessment of tools that detect variation from next-generation sequencing data is urgently needed.

Method: We created a software package, the Multi-Variation Simulator of Cancer genomes (MVSC), to simulate common genomic variants that are analogous to human somatically acquired variations, including single nucleotide polymorphisms, small insertion and deletion polymorphisms, and structural variations (SVs). Three sets of variations, embedded in the genomic sequence at different periods, were simulated dynamically and sequentially, one set after another.
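
The sequential scheme described above can be illustrated with a minimal Python sketch. It is not the MVSC implementation: the function names, the variant counts per round, the size ranges, and the choice of SV types (deletion, inversion, tandem duplication) are all assumptions made for illustration, and the sketch assumes each "period" applies every variant class on top of the genome produced by the previous period.

```python
import random

BASES = "ACGT"

def add_snps(seq, n, rng):
    """Substitute n distinct positions, each with a different base."""
    seq = list(seq)
    for pos in rng.sample(range(len(seq)), n):
        seq[pos] = rng.choice([b for b in BASES if b != seq[pos]])
    return "".join(seq)

def add_indels(seq, n, rng, max_len=5):
    """Apply n small insertions or deletions (1..max_len bases)."""
    for _ in range(n):
        pos = rng.randrange(len(seq))
        length = rng.randint(1, max_len)
        if rng.random() < 0.5:
            inserted = "".join(rng.choice(BASES) for _ in range(length))
            seq = seq[:pos] + inserted + seq[pos:]
        else:
            seq = seq[:pos] + seq[pos + length:]
    return seq

def reverse_complement(s):
    """Return the reverse complement of a DNA string."""
    return s.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def add_svs(seq, n, rng, min_len=20, max_len=50):
    """Apply n SVs: deletion, inversion, or tandem duplication (assumed types)."""
    for _ in range(n):
        length = rng.randint(min_len, max_len)
        start = rng.randrange(len(seq) - length)
        segment = seq[start:start + length]
        kind = rng.choice(["deletion", "inversion", "duplication"])
        if kind == "deletion":
            segment = ""
        elif kind == "inversion":
            segment = reverse_complement(segment)
        else:
            segment = segment * 2
        seq = seq[:start] + segment + seq[start + length:]
    return seq

def simulate(reference, rng, rounds=3):
    """Embed one set of variants per round; each round mutates the previous genome."""
    genome = reference
    history = []
    for _ in range(rounds):
        genome = add_snps(genome, 5, rng)
        genome = add_indels(genome, 3, rng)
        genome = add_svs(genome, 1, rng)
        history.append(genome)
    return history

# Hypothetical usage: a random 500-base reference and a fixed seed.
rng = random.Random(0)
reference = "".join(rng.choice(BASES) for _ in range(500))
history = simulate(reference, rng)
```

Because every round starts from the output of the previous one, later variants can land inside or across earlier ones, which is one simple way to obtain overlapping variations in the same region.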

Results: In cancer genome simulation, complex SVs are important because this type of variation is characteristic of tumor genome structure. Overlapping variations of different sizes can also coexist in the same genomic regions, adding to the complexity of cancer genome architecture. Our results show that MVSC can efficiently simulate a variety of genomic variants that existing software packages cannot simulate.

Conclusion: MVSC-simulated variants can be used to assess the performance of existing tools designed to detect SVs in next-generation sequencing data. We also find that MVSC is memory- and time-efficient compared with similar software packages.

Keywords: Next-generation sequencing, single nucleotide polymorphism, structural variation, cancer genome, variation simulator, variation detection algorithm.
