Recommendations for Bioinformatic Tools in lncRNA Research

Rebecca Distefano; Mirolyuba Ilieva; Sarah Rennie; Shizuka Uchida
Abstract

Long non-coding RNAs (lncRNAs) are typically defined as non-protein-coding RNAs longer than 200 nucleotides. Although lncRNAs were historically dismissed as junk, over two decades of research has revealed that they bind to other macromolecules (e.g., DNA, RNA, and/or proteins) to modulate signaling pathways and maintain organism viability. Their discovery has been significantly aided by the development of bioinformatic tools in recent years. However, the diversity of tools for lncRNA discovery and functional prediction can present a challenge for researchers, especially bench scientists and clinicians. This Perspective article navigates the current landscape of bioinformatic tools suitable for both protein-coding and lncRNA genes, and provides a guide to help bench scientists and clinicians select appropriate tools for their research questions and experimental designs.
Keywords: Bioinformatics, gene expression, lncRNA, RNA-seq, screening, tools.
Cite As
Current Bioinformatics