Advanced Multivariable Statistical Analysis Interactive Tool for Handling Missing Data and Confounding Covariates for Label-free LC-MS Proteomics Experiments

Page: [440 - 447] Pages: 8


Abstract

Background: Detecting significant features (proteins or peptides) in LC-MS proteomics studies with multivariable regression analyses requires careful consideration. Missing values in proteomics data can arise from random errors, bad samples, or features that fall below the detection limit in specific samples, among other causes. Expression data are also prone to heterogeneity from technical and biological sources. Both missing values and heterogeneity can confound important findings. In addition, these studies often carry pre-clinical and clinical information (e.g., sex, exposure), which can be used to supplement the inference.
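To make the detection-limit case concrete, the following is a minimal sketch of one simple way to handle left-censored missing values: replacing each missing entry of a feature with that feature's minimum observed log-intensity. It is an illustration only and is not taken from SATP; the function names `impute_left_censored` and `missing_fraction` and the example values are hypothetical.

```python
def missing_fraction(log_intensities):
    """Fraction of samples in which the feature was not quantified (None)."""
    return sum(v is None for v in log_intensities) / len(log_intensities)


def impute_left_censored(log_intensities):
    """Replace missing values (None) with the feature's minimum observed value.

    This assumes missingness is caused by the detection limit, so the true
    value is taken to be at or below the smallest value actually observed.
    """
    observed = [v for v in log_intensities if v is not None]
    if not observed:
        # Nothing observed for this feature: leave it untouched.
        return list(log_intensities)
    floor = min(observed)
    return [floor if v is None else v for v in log_intensities]


# Hypothetical log2 intensities of one protein across six samples.
feature = [21.4, None, 20.9, 22.1, None, 21.0]
imputed = impute_left_censored(feature)
```

Minimum-value imputation is only reasonable when values are missing because they fall below the detection limit; values missing at random call for a different strategy.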

Methods: We introduce SATP (Statistical Analysis interactive Tool for label-free LC-MS Proteomics experiments), a user-friendly web application for differential expression analysis of proteomics data that is scalable to large clinical proteomic studies. The tool provides normalization and imputation methods, together with several statistical tests: t-test, moderated t-test, linear fixed effect model, and linear mixed model with adjustment for the effects of additional covariates.
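As an illustration of what a linear fixed effect model with covariate adjustment does for a single feature, the sketch below fits log-intensity on an intercept, a group indicator, and one covariate by ordinary least squares, then reports the group effect and its standard error. It uses only the Python standard library and is not SATP's implementation; the names `solve` and `fit_feature`, the binary coding of the covariate, and the data are assumptions made for the example.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_feature(y, group, covariate):
    """OLS fit of y ~ intercept + group + covariate.

    Returns the covariate-adjusted group effect and its standard error.
    """
    design = [[1.0, g, c] for g, c in zip(group, covariate)]
    p = 3
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    beta = solve(xtx, xty)
    residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(design, y)]
    sigma2 = sum(r * r for r in residuals) / (len(y) - p)
    # The second column of (X'X)^-1 gives the variance factor of the group coefficient.
    inv_col = solve(xtx, [0.0, 1.0, 0.0])
    std_err = math.sqrt(sigma2 * inv_col[1])
    return beta[1], std_err


# Hypothetical data: eight samples, two groups, sex coded 0/1 as the covariate.
group = [0, 0, 0, 0, 1, 1, 1, 1]
sex = [0, 1, 0, 1, 0, 1, 0, 1]
y = [20.1, 20.4, 19.9, 20.6, 21.6, 22.0, 21.4, 22.1]

effect, std_err = fit_feature(y, group, sex)
t_statistic = effect / std_err
```

In a full analysis, this fit would be repeated for every feature and the resulting p-values corrected for multiple testing. A moderated t-test would additionally shrink each feature's variance estimate toward a value shared across features.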

Results: The tool offers several advantages over existing ones, including an extension to multiple-factor comparisons after adjusting for covariates.

Conclusion: This is a comprehensive tool for analysis of complex experiments with multiple covariates, whereas most of the existing tools were developed for comparing simple experiments mostly with two groups without covariates.

Availability: The tool is freely accessible at https://ulbbf.shinyapps.io/satp/.
