LimROTS
: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust Analysis of Proteomics and Metabolomics DataR/LimROTS.r
LimROTS.Rd
LimROTS
: A Hybrid Method Integrating Empirical Bayes and
Reproducibility-Optimized Statistics for Robust Analysis of Proteomics and
Metabolomics Data
LimROTS(
x,
niter = 1000,
K = NULL,
a1 = NULL,
a2 = NULL,
log = TRUE,
verbose = TRUE,
meta.info,
BPPARAM = NULL,
group.name,
formula.str,
robust = TRUE,
trend = TRUE,
permutating.group = FALSE
)
A SummarizedExperiment
object, where rows
represent features (e.g., proteins, metabolites) and columns
represent samples.
The values should be log-transformed.
An integer representing the amount of bootstrap iterations. Default is 1000.
An optional integer representing the top list size for ranking. If not specified, it is set to one-fourth of the number of features.
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.
Logical, indicating whether the data is already log-transformed.
Default is TRUE
.
Logical, indicating whether to display messages during the
function's execution. Default is TRUE
.
a character vector of the metadata needed for the
model to run and can be retrieved using colData()
.
A BiocParallelParam
object specifying the
parallelization backend (e.g., MulticoreParam
, SnowParam
).
The default depends on the operating system: if the user is on Windows,
SnowParam(workers = 2)
is used; otherwise,
MulticoreParam(workers = 2)
.
A string specifying the column in meta.info
that
represents the groups or conditions for comparison.
A formula string for modeling.
It should include "~ 0 + ..." to exclude the intercept from the model.
All the model parameters must be present in meta.info
.
indicating whether robust fitting should be used. Default is TRUE, see eBayes.
indicating whether to include trend fitting in the differential expression analysis. Default is TRUE. see eBayes.
Logical, If TRUE
, the permutation for
calculating the null distribution is performed by permuting the target group
only specified in group.name
Preserving all the other sample
information. If FALSE
, the entire sample information retrieved from
meta.info
will be permuted (recommended to be set to FALSE).
An object of class "SummarizedExperiment"
with the
following elements:
The original data matrix.
The number of bootstrap samples used.
The optimized statistics for each feature.
Log-fold change values between groups.
P-values computed based on the permutation samples.
False discovery rate estimates.
Optimized parameter used in differential expression ranking.
Optimized parameter used in differential expression ranking.
Top list size used for ranking.
estimate of the log2-fold-change corresponding to the effect corrected by the s model see topTable.
Estimated q-values using the qvalue
package.
Benjamini-Hochberg adjusted p-values.
The optimized null statistics for each feature.
The LimROTS approach initially uses limma package functionality to simulate the intensity data of proteins and metabolites. A linear model is subsequently fitted using the design matrix. Empirical Bayes variance shrinking is then implemented. To obtain the moderated t-statistics, the adjusted standard error \(SE_{post} = \sqrt{s^2_{\text{post}} } \times unscaled SD\) for each feature is computed, along with the regression coefficient for each feature (indicating the impact of variations in the experimental settings). Then, by adapting a reproducibility-optimized technique known as ROTS to establish an optimality based on the largest overlap of top-ranked features within group-preserving bootstrap datasets, Finally based on the optimized parameters \(\alpha1\) and \(\alpha2\) this equation used to calculates the final statistics:
$$t_{\alpha_{(p)}} = \frac{\beta_{(p)}} {\alpha1 + \alpha2 \times SEpost_{(p)}}$$where \(t_{\alpha_{(p)}}\) is the final statistics for each feature, \(\beta_{(p)}\) is the coefficient, and \(SEpost_{(p)}\) is the the adjusted standard error. LimROTS generates p-values from permutation samples using the implementation available in qvalue package, along with internal implementation of FDR adapted from ROTS package. Additionally, the qvalue package is used to calculate q-values, were the proportion of true null p-values is set to the bootstrap method pi0est. We recommend using permutation-derived p-values and qvalues.
This function processes a dataset using parallel computation. It leverages the BiocParallel framework to distribute tasks across multiple workers, which can significantly reduce runtime for large datasets.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47
Suomi T, Seyednasrollah F, Jaakkola M, Faux T, Elo L (2017). “ROTS: An R package for reproducibility-optimized statistical testing. ” PLoS computational biology, 13(5), e1005562. doi:10.1371/journal.pcbi.1005562 https://doi.org/10.1371/journal.pcbi.1005562, http://www.ncbi.nlm.nih.gov/pubmed/28542205
Elo LL, Filen S, Lahesmaa R, Aittokallio T. Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Trans Comput Biol Bioinform. 2008;5(3):423-431. doi:10.1109/tcbb.2007.1078
# Example usage:
data <- data.frame(matrix(rnorm(500), nrow = 100, ncol = 10))
# Simulated data
meta.info <- data.frame(
group = factor(rep(1:2, each = 5)),
row.names = colnames(data)
)
formula.str <- "~ 0 + group"
result <- LimROTS(data,
meta.info = meta.info, group.name = "group",
formula.str = formula.str, niter = 10
)
#> No top list size K given, using 25
#> Initiating limma on bootstrapped samples
#> Using MulticoreParam (Unix-like OS) with two workers.
#> Optimizing a1 and a2
#> Computing p-values and FDR
#> qvalue() failed (return NULL): missing values and NaN's not allowed if 'na.rm' is FALSE