LimROTS: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust Analysis of Proteomics and Metabolomics Data
Source: R/LimROTS.r
LimROTS.Rd
Usage
LimROTS(
x,
B = 1000,
K = NULL,
a1 = NULL,
a2 = NULL,
log = TRUE,
progress = FALSE,
verbose = TRUE,
meta.info = NULL,
cluster = NULL,
group.name = NULL,
formula.str = NULL,
seed.cl = 1234,
robust = TRUE,
trend = TRUE,
permutating.group = FALSE
)
Arguments
- x
A SummarizedExperiment object or a matrix where rows represent features (e.g., proteins, metabolites) and columns represent samples. The values should be log-transformed.
- B
An integer representing the number of bootstrap iterations. Default is 1000.
- K
An optional integer representing the top list size for ranking. If not specified, it is set to one-fourth of the number of features.
- a1
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.
- a2
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.
- log
Logical, indicating whether the data is already log-transformed. Default is TRUE.
- progress
Logical, indicating whether to display a progress bar during the function's execution. Default is FALSE.
- verbose
Logical, indicating whether to display messages during the function's execution. Default is TRUE.
- meta.info
A data frame containing sample-level metadata, where each row corresponds to a sample. It should include the grouping variable specified in group.name. If x is a SummarizedExperiment object, meta.info must be a vector of the metadata needed for the model to run, which can be retrieved using colData().
- cluster
A parallel cluster object for distributed computation, e.g., created by makeCluster(). Default is NULL, in which case only two cores are used.
- group.name
A string specifying the column in meta.info that represents the groups or conditions for comparison.
- formula.str
A formula string for modeling. It should include "~ 0 + ..." to exclude the intercept from the model. All the model parameters must be present in meta.info.
- seed.cl
An integer specifying the seed for randomization; if not provided, the default is 1234.
- robust
Logical, indicating whether robust fitting should be used. Default is TRUE; see eBayes.
- trend
Logical, indicating whether to include trend fitting in the differential expression analysis. Default is TRUE; see eBayes.
- permutating.group
Logical. If TRUE, the permutation for calculating the null distribution is performed by permuting only the target group specified in group.name, preserving all the other sample information. If FALSE, the entire sample information retrieved from meta.info is permuted (setting this to TRUE is recommended).
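To make the roles of formula.str and permutating.group more concrete, the following base-R sketch builds the design matrix implied by "~ 0 + group" and contrasts the two permutation schemes described above. The batch column and the helpers permute_group_only() and permute_all() are hypothetical and are not part of LimROTS; the package's internal permutation code may differ.

```r
# Hypothetical sample metadata: 10 samples, two groups, two batches
meta.info <- data.frame(
    group = factor(rep(1:2, each = 5)),
    batch = factor(rep(c("a", "b"), times = 5)),
    row.names = paste0("X", 1:10)
)

# "~ 0 + group" drops the intercept, giving one design column per group level
design <- model.matrix(as.formula("~ 0 + group"), data = meta.info)
colnames(design)

# Illustrative helper (not a LimROTS function): shuffle only the group labels,
# leaving every other column attached to its original sample
permute_group_only <- function(meta, group.name) {
    meta[[group.name]] <- sample(meta[[group.name]])
    meta
}

# Illustrative helper (not a LimROTS function): shuffle whole rows, so all
# covariates are reassigned to samples together
permute_all <- function(meta) {
    shuffled <- meta[sample(nrow(meta)), , drop = FALSE]
    rownames(shuffled) <- rownames(meta)
    shuffled
}

set.seed(1234)
perm_group <- permute_group_only(meta.info, "group")
perm_all <- permute_all(meta.info)
```

In the first scheme the batch assignment of each sample is unchanged, which is why it is the natural choice when the model contains covariates besides the group of interest.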
Value
An object of class "list" with the following elements:
- data
The original data matrix.
- B
The number of bootstrap samples used.
- d
The optimized statistics for each feature.
- logfc
Log-fold change values between groups.
- pvalue
P-values computed based on the permutation samples.
- FDR
False discovery rate estimates.
- a1
Optimized parameter used in differential expression ranking.
- a2
Optimized parameter used in differential expression ranking.
- k
Top list size used for ranking.
- corrected.logfc
Estimate of the log2-fold-change corresponding to the effect, corrected by the model; see topTable.
- q_values
Estimated q-values using the qvalue package.
- BH.pvalue
Benjamini-Hochberg adjusted p-values.
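As a point of reference for the BH.pvalue element, Benjamini-Hochberg adjustment is available in base R through p.adjust(). The p-values below are made up for illustration, and whether LimROTS calls p.adjust() internally is an assumption rather than something stated on this page.

```r
# Hypothetical raw p-values for five features
pvalue <- c(0.001, 0.008, 0.039, 0.041, 0.60)

# Benjamini-Hochberg adjustment from the stats package
BH.pvalue <- p.adjust(pvalue, method = "BH")

# Indices of features that pass a 5% threshold after adjustment
significant <- which(BH.pvalue < 0.05)
```

Adjusted values are never smaller than the raw p-values, so a feature that is borderline before adjustment can drop out afterwards.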
Details
The LimROTS approach initially uses limma package functionality to simulate the intensity data of proteins and metabolites. A linear model is subsequently fitted using the design matrix. Empirical Bayes variance shrinkage is then applied. To obtain the moderated t-statistics, the adjusted standard error \(SE_{post} = \sqrt{s^2_{post}} \times \text{unscaled SD}\) is computed for each feature, along with the regression coefficient for each feature (indicating the impact of variations in the experimental settings). Then, the reproducibility-optimized technique known as ROTS is adapted to establish optimality based on the largest overlap of top-ranked features within group-preserving bootstrap datasets. Finally, based on the optimized parameters \(\alpha_1\) and \(\alpha_2\), the following equation is used to calculate the final statistics:
$$t_{\alpha_{(p)}} = \frac{\beta_{(p)}}{\alpha_1 + \alpha_2 \times SE_{post_{(p)}}}$$
where \(t_{\alpha_{(p)}}\) is the final statistic for each feature, \(\beta_{(p)}\) is the coefficient, and \(SE_{post_{(p)}}\) is the adjusted standard error. LimROTS generates p-values from permutation samples using the implementation available in the qvalue package, along with an internal implementation of FDR adapted from the ROTS package. Additionally, the qvalue package is used to calculate q-values, where the proportion of true null p-values is estimated with the bootstrap method of pi0est. We recommend using permutation-derived p-values and q-values.
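The final statistic can be worked through numerically. The sketch below uses made-up per-feature values and assumed values for the two optimized parameters; in an actual run these quantities come from the limma fit and from the bootstrap optimization, so the variable names here (beta, s2.post, unscaled.sd, se.post) are illustrative only.

```r
# Hypothetical per-feature quantities
beta <- c(1.2, -0.4, 0.05, 2.0)       # regression coefficients
s2.post <- c(0.09, 0.16, 0.04, 0.25)  # posterior (shrunken) variances
unscaled.sd <- c(0.5, 0.5, 0.5, 0.5)  # unscaled standard deviations
a1 <- 0.1                             # assumed optimized alpha1
a2 <- 1                               # assumed optimized alpha2

# Adjusted standard error: SEpost = sqrt(s2.post) * unscaled SD
se.post <- sqrt(s2.post) * unscaled.sd

# Final statistic: beta / (alpha1 + alpha2 * SEpost)
d <- beta / (a1 + a2 * se.post)

# With alpha1 = 0 and alpha2 = 1 the formula reduces to the moderated t-statistic
t.mod <- beta / (0 + 1 * se.post)
```

A positive alpha1 acts as a constant added to the denominator, which damps the statistic most strongly for features with a very small standard error.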
References
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47.
Suomi T, Seyednasrollah F, Jaakkola M, Faux T, Elo L (2017). "ROTS: An R package for reproducibility-optimized statistical testing." PLoS Computational Biology, 13(5), e1005562. doi:10.1371/journal.pcbi.1005562, http://www.ncbi.nlm.nih.gov/pubmed/28542205
Elo LL, Filen S, Lahesmaa R, Aittokallio T. Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Trans Comput Biol Bioinform. 2008;5(3):423-431. doi:10.1109/tcbb.2007.1078
Examples
# Example usage:
set.seed(1234)
# Simulated data: 100 features x 10 samples requires 1000 values; with only
# 500, matrix() silently recycles them and columns 6-10 duplicate columns 1-5
data <- data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))
meta.info <- data.frame(
    group = factor(rep(1:2, each = 5)),
    row.names = colnames(data)
)
formula.str <- "~ 0 + group"
result <- LimROTS(data,
    meta.info = meta.info, group.name = "group",
    formula.str = formula.str, B = 10
)