Skip to contents

LimROTS: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust Analysis of Proteomics and Metabolomics Data

Usage

LimROTS(
  x,
  B = 1000,
  K = NULL,
  a1 = NULL,
  a2 = NULL,
  log = TRUE,
  progress = FALSE,
  verbose = TRUE,
  meta.info = NULL,
  cluster = NULL,
  group.name = NULL,
  formula.str = NULL,
  seed.cl = 1234,
  robust = TRUE,
  trend = TRUE,
  permutating.group = FALSE
)

Arguments

x

A SummarizedExperiment object or a matrix where rows represent features (e.g., proteins, metabolites) and columns represent samples. The values should be log-transformed.

B

An integer representing the amount of bootstrap iterations. Default is 1000.

K

An optional integer representing the top list size for ranking. If not specified, it is set to one-fourth of the number of features.

a1

Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.

a2

Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.

log

Logical, indicating whether the data is already log-transformed. Default is TRUE.

progress

Logical, indicating whether to display a progress bar the function's execution. Default is FALSE.

verbose

Logical, indicating whether to display messages during the function's execution. Default is TRUE.

meta.info

A data frame containing sample-level metadata, where each row corresponds to a sample. It should include the grouping variable specified in group.name. If x is a SummarizedExperiment object, meta.info must be a vector of the metadata needed for the model to run and can be retrieved using colData().

cluster

A parallel cluster object for distributed computation, e.g., created by makeCluster(). Default is 2.

group.name

A string specifying the column in meta.info that represents the groups or conditions for comparison.

formula.str

A formula string for modeling. It should include "~ 0 + ..." to exclude the intercept from the model. All the model parameters must be present in meta.info.

seed.cl

An integer specifying the seed for randomization; if not provided, the default is 1234.

robust

indicating whether robust fitting should be used. Default is TRUE, see eBayes.

trend

indicating whether to include trend fitting in the differential expression analysis. Default is TRUE. see eBayes.

permutating.group

Logical, If TRUE, the permutation for calculating the null distribution is performed by permuting the target group only specified in group.name Preserving all the other sample information. If FALSE, the entire sample information retrieved from meta.info will be permuted (recommended to be set to TRUE).

Value

An object of class "list" with the following elements:

data

The original data matrix.

B

The number of bootstrap samples used.

d

The optimized statistics for each feature.

logfc

Log-fold change values between groups.

pvalue

P-values computed based on the permutation samples.

FDR

False discovery rate estimates.

a1

Optimized parameter used in differential expression ranking.

a2

Optimized parameter used in differential expression ranking.

k

Top list size used for ranking.

corrected.logfc

estimate of the log2-fold-change corresponding to the effect corrected by the s model see topTable.

q_values

Estimated q-values using the qvalue package.

BH.pvalue

Benjamini-Hochberg adjusted p-values.

Details

The LimROTS approach initially uses limma package functionality to simulate the intensity data of proteins and metabolites. A linear model is subsequently fitted using the design matrix. Empirical Bayes variance shrinking is then implemented. To obtain the moderated t-statistics, the adjusted standard error \(SEpost = √(s2.post) \times unscaled SD\) for each feature is computed, along with the regression coefficient for each feature (indicating the impact of variations in the experimental settings). Then, by adapting a reproducibility-optimized technique known as ROTS to establish an optimality based on the largest overlap of top-ranked features within group-preserving bootstrap datasets, Finally based on the optimized parameters \(\alpha1\) and \(\alpha2\) this equation used to calculates the final statistics:

$$t_{\alpha_{(p)}} = \frac{\beta_{(p)}} {\alpha1 + \alpha2 \times SEpost_{(p)}}$$where \(t_{\alpha_{(p)}}\) is the final statistics for each feature, \(\beta_{(p)}\) is the coefficient, and \(SEpost_{(p)}\) is the the adjusted standard error. LimROTS generates p-values from permutation samples using the implementation available in qvalue package, along with internal implementation of FDR adapted from ROTS package. Additionally, the qvalue package is used to calculate q-values, were the proportion of true null p-values is set to the bootstrap method pi0est. We recommend using permutation-derived p-values and qvalues.

References

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47

Suomi T, Seyednasrollah F, Jaakkola M, Faux T, Elo L (2017). “ROTS: An R package for reproducibility-optimized statistical testing. ” PLoS computational biology, 13(5), e1005562. doi:10.1371/journal.pcbi.1005562 https://doi.org/10.1371/journal.pcbi.1005562, http://www.ncbi.nlm.nih.gov/pubmed/28542205

Elo LL, Filen S, Lahesmaa R, Aittokallio T. Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Trans Comput Biol Bioinform. 2008;5(3):423-431. doi:10.1109/tcbb.2007.1078

Examples

# Example usage:

data <- data.frame(matrix(rnorm(500), nrow = 100, ncol = 10))
# Simulated data
meta.info <- data.frame(
    group = factor(rep(1:2, each = 5)),
    row.names = colnames(data)
)
formula.str <- "~ 0 + group"
result <- LimROTS(data,
    meta.info = meta.info, group.name = "group",
    formula.str = formula.str, B = 10
)
#> No top list size K given, using 25
#> Bootstrapping samples
#> 
  |                                                                            
  |                                                                      |   0%
#> No cluster found; only two cores will be used!
#> Optimizing parameters
#> Calculating p-values
#> Calculating FDR
#> Error in quantile.default(pi0, prob = 0.1): missing values and NaN's not allowed if 'na.rm' is FALSE