LimROTS: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust Differential Expression Analysis

LimROTS(
  x,
  assay.type = NULL,
  niter = 1000,
  K = NULL,
  a1 = NULL,
  a2 = NULL,
  log = TRUE,
  verbose = TRUE,
  meta.info,
  BPPARAM = NULL,
  group,
  formula.str,
  robust = TRUE,
  trend = TRUE,
  permutating.group = FALSE,
  correlation_block = NULL
)

Arguments

x

A SummarizedExperiment object, where rows represent features (e.g., proteins, metabolites) and columns represent samples. The values should be log-transformed.

assay.type

A character string or numeric index specifying the assay to use if x is a SummarizedExperiment. Default is NULL

niter

An integer representing the amount of bootstrap iterations. Default is 1000.

K

An optional integer representing the top list size for ranking. If not specified, it is set to one-fourth of the number of features.

a1

Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.

a2

Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.

log

Logical, indicating whether the data is already log-transformed. Default is TRUE.

verbose

Logical, indicating whether to display messages during the function's execution. Default is TRUE.

meta.info

a character vector of the metadata needed for the model to run and can be retrieved using colData().

BPPARAM

A BiocParallelParam object specifying the parallelization backend (e.g., MulticoreParam, SnowParam). The default depends on the operating system: if the user is on Windows, SnowParam(workers = 2) is used; otherwise, MulticoreParam(workers = 2).

group

A string specifying the column in meta.info that represents the groups or conditions for comparison. group should be retrieved using colData() as factor.

formula.str

A formula string for modeling. It should include "~ 0 + ..." to exclude the intercept from the model. All the model parameters must be present in meta.info.

robust

indicating whether robust fitting should be used. Default is TRUE, see eBayes.

trend

indicating whether to include trend fitting in the differential expression analysis. Default is TRUE. see eBayes.

permutating.group

Logical, If TRUE, the permutation for calculating the null distribution is performed by permuting the target group only specified in group Preserving all the other sample information. If FALSE, the entire sample information retrieved from meta.info will be permuted (recommended to be set to FALSE).

correlation_block

Character or NULL. The name of a column in meta.info that defines correlation blocks. Samples sharing the same value in this column are always resampled together as a unit, and within-block correlation is accounted for during model fitting via duplicateCorrelation. If NULL, standard independent resampling is used.

Value

An object of class "SummarizedExperiment" with the following elements:

data

The original data matrix.

niter

The number of bootstrap samples used.

statistics

The optimized statistics for each feature.

pvalue

P-values computed based on the permutation samples.

FDR

False discovery rate estimates.

a1

Optimized parameter used in differential expression ranking.

a2

Optimized parameter used in differential expression ranking.

k

Top list size used for ranking.

corrected.logfc

estimate of the log2-fold-change corresponding to the effect corrected by the s model see topTable.

q_values

Estimated q-values using the qvalue package.

BH.pvalue

Benjamini-Hochberg adjusted p-values.

null.statistics

The optimized null statistics for each feature.

Details

The LimROTS approach initially uses limma package functionality to simulate the intensity data of proteins and metabolites. A linear model is subsequently fitted using the design matrix. Empirical Bayes variance shrinking is then implemented. To obtain the moderated t-statistics, the adjusted standard error \(SE_{post} = \sqrt{s^2_{\text{post}} } \times unscaled SD\) for each feature is computed, along with the regression coefficient for each feature (indicating the impact of variations in the experimental settings). Then, by adapting a reproducibility-optimized technique known as ROTS to establish an optimality based on the largest overlap of top-ranked features within group-preserving bootstrap datasets, Finally based on the optimized parameters \(\alpha1\) and \(\alpha2\) this equation used to calculates the final statistics:

$$t_{\alpha_{(p)}} = \frac{\beta_{(p)}} {\alpha1 + \alpha2 \times SEpost_{(p)}}$$where \(t_{\alpha_{(p)}}\) is the final statistics for each feature, \(\beta_{(p)}\) is the coefficient, and \(SEpost_{(p)}\) is the the adjusted standard error. LimROTS generates p-values from permutation samples using the implementation available in qvalue package, along with internal implementation of FDR adapted from ROTS package. Additionally, the qvalue package is used to calculate q-values, were the proportion of true null p-values is set to the bootstrap method pi0est.

This function processes a dataset using parallel computation. It leverages the BiocParallel framework to distribute tasks across multiple workers, which can significantly reduce runtime for large datasets.

References

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47

Suomi T, Seyednasrollah F, Jaakkola M, Faux T, Elo L (2017). “ROTS: An R package for reproducibility-optimized statistical testing. ” PLoS computational biology, 13(5), e1005562. doi:10.1371/journal.pcbi.1005562 https://doi.org/10.1371/journal.pcbi.1005562, http://www.ncbi.nlm.nih.gov/pubmed/28542205

Elo LL, Filen S, Lahesmaa R, Aittokallio T. Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Trans Comput Biol Bioinform. 2008;5(3):423-431. doi:10.1109/tcbb.2007.1078

Examples

# Example usage:

data <- data.frame(matrix(rnorm(500), nrow = 100, ncol = 10))
# Simulated data
meta.info <- data.frame(
    group = factor(rep(1:2, each = 5)),
    row.names = colnames(data)
)
formula.str <- "~ 0 + group"
result <- LimROTS(data,
    meta.info = meta.info, group = "group",
    formula.str = formula.str, niter = 10
)
#> Group Level:  1 & Group Level:  2
#> No top list size K given, using 25
#> Initiating limma on bootstrapped samples
#> Using MulticoreParam (Unix-like OS) with two workers.
#> Optimizing a1 and a2
#> Computing p-values and FDR