LimROTS_survival: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust survival analysis in Omics Data

LimROTS_survival(
  x,
  assay.type = NULL,
  niter = 1000,
  K = NULL,
  a1 = NULL,
  a2 = NULL,
  verbose = TRUE,
  meta.info,
  BPPARAM = NULL,
  formula.str,
  competing_risks = FALSE,
  correlation_block = NULL
)

Arguments

x

A SummarizedExperiment object, where rows represent features (e.g., proteins, metabolites) and columns represent samples. The values should be log-transformed.

assay.type

A character string or numeric index specifying the assay to use if x is a SummarizedExperiment. Default is NULL (uses the first assay).

niter

An integer representing the amount of bootstrap iterations. Default is 1000.

K

An optional integer representing the top list size for ranking. If not specified, it is set to one-fourth of the number of features.

a1

Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.

a2

Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.

verbose

Logical, indicating whether to display messages during the function's execution. Default is TRUE.

meta.info

a character vector of the metadata needed for the model to run and can be retrieved using colData().

BPPARAM

A BiocParallelParam object specifying the parallelization backend (e.g., MulticoreParam, SnowParam). The default depends on the operating system: if the user is on Windows, SnowParam(workers = 2) is used; otherwise, MulticoreParam(workers = 2).

formula.str

A formula string for modeling. It should include "~ 0 + ..." to exclude the intercept from the model. All the model parameters must be present in meta.info.

competing_risks

Logical. If TRUE, the function will fit a competing risks model using the crr function from the cmprsk package instead of the standard Cox proportional hazards model. Default is FALSE.

correlation_block

Character or NULL. The name of a column in meta.info that defines correlation blocks. Samples sharing the same value in this column are always resampled together as a unit. If NULL, the function behaves identically to bootstrapSamples_limRots.

Value

A SummarizedExperiment when x is a SummarizedExperiment (results added to rowData and metadata), or a list with the following elements:

data

The original data matrix.

niter

The number of bootstrap samples used.

statistics

The optimized statistics for each feature.

pvalue

P-values computed based on the permutation samples.

FDR

False discovery rate estimates.

a1

Optimized parameter used in survival ranking.

a2

Optimized parameter used in survival ranking.

k

Top list size used for ranking.

R

Reproducibility score for the optimized parameters.

Z

Z-score corresponding to the optimized parameters.

ztable

Table of z-scores across the parameter grid.

exp_coef

Exponentiated coefficient (hazard ratio estimate) from the Cox proportional hazards model for each feature.

q_values

Estimated q-values using the qvalue package.

BH.pvalue

Benjamini-Hochberg adjusted p-values.

null.statistics

The optimized null statistics for each feature.

Details

LimROTS_survival applies the reproducibility-optimized statistic framework to survival analysis. For each bootstrap resample, a Cox proportional hazards model (or a competing risks model via crr from cmprsk when competing_risks = TRUE) is fitted per feature using the supplied formula.str. The coefficient \(\beta_{(p)}\) and its standard error \(SE_{(p)}\) are extracted for each feature across bootstrap datasets.

Reproducibility-optimized parameters \(\alpha_1\) and \(\alpha_2\) are then selected by maximizing the overlap of top-ranked features across group-preserving bootstrap datasets, following the ROTS approach. The final statistic for each feature is:

$$t_{\alpha_{(p)}} = \frac{\beta_{(p)}} {\alpha_1 + \alpha_2 \times SE_{(p)}}$$

where \(t_{\alpha_{(p)}}\) is the final statistic, \(\beta_{(p)}\) is the Cox model coefficient, and \(SE_{(p)}\) is its standard error.

P-values are computed from permutation-based null distributions using empPvals from the qvalue package, alongside an internal FDR implementation adapted from the ROTS package. Q-values are estimated via qvalue with the proportion of true null p-values set using the bootstrap method pi0est.

This function processes a dataset using parallel computation. It leverages the BiocParallel framework to distribute tasks across multiple workers, which can significantly reduce runtime for large datasets.

References

Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47

Suomi T, Seyednasrollah F, Jaakkola M, Faux T, Elo L (2017). “ROTS: An R package for reproducibility-optimized statistical testing. ” PLoS computational biology, 13(5), e1005562. doi:10.1371/journal.pcbi.1005562 https://doi.org/10.1371/journal.pcbi.1005562, http://www.ncbi.nlm.nih.gov/pubmed/28542205

Elo LL, Filen S, Lahesmaa R, Aittokallio T. Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Trans Comput Biol Bioinform. 2008;5(3):423-431. doi:10.1109/tcbb.2007.1078

Examples

set.seed(123)
nsamples <- 20
nfeatures <- 50
sim_data <- matrix(rnorm(nfeatures * nsamples), nrow = nfeatures)
colnames(sim_data) <- paste0("sample", seq_len(nsamples))
rownames(sim_data) <- paste0("gene", seq_len(nfeatures))

meta_data <- data.frame(
    time = abs(rnorm(nsamples, mean = 5, sd = 2)),
    event = sample(0:1, nsamples, replace = TRUE),
    group = factor(rep(seq_len(2), each = nsamples / 2)),
    row.names = colnames(sim_data)
)

formula.str <- "~ Surv(time, event) + group"
result <- LimROTS_survival(
    x = sim_data,
    meta.info = meta_data,
    formula.str = formula.str,
    niter = 10,
    verbose = FALSE,
    competing_risks = FALSE
)
#> Sanity check completed successfully!
#> Using MulticoreParam (Unix-like OS) with two workers.