LimROTS_survival: A Hybrid Method Integrating Empirical Bayes and Reproducibility-Optimized Statistics for Robust survival analysis in Omics DataR/LimROTS_survival.R
LimROTS_survival.RdLimROTS_survival: A Hybrid Method Integrating Empirical Bayes and
Reproducibility-Optimized Statistics for Robust
survival analysis in Omics Data
LimROTS_survival(
x,
assay.type = NULL,
niter = 1000,
K = NULL,
a1 = NULL,
a2 = NULL,
verbose = TRUE,
meta.info,
BPPARAM = NULL,
formula.str,
competing_risks = FALSE,
correlation_block = NULL
)A SummarizedExperiment object, where rows
represent features (e.g., proteins, metabolites) and columns
represent samples.
The values should be log-transformed.
A character string or numeric index specifying the assay
to use if x is a SummarizedExperiment. Default is NULL
(uses the first assay).
An integer representing the amount of bootstrap iterations. Default is 1000.
An optional integer representing the top list size for ranking. If not specified, it is set to one-fourth of the number of features.
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.
Optional numeric value used in the optimization process. If defined by the user, no optimization occurs.
Logical, indicating whether to display messages during the
function's execution. Default is TRUE.
a character vector of the metadata needed for the
model to run and can be retrieved using colData().
A BiocParallelParam object specifying the
parallelization backend (e.g., MulticoreParam, SnowParam).
The default depends on the operating system: if the user is on Windows,
SnowParam(workers = 2) is used; otherwise,
MulticoreParam(workers = 2).
A formula string for modeling.
It should include "~ 0 + ..." to exclude the intercept from the model.
All the model parameters must be present in meta.info.
Logical. If TRUE, the function will fit a
competing risks model using the crr function from the
cmprsk package instead of the standard Cox proportional hazards
model. Default is FALSE.
Character or NULL. The name of a column in
meta.info that defines correlation blocks. Samples sharing the same
value in this column are always resampled together as a unit. If NULL,
the function behaves identically to bootstrapSamples_limRots.
A SummarizedExperiment when x is a
SummarizedExperiment (results added to rowData and
metadata), or a list with the following elements:
The original data matrix.
The number of bootstrap samples used.
The optimized statistics for each feature.
P-values computed based on the permutation samples.
False discovery rate estimates.
Optimized parameter used in survival ranking.
Optimized parameter used in survival ranking.
Top list size used for ranking.
Reproducibility score for the optimized parameters.
Z-score corresponding to the optimized parameters.
Table of z-scores across the parameter grid.
Exponentiated coefficient (hazard ratio estimate) from the Cox proportional hazards model for each feature.
Estimated q-values using the qvalue package.
Benjamini-Hochberg adjusted p-values.
The optimized null statistics for each feature.
LimROTS_survival applies the reproducibility-optimized
statistic framework to survival analysis. For each bootstrap resample,
a Cox proportional hazards model (or a competing risks model
via crr from cmprsk when competing_risks = TRUE)
is fitted per feature using the supplied formula.str. The
coefficient \(\beta_{(p)}\) and its standard error \(SE_{(p)}\) are
extracted for each feature across bootstrap datasets.
Reproducibility-optimized parameters \(\alpha_1\) and \(\alpha_2\) are then selected by maximizing the overlap of top-ranked features across group-preserving bootstrap datasets, following the ROTS approach. The final statistic for each feature is:
$$t_{\alpha_{(p)}} = \frac{\beta_{(p)}} {\alpha_1 + \alpha_2 \times SE_{(p)}}$$
where \(t_{\alpha_{(p)}}\) is the final statistic, \(\beta_{(p)}\) is the Cox model coefficient, and \(SE_{(p)}\) is its standard error.
P-values are computed from permutation-based null distributions using empPvals from the qvalue package, alongside an internal FDR implementation adapted from the ROTS package. Q-values are estimated via qvalue with the proportion of true null p-values set using the bootstrap method pi0est.
This function processes a dataset using parallel computation. It leverages the BiocParallel framework to distribute tasks across multiple workers, which can significantly reduce runtime for large datasets.
Ritchie, M.E., Phipson, B., Wu, D., Hu, Y., Law, C.W., Shi, W., and Smyth, G.K. (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7), e47
Suomi T, Seyednasrollah F, Jaakkola M, Faux T, Elo L (2017). “ROTS: An R package for reproducibility-optimized statistical testing. ” PLoS computational biology, 13(5), e1005562. doi:10.1371/journal.pcbi.1005562 https://doi.org/10.1371/journal.pcbi.1005562, http://www.ncbi.nlm.nih.gov/pubmed/28542205
Elo LL, Filen S, Lahesmaa R, Aittokallio T. Reproducibility-optimized test statistic for ranking genes in microarray studies. IEEE/ACM Trans Comput Biol Bioinform. 2008;5(3):423-431. doi:10.1109/tcbb.2007.1078
set.seed(123)
nsamples <- 20
nfeatures <- 50
sim_data <- matrix(rnorm(nfeatures * nsamples), nrow = nfeatures)
colnames(sim_data) <- paste0("sample", seq_len(nsamples))
rownames(sim_data) <- paste0("gene", seq_len(nfeatures))
meta_data <- data.frame(
time = abs(rnorm(nsamples, mean = 5, sd = 2)),
event = sample(0:1, nsamples, replace = TRUE),
group = factor(rep(seq_len(2), each = nsamples / 2)),
row.names = colnames(sim_data)
)
formula.str <- "~ Surv(time, event) + group"
result <- LimROTS_survival(
x = sim_data,
meta.info = meta_data,
formula.str = formula.str,
niter = 10,
verbose = FALSE,
competing_risks = FALSE
)
#> Sanity check completed successfully!
#> Using MulticoreParam (Unix-like OS) with two workers.