| Title: | Multiomics Data Integration |
|---|---|
| Description: | Provides functions to do 'O2PLS-DA' analysis for multiple omics data integration. The algorithm came from "O2-PLS, a two-block (X±Y) latent variable regression (LVR) method with an integral OSC filter" which published by Johan Trygg and Svante Wold at 2003 <doi:10.1002/cem.775>. 'O2PLS' is a bidirectional multivariate regression method that aims to separate the covariance between two data sets (it was recently extended to multiple data sets) (Löfstedt and Trygg, 2011 <doi:10.1002/cem.1388>; Löfstedt et al., 2012 <doi:10.1016/j.aca.2013.06.026>) from the systematic sources of variance being specific for each data set separately. |
| Authors: | Kai Guo [aut, cre], Junguk Hur [aut], Eva Feldman [aut] |
| Maintainer: | Kai Guo <[email protected]> |
| License: | GPL-3 + file LICENSE |
| Version: | 0.0.28 |
| Built: | 2026-05-31 10:24:35 UTC |
| Source: | https://github.com/guokai8/o2plsda |
Extract elements from sparse_plsda objects
## S4 method for signature 'sparse_plsda' x$name## S4 method for signature 'sparse_plsda' x$name
x |
A sparse_plsda object |
name |
Character string specifying the element to extract |
The requested element from the sparse_plsda object
Extract elements from SparseO2pls objects
## S4 method for signature 'SparseO2pls' x$name## S4 method for signature 'SparseO2pls' x$name
x |
A SparseO2pls object |
name |
Character string specifying the element to extract |
The requested element from the SparseO2pls object
Enhanced sparsity application function (FIXED VERSION)
apply_sparsity(x, keep_n, lambda = 0.1, penalty = "lasso")apply_sparsity(x, keep_n, lambda = 0.1, penalty = "lasso")
x |
Vector of values to apply sparsity to |
keep_n |
Number of variables to keep |
lambda |
Regularization parameter for penalty |
penalty |
Type of penalty ("lasso" or "ridge") |
This function computes Mahalanobis distances between score vectors and class centroids.
calculate_mahalanobis_distances(scores, classification_info)calculate_mahalanobis_distances(scores, classification_info)
scores |
Matrix of score vectors (samples x components) |
classification_info |
List containing centroids and other classification information |
Matrix of distances (samples x classes)
This function converts distance matrices to classification probabilities using softmax transformation.
calculate_prediction_probabilities(distances)calculate_prediction_probabilities(distances)
distances |
Matrix of distances (samples x classes) |
Matrix of probabilities (samples x classes)
Compare multiple O2PLS models
compare_models(..., type = "performance", names = NULL)compare_models(..., type = "performance", names = NULL)
... |
Multiple O2PLS model objects to compare |
type |
Character. Type of comparison: "performance", "summary" |
names |
Character vector. Names for the models (optional) |
ggplot object showing model comparison
## Not run: # Fit regular and sparse models regular_fit <- o2pls(X, Y, nc = 2, nx = 1, ny = 1) sparse_fit <- sparse_o2pls(X, Y, nc = 2, nx = 1, ny = 1, keepX = c(40, 30)) # Compare models compare_models(regular_fit, sparse_fit, type = "performance") ## End(Not run)## Not run: # Fit regular and sparse models regular_fit <- o2pls(X, Y, nc = 2, nx = 1, ny = 1) sparse_fit <- sparse_o2pls(X, Y, nc = 2, nx = 1, ny = 1, keepX = c(40, 30)) # Compare models compare_models(regular_fit, sparse_fit, type = "performance") ## End(Not run)
This function extracts loading parameters from an O2PLS fit
This function extracts loading parameters from an O2PLS fit
loadings(x, ...) ## S3 method for class 'O2pls' loadings(x, loading = c("Xjoint", "Yjoint", "Xorth", "Yorth"), ...)loadings(x, ...) ## S3 method for class 'O2pls' loadings(x, loading = c("Xjoint", "Yjoint", "Xorth", "Yorth"), ...)
x |
Object of class |
... |
For consistency |
loading |
the loadings for one of "Xjoint", "Yjoint", "Xorth", "Yorth" |
Loading matrix
Loading matrix
extract the loading value from the O2PLSDA analysis
## S3 method for class 'o2plsda' loadings(x, loading = "Xloading", ...)## S3 method for class 'o2plsda' loadings(x, loading = "Xloading", ...)
x |
Object of class |
loading |
the loadings for one of "Xjoint", "Yjoint", "Xorth", "Yorth" |
... |
For consistency |
extract the loading value from the PLSDA analysis
## S3 method for class 'plsda' loadings(x, ...)## S3 method for class 'plsda' loadings(x, ...)
x |
Object of class |
... |
For consistency |
Cross validation for O2PLS
o2cv( X, Y, nc, nx, ny, group = NULL, nr_folds = 5, ncores = 1, scale = FALSE, center = FALSE )o2cv( X, Y, nc, nx, ny, group = NULL, nr_folds = 5, ncores = 1, scale = FALSE, center = FALSE )
X |
a Numeric matrix (input) |
Y |
a Numeric matrix (input) |
nc |
Integer. Number of joint PLS components. |
nx |
Integer. Number of orthogonal components in X |
ny |
Integer. Number of orthogonal components in Y |
group |
a vector to indicate the group for Y |
nr_folds |
Integer to indicate the folds for cross validation |
ncores |
Integer. Number of CPUs to use for cross validation |
scale |
boolean values determining if data should be scaled or not |
center |
boolean values determining if data should be centered or not |
a data frame with the Q and RMSE values
Kai Guo
set.seed(123) X = matrix(rnorm(500),50,10) Y = matrix(rnorm(500),50,10) X = scale(X, scale = TRUE) Y = scale(Y, scale = TRUE) # group factor could be omitted if you don't have any group group <- rep(c("Ctrl","Treat"), each = 25) cv <- o2cv(X, Y, 1:2, 1:2, 1:2, group=group, nr_folds = 2, ncores=1)set.seed(123) X = matrix(rnorm(500),50,10) Y = matrix(rnorm(500),50,10) X = scale(X, scale = TRUE) Y = scale(Y, scale = TRUE) # group factor could be omitted if you don't have any group group <- rep(c("Ctrl","Treat"), each = 25) cv <- o2cv(X, Y, 1:2, 1:2, 1:2, group=group, nr_folds = 2, ncores=1)
Enhanced sparse cross-validation that integrates with existing o2cv
o2cv_sparse( X, Y, nc, nx, ny, keepX_range = seq(10, 50, 10), keepY_range = seq(10, 50, 10), group = NULL, nr_folds = 5, ncores = 1, validation = "Mfold" )o2cv_sparse( X, Y, nc, nx, ny, keepX_range = seq(10, 50, 10), keepY_range = seq(10, 50, 10), group = NULL, nr_folds = 5, ncores = 1, validation = "Mfold" )
X |
Input data matrix (samples x variables) |
Y |
Response data matrix (samples x variables) |
nc |
Number of joint components |
nx |
Number of X-specific orthogonal components |
ny |
Number of Y-specific orthogonal components |
keepX_range |
Range of values for number of X variables to keep |
keepY_range |
Range of values for number of Y variables to keep |
group |
Grouping variable for stratified CV |
nr_folds |
Number of cross-validation folds |
ncores |
Number of cores for parallel processing |
validation |
Cross-validation method ("Mfold", "loo", "bootstrap") |
fit O2PLS model with best nc, nx, ny
o2pls(X, Y, nc, nx, ny, scale = FALSE, center = FALSE)o2pls(X, Y, nc, nx, ny, scale = FALSE, center = FALSE)
X |
a Numeric matrix (input) |
Y |
a Numeric matrix (input) |
nc |
Integer. Number of joint PLS components. |
nx |
Integer. Number of orthogonal components in X |
ny |
Integer. Number of orthogonal components in Y |
scale |
boolean values determining if data should be scaled or not |
center |
boolean values determining if data should be centered or not |
An object containing
Xscore |
Joint |
Xloading |
Joint |
Yscore |
Joint |
Yloading |
Joint |
TYosc |
Orthogonal |
PYosc |
Orthogonal |
WYosc |
Orthogonal |
UXosc |
Orthogonal |
PXosc |
Orthogonal |
CXosc |
Orthogonal |
BU |
Regression coefficient in |
BT |
Regression coefficient in |
Xhat |
Prediction of |
Yhat |
Prediction of |
R2Xhat |
Variation of the predicted |
R2Yhat |
Variation of the predicted |
R2X |
Variation of the modeled part in |
R2Y |
Variation of the modeled part in |
R2Xcorr |
Variation of the joint part in |
R2Ycorr |
Variation of the joint part in |
R2Xo |
Variation of the orthogonal part in |
R2Yo |
Variation of the orthogonal part in |
R2Xp |
Variation in |
R2Yp |
Variation in |
varXj |
Variation in each Latent Variable (LV) in |
varYj |
Variation in each Latent Variable (LV) in |
varXorth |
Variation in each Latent Variable (LV) in |
varYorth |
Variation in each Latent Variable (LV) in |
Exy |
Residuals in |
Fxy |
Residuals in |
Kai Guo
set.seed(123) X = matrix(rnorm(500),50,10) Y = matrix(rnorm(500),50,10) X = scale(X, scale = TRUE) Y = scale(Y, scale = TRUE) fit <- o2pls(X, Y, 1, 2, 2) summary(fit)set.seed(123) X = matrix(rnorm(500),50,10) Y = matrix(rnorm(500),50,10) X = scale(X, scale = TRUE) Y = scale(Y, scale = TRUE) fit <- o2pls(X, Y, 1, 2, 2) summary(fit)
Class "O2pls" This class represents the Annotation information
Xa Numeric matrix (input)
Ya Numeric matrix (input)
paramsparamaters ysed in o2pls analysis
resultslist of o2pls results
Kai Guo
Computes orthogonal scores partial least squares regressions with the NIPALS algorithm. It return a comprehensive set of pls outputs (e.g. scores and vip).
oplsda(X, Y, nc, scale = FALSE, center = TRUE, maxiter = 100, tol = 1e-05)oplsda(X, Y, nc, scale = FALSE, center = TRUE, maxiter = 100, tol = 1e-05)
X |
a O2pls object or a matrix of predictor variables. |
Y |
a single vector indicate the group |
nc |
the number of pls components (the one joint components + number of orthogonal components ). |
scale |
logical indicating whether |
center |
boolean values determining if data should be centered or not |
maxiter |
maximum number of iterations. |
tol |
limit for convergence of the algorithm in the nipals algorithm. |
a list containing the following elements:
nc the number of components used(one joint components +
number of orthogonal components
scores a matrix of scores corresponding to the observations
in X, The components retrieved correspond to the ones optimized
or specified.
Xloadings a matrix of loadings corresponding to the
explanatory variables. The components retrieved correspond to the ones
optimized or specified.
Yloadings a matrix of partial least squares loadings
corresponding to Y
vip the VIP matrix.
xvar a matrix indicating the standard deviation of each
component (sd), the variance explained by each single component
(explained_var) and the cumulative explained variance
(cumulative_explained_var). These values are
computed based on the data used to create the projection matrices.
projection_matrix the matrix of projection matrix
weight a matrix of partial least squares ("pls") weights.
Kai Guo
X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) fit <- o2pls(X,Y,2,1,1) yy <- rep(c(0,1),5) fit0 <- oplsda(fit,yy,2)X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) fit <- o2pls(X,Y,2,1,1) yy <- rep(c(0,1),5) fit0 <- oplsda(fit,yy,2)
Score or loading plot for the O2PLS results
## S3 method for class 'O2pls' plot( x, type = "score", var = "Xjoint", group = NULL, ind = c(1, 2), color = NULL, top = 20, ellipse = FALSE, order = FALSE, pt.size = 3, label = TRUE, label.size = 4, repel = TRUE, rotation = FALSE, ... )## S3 method for class 'O2pls' plot( x, type = "score", var = "Xjoint", group = NULL, ind = c(1, 2), color = NULL, top = 20, ellipse = FALSE, order = FALSE, pt.size = 3, label = TRUE, label.size = 4, repel = TRUE, rotation = FALSE, ... )
x |
an O2pls object |
type |
score or loading |
var |
specify Xjoint |
group |
color used for score plot |
ind |
which components to be used for score plot or loading plot |
color |
color used for score or loading plot |
top |
the number of largest loading value to plot |
ellipse |
TRUE/FALSE |
order |
order by the value or not |
pt.size |
point size |
label |
plot label or not (TRUE/FALSE) |
label.size |
label size |
repel |
use ggrepel to show the label or not |
rotation |
flip the figure or not (TRUE/FALSE) |
... |
For consistency |
a ggplot2 object
Kai Guo
X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) fit <- o2pls(X,Y,2,1,1) plot(fit, type="score")X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) fit <- o2pls(X,Y,2,1,1) plot(fit, type="score")
Score, VIP or loading plot for the O2PLS results
## S3 method for class 'o2plsda' plot( x, type = "score", group = NULL, ind = c(1, 2), color = NULL, top = 20, ellipse = FALSE, order = FALSE, pt.size = 3, label = TRUE, label.size = 4, repel = FALSE, rotation = FALSE, ... )## S3 method for class 'o2plsda' plot( x, type = "score", group = NULL, ind = c(1, 2), color = NULL, top = 20, ellipse = FALSE, order = FALSE, pt.size = 3, label = TRUE, label.size = 4, repel = FALSE, rotation = FALSE, ... )
x |
an o2plsda object |
type |
score, vip or loading |
group |
color used for score plot |
ind |
which components to be used for score plot or loading plot |
color |
color used for score or loading plot |
top |
the number of largest loading value to plot |
ellipse |
TRUE/FALSE |
order |
order by the value or not |
pt.size |
point size |
label |
plot label or not (TRUE/FALSE) |
label.size |
label size |
repel |
use ggrepel to show the label or not |
rotation |
flip the figure or not (TRUE/FALSE) |
... |
For consistency |
a ggplot2 object
Kai Guo
X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) fit <- o2pls(X,Y,2,1,1) yy <- rep(c(0,1),5) fit0 <- oplsda(fit,yy,2) plot(fit0, type="score", group = factor(yy))X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) fit <- o2pls(X,Y,2,1,1) yy <- rep(c(0,1),5) fit0 <- oplsda(fit,yy,2) plot(fit0, type="score", group = factor(yy))
Score, VIP or loading plot for the plsda results
## S3 method for class 'plsda' plot( x, type = "score", group = NULL, ind = c(1, 2), color = NULL, top = 20, ellipse = FALSE, order = FALSE, pt.size = 3, label = TRUE, label.size = 4, repel = FALSE, rotation = FALSE, ... )## S3 method for class 'plsda' plot( x, type = "score", group = NULL, ind = c(1, 2), color = NULL, top = 20, ellipse = FALSE, order = FALSE, pt.size = 3, label = TRUE, label.size = 4, repel = FALSE, rotation = FALSE, ... )
x |
an plsda object |
type |
score, vip or loading |
group |
color used for score plot |
ind |
which components to be used for score plot or loading plot |
color |
color used for score or loading plot |
top |
the number of largest loading value to plot |
ellipse |
TRUE/FALSE |
order |
order by the value or not |
pt.size |
point size |
label |
plot label or not (TRUE/FALSE) |
label.size |
label size |
repel |
use ggrepel to show the label or not |
rotation |
flip the figure or not (TRUE/FALSE) |
... |
For consistency |
a ggplot2 object
Kai Guo
X <- matrix(rnorm(500),10,50) Y <- rep(c("a","b"),each=5) fit0 <- plsda(X,Y,2) plot(fit0, type = "score", group = factor(Y))X <- matrix(rnorm(500),10,50) Y <- rep(c("a","b"),each=5) fit0 <- plsda(X,Y,2) plot(fit0, type = "score", group = factor(Y))
Plot method for Sparse PLS-DA results
## S3 method for class 'sparse_plsda' plot( x, type = "score", component = c(1, 2), group = NULL, top = 20, color = NULL, ellipse = FALSE, ... )## S3 method for class 'sparse_plsda' plot( x, type = "score", component = c(1, 2), group = NULL, top = 20, color = NULL, ellipse = FALSE, ... )
x |
A sparse_plsda object (corrected class name) |
type |
Character. Type of plot: "score", "loading", "vip", "selection" |
component |
Integer or vector. Which component(s) to plot |
group |
Factor. Grouping variable (usually the Y classes) |
top |
Integer. Number of top variables to display |
color |
Custom colors for groups |
ellipse |
Logical. Add confidence ellipses to score plots |
... |
Additional plotting parameters |
# Generate example data set.seed(123) n <- 100 p <- 150 X <- matrix(rnorm(n * p), n, p) # Add class-specific signals classes <- factor(rep(c("A", "B", "C"), length.out = n)) X[classes == "A", 1:20] <- X[classes == "A", 1:20] + 1.5 X[classes == "B", 21:40] <- X[classes == "B", 21:40] + 1.5 X[classes == "C", 41:60] <- X[classes == "C", 41:60] + 1.5 colnames(X) <- paste0("Gene_", 1:p) # Fit sparse PLS-DA (using corrected class name) sparse_plsda_fit <- sparse_plsda(X, classes, nc = 2, keepX = c(30, 25)) class(sparse_plsda_fit) <- "sparse_plsda" # Ensure correct class # Score plot plot(sparse_plsda_fit, type = "score", group = classes, ellipse = FALSE) # Loading plot plot(sparse_plsda_fit, type = "loading", component = 1, top = 20) # VIP plot plot(sparse_plsda_fit, type = "vip", component = 1, top = 25)# Generate example data set.seed(123) n <- 100 p <- 150 X <- matrix(rnorm(n * p), n, p) # Add class-specific signals classes <- factor(rep(c("A", "B", "C"), length.out = n)) X[classes == "A", 1:20] <- X[classes == "A", 1:20] + 1.5 X[classes == "B", 21:40] <- X[classes == "B", 21:40] + 1.5 X[classes == "C", 41:60] <- X[classes == "C", 41:60] + 1.5 colnames(X) <- paste0("Gene_", 1:p) # Fit sparse PLS-DA (using corrected class name) sparse_plsda_fit <- sparse_plsda(X, classes, nc = 2, keepX = c(30, 25)) class(sparse_plsda_fit) <- "sparse_plsda" # Ensure correct class # Score plot plot(sparse_plsda_fit, type = "score", group = classes, ellipse = FALSE) # Loading plot plot(sparse_plsda_fit, type = "loading", component = 1, top = 20) # VIP plot plot(sparse_plsda_fit, type = "vip", component = 1, top = 25)
Creates various plots for sparse O2PLS models stored as S4 objects, including score plots, loading plots, sparsity patterns, and variable selection summaries.
## S3 method for class 'SparseO2pls' plot( x, type = "score", component = c(1, 2), block = "X", group = NULL, top = 20, threshold = 1e-10, color = NULL, ellipse = FALSE, title = NULL, ... )## S3 method for class 'SparseO2pls' plot( x, type = "score", component = c(1, 2), block = "X", group = NULL, top = 20, threshold = 1e-10, color = NULL, ellipse = FALSE, title = NULL, ... )
x |
A SparseO2pls S4 object |
type |
Character. Type of plot: "score", "loading", "sparsity", "selection", "biplot", "contribution", "comparison", or "diagnostic" |
component |
Integer or vector. Which component(s) to plot (default: c(1,2)) |
block |
Character. Which data block: "X", "Y", or "both" (default: "X") |
group |
Factor. Grouping variable for score plots |
top |
Integer. Number of top variables to show in loading plots (default: 20) |
threshold |
Numeric. Threshold for variable selection display (default: 1e-10) |
color |
Character vector. Custom colors for groups |
ellipse |
Logical. Add confidence ellipses to score plots (default: TRUE) |
title |
Character. Custom plot title |
... |
Additional plotting parameters |
ggplot2 object or plotly object (for 3D plots)
# Generate example data for SparseO2pls plotting set.seed(456) n <- 120 p_X <- 180 p_Y <- 120 # Create structured data X <- matrix(rnorm(n * p_X), n, p_X) Y <- matrix(rnorm(n * p_Y), n, p_Y) # Add correlated signal signal <- matrix(rnorm(n * 4), n, 4) X[, 1:40] <- signal %*% matrix(rnorm(4 * 40), 4, 40) + matrix(rnorm(n * 40, sd = 0.4), n, 40) Y[, 1:30] <- signal %*% matrix(rnorm(4 * 30), 4, 30) + matrix(rnorm(n * 30, sd = 0.4), n, 30) # Add meaningful names colnames(X) <- paste0("Gene_", 1:p_X) colnames(Y) <- paste0("Protein_", 1:p_Y) rownames(X) <- rownames(Y) <- paste0("Sample_", 1:n) # Create grouping variable treatment_groups <- factor(rep(c("Control", "Low_Dose", "High_Dose", "Recovery"), each = n/4)) # Fit sparse O2PLS and convert to S4 object sparse_fit_list <- sparse_o2pls(X, Y, nc = 3, nx = 1, ny = 1, keepX = c(50, 40, 30), keepY = c(35, 25, 20)) # Convert to S4 SparseO2pls object sparse_s4 <- new("SparseO2pls", X = X, Y = Y, params = sparse_fit_list$params, results = sparse_fit_list$results, sparsity = sparse_fit_list$sparsity) # ============================================================================= # SCORE PLOTS # ============================================================================= # 2D score plot for X block with treatment groups p1 <- plot(sparse_s4, type = "score", block = "X", component = c(1, 2), group = treatment_groups, ellipse = FALSE) print(p1) # 2D score plot for Y block p2 <- plot(sparse_s4, type = "score", block = "Y", component = c(1, 3), group = treatment_groups, color = c("#E31A1C", "#1F78B4", "#33A02C", "#FF7F00")) print(p2) # 3D score plot p3 <- plot(sparse_s4, type = "score", block = "X", component = c(1, 2, 3), group = treatment_groups) print(p3) # ============================================================================= # LOADING PLOTS # ============================================================================= # Top gene loadings for component 1 p4 <- plot(sparse_s4, type = "loading", block = "X", component = 1, top = 30) print(p4) # Top protein loadings for component 2 p5 <- plot(sparse_s4, type = "loading", block = "Y", component = 2, top = 25, title = "Top Protein Loadings - Component 2") print(p5) # Multi-component loading comparison p6 <- plot(sparse_s4, type = "loading", block = "X", component = c(1, 2, 3), top = 35) print(p6) # ============================================================================= # SPARSITY AND SELECTION PLOTS # ============================================================================= # Sparsity pattern for both blocks p7 <- plot(sparse_s4, type = "sparsity", block = "both") print(p7) # Variable selection summary p8 <- plot(sparse_s4, type = "selection") print(p8) # ============================================================================= # ADVANCED PLOTS # ============================================================================= # Biplot combining scores and loadings p9 <- plot(sparse_s4, type = "biplot", component = c(1, 2), group = treatment_groups, top = 20) print(p9) # Variable contribution analysis p10 <- plot(sparse_s4, type = "contribution", block = "X", component = 1, top = 25) print(p10) # Diagnostic plots p11 <- plot(sparse_s4, type = "diagnostic") print(p11)# Generate example data for SparseO2pls plotting set.seed(456) n <- 120 p_X <- 180 p_Y <- 120 # Create structured data X <- matrix(rnorm(n * p_X), n, p_X) Y <- matrix(rnorm(n * p_Y), n, p_Y) # Add correlated signal signal <- matrix(rnorm(n * 4), n, 4) X[, 1:40] <- signal %*% matrix(rnorm(4 * 40), 4, 40) + matrix(rnorm(n * 40, sd = 0.4), n, 40) Y[, 1:30] <- signal %*% matrix(rnorm(4 * 30), 4, 30) + matrix(rnorm(n * 30, sd = 0.4), n, 30) # Add meaningful names colnames(X) <- paste0("Gene_", 1:p_X) colnames(Y) <- paste0("Protein_", 1:p_Y) rownames(X) <- rownames(Y) <- paste0("Sample_", 1:n) # Create grouping variable treatment_groups <- factor(rep(c("Control", "Low_Dose", "High_Dose", "Recovery"), each = n/4)) # Fit sparse O2PLS and convert to S4 object sparse_fit_list <- sparse_o2pls(X, Y, nc = 3, nx = 1, ny = 1, keepX = c(50, 40, 30), keepY = c(35, 25, 20)) # Convert to S4 SparseO2pls object sparse_s4 <- new("SparseO2pls", X = X, Y = Y, params = sparse_fit_list$params, results = sparse_fit_list$results, sparsity = sparse_fit_list$sparsity) # ============================================================================= # SCORE PLOTS # ============================================================================= # 2D score plot for X block with treatment groups p1 <- plot(sparse_s4, type = "score", block = "X", component = c(1, 2), group = treatment_groups, ellipse = FALSE) print(p1) # 2D score plot for Y block p2 <- plot(sparse_s4, type = "score", block = "Y", component = c(1, 3), group = treatment_groups, color = c("#E31A1C", "#1F78B4", "#33A02C", "#FF7F00")) print(p2) # 3D score plot p3 <- plot(sparse_s4, type = "score", block = "X", component = c(1, 2, 3), group = treatment_groups) print(p3) # ============================================================================= # LOADING PLOTS # ============================================================================= # Top gene loadings for component 1 p4 <- plot(sparse_s4, type = "loading", block = "X", component = 1, top = 30) print(p4) # Top protein loadings for component 2 p5 <- plot(sparse_s4, type = "loading", block = "Y", component = 2, top = 25, title = "Top Protein Loadings - Component 2") print(p5) # Multi-component loading comparison p6 <- plot(sparse_s4, type = "loading", block = "X", component = c(1, 2, 3), top = 35) print(p6) # ============================================================================= # SPARSITY AND SELECTION PLOTS # ============================================================================= # Sparsity pattern for both blocks p7 <- plot(sparse_s4, type = "sparsity", block = "both") print(p7) # Variable selection summary p8 <- plot(sparse_s4, type = "selection") print(p8) # ============================================================================= # ADVANCED PLOTS # ============================================================================= # Biplot combining scores and loadings p9 <- plot(sparse_s4, type = "biplot", component = c(1, 2), group = treatment_groups, top = 20) print(p9) # Variable contribution analysis p10 <- plot(sparse_s4, type = "contribution", block = "X", component = 1, top = 25) print(p10) # Diagnostic plots p11 <- plot(sparse_s4, type = "diagnostic") print(p11)
Enhanced plot method for stability_selection objects
## S3 method for class 'stability_selection' plot(x, component = 1, type = "heatmap", top_n = 50, ...)## S3 method for class 'stability_selection' plot(x, component = 1, type = "heatmap", top_n = 50, ...)
x |
A stability_selection object |
component |
Integer. Which component to visualize |
type |
Character. Type of plot: "heatmap", "barplot", "threshold", "summary" |
top_n |
Integer. Number of top variables to show |
... |
Additional plotting parameters |
## Not run: # Assuming you have a stability_selection object stability_res <- stability_selection(X, Y, nc = 2, n_bootstrap = 50) # Stability heatmap plot(stability_res, type = "heatmap", component = 1) # Stability barplot plot(stability_res, type = "barplot", component = 1, top_n = 30) # Threshold analysis plot(stability_res, type = "threshold") ## End(Not run)## Not run: # Assuming you have a stability_selection object stability_res <- stability_selection(X, Y, nc = 2, n_bootstrap = 50) # Stability heatmap plot(stability_res, type = "heatmap", component = 1) # Stability barplot plot(stability_res, type = "barplot", component = 1, top_n = 30) # Threshold analysis plot(stability_res, type = "threshold") ## End(Not run)
Enhanced plot method for TuneResult objects
## S3 method for class 'TuneResult' plot(x, type = "heatmap", metric = "mean_score", ...)## S3 method for class 'TuneResult' plot(x, type = "heatmap", metric = "mean_score", ...)
x |
A TuneResult object |
type |
Character. Type of plot: "heatmap", "line", "comparison", "optimal" |
metric |
Character. Performance metric to visualize |
... |
Additional plotting parameters |
# Example with tuning results ## Not run: # Assuming you have a TuneResult object from tune_o2pls() tune_results <- tune_o2pls(X, Y, nc_range = 1:3, nx_range = 0:2, ny_range = 0:2) # Heatmap of results plot(tune_results, type = "heatmap") # Line plot showing trends plot(tune_results, type = "line") # Comparison across parameters plot(tune_results, type = "comparison") ## End(Not run)# Example with tuning results ## Not run: # Assuming you have a TuneResult object from tune_o2pls() tune_results <- tune_o2pls(X, Y, nc_range = 1:3, nx_range = 0:2, ny_range = 0:2) # Heatmap of results plot(tune_results, type = "heatmap") # Line plot showing trends plot(tune_results, type = "line") # Comparison across parameters plot(tune_results, type = "comparison") ## End(Not run)
Perform a PLS discriminant analysis
plsda( X, Y, nc, scale = TRUE, center = TRUE, cv = TRUE, nr_folds = 5, tol = 1e-06, max_iter = 100, q2_threshold = 0.05 )plsda( X, Y, nc, scale = TRUE, center = TRUE, cv = TRUE, nr_folds = 5, tol = 1e-06, max_iter = 100, q2_threshold = 0.05 )
X |
a matrix of predictor variables. |
Y |
a single vector indicate the group |
nc |
the number of pls components |
scale |
logical indicating whether X must be scaled (suggest TRUE). |
center |
logical indicating whether X must be centered (suggest TRUE). |
cv |
logical indicating whether cross-validation will be performed or not (suggest TRUE). |
nr_folds |
Integer to indicate the folds for cross validation. |
tol |
Convergence tolerance (default: 1e-6) |
max_iter |
Maximum number of iterations (default: 100) |
q2_threshold |
Q2 threshold for component selection (default: 0.05) |
a list containing PLS-DA results
Predict method for sparse_plsda objects
## S4 method for signature 'sparse_plsda' predict(object, newdata, ...)## S4 method for signature 'sparse_plsda' predict(object, newdata, ...)
object |
A sparse_plsda object |
newdata |
New data matrix for prediction |
... |
Additional arguments passed to predict.sparse_plsda |
Prediction results
Predict method for SparseO2pls objects
## S4 method for signature 'SparseO2pls' predict(object, newdata, ...)## S4 method for signature 'SparseO2pls' predict(object, newdata, ...)
object |
A SparseO2pls object |
newdata |
New data matrix for prediction |
... |
Additional arguments passed to predict.SparseO2pls |
Prediction results
Enhanced predict method for sparse PLS-DA
predict.sparse_plsda(object, newdata, dist = "max.dist", ...)predict.sparse_plsda(object, newdata, dist = "max.dist", ...)
object |
A sparse_plsda object |
newdata |
New data matrix for prediction |
dist |
Distance method for classification ("max.dist" or "centroids.dist") |
... |
Additional arguments (currently unused) |
Prediction results
Enhanced predict method for sparse O2PLS
predict.SparseO2pls(object, newdata, ...)predict.SparseO2pls(object, newdata, ...)
object |
A SparseO2pls object |
newdata |
New data matrix for prediction |
... |
Additional arguments (currently unused) |
Prediction results
Print the summary of O2PLS results.
## S3 method for class 'O2pls' print(x, ...)## S3 method for class 'O2pls' print(x, ...)
x |
An O2pls object |
... |
For consistency |
Kai Guo
X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) object <- o2pls(X,Y,1,1,1) print(object)X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) object <- o2pls(X,Y,1,1,1) print(object)
Print the summary of plsda results.
## S3 method for class 'plsda' print(x, ...)## S3 method for class 'plsda' print(x, ...)
x |
An plsda object |
... |
For consistency |
Kai Guo
X <- matrix(rnorm(500),10,50) Y <- rep(c("a","b"),each=5) fit <- plsda(X,Y,2) print(fit)X <- matrix(rnorm(500),10,50) Y <- rep(c("a","b"),each=5) fit <- plsda(X,Y,2) print(fit)
Print method for sparse O2PLS results
## S3 method for class 'SparseO2pls' print(x, ...)## S3 method for class 'SparseO2pls' print(x, ...)
x |
A SparseO2pls object to print |
... |
Additional arguments passed to summary method |
Enhanced SCAD thresholding function
scad_threshold(x, lambda, a)scad_threshold(x, lambda, a)
x |
Vector of values to threshold |
lambda |
Regularization parameter |
a |
SCAD parameter (typically 3.7) |
Thresholded values
This function extracts score matrices from an O2PLS fit
scores(x, ...)scores(x, ...)
x |
Object of class |
... |
For consistency |
Scores matrix
This function extracts scores parameters from an O2PLS fit
## S3 method for class 'O2pls' scores(x, score = c("Xjoint", "Yjoint", "Xorth", "Yorth"), ...)## S3 method for class 'O2pls' scores(x, score = c("Xjoint", "Yjoint", "Xorth", "Yorth"), ...)
x |
Object of class |
score |
the scores matrix for one of "Xjoint", "Yjoint", "Xorth", "Yorth" |
... |
Other arguments |
score matrix
Extract the scores from an O2PLS DA analysis
## S3 method for class 'o2plsda' scores(x, ...)## S3 method for class 'o2plsda' scores(x, ...)
x |
Object of class |
... |
Other arguments |
score matrix
Kai Guo
Extract the scores PLSDA analysis
## S3 method for class 'plsda' scores(x, ...)## S3 method for class 'plsda' scores(x, ...)
x |
Object of class |
... |
Other arguments |
score matrix
Kai Guo
Get names of selected variables
selected_var_names(x, type = "X", ...)selected_var_names(x, type = "X", ...)
x |
A sparse model object |
type |
Character. Which variables to extract: "X", "Y", or "both" |
... |
Additional arguments passed to selected_vars |
Character vector of variable names or list of name vectors
Extract selected variables from sparse models
selected_vars(x, ...)selected_vars(x, ...)
x |
A sparse model object |
... |
Additional arguments passed to methods |
Vector or list of selected variable indices
Default method for selected_vars
## Default S3 method: selected_vars(x, ...)## Default S3 method: selected_vars(x, ...)
x |
An object |
... |
Additional arguments |
Extract selected variables from sparse PLS-DA models
## S3 method for class 'sparse_plsda' selected_vars(x, ...)## S3 method for class 'sparse_plsda' selected_vars(x, ...)
x |
A sparse_plsda object |
... |
Additional arguments (currently unused) |
Vector of selected variable indices
Extract selected variables from SparseO2pls S4 objects
## S3 method for class 'SparseO2pls' selected_vars(x, type = "X", ...)## S3 method for class 'SparseO2pls' selected_vars(x, type = "X", ...)
x |
A SparseO2pls S4 object |
type |
Character. Which variables to extract: "X", "Y", or "both" |
... |
Additional arguments (currently unused) |
Vector of variable indices (for "X" or "Y") or list of indices (for "both")
Performs sparse O2PLS analysis with automatic variable selection using L1 regularization (Lasso) or other penalty methods.
sparse_o2pls( X, Y, nc, nx = 0, ny = 0, keepX = NULL, keepY = NULL, lambda_x = 0.1, lambda_y = 0.1, penalty = "lasso", scale = TRUE, center = TRUE, max_iter = 100, tol = 1e-06 )sparse_o2pls( X, Y, nc, nx = 0, ny = 0, keepX = NULL, keepY = NULL, lambda_x = 0.1, lambda_y = 0.1, penalty = "lasso", scale = TRUE, center = TRUE, max_iter = 100, tol = 1e-06 )
X |
Numeric matrix (samples x variables). The predictor data matrix. |
Y |
Numeric matrix (samples x variables). The response data matrix. |
nc |
Integer. Number of joint components to extract. |
nx |
Integer. Number of X-specific orthogonal components (default: 0). |
ny |
Integer. Number of Y-specific orthogonal components (default: 0). |
keepX |
Vector of integers. Number of X variables to keep per component. If NULL, automatically determined based on data dimensions. |
keepY |
Vector of integers. Number of Y variables to keep per component. If NULL, automatically determined based on data dimensions. |
lambda_x |
Numeric. L1 penalty parameter for X variables (default: 0.1). |
lambda_y |
Numeric. L1 penalty parameter for Y variables (default: 0.1). |
penalty |
Character. Penalty type: "lasso" (default) or "elastic". |
scale |
Logical. Scale variables to unit variance (default: TRUE). |
center |
Logical. Center variables to zero mean (default: TRUE). |
max_iter |
Integer. Maximum iterations for convergence (default: 100). |
tol |
Numeric. Convergence tolerance (default: 1e-6). |
Sparse O2PLS extends traditional O2PLS by incorporating variable selection through L1 regularization. This is particularly useful for high-dimensional data where only a subset of variables are relevant for the relationship between X and Y datasets.
The keepX and keepY parameters control the level of sparsity. Smaller values lead to sparser models with fewer selected variables. If not specified, reasonable defaults are chosen based on data dimensions.
sparse_o2pls object containing sparse loadings, scores, and sparsity information.
Kai Guo
o2pls, selected_vars, stability_selection
# Example 1: Basic sparse O2PLS with automatic parameter selection set.seed(42) n <- 80 p_X <- 100 # High-dimensional X p_Y <- 60 # High-dimensional Y # Generate sparse data with only some variables being relevant X <- matrix(rnorm(n * p_X), n, p_X) Y <- matrix(rnorm(n * p_Y), n, p_Y) # Create relationships between first 20 variables true_signal <- matrix(rnorm(n * 3), n, 3) # 3 latent factors X[, 1:20] <- true_signal %*% matrix(rnorm(3 * 20), 3, 20) + matrix(rnorm(n * 20, sd = 0.5), n, 20) Y[, 1:15] <- true_signal %*% matrix(rnorm(3 * 15), 3, 15) + matrix(rnorm(n * 15, sd = 0.5), n, 15) # Add variable names colnames(X) <- paste0("X_", 1:p_X) colnames(Y) <- paste0("Y_", 1:p_Y) rownames(X) <- rownames(Y) <- paste0("Sample_", 1:n) # Fit sparse O2PLS with automatic keepX/keepY selection sparse_fit1 <- sparse_o2pls(X, Y, nc = 2) # View results print(sparse_fit1) summary(sparse_fit1) # Extract selected variables selected_X <- selected_vars(sparse_fit1, type = "X") selected_Y <- selected_vars(sparse_fit1, type = "Y") selected_X_names <- selected_var_names(sparse_fit1, type = "X") selected_Y_names <- selected_var_names(sparse_fit1, type = "Y") cat("Selected X variables:", length(selected_X), "out of", p_X, "\n") cat("Selected Y variables:", length(selected_Y), "out of", p_Y, "\n") cat("X variables:", paste(head(selected_X_names, 10), collapse = ", "), "\n") # Example 2: Controlled sparsity levels # Specify exact number of variables to keep sparse_fit2 <- sparse_o2pls( X = X, Y = Y, nc = 2, nx = 1, ny = 1, keepX = c(25, 20), # Keep 25 variables in comp 1, 20 in comp 2 keepY = c(20, 15), # Keep 20 variables in comp 1, 15 in comp 2 lambda_x = 0.05, lambda_y = 0.05, penalty = "lasso" ) # Compare sparsity levels sparsity_info1 <- sparsity_info(sparse_fit1) sparsity_info2 <- sparsity_info(sparse_fit2) print("Automatic sparsity:") print(sparsity_info1) print("Controlled sparsity:") print(sparsity_info2) # Example 3: Prediction with sparse model # Split data for validation train_idx <- 1:60 test_idx <- 61:80 X_train <- X[train_idx, ] Y_train <- Y[train_idx, ] X_test <- X[test_idx, ] Y_test <- Y[test_idx, ] # Fit on training data sparse_train <- sparse_o2pls(X_train, Y_train, nc = 2, keepX = c(30, 25)) # Predict on test data Y_pred <- predict(sparse_train, X_test) # Calculate prediction error pred_error <- sqrt(mean((Y_test - Y_pred)^2)) cat("Prediction RMSE:", round(pred_error, 4), "\n") # Example 4: Different penalty types # Compare Lasso vs Elastic Net sparse_lasso <- sparse_o2pls(X, Y, nc = 2, penalty = "lasso", lambda_x = 0.1) sparse_elastic <- sparse_o2pls(X, Y, nc = 2, penalty = "elastic", lambda_x = 0.1) # Compare number of selected variables cat("Lasso selected:", length(selected_vars(sparse_lasso, "X")), "X variables\n") cat("Elastic Net selected:", length(selected_vars(sparse_elastic, "X")), "X variables\n")# Example 1: Basic sparse O2PLS with automatic parameter selection set.seed(42) n <- 80 p_X <- 100 # High-dimensional X p_Y <- 60 # High-dimensional Y # Generate sparse data with only some variables being relevant X <- matrix(rnorm(n * p_X), n, p_X) Y <- matrix(rnorm(n * p_Y), n, p_Y) # Create relationships between first 20 variables true_signal <- matrix(rnorm(n * 3), n, 3) # 3 latent factors X[, 1:20] <- true_signal %*% matrix(rnorm(3 * 20), 3, 20) + matrix(rnorm(n * 20, sd = 0.5), n, 20) Y[, 1:15] <- true_signal %*% matrix(rnorm(3 * 15), 3, 15) + matrix(rnorm(n * 15, sd = 0.5), n, 15) # Add variable names colnames(X) <- paste0("X_", 1:p_X) colnames(Y) <- paste0("Y_", 1:p_Y) rownames(X) <- rownames(Y) <- paste0("Sample_", 1:n) # Fit sparse O2PLS with automatic keepX/keepY selection sparse_fit1 <- sparse_o2pls(X, Y, nc = 2) # View results print(sparse_fit1) summary(sparse_fit1) # Extract selected variables selected_X <- selected_vars(sparse_fit1, type = "X") selected_Y <- selected_vars(sparse_fit1, type = "Y") selected_X_names <- selected_var_names(sparse_fit1, type = "X") selected_Y_names <- selected_var_names(sparse_fit1, type = "Y") cat("Selected X variables:", length(selected_X), "out of", p_X, "\n") cat("Selected Y variables:", length(selected_Y), "out of", p_Y, "\n") cat("X variables:", paste(head(selected_X_names, 10), collapse = ", "), "\n") # Example 2: Controlled sparsity levels # Specify exact number of variables to keep sparse_fit2 <- sparse_o2pls( X = X, Y = Y, nc = 2, nx = 1, ny = 1, keepX = c(25, 20), # Keep 25 variables in comp 1, 20 in comp 2 keepY = c(20, 15), # Keep 20 variables in comp 1, 15 in comp 2 lambda_x = 0.05, lambda_y = 0.05, penalty = "lasso" ) # Compare sparsity levels sparsity_info1 <- sparsity_info(sparse_fit1) sparsity_info2 <- sparsity_info(sparse_fit2) print("Automatic sparsity:") print(sparsity_info1) print("Controlled sparsity:") print(sparsity_info2) # Example 3: Prediction with sparse model # Split data for validation train_idx <- 1:60 test_idx <- 61:80 X_train <- X[train_idx, ] Y_train <- Y[train_idx, ] X_test <- X[test_idx, ] Y_test <- Y[test_idx, ] # Fit on training data sparse_train <- sparse_o2pls(X_train, Y_train, nc = 2, keepX = c(30, 25)) # Predict on test data Y_pred <- predict(sparse_train, X_test) # Calculate prediction error pred_error <- sqrt(mean((Y_test - Y_pred)^2)) cat("Prediction RMSE:", round(pred_error, 4), "\n") # Example 4: Different penalty types # Compare Lasso vs Elastic Net sparse_lasso <- sparse_o2pls(X, Y, nc = 2, penalty = "lasso", lambda_x = 0.1) sparse_elastic <- sparse_o2pls(X, Y, nc = 2, penalty = "elastic", lambda_x = 0.1) # Compare number of selected variables cat("Lasso selected:", length(selected_vars(sparse_lasso, "X")), "X variables\n") cat("Elastic Net selected:", length(selected_vars(sparse_elastic, "X")), "X variables\n")
Performs sparse PLS-DA for classification with automatic variable selection.
sparse_plsda( X, Y, nc, keepX = NULL, validation = "Mfold", folds = 5, test.keepX = seq(5, 50, 5), scale = TRUE, center = TRUE, dist = "max.dist", tune.keepX = is.null(keepX) )sparse_plsda( X, Y, nc, keepX = NULL, validation = "Mfold", folds = 5, test.keepX = seq(5, 50, 5), scale = TRUE, center = TRUE, dist = "max.dist", tune.keepX = is.null(keepX) )
X |
Numeric matrix (samples x variables). The predictor data matrix. |
Y |
Factor or character vector. Class labels for each sample. |
nc |
Integer. Number of components to extract. |
keepX |
Vector of integers. Number of variables to keep per component. If NULL, automatically determined. |
validation |
Character. Validation method for parameter tuning: "Mfold" or "loo". |
folds |
Integer. Number of cross-validation folds (default: 5). |
test.keepX |
Vector of keepX values to test during tuning. |
scale |
Logical. Scale variables (default: TRUE). |
center |
Logical. Center variables (default: TRUE). |
dist |
Character. Distance metric for classification: "max.dist", "centroids.dist", or "mahalanobis.dist". |
tune.keepX |
Logical. Automatically tune keepX parameters (default: TRUE if keepX is NULL). |
sparse_plsda object containing sparse loadings, scores, VIP scores, and classification results.
Kai Guo
sparse_o2pls, stability_selection, tune_sparse_keepX
# Example 1: Basic sparse PLS-DA set.seed(123) n <- 120 p <- 200 # High-dimensional feature space # Generate classification data X <- matrix(rnorm(n * p), n, p) # Create three classes with different patterns n_per_class <- n / 3 classes <- rep(c("Class_A", "Class_B", "Class_C"), each = n_per_class) # Add class-specific signals to subsets of variables X[classes == "Class_A", 1:20] <- X[classes == "Class_A", 1:20] + 2 X[classes == "Class_B", 21:40] <- X[classes == "Class_B", 21:40] + 2 X[classes == "Class_C", 41:60] <- X[classes == "Class_C", 41:60] + 2 # Add some noise correlation X[, 61:80] <- X[, 1:20] + matrix(rnorm(n * 20, sd = 0.5), n, 20) colnames(X) <- paste0("Feature_", 1:p) Y <- factor(classes) # Fit sparse PLS-DA with automatic tuning sparse_plsda_fit <- sparse_plsda( X = X, Y = Y, nc = 2, validation = "Mfold", folds = 5, test.keepX = seq(10, 50, 10), # Test different sparsity levels tune.keepX = TRUE ) # View results print(sparse_plsda_fit) summary(sparse_plsda_fit) # Extract selected features selected_features <- selected_vars(sparse_plsda_fit) selected_names <- selected_var_names(sparse_plsda_fit) cat("Selected", length(selected_features), "out of", p, "features\n") cat("Selected features:", paste(head(selected_names, 15), collapse = ", "), "\n") # Plot results plot(sparse_plsda_fit, type = "score", group = Y) plot(sparse_plsda_fit, type = "loading", top = 20) # Example 2: Manual keepX specification sparse_plsda_manual <- sparse_plsda( X = X, Y = Y, nc = 3, keepX = c(30, 25, 20), # Decreasing sparsity per component tune.keepX = FALSE, dist = "centroids.dist" ) # Compare classification performance cat("Auto-tuned error rate:", sparse_plsda_fit$classification$overall_error, "\n") cat("Manual keepX error rate:", sparse_plsda_manual$classification$overall_error, "\n") # Example 3: Cross-validation and prediction # Split data train_idx <- sample(1:n, 0.7 * n) test_idx <- setdiff(1:n, train_idx) X_train <- X[train_idx, ] Y_train <- Y[train_idx] X_test <- X[test_idx, ] Y_test <- Y[test_idx] # Train model model_train <- sparse_plsda(X_train, Y_train, nc = 2, keepX = c(25, 20)) # Predict test set predictions <- predict(model_train, X_test, dist = "max.dist") # Evaluate predictions confusion_matrix <- table(Predicted = predictions$class, Actual = Y_test) accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix) print(confusion_matrix) cat("Test set accuracy:", round(accuracy, 3), "\n")# Example 1: Basic sparse PLS-DA set.seed(123) n <- 120 p <- 200 # High-dimensional feature space # Generate classification data X <- matrix(rnorm(n * p), n, p) # Create three classes with different patterns n_per_class <- n / 3 classes <- rep(c("Class_A", "Class_B", "Class_C"), each = n_per_class) # Add class-specific signals to subsets of variables X[classes == "Class_A", 1:20] <- X[classes == "Class_A", 1:20] + 2 X[classes == "Class_B", 21:40] <- X[classes == "Class_B", 21:40] + 2 X[classes == "Class_C", 41:60] <- X[classes == "Class_C", 41:60] + 2 # Add some noise correlation X[, 61:80] <- X[, 1:20] + matrix(rnorm(n * 20, sd = 0.5), n, 20) colnames(X) <- paste0("Feature_", 1:p) Y <- factor(classes) # Fit sparse PLS-DA with automatic tuning sparse_plsda_fit <- sparse_plsda( X = X, Y = Y, nc = 2, validation = "Mfold", folds = 5, test.keepX = seq(10, 50, 10), # Test different sparsity levels tune.keepX = TRUE ) # View results print(sparse_plsda_fit) summary(sparse_plsda_fit) # Extract selected features selected_features <- selected_vars(sparse_plsda_fit) selected_names <- selected_var_names(sparse_plsda_fit) cat("Selected", length(selected_features), "out of", p, "features\n") cat("Selected features:", paste(head(selected_names, 15), collapse = ", "), "\n") # Plot results plot(sparse_plsda_fit, type = "score", group = Y) plot(sparse_plsda_fit, type = "loading", top = 20) # Example 2: Manual keepX specification sparse_plsda_manual <- sparse_plsda( X = X, Y = Y, nc = 3, keepX = c(30, 25, 20), # Decreasing sparsity per component tune.keepX = FALSE, dist = "centroids.dist" ) # Compare classification performance cat("Auto-tuned error rate:", sparse_plsda_fit$classification$overall_error, "\n") cat("Manual keepX error rate:", sparse_plsda_manual$classification$overall_error, "\n") # Example 3: Cross-validation and prediction # Split data train_idx <- sample(1:n, 0.7 * n) test_idx <- setdiff(1:n, train_idx) X_train <- X[train_idx, ] Y_train <- Y[train_idx] X_test <- X[test_idx, ] Y_test <- Y[test_idx] # Train model model_train <- sparse_plsda(X_train, Y_train, nc = 2, keepX = c(25, 20)) # Predict test set predictions <- predict(model_train, X_test, dist = "max.dist") # Evaluate predictions confusion_matrix <- table(Predicted = predictions$class, Actual = Y_test) accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix) print(confusion_matrix) cat("Test set accuracy:", round(accuracy, 3), "\n")
Class "SparseO2pls" This class represents sparse O2PLS analysis results
XNumeric matrix (input X)
YNumeric matrix (input Y)
paramsList of parameters including sparsity
resultsList of sparse O2PLS results
sparsityList of sparsity-related information
preprocessinglist of preprocessing information
callThe matched call used to generate the object
Kai Guo
Get sparsity information from sparse models
sparsity_info(x, ...) sparsity_info.sparse_o2pls(x, ...) sparsity_info.sparse_plsda(x, ...) ## S4 method for signature 'sparse_plsda' sparsity_info(x, ...) sparsity_info.SparseO2pls(x, ...) ## S4 method for signature 'SparseO2pls' sparsity_info(x, ...)sparsity_info(x, ...) sparsity_info.sparse_o2pls(x, ...) sparsity_info.sparse_plsda(x, ...) ## S4 method for signature 'sparse_plsda' sparsity_info(x, ...) sparsity_info.SparseO2pls(x, ...) ## S4 method for signature 'SparseO2pls' sparsity_info(x, ...)
x |
A sparse model object |
... |
Additional arguments |
List containing sparsity statistics
Performs bootstrap stability selection to identify robust features across multiple bootstrap samples and sparsity levels.
stability_selection( X, Y, nc, keepX_range = seq(10, 100, 10), n_bootstrap = 100, threshold = 0.8, method = "sparse_plsda", parallel = TRUE, ncores = NULL )stability_selection( X, Y, nc, keepX_range = seq(10, 100, 10), n_bootstrap = 100, threshold = 0.8, method = "sparse_plsda", parallel = TRUE, ncores = NULL )
X |
Numeric matrix (samples x variables). The predictor data matrix. |
Y |
Factor, character vector, or numeric matrix. Response data. |
nc |
Integer. Number of components to extract. |
keepX_range |
Vector of integers. Range of keepX values to test (default: seq(10, 100, 10)). |
n_bootstrap |
Integer. Number of bootstrap samples (default: 100). |
threshold |
Numeric. Stability threshold between 0 and 1 (default: 0.8). Variables selected in at least this proportion of bootstrap samples are considered stable. |
method |
Character. Method to use: "sparse_plsda" or "sparse_o2pls". |
parallel |
Logical. Use parallel processing (default: TRUE). |
ncores |
Integer. Number of cores for parallel processing (default: NULL, auto-detect). |
Stability selection is a method for controlling the expected number of false positive selections in high-dimensional variable selection. It works by applying the sparse method to many bootstrap samples and identifying variables that are consistently selected across different samples and sparsity levels.
The threshold parameter controls the stringency of selection. Higher values (closer to 1) result in more conservative selection with fewer false positives but potentially more false negatives.
stability_selection object containing selection probabilities and stable variables.
Kai Guo
sparse_plsda, sparse_o2pls, plot.stability_selection
# Example 1: Stability selection for sparse PLS-DA set.seed(456) n <- 100 p <- 150 # Generate data with stable and unstable signals X <- matrix(rnorm(n * p), n, p) # Create stable signal in first 15 variables stable_signal <- matrix(rnorm(n * 2), n, 2) X[, 1:15] <- stable_signal %*% matrix(rnorm(2 * 15), 2, 15) + matrix(rnorm(n * 15, sd = 0.3), n, 15) # Create unstable/noise signal in variables 16-30 X[, 16:30] <- X[, 16:30] + matrix(rnorm(n * 15, sd = 0.8), n, 15) # Create classes based on stable signal class_probs <- plogis(rowSums(X[, 1:5]) - mean(rowSums(X[, 1:5]))) Y <- factor(ifelse(runif(n) < class_probs, "Class1", "Class2")) colnames(X) <- paste0("Var_", 1:p) # Perform stability selection (reduced parameters for quick example) stability_result <- stability_selection( X = X, Y = Y, nc = 2, keepX_range = seq(10, 40, 10), # Test fewer sparsity levels n_bootstrap = 50, # Fewer bootstrap samples for speed threshold = 0.7, # 70% stability threshold method = "sparse_plsda", parallel = FALSE, # Disable parallel for example ncores = 1 ) # View results print(stability_result) # Extract stable variables for each component stable_comp1 <- stability_result$stable_variables$Component_1 stable_comp2 <- stability_result$stable_variables$Component_2 if(!is.null(stable_comp1)) { cat("Stable variables in Component 1:\n") print(stable_comp1) } if(!is.null(stable_comp2)) { cat("Stable variables in Component 2:\n") print(stable_comp2) } # Plot stability results plot(stability_result, type = "heatmap", component = 1) plot(stability_result, type = "barplot", component = 1, top_n = 30) # Example 2: Stability selection for sparse O2PLS # Generate correlated X and Y matrices Y_matrix <- matrix(rnorm(n * 50), n, 50) # Add correlation with X Y_matrix[, 1:10] <- X[, 1:10] + matrix(rnorm(n * 10, sd = 0.5), n, 10) colnames(Y_matrix) <- paste0("Y_", 1:50) # Stability selection for O2PLS ## Not run: stability_o2pls <- stability_selection( X = X, Y = Y_matrix, nc = 2, keepX_range = seq(15, 45, 15), n_bootstrap = 100, threshold = 0.8, method = "sparse_o2pls", parallel = TRUE, ncores = 2 ) # Extract stable variables for both X and Y stable_X <- stability_o2pls$stable_variables$Component_1$Variable_Name stable_Y <- stability_o2pls$stable_variables$Component_1$Variable_Name ## End(Not run) # Example 3: Different stability thresholds # Compare conservative vs liberal thresholds ## Not run: # Conservative selection (80% threshold) stability_conservative <- stability_selection( X, Y, nc = 2, threshold = 0.8, n_bootstrap = 100 ) # Liberal selection (60% threshold) stability_liberal <- stability_selection( X, Y, nc = 2, threshold = 0.6, n_bootstrap = 100 ) # Compare number of stable variables n_stable_cons <- length(stability_conservative$stable_variables$Component_1$Variable) n_stable_lib <- length(stability_liberal$stable_variables$Component_1$Variable) cat("Conservative (80%):", n_stable_cons, "stable variables\n") cat("Liberal (60%):", n_stable_lib, "stable variables\n") ## End(Not run)# Example 1: Stability selection for sparse PLS-DA set.seed(456) n <- 100 p <- 150 # Generate data with stable and unstable signals X <- matrix(rnorm(n * p), n, p) # Create stable signal in first 15 variables stable_signal <- matrix(rnorm(n * 2), n, 2) X[, 1:15] <- stable_signal %*% matrix(rnorm(2 * 15), 2, 15) + matrix(rnorm(n * 15, sd = 0.3), n, 15) # Create unstable/noise signal in variables 16-30 X[, 16:30] <- X[, 16:30] + matrix(rnorm(n * 15, sd = 0.8), n, 15) # Create classes based on stable signal class_probs <- plogis(rowSums(X[, 1:5]) - mean(rowSums(X[, 1:5]))) Y <- factor(ifelse(runif(n) < class_probs, "Class1", "Class2")) colnames(X) <- paste0("Var_", 1:p) # Perform stability selection (reduced parameters for quick example) stability_result <- stability_selection( X = X, Y = Y, nc = 2, keepX_range = seq(10, 40, 10), # Test fewer sparsity levels n_bootstrap = 50, # Fewer bootstrap samples for speed threshold = 0.7, # 70% stability threshold method = "sparse_plsda", parallel = FALSE, # Disable parallel for example ncores = 1 ) # View results print(stability_result) # Extract stable variables for each component stable_comp1 <- stability_result$stable_variables$Component_1 stable_comp2 <- stability_result$stable_variables$Component_2 if(!is.null(stable_comp1)) { cat("Stable variables in Component 1:\n") print(stable_comp1) } if(!is.null(stable_comp2)) { cat("Stable variables in Component 2:\n") print(stable_comp2) } # Plot stability results plot(stability_result, type = "heatmap", component = 1) plot(stability_result, type = "barplot", component = 1, top_n = 30) # Example 2: Stability selection for sparse O2PLS # Generate correlated X and Y matrices Y_matrix <- matrix(rnorm(n * 50), n, 50) # Add correlation with X Y_matrix[, 1:10] <- X[, 1:10] + matrix(rnorm(n * 10, sd = 0.5), n, 10) colnames(Y_matrix) <- paste0("Y_", 1:50) # Stability selection for O2PLS ## Not run: stability_o2pls <- stability_selection( X = X, Y = Y_matrix, nc = 2, keepX_range = seq(15, 45, 15), n_bootstrap = 100, threshold = 0.8, method = "sparse_o2pls", parallel = TRUE, ncores = 2 ) # Extract stable variables for both X and Y stable_X <- stability_o2pls$stable_variables$Component_1$Variable_Name stable_Y <- stability_o2pls$stable_variables$Component_1$Variable_Name ## End(Not run) # Example 3: Different stability thresholds # Compare conservative vs liberal thresholds ## Not run: # Conservative selection (80% threshold) stability_conservative <- stability_selection( X, Y, nc = 2, threshold = 0.8, n_bootstrap = 100 ) # Liberal selection (60% threshold) stability_liberal <- stability_selection( X, Y, nc = 2, threshold = 0.6, n_bootstrap = 100 ) # Compare number of stable variables n_stable_cons <- length(stability_conservative$stable_variables$Component_1$Variable) n_stable_lib <- length(stability_liberal$stable_variables$Component_1$Variable) cat("Conservative (80%):", n_stable_cons, "stable variables\n") cat("Liberal (60%):", n_stable_lib, "stable variables\n") ## End(Not run)
Summary of an O2PLS object
## S3 method for class 'O2pls' summary(object, ...)## S3 method for class 'O2pls' summary(object, ...)
object |
a O2pls object |
... |
For consistency |
Detail of O2PLS results
Kai Guo
X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) object <- o2pls(X,Y,1,1,1) summary(object)X <- matrix(rnorm(50),10,5) Y <- matrix(rnorm(50),10,5) object <- o2pls(X,Y,1,1,1) summary(object)
Summary of an plsda object
## S3 method for class 'plsda' summary(object, ...)## S3 method for class 'plsda' summary(object, ...)
object |
a plsda object |
... |
For consistency |
Detail of plsda results
Kai Guo
X <- matrix(rnorm(500),10,50) Y <- rep(c("a","b"),each=5) fit <- plsda(X,Y,2) summary(fit)X <- matrix(rnorm(500),10,50) Y <- rep(c("a","b"),each=5) fit <- plsda(X,Y,2) summary(fit)
Summary of sparse PLS-DA results
## S3 method for class 'sparse_plsda' summary(object, ...)## S3 method for class 'sparse_plsda' summary(object, ...)
object |
A sparse_plsda object to summarize |
... |
Additional arguments (currently unused) |
Summary information printed to console
Summary method for sparse O2PLS results
## S3 method for class 'SparseO2pls' summary(object, ...)## S3 method for class 'SparseO2pls' summary(object, ...)
object |
A SparseO2pls object to summarize |
... |
Additional arguments (currently unused) |
Summary information about the sparse O2PLS model
Performs cross-validation to tune keepX parameters for sparse PLS-DA
tune_sparse_keepX( X, Y, nc, test.keepX = seq(5, 50, 5), validation = "Mfold", folds = 5, dist = "max.dist", nrepeat = 10 )tune_sparse_keepX( X, Y, nc, test.keepX = seq(5, 50, 5), validation = "Mfold", folds = 5, dist = "max.dist", nrepeat = 10 )
X |
Numeric matrix (samples x variables) |
Y |
Factor or character vector of class labels |
nc |
Number of components |
test.keepX |
Vector of keepX values to test |
validation |
Validation method ("Mfold" or "loo") |
folds |
Number of cross-validation folds |
dist |
Distance metric for classification |
nrepeat |
Number of repetitions for robust estimation |
List with optimal keepX and CV results
Performs cross-validation to evaluate sparse O2PLS models
tune_sparse_o2pls( X, Y, nc_range = 1:3, nx_range = 0:2, ny_range = 0:2, keepX_range = seq(10, 50, 10), keepY_range = seq(10, 50, 10), validation = "Mfold", folds = 5, measure = "RMSE" )tune_sparse_o2pls( X, Y, nc_range = 1:3, nx_range = 0:2, ny_range = 0:2, keepX_range = seq(10, 50, 10), keepY_range = seq(10, 50, 10), validation = "Mfold", folds = 5, measure = "RMSE" )
X |
Numeric matrix (samples x variables) |
Y |
Numeric matrix (samples x variables) |
nc_range |
Range of joint components to test |
nx_range |
Range of X-orthogonal components to test |
ny_range |
Range of Y-orthogonal components to test |
keepX_range |
Range of keepX values to test |
keepY_range |
Range of keepY values to test |
validation |
Validation method |
folds |
Number of folds |
measure |
Performance measure |
TuneResult object with optimal parameters
Extract the VIP values from the O2PLS-DA object
vip(x)vip(x)
x |
the o2plsda object or plsda object |
a data frame