Title: | Metabolite Set Enrichment Analysis for Loadings |
---|---|
Description: | Computing metabolite set enrichment analysis (MSEA) (Yamamoto, H. et al. (2014) <doi:10.1186/1471-2105-15-51>) and single sample enrichment analysis (SSEA) (Yamamoto, H. (2023) <doi:10.51094/jxiv.262>). |
Authors: | Hiroyuki Yamamoto |
Maintainer: | Hiroyuki Yamamoto <[email protected]> |
License: | LGPL-3 |
Version: | 2.0.2 |
Built: | 2024-11-18 23:26:19 UTC |
Source: | https://github.com/hiroyukiyamamoto/mseapca |
This function converts a user-defined metabolite set from a csv file to a list.
csv2list(filepath)
filepath |
file path of metabolite set (csv file) |
The first row of the csv file is a header containing "metabolite set name" and "metabolite IDs". The first column must contain metabolite IDs and the second column must contain the metabolite set name.
list of metabolite set name and metabolite IDs
Hiroyuki Yamamoto
## Not run:
# ---------------------------
# Convert csv file to list
# ---------------------------
filepath <- "C:/pathway.csv" # filepath of csv file
N <- csv2list(filepath) # convert csv file to list
## End(Not run)
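The csv layout described for csv2list (header row, metabolite IDs in the first column, set names in the second) can be illustrated with base R alone. This is a minimal sketch under that stated layout, not the csv2list source code; the file contents, the IDs `m1` to `m5`, and the set names `SetA` and `SetB` are hypothetical.

```r
# Hypothetical sketch: NOT the csv2list() implementation.
# Write a small example file: column 1 = metabolite ID, column 2 = set name.
csv_path <- tempfile(fileext = ".csv")
example_rows <- data.frame(
  metabolite_id = c("m1", "m2", "m3", "m4", "m5"),
  set_name      = c("SetA", "SetA", "SetA", "SetB", "SetB")
)
write.csv(example_rows, csv_path, row.names = FALSE)

# Read the file back; the first row is treated as a header.
tab <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# Group the metabolite IDs (column 1) by metabolite set name (column 2).
set_list <- split(tab[[1]], tab[[2]])
str(set_list)
```

The result is a named list with one character vector of IDs per set, which is the general shape the other functions in this manual take as `M`.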
This function saves a compound set in list format as an XML file.
list2xml(filepath, M)
filepath |
filepath of XML file to save |
M |
list format of compound set and compound names |
This function is used to store a compound set. The saved XML file can be read with the read_pathway function.
filepath of saved XML file
Hiroyuki Yamamoto
## Not run:
data(pathway)
M <- pathway$fasting
xml_file <- "pathway_fasting.xml"
N <- list2xml(xml_file, M)
# XML::saveXML(N,filepath)
## End(Not run)
This function performs metabolite set enrichment analysis by over representation analysis (ORA). The cross tabulation for each metabolite set is tested with a one-sided Fisher's exact test.
msea_ora(SIG, ALL, M)
SIG |
Metabolite names of significant metabolites |
ALL |
Metabolite names of all detected metabolites |
M |
list of metabolite set name and metabolite name |
list of p-value and q-value for metabolite set and selected (significant) metabolite IDs for each metabolite set
Hiroyuki Yamamoto
Draghici S, Khatri P, Martins RP, Ostermeier GC, Krawetz SA. Global functional profiling of gene expression. Genomics. 2003 Feb;81(2):98-104.
## Example1 : Metabolome data
data(fasting)
data(pathway)

# pca and pca loading
pca <- prcomp(fasting$X, scale=TRUE)
pca <- pca_loading(pca)

# all detected metabolites
metabolites <- colnames(fasting$X)

# statistically significant negatively correlated metabolites in PC1 loading
SIG <- metabolites[pca$loading$R[,1] < 0 & pca$loading$p.value[,1] < 0.05]
ALL <- metabolites # all detected metabolites

# metabolite set list
M <- pathway$fasting

# MSEA by over representation analysis
B <- msea_ora(SIG, ALL, M)
B$`Result of MSEA(ORA)`

## Example2 : Proteome data
data(covid19)
data(pathway)

X <- covid19$X$proteomics
Y <- covid19$Y
D <- covid19$D
tau <- covid19$tau

protein_name <- colnames(X)

# pls-rog and pls-rog loading
plsrog <- pls_rog(X,Y,D)
plsrog <- plsrog_loading(plsrog)

# statistically significant proteins
index_prot <- which(plsrog$loading$R[,1]>0 & plsrog$loading$p.value[,1]<0.05)
sig_prot <- protein_name[index_prot]

# detected proteins
protein_name <- colnames(X)

# protein set list
M <- pathway$covid19$proteomics

# MSEA by over representation analysis
B <- msea_ora(sig_prot, protein_name, M)
B$`Result of MSEA(ORA)`
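The test behind msea_ora, a one-sided Fisher's exact test on a 2x2 cross tabulation of "significant or not" against "in the set or not", can be sketched with base R's `fisher.test`. This is an illustrative sketch, not the msea_ora source code; the helper name `ora_one_set`, the metabolite names, and the two sets are hypothetical, and q-values are left out because the adjustment method is not stated here.

```r
# Hypothetical helper: one-sided Fisher's exact test for a single set.
ora_one_set <- function(sig, detected, set) {
  in_set <- detected %in% set   # detected metabolites that belong to the set
  is_sig <- detected %in% sig   # detected metabolites that are significant
  # 2x2 cross tabulation; fixed levels keep the table shape even if a cell is empty.
  tab <- table(significant = factor(is_sig, levels = c(TRUE, FALSE)),
               in_set      = factor(in_set, levels = c(TRUE, FALSE)))
  # "greater" tests for over representation of significant metabolites in the set.
  fisher.test(tab, alternative = "greater")$p.value
}

# Toy input: 20 detected metabolites, 5 of them significant (hypothetical).
all_met <- paste0("m", 1:20)
sig_met <- paste0("m", 1:5)
sets <- list(SetA = paste0("m", 1:4),    # all members are significant
             SetB = paste0("m", 10:15))  # no member is significant

p <- vapply(sets, ora_one_set, numeric(1), sig = sig_met, detected = all_met)
print(p)
```

In this toy input `SetA` receives a small p-value and `SetB` a p-value of 1, since a one-sided test never flags a set that contains no significant metabolites.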
This function performs metabolite set enrichment analysis implemented in the same fashion as gene set enrichment analysis (Subramanian et al. 2005). In this function, the permutation procedure is applied to the metabolite sets rather than to the class labels. This corresponds to the "gene set" permutation type in the GSEA-P software (Subramanian et al. 2007). A leading-edge subset analysis is also performed following the standard GSEA procedure.
msea_sub(M, D, y, maxiter = 1000)
M |
list of metabolite set name and metabolite IDs |
D |
data.frame(metabolite ID, data matrix) |
y |
response variable (e.g. PC score) |
maxiter |
maximum number of iterations in random permutation (default=1000) |
list of normalized enrichment score, p-value and q-value for metabolite set, and the results of leading edge subset
Hiroyuki Yamamoto
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. & Mesirov, J. P. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545-15550.
Subramanian, A., Kuehn, H., Gould, J., Tamayo, P., Mesirov, J.P. (2007) GSEA-P: A desktop application for Gene Set Enrichment Analysis. Bioinformatics, doi: 10.1093/bioinformatics/btm369.
data(fasting)
data(pathway)

# pca and pca loading
pca <- prcomp(fasting$X, scale=TRUE)
pca <- pca_loading(pca)

# all detected metabolites
metabolites <- colnames(fasting$X)

# statistically significant negatively correlated metabolites in PC1 loading
SIG <- metabolites[pca$loading$R[,1] < 0 & pca$loading$p.value[,1] < 0.05]
ALL <- metabolites # all detected metabolites

# Set response variable
y <- pca$x[,1]

# preparing dataframe
D <- data.frame(ALL,t(fasting$X))

# MSEA by Subramanian et al.
M <- pathway$fasting
P <- msea_sub(M,D,y, maxiter = 10) # iteration was set to 10 for demonstration
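The idea behind msea_sub, a GSEA-style running-sum enrichment score whose null distribution comes from permuting set membership rather than class labels, can be sketched as follows. This is a simplified illustration and not the msea_sub source code: the function `enrichment_score`, the toy data, the correlation-based ranking, and the two-sided empirical p-value are all assumptions, and normalization (NES), q-values, and the leading-edge subset are omitted.

```r
set.seed(1)

# Running-sum enrichment score: step up (weighted by |statistic|) at set
# members, step down at non-members, and return the largest deviation from 0.
enrichment_score <- function(stat_sorted, in_set, weight = 1) {
  hit  <- abs(stat_sorted)^weight * in_set
  miss <- as.numeric(!in_set)
  running <- cumsum(hit / sum(hit) - miss / sum(miss))
  running[which.max(abs(running))]
}

# Toy data (hypothetical): 10 samples x 30 metabolites; m1..m6 track y.
n_samples <- 10
n_met <- 30
y <- rnorm(n_samples)
X <- matrix(rnorm(n_samples * n_met), n_samples, n_met,
            dimnames = list(NULL, paste0("m", seq_len(n_met))))
X[, 1:6] <- X[, 1:6] + 2 * y

# Rank metabolites by their correlation with the response variable.
r <- cor(X, y)[, 1]
r_sorted <- sort(r, decreasing = TRUE)
in_set <- names(r_sorted) %in% paste0("m", 1:6)

es_obs <- enrichment_score(r_sorted, in_set)

# Null distribution: shuffle set membership while keeping the ranking fixed.
maxiter <- 200
es_null <- replicate(maxiter, enrichment_score(r_sorted, sample(in_set)))

# Simplified two-sided empirical p-value with a +1 correction (an assumption).
p_value <- (sum(abs(es_null) >= abs(es_obs)) + 1) / (maxiter + 1)
print(c(ES = es_obs, p = p_value))
```

Permuting set membership keeps the correlation structure of the data intact and only asks whether a set of that size sits unusually high or low in the ranking, which is why a modest `maxiter` is already informative for a demonstration.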
This function generates a metabolite set list from the PathBank database by referencing the AHPathbankDbs Bioconductor package.
pathbank2list(tbl_pathbank, subject, id)
tbl_pathbank |
tibble from AHPathbankDbs |
subject |
Pathway subject (Metabolic, Disease, etc.) in tibble |
id |
database ID (HMDB ID, Uniprot ID, etc.) used for analysis |
AHPathbankDbs needs to be installed separately.
list of metabolite or protein set
Hiroyuki Yamamoto
## Not run:
## PathBank
library(AnnotationHub)

ah <- AnnotationHub()
qr <- query(ah, c("pathbank", "Homo sapiens"))

#tbl_pathbank <- qr[[1]] # metabolomics
tbl_pathbank <- qr[[2]] # proteomics

ids <- names(tbl_pathbank)[-c(1:4)]
id <- ids[1] # Uniprot ID

subs <- unique(tbl_pathbank$`Pathway Subject`)
subject <- subs[6] # Protein

M <- pathbank2list(tbl_pathbank, subject, id)
## End(Not run)
This dataset includes a metabolite set list and a metabolite name list for the fasting dataset, and a protein set list for the covid19 dataset in the "loadings" package.
data(pathway)
The list object pathway contains the following elements:
fasting : metabolite set list for fasting mouse dataset
data$fasting : metabolite name list for fasting mouse dataset
covid19$proteomics : protein set list for covid19 dataset.
Yamamoto H., Fujimori T., Sato H., Ishikawa G., Kami K., Ohashi Y. (2014). "Statistical hypothesis testing of factor loading in principal component analysis and its application to metabolite set enrichment analysis". BMC Bioinformatics, (2014) 15(1):51.
B. Shen, et al, Proteomic and Metabolomic Characterization of COVID-19 Patient Sera, Cell. 182 (2020) 59-72.e15.
data(pathway)
This function generates a metabolite set list from a metabolite set file (XML). It is mainly called by other functions.
read_pathway(fullpath)
fullpath |
file path of metabolite set (XML) |
list of metabolite set name and metabolite IDs.
Hiroyuki Yamamoto
## Not run:
filename <- "C:/R/pathway.xml" # load metabolite set file
M <- read_pathway(filename) # Convert XML to metabolite set (list)
## End(Not run)
This function generates a binary label matrix of metabolite names and metabolite sets. It is mainly called by other functions, where it is used to count the number of metabolites in a specific metabolite set.
setlabel(M_ID, M)
M_ID |
detected metabolites |
M |
list of metabolite set and metabolite names |
If a single peak has multiple metabolite IDs in M_ID, separate them with "," or ";".
binary label matrix of metabolite names in metabolite sets
Hiroyuki Yamamoto
data(fasting)
data(pathway)

M_ID <- colnames(fasting$X) # detected metabolites
M <- pathway$fasting # metabolite set list

L <- setlabel(M_ID,M) # binary label matrix
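How a binary label matrix can be built when one peak carries several IDs separated by "," or ";" is sketched below in base R. This is not the setlabel source code; the function name `binary_label`, the toy peaks and sets, and the orientation (rows are peaks, columns are sets) are assumptions made for illustration.

```r
# Hypothetical sketch: NOT the setlabel() implementation.
binary_label <- function(peak_ids, sets) {
  # Split each peak annotation on "," or ";" and trim surrounding whitespace.
  id_list <- lapply(strsplit(peak_ids, "[,;]"), trimws)
  # A peak is labeled 1 for a set if any of its IDs is a member of that set.
  labels <- vapply(sets, function(set) {
    vapply(id_list, function(ids) as.integer(any(ids %in% set)), integer(1))
  }, integer(length(peak_ids)))
  matrix(labels, nrow = length(peak_ids),
         dimnames = list(peak_ids, names(sets)))
}

peaks <- c("m1", "m2;m3", "m4,m5", "m6")   # two peaks carry multiple IDs
sets  <- list(SetA = c("m1", "m3"), SetB = c("m5", "m6"))

L <- binary_label(peaks, sets)
print(L)
colSums(L)   # number of detected peaks assigned to each set
```

Summing the columns gives the per-set counts that an over representation analysis needs for its cross tabulation.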
This function performs single sample enrichment analysis (SSEA) by over representation analysis (ORA). SSEA applies MSEA by ORA to the detected versus undetected metabolites in each sample.
ssea_ora(det_list, det_all, M)
det_list |
metabolite names of detected metabolites |
det_all |
metabolite names of all metabolites |
M |
list of metabolite set and metabolite names |
The threshold for determining whether a metabolite is detected or not is typically set by the signal-to-noise (S/N) ratio. If the S/N ratio is unavailable, one might consider using the signal intensity or peak area for each metabolite as an alternative. In such cases, all values below the threshold can be set to 0.
A matrix where each row represents a sample and each column represents a set of metabolites.
Hiroyuki Yamamoto
Yamamoto H., Single sample enrichment analysis for mass spectrometry-based omics data, Jxiv. (2023)
## Not run:
data(fasting)
data(pathway)

det_list <- pathway$data$fasting
M <- pathway$fasting
det_all <- unique(c(colnames(fasting$X), as.character(unlist(M))))

# SSEA
Z <- ssea_ora(det_list, det_all, M)

## PCA for SSEA score
pca <- prcomp(Z, scale=TRUE)
pca <- pca_loading(pca)
## End(Not run)
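The thresholding step described for ssea_ora (values below a cut-off are set to 0, and the remaining metabolites count as detected) followed by a per-sample ORA can be sketched in base R. This is an illustration only and not the ssea_ora source code: the intensity values, the cut-off of 10, the helper `ora_p`, the per-sample list layout of `det_list`, and the use of raw one-sided Fisher p-values as the matrix entries are all assumptions.

```r
# Toy intensity matrix: 3 samples x 6 metabolites (hypothetical values).
X <- matrix(c(120,  0, 35, 400,  8, 60,
               15, 90,  0, 310, 75,  5,
              200, 45, 50,   0,  2, 80),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("s", 1:3), paste0("m", 1:6)))

threshold <- 10          # hypothetical cut-off on signal intensity
X[X < threshold] <- 0    # values below the threshold are set to 0

# Names of the detected (non-zero) metabolites in each sample.
det_list <- lapply(seq_len(nrow(X)), function(i) colnames(X)[X[i, ] > 0])
names(det_list) <- rownames(X)
det_all <- colnames(X)

sets <- list(SetA = c("m1", "m2", "m3"), SetB = c("m4", "m5", "m6"))

# One-sided Fisher's exact test: detected vs. not detected, in set vs. not in set.
ora_p <- function(detected, all_ids, set) {
  tab <- table(factor(all_ids %in% detected, levels = c(TRUE, FALSE)),
               factor(all_ids %in% set, levels = c(TRUE, FALSE)))
  fisher.test(tab, alternative = "greater")$p.value
}

# Rows are samples, columns are metabolite sets.
Z <- t(vapply(det_list, function(d) {
  vapply(sets, function(s) ora_p(d, det_all, s), numeric(1))
}, numeric(length(sets))))
print(Z)
```

The resulting matrix has one row per sample and one column per set, the same shape as the value documented for ssea_ora, so it can be passed to downstream steps such as PCA in the same way.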