Title: | Do 16s Data Analysis and Generate Figures |
---|---|
Description: | Provides functions to enhance the available statistical analysis procedures in R by providing simple functions to analysis and visualize the 16S rRNA data.Here we present a tutorial with minimum working examples to demonstrate usage and dependencies. |
Authors: | Kai Guo [aut, cre], Pan Gao [aut] |
Maintainer: | Kai Guo <[email protected]> |
License: | GPL-3 |
Version: | 0.0.22 |
Built: | 2025-02-06 05:44:36 UTC |
Source: | https://github.com/guokai8/microbial |
check file format
.checkfile(file)
.checkfile(file)
file |
filename |
replace p value with star
.getstar(x)
.getstar(x)
x |
a (non-empty) numeric data values |
LEfse function
.lda.fun(df)
.lda.fun(df)
df |
a dataframe with groups and bacteria abundance |
calcaute beta diversity
betadiv(physeq, distance = "bray", method = "PCoA")
betadiv(physeq, distance = "bray", method = "PCoA")
physeq |
A |
distance |
A string character specifying dissimilarity index to be used in calculating pairwise distances (Default index is "bray".). "unifrac","wunifrac","manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup" , "binomial", "chao", "cao" or "mahalanobis". |
method |
A character string specifying ordination method. All methods available to the |
list with beta diversity data.frame and PCs
Kai Guo
{ data("Physeq") phy<-normalize(physeq) res <- betadiv(phy) }
{ data("Physeq") phy<-normalize(physeq) res <- betadiv(phy) }
PERMANOVA test for phyloseq
betatest(physeq, group, distance = "bray")
betatest(physeq, group, distance = "bray")
physeq |
A |
group |
(Required). Character string specifying name of a categorical variable that is preferred for grouping the information. information. |
distance |
A string character specifying dissimilarity index to be used in calculating pairwise distances (Default index is "bray".). "unifrac","wunifrac","manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup" , "binomial", "chao", "cao" or "mahalanobis". |
PERMANOVA test result
Kai Guo
{ data("Physeq") phy<-normalize(physeq) beta <-betatest(phy,group="SampleType") }
{ data("Physeq") phy<-normalize(physeq) beta <-betatest(phy,group="SampleType") }
Identify biomarker by using randomForest method
biomarker( physeq, group, ntree = 500, pvalue = 0.05, normalize = TRUE, method = "relative" )
biomarker( physeq, group, ntree = 500, pvalue = 0.05, normalize = TRUE, method = "relative" )
physeq |
A |
group |
group. A character string specifying the name of a categorical variable containing grouping information. |
ntree |
Number of trees to grow. This should not be set to too small a number, to ensure that every input row gets predicted at least a few times. |
pvalue |
pvalue threshold for significant results from kruskal.test |
normalize |
to normalize the data before analysis(TRUE/FALSE) |
method |
A list of character strings specifying |
data frame with significant biomarker
Kai Guo
data("Physeq") res <- biomarker(physeq,group="group")
data("Physeq") res <- biomarker(physeq,group="group")
contruction of plylogenetic tree (extreme slow)
buildTree(seqs)
buildTree(seqs)
seqs |
DNA sequences |
tree object
Kai Guo
Published in PNAS in early 2011. This work compared the microbial communities from 25 environmental samples and three known “mock communities” – a total of 9 sample types – at a depth averaging 3.1 million reads per sample. Authors were able to reproduce diversity patterns seen in many other published studies, while also invesitigating technical issues/bias by applying the same techniques to simulated microbial communities of known
Caporaso, J. G., et al. (2011). Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. PNAS, 108, 4516-4522. PMCID: PMC3063599
data(Physeq)
data(Physeq)
Published in PNAS in early 2011. This work compared the microbial communities from 25 environmental samples and three known “mock communities” – a total of 9 sample types – at a depth averaging 3.1 million reads per sample. Authors were able to reproduce diversity patterns seen in many other published studies, while also invesitigating technical issues/bias by applying the same techniques to simulated microbial communities of known
Caporaso, J. G., et al. (2011). Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. PNAS, 108, 4516-4522. PMCID: PMC3063599
data(Physeq) plot_richness(physeq, x="SampleType", measures=c("Observed", "Shannon"))
data(Physeq) plot_richness(physeq, x="SampleType", measures=c("Observed", "Shannon"))
Calculate differential bacteria with DESeq2
difftest( physeq, group, ref = NULL, pvalue = 0.05, padj = NULL, log2FC = 0, gm_mean = TRUE, fitType = "local", quiet = FALSE )
difftest( physeq, group, ref = NULL, pvalue = 0.05, padj = NULL, log2FC = 0, gm_mean = TRUE, fitType = "local", quiet = FALSE )
physeq |
A |
group |
group (DESeq2). A character string specifying the name of a categorical variable containing grouping information. |
ref |
reference group |
pvalue |
pvalue threshold for significant results |
padj |
adjust p value threshold for significant results |
log2FC |
log2 Fold Change threshold |
gm_mean |
TRUE/FALSE calculate geometric means prior to estimate size factors |
fitType |
either "parametric", "local", or "mean" for the type of fitting of dispersions to the mean intensity. |
quiet |
whether to print messages at each step |
datafame with differential test with DESeq2
Kai Guo
data("Physeq") res <- difftest(physeq,group="group")
data("Physeq") res <- difftest(physeq,group="group")
distinguish colors for making figures
distcolor
distcolor
An object of class character
of length 41.
Kai Guo
do anova test and return results as data.frame
do_aov(x, group, ...)
do_aov(x, group, ...)
x |
data.frame with sample id as the column name, genes or otu as rownames |
group |
group factor used for comparison |
... |
parameters to anova_test |
Kai Guo
{ data("ToothGrowth") do_aov(ToothGrowth,group="supp") }
{ data("ToothGrowth") do_aov(ToothGrowth,group="supp") }
do t.test
do_ttest(x, group, ref = NULL, ...)
do_ttest(x, group, ref = NULL, ...)
x |
data.frame with sample id as the column name, genes or otu as rownames |
group |
group factor used for comparison |
ref |
reference group |
... |
parameters to t_test |
Kai Guo
{ data("mtcars") do_ttest(mtcars,group="vs") do_ttest(mtcars,group="cyl",ref="4") }
{ data("mtcars") do_ttest(mtcars,group="vs") do_ttest(mtcars,group="cyl",ref="4") }
do wilcox test
do_wilcox(x, group, ref = NULL, ...)
do_wilcox(x, group, ref = NULL, ...)
x |
data.frame with sample id as the column name, genes or otu as rownames |
group |
group factor used for comparison |
ref |
reference group |
... |
parameters to wilcox_test |
Kai Guo
{ data("mtcars") do_wilcox(mtcars,group="vs") do_wilcox(mtcars,group="cyl",ref="4") }
{ data("mtcars") do_wilcox(mtcars,group="vs") do_wilcox(mtcars,group="cyl",ref="4") }
Do the generalized linear model regression
glmr( physeq, group, factors = NULL, ref = NULL, family = binomial(link = "logit") )
glmr( physeq, group, factors = NULL, ref = NULL, family = binomial(link = "logit") )
physeq |
phyloseq object |
group |
the group factor to regression |
factors |
a vector to indicate adjuested factors |
ref |
the reference group |
family |
binomial() or gaussian() |
Kai Guo
data("Physeq") phy<-normalize(physeq) fit <-glmr(phy,group="SampleType")
data("Physeq") phy<-normalize(physeq) fit <-glmr(phy,group="SampleType")
Identify biomarker by using LEfSe method
ldamarker(physeq, group, pvalue = 0.05, normalize = TRUE, method = "relative")
ldamarker(physeq, group, pvalue = 0.05, normalize = TRUE, method = "relative")
physeq |
A |
group |
group. A character string specifying the name of a categorical variable containing grouping information. |
pvalue |
pvalue threshold for significant results from kruskal.test |
normalize |
to normalize the data before analysis(TRUE/FALSE) |
method |
A list of character strings specifying |
Kai Guo
data("Physeq") res <- ldamarker(physeq,group="group")
data("Physeq") res <- ldamarker(physeq,group="group")
light colors for making figures
lightcolor
lightcolor
An object of class character
of length 56.
Kai Guo
Normalize the phyloseq object with different methods
normalize(physeq, group, method = "relative", table = FALSE)
normalize(physeq, group, method = "relative", table = FALSE)
physeq |
A |
group |
group (DESeq2). A character string specifying the name of a categorical variable containing grouping information. |
method |
A list of character strings specifying |
table |
return a data.frame or not |
phyloseq object with normalized data
Kai Guo
{ data("Physeq") phy<-normalize(physeq) }
{ data("Physeq") phy<-normalize(physeq) }
extract otu table
otu_table(physeq, ...)
otu_table(physeq, ...)
physeq |
(Required). An integer matrix, otu_table-class, or phyloseq-class. |
... |
parameters for the otu_table function in phyloseq package |
Retrieve phylogenetic tree (phylo-class) from object.
phy_tree(physeq, ...)
phy_tree(physeq, ...)
physeq |
(Required). An instance of phyloseq-class that contains a phylogenetic tree. If physeq is a phylogenetic tree (a component data class), then it is returned as-is. |
... |
parameters for the phy_tree function in phyloseq package |
plot alpha diversity
plotalpha( physeq, group, method = c("Observed", "Simpson", "Shannon"), color = NULL, geom = "boxplot", pvalue = 0.05, padj = NULL, sig.only = TRUE, wilcox = FALSE, show.number = FALSE )
plotalpha( physeq, group, method = c("Observed", "Simpson", "Shannon"), color = NULL, geom = "boxplot", pvalue = 0.05, padj = NULL, sig.only = TRUE, wilcox = FALSE, show.number = FALSE )
physeq |
A |
group |
group (Required). A character string specifying the name of a categorical variable containing grouping information. |
method |
A list of character strings specifying |
color |
A vector of character use specifying the color |
geom |
different geom to display("boxplot","violin","dotplot") |
pvalue |
pvalue threshold for significant dispersion results |
padj |
adjust p value threshold for significant dispersion results |
sig.only |
display the significant comparsion only(TRUE/ FALSE) |
wilcox |
use wilcoxon test or not |
show.number |
to show the pvalue instead of significant symbol(TRUE/FALSE) |
Returns a ggplot object. This can further be manipulated as preferred by user.
Kai Guo
{ data("Physeq") plotalpha(physeq,group="SampleType") }
{ data("Physeq") plotalpha(physeq,group="SampleType") }
plot bar for relative abundance for bacteria
plotbar( physeq, level = "Phylum", color = NULL, group = NULL, top = 5, return = FALSE, fontsize.x = 5, fontsize.y = 12 )
plotbar( physeq, level = "Phylum", color = NULL, group = NULL, top = 5, return = FALSE, fontsize.x = 5, fontsize.y = 12 )
physeq |
A |
level |
the level to plot |
color |
A vector of character use specifying the color |
group |
group (Optional). A character string specifying the name of a categorical variable containing grouping information. |
top |
the number of most abundance bacteria to display |
return |
return the data with the relative abundance |
fontsize.x |
the size of x axis label |
fontsize.y |
the size of y axis label |
Returns a ggplot object. This can further be manipulated as preferred by user.
Kai Guo
data("Physeq") phy<-normalize(physeq) plotbar(phy,level="Phylum")
data("Physeq") phy<-normalize(physeq) plotbar(phy,level="Phylum")
plot beta diversity
plotbeta( physeq, group, shape = NULL, distance = "bray", method = "PCoA", color = NULL, size = 3, ellipse = FALSE )
plotbeta( physeq, group, shape = NULL, distance = "bray", method = "PCoA", color = NULL, size = 3, ellipse = FALSE )
physeq |
A |
group |
(Required). Character string specifying name of a categorical variable that is preferred for grouping the information. information. |
shape |
shape(Optional) Character string specifying shape of a categorical variable |
distance |
A string character specifying dissimilarity index to be used in calculating pairwise distances (Default index is "bray".). "unifrac","wunifrac","manhattan", "euclidean", "canberra", "bray", "kulczynski", "jaccard", "gower", "altGower", "morisita", "horn", "mountford", "raup" , "binomial", "chao", "cao" or "mahalanobis". |
method |
A character string specifying ordination method. All methods available to the |
color |
user defined color for group |
size |
the point size |
ellipse |
draw ellipse or not |
ggplot2 object
Kai Guo
{ data("Physeq") phy<-normalize(physeq) plotbeta(phy,group="SampleType") }
{ data("Physeq") phy<-normalize(physeq) plotbeta(phy,group="SampleType") }
plot differential results
plotdiff( res, level = "Genus", color = NULL, pvalue = 0.05, padj = NULL, log2FC = 0, size = 3, fontsize.x = 5, fontsize.y = 10, horiz = TRUE )
plotdiff( res, level = "Genus", color = NULL, pvalue = 0.05, padj = NULL, log2FC = 0, size = 3, fontsize.x = 5, fontsize.y = 10, horiz = TRUE )
res |
differential test results from diff_test |
level |
the level to plot |
color |
A vector of character use specifying the color |
pvalue |
pvalue threshold for significant results |
padj |
adjust p value threshold for significant results |
log2FC |
log2 Fold Change threshold |
size |
size for the point |
fontsize.x |
the size of x axis label |
fontsize.y |
the size of y axis label |
horiz |
horizontal or not (TRUE/FALSE) |
ggplot object
Kai Guo
data("Physeq") res <- difftest(physeq,group="group") plotdiff(res,level="Genus",padj=0.001)
data("Physeq") res <- difftest(physeq,group="group") plotdiff(res,level="Genus",padj=0.001)
plot LEfSe results from ldamarker function
plotLDA( x, group, lda = 2, pvalue = 0.05, padj = NULL, color = NULL, fontsize.x = 4, fontsize.y = 5 )
plotLDA( x, group, lda = 2, pvalue = 0.05, padj = NULL, color = NULL, fontsize.x = 4, fontsize.y = 5 )
x |
LEfse results from ldamarker |
group |
a vector include two character to show the group comparsion |
lda |
LDA threshold for significant biomarker |
pvalue |
pvalue threshold for significant results |
padj |
adjust p value threshold for significant results |
color |
A vector of character use specifying the color |
fontsize.x |
the size of x axis label |
fontsize.y |
the size of y axis label |
ggplot2 object
Kai Guo
data("Physeq") res <- ldamarker(physeq,group="group") plotLDA(res,group=c("A","B"),lda=5,pvalue=0.05)
data("Physeq") res <- ldamarker(physeq,group="group") plotLDA(res,group=c("A","B"),lda=5,pvalue=0.05)
plot the biomarker from the biomarker function with randomForest
plotmarker( x, level = "Genus", top = 30, rotate = FALSE, dot.size = 8, label.color = "black", label.size = 6 )
plotmarker( x, level = "Genus", top = 30, rotate = FALSE, dot.size = 8, label.color = "black", label.size = 6 )
x |
biomarker results from randomForest |
level |
the bacteria level to display |
top |
the number of important biomarker to draw |
rotate |
TRUE/FALSE |
dot.size |
size for the dot |
label.color |
label color |
label.size |
label size |
ggplot2 object
Kai Guo
data("Physeq") res <- biomarker(physeq,group="group") plotmarker(res,level="Genus")
data("Physeq") res <- biomarker(physeq,group="group") plotmarker(res,level="Genus")
plot the quality for the fastq file
plotquality(file, n = 5e+05, aggregate = FALSE)
plotquality(file, n = 5e+05, aggregate = FALSE)
file |
(Required). character. File path(s) to fastq or fastq.gz file(s). |
n |
(Optional). Default 500,000. The number of records to sample from the fastq file. |
aggregate |
(Optional). Default FALSE. If TRUE, compute an aggregate quality profile for all fastq files provided. |
figure
plotquality(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
plotquality(system.file("extdata", "sam1F.fastq.gz", package="dada2"))
filter the phyloseq
prefilter(physeq, min = 10, perc = 0.05)
prefilter(physeq, min = 10, perc = 0.05)
physeq |
A |
min |
Numeric, the threshold for mininal Phylum shown in samples |
perc |
Numeric, input the percentage of samples for which to filter low counts. |
filter phyloseq object
Kai Guo
data("Physeq") physeqs<-prefilter(physeq)
data("Physeq") physeqs<-prefilter(physeq)
Download the reference database
preRef(ref_db, path = ".")
preRef(ref_db, path = ".")
ref_db |
the reference database |
path |
path for the database |
the path of the database
Kai Guo
preRef(ref_db="silva",path=tempdir())
preRef(ref_db="silva",path=tempdir())
Perform dada2 analysis
processSeq( path = ".", truncLen = c(0, 0), trimLeft = 0, trimRight = 0, minLen = 20, maxLen = Inf, sample_info = NULL, train_data = "silva_nr99_v138_train_set.fa.gz", train_species = "silva_species_assignment_v138.fa.gz", outpath = NULL, saveobj = FALSE, buildtree = FALSE, verbose = TRUE )
processSeq( path = ".", truncLen = c(0, 0), trimLeft = 0, trimRight = 0, minLen = 20, maxLen = Inf, sample_info = NULL, train_data = "silva_nr99_v138_train_set.fa.gz", train_species = "silva_species_assignment_v138.fa.gz", outpath = NULL, saveobj = FALSE, buildtree = FALSE, verbose = TRUE )
path |
working dir for the input reads |
truncLen |
(Optional). Default 0 (no truncation). Truncate reads after truncLen bases. Reads shorter than this are discarded. |
trimLeft |
(Optional). The number of nucleotides to remove from the start of each read. |
trimRight |
(Optional). Default 0. The number of nucleotides to remove from the end of each read. If both truncLen and trimRight are provided, truncation will be performed after trimRight is enforced. |
minLen |
(Optional). Default 20. Remove reads with length less than minLen. minLen is enforced after trimming and truncation. |
maxLen |
Optional). Default Inf (no maximum). Remove reads with length greater than maxLen. maxLen is enforced before trimming and truncation. |
sample_info |
(Optional).sample information for the sequence |
train_data |
(Required).training database |
train_species |
(Required). species database |
outpath |
(Optional).the path for the filtered reads and th out table |
saveobj |
(Optional).Default FALSE. save the phyloseq object output. |
buildtree |
build phylogenetic tree or not(default: FALSE) |
verbose |
(Optional). Default TRUE. Print verbose text output. |
list include count table, summary table, taxonomy information and phyloseq object
Kai Guo
Melt phyloseq data object into large data.frame
psmelt(physeq, ...)
psmelt(physeq, ...)
physeq |
A sample_data-class, or a phyloseq-class object with a sample_data. If the sample_data slot is missing in physeq, then physeq will be returned as-is, and a warning will be printed to screen. |
... |
parameters for the subset_samples function in phyloseq package |
calculat the richness for the phyloseq object
richness(physeq, method = c("Observed", "Simpson", "Shannon"))
richness(physeq, method = c("Observed", "Simpson", "Shannon"))
physeq |
A |
method |
A list of character strings specifying |
data.frame of alpha diversity
Kai Guo
{ data("Physeq") rich <-richness(physeq,method=c("Simpson", "Shannon")) }
{ data("Physeq") rich <-richness(physeq,method=c("Simpson", "Shannon")) }
extract sample information
sample_data(physeq, ...)
sample_data(physeq, ...)
physeq |
(Required). A data.frame-class, or a phyloseq-class object. |
... |
parameters for the sample_data function in phyloseq package |
Subset the phyloseq based on sample
subset_samples(physeq, ...)
subset_samples(physeq, ...)
physeq |
A sample_data-class, or a phyloseq-class object with a sample_data. If the sample_data slot is missing in physeq, then physeq will be returned as-is, and a warning will be printed to screen. |
... |
parameters for the subset_samples function in phyloseq package |
Subset species by taxonomic expression
subset_taxa(physeq, ...)
subset_taxa(physeq, ...)
physeq |
A sample_data-class, or a phyloseq-class object with a sample_data. If the sample_data slot is missing in physeq, then physeq will be returned as-is, and a warning will be printed to screen. |
... |
parameters for the subset_taxa function in phyloseq package |
extract taxonomy table
tax_table(physeq, ...)
tax_table(physeq, ...)
physeq |
An object among the set of classes defined by the phyloseq package that contain taxonomyTable. |
... |
parameters for the tax_table function in phyloseq package |