This script identifies and visualizes the core microbiome of two spider species:
- PRD – Pardosa lugubris
- PTSD – Parasteatoda tepidariorum
The analysis compares bacterial taxa detected in three sample compartments:
- ENV – environmental samples
- SILK – silk-associated samples
- EGGS – egg-associated samples
Core taxa are identified separately for each species-compartment combination and then compared using Venn diagrams. The script also exports supplementary tables listing shared and unique taxa for each comparison.
The goal of this script is to detect bacterial families or genera that are:
- unique to a specific compartment,
- shared between selected compartments,
- shared across the full ENV → SILK → EGGS gradient,
- comparable between the two spider species.
This provides a simple and publication-friendly way to explore overlap patterns within the core microbiome.
The script requires three tab-separated input files:
Metadata file path:
01_Metadata/metadata_plik.tsv
Required columns:
- sample-id
- group: species label (PRD/PTSD)
- type: compartment label (ENV/SILK/EGGS)
Feature table path:
06_Exports/deseq2_input/feature-table.tsv
This should be a QIIME2-exported feature table containing ASV counts per sample, with mitochondrial and chloroplast sequences already filtered out.
Taxonomy table path:
06_Exports/deseq2_input/taxonomy_export/taxonomy.tsv
This file should contain taxonomy assignments for each feature (ASV), exported from QIIME2.
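As a rough illustration of how these inputs fit together, the sketch below reads a metadata table and a feature table and keeps only the samples present in both. It uses base R and small in-memory stand-ins instead of the real files. The sample names, ASV names, and the exact header layout (a comment line followed by a "#OTU ID" header, as a biom-to-TSV conversion typically produces) are assumptions and may differ from the actual exports.

```r
# Hypothetical in-memory stand-ins for the TSV inputs (layouts are assumed).
metadata_txt <- paste(
  "sample-id\tgroup\ttype",
  "S1\tPRD\tENV",
  "S2\tPRD\tSILK",
  "S3\tPTSD\tEGGS",
  sep = "\n"
)
feature_txt <- paste(
  "# Constructed from biom file",
  "#OTU ID\tS1\tS2\tS4",
  "asv1\t12\t0\t5",
  "asv2\t3\t40\t0",
  sep = "\n"
)

# check.names = FALSE keeps column names such as "sample-id" and "#OTU ID" intact.
metadata <- read.delim(text = metadata_txt, check.names = FALSE,
                       stringsAsFactors = FALSE)
# skip = 1 drops the leading comment line; the "#OTU ID" row becomes the header.
features <- read.delim(text = feature_txt, skip = 1, check.names = FALSE,
                       stringsAsFactors = FALSE)

# Keep only samples that occur in both the metadata and the feature table.
valid_samples <- intersect(metadata[["sample-id"]], colnames(features)[-1])

counts <- as.matrix(features[, valid_samples, drop = FALSE])
rownames(counts) <- features[["#OTU ID"]]
metadata <- metadata[metadata[["sample-id"]] %in% valid_samples, ]

print(valid_samples)
print(counts)
```

With real data, the `text =` arguments would be replaced by the file paths listed above.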
The workflow consists of the following main steps:
- Load input data
  - metadata
  - feature table
  - taxonomy table
- Extract taxonomy
  - family names are extracted from taxonomy strings
  - genus names are extracted from taxonomy strings
- Match valid samples
  - only samples present in both the metadata and the feature table are retained
- Aggregate ASV counts
  - counts are summed to the selected taxonomic level: family or genus
- Define core microbiome
  - a taxon is considered part of the core microbiome if it:
    - has more than 10 reads in a sample,
    - is present in at least 66% of samples within a given group
- Filter taxonomy labels
  - ambiguous or non-informative labels are removed, such as:
    - Unassigned
    - Unknown
    - uncultured
    - unresolved subgroup-like labels
    - overly broad taxonomic placeholders
- Generate Venn diagrams
  - two-set and three-set comparisons are produced
- Save outputs
  - plots in multiple formats
  - CSV tables with shared and unique taxa
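The taxonomy extraction, aggregation, and label filtering steps above can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper `extract_rank`, the toy taxonomy strings (written in a SILVA-like "f__"/"g__" prefix style), and the filter pattern are hypothetical and are not taken from the script itself. The label filter is applied directly after aggregation here for brevity; because core membership is evaluated per taxon, removing labels before or after the core step gives the same set.

```r
# Hypothetical helper: pull the name that follows e.g. "f__" or "g__".
extract_rank <- function(taxon, prefix) {
  pattern <- paste0(prefix, "__([^;]*)")
  matches <- regmatches(taxon, regexec(pattern, taxon))
  vapply(matches, function(m) {
    if (length(m) >= 2) trimws(m[2]) else NA_character_
  }, character(1))
}

# Toy taxonomy and count tables (invented values, assumed string format).
taxonomy <- data.frame(
  feature_id = c("asv1", "asv2", "asv3", "asv4"),
  taxon = c(
    "d__Bacteria; p__Proteobacteria; f__Enterobacteriaceae; g__Escherichia",
    "d__Bacteria; p__Proteobacteria; f__Enterobacteriaceae; g__Klebsiella",
    "d__Bacteria; p__Firmicutes; f__uncultured",
    "Unassigned"
  ),
  stringsAsFactors = FALSE
)
counts <- matrix(
  c(5, 7, 2, 1,    # sample S1
    0, 20, 3, 0),  # sample S2
  nrow = 4,
  dimnames = list(taxonomy$feature_id, c("S1", "S2"))
)

# Family label per ASV; missing or empty labels become "Unassigned".
labels <- extract_rank(taxonomy$taxon, "f")
labels[is.na(labels) | !nzchar(labels)] <- "Unassigned"

# Sum ASV counts within each family.
agg <- rowsum(counts, group = labels)

# Drop non-informative labels (assumed pattern, simplified).
bad <- grepl("^(unassigned|unknown|uncultured)$", rownames(agg),
             ignore.case = TRUE)
agg_clean <- agg[!bad, , drop = FALSE]

print(agg_clean)
```

Passing "g" instead of "f" to `extract_rank` would aggregate at genus level in the same way.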
The default thresholds used in this script are:
- Count threshold: > 10 reads per sample
- Prevalence threshold: >= 66% of samples within a group
These values can be modified in the Settings section of the script.
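To make the two thresholds concrete, here is a minimal sketch of a core-membership test for one species-compartment group. The function `core_taxa` and the toy counts are hypothetical; only the threshold values and their names come from the Settings section. With three samples in a group, a prevalence of 66% means a taxon must exceed 10 reads in at least two of them.

```r
COUNT_THRESHOLD <- 10
PREVALENCE_THRESHOLD <- 0.66

# Hypothetical helper: rows are taxa, columns are the samples of ONE group.
core_taxa <- function(counts,
                      count_threshold = COUNT_THRESHOLD,
                      prevalence_threshold = PREVALENCE_THRESHOLD) {
  present <- counts > count_threshold   # strictly more than the threshold
  prevalence <- rowMeans(present)       # fraction of samples passing
  names(prevalence)[prevalence >= prevalence_threshold]
}

# Toy counts for a group of three samples (invented values).
counts <- matrix(
  c(11, 50,   3,   # TaxA: passes in 2 of 3 samples
    10, 10, 100,   # TaxB: exactly 10 reads does not count, passes in 1 of 3
    20,  0,   0),  # TaxC: passes in 1 of 3 samples
  nrow = 3, byrow = TRUE,
  dimnames = list(c("TaxA", "TaxB", "TaxC"), c("S1", "S2", "S3"))
)

print(core_taxa(counts))
```

Only TaxA qualifies in this example, since 2/3 is about 0.667 and therefore meets the 0.66 cutoff.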
The script creates the following output directories:
- plots/: graphical outputs
- supplementary_tables/: summary and overlap tables
- raw_sets/: exported raw taxon sets
Each Venn diagram is saved as:
- .png
- .pdf
- .svg
For each comparison, the script exports CSV files containing:
- taxa shared between all sets
- taxa shared between selected pairs only
- taxa unique to each set
- summary counts for all Venn regions
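The regions listed above can be derived from three core sets with plain set operations. The sketch below is an assumption about how such tables might be built, not the script's actual code; the taxon names and region labels are invented for illustration.

```r
# Toy core sets for one species (invented taxon names).
env  <- c("A", "B", "C", "D")
silk <- c("B", "C", "E")
eggs <- c("C", "D", "E", "F")

# Each taxon falls into exactly one of the seven Venn regions.
regions <- list(
  shared_all     = Reduce(intersect, list(env, silk, eggs)),
  env_silk_only  = setdiff(intersect(env, silk), eggs),
  env_eggs_only  = setdiff(intersect(env, eggs), silk),
  silk_eggs_only = setdiff(intersect(silk, eggs), env),
  env_unique     = setdiff(env, union(silk, eggs)),
  silk_unique    = setdiff(silk, union(env, eggs)),
  eggs_unique    = setdiff(eggs, union(env, silk))
)

# Summary counts per region, ready to be written with write.csv().
region_summary <- data.frame(
  region = names(regions),
  n_taxa = unname(lengths(regions)),
  stringsAsFactors = FALSE
)

print(region_summary)
```

A useful sanity check is that the region counts add up to the number of distinct taxa across all three sets.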
The most important user-defined parameters are:
TAX_LEVEL <- "family" # "genus" or "family"
COUNT_THRESHOLD <- 10
PREVALENCE_THRESHOLD <- 0.66
LEVEL_PRD <- "PRD"
LEVEL_PTSD <- "PTSD"
LEVEL_ENV <- "ENV"
LEVEL_SILK <- "SILK"
LEVEL_EGGS <- "EGGS"
SPECIES_COL <- "group"
COMPARTMENT_COL <- "type"
SAMPLE_ID_COL <- "sample-id"
Required R packages: dplyr, tibble, ggplot2, readr, tidyr, stringr, ggVennDiagram, patchwork
This script is suitable for questions such as:
- Which bacterial taxa are shared between environment, silk and eggs?
- Which taxa are unique to the environment, silk, or eggs?
- Which taxa persist across the full ENV → SILK → EGGS transition?
- Are the same core taxa present in both spider species?
The Venn diagrams generated by this script represent presence/overlap of core taxa, not differential abundance. They are useful for identifying:
- stable taxa consistently detected in a compartment,
- compartment-specific taxa,
- taxa potentially associated with transfer across sample types.
Because the analysis is based on core membership thresholds, it complements abundance-based approaches such as heatmaps, differential abundance testing, or compositional analyses.
The script works on QIIME2-exported feature and taxonomy tables. Taxa are filtered before plotting to improve biological interpretability. The resulting diagrams are intended for descriptive and comparative analysis of core microbiome overlap.
Author: Mateusz Glenszczyk