Seqwin is a lightning‑fast, memory‑efficient toolkit for discovering signature sequences (genomic markers) that balance high sensitivity with high specificity. It builds a minimizer‑based pan‑genome graph across target and neighboring non‑target genomes and extracts signature sequences using a novel graph algorithm. Signatures can be used for downstream assay design such as qPCR, ddPCR, amplicon sequencing, and hybrid capture probes.
Seqwin computes minimizers with ntHash, using code adopted from btllib (licensed under the GNU General Public License v3.0).
See the Seqwin Wiki for full documentation.
Seqwin can be installed from Bioconda or PyPI.
- Bioconda is the recommended installation method because it installs Seqwin with all dependencies, but it requires Conda and supports only Linux and macOS.
- PyPI (pip install seqwin) supports Windows (x64), Linux, and macOS, but installs only Seqwin and its Python dependencies. Non-Python dependencies can be installed separately if needed.
Works on Linux (x64 / arm64) and macOS (Intel / Apple Silicon).
If Conda is not installed, install it with Miniforge or Miniconda.
1. Create a new Conda environment named seqwin and install Seqwin via Bioconda
conda create -n seqwin seqwin \
--channel conda-forge \
--channel bioconda \
--strict-channel-priority
2. Activate the environment and verify the install
conda activate seqwin
seqwin --help
Works on Windows (x64), Linux (x64 / arm64), and macOS (Intel / Apple Silicon). Requires Python >= 3.10.
1. Install Seqwin from PyPI
python -m pip install --upgrade pip
python -m pip install --prefer-binary seqwin
seqwin --help
2. Install non-Python dependencies (optional)
Seqwin can run without these tools, but some features will be unavailable or skipped. See the Command Line Parameters for details.
- Mash (minimizer sketches are used if it is not installed)
- NCBI BLAST+ (needed for signature evaluation)
- NCBI Datasets CLI (needed for downloading NCBI genomes)
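Before running Seqwin, it can help to see which of these optional tools are already on your PATH. The following is a minimal sketch, not part of Seqwin: the function name is hypothetical, and the executable names (mash, blastn, datasets) are the usual ones shipped by each project, assumed here rather than taken from Seqwin's own checks.

```python
import shutil

# Assumed executable names for each optional dependency; which binaries
# Seqwin actually invokes is not stated in this README.
OPTIONAL_TOOLS = {
    "Mash": "mash",
    "NCBI BLAST+": "blastn",
    "NCBI Datasets CLI": "datasets",
}


def find_optional_tools(tools=OPTIONAL_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in find_optional_tools().items():
        print(f"{name}: {path or 'not found'}")
```

A tool reported as "not found" only means the assumed executable is not on PATH in the current environment.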
Identify signatures by providing one or more target taxa (-t) and neighboring non-target taxa (-n).
seqwin \
-t "Salmonella enterica subsp. diarizonae" \
-n "Salmonella enterica subsp. salamae" \
-n "Salmonella bongori" \
--threads 8
Taxa names must be exact matches to NCBI Taxonomy. Genomes under each taxon will be downloaded automatically.
Outputs are written to seqwin-out/ in your working directory (see Description of Outputs).
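When the taxa come from a script rather than being typed by hand, the same command can be assembled programmatically. This is a rough sketch using only the flags shown above (-t, -n, --threads); the helper name build_seqwin_command is hypothetical and not part of Seqwin.

```python
import shlex


def build_seqwin_command(targets, non_targets, threads=8):
    """Assemble the argument list: one -t per target, one -n per non-target."""
    cmd = ["seqwin"]
    for taxon in targets:
        cmd += ["-t", taxon]
    for taxon in non_targets:
        cmd += ["-n", taxon]
    cmd += ["--threads", str(threads)]
    return cmd


if __name__ == "__main__":
    cmd = build_seqwin_command(
        targets=["Salmonella enterica subsp. diarizonae"],
        non_targets=["Salmonella enterica subsp. salamae", "Salmonella bongori"],
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would execute it, provided seqwin is on PATH. Keeping the arguments as a list avoids quoting problems with taxon names that contain spaces.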
Alternatively, a list of target or non-target genomes can be provided as a text file of file paths. Each line should be the path to a genome FASTA file (plain text or gzipped).
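One way to produce such a list is to scan a directory of FASTA files and write one path per line. The sketch below is illustrative only: the helper names and the set of accepted extensions are assumptions, not something Seqwin prescribes.

```python
from pathlib import Path

# Assumed FASTA extensions; adjust to match your files.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")


def is_fasta(path):
    """True for names such as genome.fna or genome.fna.gz."""
    name = path.name.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    return name.endswith(FASTA_SUFFIXES)


def write_genome_list(genome_dir, list_path):
    """Write one absolute FASTA path per line and return the paths written."""
    paths = sorted(
        p.resolve() for p in Path(genome_dir).iterdir()
        if p.is_file() and is_fasta(p)
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return paths
```

For example, write_genome_list("targets/", "targets.txt") would create the target list from a hypothetical targets/ directory, and a second call would do the same for the non-target genomes.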
seqwin --tar-paths targets.txt --neg-paths non-targets.txt
Examples can be found under test/. Use the test script to download and run the test dataset.
git clone https://github.com/treangenlab/Seqwin.git
cd Seqwin/test/
python run_test.py
Expected runtime (with --threads 8 or -p 8):
- ~5 min and 2.5 GB peak RAM for ~500 bacterial genomes with default settings.
- ~5 min and 23 GB peak RAM for ~15k bacterial genomes with --no-blast and --no-mash.
Run seqwin --help or seqwin -h to see the full command line interface.
If you use Seqwin in your research, please cite:
Michael X. Wang, Bryce Kille, Michael G. Nute, Siyi Zhou, Lauren B. Stadler, and Todd J. Treangen "Seqwin: Ultrafast identification of signature sequences in microbial genomes". Proceedings of ISMB 2026, accepted (2026).
Benchmarking datasets, outputs, and scripts are available on Zenodo.