Seqwin

Seqwin is a lightning‑fast, memory‑efficient toolkit for discovering signature sequences (genomic markers) that balance high sensitivity with high specificity. It builds a minimizer‑based pan‑genome graph across target and neighboring non‑target genomes and extracts signature sequences using a novel graph algorithm. Signatures can be used for downstream assay design such as qPCR, ddPCR, amplicon sequencing, and hybrid capture probes.

Seqwin computes minimizers with ntHash, using code adapted from btllib (licensed under the GNU General Public License v3.0).


Table of contents

  1. Installation
  2. Quick start
  3. Citation

See the Seqwin Wiki for full documentation.

Installation

Seqwin can be installed from Bioconda or PyPI.

  • Bioconda is the recommended installation method because it installs Seqwin with all dependencies, but it requires Conda and supports only Linux and macOS.
  • PyPI (pip install seqwin) supports Windows (x64), Linux, and macOS, but installs only Seqwin and its Python dependencies. Non-Python dependencies can be installed separately if needed.

Bioconda (recommended)

Works on Linux (x64 / arm64) and macOS (Intel / Apple Silicon).

If Conda is not installed, install it with Miniforge or Miniconda.

1. Create a new Conda environment named seqwin and install Seqwin via Bioconda

conda create -n seqwin seqwin \
  --channel conda-forge \
  --channel bioconda \
  --strict-channel-priority

2. Activate the environment and verify the install

conda activate seqwin
seqwin --help

PyPI

Works on Windows (x64), Linux (x64 / arm64), and macOS (Intel / Apple Silicon). Requires Python >= 3.10.

1. Install Seqwin from PyPI

python -m pip install --upgrade pip
python -m pip install --prefer-binary seqwin
seqwin --help

2. Install non-Python dependencies (optional)
Seqwin can run without these tools, but some features will be unavailable or skipped. See the Command Line Parameters for details.

  • Mash (minimizer sketches are used if it is not installed)
  • NCBI BLAST+ (needed for signature evaluation)
  • NCBI Datasets CLI (needed for downloading NCBI genomes)
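
To see which of the optional tools above are already available, a small shell helper can report what is on your PATH. This is a minimal sketch, not part of Seqwin: `check_tools` is a hypothetical helper, and the executable names (`mash`, `blastn`, `datasets`) are assumptions based on the usual names shipped by Mash, NCBI BLAST+, and the NCBI Datasets CLI. Seqwin may call other BLAST+ executables as well.

```shell
# Hypothetical helper (not part of Seqwin): report whether each named
# executable can be found on PATH.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
    fi
  done
}

# Executable names are assumptions: mash (Mash), blastn (NCBI BLAST+),
# datasets (NCBI Datasets CLI).
check_tools mash blastn datasets
```

A tool reported as missing only means the related feature may be unavailable or skipped, as described above.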

Quick start

Identify signatures by providing one or more target taxa (-t) and neighboring non-target taxa (-n).

seqwin \
  -t "Salmonella enterica subsp. diarizonae" \
  -n "Salmonella enterica subsp. salamae" \
  -n "Salmonella bongori" \
  --threads 8

Taxon names must exactly match the names in NCBI Taxonomy. Genomes under each taxon will be downloaded automatically.

Outputs are written to seqwin-out/ in your working directory (see Description of Outputs).

Alternatively, target and non-target genomes can each be provided as a text file of file paths. Each line should be the path to one genome FASTA file (plain text or gzipped).

seqwin --tar-paths targets.txt --neg-paths non-targets.txt
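
If your genomes are already in a directory, the path lists can be generated rather than typed by hand. The sketch below is one way to do that. `list_genomes` is a hypothetical helper, the file extensions are assumptions about how your FASTA files are named, and absolute paths are written only so the lists stay valid from any working directory.

```shell
# Hypothetical helper (not part of Seqwin): print the absolute path of every
# FASTA file under a directory, one path per line, in sorted order.
list_genomes() {
  # The extension list is an assumption; adjust it to match your files.
  find "$(cd "$1" && pwd)" -type f \
    \( -name '*.fasta' -o -name '*.fa' -o -name '*.fna' \
       -o -name '*.fasta.gz' -o -name '*.fa.gz' -o -name '*.fna.gz' \) \
    | sort
}
```

For example, with genomes stored under the (assumed) directories genomes/targets and genomes/non-targets, `list_genomes genomes/targets > targets.txt` and `list_genomes genomes/non-targets > non-targets.txt` would produce the two files used in the command above.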

Examples can be found under test/. Use the test script to download and run the test dataset.

git clone https://github.com/treangenlab/Seqwin.git
cd Seqwin/test/
python run_test.py

Expected runtime and peak memory usage (with --threads 8 or -p 8):

  • ~5 min and 2.5 GB peak RAM for ~500 bacterial genomes with default settings.
  • ~5 min and 23 GB peak RAM for ~15k bacterial genomes with --no-blast and --no-mash.

Run seqwin --help or seqwin -h to see the full command line interface.

Citation

If you use Seqwin in your research, please cite:

Michael X. Wang, Bryce Kille, Michael G. Nute, Siyi Zhou, Lauren B. Stadler, and Todd J. Treangen. "Seqwin: Ultrafast identification of signature sequences in microbial genomes". Proceedings of ISMB 2026, accepted (2026).

Benchmarking datasets, outputs, and scripts are available on Zenodo.