Autogrammer

Autogrammer constrains the output of large language models (LLMs) so that they generate syntactically valid JSON or SQL.

By leveraging grammars, Autogrammer ensures that an LLM generates output adhering to specific structures and syntax, even with smaller models.

Autogrammer is still under active development and should be considered alpha software.

Why Autogrammer?

LLMs produce a probability distribution over possible next tokens. By manipulating this distribution, you can constrain what the LLM outputs, for example by only allowing syntactically valid next tokens. For smaller LLMs (like ones that run in a browser) this is particularly valuable. It's a harness that guides them toward correct output.

GBNF is a grammar format for defining syntactic validity. At inference time, Autogrammer uses the GBNF grammar to mask invalid tokens from the model's logits, guaranteeing parseable output.
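
The masking step described above can be sketched as follows. This is a minimal illustration, not Autogrammer's implementation: `maskLogits`, `softmax`, the four-token vocabulary, and the logit values are all hypothetical, and the set of allowed token ids stands in for whatever a grammar would permit at this position.

```typescript
// Hypothetical helper (not Autogrammer's API): set disallowed tokens to -Infinity
// so they receive zero probability after softmax.
function maskLogits(logits: number[], isAllowed: (tokenId: number) => boolean): number[] {
  return logits.map((logit, tokenId) => (isAllowed(tokenId) ? logit : -Infinity));
}

// Numerically stable softmax. Assumes at least one logit is finite.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits);
  const exps = logits.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Toy vocabulary and made-up logits for the first output token.
const vocab = ['{', '"', 'hello', '}'];
const logits = [1.0, 2.0, 3.0, 0.5];

// Suppose the grammar only allows '{' or '"' at this position.
const allowedIds = new Set([0, 1]);

const masked = maskLogits(logits, (id) => allowedIds.has(id));
const probs = softmax(masked);

// Greedy pick: without masking this would be 'hello'; with masking it is '"'.
const nextToken = vocab[probs.indexOf(Math.max(...probs))];
console.log(nextToken, probs);
```

The unconstrained model prefers `hello`, but once the disallowed ids are masked, all probability mass moves to the tokens the grammar accepts.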

Similar packages exist in the Python ecosystem: Outlines and guidance.

Use Cases

💻 Live code generation in the browser
🗣️ Natural language to SQL conversion
🎇 Generating visualizations from text descriptions
🌳 Offline apps
🕵️ Keeping your data private

Key features

Bring your own model: Works with Transformers.js, web-llm, and REST endpoints for llama.cpp and llamafile, so you can use your preferred LLM.
Support for JSON and SQL: Syntactic validity is guaranteed by GBNF grammars.
Schema support: Provide a schema (database schema or JSON schema) to further restrict possible output and improve semantic correctness.
Plugins: (Coming soon) Enable additional functionality, such as RAG, on-the-fly error correction, and more.
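
To illustrate what a schema adds beyond syntactic validity, here is a small sketch. The schema below is a hypothetical JSON Schema for the address used in the Quickstart prompt, and `matchesAddressSchema` is an illustrative check written for this example only. Neither is part of Autogrammer, and how a schema is actually passed to Autogrammer is not shown here.

```typescript
// Hypothetical JSON Schema for an address (field names are assumptions).
const addressSchema = {
  type: 'object',
  properties: {
    street: { type: 'string' },
    city: { type: 'string' },
    state: { type: 'string' },
    zip: { type: 'string' },
  },
  required: ['street', 'city', 'state', 'zip'],
  additionalProperties: false,
} as const;

// Minimal check covering only what this schema uses: every property is a
// required string and no extra keys are allowed. Not a general validator.
function matchesAddressSchema(value: unknown): boolean {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
  const record = value as Record<string, unknown>;
  // All properties are required here, so the required list doubles as the allowed keys.
  const allowedKeys: readonly string[] = addressSchema.required;
  return (
    Object.keys(record).every((key) => allowedKeys.includes(key)) &&
    allowedKeys.every((key) => typeof record[key] === 'string')
  );
}

// Both strings are syntactically valid JSON, but only the second fits the schema.
const syntaxOnly = JSON.parse('{"location": "The White House"}');
const schemaConforming = JSON.parse(
  '{"street": "1600 Pennsylvania Avenue NW", "city": "Washington", "state": "DC", "zip": "20500"}'
);

console.log(matchesAddressSchema(syntaxOnly));
console.log(matchesAddressSchema(schemaConforming));
```

A plain JSON grammar would accept both outputs. A grammar derived from the schema would only permit output shaped like the second one.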

Installation

npm install autogrammer

Quickstart

import { pipeline } from '@xenova/transformers'
import { Autogrammer } from 'autogrammer'

// Load your preferred model (pipeline() returns a Promise, so await it)
const model = await pipeline('text-generation', 'Xenova/gpt2')

// Create Autogrammer for JSON output
const autogrammer = new Autogrammer({
  language: 'json',
  model, // assumption: the loaded model is passed via a `model` option
})

// Tell the model what to generate
const prompt = 'Write me JSON that captures the following address: 1600 Pennsylvania Avenue NW, Washington, DC 20500'

// Run
const response = await autogrammer.execute(prompt)

// See the generated JSON
console.log(response) // { ... json object }

Packages

Autogrammer is made up of several packages:

Grammar packages (in GBNF repo):

gbnf - Parses a GBNF grammar into a graph of rules, which can be used to determine the validity of a next token. Also enables the creation of GBNF grammars dynamically.
json2gbnf - Generates a GBNF grammar for JSON, with optional JSON schema
sql2gbnf - Generates a GBNF grammar for SQL, with optional database schema
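
As a rough intuition for how a grammar can decide whether a next token is valid, the sketch below handles a toy rule made only of literal alternatives. It is deliberately simplified and does not use the gbnf package's API: real GBNF grammars also support nested rules, repetition, and character classes, and `validNextChars` and `isValidToken` are hypothetical names.

```typescript
// Literal alternatives of a toy rule, equivalent to the GBNF line:
//   root ::= "true" | "false" | "null"
const alternatives = ['true', 'false', 'null'];

// Characters that may follow `prefix` without leaving the grammar.
function validNextChars(prefix: string, alts: string[] = alternatives): Set<string> {
  const next = new Set<string>();
  for (const alt of alts) {
    if (alt.startsWith(prefix) && alt.length > prefix.length) {
      next.add(alt.charAt(prefix.length));
    }
  }
  return next;
}

// A (possibly multi-character) token is valid if appending it to the output so far
// still leaves a prefix of at least one alternative.
function isValidToken(prefix: string, token: string, alts: string[] = alternatives): boolean {
  const candidate = prefix + token;
  return token.length > 0 && alts.some((alt) => alt.startsWith(candidate));
}

console.log([...validNextChars('')]);   // first characters of each alternative
console.log([...validNextChars('t')]);  // only 'r' can follow 't'
console.log(isValidToken('fa', 'lse')); // completes "false"
console.log(isValidToken('fa', 'x'));   // leaves the grammar
```

A predicate like `isValidToken`, evaluated for every entry in the model's vocabulary, is the kind of check a logits processor can use to decide which tokens to mask.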

Orchestration packages (in this repo):

contort - Implements a logits post-processor that restricts LLM output to valid next tokens.
autogrammer - Orchestrates SQL and JSON grammar generation across a variety of LLMs.

License

MIT
