
feat: expose token usage in CrawlResult (#1745) #1874

Open
hafezparast wants to merge 1 commit into unclecode:develop from hafezparast:fix/token-usage-in-crawl-result-1745

Conversation

@hafezparast
Contributor

Summary

Adds token_usage field to CrawlResult so LLM token consumption is returned in the crawl response — including via the Docker API.

What changed

crawl4ai/models.py:

# Added to CrawlResult
token_usage: Optional[Dict[str, Any]] = None

crawl4ai/async_webcrawler.py:
After extraction completes, reads config.extraction_strategy.total_usage and passes it to CrawlResult:

if hasattr(config.extraction_strategy, 'total_usage'):
    _token_usage = config.extraction_strategy.total_usage
    token_usage = {
        k: v for k, v in _token_usage.__dict__.items()
        if v is not None and v != 0
    }
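The filtering step can be sketched in isolation. `TokenUsage` below is a hypothetical stand-in for the strategy's usage object, not the actual class from crawl4ai:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TokenUsage:
    # Hypothetical stand-in for the extraction strategy's total_usage object
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    completion_tokens_details: Optional[dict] = None
    prompt_tokens_details: Optional[dict] = None

_token_usage = TokenUsage(prompt_tokens=1234, completion_tokens=567, total_tokens=1801)

# Keep only populated fields, mirroring the comprehension in async_webcrawler.py:
# None-valued detail fields and zero counts are dropped from the response dict.
token_usage = {k: v for k, v in _token_usage.__dict__.items()
               if v is not None and v != 0}
# → {'prompt_tokens': 1234, 'completion_tokens': 567, 'total_tokens': 1801}
```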

Before / After

Before (Docker API response):

{
  "extracted_content": "...",
  "success": true
  // no token usage
}

After:

{
  "extracted_content": "...",
  "token_usage": {
    "prompt_tokens": 1234,
    "completion_tokens": 567,
    "total_tokens": 1801
  },
  "success": true
}

Python SDK (also works)

result = await crawler.arun(url, config=config)
print(result.token_usage)
# {'prompt_tokens': 1234, 'completion_tokens': 567, 'total_tokens': 1801}
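Since `token_usage` defaults to `None` when no LLM extraction ran, SDK callers should guard before reading it. A hypothetical helper (not part of crawl4ai) illustrating the defensive access:

```python
def summarize_usage(token_usage):
    """Format a token_usage dict for logging; handles the no-LLM case."""
    if not token_usage:  # None when no LLM extraction was used
        return "no LLM extraction"
    return (f"{token_usage.get('prompt_tokens', 0)} prompt + "
            f"{token_usage.get('completion_tokens', 0)} completion = "
            f"{token_usage.get('total_tokens', 0)} total tokens")

print(summarize_usage({'prompt_tokens': 1234, 'completion_tokens': 567,
                       'total_tokens': 1801}))
# → 1234 prompt + 567 completion = 1801 total tokens
```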

Test plan

  • 15/15 unit tests pass
  • 290/291 regression tests pass (1 pre-existing transformers issue)
  • CrawlResult serializes token_usage correctly
  • Default is None when no LLM extraction is used
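The serialization and default-is-None behaviors from the test plan can be sketched with a minimal stand-in model (a plain dataclass here; the real CrawlResult has many more fields):

```python
import json
from dataclasses import dataclass, asdict
from typing import Any, Dict, Optional

@dataclass
class CrawlResultStub:
    # Minimal stand-in for CrawlResult, showing only the fields discussed here
    extracted_content: str = ""
    success: bool = True
    token_usage: Optional[Dict[str, Any]] = None  # None when no LLM extraction ran

with_llm = CrawlResultStub(extracted_content="...", token_usage={
    "prompt_tokens": 1234, "completion_tokens": 567, "total_tokens": 1801})
without_llm = CrawlResultStub(extracted_content="...")

print(json.dumps(asdict(with_llm)))     # token_usage appears in the JSON response
print(json.dumps(asdict(without_llm)))  # token_usage serializes as null
```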

Closes #1745

🤖 Generated with Claude Code

Add token_usage field to CrawlResult so LLM token consumption is
included in the JSON response when using LLMExtractionStrategy.

Changes:
- models.py: add token_usage: Optional[Dict[str, Any]] to CrawlResult
- async_webcrawler.py: after extraction, read total_usage from the
  extraction strategy and pass it to CrawlResult

The token_usage dict contains prompt_tokens, completion_tokens,
total_tokens, and optionally completion_tokens_details and
prompt_tokens_details — matching the existing TokenUsage dataclass.

Previously, token usage was only accessible via the strategy object
(strategy.total_usage) in the Python SDK. Now it's serialized in
CrawlResult, making it available via the Docker API endpoint too.

Closes unclecode#1745

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dbhurley added a commit to dbhurley/crawl4ai that referenced this pull request Apr 8, 2026
…laywright

Closes unclecode#1256 (memory leak in Docker from Chrome)
Related to unclecode#1874 (token usage tracking)

Plasmate (https://github.com/plasmate-labs/plasmate) is an open-source
Rust browser engine that replaces Chrome/Playwright for static pages.
No browser process, ~64MB RAM vs ~300MB, 10-100x fewer tokens per page.

Changes:
- crawl4ai/async_plasmate_strategy.py: AsyncPlasmateCrawlerStrategy
  - Implements AsyncCrawlerStrategy ABC (drop-in replacement)
  - Supports output_format: text (default), markdown, som, links
  - Supports --selector, --header, --timeout flags
  - Optional fallback_to_playwright=True for JS-heavy SPAs
  - Subprocess runs in asyncio executor — safe for concurrent use
- crawl4ai/__init__.py: export AsyncPlasmateCrawlerStrategy
- tests/general/test_plasmate_strategy.py: 20 unit tests
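The "subprocess runs in asyncio executor" point can be sketched generically. The real plasmate invocation and flags are as listed above; here a trivial Python command stands in for the CLI:

```python
import asyncio
import subprocess
import sys

async def run_cli(argv, timeout=30.0):
    """Run a blocking CLI in the default thread executor so the event loop stays free."""
    loop = asyncio.get_running_loop()
    proc = await loop.run_in_executor(
        None,
        lambda: subprocess.run(argv, capture_output=True, text=True, timeout=timeout),
    )
    return proc.stdout

async def main():
    # Stand-in for invoking the plasmate binary; concurrent arun() calls each
    # get their own executor thread, so none of them blocks the loop.
    out = await run_cli([sys.executable, "-c", "print('hello from subprocess')"])
    print(out.strip())
    # → hello from subprocess

asyncio.run(main())
```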

Install: pip install plasmate

Usage:
  from crawl4ai import AsyncWebCrawler
  from crawl4ai.async_plasmate_strategy import AsyncPlasmateCrawlerStrategy

  strategy = AsyncPlasmateCrawlerStrategy(
      output_format="markdown",
      fallback_to_playwright=True,   # SPA safety net
  )
  async with AsyncWebCrawler(crawler_strategy=strategy) as crawler:
      result = await crawler.arun("https://docs.python.org/3/")
