feat: Add send_request_enqueue_strategy option to BasicCrawler #1865

Open
vdusek wants to merge 2 commits into master from feat/send-request-enqueue-strategy

vdusek (Collaborator) commented Apr 30, 2026

Summary

Adds an opt-in send_request_enqueue_strategy: EnqueueStrategy = 'all' option to BasicCrawler so handlers can reject cross-host send_request calls. Default 'all' preserves current behavior.

Follow-up to #1864, which applied the same EnqueueStrategy mechanism to SitemapRequestLoader.

What changed

  • New send_request_enqueue_strategy option on BasicCrawler / _BasicCrawlerOptions.
  • When set to anything other than 'all', _prepare_send_request_function validates the target URL against the current request's loaded_url (falling back to url) via _check_enqueue_strategy and raises ValueError on mismatch.
  • The closure captures the Request itself, so the post-redirect loaded_url becomes the origin once it is set.
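
To make the check described above concrete, here is a minimal standalone sketch. It is not the PR's code and does not use the library: `FakeRequest`, `matches_strategy`, and `prepare_send_request` are hypothetical stand-ins for Request, _check_enqueue_strategy, and _prepare_send_request_function, and only the 'all', 'same-hostname', and 'same-origin' strategies are modeled ('same-domain' is left out because it needs a public-suffix lookup).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Literal, Optional
from urllib.parse import urlsplit

# Illustrative subset of strategies; this alias is an assumption, not the library's type.
Strategy = Literal['all', 'same-hostname', 'same-origin']


@dataclass
class FakeRequest:
    """Hypothetical stand-in for the crawler's Request object."""

    url: str
    loaded_url: Optional[str] = None  # filled in after redirects are followed


def matches_strategy(strategy: Strategy, origin_url: str, target_url: str) -> bool:
    """Return True if target_url is allowed relative to origin_url."""
    if strategy == 'all':
        return True
    origin, target = urlsplit(origin_url), urlsplit(target_url)
    if strategy == 'same-hostname':
        return origin.hostname == target.hostname
    if strategy == 'same-origin':
        # Simplification: explicit default ports (e.g. :443) are not normalized.
        return (origin.scheme, origin.hostname, origin.port) == (
            target.scheme,
            target.hostname,
            target.port,
        )
    raise ValueError(f'Unsupported strategy: {strategy!r}')


def prepare_send_request(
    request: FakeRequest, strategy: Strategy = 'all'
) -> Callable[[str], str]:
    """Build a send_request-like closure bound to the given request."""

    def send_request(url: str) -> str:
        # The closure holds the request object, so the origin is resolved at
        # call time: loaded_url wins once set, otherwise fall back to url.
        origin_url = request.loaded_url or request.url
        if not matches_strategy(strategy, origin_url, url):
            raise ValueError(
                f'{url} violates strategy {strategy!r} for origin {origin_url}'
            )
        return url  # a real implementation would issue the HTTP request here

    return send_request
```

With `strategy='same-hostname'`, a call to a different host raises ValueError, while the default `'all'` accepts any URL, mirroring the opt-in behavior described in the summary.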

@github-actions github-actions Bot added this to the 139th sprint - Tooling team milestone Apr 30, 2026
@github-actions github-actions Bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Apr 30, 2026
@vdusek vdusek marked this pull request as draft April 30, 2026 14:40

codecov Bot commented Apr 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.48%. Comparing base (ac66b2a) to head (c5ebceb).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1865      +/-   ##
==========================================
+ Coverage   92.44%   92.48%   +0.04%     
==========================================
  Files         158      158              
  Lines       11026    11033       +7     
==========================================
+ Hits        10193    10204      +11     
+ Misses        833      829       -4     
Flag Coverage Δ
unit 92.48% <100.00%> (+0.04%) ⬆️

@vdusek vdusek force-pushed the feat/send-request-enqueue-strategy branch from 075a54a to 96b840f Compare April 30, 2026 15:01
@vdusek vdusek marked this pull request as ready for review April 30, 2026 15:03
@vdusek vdusek added the adhoc Ad-hoc unplanned task added during the sprint. label Apr 30, 2026
@vdusek vdusek force-pushed the feat/send-request-enqueue-strategy branch from 96b840f to fa5f402 Compare April 30, 2026 15:19

Mantisus (Collaborator) left a comment

LGTM

B4nan (Member) commented May 1, 2026

Isn't this weird? If the URL is explicitly provided, we should just follow it. If the URL was added to the queue, it must have already respected the enqueue links strategy, or again, be explicitly provided.

4 participants