Allow shrinking of generated values into wider strategies for their type#4713

Open
DRMacIver wants to merge 7 commits into master from DRMacIver/value-based-shrinking

Conversation

@DRMacIver
Member

This is perhaps heresy, but a case I've found myself caring about recently is that sometimes you want shrinking not to go via generation. You care about the entire input space, but you want generators that can explore parts of the space you'll never hit by chance. For example, you might want your code not to crash on arbitrary bytes, even though the interesting inputs correspond to some grammar.

This PR makes it so that the shrinker can, in some limited cases, do this. If you have e.g. st.text() | some_strategy_that_generates_text, the shrinker can now reliably take the text produced by the second strategy and shrink it as if it were arbitrary text.

We achieve this in two ways:

  1. We annotate spans with their generated primitive value.
  2. just and sampled_from insert artificial choice nodes for primitive values, so that their spans are not artificially simpler than the values they generate, which makes this sort of shrinking possible.

This is a bit of a gimmick and I'm not 100% sure it's worth it, but I think it's a really nice feature for the small subset of cases where it matters. The shrink pass is also very narrowly designed, so it's pretty cheap-to-free in cases where it's not useful.
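As a toy illustration of the idea in plain Python: the names and data representation below are hypothetical simplifications, not Hypothesis's actual internals, which operate on annotated spans in the choice sequence inside the shrinker.

```python
# Hypothetical sketch of the "widen out of a one_of branch" idea.
# A one_of draw is modeled here as (selector index, generated value).

PRIMITIVES = (bool, int, float, str, bytes)

def widen_one_of(selector, annotated_value):
    """If a non-zero one_of selector produced a primitive value, retarget
    that value at alternative 0 (the widest strategy), so that ordinary
    shrinking can take it the rest of the way."""
    if selector > 0 and isinstance(annotated_value, PRIMITIVES):
        return 0, annotated_value  # replay the value via the open branch
    return selector, annotated_value

# Text from the second branch of ``st.text() | grammar_text()`` gets
# handed to the first branch and shrunk as arbitrary text:
print(widen_one_of(1, "SELECT * FROM t"))  # → (0, 'SELECT * FROM t')
```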

@Zac-HD
Member

Zac-HD commented Apr 24, 2026

some late-night takes

  • the UX is good, so we should do whatever crimes it takes to offer it, but we should also see if there are pointers to a nice abstraction
  • restricting it to the primitive types is ugly and unprincipled
  • we should instead go with a minimal SearchStrategy._invert() method, use that, and give OneOfStrategy special integration with the choice sequence and Just/SampledFromStrategy.
    • basics should be about as easy, with a clear pathway to wildly more power later (search, and maybe solvers). Recording the object ids associated with choice-sequence-spans might also be helpful here, and maybe shared with call-pprinting?
    • SearchStrategy._invert() would also let us pull off complete bullshit features like "paste an externally-reported failure into @example(...), and we'll shrink it for you". I want this.

DRMacIver and others added 5 commits April 24, 2026 09:35
Annotate spans that produced a primitive value with that value, and use
those annotations in a new shrink pass that tries to widen a non-zero
one_of selector down to zero by replacing the selected span's choices
with a single forced choice holding the annotated value.

To make this work for ``just`` and ``sampled_from``, add a
``maybe_add_choice_node_for`` method on ``ConjectureData`` that records
a forced marker choice when the generated value is one of the primitive
choice types, so these strategies' spans are no longer empty and the
widening pass has something to replace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``Span.annotation`` is now ``Span.generated_primitive_value``, and the
corresponding backing fields on ``Spans`` / ``SpanRecord`` follow suit.
``SpanRecord.annotate_current_span`` is renamed to
``record_value_for_span``, and the primitive-type check is now done
inside that method so callers don't need to guard it themselves.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``add_choice_node_for`` now always records a choice - primitive values
still produce a forced choice of the matching type, and non-primitive
values produce a forced ``True`` boolean as a minimal marker.

To avoid polluting one_of's selector span with those markers (which
would break the widening pass's "immediately followed by" condition),
``OneOfStrategy.do_draw`` now draws an integer index directly instead
of routing the alternative selection through ``SampledFromStrategy``.

Update two invariant tests that asserted the old ``does not draw``
behaviour, and refresh the ``@reproduce_failure`` blob whose encoded
choice sequence shape has changed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@DRMacIver force-pushed the DRMacIver/value-based-shrinking branch from df6dafc to 687ba30 on April 24, 2026 09:35
DRMacIver and others added 2 commits April 24, 2026 10:08
…n slices

- Change the placeholder choice added by ``add_choice_node_for`` for
  non-primitive values from ``draw_boolean(forced=True)`` to
  ``draw_boolean(forced=False)``. False is the trivial boolean value, so
  the resulting span has the same minimal sort-key as an empty span
  would, which preserves the relative complexity ordering of strategies
  like ``none()`` and ``booleans()``.
- Refresh the ``@reproduce_failure`` blob to match the new forced value.
- Skip slices containing only forced choices in the explain-phase loop
  that decides when to add ``# or any other generated value`` to an
  argument. If every node in the slice is forced, there's nothing to
  vary and the comment would be misleading.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
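The forced-only-slice skip described in the last bullet of the commit message above amounts to a filter like the following sketch; ``ChoiceNode`` and ``slices_worth_varying`` are stand-in names, not the real explain-pass code.

```python
from dataclasses import dataclass

# Stand-in for Hypothesis's choice node; only the ``was_forced`` flag
# matters for this sketch.
@dataclass
class ChoiceNode:
    value: object
    was_forced: bool = False

def slices_worth_varying(slices):
    """Drop slices in which every choice was forced: they can never
    produce a different value, so adding the
    '# or any other generated value' comment would be misleading."""
    for nodes in slices:
        if all(node.was_forced for node in nodes):
            continue  # fully forced, nothing to vary
        yield nodes

forced_only = [ChoiceNode(True, was_forced=True)]
free = [ChoiceNode(0), ChoiceNode(1)]
assert list(slices_worth_varying([forced_only, free])) == [free]
```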
- Add direct unit tests for ``_choice_node_for_value`` (all primitive
  branches + the non-primitive assertion), which the quality tests were
  exercising via integration but the conjecture-only coverage run was
  not.
- Add a direct test for ``LazyStrategy.do_draw``; the normal
  ``data.draw`` path unwraps lazy strategies, so this method is only
  reachable via a direct call.
- Use a ``# pragma: no cover`` for the forced-only-slice skip in the
  explain pass - it's only reachable through a full explain-phase run,
  not through conjecture-level unit tests.
- Use double backticks for ``my_primitive_strategy |
  some_more_complicated_strategy`` in ``RELEASE.rst`` so Sphinx treats
  it as a literal rather than a Python object reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@DRMacIver force-pushed the DRMacIver/value-based-shrinking branch from 9cb877a to 25b74d5 on April 24, 2026 10:54
@DRMacIver
Member Author

@Liam-DeVoe I investigated the failures from always inserting a choice node, and it turns out they were entirely down to it making one_of worse because of its internal use of sampled_from, so I just changed it to not do that and everything is happy.

@Zac-HD I'm... open in principle to an invert based approach to this, but I think it turns this from a minor nice-to-have chunk of work to a major feature that I'm not going to get around to implementing any time soon. It requires some very intrusive reworking of the way that the shrinker interacts with data generation in a way that I'm not quite prepared to think through the implementation details of right now.

Agreed it would be nice to make this work in generality, but I actually do think the primitive values are the highest-priority version of this. (Actually, I think bytes and text are the highest-priority versions, and the other primitive values are just along for the ride because we might as well.)

@DRMacIver
Member Author

DRMacIver commented Apr 24, 2026

There seems to be a flaky test made worse by this PR. I've not investigated enough yet. Here's what Claude says about it:

Details: the flaky test was tests/pandas/test_argument_validation.py::test_raise_invalid_argument[...rows=just(['x'])...] (and an adjacent variant with rows=just({'a': 'x'})).

What it checks: that constructing a data_frames() strategy with a row shape that doesn't match the declared columns raises InvalidArgument. Concretely:

pdst.data_frames(
    columns=pdst.columns(["a", "b"], dtype=str, elements=st.text()),  # 2 columns
    rows=st.just(["x"]),  # 1-element row
    index=pdst.indexes(dtype=int, min_size=1),
)

Two columns but each row has length 1 → should raise InvalidArgument.

What actually happened on those runs: instead of InvalidArgument, pytest saw a RuntimeWarning: overflow encountered in scalar add. With -bb -X dev on CI that gets escalated to a hard error. The overflow happens somewhere inside pandas/numpy scalar arithmetic when pdst.indexes(dtype=int, min_size=1) draws a large-magnitude int and pandas does an internal addition on it — before Hypothesis's own validation path rejects the bad shape.

Why my changes made it more likely to flake: previously just([...]) / just({...}) never touched the choice sequence. Now add_choice_node_for inserts a forced boolean marker. Every choice that indexes(...) draws comes out of a slightly different sort-key position in the sequence, so the distribution of the int values it produces shifted, and with Python 3.14 + pandas 3.0.2 that new distribution hits the overflow corner more often.

It passed locally 10× in a row with -W error and passed on rerun every time on CI — so the test is racy, not broken. Not a correctness issue with our shrinker changes: the InvalidArgument path is still the outcome the majority of the time; we're just widening the window where pandas trips over the specific int. A proper fix (not part of this PR) would be to bound pdst.indexes in that parametrization, or to suppress the overflow warning for argument-validation tests.

Member

@Zac-HD left a comment


I'm not convinced we should do this now, rather than waiting for a more-general ._invert()-based solution.

Further notes:

  • All the tests are currently for text() | sampled_from(...) or equivalents. We could handle this with a strategy-level change that teaches string strategies about these additional constants, using our existing constants machinery.
  • For more complex tests, e.g. using text() | specific_json().map(json.dumps), I worry that our primitive shrinking logic is not actually that sophisticated - we've historically relied on the structure from the strategy to make this easier, rather than shrinkray-style intelligence.

Comment on lines +11 to +17
"""Quality tests for ``basic_strategy | specific_strategy`` combinations.

Each predicate below is chosen so that random draws from the basic (open)
strategy satisfy it with probability well under 1%. That ensures the
initial failing example essentially always comes from the specific
(complicated) branch, so the test genuinely exercises the shrinker's
ability to widen out of the specific branch into the open one.
Member


I worry this might be wrong, given that we find and upweight magic constants?

Comment on lines +1574 to +1587
"""Try to navigate away from a specific ``one_of`` alternative into
an earlier one by using the span's recorded generated primitive value.

If we have an integer choice with ``min_value == 0`` currently set to
a non-zero value, and it is immediately followed by a span whose
corresponding strategy produced a primitive value, we replace the
integer with ``0`` and the span's choices with a single choice
holding that primitive value. The engine then re-runs the test
against the earlier alternative with that value.

This is useful for ``basic_strategy | specific_strategy``, where
the specific branch produced a primitive that the basic branch could
also have produced: we slip the primitive across into the basic
branch so that normal shrinking can take it the rest of the way.
Member


This feels uncomfortably brittle; it's obviously useful if the strategy is exactly like this, but there are some very sharp edges in how it works. (for example: you're just out of luck if you wrote branches in any other order)

Plausibly still worth doing, but it weighs against the PR for me.
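For concreteness, the transformation that docstring describes can be sketched in plain Python; ``Node``, ``try_widen``, and the flat ``annotations`` mapping are simplified stand-ins for Hypothesis's real choice-node and span machinery.

```python
from dataclasses import dataclass

@dataclass
class Node:
    kind: str                 # "integer", "string", ...
    value: object
    min_value: object = None  # only meaningful for integer nodes

def try_widen(nodes, annotations):
    """Find a one_of selector (integer node with min_value == 0 and a
    non-zero value) immediately followed by a span that recorded a
    primitive value; rewrite to selector 0 plus that value as a single
    forced choice. ``annotations`` maps span-start indices to recorded
    values, and each span is assumed to be exactly one node, both of
    which are simplifications of the real span bookkeeping."""
    for i, node in enumerate(nodes):
        if (node.kind == "integer" and node.min_value == 0
                and node.value != 0 and i + 1 in annotations):
            primitive = annotations[i + 1]
            replacement = [Node("integer", 0, 0), Node("string", primitive)]
            return nodes[:i] + replacement + nodes[i + 2:]
    return None  # no applicable selector/span pair

nodes = [Node("integer", 1, 0), Node("string", "grammar-text")]
assert try_widen(nodes, {1: "grammar-text"})[0].value == 0
```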
