Add support for lambda column capture #21323
| } else if let Some(lambda_variable) = | ||
| expr.as_any().downcast_ref::<LambdaVariable>() | ||
| { | ||
| used_column_indices.insert(lambda_variable.index()); |
I'm 98% sure this has a bug with conflicting indices between lambda variables and columns. Even if you separate lambda variable indices from column indices, you can still have problems with nested lambdas that use an outer lambda's variable inside an inner one.
I added a sqllogictest test which I hope covers all the cases you cited and more (4932cae). Compared to your snippet at #21231 (comment), where lambda variables come first in the scoped schema and external columns after them, here lambda variables are pushed to the end of the outer schema, which still includes unreferenced columns. In case of a name conflict (a lambda variable shadows a field from the outer schema), we rename the shadowed field to a unique name (5c5ca19#diff-a3e127629e9516ec496d656ebb53a1e8bf730eb02d219c4ce42ee47572685844R253-R325, 5c5ca19#diff-7fb0a64e734f54d94d48e9e02c51573a3678205f9ee8e2afaf41d686187a285eR440-R489). That way, once a field has been introduced into the schema, be it a column in the outermost schema or a lambda variable in an inner schema, its index never changes, regardless of how many new scopes are created from it down the tree. Because of that, the CaseWhen optimization (as well as the same optimization in lambdas) can safely collect all indices and assume that any index out of bounds for the scoped batch it is projecting refers to an inner lambda variable that is not yet available. It still needs to rewrite all of them, since they were originally computed against the unprojected, full schema: any projection of an outer schema affects the indices of all its derived inner schemas and must be propagated down the tree for every projection (inner projections can't know how to rewrite the indices of an outer projection).
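To make the index-stability argument concrete, here is a minimal sketch of the scoping rule described above, assuming a schema is just an ordered list of field names. `ScopedSchema`, `push_scope`, and the `__shadowed_` prefix are hypothetical names for illustration, not the types or naming scheme used in the PR.

```rust
/// Simplified stand-in for a schema: an ordered list of field names.
/// (Hypothetical type, not DataFusion's `Schema`.)
#[derive(Debug, Clone, PartialEq)]
struct ScopedSchema {
    fields: Vec<String>,
}

impl ScopedSchema {
    fn new(fields: &[&str]) -> Self {
        Self {
            fields: fields.iter().map(|f| f.to_string()).collect(),
        }
    }

    /// Position of `name`. Names stay unique because shadowed fields are renamed.
    fn index_of(&self, name: &str) -> Option<usize> {
        self.fields.iter().position(|f| f == name)
    }

    /// Derive an inner scope: existing fields keep their positions, lambda
    /// variables are appended at the end, and any field shadowed by a lambda
    /// variable is renamed in place (assumed naming scheme).
    fn push_scope(&self, lambda_vars: &[&str]) -> Self {
        let mut fields = self.fields.clone();
        for var in lambda_vars {
            for (i, field) in fields.iter_mut().enumerate() {
                if field.as_str() == *var {
                    *field = format!("__shadowed_{i}_{var}");
                }
            }
            fields.push(var.to_string());
        }
        Self { fields }
    }
}
```

In this sketch, pushing a scope with variable `x` onto `[a, x, b]` leaves `a` at index 0 and `b` at index 2, renames the outer `x`, and binds the lambda's `x` at index 3. A further nested scope appends after that, so no index assigned earlier ever moves.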
|
@gstvg do you want to align the PR with the latest main and continue working on it, so we can release it together with the lambda support and avoid two breaking changes? |
|
@rluvaton Sure, I will work on this later today. I will ping you when I finish. |
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
|
@rluvaton This is ready for review. Note that I kept the same approach because it fully implements capture and doesn't copy uncaptured columns. The whole tree is exposed via TreeNode, and every Column exposed to regular tree traversals has an index referring to the outer schema, without requiring new tree node methods. It does so by implementing the same optimization as CaseWhen: collect the used indices, rewrite the body into a projected_body while exposing the unprojected body to TreeNode, and then project the batch before providing it to the function implementer via Note I'm using the same approach as #18329 to keep the codebase consistent, but my first version, from before it got merged, didn't rewrite and project; it simply swapped uncaptured columns for NullArrays, which are cheap to create. While not as elegant as #18329, that approach is simpler and easier to reason about, especially in the context of deeply nested lambdas and CASE expressions. |
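The collect, rewrite, and project steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's code: the `Expr` enum, `collect_indices`, `rewrite`, and `project_body` are hypothetical, and a batch is represented only by its column count. It assumes, as described earlier in the thread, that indices at or beyond the batch width refer to inner lambda variables that are appended later.

```rust
use std::collections::BTreeSet;

/// Toy expression tree (hypothetical, not DataFusion's physical expressions).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(usize),
    LambdaVariable(usize),
    Add(Box<Expr>, Box<Expr>),
}

/// Collect every index referenced by columns and lambda variables.
fn collect_indices(expr: &Expr, used: &mut BTreeSet<usize>) {
    match expr {
        Expr::Column(i) | Expr::LambdaVariable(i) => {
            used.insert(*i);
        }
        Expr::Add(l, r) => {
            collect_indices(l, used);
            collect_indices(r, used);
        }
    }
}

/// Rebuild the expression with every index passed through `remap`.
fn rewrite(expr: &Expr, remap: &dyn Fn(usize) -> usize) -> Expr {
    match expr {
        Expr::Column(i) => Expr::Column(remap(*i)),
        Expr::LambdaVariable(i) => Expr::LambdaVariable(remap(*i)),
        Expr::Add(l, r) => Expr::Add(
            Box::new(rewrite(l, remap)),
            Box::new(rewrite(r, remap)),
        ),
    }
}

/// Returns the batch columns to keep and the body rewritten against the
/// projected batch. Indices >= `batch_width` are assumed to be inner lambda
/// variables, so they only shift left by the number of dropped columns.
fn project_body(body: &Expr, batch_width: usize) -> (Vec<usize>, Expr) {
    let mut used = BTreeSet::new();
    collect_indices(body, &mut used);
    let kept: Vec<usize> = used.into_iter().filter(|i| *i < batch_width).collect();
    let dropped = batch_width - kept.len();
    let remap = |i: usize| {
        if i < batch_width {
            kept.iter().position(|k| *k == i).expect("index was collected")
        } else {
            i - dropped
        }
    };
    let projected = rewrite(body, &remap);
    (kept, projected)
}
```

For a batch of width 4 and a body that references columns 3 and 1 plus an inner lambda variable at index 5, the sketch keeps columns `[1, 3]` and rewrites the references to 1, 0, and 3 respectively. The original body is left untouched, which mirrors the idea of exposing the unprojected body to tree traversals.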
@comphead Yes, there's a test with nested lambdas ( column datafusion/datafusion/sqllogictest/test_files/array/array_transform.slt Lines 189 to 222 in 17e50e0 |
Which issue does this PR close?
Part of #21172
Rationale for this change
Capture support wasn't implemented in the core lambda support to reduce PR size and because it requires further discussions not tied to basic support
What changes are included in this PR?
Lambda capture
list_values_row_number helper to adjust a list to the lambda scope
Make #18329 lambda-aware
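As a rough illustration of what adjusting a captured column to the lambda scope can involve, the sketch below maps each flattened list element to the row it came from and then repeats a captured column accordingly. This is an assumption about the general technique, not the PR's list_values_row_number implementation: the function names, the plain `usize` offsets, and the slice-based columns are all hypothetical stand-ins for Arrow arrays.

```rust
/// Given list offsets (length = rows + 1), return for each flattened list
/// element the number of the row that contains it. Empty lists contribute
/// no entries. (Hypothetical helper, using plain slices instead of Arrow.)
fn row_numbers_for_list_values(offsets: &[usize]) -> Vec<usize> {
    let mut rows = Vec::new();
    for (row, window) in offsets.windows(2).enumerate() {
        let len = window[1] - window[0];
        rows.extend(std::iter::repeat(row).take(len));
    }
    rows
}

/// Repeat a captured outer column so that it lines up, element by element,
/// with the flattened list values the lambda body is evaluated over.
fn align_captured_column<T: Clone>(column: &[T], row_numbers: &[usize]) -> Vec<T> {
    row_numbers.iter().map(|&r| column[r].clone()).collect()
}
```

With offsets `[0, 2, 2, 5]` (three rows, the second one empty), the row numbers are `[0, 0, 2, 2, 2]`, and a captured column `[10, 20, 30]` becomes `[10, 10, 30, 30, 30]`.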
Are these changes tested?
sqllogictests for lambda capture and CaseWhen
unit tests for list_values_row_number
Are there any user-facing changes?
This adds breaking changes to unreleased items only.