Describe the enhancement requested
This is follow-up work to GH-33985 / PR #34834 now that Substrait can represent unresolved / partially bound expressions (see substrait-io/substrait#515).
Arrow can currently deserialize bound Substrait ExtendedExpression messages, but it cannot yet consume unresolved expressions that contain:
- Expression.NamedExpression
- Type.Unknown
- unresolved function signatures such as add:unknown_unknown
To support front-end filter / projection workflows, Arrow should be able to deserialize these messages using a supplied Arrow schema, bind unresolved names and types against that schema, and then return normal Arrow compute expressions.
Concretely, this means:
- binding NamedExpression to Arrow FieldRef
- treating Type.Unknown as a bind-time placeholder instead of an executable Arrow type
- allowing schema-aware deserialization of ExtendedExpression
- exposing that path in both C++ and Python APIs
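To make the binding step concrete, here is a minimal, self-contained Python sketch of the idea. It does not use Arrow or Substrait. The `NamedRef`, `FieldRef`, `Call`, and `bind` names are hypothetical, the schema is a plain dict of column name to a short type name, and the "result type equals the common argument type" rule is a toy stand-in for Arrow's real function dispatch.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

UNKNOWN = "unknown"  # toy stand-in for Substrait's Type.Unknown


@dataclass(frozen=True)
class NamedRef:
    """Hypothetical unresolved reference, analogous to NamedExpression."""
    name: str


@dataclass(frozen=True)
class FieldRef:
    """Hypothetical bound reference: a column index plus its resolved type."""
    index: int
    type: str


@dataclass(frozen=True)
class Call:
    """Hypothetical function call, e.g. signature 'add:unknown_unknown'."""
    signature: str
    args: Tuple["Expr", ...]
    type: str = UNKNOWN


Expr = Union[NamedRef, FieldRef, Call]


def bind(expr: Expr, schema: Dict[str, str]) -> Expr:
    """Resolve names and unknown types in expr against schema."""
    if isinstance(expr, NamedRef):
        if expr.name not in schema:
            raise KeyError(f"no field named {expr.name!r} in schema")
        # Dicts preserve insertion order, so the position is the column index.
        return FieldRef(list(schema).index(expr.name), schema[expr.name])
    if isinstance(expr, FieldRef):
        return expr  # already bound
    if isinstance(expr, Call):
        args = tuple(bind(arg, schema) for arg in expr.args)
        arg_types = [arg.type for arg in args]
        # Unknown is only a placeholder; it must not survive binding.
        if UNKNOWN in arg_types:
            raise TypeError(f"unresolved argument type in {expr.signature!r}")
        # Toy rule (assumption): all arguments share one type, which is
        # also the result type.
        if len(set(arg_types)) != 1:
            raise TypeError(f"mixed argument types {arg_types}")
        func_name = expr.signature.split(":", 1)[0]
        return Call(f"{func_name}:{'_'.join(arg_types)}", args, arg_types[0])
    raise TypeError(f"unsupported expression node: {expr!r}")


schema = {"x": "i64", "y": "i64"}
unbound = Call("add:unknown_unknown", (NamedRef("x"), NamedRef("y")))
bound = bind(unbound, schema)
```

In this sketch, binding rewrites the signature from add:unknown_unknown to add:i64_i64 and replaces each name with an index-based reference. A real implementation would resolve the function through Arrow's compute registry rather than by string rewriting.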
The expected API shape is something like:
- C++: DeserializeExpressions(buf, input_schema, ...)
- Python:
  - pyarrow.substrait.deserialize_expressions(buf, schema=...)
  - pyarrow.substrait.BoundExpressions.from_substrait(..., schema=...)
  - pyarrow.compute.Expression.from_substrait(..., schema=...)
This work depends on the Substrait protocol change in substrait-io/substrait#515.
Component(s)
C++, Python