[C++][Python] Bind unresolved Substrait expressions using a supplied schema #49885

@malinjawi

Describe the enhancement requested

This is follow-up work to GH-33985 / PR #34834 now that Substrait can represent unresolved / partially bound expressions (see substrait-io/substrait#515).

Arrow can currently deserialize bound Substrait ExtendedExpression messages, but it cannot yet consume unresolved expressions that contain:

  • Expression.NamedExpression
  • Type.Unknown
  • unresolved function signatures such as add:unknown_unknown

To support front-end filter / projection workflows, Arrow should be able to deserialize these messages using a supplied Arrow schema, bind unresolved names and types against that schema, and then return normal Arrow compute expressions.

Concretely, this means:

  • binding NamedExpression to Arrow FieldRef
  • treating Type.Unknown as a bind-time placeholder instead of an executable Arrow type
  • allowing schema-aware deserialization of ExtendedExpression
  • exposing that path in both C++ and Python APIs
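As a rough illustration of the binding step described above, here is a minimal pure-Python sketch. It does not use Arrow or Substrait: `NamedExpression`, `Call`, `BoundField`, `BoundCall`, and `bind` are hypothetical stand-ins, the schema is a plain list of `(name, type)` pairs, and the rule that a call's result type equals its first argument's type is a simplifying assumption, not how Arrow resolves kernels.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for illustration only; these are NOT Arrow or
# Substrait classes.

@dataclass(frozen=True)
class NamedExpression:
    names: tuple  # column path, e.g. ("x",)


@dataclass(frozen=True)
class Call:
    signature: str  # unresolved, e.g. "add:unknown_unknown"
    args: tuple


@dataclass(frozen=True)
class BoundField:
    index: int  # position of the field in the supplied schema
    type: str   # concrete type taken from the schema


@dataclass(frozen=True)
class BoundCall:
    signature: str  # resolved, e.g. "add:i64_i64"
    args: tuple
    type: str


def bind(expr, schema):
    """Bind an unresolved expression against a list of (name, type) pairs."""
    if isinstance(expr, NamedExpression):
        name = expr.names[0]  # nested field paths are omitted in this sketch
        for index, (field_name, field_type) in enumerate(schema):
            if field_name == name:
                return BoundField(index, field_type)
        raise KeyError(f"no field named {name!r} in schema")
    if isinstance(expr, Call):
        # Bind the arguments first so that their concrete types are known.
        args = tuple(bind(arg, schema) for arg in expr.args)
        func = expr.signature.split(":", 1)[0]
        # Replace the "unknown" placeholders with the bound argument types.
        resolved = func + ":" + "_".join(arg.type for arg in args)
        # Assumption: the result type is the type of the first argument.
        return BoundCall(resolved, args, args[0].type)
    raise TypeError(f"cannot bind {type(expr).__name__}")


schema = [("x", "i64"), ("y", "i64")]
unresolved = Call(
    "add:unknown_unknown",
    (NamedExpression(("x",)), NamedExpression(("y",))),
)
bound = bind(unresolved, schema)
print(bound.signature)
```

In this toy model the unknown types never reach the bound tree: each one is replaced by a schema type during binding, which mirrors the idea of treating Type.Unknown as a bind-time placeholder rather than an executable type.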

The expected API shape is something like:

  • C++: DeserializeExpressions(buf, input_schema, ...)
  • Python:
    • pyarrow.substrait.deserialize_expressions(buf, schema=...)
    • pyarrow.substrait.BoundExpressions.from_substrait(..., schema=...)
    • pyarrow.compute.Expression.from_substrait(..., schema=...)

This work depends on the Substrait protocol change in substrait-io/substrait#515.

Component(s)

C++, Python
