During my work, I've discovered an "impedance mismatch" between GenStage and Ecto. Specifically, we drive a lot of work using Ecto.Stream and its lower-level equivalent:

    alias Ecto.Adapters.SQL

    # We have a complicated query that produces millions of rows.
    q = "SELECT n FROM generate_series(1, 1000000) n;"
    chunks = SQL.stream(My.Repo, q, [], log: false)

    # Flatten the result chunks into a single stream of rows.
    stream = Stream.flat_map(chunks, fn
      %{num_rows: 0} ->
        []

      %{rows: rows} ->
        Enum.map(rows, fn row -> ... end)
    end)

    # We use those rows to drive a bunch of (parallelizable) work.
    {:ok, producer} = GenStage.from_enumerable(stream, ...)
    :ok = GenStage.async_subscribe(self(), to: producer, ...)

We use the above pattern a lot. Unfortunately, Ecto requires streams to be run inside a transaction, thus making GenStage.from_enumerable/2 unusable.
To work around this, we spawn a forwarding process that sends chunks of events whenever our GenStage producer requests them:
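
    # Block until the producer asks for more, then hand over one chunk.
    forward_upon_request = fn chunk ->
      receive do
        :more ->
          send(..., {:supply, chunk})
      end
    end

    # The stream is consumed inside the transaction, one chunk per request.
    My.Repo.transaction(fn ->
      stream
      |> Stream.chunk_every(n)
      |> Stream.each(forward_upon_request)
      |> Stream.run()
    end)
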
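For reference, here is a minimal, self-contained sketch of the request/forward handshake, using plain processes and a range in place of the GenStage producer and the query stream. The module name `ForwardSketch`, its functions, and the `{:more, from}` / `{:supply, pid, chunk}` / `{:done, pid}` message shapes are assumptions made up for illustration; they are not part of our actual code, GenStage, or Ecto.

```elixir
defmodule ForwardSketch do
  # Hypothetical helper: starts a process that walks `enumerable` in chunks
  # of `size` and hands over exactly one chunk per {:more, from} request.
  def start_forwarder(enumerable, size) do
    spawn_link(fn ->
      # In the real setup, this pipeline would run inside a Repo transaction.
      enumerable
      |> Stream.chunk_every(size)
      |> Stream.each(&forward_upon_request/1)
      |> Stream.run()

      # The stream is exhausted: answer one more request with :done so the
      # requester does not block forever, then exit.
      receive do
        {:more, from} -> send(from, {:done, self()})
      end
    end)
  end

  # Blocks until a request arrives, then replies with the chunk.
  defp forward_upon_request(chunk) do
    receive do
      {:more, from} -> send(from, {:supply, self(), chunk})
    end
  end

  # Requester side: ask for one chunk and wait for the reply.
  # Must not be called again after it has returned :done.
  def next_chunk(forwarder) do
    send(forwarder, {:more, self()})

    receive do
      {:supply, ^forwarder, chunk} -> {:ok, chunk}
      {:done, ^forwarder} -> :done
    end
  end
end
```

In a producer, the request would presumably be sent when demand arrives and the chunk emitted once the `{:supply, ...}` message comes in, rather than blocking as `next_chunk/1` does here.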
We also tried to write our own producer that reduces streams in a transaction in a similar fashion to GenStage.Streamer. It didn't work because – as far as I could tell – the continuations reuse the connection from the first transaction?
While the aforementioned hack works, it is sub-optimal.
Do you see any way GenStage.Streamer can support such a use case through some sort of generalized functionality?
If not, should Ecto or another library provide a GenStage producer that produces events from a query?