
release GVL while encoding / decoding tokens#90

Merged
gjtorikian merged 4 commits into IAPark:main from tenderworks:release-gvl
Dec 19, 2025

Conversation

@tenderworks

Encoding / decoding tokens is a CPU-intensive task. Since tiktoken doesn't call back into Ruby while encoding / decoding, it should be safe for us to release the GVL while doing it. This allows us to process tokens in parallel in Ruby.

For example:

```ruby
require "tiktoken_ruby"
require "benchmark"

# Generate a large text to tokenize
LARGE_TEXT = ("Hello, world! This is a test of the tiktoken library. " * 1000).freeze
THREAD_COUNT = 4
ITERATIONS = 10

encoder = Tiktoken.encoding_for_model("gpt-4")

# Single-threaded baseline
single_threaded_time = Benchmark.realtime do
  (THREAD_COUNT * ITERATIONS).times do
    encoder.encode(LARGE_TEXT)
  end
end

# Multi-threaded with GVL release
multi_threaded_time = Benchmark.realtime do
  threads = THREAD_COUNT.times.map do
    Thread.new do
      ITERATIONS.times do
        encoder.encode(LARGE_TEXT)
      end
    end
  end
  threads.each(&:join)
end

p SPEEDUP: (single_threaded_time / multi_threaded_time.to_f)
```

On the main branch we see no speedup (the output is 1.0). On this branch, I see a 3.4x speedup (though I think as the input grows in size, we'll see numbers even closer to the theoretical max, which would be THREAD_COUNT).

The main thing is that we don't want to allocate Ruby objects (or call out to the VM) inside the callback that runs without the GVL. I think since Magnus handles the Ruby object conversion after we've returned, this should be safe.
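To make the pattern above concrete, here is a minimal Rust sketch of the idea: every Ruby value is copied into a plain Rust context struct *before* the GVL is released, the CPU-bound callback touches only that struct, and conversion back to Ruby happens after it returns. `EncodeData` and `encode_without_gvl` are illustrative names, not tiktoken_ruby's actual API; the real extension would hand such a context to `rb_thread_call_without_gvl` via rb_sys.

```rust
// Context struct: only owned Rust types, no Ruby VALUEs.
struct EncodeData {
    text: String,        // already copied out of the Ruby String
    tokens: Vec<usize>,  // filled in while the GVL is released
}

// Stand-in for the no-GVL callback: it may only read and write plain Rust
// data -- no Ruby allocation, no VM calls. (Here it just maps each word to
// its length as a fake "tokenizer".)
fn encode_without_gvl(data: &mut EncodeData) {
    data.tokens = data.text.split_whitespace().map(|w| w.len()).collect();
}

fn main() {
    let mut data = EncodeData {
        text: "hello gvl free world".to_string(),
        tokens: Vec::new(),
    };
    encode_without_gvl(&mut data);
    // Only after the callback returns would magnus convert `tokens` back
    // into a Ruby Array, with the GVL held again.
    println!("{:?}", data.tokens); // [5, 3, 4, 5]
}
```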

tenderlove and others added 3 commits December 9, 2025 14:24
@gjtorikian
Collaborator

👋

I've had bad luck with the GC marking things away in the past, e.g. gjtorikian/selma#81

I added a pretty basic stress test to this PR. Would you mind reviewing it and confirming that it's somewhat useful in verifying that nothing terrible is going to happen with this change?

I also ended up exposing encode_with_special_tokens in this PR, sorry for the additional diff noise.

@tenderlove
Contributor

> 👋

👋 Good to see you! (online)

> I've had bad luck with the GC marking things away in the past, e.g. gjtorikian/selma#81
>
> I added a pretty basic stress test to this PR. Would you mind reviewing it and confirming that it's somewhat useful in verifying that nothing terrible is going to happen with this change?

The tests make sense, and should work. I'm personally not worried about GC in this case. It looks like we're converting all Ruby types into Rust types before encoding them in the context that's passed to the GVL release routines. Before we merge this though, give me a bit of time to review exactly how magnus handles Ruby type unboxing. I'm less worried about objects being "freed" and more worried about objects being allocated in the no-GVL routines (you're not allowed to allocate Ruby objects while the GVL is released).

The other thing I'm slightly worried about is that it looks like these CoreBPE contexts are typically singleton objects. Do we know if the underlying data structures are thread-safe? Is it legit to call encode_ordinary on the same context in multiple threads? If you know, please let me know. Otherwise I'm going to poke at what the Python implementation does and follow up.

> I also ended up exposing encode_with_special_tokens in this PR, sorry for the additional diff noise.

No problem! Thank you!

@gjtorikian
Collaborator

> The other thing I'm slightly worried about is that it looks like these CoreBPE contexts are typically singleton objects. Do we know if the underlying data structures are threadsafe? Is it legit to call encode_ordinary on the same context in multiple threads? If you know, please let me know. Otherwise I'm going to poke at what the Python implementation does and follow up.

I think this isn't a problem, since the CoreBPE struct is Send + Sync.

Otherwise, there are some comments in the source that seem to hint at multi-threaded use working in Python, with the GIL released. I may be misreading this though.
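The Send + Sync point can be illustrated with a small std-only sketch: a struct holding only owned, immutable data is automatically Send + Sync, so many threads can call a read-only method on one shared instance through an `Arc`, which is exactly the shape of concurrent `encode_ordinary` calls on a singleton encoder. `ToyBpe` below is a made-up stand-in, not tiktoken's CoreBPE.

```rust
use std::sync::Arc;
use std::thread;

// Minimal stand-in for CoreBPE: an immutable rank table that is only read.
// Holding only owned, immutable data makes it automatically Send + Sync.
struct ToyBpe {
    ranks: Vec<(&'static str, usize)>,
}

impl ToyBpe {
    // Read-only lookup, analogous to calling encode_ordinary concurrently.
    fn encode_ordinary(&self, text: &str) -> Vec<usize> {
        text.split_whitespace()
            .filter_map(|w| self.ranks.iter().find(|&&(t, _)| t == w).map(|&(_, r)| r))
            .collect()
    }
}

fn main() {
    let bpe = Arc::new(ToyBpe { ranks: vec![("hello", 0), ("world", 1)] });
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let bpe = Arc::clone(&bpe);
            thread::spawn(move || bpe.encode_ordinary("hello world"))
        })
        .collect();
    for h in handles {
        assert_eq!(h.join().unwrap(), vec![0, 1]);
    }
    println!("all threads agree");
}
```

The compiler enforces this: if `ToyBpe` held something non-Sync (say, a `RefCell`), the `thread::spawn` call would fail to compile rather than race at runtime.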

@tenderlove
Contributor

> I think this isn't a problem, since the CoreBPE struct is Send + Sync.
>
> Otherwise, there are some comments in the source that seem to hint at multi-threadedness working in Python, releasing their GIL. I may be misreading this though.

Ah, great. Yes, this should work totally fine then.

> Before we merge this though, give me a bit of time to review exactly how magnus handles Ruby type unboxing.

Ok, I think everything is good. All of the context data types (EncodeOrdinaryData, etc) only refer to Rust types, and the context structs are the only thing we pass to the non-GVL callbacks. I think this should be safe to merge.

Thanks for your patience!!

@gjtorikian
Collaborator

Thanks for your contribution! Release should be going out later today. I'm going to use this update as an excuse to finally sort out #79.

@gjtorikian gjtorikian merged commit 9cd9879 into IAPark:main Dec 19, 2025
5 checks passed
