release GVL while encoding / decoding tokens #90
Conversation
Encoding / decoding tokens is a CPU-intensive task. Since tiktoken
doesn't call back into Ruby while encoding / decoding, it should be safe
for us to release the GVL while doing it. This lets us process
tokens in parallel in Ruby.
For example:
```ruby
require "tiktoken_ruby"
require "benchmark"
# Generate a large text to tokenize
LARGE_TEXT = ("Hello, world! This is a test of the tiktoken library. " * 1000).freeze
THREAD_COUNT = 4
ITERATIONS = 10
encoder = Tiktoken.encoding_for_model("gpt-4")
# Single-threaded baseline
single_threaded_time = Benchmark.realtime do
  (THREAD_COUNT * ITERATIONS).times do
    encoder.encode(LARGE_TEXT)
  end
end

# Multi-threaded with GVL release
multi_threaded_time = Benchmark.realtime do
  threads = THREAD_COUNT.times.map do
    Thread.new do
      ITERATIONS.times do
        encoder.encode(LARGE_TEXT)
      end
    end
  end
  threads.each(&:join)
end

puts "Speedup: #{single_threaded_time / multi_threaded_time}"
```
On the main branch we see no speedup (the output is 1.0). On this
branch, I see a 3.4x speedup, though I think as the input grows in size
we'll see numbers even closer to the theoretical max, which would be
THREAD_COUNT.
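The gap between 3.4x and the theoretical max of THREAD_COUNT can be framed with Amdahl's law: whatever fraction of each `encode` call still holds the GVL (argument and result conversion, for instance) caps the achievable speedup. A quick sketch with made-up serial fractions (illustrative numbers, not measurements from this PR):

```ruby
# Amdahl's-law estimate: if a fraction `serial` of each encode still holds
# the GVL, the best possible speedup on `n` threads is bounded as below.
def max_speedup(serial, n)
  1.0 / (serial + (1.0 - serial) / n)
end

puts max_speedup(0.05, 4).round(2) # ~3.48 with 5% of the work serialized
puts max_speedup(0.01, 4).round(2) # ~3.88 as inputs grow and conversion shrinks
```

As the input text grows, the fixed conversion cost becomes a smaller fraction of each call, which is consistent with the expectation that larger inputs approach the THREAD_COUNT ceiling.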
👋 I've had bad luck with the GC sweeping things away in the past (e.g. gjtorikian/selma#81), so I added a pretty basic stress test to this PR. Would you mind reviewing it and confirming that it's somewhat useful in verifying that nothing terrible is going to happen with this change? I also ended up exposing …
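For reference, the general shape of such a stress test is below. This is a sketch, not the PR's actual test: it uses a hypothetical pure-Ruby `StubEncoder` in place of the real native encoder, and `GC.stress` to force collections as aggressively as possible while threads are encoding:

```ruby
# Hypothetical stand-in for the native encoder; the PR's real stress test
# would exercise Tiktoken.encoding_for_model("gpt-4") instead.
class StubEncoder
  def encode(text)
    text.bytes
  end
end

encoder = StubEncoder.new
expected = "Hello, world!".bytes

GC.stress = true # run the GC at every allocation opportunity
begin
  results = 2.times.map do
    Thread.new do
      25.times.map { encoder.encode("Hello, world!") }
    end
  end.flat_map(&:value) # Thread#value joins and returns each block's result
ensure
  GC.stress = false
end

# If the GC collected something it shouldn't have, token arrays would be corrupted.
corrupted = results.reject { |tokens| tokens == expected }
puts "stress test: #{results.size} encodes, #{corrupted.size} corrupted"
```

The idea is simply to maximize the chance that a collection runs while encoded data is in flight, so that a missing mark or a prematurely freed buffer shows up as corrupted output or a crash rather than passing silently.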
👋 Good to see you!
The tests make sense, and should work. I'm personally not worried about GC in this case. It looks like we're converting all Ruby types into Rust types before encoding them in the context that's passed to the GVL release routines. Before we merge this though, give me a bit of time to review exactly how …

The other thing I'm slightly worried about is that it looks like these CoreBPE contexts are typically singleton objects. Do we know if the underlying data structures are threadsafe? Is it legit to call …
No problem! Thank you!
I think this isn't a problem, since the …

Otherwise, there are some comments in the source that seem to hint at multi-threadedness working in Python, where they release the GIL. I may be misreading this though.
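The safety argument being made here is that the shared CoreBPE state is built once and then only ever read. As a Ruby-level analogy (the real data structures live in Rust, so this only illustrates the access pattern), concurrent readers of a frozen lookup table never race, so every thread sees the same answers:

```ruby
# Ruby-level analogy for a read-only singleton shared across threads:
# the table is built once, frozen, then only read.
RANKS = { "He" => 1, "ll" => 2, "o," => 3, " w" => 4, "or" => 5 }.freeze

def lookup_all(table, keys)
  keys.map { |k| table.fetch(k, -1) }
end

keys = RANKS.keys * 100
serial = lookup_all(RANKS, keys)

# Four concurrent readers of the shared frozen table.
concurrent = 4.times.map do
  Thread.new { lookup_all(RANKS, keys) }
end.map(&:value)

puts concurrent.all? { |r| r == serial } # concurrent reads match the serial result
```

If the underlying structures were mutated during encoding, this reasoning would not hold; the read-only property is what makes calling into the shared singleton from multiple GVL-released threads legitimate.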
Ah, great. Yes, this should work totally fine then.
Ok, I think everything is good. All of the context data types (…).

Thanks for your patience!!
Thanks for your contribution! Release should be going out later today. I'm going to use this update as an excuse to finally sort out #79.
The main thing is that we don't want to allocate Ruby objects (or call out to the VM) inside our GVL callback. I think since Magnus is handling the Ruby object conversion after we've returned, this should be safe.