Skip to content

Simplify threading: remove the use of inner_pool.#167

Merged
copybara-service[bot] merged 1 commit intogoogle:devfrom
szabadka:gemma2
Apr 30, 2024
Merged

Simplify threading: remove the use of inner_pool.#167
copybara-service[bot] merged 1 commit intogoogle:devfrom
szabadka:gemma2

Conversation

@szabadka
Copy link
Copy Markdown
Collaborator

We only used inner_pool in the prefill FFW function, and there we can achieve sufficient parallelism on the rows of the matrix-vector multiplications.

Benchmark results on a 1600-token summarization task:

               Prefill speed
Num threads    BEFORE         AFTER
4               9.24 t/s       9.76 t/s
18             31.41 t/s      31.16 t/s
32             31.41 t/s      45.13 t/s
64             31.03 t/s      57.85 t/s

We only used inner_pool in the prefill FFW function, and there we
can achieve sufficient parallelism on the rows of the matrix-vector
multiplications.

Benchmark results on a 1600-token summarization task:

```
               Prefill speed
Num threads    BEFORE         AFTER
4               9.24 t/s       9.76 t/s
18             31.41 t/s      31.16 t/s
32             31.41 t/s      45.13 t/s
64             31.03 t/s      57.85 t/s
```
Copy link
Copy Markdown
Member

@jan-wassenberg jan-wassenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplification, thank you @szabadka !
Was talking today with Austin about using matmul in Prefill, that will also help speed this up.

Unrelatedly, what do you think of moving some of the function call arguments into a struct, for example AuxOut(layers_output) and/or Sampling(temp/gen)?

@jan-wassenberg jan-wassenberg added the copybara-import Trigger Copybara for merging pull requests label Apr 29, 2024
@szabadka
Copy link
Copy Markdown
Collaborator Author

Unrelatedly, what do you think of moving some of the function call arguments into a struct, for example AuxOut(layers_output) and/or Sampling(temp/gen)?

Yes, that could be useful, these functions have quite a few arguments.

@jan-wassenberg
Copy link
Copy Markdown
Member

Will do :)

@copybara-service copybara-service Bot merged commit befe9fb into google:dev Apr 30, 2024
@szabadka szabadka deleted the gemma2 branch April 30, 2024 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

copybara-import Trigger Copybara for merging pull requests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants