As support for new model architectures is added, we are starting to see a lot of repeating patterns in the code that builds their compute graphs. We should find a way to refactor and reuse the repetitive code. We should also consider splitting the implementation into separate source files if necessary.
https://github.com/ggerganov/llama.cpp/blob/0e76a8992c8200237bbc6471a53fb8796b3872f7/llama.cpp#L3997-L4026
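One possible direction, as a rough sketch only: pull the steps that repeat across architectures into small shared helpers and let each architecture pass in the options that differ. Everything below is hypothetical and self-contained. `toy_graph`, `toy_layer_params`, `build_norm`, `build_attn` and `build_model` are made-up names for illustration, not existing llama.cpp or ggml API, and the "graph" just records op names instead of creating tensors.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a compute graph: it only records op names in order.
struct toy_graph {
    std::vector<std::string> ops;

    // Append an op and return its index, which plays the role of a tensor handle.
    int add(const std::string & op) {
        ops.push_back(op);
        return (int) ops.size() - 1;
    }
};

// Hypothetical per-architecture options (the parts that differ between models).
struct toy_layer_params {
    bool norm_has_bias; // e.g. a LayerNorm-style norm with bias vs. one without
    bool use_rope;      // whether rotary position embedding is applied
};

// Shared helper: normalization followed by the learned scale (and optional bias).
static int build_norm(toy_graph & g, int cur, bool has_bias) {
    assert(cur >= 0);
    cur = g.add("norm");
    cur = g.add("mul_weight");
    if (has_bias) {
        cur = g.add("add_bias");
    }
    return cur;
}

// Shared helper: attention block, with RoPE applied only when requested.
static int build_attn(toy_graph & g, int cur, bool use_rope) {
    assert(cur >= 0);
    g.add("qkv_proj");
    if (use_rope) {
        g.add("rope");
    }
    return g.add("attn_out");
}

// One builder serves several architectures by composing the shared helpers.
static toy_graph build_model(const toy_layer_params & p, int n_layer) {
    toy_graph g;
    int cur = g.add("inp_embd");
    for (int il = 0; il < n_layer; ++il) {
        cur = build_norm(g, cur, p.norm_has_bias);
        cur = build_attn(g, cur, p.use_rope);
    }
    build_norm(g, cur, p.norm_has_bias); // final output norm
    return g;
}
```

With this shape, an architecture with RoPE and no norm bias would be `build_model({false, true}, n_layer)`, and one with norm bias and no RoPE would be `build_model({true, false}, n_layer)`. The helpers could also live in their own source file, separate from the per-architecture builders.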
Open to ideas and suggestions