Refactor inference into dedicated crates for llama and gemma (not integrated)
apply_cached_repeat_penalty