warning: unused import: `Config as Config1`
 --> crates/inference-engine/src/model.rs:2:42
  |
2 | use candle_transformers::models::gemma::{Config as Config1, Model as Model1};
  |                                          ^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused import: `Config as Config2`
 --> crates/inference-engine/src/model.rs:3:43
  |
3 | use candle_transformers::models::gemma2::{Config as Config2, Model as Model2};
  |                                           ^^^^^^^^^^^^^^^^^

warning: unused import: `Config as Config3`
 --> crates/inference-engine/src/model.rs:4:43
  |
4 | use candle_transformers::models::gemma3::{Config as Config3, Model as Model3};
  |                                           ^^^^^^^^^^^^^^^^^

warning: unused import: `self`
  --> crates/inference-engine/src/server.rs:10:28
   |
10 | use futures_util::stream::{self, Stream};
   |                            ^^^^

warning: `inference-engine` (lib) generated 4 warnings (run `cargo fix --lib -p inference-engine` to apply 4 suggestions)
    Finished `release` profile [optimized] target(s) in 0.13s
     Running `target/release/predict-otron-9000`
avx: false, neon: true, simd128: false, f16c: false
2025-08-28T00:34:39.293635Z  INFO hf_hub: Using token file found "/Users/williamseemueller/.cache/huggingface/token"
retrieved the files in 295.458µs
2025-08-28T00:34:39.294536Z  INFO predict_otron_9000::middleware::metrics: Performance metrics summary:
2025-08-28T00:34:40.507474Z  INFO predict_otron_9000: Unified predict-otron-9000 server listening on 127.0.0.1:8080
2025-08-28T00:34:40.507503Z  INFO predict_otron_9000: Performance metrics tracking enabled - summary logs every 60 seconds
2025-08-28T00:34:40.507508Z  INFO predict_otron_9000: Available endpoints:
2025-08-28T00:34:40.507512Z  INFO predict_otron_9000: GET / - Root endpoint from embeddings-engine
2025-08-28T00:34:40.507515Z  INFO predict_otron_9000: POST /v1/embeddings - Text embeddings
2025-08-28T00:34:40.507517Z  INFO predict_otron_9000: POST /v1/chat/completions - Chat completions
2025-08-28T00:34:52.313606Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2025-08-28T00:34:52.313671Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: inference_engine::server: Formatted prompt: <start_of_turn>user
You are a helpful assistant who responds thoughtfully and concisely.
Write a paragraph about dogs<end_of_turn>
<start_of_turn>model
2025-08-28T00:34:52.313693Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: predict_otron_9000::middleware::metrics: POST /v1/chat/completions 200 OK - 0 ms
2025-08-28T00:34:52.313709Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=0 ms status=200
2025-08-28T00:34:52.313763Z DEBUG inference_engine::text_generation: Cleared penalty cache for new generation (streaming mode)
2025-08-28T00:34:52.313985Z DEBUG inference_engine::text_generation: Streaming Tokenization completed in 217.04µs
2025-08-28T00:34:52.313990Z DEBUG inference_engine::text_generation: Streaming Input tokens: 26
2025-08-28T00:34:52.340937Z DEBUG inference_engine::text_generation: Using special generation approach for gemma-2/gemma-3 models (streaming)
2025-08-28T00:34:52.602691Z DEBUG inference_engine::server: Streaming token: 'Dogs'
2025-08-28T00:34:52.602718Z DEBUG inference_engine::server: Sending chunk with content: 'Dogs'
2025-08-28T00:34:52.769918Z DEBUG inference_engine::server: Streaming token: ' have'
2025-08-28T00:34:52.769949Z DEBUG inference_engine::server: Sending chunk with content: ' have'
2025-08-28T00:34:52.905947Z DEBUG inference_engine::server: Streaming token: ' captivated'
2025-08-28T00:34:52.905977Z DEBUG inference_engine::server: Sending chunk with content: ' captivated'
2025-08-28T00:34:53.040888Z DEBUG inference_engine::server: Streaming token: ' humans'
2025-08-28T00:34:53.040921Z DEBUG inference_engine::server: Sending chunk with content: ' humans'
2025-08-28T00:34:53.177116Z DEBUG inference_engine::server: Streaming token: ' for'
2025-08-28T00:34:53.177145Z DEBUG inference_engine::server: Sending chunk with content: ' for'
2025-08-28T00:34:53.313887Z DEBUG inference_engine::server: Streaming token: ' millennia'
2025-08-28T00:34:53.313920Z DEBUG inference_engine::server: Sending chunk with content: ' millennia'
2025-08-28T00:34:53.444031Z DEBUG inference_engine::server: Streaming token: ','
2025-08-28T00:34:53.444060Z DEBUG inference_engine::server: Sending chunk with content: ','
2025-08-28T00:34:53.571919Z DEBUG inference_engine::server: Streaming token: ' evolving'
2025-08-28T00:34:53.571951Z DEBUG inference_engine::server: Sending chunk with content: ' evolving'
2025-08-28T00:34:53.699811Z DEBUG inference_engine::server: Streaming token: ' from'
2025-08-28T00:34:53.699852Z DEBUG inference_engine::server: Sending chunk with content: ' from'
2025-08-28T00:34:53.828082Z DEBUG inference_engine::server: Streaming token: ' wolves'
2025-08-28T00:34:53.828111Z DEBUG inference_engine::server: Sending chunk with content: ' wolves'
2025-08-28T00:34:53.957276Z DEBUG inference_engine::server: Streaming token: ' to'
2025-08-28T00:34:53.957313Z DEBUG inference_engine::server: Sending chunk with content: ' to'
2025-08-28T00:34:54.093248Z DEBUG inference_engine::server: Streaming token: ' beloved'
2025-08-28T00:34:54.093284Z DEBUG inference_engine::server: Sending chunk with content: ' beloved'
2025-08-28T00:34:54.228357Z DEBUG inference_engine::server: Streaming token: ' companions'
2025-08-28T00:34:54.228385Z DEBUG inference_engine::server: Sending chunk with content: ' companions'
2025-08-28T00:34:54.356315Z DEBUG inference_engine::server: Streaming token: ' offering'
2025-08-28T00:34:54.356349Z DEBUG inference_engine::server: Sending chunk with content: ' offering'
2025-08-28T00:34:54.484051Z DEBUG inference_engine::server: Streaming token: ' unwavering'
2025-08-28T00:34:54.484085Z DEBUG inference_engine::server: Sending chunk with content: ' unwavering'
2025-08-28T00:34:54.613022Z DEBUG inference_engine::server: Streaming token: ' loyalty'
2025-08-28T00:34:54.613061Z DEBUG inference_engine::server: Sending chunk with content: ' loyalty'
2025-08-28T00:34:54.742024Z DEBUG inference_engine::server: Streaming token: ' alongside'
2025-08-28T00:34:54.742043Z DEBUG inference_engine::server: Sending chunk with content: ' alongside'
2025-08-28T00:34:54.869804Z DEBUG inference_engine::server: Streaming token: ' boundless'
2025-08-28T00:34:54.869829Z DEBUG inference_engine::server: Sending chunk with content: ' boundless'
2025-08-28T00:34:54.998140Z DEBUG inference_engine::server: Streaming token: ' affection'
2025-08-28T00:34:54.998165Z DEBUG inference_engine::server: Sending chunk with content: ' affection'
2025-08-28T00:34:55.126560Z DEBUG inference_engine::server: Streaming token: ' –'
2025-08-28T00:34:55.126582Z DEBUG inference_engine::server: Sending chunk with content: ' –'
2025-08-28T00:34:55.255214Z DEBUG inference_engine::server: Streaming token: ' often'
2025-08-28T00:34:55.255232Z DEBUG inference_engine::server: Sending chunk with content: ' often'
2025-08-28T00:34:55.383529Z DEBUG inference_engine::server: Streaming token: ' fueled'
2025-08-28T00:34:55.383551Z DEBUG inference_engine::server: Sending chunk with content: ' fueled'
2025-08-28T00:34:55.511437Z DEBUG inference_engine::server: Streaming token: ' by'
2025-08-28T00:34:55.511456Z DEBUG inference_engine::server: Sending chunk with content: ' by'
2025-08-28T00:34:55.639748Z DEBUG inference_engine::server: Streaming token: ' their'
2025-08-28T00:34:55.639768Z DEBUG inference_engine::server: Sending chunk with content: ' their'
2025-08-28T00:34:55.767723Z DEBUG inference_engine::server: Streaming token: ' incredible'
2025-08-28T00:34:55.767741Z DEBUG inference_engine::server: Sending chunk with content: ' incredible'
2025-08-28T00:34:55.895796Z DEBUG inference_engine::server: Streaming token: ' ability'
2025-08-28T00:34:55.895817Z DEBUG inference_engine::server: Sending chunk with content: ' ability'
2025-08-28T00:34:56.025191Z DEBUG inference_engine::server: Streaming token: ' at'
2025-08-28T00:34:56.025219Z DEBUG inference_engine::server: Sending chunk with content: ' at'
2025-08-28T00:34:56.153604Z DEBUG inference_engine::server: Streaming token: ' understanding'
2025-08-28T00:34:56.153626Z DEBUG inference_engine::server: Sending chunk with content: ' understanding'
2025-08-28T00:34:56.282571Z DEBUG inference_engine::server: Streaming token: ' human'
2025-08-28T00:34:56.282590Z DEBUG inference_engine::server: Sending chunk with content: ' human'
2025-08-28T00:34:56.411224Z DEBUG inference_engine::server: Streaming token: ' emotion'
2025-08-28T00:34:56.411247Z DEBUG inference_engine::server: Sending chunk with content: ' emotion'
2025-08-28T00:34:56.540028Z DEBUG inference_engine::server: Streaming token: ' through'
2025-08-28T00:34:56.540050Z DEBUG inference_engine::server: Sending chunk with content: ' through'
2025-08-28T00:34:56.668612Z DEBUG inference_engine::server: Streaming token: ' subtle'
2025-08-28T00:34:56.668630Z DEBUG inference_engine::server: Sending chunk with content: ' subtle'
2025-08-28T00:34:56.797698Z DEBUG inference_engine::server: Streaming token: ' cues'
2025-08-28T00:34:56.797716Z DEBUG inference_engine::server: Sending chunk with content: ' cues'
2025-08-28T00:34:56.927032Z DEBUG inference_engine::server: Streaming token: '!'
2025-08-28T00:34:56.927054Z DEBUG inference_engine::server: Sending chunk with content: '!'
2025-08-28T00:34:57.054903Z DEBUG inference_engine::server: Streaming token: ' Beyond'
2025-08-28T00:34:57.054922Z DEBUG inference_engine::server: Sending chunk with content: ' Beyond'
2025-08-28T00:34:57.183890Z DEBUG inference_engine::server: Streaming token: ' companionship'
2025-08-28T00:34:57.183914Z DEBUG inference_engine::server: Sending chunk with content: ' companionship'
2025-08-28T00:34:57.313258Z DEBUG inference_engine::server: Streaming token: ' they'
2025-08-28T00:34:57.313278Z DEBUG inference_engine::server: Sending chunk with content: ' they'
2025-08-28T00:34:57.441875Z DEBUG inference_engine::server: Streaming token: ' provide'
2025-08-28T00:34:57.441897Z DEBUG inference_engine::server: Sending chunk with content: ' provide'
2025-08-28T00:34:57.569839Z DEBUG inference_engine::server: Streaming token: ' crucial'
2025-08-28T00:34:57.569864Z DEBUG inference_engine::server: Sending chunk with content: ' crucial'
2025-08-28T00:34:57.700161Z DEBUG inference_engine::server: Streaming token: ' assistance'
2025-08-28T00:34:57.700184Z DEBUG inference_engine::server: Sending chunk with content: ' assistance'
2025-08-28T00:34:57.828427Z DEBUG inference_engine::server: Streaming token: ' with'
2025-08-28T00:34:57.828453Z DEBUG inference_engine::server: Sending chunk with content: ' with'
2025-08-28T00:34:57.957703Z DEBUG inference_engine::server: Streaming token: ' tasks'
2025-08-28T00:34:57.957727Z DEBUG inference_engine::server: Sending chunk with content: ' tasks'
2025-08-28T00:34:58.085556Z DEBUG inference_engine::server: Streaming token: ' like'
2025-08-28T00:34:58.085579Z DEBUG inference_engine::server: Sending chunk with content: ' like'
2025-08-28T00:34:58.213727Z DEBUG inference_engine::server: Streaming token: ' guarding'
2025-08-28T00:34:58.213750Z DEBUG inference_engine::server: Sending chunk with content: ' guarding'
2025-08-28T00:34:58.342674Z DEBUG inference_engine::server: Streaming token: ' property'
2025-08-28T00:34:58.342696Z DEBUG inference_engine::server: Sending chunk with content: ' property'
2025-08-28T00:34:58.474992Z DEBUG inference_engine::server: Streaming token: ' or'
2025-08-28T00:34:58.475011Z DEBUG inference_engine::server: Sending chunk with content: ' or'
2025-08-28T00:34:58.603613Z DEBUG inference_engine::server: Streaming token: ' assisting'
2025-08-28T00:34:58.603636Z DEBUG inference_engine::server: Sending chunk with content: ' assisting'
2025-08-28T00:34:58.732292Z DEBUG inference_engine::server: Streaming token: ' individuals'
2025-08-28T00:34:58.732316Z DEBUG inference_engine::server: Sending chunk with content: ' individuals'
2025-08-28T00:34:58.861810Z DEBUG inference_engine::server: Streaming token: ' who'
2025-08-28T00:34:58.861847Z DEBUG inference_engine::server: Sending chunk with content: ' who'
2025-08-28T00:34:58.989748Z DEBUG inference_engine::server: Streaming token: ' are'
2025-08-28T00:34:58.989765Z DEBUG inference_engine::server: Sending chunk with content: ' are'
2025-08-28T00:34:59.118088Z DEBUG inference_engine::server: Streaming token: ' blind'
2025-08-28T00:34:59.118105Z DEBUG inference_engine::server: Sending chunk with content: ' blind'
2025-08-28T00:34:59.246722Z DEBUG inference_engine::server: Streaming token: ' and'
2025-08-28T00:34:59.246746Z DEBUG inference_engine::server: Sending chunk with content: ' and'
2025-08-28T00:34:59.375090Z DEBUG inference_engine::server: Streaming token: ' deaf'
2025-08-28T00:34:59.375119Z DEBUG inference_engine::server: Sending chunk with content: ' deaf'
2025-08-28T00:34:59.503369Z DEBUG inference_engine::server: Streaming token: '.'
2025-08-28T00:34:59.503398Z DEBUG inference_engine::server: Sending chunk with content: '.'
2025-08-28T00:34:59.632352Z DEBUG inference_engine::server: Streaming token: ' Their'
2025-08-28T00:34:59.632374Z DEBUG inference_engine::server: Sending chunk with content: ' Their'
2025-08-28T00:34:59.760656Z DEBUG inference_engine::server: Streaming token: ' diverse'
2025-08-28T00:34:59.760675Z DEBUG inference_engine::server: Sending chunk with content: ' diverse'
2025-08-28T00:34:59.889274Z DEBUG inference_engine::server: Streaming token: ' breeds'
2025-08-28T00:34:59.889293Z DEBUG inference_engine::server: Sending chunk with content: ' breeds'
2025-08-28T00:35:00.018013Z DEBUG inference_engine::server: Streaming token: ' reflect'
2025-08-28T00:35:00.018043Z DEBUG inference_engine::server: Sending chunk with content: ' reflect'
2025-08-28T00:35:00.146874Z DEBUG inference_engine::server: Streaming token: ' a'
2025-08-28T00:35:00.146903Z DEBUG inference_engine::server: Sending chunk with content: ' a'
2025-08-28T00:35:00.275232Z DEBUG inference_engine::server: Streaming token: ' fascinating'
2025-08-28T00:35:00.275257Z DEBUG inference_engine::server: Sending chunk with content: ' fascinating'
2025-08-28T00:35:00.403452Z DEBUG inference_engine::server: Streaming token: ' range'
2025-08-28T00:35:00.403472Z DEBUG inference_engine::server: Sending chunk with content: ' range'
2025-08-28T00:35:00.535110Z DEBUG inference_engine::server: Streaming token: ' of'
2025-08-28T00:35:00.535133Z DEBUG inference_engine::server: Sending chunk with content: ' of'
2025-08-28T00:35:00.663383Z DEBUG inference_engine::server: Streaming token: ' personalities'
2025-08-28T00:35:00.663402Z DEBUG inference_engine::server: Sending chunk with content: ' personalities'
2025-08-28T00:35:00.792808Z DEBUG inference_engine::server: Streaming token: ' shaped'
2025-08-28T00:35:00.792836Z DEBUG inference_engine::server: Sending chunk with content: ' shaped'
2025-08-28T00:35:00.921350Z DEBUG inference_engine::server: Streaming token: ' over'
2025-08-28T00:35:00.921378Z DEBUG inference_engine::server: Sending chunk with content: ' over'
2025-08-28T00:35:01.049207Z DEBUG inference_engine::server: Streaming token: ' countless'
2025-08-28T00:35:01.049228Z DEBUG inference_engine::server: Sending chunk with content: ' countless'
2025-08-28T00:35:01.178030Z DEBUG inference_engine::server: Streaming token: ' generations'
2025-08-28T00:35:01.178058Z DEBUG inference_engine::server: Sending chunk with content: ' generations'
2025-08-28T00:35:01.306740Z DEBUG inference_engine::server: Streaming token: '،'
2025-08-28T00:35:01.306762Z DEBUG inference_engine::server: Sending chunk with content: '،'
2025-08-28T00:35:01.434552Z DEBUG inference_engine::server: Streaming token: ' making'
2025-08-28T00:35:01.434573Z DEBUG inference_engine::server: Sending chunk with content: ' making'
2025-08-28T00:35:01.562628Z DEBUG inference_engine::server: Streaming token: ' them'
2025-08-28T00:35:01.562647Z DEBUG inference_engine::server: Sending chunk with content: ' them'
2025-08-28T00:35:01.690509Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:01.690530Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:01.819330Z DEBUG inference_engine::server: Streaming token: ' unique'
2025-08-28T00:35:01.819351Z DEBUG inference_engine::server: Sending chunk with content: ' unique'
2025-08-28T00:35:01.947700Z DEBUG inference_engine::server: Streaming token: ' members'
2025-08-28T00:35:01.947720Z DEBUG inference_engine::server: Sending chunk with content: ' members'
2025-08-28T00:35:02.076045Z DEBUG inference_engine::server: Streaming token: ' within'
2025-08-28T00:35:02.076071Z DEBUG inference_engine::server: Sending chunk with content: ' within'
2025-08-28T00:35:02.204721Z DEBUG inference_engine::server: Streaming token: ' our'
2025-08-28T00:35:02.204743Z DEBUG inference_engine::server: Sending chunk with content: ' our'
2025-08-28T00:35:02.333483Z DEBUG inference_engine::server: Streaming token: ' families'
2025-08-28T00:35:02.333506Z DEBUG inference_engine::server: Sending chunk with content: ' families'
2025-08-28T00:35:02.461905Z DEBUG inference_engine::server: Streaming token: ','
2025-08-28T00:35:02.461926Z DEBUG inference_engine::server: Sending chunk with content: ','
2025-08-28T00:35:02.589686Z DEBUG inference_engine::server: Streaming token: ' enriching'
2025-08-28T00:35:02.589710Z DEBUG inference_engine::server: Sending chunk with content: ' enriching'
2025-08-28T00:35:02.718589Z DEBUG inference_engine::server: Streaming token: ' lives'
2025-08-28T00:35:02.718618Z DEBUG inference_engine::server: Sending chunk with content: ' lives'
2025-08-28T00:35:02.846614Z DEBUG inference_engine::server: Streaming token: ' in'
2025-08-28T00:35:02.846635Z DEBUG inference_engine::server: Sending chunk with content: ' in'
2025-08-28T00:35:02.976008Z DEBUG inference_engine::server: Streaming token: ' profound'
2025-08-28T00:35:02.976028Z DEBUG inference_engine::server: Sending chunk with content: ' profound'
2025-08-28T00:35:03.107573Z DEBUG inference_engine::server: Streaming token: ' ways'
2025-08-28T00:35:03.107594Z DEBUG inference_engine::server: Sending chunk with content: ' ways'
2025-08-28T00:35:03.236069Z DEBUG inference_engine::server: Streaming token: ' regardless'
2025-08-28T00:35:03.236088Z DEBUG inference_engine::server: Sending chunk with content: ' regardless'
2025-08-28T00:35:03.364469Z DEBUG inference_engine::server: Streaming token: ' if'
2025-08-28T00:35:03.364492Z DEBUG inference_engine::server: Sending chunk with content: ' if'
2025-08-28T00:35:03.492669Z DEBUG inference_engine::server: Streaming token: ' we'
2025-08-28T00:35:03.492690Z DEBUG inference_engine::server: Sending chunk with content: ' we'
2025-08-28T00:35:03.621905Z DEBUG inference_engine::server: Streaming token: ' choose'
2025-08-28T00:35:03.621927Z DEBUG inference_engine::server: Sending chunk with content: ' choose'
2025-08-28T00:35:03.754038Z DEBUG inference_engine::server: Streaming token: ' to'
2025-08-28T00:35:03.754059Z DEBUG inference_engine::server: Sending chunk with content: ' to'
2025-08-28T00:35:03.883044Z DEBUG inference_engine::server: Streaming token: ' own'
2025-08-28T00:35:03.883066Z DEBUG inference_engine::server: Sending chunk with content: ' own'
2025-08-28T00:35:04.010685Z DEBUG inference_engine::server: Streaming token: ' one'
2025-08-28T00:35:04.010703Z DEBUG inference_engine::server: Sending chunk with content: ' one'
2025-08-28T00:35:04.139584Z DEBUG inference_engine::server: Streaming token: ' ourselves'
2025-08-28T00:35:04.139609Z DEBUG inference_engine::server: Sending chunk with content: ' ourselves'
2025-08-28T00:35:04.269128Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.269144Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.398132Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.398151Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.527627Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.527654Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.657885Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.657914Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.788586Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.788607Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.918153Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.918179Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.048431Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:05.048460Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.178022Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:05.178055Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.308805Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:05.308833Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.438091Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:05.438113Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.561745Z  INFO inference_engine::text_generation: Streaming Text generation completed in 13.22s
2025-08-28T00:35:05.561767Z  INFO inference_engine::text_generation: Streaming Tokens generated: 100
2025-08-28T00:35:05.561770Z  INFO inference_engine::text_generation: Streaming Generation speed: 7.56 tokens/second
2025-08-28T00:35:05.561772Z  INFO inference_engine::text_generation: Streaming Average time per token: 129.65ms
2025-08-28T00:35:05.561774Z DEBUG inference_engine::text_generation: Streaming - Forward pass: 124.98ms (96.4%)
2025-08-28T00:35:05.561776Z DEBUG inference_engine::text_generation: Streaming - Repeat penalty: 74.02µs (0.1%)
2025-08-28T00:35:05.561778Z DEBUG inference_engine::text_generation: Streaming - Sampling: 5.85ms (4.5%)
2025-08-28T00:35:05.561779Z  INFO inference_engine::text_generation: Streaming Total request time: 13.25s
2025-08-28T00:35:05.561781Z DEBUG inference_engine::text_generation: Streaming - Tokenization: 217.04µs (0.0%)
2025-08-28T00:35:05.561782Z DEBUG inference_engine::text_generation: Streaming - Generation: 13.22s (99.8%)
2025-08-28T00:35:05.561783Z DEBUG inference_engine::text_generation: Streaming - Final decoding: 8.17µs (0.0%)
2025-08-28T00:35:30.845607Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2025-08-28T00:35:30.845670Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: inference_engine::server: Formatted prompt: <start_of_turn>user
You are a helpful assistant who responds thoughtfully and concisely.
Write a paragraph about cats<end_of_turn>
<start_of_turn>model
2025-08-28T00:35:30.845684Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: predict_otron_9000::middleware::metrics: POST /v1/chat/completions 200 OK - 0 ms
2025-08-28T00:35:30.845691Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=0 ms status=200
2025-08-28T00:35:30.845719Z DEBUG inference_engine::text_generation: Cleared penalty cache for new generation (streaming mode)
2025-08-28T00:35:30.845789Z DEBUG inference_engine::text_generation: Streaming Tokenization completed in 65.50µs
2025-08-28T00:35:30.845794Z DEBUG inference_engine::text_generation: Streaming Input tokens: 26
2025-08-28T00:35:30.871195Z DEBUG inference_engine::text_generation: Using special generation approach for gemma-2/gemma-3 models (streaming)
./run_server.sh: line 7: 30566 Killed: 9     cargo run --bin predict-otron-9000 --release
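
The four `unused_imports` warnings at the top are mechanical to clear; `cargo fix --lib -p inference-engine` applies the same change automatically. A minimal sketch of the cleaned-up imports, assuming nothing else in those modules references the `Config` aliases or `stream::self`:

```rust
// crates/inference-engine/src/model.rs
// Keep only the Model aliases; the compiler reports the Config aliases as unused.
use candle_transformers::models::gemma::Model as Model1;
use candle_transformers::models::gemma2::Model as Model2;
use candle_transformers::models::gemma3::Model as Model3;

// crates/inference-engine/src/server.rs
// Drop the unused `self` import and keep the Stream trait.
use futures_util::stream::Stream;
```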
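
Each "Sending chunk with content" line corresponds to one server-sent event on the streaming endpoint. The actual types in `inference_engine::server` are not visible in this log, so the structs below are a hypothetical sketch of the `chat.completion.chunk` payload shape that an OpenAI-compatible `/v1/chat/completions` emits per token:

```rust
use serde::Serialize;

// Hypothetical chunk types; field names follow the OpenAI
// chat.completion.chunk schema this endpoint appears to mimic.
#[derive(Serialize)]
struct ChatCompletionChunk {
    id: String,
    object: &'static str, // "chat.completion.chunk"
    created: u64,
    model: String,
    choices: Vec<ChunkChoice>,
}

#[derive(Serialize)]
struct ChunkChoice {
    index: u32,
    delta: Delta,                  // carries one decoded token, e.g. " unwavering"
    finish_reason: Option<String>, // None until the final chunk
}

#[derive(Serialize)]
struct Delta {
    content: Option<String>,
}
```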
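
Note the tail of the first generation: the model emits ' truly' ten times in a row until the 100-token cap, even though the timing breakdown shows a repeat-penalty stage running. A plausible (unverified from this log alone) culprit is a penalty factor too close to 1.0 or a penalty window too short to cover the loop. A minimal sketch of the usual candle pattern, using the real `candle_transformers::utils::apply_repeat_penalty` API with hypothetical `penalty` and `repeat_last_n` values:

```rust
use candle_core::{Result, Tensor};
use candle_transformers::utils::apply_repeat_penalty;

// Per-step penalty as done in candle's own examples: scale down the logits
// of every token that appeared in the last `repeat_last_n` generated tokens.
fn penalized_logits(
    logits: &Tensor,      // raw next-token logits from the forward pass
    tokens: &[u32],       // everything generated so far
    penalty: f32,         // e.g. 1.1; exactly 1.0 disables the penalty
    repeat_last_n: usize, // e.g. 64; window of recent tokens to penalize
) -> Result<Tensor> {
    if penalty == 1.0 {
        return Ok(logits.clone());
    }
    let start_at = tokens.len().saturating_sub(repeat_last_n);
    apply_repeat_penalty(logits, penalty, &tokens[start_at..])
}
```

The final `Killed: 9` on the second request means the shell saw the process terminated by SIGKILL; the log itself does not say whether that was memory pressure or a manual kill.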