- Change the default server host to localhost for improved security.
- Increase the default maximum tokens in the CLI configuration to 256.
- Refactor and reorganize the CLI.
geoffsee
2025-08-27 21:47:24 -04:00
parent 766d41af78
commit 719beb3791
20 changed files with 1703 additions and 490 deletions
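The first two commit items change defaults. The following is one way such defaults could be expressed in Rust. `ServerConfig`, `CliConfig`, `with_host_override`, and the idea of an override read from an environment variable are hypothetical names used for illustration, not the project's actual types. Port 8080 comes from the startup log in `server_fresh.txt`, where the server reports listening on `127.0.0.1:8080`.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical server settings; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct ServerConfig {
    pub host: IpAddr,
    pub port: u16,
}

impl Default for ServerConfig {
    fn default() -> Self {
        Self {
            // Loopback only: other machines cannot connect unless the
            // operator explicitly chooses a wider bind address.
            host: IpAddr::V4(Ipv4Addr::LOCALHOST),
            port: 8080,
        }
    }
}

impl ServerConfig {
    /// Apply an optional host override (e.g. from a hypothetical SERVER_HOST
    /// variable). A missing or unparsable value keeps the loopback default.
    pub fn with_host_override(mut self, host: Option<&str>) -> Self {
        if let Some(parsed) = host.and_then(|h| h.parse::<IpAddr>().ok()) {
            self.host = parsed;
        }
        self
    }

    /// Address the listener would bind to.
    pub fn bind_addr(&self) -> SocketAddr {
        SocketAddr::new(self.host, self.port)
    }
}

/// Hypothetical CLI generation settings.
#[derive(Debug, Clone)]
pub struct CliConfig {
    pub max_tokens: usize,
}

impl Default for CliConfig {
    fn default() -> Self {
        // New default token budget from this commit.
        Self { max_tokens: 256 }
    }
}
```

With a loopback default, exposing the server on a network becomes an explicit opt-in (for example, overriding the host with `0.0.0.0`) rather than the out-of-the-box behavior.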

server_fresh.txt

@@ -0,0 +1,277 @@
warning: unused import: `Config as Config1`
 --> crates/inference-engine/src/model.rs:2:42
  |
2 | use candle_transformers::models::gemma::{Config as Config1, Model as Model1};
  |                                          ^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused import: `Config as Config2`
 --> crates/inference-engine/src/model.rs:3:43
  |
3 | use candle_transformers::models::gemma2::{Config as Config2, Model as Model2};
  |                                           ^^^^^^^^^^^^^^^^^

warning: unused import: `Config as Config3`
 --> crates/inference-engine/src/model.rs:4:43
  |
4 | use candle_transformers::models::gemma3::{Config as Config3, Model as Model3};
  |                                           ^^^^^^^^^^^^^^^^^

warning: unused import: `self`
  --> crates/inference-engine/src/server.rs:10:28
   |
10 | use futures_util::stream::{self, Stream};
   |                            ^^^^

warning: `inference-engine` (lib) generated 4 warnings (run `cargo fix --lib -p inference-engine` to apply 4 suggestions)
    Finished `release` profile [optimized] target(s) in 0.13s
     Running `target/release/predict-otron-9000`
avx: false, neon: true, simd128: false, f16c: false
2025-08-28T00:34:39.293635Z  INFO hf_hub: Using token file found "/Users/williamseemueller/.cache/huggingface/token"
retrieved the files in 295.458µs
2025-08-28T00:34:39.294536Z  INFO predict_otron_9000::middleware::metrics: Performance metrics summary:
2025-08-28T00:34:40.507474Z  INFO predict_otron_9000: Unified predict-otron-9000 server listening on 127.0.0.1:8080
2025-08-28T00:34:40.507503Z  INFO predict_otron_9000: Performance metrics tracking enabled - summary logs every 60 seconds
2025-08-28T00:34:40.507508Z  INFO predict_otron_9000: Available endpoints:
2025-08-28T00:34:40.507512Z  INFO predict_otron_9000: GET / - Root endpoint from embeddings-engine
2025-08-28T00:34:40.507515Z  INFO predict_otron_9000: POST /v1/embeddings - Text embeddings
2025-08-28T00:34:40.507517Z  INFO predict_otron_9000: POST /v1/chat/completions - Chat completions
2025-08-28T00:34:52.313606Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2025-08-28T00:34:52.313671Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: inference_engine::server: Formatted prompt: <start_of_turn>user
You are a helpful assistant who responds thoughtfully and concisely.
Write a paragraph about dogs<end_of_turn>
<start_of_turn>model
2025-08-28T00:34:52.313693Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: predict_otron_9000::middleware::metrics: POST /v1/chat/completions 200 OK - 0 ms
2025-08-28T00:34:52.313709Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=0 ms status=200
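The `Formatted prompt` entry above shows the Gemma turn template. Gemma has no separate system role, so the system instruction sits at the top of the user turn, and the prompt ends with `<start_of_turn>model` to cue the reply. The following is a minimal sketch that reproduces that string. `format_gemma_prompt` is a hypothetical helper. The trailing newline after `model` is an assumption, since the log cannot show trailing whitespace.

```rust
/// Hypothetical helper: build a single-turn Gemma prompt in the shape the
/// server logs. The optional system text is folded into the user turn.
pub fn format_gemma_prompt(system: Option<&str>, user: &str) -> String {
    let mut prompt = String::from("<start_of_turn>user\n");
    if let Some(sys) = system {
        prompt.push_str(sys);
        prompt.push('\n');
    }
    prompt.push_str(user);
    // Close the user turn and open the model turn for generation.
    prompt.push_str("<end_of_turn>\n<start_of_turn>model\n");
    prompt
}
```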
2025-08-28T00:34:52.313763Z DEBUG inference_engine::text_generation: Cleared penalty cache for new generation (streaming mode)
2025-08-28T00:34:52.313985Z DEBUG inference_engine::text_generation: Streaming Tokenization completed in 217.04µs
2025-08-28T00:34:52.313990Z DEBUG inference_engine::text_generation: Streaming Input tokens: 26
2025-08-28T00:34:52.340937Z DEBUG inference_engine::text_generation: Using special generation approach for gemma-2/gemma-3 models (streaming)
2025-08-28T00:34:52.602691Z DEBUG inference_engine::server: Streaming token: 'Dogs'
2025-08-28T00:34:52.602718Z DEBUG inference_engine::server: Sending chunk with content: 'Dogs'
2025-08-28T00:34:52.769918Z DEBUG inference_engine::server: Streaming token: ' have'
2025-08-28T00:34:52.769949Z DEBUG inference_engine::server: Sending chunk with content: ' have'
2025-08-28T00:34:52.905947Z DEBUG inference_engine::server: Streaming token: ' captivated'
2025-08-28T00:34:52.905977Z DEBUG inference_engine::server: Sending chunk with content: ' captivated'
2025-08-28T00:34:53.040888Z DEBUG inference_engine::server: Streaming token: ' humans'
2025-08-28T00:34:53.040921Z DEBUG inference_engine::server: Sending chunk with content: ' humans'
2025-08-28T00:34:53.177116Z DEBUG inference_engine::server: Streaming token: ' for'
2025-08-28T00:34:53.177145Z DEBUG inference_engine::server: Sending chunk with content: ' for'
2025-08-28T00:34:53.313887Z DEBUG inference_engine::server: Streaming token: ' millennia'
2025-08-28T00:34:53.313920Z DEBUG inference_engine::server: Sending chunk with content: ' millennia'
2025-08-28T00:34:53.444031Z DEBUG inference_engine::server: Streaming token: ','
2025-08-28T00:34:53.444060Z DEBUG inference_engine::server: Sending chunk with content: ','
2025-08-28T00:34:53.571919Z DEBUG inference_engine::server: Streaming token: ' evolving'
2025-08-28T00:34:53.571951Z DEBUG inference_engine::server: Sending chunk with content: ' evolving'
2025-08-28T00:34:53.699811Z DEBUG inference_engine::server: Streaming token: ' from'
2025-08-28T00:34:53.699852Z DEBUG inference_engine::server: Sending chunk with content: ' from'
2025-08-28T00:34:53.828082Z DEBUG inference_engine::server: Streaming token: ' wolves'
2025-08-28T00:34:53.828111Z DEBUG inference_engine::server: Sending chunk with content: ' wolves'
2025-08-28T00:34:53.957276Z DEBUG inference_engine::server: Streaming token: ' to'
2025-08-28T00:34:53.957313Z DEBUG inference_engine::server: Sending chunk with content: ' to'
2025-08-28T00:34:54.093248Z DEBUG inference_engine::server: Streaming token: ' beloved'
2025-08-28T00:34:54.093284Z DEBUG inference_engine::server: Sending chunk with content: ' beloved'
2025-08-28T00:34:54.228357Z DEBUG inference_engine::server: Streaming token: ' companions'
2025-08-28T00:34:54.228385Z DEBUG inference_engine::server: Sending chunk with content: ' companions'
2025-08-28T00:34:54.356315Z DEBUG inference_engine::server: Streaming token: ' offering'
2025-08-28T00:34:54.356349Z DEBUG inference_engine::server: Sending chunk with content: ' offering'
2025-08-28T00:34:54.484051Z DEBUG inference_engine::server: Streaming token: ' unwavering'
2025-08-28T00:34:54.484085Z DEBUG inference_engine::server: Sending chunk with content: ' unwavering'
2025-08-28T00:34:54.613022Z DEBUG inference_engine::server: Streaming token: ' loyalty'
2025-08-28T00:34:54.613061Z DEBUG inference_engine::server: Sending chunk with content: ' loyalty'
2025-08-28T00:34:54.742024Z DEBUG inference_engine::server: Streaming token: ' alongside'
2025-08-28T00:34:54.742043Z DEBUG inference_engine::server: Sending chunk with content: ' alongside'
2025-08-28T00:34:54.869804Z DEBUG inference_engine::server: Streaming token: ' boundless'
2025-08-28T00:34:54.869829Z DEBUG inference_engine::server: Sending chunk with content: ' boundless'
2025-08-28T00:34:54.998140Z DEBUG inference_engine::server: Streaming token: ' affection'
2025-08-28T00:34:54.998165Z DEBUG inference_engine::server: Sending chunk with content: ' affection'
2025-08-28T00:34:55.126560Z DEBUG inference_engine::server: Streaming token: ' '
2025-08-28T00:34:55.126582Z DEBUG inference_engine::server: Sending chunk with content: ' '
2025-08-28T00:34:55.255214Z DEBUG inference_engine::server: Streaming token: ' often'
2025-08-28T00:34:55.255232Z DEBUG inference_engine::server: Sending chunk with content: ' often'
2025-08-28T00:34:55.383529Z DEBUG inference_engine::server: Streaming token: ' fueled'
2025-08-28T00:34:55.383551Z DEBUG inference_engine::server: Sending chunk with content: ' fueled'
2025-08-28T00:34:55.511437Z DEBUG inference_engine::server: Streaming token: ' by'
2025-08-28T00:34:55.511456Z DEBUG inference_engine::server: Sending chunk with content: ' by'
2025-08-28T00:34:55.639748Z DEBUG inference_engine::server: Streaming token: ' their'
2025-08-28T00:34:55.639768Z DEBUG inference_engine::server: Sending chunk with content: ' their'
2025-08-28T00:34:55.767723Z DEBUG inference_engine::server: Streaming token: ' incredible'
2025-08-28T00:34:55.767741Z DEBUG inference_engine::server: Sending chunk with content: ' incredible'
2025-08-28T00:34:55.895796Z DEBUG inference_engine::server: Streaming token: ' ability'
2025-08-28T00:34:55.895817Z DEBUG inference_engine::server: Sending chunk with content: ' ability'
2025-08-28T00:34:56.025191Z DEBUG inference_engine::server: Streaming token: ' at'
2025-08-28T00:34:56.025219Z DEBUG inference_engine::server: Sending chunk with content: ' at'
2025-08-28T00:34:56.153604Z DEBUG inference_engine::server: Streaming token: ' understanding'
2025-08-28T00:34:56.153626Z DEBUG inference_engine::server: Sending chunk with content: ' understanding'
2025-08-28T00:34:56.282571Z DEBUG inference_engine::server: Streaming token: ' human'
2025-08-28T00:34:56.282590Z DEBUG inference_engine::server: Sending chunk with content: ' human'
2025-08-28T00:34:56.411224Z DEBUG inference_engine::server: Streaming token: ' emotion'
2025-08-28T00:34:56.411247Z DEBUG inference_engine::server: Sending chunk with content: ' emotion'
2025-08-28T00:34:56.540028Z DEBUG inference_engine::server: Streaming token: ' through'
2025-08-28T00:34:56.540050Z DEBUG inference_engine::server: Sending chunk with content: ' through'
2025-08-28T00:34:56.668612Z DEBUG inference_engine::server: Streaming token: ' subtle'
2025-08-28T00:34:56.668630Z DEBUG inference_engine::server: Sending chunk with content: ' subtle'
2025-08-28T00:34:56.797698Z DEBUG inference_engine::server: Streaming token: ' cues'
2025-08-28T00:34:56.797716Z DEBUG inference_engine::server: Sending chunk with content: ' cues'
2025-08-28T00:34:56.927032Z DEBUG inference_engine::server: Streaming token: '!'
2025-08-28T00:34:56.927054Z DEBUG inference_engine::server: Sending chunk with content: '!'
2025-08-28T00:34:57.054903Z DEBUG inference_engine::server: Streaming token: ' Beyond'
2025-08-28T00:34:57.054922Z DEBUG inference_engine::server: Sending chunk with content: ' Beyond'
2025-08-28T00:34:57.183890Z DEBUG inference_engine::server: Streaming token: ' companionship'
2025-08-28T00:34:57.183914Z DEBUG inference_engine::server: Sending chunk with content: ' companionship'
2025-08-28T00:34:57.313258Z DEBUG inference_engine::server: Streaming token: ' they'
2025-08-28T00:34:57.313278Z DEBUG inference_engine::server: Sending chunk with content: ' they'
2025-08-28T00:34:57.441875Z DEBUG inference_engine::server: Streaming token: ' provide'
2025-08-28T00:34:57.441897Z DEBUG inference_engine::server: Sending chunk with content: ' provide'
2025-08-28T00:34:57.569839Z DEBUG inference_engine::server: Streaming token: ' crucial'
2025-08-28T00:34:57.569864Z DEBUG inference_engine::server: Sending chunk with content: ' crucial'
2025-08-28T00:34:57.700161Z DEBUG inference_engine::server: Streaming token: ' assistance'
2025-08-28T00:34:57.700184Z DEBUG inference_engine::server: Sending chunk with content: ' assistance'
2025-08-28T00:34:57.828427Z DEBUG inference_engine::server: Streaming token: ' with'
2025-08-28T00:34:57.828453Z DEBUG inference_engine::server: Sending chunk with content: ' with'
2025-08-28T00:34:57.957703Z DEBUG inference_engine::server: Streaming token: ' tasks'
2025-08-28T00:34:57.957727Z DEBUG inference_engine::server: Sending chunk with content: ' tasks'
2025-08-28T00:34:58.085556Z DEBUG inference_engine::server: Streaming token: ' like'
2025-08-28T00:34:58.085579Z DEBUG inference_engine::server: Sending chunk with content: ' like'
2025-08-28T00:34:58.213727Z DEBUG inference_engine::server: Streaming token: ' guarding'
2025-08-28T00:34:58.213750Z DEBUG inference_engine::server: Sending chunk with content: ' guarding'
2025-08-28T00:34:58.342674Z DEBUG inference_engine::server: Streaming token: ' property'
2025-08-28T00:34:58.342696Z DEBUG inference_engine::server: Sending chunk with content: ' property'
2025-08-28T00:34:58.474992Z DEBUG inference_engine::server: Streaming token: ' or'
2025-08-28T00:34:58.475011Z DEBUG inference_engine::server: Sending chunk with content: ' or'
2025-08-28T00:34:58.603613Z DEBUG inference_engine::server: Streaming token: ' assisting'
2025-08-28T00:34:58.603636Z DEBUG inference_engine::server: Sending chunk with content: ' assisting'
2025-08-28T00:34:58.732292Z DEBUG inference_engine::server: Streaming token: ' individuals'
2025-08-28T00:34:58.732316Z DEBUG inference_engine::server: Sending chunk with content: ' individuals'
2025-08-28T00:34:58.861810Z DEBUG inference_engine::server: Streaming token: ' who'
2025-08-28T00:34:58.861847Z DEBUG inference_engine::server: Sending chunk with content: ' who'
2025-08-28T00:34:58.989748Z DEBUG inference_engine::server: Streaming token: ' are'
2025-08-28T00:34:58.989765Z DEBUG inference_engine::server: Sending chunk with content: ' are'
2025-08-28T00:34:59.118088Z DEBUG inference_engine::server: Streaming token: ' blind'
2025-08-28T00:34:59.118105Z DEBUG inference_engine::server: Sending chunk with content: ' blind'
2025-08-28T00:34:59.246722Z DEBUG inference_engine::server: Streaming token: ' and'
2025-08-28T00:34:59.246746Z DEBUG inference_engine::server: Sending chunk with content: ' and'
2025-08-28T00:34:59.375090Z DEBUG inference_engine::server: Streaming token: ' deaf'
2025-08-28T00:34:59.375119Z DEBUG inference_engine::server: Sending chunk with content: ' deaf'
2025-08-28T00:34:59.503369Z DEBUG inference_engine::server: Streaming token: '.'
2025-08-28T00:34:59.503398Z DEBUG inference_engine::server: Sending chunk with content: '.'
2025-08-28T00:34:59.632352Z DEBUG inference_engine::server: Streaming token: ' Their'
2025-08-28T00:34:59.632374Z DEBUG inference_engine::server: Sending chunk with content: ' Their'
2025-08-28T00:34:59.760656Z DEBUG inference_engine::server: Streaming token: ' diverse'
2025-08-28T00:34:59.760675Z DEBUG inference_engine::server: Sending chunk with content: ' diverse'
2025-08-28T00:34:59.889274Z DEBUG inference_engine::server: Streaming token: ' breeds'
2025-08-28T00:34:59.889293Z DEBUG inference_engine::server: Sending chunk with content: ' breeds'
2025-08-28T00:35:00.018013Z DEBUG inference_engine::server: Streaming token: ' reflect'
2025-08-28T00:35:00.018043Z DEBUG inference_engine::server: Sending chunk with content: ' reflect'
2025-08-28T00:35:00.146874Z DEBUG inference_engine::server: Streaming token: ' a'
2025-08-28T00:35:00.146903Z DEBUG inference_engine::server: Sending chunk with content: ' a'
2025-08-28T00:35:00.275232Z DEBUG inference_engine::server: Streaming token: ' fascinating'
2025-08-28T00:35:00.275257Z DEBUG inference_engine::server: Sending chunk with content: ' fascinating'
2025-08-28T00:35:00.403452Z DEBUG inference_engine::server: Streaming token: ' range'
2025-08-28T00:35:00.403472Z DEBUG inference_engine::server: Sending chunk with content: ' range'
2025-08-28T00:35:00.535110Z DEBUG inference_engine::server: Streaming token: ' of'
2025-08-28T00:35:00.535133Z DEBUG inference_engine::server: Sending chunk with content: ' of'
2025-08-28T00:35:00.663383Z DEBUG inference_engine::server: Streaming token: ' personalities'
2025-08-28T00:35:00.663402Z DEBUG inference_engine::server: Sending chunk with content: ' personalities'
2025-08-28T00:35:00.792808Z DEBUG inference_engine::server: Streaming token: ' shaped'
2025-08-28T00:35:00.792836Z DEBUG inference_engine::server: Sending chunk with content: ' shaped'
2025-08-28T00:35:00.921350Z DEBUG inference_engine::server: Streaming token: ' over'
2025-08-28T00:35:00.921378Z DEBUG inference_engine::server: Sending chunk with content: ' over'
2025-08-28T00:35:01.049207Z DEBUG inference_engine::server: Streaming token: ' countless'
2025-08-28T00:35:01.049228Z DEBUG inference_engine::server: Sending chunk with content: ' countless'
2025-08-28T00:35:01.178030Z DEBUG inference_engine::server: Streaming token: ' generations'
2025-08-28T00:35:01.178058Z DEBUG inference_engine::server: Sending chunk with content: ' generations'
2025-08-28T00:35:01.306740Z DEBUG inference_engine::server: Streaming token: '،'
2025-08-28T00:35:01.306762Z DEBUG inference_engine::server: Sending chunk with content: '،'
2025-08-28T00:35:01.434552Z DEBUG inference_engine::server: Streaming token: ' making'
2025-08-28T00:35:01.434573Z DEBUG inference_engine::server: Sending chunk with content: ' making'
2025-08-28T00:35:01.562628Z DEBUG inference_engine::server: Streaming token: ' them'
2025-08-28T00:35:01.562647Z DEBUG inference_engine::server: Sending chunk with content: ' them'
2025-08-28T00:35:01.690509Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:01.690530Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:01.819330Z DEBUG inference_engine::server: Streaming token: ' unique'
2025-08-28T00:35:01.819351Z DEBUG inference_engine::server: Sending chunk with content: ' unique'
2025-08-28T00:35:01.947700Z DEBUG inference_engine::server: Streaming token: ' members'
2025-08-28T00:35:01.947720Z DEBUG inference_engine::server: Sending chunk with content: ' members'
2025-08-28T00:35:02.076045Z DEBUG inference_engine::server: Streaming token: ' within'
2025-08-28T00:35:02.076071Z DEBUG inference_engine::server: Sending chunk with content: ' within'
2025-08-28T00:35:02.204721Z DEBUG inference_engine::server: Streaming token: ' our'
2025-08-28T00:35:02.204743Z DEBUG inference_engine::server: Sending chunk with content: ' our'
2025-08-28T00:35:02.333483Z DEBUG inference_engine::server: Streaming token: ' families'
2025-08-28T00:35:02.333506Z DEBUG inference_engine::server: Sending chunk with content: ' families'
2025-08-28T00:35:02.461905Z DEBUG inference_engine::server: Streaming token: ','
2025-08-28T00:35:02.461926Z DEBUG inference_engine::server: Sending chunk with content: ','
2025-08-28T00:35:02.589686Z DEBUG inference_engine::server: Streaming token: ' enriching'
2025-08-28T00:35:02.589710Z DEBUG inference_engine::server: Sending chunk with content: ' enriching'
2025-08-28T00:35:02.718589Z DEBUG inference_engine::server: Streaming token: ' lives'
2025-08-28T00:35:02.718618Z DEBUG inference_engine::server: Sending chunk with content: ' lives'
2025-08-28T00:35:02.846614Z DEBUG inference_engine::server: Streaming token: ' in'
2025-08-28T00:35:02.846635Z DEBUG inference_engine::server: Sending chunk with content: ' in'
2025-08-28T00:35:02.976008Z DEBUG inference_engine::server: Streaming token: ' profound'
2025-08-28T00:35:02.976028Z DEBUG inference_engine::server: Sending chunk with content: ' profound'
2025-08-28T00:35:03.107573Z DEBUG inference_engine::server: Streaming token: ' ways'
2025-08-28T00:35:03.107594Z DEBUG inference_engine::server: Sending chunk with content: ' ways'
2025-08-28T00:35:03.236069Z DEBUG inference_engine::server: Streaming token: ' regardless'
2025-08-28T00:35:03.236088Z DEBUG inference_engine::server: Sending chunk with content: ' regardless'
2025-08-28T00:35:03.364469Z DEBUG inference_engine::server: Streaming token: ' if'
2025-08-28T00:35:03.364492Z DEBUG inference_engine::server: Sending chunk with content: ' if'
2025-08-28T00:35:03.492669Z DEBUG inference_engine::server: Streaming token: ' we'
2025-08-28T00:35:03.492690Z DEBUG inference_engine::server: Sending chunk with content: ' we'
2025-08-28T00:35:03.621905Z DEBUG inference_engine::server: Streaming token: ' choose'
2025-08-28T00:35:03.621927Z DEBUG inference_engine::server: Sending chunk with content: ' choose'
2025-08-28T00:35:03.754038Z DEBUG inference_engine::server: Streaming token: ' to'
2025-08-28T00:35:03.754059Z DEBUG inference_engine::server: Sending chunk with content: ' to'
2025-08-28T00:35:03.883044Z DEBUG inference_engine::server: Streaming token: ' own'
2025-08-28T00:35:03.883066Z DEBUG inference_engine::server: Sending chunk with content: ' own'
2025-08-28T00:35:04.010685Z DEBUG inference_engine::server: Streaming token: ' one'
2025-08-28T00:35:04.010703Z DEBUG inference_engine::server: Sending chunk with content: ' one'
2025-08-28T00:35:04.139584Z DEBUG inference_engine::server: Streaming token: ' ourselves'
2025-08-28T00:35:04.139609Z DEBUG inference_engine::server: Sending chunk with content: ' ourselves'
2025-08-28T00:35:04.269128Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.269144Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.398132Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.398151Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.527627Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.527654Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.657885Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.657914Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.788586Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.788607Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:04.918153Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:04.918179Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.048431Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:05.048460Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.178022Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:05.178055Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.308805Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:05.308833Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.438091Z DEBUG inference_engine::server: Streaming token: ' truly'
2025-08-28T00:35:05.438113Z DEBUG inference_engine::server: Sending chunk with content: ' truly'
2025-08-28T00:35:05.561745Z  INFO inference_engine::text_generation: Streaming Text generation completed in 13.22s
2025-08-28T00:35:05.561767Z  INFO inference_engine::text_generation: Streaming Tokens generated: 100
2025-08-28T00:35:05.561770Z  INFO inference_engine::text_generation: Streaming Generation speed: 7.56 tokens/second
2025-08-28T00:35:05.561772Z  INFO inference_engine::text_generation: Streaming Average time per token: 129.65ms
2025-08-28T00:35:05.561774Z DEBUG inference_engine::text_generation: Streaming - Forward pass: 124.98ms (96.4%)
2025-08-28T00:35:05.561776Z DEBUG inference_engine::text_generation: Streaming - Repeat penalty: 74.02µs (0.1%)
2025-08-28T00:35:05.561778Z DEBUG inference_engine::text_generation: Streaming - Sampling: 5.85ms (4.5%)
2025-08-28T00:35:05.561779Z  INFO inference_engine::text_generation: Streaming Total request time: 13.25s
2025-08-28T00:35:05.561781Z DEBUG inference_engine::text_generation: Streaming - Tokenization: 217.04µs (0.0%)
2025-08-28T00:35:05.561782Z DEBUG inference_engine::text_generation: Streaming - Generation: 13.22s (99.8%)
2025-08-28T00:35:05.561783Z DEBUG inference_engine::text_generation: Streaming - Final decoding: 8.17µs (0.0%)
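The summary figures above can be checked with simple arithmetic. 100 tokens in 13.22 s is about 7.56 tokens/s, which matches the logged speed. The same 13.22 s divided by 100 gives 132.2 ms per token, slightly above the logged 129.65 ms average. The engine therefore presumably averages over a different window, such as summed per-step timings. That is an inference; the log does not state it. The breakdown percentages are relative to the 129.65 ms figure (124.98 / 129.65 ≈ 96.4%, 5.85 / 129.65 ≈ 4.5%), which is why they add up to slightly more than 100%. The helpers below are hypothetical and compute only the whole-window figures.

```rust
use std::time::Duration;

/// Hypothetical helper: generated tokens per second over the whole window.
/// Returns 0.0 for an empty window instead of dividing by zero.
pub fn tokens_per_second(tokens: usize, elapsed: Duration) -> f64 {
    if elapsed.is_zero() {
        return 0.0;
    }
    tokens as f64 / elapsed.as_secs_f64()
}

/// Hypothetical helper: mean wall-clock milliseconds per generated token.
/// Returns 0.0 when no tokens were generated.
pub fn mean_ms_per_token(tokens: usize, elapsed: Duration) -> f64 {
    if tokens == 0 {
        return 0.0;
    }
    elapsed.as_secs_f64() * 1000.0 / tokens as f64
}
```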
2025-08-28T00:35:30.845607Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: tower_http::trace::on_request: started processing request
2025-08-28T00:35:30.845670Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: inference_engine::server: Formatted prompt: <start_of_turn>user
You are a helpful assistant who responds thoughtfully and concisely.
Write a paragraph about cats<end_of_turn>
<start_of_turn>model
2025-08-28T00:35:30.845684Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: predict_otron_9000::middleware::metrics: POST /v1/chat/completions 200 OK - 0 ms
2025-08-28T00:35:30.845691Z DEBUG request{method=POST uri=/v1/chat/completions version=HTTP/1.1}: tower_http::trace::on_response: finished processing request latency=0 ms status=200
2025-08-28T00:35:30.845719Z DEBUG inference_engine::text_generation: Cleared penalty cache for new generation (streaming mode)
2025-08-28T00:35:30.845789Z DEBUG inference_engine::text_generation: Streaming Tokenization completed in 65.50µs
2025-08-28T00:35:30.845794Z DEBUG inference_engine::text_generation: Streaming Input tokens: 26
2025-08-28T00:35:30.871195Z DEBUG inference_engine::text_generation: Using special generation approach for gemma-2/gemma-3 models (streaming)
./run_server.sh: line 7: 30566 Killed: 9 cargo run --bin predict-otron-9000 --release
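In the first request, the output degenerates into `' truly'` repeated until generation stops at 100 tokens. The server was then killed with signal 9 (SIGKILL) while handling the second request, and the log does not record why. A repeat penalty is meant to dampen repetition loops like this one, and the log shows a repeat-penalty step running on every token. The following is a minimal sketch of the common CTRL-style penalty, similar in spirit to candle's `apply_repeat_penalty` utility. It is not necessarily the exact logic `inference-engine` uses.

```rust
use std::collections::HashSet;

/// CTRL-style repeat penalty applied in place to one step's logits.
/// Each distinct token id in `context` is penalized once: positive logits
/// are divided by `penalty` and negative ones multiplied by it, so with
/// penalty > 1.0 a previously seen token always becomes less likely.
pub fn apply_repeat_penalty(logits: &mut [f32], penalty: f32, context: &[u32]) {
    let mut seen = HashSet::new();
    for &token in context {
        // Skip duplicates so a token is not penalized twice in one step.
        if !seen.insert(token) {
            continue;
        }
        // Ignore ids outside the vocabulary rather than panicking.
        if let Some(logit) = logits.get_mut(token as usize) {
            if *logit >= 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}
```

Callers usually pass only the last N generated tokens as `context` (a repeat window). Values slightly above 1.0, such as 1.1, are common. This log alone cannot show whether the engine's configured penalty and window are enough to prevent the loop seen here.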