predict-otron-9001/Trunk.toml at 8338750bebe36970d589f8d82d1c38a376cb8c37

geoffsee/predict-otron-9001

mirror of https://github.com/geoffsee/predict-otron-9001.git synced 2025-09-08 22:46:44 +00:00

geoffsee 8338750beb Refactor `apply_cached_repeat_penalty` for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.

Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing inference engine and adjusted handling of non-streaming/streaming chat completions.

- Add CPU fallback support for text generation when primary device is unsupported
- Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors
- Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations
- Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing

chat completion endpoint functions with gemma3 (no streaming)

Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking

2025-08-27 16:15:01 -04:00

7 lines

204 B

TOML

[build]
# Set the RUSTFLAGS environment variable for getrandom's WebAssembly support
rustflags = ["--cfg", "getrandom_backend=\"wasm_js\""]

[serve]
# Use the same port as in the run.sh script
port = 8788