Refactor `apply_cached_repeat_penalty` for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.
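
For orientation, here is a minimal sketch of what a cached repeat penalty can look like. It is not the code from this commit: `PenaltyCache`, `apply_cached_repeat_penalty_sketch`, and the plain `f32` slice standing in for a tensor are all hypothetical. It assumes the common rule of dividing positive logits and multiplying negative logits by the penalty, and it caches the set of seen token ids so each decoding step only scans newly appended context tokens.

```rust
use std::collections::HashSet;

/// Hypothetical cache: token ids already seen, plus how much of the
/// context has been folded into the set so far.
#[derive(Default)]
struct PenaltyCache {
    seen: HashSet<u32>,
    scanned: usize,
}

/// Illustrative only; the real function's signature is not shown in this commit page.
fn apply_cached_repeat_penalty_sketch(
    logits: &mut [f32],
    context: &[u32],
    penalty: f32,
    cache: &mut PenaltyCache,
) {
    // A penalty of exactly 1.0 changes nothing, so skip the work.
    if penalty == 1.0 {
        return;
    }
    // If the context shrank (for example, a new request), start over.
    if context.len() < cache.scanned {
        cache.seen.clear();
        cache.scanned = 0;
    }
    // Fold only the not-yet-scanned tail of the context into the cache.
    for &tok in &context[cache.scanned..] {
        cache.seen.insert(tok);
    }
    cache.scanned = context.len();

    // Penalize each distinct seen token once; ignore ids outside the vocabulary.
    for &tok in &cache.seen {
        if let Some(l) = logits.get_mut(tok as usize) {
            *l = if *l >= 0.0 { *l / penalty } else { *l * penalty };
        }
    }
}
```

Reusing one `PenaltyCache` across decoding steps keeps the per-step cost proportional to the number of new tokens plus the number of distinct tokens, rather than the full context length.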

mirror of https://github.com/geoffsee/predict-otron-9001.git synced 2025-09-08 22:46:44 +00:00

Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing the inference engine; and adjusted handling of streaming and non-streaming chat completions.

- Add CPU fallback support for text generation when primary device is unsupported
- Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors
- Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations
- Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing
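
The fallback behaviour described in the list above can be sketched as follows. This is an assumption-laden illustration, not the repository's `execute_with_fallback`: the `Device` and `GenError` types and the `run_with_cpu_fallback` function are hypothetical stand-ins, and the operation is modelled as a closure that receives the device to run on. The sketch retries once on the CPU only for the two error classes the commit message names (unsupported device and shape mismatch).

```rust
/// Hypothetical device selector; the real project presumably uses its ML framework's type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Device {
    Accelerator,
    Cpu,
}

/// Hypothetical error type covering the cases named in the commit message.
#[derive(Debug, PartialEq)]
enum GenError {
    UnsupportedDevice(String),
    ShapeMismatch(String),
    Other(String),
}

impl GenError {
    /// Only these error classes are worth retrying on the CPU in this sketch.
    fn is_recoverable_on_cpu(&self) -> bool {
        matches!(
            self,
            GenError::UnsupportedDevice(_) | GenError::ShapeMismatch(_)
        )
    }
}

/// Run `op` on the primary device; on a recoverable error, retry once on the CPU.
/// Returns the result together with the device that actually produced it.
fn run_with_cpu_fallback<T, F>(primary: Device, mut op: F) -> Result<(T, Device), GenError>
where
    F: FnMut(Device) -> Result<T, GenError>,
{
    match op(primary) {
        Ok(value) => Ok((value, primary)),
        // Retrying on the CPU is pointless if the CPU was already the primary device.
        Err(e) if primary != Device::Cpu && e.is_recoverable_on_cpu() => {
            op(Device::Cpu).map(|value| (value, Device::Cpu))
        }
        Err(e) => Err(e),
    }
}
```

Returning the device alongside the value lets a caller log or remember that the fallback was taken, so later steps can go straight to the CPU instead of failing first each time.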

Chat completion endpoint works with gemma3 (non-streaming only)

Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking


geoffsee

2025-08-26 01:30:26 -04:00

parent 7dd23213c9

commit 8338750beb

64 changed files with 14997 additions and 220 deletions

									
.cargo/config.toml (new file)
@@ -0,0 +1,3 @@
# Ensure getrandom works on wasm32-unknown-unknown without needing manual RUSTFLAGS
[target.wasm32-unknown-unknown]
rustflags = ["--cfg", "getrandom_backend=\"wasm_js\""]
