Commit Graph

7 Commits

Author SHA1 Message Date
geoffsee
0580dc8c5e move cli into crates and stage for release 2025-08-31 13:23:50 -04:00
geoffsee
f5d2a85f2e cleanup, add ci 2025-08-31 10:31:20 -04:00
geoffsee
bfe7c04cf5 Add Rust-based Helm Chart Generator Tool
- Scaffold `helm-chart-tool` with Cargo project files.
- Implement core functionality: parse Cargo.toml, extract Kubernetes metadata, and generate Helm charts.
- Include support for deployments, services, ingress, and helper templates.
- Add README with detailed usage instructions.
- Update `.gitignore` for generated Helm charts and related artifacts.
2025-08-28 08:39:54 -04:00
geoffsee
c8b3561e36 Remove ROOT_CAUSE_ANALYSIS.md and outdated server logs 2025-08-28 08:26:18 -04:00
geoffsee
b606adbe5d Add Docker Compose and Kubernetes metadata to Cargo.toml files 2025-08-28 07:56:34 -04:00
geoffsee
8338750beb Refactor apply_cached_repeat_penalty for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.
Remove `test_request.sh`, deprecated functionality, and unused imports; introduce a new CLI tool (`cli.ts`) for testing the inference engine and adjust handling of non-streaming/streaming chat completions.

- Add CPU fallback support for text generation when primary device is unsupported
- Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors
- Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations
- Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing

Chat completion endpoint works with gemma3 (no streaming)

Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking
2025-08-27 16:15:01 -04:00
geoffsee
2aa6d4cdf8 Introduce predict-otron-9000: Unified server combining embeddings and inference engines. Includes OpenAI-compatible APIs, full documentation, and example scripts. 2025-08-16 19:11:35 -04:00