Commit Graph

29 Commits

Author SHA1 Message Date
geoffsee d06b16bb12 remove confusing comments 2025-08-28 16:09:29 -04:00
geoffsee 62dcc8f5bb ai generated README.md 2025-08-28 16:04:45 -04:00
Geoff Seemueller f7001fc72b remove arbitrary keys for standalone config 2025-08-28 13:19:48 -04:00
Geoff Seemueller 5bce413f8f Update SERVER_CONFIG.md, replacing Local with Standalone 2025-08-28 13:18:55 -04:00
geoffsee d9772a67d1 update diagrams to show accurate development configuration 2025-08-28 13:04:17 -04:00
geoffsee 6b709b8ec5 remove weird art 2025-08-28 12:56:07 -04:00
geoffsee d04340d9ac update docs 2025-08-28 12:54:09 -04:00
geoffsee 0488bddfdb Create ARCHITECTURE.md - update stale references to old chat crate 2025-08-28 12:22:05 -04:00
geoffsee 770985afd2 remove stale doc 2025-08-28 12:07:19 -04:00
geoffsee e38a2d4512 predict-otron-9000 serves a leptos SSR frontend 2025-08-28 12:06:22 -04:00
geoffsee 45d7cd8819 - Introduced ServerConfig for handling deployment modes and services.
- Added HighAvailability mode for proxying requests to external services.
- Maintained Local mode for embedded services.
- Updated `README.md` and included `SERVER_CONFIG.md` for detailed documentation.
2025-08-28 09:55:39 -04:00
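
A rough sketch of what the ServerConfig introduced in this commit could look like, assuming a serde-deserialized config; all field and variant names below are guesses from the commit message, not the crate's actual types (later commits rename Local to Standalone):

```rust
use serde::Deserialize;

// Hypothetical shape only; names are assumptions drawn from the commit message.
#[derive(Debug, Deserialize)]
enum DeploymentMode {
    // Embedded services run in-process (renamed to Standalone in later commits).
    Local,
    // Proxy requests to external inference/embeddings services.
    HighAvailability,
}

#[derive(Debug, Deserialize)]
struct ServerConfig {
    mode: DeploymentMode,
    // Only meaningful in HighAvailability mode.
    inference_url: Option<String>,
    embeddings_url: Option<String>,
}
```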
geoffsee c96831d494 Add Docker Compose setup for Predict-O-Tron 9000 and Leptos Chat 2025-08-28 08:46:57 -04:00
geoffsee bfe7c04cf5 Add Rust-based Helm Chart Generator Tool
- Scaffold `helm-chart-tool` with Cargo project files.
- Implement core functionality: parse Cargo.toml, extract Kubernetes metadata, and generate Helm charts.
- Include support for deployments, services, ingress, and helper templates.
- Add README with detailed usage instructions.
- Update `.gitignore` for generated Helm charts and related artifacts.
2025-08-28 08:39:54 -04:00
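
A minimal sketch of the metadata-extraction step, assuming the tool reads a Kubernetes table from each crate's Cargo.toml; the `[package.metadata.kubernetes]` table name and the manifest path are assumptions, not necessarily what `helm-chart-tool` actually uses:

```rust
use std::fs;

// Hypothetical reader for the Kubernetes metadata the tool extracts.
fn kubernetes_metadata(manifest_path: &str) -> Option<toml::Value> {
    let contents = fs::read_to_string(manifest_path).ok()?;
    let manifest: toml::Value = toml::from_str(&contents).ok()?;
    manifest
        .get("package")?
        .get("metadata")?
        .get("kubernetes")
        .cloned()
}

fn main() {
    // Path is illustrative; the real workspace layout may differ.
    if let Some(meta) = kubernetes_metadata("crates/predict-otron-9000/Cargo.toml") {
        // The tool would map fields like image and port into chart templates.
        println!("{meta}");
    }
}
```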
geoffsee c8b3561e36 Remove ROOT_CAUSE_ANALYSIS.md and outdated server logs 2025-08-28 08:26:18 -04:00
geoffsee b606adbe5d Add Docker Compose and Kubernetes metadata to Cargo.toml files 2025-08-28 07:56:34 -04:00
geoffsee 9d6cb62b10 Add Dockerfile for Leptos Chat deployment 2025-08-28 07:54:57 -04:00
geoffsee 956d00f596 Add CLEANUP.md with identified documentation and code issues. Update README files to fix repository URL, unify descriptions, and clarify Gemma model usage. 2025-08-28 07:24:14 -04:00
geoffsee 719beb3791 - Change default server host to localhost for improved security.
- Increase default maximum tokens in CLI configuration to 256.
- Refactor and reorganize the CLI.
2025-08-27 21:47:31 -04:00
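
A small sketch of what those defaults could look like with clap's derive API; the struct and flag names are illustrative guesses, while the default values mirror the commit (localhost host, 256 max tokens):

```rust
use clap::Parser;

// Illustrative CLI shape only; flag names are assumptions.
#[derive(Parser, Debug)]
struct Cli {
    /// Host the server binds to; localhost keeps it off public interfaces.
    #[arg(long, default_value = "localhost")]
    host: String,

    /// Maximum tokens to generate per completion.
    #[arg(long, default_value_t = 256)]
    max_tokens: usize,
}

fn main() {
    let cli = Cli::parse();
    println!("{cli:?}");
}
```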
geoffsee 766d41af78 - Refactored build_pipeline usage to ensure pipeline arguments are cloned.
- Introduced `reset_state` for clearing cached state between requests.
- Enhanced chat UI with model selector and dynamic model fetching.
- Improved error logging and detailed debug messages for chat request flows.
- Added fresh instantiation of `TextGeneration` to prevent tensor shape mismatches.
2025-08-27 17:53:50 -04:00
geoffsee f1b57866e1 remove stale files 2025-08-27 16:36:54 -04:00
geoffsee 9e28e259ad Add support for listing available models via CLI and HTTP endpoint 2025-08-27 16:35:08 -04:00
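
A rough sketch of an OpenAI-style model-listing endpoint, assuming an axum router; the route path and model id below are assumptions rather than the server's actual values:

```rust
use axum::{routing::get, Json, Router};
use serde_json::{json, Value};

// Hypothetical handler; the real route path and model ids live in the repo.
async fn list_models() -> Json<Value> {
    Json(json!({
        "object": "list",
        "data": [
            { "id": "gemma-3-1b-it", "object": "model", "owned_by": "predict-otron-9000" }
        ]
    }))
}

fn models_router() -> Router {
    Router::new().route("/v1/models", get(list_models))
}
```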
geoffsee 432c04d9df Removed legacy inference engine assets. 2025-08-27 16:19:31 -04:00
geoffsee 8338750beb Refactor apply_cached_repeat_penalty for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.
Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing the inference engine and adjusted handling of non-streaming/streaming chat completions.

- Add CPU fallback support for text generation when primary device is unsupported
- Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors
- Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations
- Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing

Chat completion endpoint now functions with gemma3 (no streaming).

Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking
2025-08-27 16:15:01 -04:00
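
For context on what a repeat penalty does here, a minimal sketch, assuming logits are available as a plain slice of f32 and recent token ids are tracked per request; the real `apply_cached_repeat_penalty` in the inference engine has its own caching layer and signature:

```rust
// Minimal illustration; not the engine's actual implementation.
fn apply_repeat_penalty(logits: &mut [f32], recent_tokens: &[u32], penalty: f32) {
    for &token in recent_tokens {
        if let Some(logit) = logits.get_mut(token as usize) {
            // Standard rule: shrink positive logits and push negative ones further
            // down, making recently seen tokens less likely to repeat.
            if *logit > 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

fn main() {
    let mut logits = vec![2.0_f32, -1.0, 0.5, 3.0];
    apply_repeat_penalty(&mut logits, &[0, 3], 1.3);
    println!("{logits:?}");
}
```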
geoffsee 7dd23213c9 fix image path again 2025-08-16 20:11:15 -04:00
geoffsee dff09dc4d0 fix image path 2025-08-16 20:09:28 -04:00
geoffsee 83f2a8b295 add an image to the readme 2025-08-16 20:08:35 -04:00
geoffsee b8ba994783 Integrate create_inference_router from inference-engine into predict-otron-9000, simplify server routing, and update dependencies to unify versions. 2025-08-16 19:53:33 -04:00
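
A sketch of how such an integration might look if the routers are axum `Router` values; `create_inference_router` comes from the commit message, while the embeddings router, health route, and composing function are placeholders:

```rust
use axum::{routing::get, Router};

// Hypothetical composition; only `create_inference_router` is named in the commit.
fn build_unified_router(inference_router: Router, embeddings_router: Router) -> Router {
    Router::new()
        .route("/health", get(|| async { "ok" }))
        .merge(inference_router)
        .merge(embeddings_router)
}
```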
Geoff Seemueller 411ad78026 Remove stale reference in documentation. 2025-08-16 19:29:11 -04:00
geoffsee 2aa6d4cdf8 Introduce predict-otron-9000: Unified server combining embeddings and inference engines. Includes OpenAI-compatible APIs, full documentation, and example scripts. 2025-08-16 19:11:35 -04:00