Commit Graph

29 Commits

Author SHA1 Message Date
geoffsee d06b16bb12 remove confusing comments 2025-08-28 16:09:29 -04:00
geoffsee 62dcc8f5bb ai generated README.md 2025-08-28 16:04:45 -04:00
Geoff Seemueller f7001fc72b remove arbitrary keys for standalone config 2025-08-28 13:19:48 -04:00
Geoff Seemueller 5bce413f8f Update SERVER_CONFIG.md, replacing Local with Standalone 2025-08-28 13:18:55 -04:00
geoffsee d9772a67d1 update diagrams to show accurate development configuration 2025-08-28 13:04:17 -04:00
geoffsee 6b709b8ec5 remove weird art 2025-08-28 12:56:07 -04:00
geoffsee d04340d9ac update docs 2025-08-28 12:54:09 -04:00
geoffsee 0488bddfdb Create ARCHITECTURE.md - update stale references to old chat crate 2025-08-28 12:22:05 -04:00
geoffsee 770985afd2 remove stale doc 2025-08-28 12:07:19 -04:00
geoffsee e38a2d4512 predict-otron-9000 serves a leptos SSR frontend 2025-08-28 12:06:22 -04:00
geoffsee 45d7cd8819 - Introduced ServerConfig for handling deployment modes and services.
- Added HighAvailability mode for proxying requests to external services.
- Maintained Local mode for embedded services.
- Updated `README.md` and included `SERVER_CONFIG.md` for detailed documentation.
2025-08-28 09:55:39 -04:00
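
A rough sketch of what the ServerConfig introduced in this commit could look like, assuming a serde-deserialized config; all field and variant names below are guesses from the commit message, not the crate's actual types (later commits rename Local to Standalone):

```rust
use serde::Deserialize;

// Hypothetical shape only; names are assumptions drawn from the commit message.
#[derive(Debug, Deserialize)]
enum DeploymentMode {
    // Embedded services run in-process (renamed to Standalone in later commits).
    Local,
    // Proxy requests to external inference/embeddings services.
    HighAvailability,
}

#[derive(Debug, Deserialize)]
struct ServerConfig {
    mode: DeploymentMode,
    // Only meaningful in HighAvailability mode.
    inference_url: Option<String>,
    embeddings_url: Option<String>,
}
```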
geoffsee c96831d494 Add Docker Compose setup for Predict-O-Tron 9000 and Leptos Chat 2025-08-28 08:46:57 -04:00
geoffsee bfe7c04cf5 Add Rust-based Helm Chart Generator Tool
- Scaffold `helm-chart-tool` with Cargo project files.
- Implement core functionality: parse Cargo.toml, extract Kubernetes metadata, and generate Helm charts.
- Include support for deployments, services, ingress, and helper templates.
- Add README with detailed usage instructions.
- Update `.gitignore` for generated Helm charts and related artifacts.
2025-08-28 08:39:54 -04:00
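
A minimal sketch of the metadata-extraction step, assuming the tool reads a Kubernetes table from each crate's Cargo.toml; the `[package.metadata.kubernetes]` table name and the manifest path are assumptions, not necessarily what `helm-chart-tool` actually uses:

```rust
use std::fs;

// Hypothetical reader for the Kubernetes metadata the tool extracts.
fn kubernetes_metadata(manifest_path: &str) -> Option<toml::Value> {
    let contents = fs::read_to_string(manifest_path).ok()?;
    let manifest: toml::Value = toml::from_str(&contents).ok()?;
    manifest
        .get("package")?
        .get("metadata")?
        .get("kubernetes")
        .cloned()
}

fn main() {
    // Path is illustrative; the real workspace layout may differ.
    if let Some(meta) = kubernetes_metadata("crates/predict-otron-9000/Cargo.toml") {
        // The tool would map fields like image and port into chart templates.
        println!("{meta}");
    }
}
```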
geoffsee c8b3561e36 Remove ROOT_CAUSE_ANALYSIS.md and outdated server logs 2025-08-28 08:26:18 -04:00
geoffsee b606adbe5d Add Docker Compose and Kubernetes metadata to Cargo.toml files 2025-08-28 07:56:34 -04:00
geoffsee 9d6cb62b10 Add Dockerfile for Leptos Chat deployment 2025-08-28 07:54:57 -04:00
geoffsee 956d00f596 Add CLEANUP.md with identified documentation and code issues. Update README files to fix repository URL, unify descriptions, and clarify Gemma model usage. 2025-08-28 07:24:14 -04:00
geoffsee 719beb3791 - Change default server host to localhost for improved security.
- Increase default maximum tokens in CLI configuration to 256.
- Refactor and reorganize the CLI.
2025-08-27 21:47:31 -04:00
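
A small sketch of what those defaults could look like with clap's derive API; the struct and flag names are illustrative guesses, while the default values mirror the commit (localhost host, 256 max tokens):

```rust
use clap::Parser;

// Illustrative CLI shape only; flag names are assumptions.
#[derive(Parser, Debug)]
struct Cli {
    /// Host the server binds to; localhost keeps it off public interfaces.
    #[arg(long, default_value = "localhost")]
    host: String,

    /// Maximum tokens to generate per completion.
    #[arg(long, default_value_t = 256)]
    max_tokens: usize,
}

fn main() {
    let cli = Cli::parse();
    println!("{cli:?}");
}
```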
geoffsee 766d41af78 - Refactored build_pipeline usage to ensure pipeline arguments are cloned.
- Introduced `reset_state` for clearing cached state between requests.
- Enhanced chat UI with model selector and dynamic model fetching.
- Improved error logging and detailed debug messages for chat request flows.
- Added fresh instantiation of `TextGeneration` to prevent tensor shape mismatches.
2025-08-27 17:53:50 -04:00
geoffsee f1b57866e1 remove stale files 2025-08-27 16:36:54 -04:00
geoffsee 9e28e259ad Add support for listing available models via CLI and HTTP endpoint 2025-08-27 16:35:08 -04:00
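
A rough sketch of an OpenAI-style model-listing endpoint, assuming an axum router; the route path and model id below are assumptions rather than the server's actual values:

```rust
use axum::{routing::get, Json, Router};
use serde_json::{json, Value};

// Hypothetical handler; the real route path and model ids live in the repo.
async fn list_models() -> Json<Value> {
    Json(json!({
        "object": "list",
        "data": [
            { "id": "gemma-3-1b-it", "object": "model", "owned_by": "predict-otron-9000" }
        ]
    }))
}

fn models_router() -> Router {
    Router::new().route("/v1/models", get(list_models))
}
```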
geoffsee 432c04d9df Removed legacy inference engine assets. 2025-08-27 16:19:31 -04:00
geoffsee 8338750beb Refactor apply_cached_repeat_penalty for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.
Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing the inference engine and adjusted handling of non-streaming/streaming chat completions.

- Add CPU fallback support for text generation when primary device is unsupported
- Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors
- Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations
- Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing

Chat completion endpoint now functions with gemma3 (no streaming).

Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking
2025-08-27 16:15:01 -04:00
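
For context on what a repeat penalty does here, a minimal sketch, assuming logits are available as a plain slice of f32 and recent token ids are tracked per request; the real `apply_cached_repeat_penalty` in the inference engine has its own caching layer and signature:

```rust
// Minimal illustration; not the engine's actual implementation.
fn apply_repeat_penalty(logits: &mut [f32], recent_tokens: &[u32], penalty: f32) {
    for &token in recent_tokens {
        if let Some(logit) = logits.get_mut(token as usize) {
            // Standard rule: shrink positive logits and push negative ones further
            // down, making recently seen tokens less likely to repeat.
            if *logit > 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

fn main() {
    let mut logits = vec![2.0_f32, -1.0, 0.5, 3.0];
    apply_repeat_penalty(&mut logits, &[0, 3], 1.3);
    println!("{logits:?}");
}
```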
geoffsee 7dd23213c9 fix image path again 2025-08-16 20:11:15 -04:00
geoffsee dff09dc4d0 fix image path 2025-08-16 20:09:28 -04:00
geoffsee 83f2a8b295 add an image to the readme 2025-08-16 20:08:35 -04:00
geoffsee b8ba994783 Integrate create_inference_router from inference-engine into predict-otron-9000, simplify server routing, and update dependencies to unify versions. 2025-08-16 19:53:33 -04:00
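
A sketch of how such an integration might look if the routers are axum `Router` values; `create_inference_router` comes from the commit message, while the embeddings router, health route, and composing function are placeholders:

```rust
use axum::{routing::get, Router};

// Hypothetical composition; only `create_inference_router` is named in the commit.
fn build_unified_router(inference_router: Router, embeddings_router: Router) -> Router {
    Router::new()
        .route("/health", get(|| async { "ok" }))
        .merge(inference_router)
        .merge(embeddings_router)
}
```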
Geoff Seemueller 411ad78026 Remove stale reference in documentation. 2025-08-16 19:29:11 -04:00
geoffsee 2aa6d4cdf8 Introduce predict-otron-9000: Unified server combining embeddings and inference engines. Includes OpenAI-compatible APIs, full documentation, and example scripts. 2025-08-16 19:11:35 -04:00