mirror of https://github.com/geoffsee/predict-otron-9001.git synced 2025-09-08 22:46:44 +00:00

predict-otron-9000

This is an extensible axum/tokio application that combines embeddings-engine, inference-engine, and leptos-app.

Notes

When server_mode is Standalone (default), the instance contains all components necessary for inference.
When server_mode is HighAvailability, inference and embeddings are scaled automatically, and the instance proxies requests to the inference and embeddings services via DNS.
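
The two modes can be pictured as a small routing decision. The following is a minimal sketch, not the project's actual code: the `ServerMode`, `Backend`, and `Route` types, the `route_for` and `service_url` functions, the DNS names, and port 8080 are all hypothetical, chosen only to illustrate the behavior described in the notes.

```rust
use std::str::FromStr;

/// Deployment mode, mirroring the two server_mode values in the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ServerMode {
    /// All components run inside this instance (the documented default).
    #[default]
    Standalone,
    /// Inference and embeddings run as separate services reached via DNS.
    HighAvailability,
}

impl FromStr for ServerMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "Standalone" => Ok(ServerMode::Standalone),
            "HighAvailability" => Ok(ServerMode::HighAvailability),
            other => Err(format!("unknown server_mode: {other}")),
        }
    }
}

/// The two backends named in the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Inference,
    Embeddings,
}

/// Where a request for a backend should be handled.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Route {
    /// Handle the request in-process.
    Local,
    /// Forward the request to this base URL.
    Proxy(String),
}

/// Assumption: hypothetical DNS names and port; a real deployment will differ.
fn service_url(backend: Backend) -> String {
    let host = match backend {
        Backend::Inference => "inference-engine",
        Backend::Embeddings => "embeddings-engine",
    };
    format!("http://{host}:8080")
}

/// Standalone serves everything locally; HighAvailability proxies by DNS name.
pub fn route_for(mode: ServerMode, backend: Backend) -> Route {
    match mode {
        ServerMode::Standalone => Route::Local,
        ServerMode::HighAvailability => Route::Proxy(service_url(backend)),
    }
}
```

Under these assumptions, `route_for(ServerMode::default(), Backend::Inference)` yields `Route::Local`, while the same call with `ServerMode::HighAvailability` yields a `Route::Proxy` pointing at the service's DNS name. Scaling the services behind that name is then left to the surrounding infrastructure.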