Latest commit 315ef17605 by geoffsee (2025-08-29 20:00:41 -04:00): supports small llama and gemma models

Refactor inference: dedicated crates for llama and gemma inferencing, not integrated

predict-otron-9000

An extensible axum/tokio server that combines the embeddings-engine, inference-engine, and leptos-app components in a single service.

Notes

  • When server_mode is Standalone (default), the instance contains all components necessary for inference.
  • When server_mode is HighAvailability, inference and embeddings scale automatically, and the instance proxies requests to the inference and embeddings services, which it locates via DNS.
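The two modes above amount to a routing decision: handle the request in-process, or forward it to a separate service found by DNS name. The sketch below illustrates that decision only. The `ServerMode` enum mirrors the mode names in these notes, but `Route`, `route_for`, and the `http://` host format are hypothetical and are not taken from the crate's actual code.

```rust
/// Hypothetical mirror of the `server_mode` setting described above.
/// `Standalone` is the default, matching the notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ServerMode {
    #[default]
    Standalone,
    HighAvailability,
}

/// Hypothetical description of where a request should be handled.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Served by a component running inside this process.
    InProcess,
    /// Forwarded to a separate service reachable at this base URL.
    Proxy(String),
}

/// Decide how to serve a request for a backing service (e.g. inference
/// or embeddings). `dns_host` is an assumed DNS name for that service;
/// it is only used when proxying.
pub fn route_for(mode: ServerMode, dns_host: &str) -> Route {
    match mode {
        // Standalone: every component lives in this instance.
        ServerMode::Standalone => Route::InProcess,
        // HighAvailability: proxy to the service resolved via DNS.
        ServerMode::HighAvailability => Route::Proxy(format!("http://{dns_host}")),
    }
}
```

Keeping the decision in one function means the rest of the server can stay agnostic about the deployment mode; only the router needs to know whether to call a local handler or forward the request.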