update docs

This commit is contained in:
geoffsee
2025-08-28 12:54:09 -04:00
parent 0488bddfdb
commit d04340d9ac
18 changed files with 22 additions and 651 deletions


@@ -0,0 +1,4 @@
# Embeddings Engine
A high-performance text embeddings service that generates vector representations of text using state-of-the-art models.
This crate wraps the fastembed crate to provide embeddings and partially implements the OpenAI embeddings API specification.
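
For quick testing, here is a minimal request sketch. It assumes the service listens on `localhost:8080` (the port advertised in its Kubernetes metadata) and exposes an OpenAI-style `POST /v1/embeddings` route; the model name and the `reqwest`/`tokio`/`serde_json` dependencies are illustrative, not prescribed by this crate.

```rust
use serde_json::json;

// Hypothetical client sketch: the endpoint path, port, and model name are assumptions.
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let response = reqwest::Client::new()
        .post("http://localhost:8080/v1/embeddings")
        .json(&json!({
            "model": "nomic-embed-text-v1.5",
            "input": "The quick brown fox jumps over the lazy dog"
        }))
        .send()
        .await?
        .json::<serde_json::Value>()
        .await?;

    // In the OpenAI response shape, the vector appears under data[0].embedding.
    println!("{}", response["data"][0]["embedding"]);
    Ok(())
}
```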


@@ -0,0 +1,18 @@
[package]
name = "helm-chart-tool"
version = "0.1.0"
edition = "2021"
[workspace]
[[bin]]
name = "helm-chart-tool"
path = "src/main.rs"
[dependencies]
toml = "0.8"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
anyhow = "1.0"
clap = { version = "4.0", features = ["derive"] }
walkdir = "2.0"


@@ -0,0 +1,218 @@
# Helm Chart Tool
A Rust-based tool that automatically generates Helm charts from Cargo.toml metadata in Rust workspace projects.
## Overview
This tool scans a Rust workspace for crates containing Docker/Kubernetes metadata in their `Cargo.toml` files and generates a complete, production-ready Helm chart with deployments, services, ingress, and configuration templates.
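The discovery step reduces to a walk over the workspace looking for member `Cargo.toml` files; here is a condensed sketch of the logic in `src/main.rs` (`ServiceInfo` and `parse_cargo_toml` are defined there):

```rust
use walkdir::WalkDir;

// Condensed from src/main.rs: walk the workspace, skip the root manifest,
// and keep every crate whose Cargo.toml parses into a ServiceInfo
// (i.e. carries a [package.metadata.kube] section).
fn discover_services(workspace_root: &std::path::Path) -> Vec<ServiceInfo> {
    WalkDir::new(workspace_root)
        .into_iter()
        .filter_map(|entry| entry.ok())
        .filter(|entry| {
            entry.file_name() == "Cargo.toml"
                && entry.path() != workspace_root.join("Cargo.toml")
        })
        .filter_map(|entry| parse_cargo_toml(entry.path()).ok())
        .collect()
}
```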
## Features
- **Automatic Service Discovery**: Scans all `Cargo.toml` files in a workspace to find services with Kubernetes metadata
- **Complete Helm Chart Generation**: Creates Chart.yaml, values.yaml, deployment templates, service templates, ingress template, and helper templates
- **Metadata Extraction**: Uses `[package.metadata.kube]` sections from Cargo.toml files to extract:
  - Docker image names
  - Service ports
  - Replica counts
  - Service names
- **Production Ready**: Generated charts include health checks, resource limits, node selectors, affinity rules, and tolerations
- **Helm Best Practices**: Follows Helm chart conventions and passes `helm lint` validation
## Installation
Build the tool from source:
```bash
cd helm-chart-tool
cargo build --release
```
The binary will be available at `target/release/helm-chart-tool`.
## Usage
### Basic Usage
```bash
./target/release/helm-chart-tool --workspace /path/to/rust/workspace --output ./my-helm-chart
```
### Command Line Options
- `--workspace, -w PATH`: Path to the workspace root (default: `.`)
- `--output, -o PATH`: Output directory for the Helm chart (default: `./helm-chart`)
- `--name, -n NAME`: Name of the Helm chart (default: `predict-otron-9000`)
### Example
```bash
# Generate chart from current workspace
./target/release/helm-chart-tool
# Generate chart from specific workspace with custom output
./target/release/helm-chart-tool -w /path/to/my/workspace -o ./charts/my-app -n my-application
```
## Cargo.toml Metadata Format
The tool expects crates to have Kubernetes metadata in their `Cargo.toml` files:
```toml
[package]
name = "my-service"
version = "0.1.0"
# Required: Kubernetes metadata
[package.metadata.kube]
image = "ghcr.io/myorg/my-service:latest"
replicas = 1
port = 8080
# Optional: Docker Compose metadata (currently not used but parsed)
[package.metadata.compose]
image = "ghcr.io/myorg/my-service:latest"
port = 8080
```
### Metadata Fields
- `image` (required): Full Docker image name including registry and tag
- `port` (required): Port number the service listens on
- `replicas` (optional): Number of replicas to deploy; defaults to 1
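
Inside the tool, this table deserializes into a small serde struct (condensed from `src/main.rs`; only the fields the generator reads are modelled):

```rust
use serde::Deserialize;

// Condensed from src/main.rs: the [package.metadata.kube] table maps onto this struct.
#[derive(Debug, Deserialize)]
struct KubeMetadata {
    image: String,         // required: full image reference, e.g. ghcr.io/myorg/my-service:latest
    port: u16,             // required: container port the service listens on
    replicas: Option<u32>, // optional: the generator falls back to 1 when absent
}
```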
## Generated Chart Structure
The tool generates a complete Helm chart with the following structure:
```
helm-chart/
├── Chart.yaml                    # Chart metadata
├── values.yaml                   # Default configuration values
├── .helmignore                   # Files to ignore when packaging
└── templates/
    ├── _helpers.tpl              # Template helper functions
    ├── ingress.yaml              # Ingress configuration (optional)
    ├── {service}-deployment.yaml # Deployment for each service
    └── {service}-service.yaml    # Service for each service
```
### Generated Files
#### Chart.yaml
- Standard chart metadata (`apiVersion: v2`, the Helm 3 chart format)
- Includes keywords for AI/ML applications
- Maintainer information
#### values.yaml
- Individual service configurations
- Resource limits and requests
- Service types and ports
- Node selectors, affinity, and tolerations
- Global settings and ingress configuration
#### Deployment Templates
- Kubernetes Deployment manifests
- Health checks (liveness and readiness probes)
- Resource management
- Container port configuration from metadata
- Support for node selectors, affinity, and tolerations
#### Service Templates
- Kubernetes Service manifests
- ClusterIP services by default
- Port mapping from metadata
#### Ingress Template
- Optional ingress configuration
- Disabled by default
- Configurable through values.yaml
## Example Output
When run against the predict-otron-9000 workspace, the tool generates:
```bash
$ ./target/release/helm-chart-tool --workspace .. --output ../generated-helm-chart
Parsing workspace at: ..
Output directory: ../generated-helm-chart
Chart name: predict-otron-9000
Found 4 services:
- leptos-app: ghcr.io/geoffsee/leptos-app:latest (port 8788)
- inference-engine: ghcr.io/geoffsee/inference-service:latest (port 8080)
- embeddings-engine: ghcr.io/geoffsee/embeddings-service:latest (port 8080)
- predict-otron-9000: ghcr.io/geoffsee/predict-otron-9000:latest (port 8080)
Helm chart generated successfully!
```
## Validation
The generated charts pass Helm validation:
```bash
$ helm lint generated-helm-chart
==> Linting generated-helm-chart
[INFO] Chart.yaml: icon is recommended
1 chart(s) linted, 0 chart(s) failed
```
## Deployment
Deploy the generated chart:
```bash
# Install the chart
helm install my-release ./generated-helm-chart
# Upgrade the chart
helm upgrade my-release ./generated-helm-chart
# Uninstall the chart
helm uninstall my-release
```
### Customization
Customize the deployment by modifying `values.yaml`. Per-service keys use underscores in place of dashes (for example, `predict-otron-9000` becomes `predict_otron_9000`):
```yaml
# Enable ingress
ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: my-app.example.com

# Adjust resources for a specific service
predict_otron_9000:
  replicas: 3
  resources:
    limits:
      memory: "4Gi"
      cpu: "2000m"
    requests:
      memory: "2Gi"
      cpu: "1000m"
```
## Requirements
- Rust 2021+ (for building the tool)
- Helm 3.x (for deploying the generated charts)
- Kubernetes cluster (for deployment)
## Limitations
- Currently assumes every service exposes a `/health` endpoint for liveness and readiness probes
- Resource limits are hardcoded defaults (can be overridden in values.yaml)
- Ingress configuration is basic (can be customized through values.yaml)
## Contributing
1. Add new features to the tool
2. Test with various Cargo.toml metadata configurations
3. Validate generated charts with `helm lint`
4. Ensure charts deploy successfully to test clusters
## License
This tool is part of the predict-otron-9000 project and follows the same license terms.


@@ -0,0 +1,515 @@
use anyhow::{Context, Result};
use clap::{Arg, Command};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use walkdir::WalkDir;
#[derive(Debug, Deserialize)]
struct CargoToml {
    package: Option<Package>,
}

#[derive(Debug, Deserialize)]
struct Package {
    name: String,
    metadata: Option<Metadata>,
}

#[derive(Debug, Deserialize)]
struct Metadata {
    kube: Option<KubeMetadata>,
    compose: Option<ComposeMetadata>,
}

#[derive(Debug, Deserialize)]
struct KubeMetadata {
    image: String,
    replicas: Option<u32>,
    port: u16,
}

#[derive(Debug, Deserialize)]
struct ComposeMetadata {
    image: Option<String>,
    port: Option<u16>,
}

#[derive(Debug, Clone)]
struct ServiceInfo {
    name: String,
    image: String,
    port: u16,
    replicas: u32,
}
fn main() -> Result<()> {
    let matches = Command::new("helm-chart-tool")
        .about("Generate Helm charts from Cargo.toml metadata")
        .arg(
            Arg::new("workspace")
                .short('w')
                .long("workspace")
                .value_name("PATH")
                .help("Path to the workspace root")
                .default_value("."),
        )
        .arg(
            Arg::new("output")
                .short('o')
                .long("output")
                .value_name("PATH")
                .help("Output directory for the Helm chart")
                .default_value("./helm-chart"),
        )
        .arg(
            Arg::new("chart-name")
                .short('n')
                .long("name")
                .value_name("NAME")
                .help("Name of the Helm chart")
                .default_value("predict-otron-9000"),
        )
        .get_matches();

    let workspace_path = matches.get_one::<String>("workspace").unwrap();
    let output_path = matches.get_one::<String>("output").unwrap();
    let chart_name = matches.get_one::<String>("chart-name").unwrap();

    println!("Parsing workspace at: {}", workspace_path);
    println!("Output directory: {}", output_path);
    println!("Chart name: {}", chart_name);

    let services = discover_services(workspace_path)?;
    println!("Found {} services:", services.len());
    for service in &services {
        println!(" - {}: {} (port {})", service.name, service.image, service.port);
    }

    generate_helm_chart(output_path, chart_name, &services)?;
    println!("Helm chart generated successfully!");
    Ok(())
}
fn discover_services(workspace_path: &str) -> Result<Vec<ServiceInfo>> {
    let workspace_root = Path::new(workspace_path);
    let mut services = Vec::new();

    // Find all Cargo.toml files in the workspace
    for entry in WalkDir::new(workspace_root)
        .into_iter()
        .filter_map(|e| e.ok())
    {
        if entry.file_name() == "Cargo.toml" && entry.path() != workspace_root.join("Cargo.toml") {
            if let Ok(service_info) = parse_cargo_toml(entry.path()) {
                services.push(service_info);
            }
        }
    }

    Ok(services)
}
fn parse_cargo_toml(path: &Path) -> Result<ServiceInfo> {
    let content = fs::read_to_string(path)
        .with_context(|| format!("Failed to read Cargo.toml at {:?}", path))?;
    let cargo_toml: CargoToml = toml::from_str(&content)
        .with_context(|| format!("Failed to parse Cargo.toml at {:?}", path))?;

    let package = cargo_toml.package
        .ok_or_else(|| anyhow::anyhow!("No package section found in {:?}", path))?;
    let metadata = package.metadata
        .ok_or_else(|| anyhow::anyhow!("No metadata section found in {:?}", path))?;
    let kube_metadata = metadata.kube
        .ok_or_else(|| anyhow::anyhow!("No kube metadata found in {:?}", path))?;

    Ok(ServiceInfo {
        name: package.name,
        image: kube_metadata.image,
        port: kube_metadata.port,
        replicas: kube_metadata.replicas.unwrap_or(1),
    })
}
fn generate_helm_chart(output_path: &str, chart_name: &str, services: &[ServiceInfo]) -> Result<()> {
    let chart_dir = Path::new(output_path);
    let templates_dir = chart_dir.join("templates");

    // Create directories
    fs::create_dir_all(&templates_dir)?;

    // Generate Chart.yaml
    generate_chart_yaml(chart_dir, chart_name)?;

    // Generate values.yaml
    generate_values_yaml(chart_dir, services)?;

    // Generate templates for each service
    for service in services {
        generate_deployment_template(&templates_dir, service)?;
        generate_service_template(&templates_dir, service)?;
    }

    // Generate ingress template
    generate_ingress_template(&templates_dir, services)?;

    // Generate helper templates
    generate_helpers_template(&templates_dir)?;

    // Generate .helmignore
    generate_helmignore(chart_dir)?;

    Ok(())
}
fn generate_chart_yaml(chart_dir: &Path, chart_name: &str) -> Result<()> {
    let chart_yaml = format!(
        r#"apiVersion: v2
name: {}
description: A Helm chart for the predict-otron-9000 AI platform
type: application
version: 0.1.0
appVersion: "0.1.0"
keywords:
  - ai
  - llm
  - inference
  - embeddings
  - chat
maintainers:
  - name: predict-otron-9000-team
"#,
        chart_name
    );
    fs::write(chart_dir.join("Chart.yaml"), chart_yaml)?;
    Ok(())
}
fn generate_values_yaml(chart_dir: &Path, services: &[ServiceInfo]) -> Result<()> {
    let mut values = String::from(
        r#"# Default values for predict-otron-9000
# This is a YAML-formatted file.
global:
  imagePullPolicy: IfNotPresent
  serviceType: ClusterIP
# Ingress configuration
ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: predict-otron-9000.local
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: predict-otron-9000
              port:
                number: 8080
  tls: []
"#,
    );

    for service in services {
        let service_config = format!(
            r#"{}:
  image:
    repository: {}
    tag: "latest"
    pullPolicy: IfNotPresent
  replicas: {}
  service:
    type: ClusterIP
    port: {}
  resources:
    limits:
      memory: "1Gi"
      cpu: "1000m"
    requests:
      memory: "512Mi"
      cpu: "250m"
  nodeSelector: {{}}
  tolerations: []
  affinity: {{}}
"#,
            service.name.replace("-", "_"),
            service.image.split(':').next().unwrap_or(&service.image),
            service.replicas,
            service.port
        );
        values.push_str(&service_config);
    }

    fs::write(chart_dir.join("values.yaml"), values)?;
    Ok(())
}
fn generate_deployment_template(templates_dir: &Path, service: &ServiceInfo) -> Result<()> {
    let service_name_underscore = service.name.replace("-", "_");
    let deployment_template = format!(
        r#"apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{{{ include "predict-otron-9000.fullname" . }}}}-{}
  labels:
    {{{{- include "predict-otron-9000.labels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
spec:
  replicas: {{{{ .Values.{}.replicas }}}}
  selector:
    matchLabels:
      {{{{- include "predict-otron-9000.selectorLabels" . | nindent 6 }}}}
      app.kubernetes.io/component: {}
  template:
    metadata:
      labels:
        {{{{- include "predict-otron-9000.selectorLabels" . | nindent 8 }}}}
        app.kubernetes.io/component: {}
    spec:
      containers:
        - name: {}
          image: "{{{{ .Values.{}.image.repository }}}}:{{{{ .Values.{}.image.tag }}}}"
          imagePullPolicy: {{{{ .Values.{}.image.pullPolicy }}}}
          ports:
            - name: http
              containerPort: {}
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            {{{{- toYaml .Values.{}.resources | nindent 12 }}}}
      {{{{- with .Values.{}.nodeSelector }}}}
      nodeSelector:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
      {{{{- with .Values.{}.affinity }}}}
      affinity:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
      {{{{- with .Values.{}.tolerations }}}}
      tolerations:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
"#,
        service.name,
        service.name,
        service_name_underscore,
        service.name,
        service.name,
        service.name,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore,
        service.port,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore
    );

    let filename = format!("{}-deployment.yaml", service.name);
    fs::write(templates_dir.join(filename), deployment_template)?;
    Ok(())
}
fn generate_service_template(templates_dir: &Path, service: &ServiceInfo) -> Result<()> {
    let service_template = format!(
        r#"apiVersion: v1
kind: Service
metadata:
  name: {{{{ include "predict-otron-9000.fullname" . }}}}-{}
  labels:
    {{{{- include "predict-otron-9000.labels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
spec:
  type: {{{{ .Values.{}.service.type }}}}
  ports:
    - port: {{{{ .Values.{}.service.port }}}}
      targetPort: http
      protocol: TCP
      name: http
  selector:
    {{{{- include "predict-otron-9000.selectorLabels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
"#,
        service.name,
        service.name,
        service.name.replace("-", "_"),
        service.name.replace("-", "_"),
        service.name
    );

    let filename = format!("{}-service.yaml", service.name);
    fs::write(templates_dir.join(filename), service_template)?;
    Ok(())
}
fn generate_ingress_template(templates_dir: &Path, services: &[ServiceInfo]) -> Result<()> {
    let ingress_template = r#"{{- if .Values.ingress.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "predict-otron-9000.fullname" . }}
  labels:
    {{- include "predict-otron-9000.labels" . | nindent 4 }}
  {{- with .Values.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  {{- if .Values.ingress.className }}
  ingressClassName: {{ .Values.ingress.className }}
  {{- end }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- range .Values.ingress.tls }}
    - hosts:
        {{- range .hosts }}
        - {{ . | quote }}
        {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            {{- if .pathType }}
            pathType: {{ .pathType }}
            {{- end }}
            backend:
              service:
                name: {{ include "predict-otron-9000.fullname" $ }}-{{ .backend.service.name }}
                port:
                  number: {{ .backend.service.port.number }}
          {{- end }}
    {{- end }}
{{- end }}
"#;

    fs::write(templates_dir.join("ingress.yaml"), ingress_template)?;
    Ok(())
}
fn generate_helpers_template(templates_dir: &Path) -> Result<()> {
    let helpers_template = r#"{{/*
Expand the name of the chart.
*/}}
{{- define "predict-otron-9000.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "predict-otron-9000.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "predict-otron-9000.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Common labels
*/}}
{{- define "predict-otron-9000.labels" -}}
helm.sh/chart: {{ include "predict-otron-9000.chart" . }}
{{ include "predict-otron-9000.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{/*
Selector labels
*/}}
{{- define "predict-otron-9000.selectorLabels" -}}
app.kubernetes.io/name: {{ include "predict-otron-9000.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Create the name of the service account to use
*/}}
{{- define "predict-otron-9000.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "predict-otron-9000.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
"#;
    fs::write(templates_dir.join("_helpers.tpl"), helpers_template)?;
    Ok(())
}
fn generate_helmignore(chart_dir: &Path) -> Result<()> {
    let helmignore_content = r#"# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
"#;
    fs::write(chart_dir.join(".helmignore"), helmignore_content)?;
    Ok(())
}


@@ -1,295 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>OpenAI-Compatible API Tester</title>
<style>
body {
font-family: Arial, sans-serif;
max-width: 800px;
margin: 0 auto;
padding: 20px;
line-height: 1.6;
}
h1, h2 {
color: #333;
}
.container {
margin-bottom: 20px;
}
textarea {
width: 100%;
height: 150px;
padding: 10px;
margin-bottom: 10px;
border: 1px solid #ddd;
border-radius: 4px;
font-family: monospace;
}
button {
background-color: #4CAF50;
color: white;
padding: 10px 15px;
border: none;
border-radius: 4px;
cursor: pointer;
font-size: 16px;
}
button:hover {
background-color: #45a049;
}
pre {
background-color: #f5f5f5;
padding: 15px;
border-radius: 4px;
overflow-x: auto;
white-space: pre-wrap;
}
.response {
margin-top: 20px;
}
.error {
color: red;
}
.settings {
display: flex;
flex-wrap: wrap;
gap: 10px;
margin-bottom: 15px;
}
.settings div {
display: flex;
flex-direction: column;
}
label {
margin-bottom: 5px;
font-weight: bold;
}
input {
padding: 8px;
border: 1px solid #ddd;
border-radius: 4px;
}
.examples {
margin-top: 30px;
}
.example-btn {
background-color: #2196F3;
margin-right: 10px;
margin-bottom: 10px;
}
.example-btn:hover {
background-color: #0b7dda;
}
</style>
</head>
<body>
<h1>OpenAI-Compatible API Tester</h1>
<p>Use this page to test the OpenAI-compatible chat completions endpoint of the local inference engine.</p>
<div class="container">
<h2>Request Settings</h2>
<div class="settings">
<div>
<label for="serverUrl">Server URL:</label>
<input type="text" id="serverUrl" value="http://localhost:3777" />
</div>
<div>
<label for="model">Model:</label>
<input type="text" id="model" value="gemma-3-1b-it" />
</div>
<div>
<label for="maxTokens">Max Tokens:</label>
<input type="number" id="maxTokens" value="150" />
</div>
<div>
<label for="temperature">Temperature:</label>
<input type="number" id="temperature" value="0.7" step="0.1" min="0" max="2" />
</div>
<div>
<label for="topP">Top P:</label>
<input type="number" id="topP" value="0.9" step="0.1" min="0" max="1" />
</div>
</div>
<h2>Request Body</h2>
<textarea id="requestBody">{
"model": "gemma-3-1b-it",
"messages": [
{
"role": "user",
"content": "Hello, how are you today?"
}
],
"max_tokens": 150,
"temperature": 0.7,
"top_p": 0.9
}</textarea>
<button id="sendRequest">Send Request</button>
<div class="examples">
<h3>Example Requests</h3>
<button class="example-btn" id="example1">Basic Question</button>
<button class="example-btn" id="example2">Multi-turn Conversation</button>
<button class="example-btn" id="example3">Creative Writing</button>
<button class="example-btn" id="example4">Code Generation</button>
</div>
<div class="response">
<h2>Response</h2>
<pre id="responseOutput">Response will appear here...</pre>
</div>
</div>
<script>
document.addEventListener('DOMContentLoaded', function() {
// Update request body when settings change
const serverUrlInput = document.getElementById('serverUrl');
const modelInput = document.getElementById('model');
const maxTokensInput = document.getElementById('maxTokens');
const temperatureInput = document.getElementById('temperature');
const topPInput = document.getElementById('topP');
const requestBodyTextarea = document.getElementById('requestBody');
const responseOutput = document.getElementById('responseOutput');
// Function to update request body from settings
function updateRequestBodyFromSettings() {
try {
const requestBody = JSON.parse(requestBodyTextarea.value);
requestBody.model = modelInput.value;
requestBody.max_tokens = parseInt(maxTokensInput.value);
requestBody.temperature = parseFloat(temperatureInput.value);
requestBody.top_p = parseFloat(topPInput.value);
requestBodyTextarea.value = JSON.stringify(requestBody, null, 2);
} catch (error) {
console.error("Error updating request body:", error);
}
}
// Update settings when request body changes
function updateSettingsFromRequestBody() {
try {
const requestBody = JSON.parse(requestBodyTextarea.value);
if (requestBody.model) modelInput.value = requestBody.model;
if (requestBody.max_tokens) maxTokensInput.value = requestBody.max_tokens;
if (requestBody.temperature) temperatureInput.value = requestBody.temperature;
if (requestBody.top_p) topPInput.value = requestBody.top_p;
} catch (error) {
console.error("Error updating settings:", error);
}
}
// Add event listeners for settings changes
modelInput.addEventListener('change', updateRequestBodyFromSettings);
maxTokensInput.addEventListener('change', updateRequestBodyFromSettings);
temperatureInput.addEventListener('change', updateRequestBodyFromSettings);
topPInput.addEventListener('change', updateRequestBodyFromSettings);
// Add event listener for request body changes
requestBodyTextarea.addEventListener('blur', updateSettingsFromRequestBody);
// Send request button
document.getElementById('sendRequest').addEventListener('click', async function() {
try {
responseOutput.textContent = "Sending request...";
const serverUrl = serverUrlInput.value;
const endpoint = '/v1/chat/completions';
const url = serverUrl + endpoint;
const requestBody = JSON.parse(requestBodyTextarea.value);
const response = await fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(requestBody)
});
const data = await response.json();
responseOutput.textContent = JSON.stringify(data, null, 2);
} catch (error) {
responseOutput.textContent = "Error: " + error.message;
responseOutput.classList.add('error');
}
});
// Example requests
document.getElementById('example1').addEventListener('click', function() {
requestBodyTextarea.value = JSON.stringify({
model: modelInput.value,
messages: [
{
role: "user",
content: "Who was the 16th president of the United States?"
}
],
max_tokens: parseInt(maxTokensInput.value),
temperature: parseFloat(temperatureInput.value),
top_p: parseFloat(topPInput.value)
}, null, 2);
});
document.getElementById('example2').addEventListener('click', function() {
requestBodyTextarea.value = JSON.stringify({
model: modelInput.value,
messages: [
{
role: "system",
content: "You are a helpful assistant that provides concise answers."
},
{
role: "user",
content: "What is machine learning?"
},
{
role: "assistant",
content: "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed."
},
{
role: "user",
content: "Give me an example of a machine learning algorithm."
}
],
max_tokens: parseInt(maxTokensInput.value),
temperature: parseFloat(temperatureInput.value),
top_p: parseFloat(topPInput.value)
}, null, 2);
});
document.getElementById('example3').addEventListener('click', function() {
requestBodyTextarea.value = JSON.stringify({
model: modelInput.value,
messages: [
{
role: "user",
content: "Write a short poem about artificial intelligence."
}
],
max_tokens: parseInt(maxTokensInput.value),
temperature: 0.9, // Higher temperature for creative tasks
top_p: 0.9
}, null, 2);
temperatureInput.value = 0.9;
});
document.getElementById('example4').addEventListener('click', function() {
requestBodyTextarea.value = JSON.stringify({
model: modelInput.value,
messages: [
{
role: "user",
content: "Write a Python function to calculate the Fibonacci sequence up to n terms."
}
],
max_tokens: parseInt(maxTokensInput.value),
temperature: 0.3, // Lower temperature for code generation
top_p: 0.9
}, null, 2);
temperatureInput.value = 0.3;
});
});
</script>
</body>
</html>


@@ -1,176 +0,0 @@
// Test requests for the OpenAI-compatible endpoint in the inference server
// This file contains IIFE (Immediately Invoked Function Expression) JavaScript requests
// to test the /v1/chat/completions endpoint
// Basic chat completion request
(async function testBasicChatCompletion() {
console.log("Test 1: Basic chat completion request");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it",
messages: [
{
role: "user",
content: "Who was the 16th president of the United States?"
}
],
max_tokens: 100
})
});
const data = await response.json();
console.log("Response:", JSON.stringify(data, null, 2));
} catch (error) {
console.error("Error:", error);
}
})();
// Multi-turn conversation
(async function testMultiTurnConversation() {
console.log("\nTest 2: Multi-turn conversation");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it",
messages: [
{
role: "system",
content: "You are a helpful assistant that provides concise answers."
},
{
role: "user",
content: "What is machine learning?"
},
{
role: "assistant",
content: "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed."
},
{
role: "user",
content: "Give me an example of a machine learning algorithm."
}
],
max_tokens: 150
})
});
const data = await response.json();
console.log("Response:", JSON.stringify(data, null, 2));
} catch (error) {
console.error("Error:", error);
}
})();
// Request with temperature and top_p parameters
(async function testTemperatureAndTopP() {
console.log("\nTest 3: Request with temperature and top_p parameters");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it",
messages: [
{
role: "user",
content: "Write a short poem about artificial intelligence."
}
],
max_tokens: 200,
temperature: 0.8,
top_p: 0.9
})
});
const data = await response.json();
console.log("Response:", JSON.stringify(data, null, 2));
} catch (error) {
console.error("Error:", error);
}
})();
// Request with streaming enabled
(async function testStreaming() {
console.log("\nTest 4: Request with streaming enabled");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it",
messages: [
{
role: "user",
content: "Explain quantum computing in simple terms."
}
],
max_tokens: 150,
stream: true
})
});
// Note: Streaming might not be implemented yet, this is to test the API's handling of the parameter
if (response.headers.get('content-type')?.includes('text/event-stream')) {
console.log("Streaming response detected. Reading stream...");
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
console.log("Chunk:", chunk);
}
} else {
const data = await response.json();
console.log("Non-streaming response:", JSON.stringify(data, null, 2));
}
} catch (error) {
console.error("Error:", error);
}
})();
// Request with a different model
(async function testDifferentModel() {
console.log("\nTest 5: Request with a different model");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it", // Using a different model if available
messages: [
{
role: "user",
content: "What are the benefits of renewable energy?"
}
],
max_tokens: 150
})
});
const data = await response.json();
console.log("Response:", JSON.stringify(data, null, 2));
} catch (error) {
console.error("Error:", error);
}
})();
console.log("\nAll test requests have been sent. Check the server logs for more details.");
console.log("To run the server, use: cargo run --bin inference-engine -- --server");


@@ -0,0 +1,8 @@
# predict-otron-9000
This is an extensible axum/tokio server that combines [embeddings-engine](../embeddings-engine), [inference-engine](../inference-engine), and [leptos-app](../leptos-app).
# Notes
- When `server_mode` is Standalone (default), the instance contains all components necessary for inference.
- When `server_mode` is HighAvailability, inference and embeddings scale automatically, and requests are proxied to the inference and embeddings services via DNS.
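
A hypothetical sketch of what the two modes imply for request routing; the type and names below are illustrative only, not the actual implementation:

```rust
// Hypothetical illustration, not the real predict-otron-9000 code: Standalone serves
// everything in-process, while HighAvailability proxies to separately scaled services
// addressed by cluster DNS names.
enum ServerMode {
    Standalone,
    HighAvailability,
}

fn inference_upstream(mode: &ServerMode) -> &'static str {
    match mode {
        // All components live in this instance, so requests stay local.
        ServerMode::Standalone => "http://127.0.0.1:8080",
        // Inference runs as its own scalable service, reached via DNS.
        ServerMode::HighAvailability => "http://inference-engine:8080",
    }
}
```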