From 8d2b85b0b9e9ab8eed790bf728a39469ec434eaa Mon Sep 17 00:00:00 2001
From: geoffsee <>
Date: Sun, 31 Aug 2025 19:27:15 -0400
Subject: [PATCH] update docs
---
.gitignore | 2 -
README.md | 35 +++++-----
crates/chat-ui/README.md | 41 +++++++++++-
crates/cli/README.md | 11 ++--
crates/embeddings-engine/README.md | 100 ++++++++++++++++++++++++++++-
crates/helm-chart-tool/README.md | 2 +-
docs/ARCHITECTURE.md | 11 ++--
7 files changed, 167 insertions(+), 35 deletions(-)
diff --git a/.gitignore b/.gitignore
index 4896b72..0ce0aea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -74,8 +74,6 @@ venv/
# Backup files
*.bak
*.backup
-*~
-/scripts/cli
!/scripts/cli.ts
/**/.*.bun-build
/AGENTS.md
diff --git a/README.md b/README.md
index 1aa8dd9..220e17a 100644
--- a/README.md
+++ b/README.md
@@ -42,7 +42,7 @@ The system supports both CPU and GPU acceleration (CUDA/Metal), with intelligent
### Workspace Structure
-The project uses a 7-crate Rust workspace plus TypeScript components:
+The project uses a 9-crate Rust workspace plus TypeScript components:
```
crates/
@@ -51,17 +51,18 @@ crates/
├── gemma-runner/ # Gemma model inference via Candle (Rust 2021)
├── llama-runner/ # Llama model inference via Candle (Rust 2021)
├── embeddings-engine/ # FastEmbed embeddings service (Rust 2024)
-├── leptos-app/ # WASM web frontend (Rust 2021)
+├── chat-ui/ # WASM web frontend (Rust 2021)
├── helm-chart-tool/ # Kubernetes deployment tooling (Rust 2024)
-└── scripts/
- └── cli.ts # TypeScript/Bun CLI client
+└── cli/ # CLI client crate (Rust 2024)
+ └── package/
+ └── cli.ts # TypeScript/Bun CLI client
```
### Service Architecture
- **Main Server** (port 8080): Orchestrates inference and embeddings services
- **Embeddings Service** (port 8080): Standalone FastEmbed service with OpenAI API compatibility
-- **Web Frontend** (port 8788): cargo leptos SSR app
+- **Web Frontend** (port 8788): chat-ui WASM app
- **CLI Client**: TypeScript/Bun client for testing and automation
### Deployment Modes
@@ -144,26 +145,26 @@ cargo build --bin embeddings-engine --release
#### Web Frontend (Port 8788)
```bash
-cd crates/leptos-app
+cd crates/chat-ui
./run.sh
```
-- Serves Leptos WASM frontend on port 8788
+- Serves chat-ui WASM frontend on port 8788
- Sets required RUSTFLAGS for WebAssembly getrandom support
- Auto-reloads during development
#### TypeScript CLI Client
```bash
# List available models
-bun run scripts/cli.ts --list-models
+cd crates/cli/package && bun run cli.ts --list-models
# Chat completion
-bun run scripts/cli.ts "What is the capital of France?"
+cd crates/cli/package && bun run cli.ts "What is the capital of France?"
# With specific model
-bun run scripts/cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
+cd crates/cli/package && bun run cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
# Show help
-bun run scripts/cli.ts --help
+cd crates/cli/package && bun run cli.ts --help
```
## API Usage
@@ -279,7 +280,7 @@ cargo test --workspace
**End-to-end test script:**
```bash
-./smoke_test.sh
+./scripts/smoke_test.sh
```
This script:
@@ -368,7 +369,7 @@ All services include Docker metadata in `Cargo.toml`:
- Port: 8080
**Web Frontend:**
-- Image: `ghcr.io/geoffsee/leptos-app:latest`
+- Image: `ghcr.io/geoffsee/chat-ui:latest`
- Port: 8788
**Docker Compose:**
@@ -427,7 +428,7 @@ For Kubernetes deployment details, see the [ARCHITECTURE.md](docs/ARCHITECTURE.m
**Symptom:** WASM compilation failures
**Solution:**
1. Install required targets: `rustup target add wasm32-unknown-unknown`
-2. Check RUSTFLAGS in leptos-app/run.sh
+2. Check RUSTFLAGS in chat-ui/run.sh
### Network/Timeout Issues
**Symptom:** First-time model downloads timing out
@@ -458,18 +459,18 @@ curl -s http://localhost:8080/v1/models | jq
**CLI client test:**
```bash
-bun run scripts/cli.ts "What is 2+2?"
+cd crates/cli/package && bun run cli.ts "What is 2+2?"
```
**Web frontend:**
```bash
-cd crates/leptos-app && ./run.sh &
+cd crates/chat-ui && ./run.sh &
# Navigate to http://localhost:8788
```
**Integration test:**
```bash
-./smoke_test.sh
+./scripts/smoke_test.sh
```
**Cleanup:**
diff --git a/crates/chat-ui/README.md b/crates/chat-ui/README.md
index 63181d1..edb321e 100644
--- a/crates/chat-ui/README.md
+++ b/crates/chat-ui/README.md
@@ -1,2 +1,41 @@
# chat-ui
-This is served by the predict-otron-9000 server. This needs to be built before the server.
\ No newline at end of file
+
+A WASM-based web chat interface for the predict-otron-9000 AI platform.
+
+## Overview
+
+The chat-ui provides a real-time web interface for interacting with language models through the predict-otron-9000 server. Built with Leptos and compiled to WebAssembly, it offers a modern chat experience with streaming response support.
+
+## Features
+
+- Real-time chat interface with the inference server
+- Streaming response support
+- Conversation history
+- Responsive web design
+- WebAssembly-powered for fast, near-native in-browser execution
+
+## Building and Running
+
+### Prerequisites
+- Rust toolchain with WASM target: `rustup target add wasm32-unknown-unknown`
+- The predict-otron-9000 server must be running on port 8080
+
+### Development Server
+```bash
+cd crates/chat-ui
+./run.sh
+```
+
+This starts the development server on port 8788 with auto-reload capabilities.
+
+### Usage
+1. Start the predict-otron-9000 server: `./scripts/run_server.sh`
+2. Start the chat-ui: `cd crates/chat-ui && ./run.sh`
+3. Navigate to `http://localhost:8788`
+4. Start chatting with your AI models!
+
+## Technical Details
+- Built with Leptos framework
+- Compiled to WebAssembly for browser execution
+- Communicates with predict-otron-9000 API via HTTP
+- Sets required RUSTFLAGS for WebAssembly getrandom support
\ No newline at end of file
diff --git a/crates/cli/README.md b/crates/cli/README.md
index 0644108..f93bdab 100644
--- a/crates/cli/README.md
+++ b/crates/cli/README.md
@@ -3,7 +3,7 @@
A Rust/Typescript Hybrid
```console
-./cli [options] [prompt]
+bun run cli.ts [options] [prompt]
Simple CLI tool for testing the local OpenAI-compatible API server.
@@ -14,10 +14,11 @@ Options:
--help Show this help message
Examples:
- ./cli "What is the capital of France?"
- ./cli --model gemma-3-1b-it --prompt "Hello, world!"
- ./cli --prompt "Who was the 16th president of the United States?"
- ./cli --list-models
+ cd crates/cli/package
+ bun run cli.ts "What is the capital of France?"
+ bun run cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
+ bun run cli.ts --prompt "Who was the 16th president of the United States?"
+ bun run cli.ts --list-models
The server must be running at http://localhost:8080
```
\ No newline at end of file
diff --git a/crates/embeddings-engine/README.md b/crates/embeddings-engine/README.md
index c47ea5a..2ad58b9 100644
--- a/crates/embeddings-engine/README.md
+++ b/crates/embeddings-engine/README.md
@@ -1,4 +1,100 @@
# Embeddings Engine
-A high-performance text embeddings service that generates vector representations of text using state-of-the-art models.
-This crate wraps the fastembed crate to provide embeddings and partially adapts the openai specification.
\ No newline at end of file
+A high-performance text embeddings service that generates vector representations of text using state-of-the-art models. This crate wraps the FastEmbed library to provide embeddings with OpenAI-compatible API endpoints.
+
+## Overview
+
+The embeddings-engine provides a standalone service for generating text embeddings that can be used for semantic search, similarity comparisons, and other NLP tasks. It's designed to be compatible with OpenAI's embeddings API format.
+
+## Features
+
+- **OpenAI-Compatible API**: `/v1/embeddings` endpoint matching OpenAI's specification
+- **FastEmbed Integration**: Powered by the FastEmbed library for high-quality embeddings
+- **Multiple Model Support**: Support for various embedding models
+- **High Performance**: Low-latency embedding generation suitable for interactive and batch workloads
+- **Standalone Service**: Can run independently or as part of the predict-otron-9000 platform
+
+## Building and Running
+
+### Prerequisites
+- Rust toolchain
+- Internet connection for initial model downloads
+
+### Standalone Server
+```bash
+cargo run --bin embeddings-engine --release
+```
+
+The service will start on port 8080 by default.
+
+## API Usage
+
+### Generate Embeddings
+
+**Endpoint**: `POST /v1/embeddings`
+
+**Request Body**:
+```json
+{
+ "input": "Your text to embed",
+ "model": "nomic-embed-text-v1.5"
+}
+```
+
+**Response**:
+```json
+{
+ "object": "list",
+ "data": [
+ {
+ "object": "embedding",
+ "index": 0,
+ "embedding": [0.1, 0.2, 0.3, ...]
+ }
+ ],
+ "model": "nomic-embed-text-v1.5",
+ "usage": {
+ "prompt_tokens": 0,
+ "total_tokens": 0
+ }
+}
+```
+
+### Example Usage
+
+**Using cURL**:
+```bash
+curl -s http://localhost:8080/v1/embeddings \
+ -H "Content-Type: application/json" \
+ -d '{
+ "input": "The quick brown fox jumps over the lazy dog",
+ "model": "nomic-embed-text-v1.5"
+ }' | jq
+```
+
+**Using Python OpenAI Client**:
+```python
+from openai import OpenAI
+
+client = OpenAI(
+ base_url="http://localhost:8080/v1",
+ api_key="dummy" # Not validated but required by client
+)
+
+response = client.embeddings.create(
+ input="Your text here",
+ model="nomic-embed-text-v1.5"
+)
+
+print(response.data[0].embedding)
+```
+
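+**Computing similarity (illustrative):**
+
+Embeddings returned by `/v1/embeddings` are typically compared with cosine similarity. A minimal sketch in plain Python (the `cosine_similarity` helper and the `embedding_one`/`embedding_two` variables below are hypothetical, not part of this crate):
+
+```python
+import math
+
+def cosine_similarity(a: list[float], b: list[float]) -> float:
+    # Cosine similarity: dot(a, b) / (|a| * |b|)
+    dot = sum(x * y for x, y in zip(a, b))
+    norm_a = math.sqrt(sum(x * x for x in a))
+    norm_b = math.sqrt(sum(x * x for x in b))
+    return dot / (norm_a * norm_b)
+
+# embedding_one and embedding_two are vectors from two embeddings responses
+score = cosine_similarity(embedding_one, embedding_two)
+```
+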
+## Configuration
+
+The service can be configured through environment variables:
+- `SERVER_PORT`: Port to run on (default: 8080)
+- `RUST_LOG`: Logging level (default: info)
+
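+For example, to run on a different port with verbose logging (assuming the variables above are read at startup):
+
+```bash
+SERVER_PORT=8181 RUST_LOG=debug cargo run --bin embeddings-engine --release
+```
+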
+## Integration
+
+This service is designed to work seamlessly with the predict-otron-9000 main server, but can also be deployed independently for dedicated embeddings workloads.
\ No newline at end of file
diff --git a/crates/helm-chart-tool/README.md b/crates/helm-chart-tool/README.md
index 58ee48d..f216d55 100644
--- a/crates/helm-chart-tool/README.md
+++ b/crates/helm-chart-tool/README.md
@@ -137,7 +137,7 @@ Parsing workspace at: ..
Output directory: ../generated-helm-chart
Chart name: predict-otron-9000
Found 4 services:
- - leptos-app: ghcr.io/geoffsee/leptos-app:latest (port 8788)
+ - chat-ui: ghcr.io/geoffsee/chat-ui:latest (port 8788)
- inference-engine: ghcr.io/geoffsee/inference-service:latest (port 8080)
- embeddings-engine: ghcr.io/geoffsee/embeddings-service:latest (port 8080)
- predict-otron-9000: ghcr.io/geoffsee/predict-otron-9000:latest (port 8080)
diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md
index b256389..44ffbc6 100644
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
@@ -52,7 +52,7 @@ graph TB
## Workspace Structure
-The project uses a 7-crate Rust workspace with TypeScript tooling, designed for maximum flexibility in deployment configurations.
+The project uses a 9-crate Rust workspace with TypeScript tooling, designed for maximum flexibility in deployment configurations.
```mermaid
graph TD
@@ -69,18 +69,15 @@ graph TD
end
subgraph "Frontend"
- D[leptos-app<br/>Edition: 2021<br/>Port: 3000/8788<br/>WASM/SSR]
+ D[chat-ui<br/>Edition: 2021<br/>Port: 8788<br/>WASM UI]
end
subgraph "Tooling"
L[helm-chart-tool<br/>Edition: 2024<br/>K8s deployment]
+ E[cli<br/>Edition: 2024<br/>TypeScript/Bun CLI]
end
- subgraph "External Tooling"
- E[scripts/cli.ts<br/>TypeScript/Bun<br/>OpenAI SDK]
- end
-
subgraph "Dependencies"
A --> B
A --> C
@@ -193,7 +190,7 @@ graph TB
end
subgraph "Frontend"
- D[leptos-app Pod<br/>:8788<br/>ClusterIP Service]
+ D[chat-ui Pod<br/>:8788<br/>ClusterIP Service]
end
subgraph "Ingress"