update docs

This commit is contained in:
geoffsee
2025-08-31 19:27:15 -04:00
parent 4570780666
commit 8d2b85b0b9
7 changed files with 167 additions and 35 deletions

View File

@@ -1,2 +1,41 @@
# chat-ui
This is served by the predict-otron-9000 server. This needs to be built before the server.
A WASM-based web chat interface for the predict-otron-9000 AI platform.
## Overview
The chat-ui provides a real-time web interface for interacting with language models through the predict-otron-9000 server. Built with Leptos and compiled to WebAssembly, it offers a modern chat experience with streaming response support.
## Features
- Real-time chat interface with the inference server
- Streaming response support
- Conversation history
- Responsive web design
- WebAssembly-powered for near-native performance in the browser
## Building and Running
### Prerequisites
- Rust toolchain with WASM target: `rustup target add wasm32-unknown-unknown`
- The predict-otron-9000 server must be running on port 8080
### Development Server
```bash
cd crates/chat-ui
./run.sh
```
This starts the development server on port 8788 with auto-reload capabilities.
### Usage
1. Start the predict-otron-9000 server: `./scripts/run_server.sh`
2. Start the chat-ui: `cd crates/chat-ui && ./run.sh`
3. Navigate to `http://localhost:8788`
4. Start chatting with your AI models!
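If the UI cannot reach the backend, a quick smoke test against the server helps isolate the problem. The sketch below assumes the predict-otron-9000 server exposes an OpenAI-compatible `/v1/chat/completions` route on port 8080; the model name is an example taken from the CLI docs.
```bash
# Hypothetical smoke test; endpoint and model name are assumptions based on
# the OpenAI-compatible API described elsewhere in this repository.
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-3-1b-it",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }' | jq
```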
## Technical Details
- Built with Leptos framework
- Compiled to WebAssembly for browser execution
- Communicates with predict-otron-9000 API via HTTP
- Sets required RUSTFLAGS for WebAssembly getrandom support
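As a rough sketch of the environment `run.sh` is expected to prepare (the exact `getrandom` cfg value is an assumption and depends on the getrandom version the crate uses):
```bash
# Assumed setup performed before the WASM build; verify against the actual run.sh.
export RUSTFLAGS='--cfg getrandom_backend="wasm_js"'   # getrandom WASM backend (assumption)
rustup target add wasm32-unknown-unknown               # from the Prerequisites above
./run.sh                                               # dev server on port 8788
```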

View File

@@ -3,7 +3,7 @@
A Rust/TypeScript Hybrid
```console
./cli [options] [prompt]
bun run cli.ts [options] [prompt]
Simple CLI tool for testing the local OpenAI-compatible API server.
@@ -14,10 +14,11 @@ Options:
--help Show this help message
Examples:
./cli "What is the capital of France?"
./cli --model gemma-3-1b-it --prompt "Hello, world!"
./cli --prompt "Who was the 16th president of the United States?"
./cli --list-models
cd crates/cli/package
bun run cli.ts "What is the capital of France?"
bun run cli.ts --model gemma-3-1b-it --prompt "Hello, world!"
bun run cli.ts --prompt "Who was the 16th president of the United States?"
bun run cli.ts --list-models
The server must be running at http://localhost:8080
```
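The CLI is a thin wrapper over the server's HTTP API; for example, `--list-models` can be reproduced directly with cURL (the `/v1/models` path is an assumption based on the OpenAI-compatible API):
```bash
# Hypothetical direct equivalent of `bun run cli.ts --list-models`.
curl -s http://localhost:8080/v1/models | jq
```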

View File

@@ -1,4 +1,100 @@
# Embeddings Engine
A high-performance text embeddings service that generates vector representations of text using state-of-the-art models.
This crate wraps the fastembed crate to provide embeddings and partially adapts the openai specification.
A high-performance text embeddings service that generates vector representations of text using state-of-the-art models. This crate wraps the FastEmbed library to provide embeddings with OpenAI-compatible API endpoints.
## Overview
The embeddings-engine provides a standalone service for generating text embeddings that can be used for semantic search, similarity comparisons, and other NLP tasks. It's designed to be compatible with OpenAI's embeddings API format.
## Features
- **OpenAI-Compatible API**: `/v1/embeddings` endpoint matching OpenAI's specification
- **FastEmbed Integration**: Powered by the FastEmbed library for high-quality embeddings
- **Multiple Model Support**: Support for various embedding models
- **High Performance**: Optimized for fast embedding generation
- **Standalone Service**: Can run independently or as part of the predict-otron-9000 platform
## Building and Running
### Prerequisites
- Rust toolchain
- Internet connection for initial model downloads
### Standalone Server
```bash
cargo run --bin embeddings-engine --release
```
The service will start on port 8080 by default.
## API Usage
### Generate Embeddings
**Endpoint**: `POST /v1/embeddings`
**Request Body**:
```json
{
"input": "Your text to embed",
"model": "nomic-embed-text-v1.5"
}
```
**Response**:
```json
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.1, 0.2, 0.3, ...]
}
],
"model": "nomic-embed-text-v1.5",
"usage": {
"prompt_tokens": 0,
"total_tokens": 0
}
}
```
### Example Usage
**Using cURL**:
```bash
curl -s http://localhost:8080/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "The quick brown fox jumps over the lazy dog",
"model": "nomic-embed-text-v1.5"
}' | jq
```
**Using Python OpenAI Client**:
```python
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="dummy" # Not validated but required by client
)
response = client.embeddings.create(
input="Your text here",
model="nomic-embed-text-v1.5"
)
print(response.data[0].embedding)
```
## Configuration
The service can be configured through environment variables:
- `SERVER_PORT`: Port to run on (default: 8080)
- `RUST_LOG`: Logging level (default: info)
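For example, to run the standalone service on a different port with verbose logging:
```bash
SERVER_PORT=8081 RUST_LOG=debug cargo run --bin embeddings-engine --release
```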
## Integration
This service is designed to work seamlessly with the predict-otron-9000 main server, but can also be deployed independently for dedicated embeddings workloads.

View File

@@ -137,7 +137,7 @@ Parsing workspace at: ..
Output directory: ../generated-helm-chart
Chart name: predict-otron-9000
Found 4 services:
- leptos-app: ghcr.io/geoffsee/leptos-app:latest (port 8788)
- chat-ui: ghcr.io/geoffsee/chat-ui:latest (port 8788)
- inference-engine: ghcr.io/geoffsee/inference-service:latest (port 8080)
- embeddings-engine: ghcr.io/geoffsee/embeddings-service:latest (port 8080)
- predict-otron-9000: ghcr.io/geoffsee/predict-otron-9000:latest (port 8080)
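A hypothetical follow-up once the chart has been generated (assuming Helm 3 is installed and the output directory shown above is used as-is):
```bash
# Render the generated manifests locally, then install the chart.
helm template predict-otron-9000 ../generated-helm-chart
helm install predict-otron-9000 ../generated-helm-chart
```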