From 956d00f5967e3f7a0f73c7d01227bfd98f6889ed Mon Sep 17 00:00:00 2001
From: geoffsee <>
Date: Thu, 28 Aug 2025 07:24:14 -0400
Subject: [PATCH] Add `CLEANUP.md` with identified documentation and code
 issues. Update README files to fix repository URL, unify descriptions, and
 clarify Gemma model usage.

---
 .aiignore                         |  7 ++++
 CLEANUP.md                        | 59 +++++++++++++++++++++++++++++++
 README.md                         | 18 ++++++---
 crates/inference-engine/README.md | 13 -------
 4 files changed, 78 insertions(+), 19 deletions(-)
 create mode 100644 .aiignore
 create mode 100644 CLEANUP.md

diff --git a/.aiignore b/.aiignore
new file mode 100644
index 0000000..e36cc7b
--- /dev/null
+++ b/.aiignore
@@ -0,0 +1,7 @@
+.idea/
+.fastembed_cache/
+target/
+/.output.txt
+/*.iml
+dist
+node_modules/
diff --git a/CLEANUP.md b/CLEANUP.md
new file mode 100644
index 0000000..a0fa2c4
--- /dev/null
+++ b/CLEANUP.md
@@ -0,0 +1,59 @@
+# CLEANUP.md
+
+This document tracks items requiring cleanup in the predict-otron-9000 project, identified during README updates on 2025-08-28.
+
+## Documentation Issues
+
+### Repository URL Inconsistencies
+- **File**: `crates/inference-engine/README.md` (lines 27-28)
+- **Issue**: References incorrect repository URL `https://github.com/seemueller-io/open-web-agent-rs.git`
+- **Action**: Should reference the correct predict-otron-9000 repository URL
+- **Priority**: High
+
+### Model Information Discrepancies
+- **File**: Main `README.md`
+- **Issue**: Does not specify that inference-engine specifically uses Gemma models (1B, 2B, 7B, 9B variants)
+- **Action**: Main README should clarify the specific model types supported
+- **Priority**: Medium
+
+### Build Instructions Inconsistency
+- **Files**: Main `README.md` vs `crates/inference-engine/README.md`
+- **Issue**: Different build commands and approaches between main and component READMEs
+- **Main README**: Uses `cargo build --release` and `./run_server.sh`
+- **Inference README**: Uses `cargo build -p inference-engine --release`
+- **Action**: Standardize build instructions across all READMEs
+- **Priority**: Medium
+
+### Missing Component Details in Main README
+- **File**: Main `README.md`
+- **Issue**: Lacks specific details about:
+  - Exact embedding model used (Nomic Embed Text v1.5)
+  - Specific LLM models supported (Gemma variants)
+  - WebAssembly nature of leptos-chat component
+- **Action**: Add more specific technical details to main README
+- **Priority**: Low
+
+## Code Structure Issues
+
+### Unified Server Reference
+- **File**: Main `README.md` (line 26)
+- **Issue**: Claims there's a "Main unified server that combines both engines" but unclear if this exists
+- **Action**: Verify if there's actually a unified server or if this is outdated documentation
+- **Priority**: Medium
+
+### Script References
+- **File**: Main `README.md`
+- **Issue**: References `./run_server.sh` but needs verification that this script works as documented
+- **Action**: Test and update script documentation if necessary
+- **Priority**: Low
+
+## API Documentation
+- **Files**: Both READMEs
+- **Issue**: API examples and endpoints should be cross-verified for accuracy
+- **Action**: Ensure all API examples work with current implementation
+- **Priority**: Low
+
+## Outdated Dependencies/Versions
+- **Issue**: Should verify that all mentioned Rust version requirements (1.70+) are still accurate
+- **Action**: Check and update version requirements if needed
+- **Priority**: Low
\ No newline at end of file
diff --git a/README.md b/README.md
index 277f364..6120d1e 100644
--- a/README.md
+++ b/README.md
@@ -14,19 +14,19 @@ Aliens, in a native executable.
 ## Features
 - **OpenAI Compatible**: API endpoints match OpenAI's format for easy integration
 - **Text Embeddings**: Generate high-quality text embeddings using the Nomic Embed Text v1.5 model
-- **Text Generation**: Chat completions with OpenAI-compatible API (simplified implementation)
+- **Text Generation**: Chat completions with OpenAI-compatible API using Gemma models (1B, 2B, 7B, 9B variants including base and instruction-tuned models)
 - **Performance Optimized**: Implements efficient caching and singleton patterns for improved throughput and reduced latency
 - **Performance Benchmarking**: Includes tools for measuring performance and generating HTML reports
-- **Web Chat Interface**: A Leptos-based WebAssembly chat interface for interacting with the inference engine
+- **Web Chat Interface**: A Leptos-based WebAssembly (WASM) chat interface for browser-based interaction with the inference engine
 
 ## Architecture
 
 ### Core Components
 
 - **`predict-otron-9000`**: Main unified server that combines both engines
-- **`embeddings-engine`**: Handles text embeddings using FastEmbed and Nomic models
-- **`inference-engine`**: Provides text generation capabilities (with modular design for various models)
-- **`leptos-chat`**: WebAssembly-based chat interface built with Leptos framework for interacting with the inference engine
+- **`embeddings-engine`**: Handles text embeddings using FastEmbed with the Nomic Embed Text v1.5 model
+- **`inference-engine`**: Provides text generation capabilities using Gemma models (1B, 2B, 7B, 9B variants) via Candle transformers
+- **`leptos-chat`**: WebAssembly-based chat interface built with Leptos framework for browser-based interaction with the inference engine
 
 ## Installation
 
@@ -44,8 +44,14 @@ cd predict-otron-9000
 # 2. Build the project
 cargo build --release
 
-# 3. Run the server
+# 3. Run the unified server
 ./run_server.sh
+
+# Alternative: Build and run individual components
+# For inference engine only:
+cargo run -p inference-engine --release -- --server --port 3777
+# For embeddings engine only:
+cargo run -p embeddings-engine --release
 ```
 
 ## Usage
diff --git a/crates/inference-engine/README.md b/crates/inference-engine/README.md
index a61575f..0973950 100644
--- a/crates/inference-engine/README.md
+++ b/crates/inference-engine/README.md
@@ -20,19 +20,6 @@ A Rust-based inference engine for running large language models locally. This to
 - macOS: Metal support
 - Linux/Windows: CUDA support (requires appropriate drivers)
 
-### Building from Source
-
-1. Clone the repository:
-   ```bash
-   git clone https://github.com/seemueller-io/open-web-agent-rs.git
-   cd open-web-agent-rs
-   ```
-
-2. Build the local inference engine:
-   ```bash
-   cargo build -p inference-engine --release
-   ```
-
 ## Usage
 
 ### CLI Mode
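
As a follow-up to the CLEANUP.md item "API examples and endpoints should be cross-verified for accuracy," a minimal smoke test of the OpenAI-compatible surface might look like the sketch below. The endpoint paths are simply OpenAI's standard ones (which the README claims compatibility with); the port and both model identifiers are assumptions for illustration, not confirmed anywhere in this patch.

```bash
# Smoke-test sketch for the OpenAI-compatible endpoints described in the README.
# Assumptions (not confirmed by this patch): the unified server listens on
# localhost:8080, and "gemma-2b-it" / "nomic-embed-text-v1.5" are accepted model ids.

# Chat completion (OpenAI request shape)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-2b-it", "messages": [{"role": "user", "content": "Say hello."}]}'

# Text embedding (OpenAI request shape)
curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text-v1.5", "input": "The quick brown fox"}'
```

If both calls return well-formed JSON, the cross-verification item can be checked off; if not, the README examples and the server routes are the first places to reconcile.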