From 956d00f5967e3f7a0f73c7d01227bfd98f6889ed Mon Sep 17 00:00:00 2001
From: geoffsee <>
Date: Thu, 28 Aug 2025 07:24:14 -0400
Subject: [PATCH] Add `CLEANUP.md` with identified documentation and code
 issues. Update README files to fix repository URL, unify descriptions, and
 clarify Gemma model usage.

---
 .aiignore                         |  7 ++++
 CLEANUP.md                        | 59 +++++++++++++++++++++++++++++++
 README.md                         | 18 ++++++---
 crates/inference-engine/README.md | 13 -------
 4 files changed, 78 insertions(+), 19 deletions(-)
 create mode 100644 .aiignore
 create mode 100644 CLEANUP.md

diff --git a/.aiignore b/.aiignore
new file mode 100644
index 0000000..e36cc7b
--- /dev/null
+++ b/.aiignore
@@ -0,0 +1,7 @@
+.idea/
+.fastembed_cache/
+target/
+/.output.txt
+/*.iml
+dist
+node_modules/
diff --git a/CLEANUP.md b/CLEANUP.md
new file mode 100644
index 0000000..a0fa2c4
--- /dev/null
+++ b/CLEANUP.md
@@ -0,0 +1,59 @@
+# CLEANUP.md
+
+This document tracks items requiring cleanup in the predict-otron-9000 project, identified during README updates on 2025-08-28.
+
+## Documentation Issues
+
+### Repository URL Inconsistencies
+- **File**: `crates/inference-engine/README.md` (lines 27-28)
+- **Issue**: References incorrect repository URL `https://github.com/seemueller-io/open-web-agent-rs.git`
+- **Action**: Should reference the correct predict-otron-9000 repository URL
+- **Priority**: High
+
+### Model Information Discrepancies
+- **File**: Main `README.md`
+- **Issue**: Does not specify that inference-engine specifically uses Gemma models (1B, 2B, 7B, 9B variants)
+- **Action**: Main README should clarify the specific model types supported
+- **Priority**: Medium
+
+### Build Instructions Inconsistency
+- **Files**: Main `README.md` vs `crates/inference-engine/README.md`
+- **Issue**: Different build commands and approaches between main and component READMEs
+- **Main README**: Uses `cargo build --release` and `./run_server.sh`
+- **Inference README**: Uses `cargo build -p inference-engine --release`
+- **Action**: Standardize build instructions across all READMEs
+- **Priority**: Medium
+
+### Missing Component Details in Main README
+- **File**: Main `README.md`
+- **Issue**: Lacks specific details about:
+  - Exact embedding model used (Nomic Embed Text v1.5)
+  - Specific LLM models supported (Gemma variants)
+  - WebAssembly nature of leptos-chat component
+- **Action**: Add more specific technical details to main README
+- **Priority**: Low
+
+## Code Structure Issues
+
+### Unified Server Reference
+- **File**: Main `README.md` (line 26)
+- **Issue**: Claims there's a "Main unified server that combines both engines" but unclear if this exists
+- **Action**: Verify if there's actually a unified server or if this is outdated documentation
+- **Priority**: Medium
+
+### Script References
+- **File**: Main `README.md`
+- **Issue**: References `./run_server.sh` but needs verification that this script works as documented
+- **Action**: Test and update script documentation if necessary
+- **Priority**: Low
+
+## API Documentation
+- **Files**: Both READMEs
+- **Issue**: API examples and endpoints should be cross-verified for accuracy
+- **Action**: Ensure all API examples work with current implementation
+- **Priority**: Low
+
+## Outdated Dependencies/Versions
+- **Issue**: Should verify that all mentioned Rust version requirements (1.70+) are still accurate
+- **Action**: Check and update version requirements if needed
+- **Priority**: Low
\ No newline at end of file
diff --git a/README.md b/README.md
index 277f364..6120d1e 100644
--- a/README.md
+++ b/README.md
@@ -14,19 +14,19 @@ Aliens, in a native executable.
 ## Features
 - **OpenAI Compatible**: API endpoints match OpenAI's format for easy integration
 - **Text Embeddings**: Generate high-quality text embeddings using the Nomic Embed Text v1.5 model
-- **Text Generation**: Chat completions with OpenAI-compatible API (simplified implementation)
+- **Text Generation**: Chat completions with OpenAI-compatible API using Gemma models (1B, 2B, 7B, 9B variants including base and instruction-tuned models)
 - **Performance Optimized**: Implements efficient caching and singleton patterns for improved throughput and reduced latency
 - **Performance Benchmarking**: Includes tools for measuring performance and generating HTML reports
-- **Web Chat Interface**: A Leptos-based WebAssembly chat interface for interacting with the inference engine
+- **Web Chat Interface**: A Leptos-based WebAssembly (WASM) chat interface for browser-based interaction with the inference engine
 
 ## Architecture
 
 ### Core Components
 
 - **`predict-otron-9000`**: Main unified server that combines both engines
-- **`embeddings-engine`**: Handles text embeddings using FastEmbed and Nomic models
-- **`inference-engine`**: Provides text generation capabilities (with modular design for various models)
-- **`leptos-chat`**: WebAssembly-based chat interface built with Leptos framework for interacting with the inference engine
+- **`embeddings-engine`**: Handles text embeddings using FastEmbed with the Nomic Embed Text v1.5 model
+- **`inference-engine`**: Provides text generation capabilities using Gemma models (1B, 2B, 7B, 9B variants) via Candle transformers
+- **`leptos-chat`**: WebAssembly-based chat interface built with Leptos framework for browser-based interaction with the inference engine
 
 ## Installation
 
@@ -44,8 +44,14 @@ cd predict-otron-9000
 # 2. Build the project
 cargo build --release
 
-# 3. Run the server
+# 3. Run the unified server
 ./run_server.sh
+
+# Alternative: Build and run individual components
+# For inference engine only:
+cargo run -p inference-engine --release -- --server --port 3777
+# For embeddings engine only:
+cargo run -p embeddings-engine --release
 ```
 
 ## Usage
diff --git a/crates/inference-engine/README.md b/crates/inference-engine/README.md
index a61575f..0973950 100644
--- a/crates/inference-engine/README.md
+++ b/crates/inference-engine/README.md
@@ -20,19 +20,6 @@ A Rust-based inference engine for running large language models locally. This to
 - macOS: Metal support
 - Linux/Windows: CUDA support (requires appropriate drivers)
 
-### Building from Source
-
-1. Clone the repository:
-   ```bash
-   git clone https://github.com/seemueller-io/open-web-agent-rs.git
-   cd open-web-agent-rs
-   ```
-
-2. Build the local inference engine:
-   ```bash
-   cargo build -p inference-engine --release
-   ```
-
 ## Usage
 
 ### CLI Mode
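
As a follow-up to the CLEANUP.md item "API examples and endpoints should be cross-verified for accuracy," a minimal smoke test of the OpenAI-compatible surface might look like the sketch below. The endpoint paths are simply OpenAI's standard ones (which the README claims compatibility with); the port and both model identifiers are assumptions for illustration, not confirmed anywhere in this patch.

```bash
# Smoke-test sketch for the OpenAI-compatible endpoints described in the README.
# Assumptions (not confirmed by this patch): the unified server listens on
# localhost:8080, and "gemma-2b-it" / "nomic-embed-text-v1.5" are accepted model ids.

# Chat completion (OpenAI request shape)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-2b-it", "messages": [{"role": "user", "content": "Say hello."}]}'

# Text embedding (OpenAI request shape)
curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text-v1.5", "input": "The quick brown fox"}'
```

If both calls return well-formed JSON, the cross-verification item can be checked off; if not, the README examples and the server routes are the first places to reconcile.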