mirror of https://github.com/geoffsee/predict-otron-9001.git
synced 2025-09-08 22:46:44 +00:00

Add CLEANUP.md with identified documentation and code issues. Update README files to fix repository URL, unify descriptions, and clarify Gemma model usage.
7	.aiignore	Normal file

@@ -0,0 +1,7 @@
+.idea/
+.fastembed_cache/
+target/
+/.output.txt
+/*.iml
+dist
+node_modules/
59	CLEANUP.md	Normal file

@@ -0,0 +1,59 @@
+# CLEANUP.md
+
+This document tracks items requiring cleanup in the predict-otron-9000 project, identified during README updates on 2025-08-28.
+
+## Documentation Issues
+
+### Repository URL Inconsistencies
+- **File**: `crates/inference-engine/README.md` (lines 27-28)
+- **Issue**: References the incorrect repository URL `https://github.com/seemueller-io/open-web-agent-rs.git`
+- **Action**: Should reference the correct predict-otron-9000 repository URL
+- **Priority**: High
+
+### Model Information Discrepancies
+- **File**: Main `README.md`
+- **Issue**: Does not specify that the inference-engine specifically uses Gemma models (1B, 2B, 7B, 9B variants)
+- **Action**: Main README should clarify the specific model types supported
+- **Priority**: Medium
+
+### Build Instructions Inconsistency
+- **Files**: Main `README.md` vs `crates/inference-engine/README.md`
+- **Issue**: Different build commands and approaches between the main and component READMEs
+  - **Main README**: Uses `cargo build --release` and `./run_server.sh`
+  - **Inference README**: Uses `cargo build -p inference-engine --release`
+- **Action**: Standardize build instructions across all READMEs
+- **Priority**: Medium
+
+### Missing Component Details in Main README
+- **File**: Main `README.md`
+- **Issue**: Lacks specific details about:
+  - Exact embedding model used (Nomic Embed Text v1.5)
+  - Specific LLM models supported (Gemma variants)
+  - WebAssembly nature of the leptos-chat component
+- **Action**: Add more specific technical details to the main README
+- **Priority**: Low
+
+## Code Structure Issues
+
+### Unified Server Reference
+- **File**: Main `README.md` (line 26)
+- **Issue**: Claims there is a "Main unified server that combines both engines", but it is unclear whether this exists
+- **Action**: Verify whether a unified server actually exists or whether this is outdated documentation
+- **Priority**: Medium
+
+### Script References
+- **File**: Main `README.md`
+- **Issue**: References `./run_server.sh`, but the script needs verification that it works as documented
+- **Action**: Test the script and update its documentation if necessary
+- **Priority**: Low
+
+## API Documentation
+- **Files**: Both READMEs
+- **Issue**: API examples and endpoints should be cross-verified for accuracy
+- **Action**: Ensure all API examples work with the current implementation
+- **Priority**: Low
+
+## Outdated Dependencies/Versions
+- **Issue**: Should verify that the stated Rust version requirement (1.70+) is still accurate
+- **Action**: Check and update version requirements if needed
+- **Priority**: Low
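The repository-URL check above is mechanical enough to script. A minimal sketch follows; the `grep` invocation is illustrative and not part of the commit, and the sample file tree is fabricated purely so the sketch runs standalone (in the real repository you would run the `grep` line from the project root instead):

```shell
# Fabricated sample tree containing the stale URL flagged in CLEANUP.md.
tmp=$(mktemp -d)
mkdir -p "$tmp/crates/inference-engine"
printf 'git clone https://github.com/seemueller-io/open-web-agent-rs.git\n' \
  > "$tmp/crates/inference-engine/README.md"

# List README files that still reference the old open-web-agent-rs URL.
grep -rln "open-web-agent-rs" --include="README.md" "$tmp"

rm -rf "$tmp"
```

Swapping `-l` for `-n` would print the offending line numbers instead of just the file paths, matching the "lines 27-28" level of detail used in the cleanup notes.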
18	README.md

@@ -14,19 +14,19 @@ Aliens, in a native executable.
 ## Features
 - **OpenAI Compatible**: API endpoints match OpenAI's format for easy integration
 - **Text Embeddings**: Generate high-quality text embeddings using the Nomic Embed Text v1.5 model
-- **Text Generation**: Chat completions with OpenAI-compatible API (simplified implementation)
+- **Text Generation**: Chat completions with OpenAI-compatible API using Gemma models (1B, 2B, 7B, 9B variants including base and instruction-tuned models)
 - **Performance Optimized**: Implements efficient caching and singleton patterns for improved throughput and reduced latency
 - **Performance Benchmarking**: Includes tools for measuring performance and generating HTML reports
-- **Web Chat Interface**: A Leptos-based WebAssembly chat interface for interacting with the inference engine
+- **Web Chat Interface**: A Leptos-based WebAssembly (WASM) chat interface for browser-based interaction with the inference engine

 ## Architecture

 ### Core Components

 - **`predict-otron-9000`**: Main unified server that combines both engines
-- **`embeddings-engine`**: Handles text embeddings using FastEmbed and Nomic models
-- **`inference-engine`**: Provides text generation capabilities (with modular design for various models)
-- **`leptos-chat`**: WebAssembly-based chat interface built with Leptos framework for interacting with the inference engine
+- **`embeddings-engine`**: Handles text embeddings using FastEmbed with the Nomic Embed Text v1.5 model
+- **`inference-engine`**: Provides text generation capabilities using Gemma models (1B, 2B, 7B, 9B variants) via Candle transformers
+- **`leptos-chat`**: WebAssembly-based chat interface built with Leptos framework for browser-based interaction with the inference engine

 ## Installation

@@ -44,8 +44,14 @@ cd predict-otron-9000
 # 2. Build the project
 cargo build --release

-# 3. Run the server
+# 3. Run the unified server
 ./run_server.sh
+
+# Alternative: Build and run individual components
+# For inference engine only:
+cargo run -p inference-engine --release -- --server --port 3777
+# For embeddings engine only:
+cargo run -p embeddings-engine --release
 ```

 ## Usage
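Since the hunks above advertise an OpenAI-compatible API, a request sketch may be useful. The model id `gemma-2b-it`, the port, and the `/v1/chat/completions` path follow OpenAI conventions and are assumptions on my part; this diff does not pin any of them:

```shell
# Build an OpenAI-style chat completion payload for the unified server.
# Model id, port, and endpoint path are assumptions, not taken from this commit.
cat > chat_request.json <<'EOF'
{
  "model": "gemma-2b-it",
  "messages": [
    {"role": "user", "content": "Say hello from predict-otron-9000."}
  ]
}
EOF

# With the server running, it could be POSTed like so (port assumed):
#   curl -s http://localhost:8080/v1/chat/completions \
#        -H "Content-Type: application/json" -d @chat_request.json

# Sanity-check that the payload is valid JSON.
python3 -m json.tool chat_request.json >/dev/null && echo "payload OK"
```

Keeping the request body in a file rather than inline makes it easy to reuse the same payload against both the unified server and the standalone inference engine on port 3777.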
@@ -20,19 +20,6 @@ A Rust-based inference engine for running large language models locally. This to
|
|||||||
- macOS: Metal support
|
- macOS: Metal support
|
||||||
- Linux/Windows: CUDA support (requires appropriate drivers)
|
- Linux/Windows: CUDA support (requires appropriate drivers)
|
||||||
|
|
||||||
### Building from Source
|
|
||||||
|
|
||||||
1. Clone the repository:
|
|
||||||
```bash
|
|
||||||
git clone https://github.com/seemueller-io/open-web-agent-rs.git
|
|
||||||
cd open-web-agent-rs
|
|
||||||
```
|
|
||||||
|
|
||||||
2. Build the local inference engine:
|
|
||||||
```bash
|
|
||||||
cargo build -p inference-engine --release
|
|
||||||
```
|
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
### CLI Mode
|
### CLI Mode
|
||||||
|