# predict-otron-9000

Aliens, in a native executable.

Warning: Do NOT use this in production unless you are cool like that.
## Features
- OpenAI Compatible: API endpoints match OpenAI's format for easy integration
- Text Embeddings: Generate high-quality text embeddings using the Nomic Embed Text v1.5 model
- Text Generation: Chat completions with OpenAI-compatible API (simplified implementation)
## Architecture

### Core Components

- `predict-otron-9000`: Main unified server that combines both engines
- `embeddings-engine`: Handles text embeddings using FastEmbed and Nomic models
- `inference-engine`: Provides text generation capabilities (with a modular design for various models)
## Installation

### Prerequisites

- Rust toolchain with 2024 edition support (Rust 1.85 or newer)
- Cargo package manager
### Build from Source

```bash
# 1. Clone the repository
git clone <repository-url>
cd predict-otron-9000

# 2. Build the project
cargo build --release

# 3. Run the server
./run_server.sh
```
## Usage

### Starting the Server

The server can be started using the provided script or directly with Cargo:

```bash
# Using the provided script
./run_server.sh

# Or directly with cargo
cargo run --bin predict-otron-9000
```
### Configuration

The server is configured through environment variables:

- `SERVER_HOST`: Server bind address (default: `0.0.0.0`)
- `SERVER_PORT`: Server port (default: `8080`)
- `RUST_LOG`: Logging level configuration

Example:

```bash
export SERVER_PORT=3000
export RUST_LOG=debug
./run_server.sh
```
## API Endpoints

### Text Embeddings

Generate text embeddings compatible with OpenAI's embeddings API.

Endpoint: `POST /v1/embeddings`

Request Body:

```json
{
  "input": "Your text to embed",
  "model": "nomic-embed-text-v1.5"
}
```
Response:

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.1, 0.2, 0.3]
    }
  ],
  "model": "nomic-embed-text-v1.5",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
```
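For example, with the server running locally on the default port (adjust the host and port if you changed the configuration):

```bash
# Request an embedding for a single string
curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Your text to embed",
    "model": "nomic-embed-text-v1.5"
  }'
```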
### Chat Completions

Generate chat completions (simplified implementation).

Endpoint: `POST /v1/chat/completions`

Request Body:

```json
{
  "model": "gemma-2b-it",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}
```
Response:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1699123456,
  "model": "gemma-2b-it",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! This is the unified predict-otron-9000 server..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 35,
    "total_tokens": 45
  }
}
```
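A matching curl invocation, again assuming the default host and port:

```bash
# Send a single-turn chat request
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-2b-it",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'
```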
### Health Check

Endpoint: `GET /`

Returns a simple "Hello, World!" message to verify the server is running.
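A quick check from the command line (assuming the default port):

```bash
curl http://localhost:8080/
# Expected output: Hello, World!
```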
## Development

### Project Structure

```
predict-otron-9000/
├── Cargo.toml                # Workspace configuration
├── README.md                 # This file
├── run_server.sh             # Server startup script
└── crates/
    ├── predict-otron-9000/   # Main unified server
    │   ├── Cargo.toml
    │   └── src/
    │       └── main.rs
    ├── embeddings-engine/    # Text embeddings functionality
    │   ├── Cargo.toml
    │   └── src/
    │       ├── lib.rs
    │       └── main.rs
    └── inference-engine/     # Text generation functionality
        ├── Cargo.toml
        ├── src/
        │   ├── lib.rs
        │   ├── cli.rs
        │   ├── server.rs
        │   ├── model.rs
        │   ├── text_generation.rs
        │   ├── token_output_stream.rs
        │   ├── utilities_lib.rs
        │   └── openai_types.rs
        └── tests/
```
### Running Tests

```bash
# Run all tests
cargo test

# Run tests for a specific crate
cargo test -p embeddings-engine
cargo test -p inference-engine
```
### Adding Features

- Embeddings Engine: Modify `crates/embeddings-engine/src/lib.rs` to add new embedding models or functionality.
- Inference Engine: The inference engine has a modular structure; add new models in the `model.rs` module.
- Unified Server: Update `crates/predict-otron-9000/src/main.rs` to integrate new capabilities.
### Logging and Debugging

The application uses structured logging via `tracing`. Log levels are controlled with the `RUST_LOG` environment variable:

```bash
# Debug-level logging
export RUST_LOG=debug

# Trace level for detailed embeddings debugging
export RUST_LOG=trace

# Module-specific logging
export RUST_LOG=predict_otron_9000=debug,embeddings_engine=trace
```
## Limitations
- Inference Engine: Currently provides a simplified implementation for chat completions. Full model loading and text generation capabilities from the inference-engine crate are not yet integrated into the unified server.
- Model Support: Embeddings are limited to the Nomic Embed Text v1.5 model.
- Scalability: Single-threaded model loading may impact performance under heavy load.
## Contributing

- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Make your changes and add tests
- Ensure all tests pass: `cargo test`
- Submit a pull request