Embeddings Engine

A high-performance text embeddings service that generates vector representations of text using state-of-the-art models. This crate wraps the FastEmbed library to provide embeddings with OpenAI-compatible API endpoints.

Overview

The embeddings-engine provides a standalone service for generating text embeddings that can be used for semantic search, similarity comparisons, and other NLP tasks. It's designed to be compatible with OpenAI's embeddings API format.
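For instance, a similarity comparison scores two texts by comparing their embedding vectors. Below is a minimal sketch in Python, assuming the two vectors came from this service; cosine_similarity is an illustrative helper, not part of this crate:

import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score how similar two embedding vectors are (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)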

Features

  • OpenAI-Compatible API: /v1/embeddings endpoint matching OpenAI's specification
  • FastEmbed Integration: Powered by the FastEmbed library for high-quality embeddings
  • Multiple Model Support: Works with a variety of embedding models
  • High Performance: Optimized for fast embedding generation
  • Standalone Service: Can run independently or as part of the predict-otron-9000 platform

Building and Running

Prerequisites

  • Rust toolchain
  • Internet connection for initial model downloads

Standalone Server

cargo run --bin embeddings-engine --release

The service will start on port 8080 by default.

API Usage

Generate Embeddings

Endpoint: POST /v1/embeddings

Request Body:

{
  "input": "Your text to embed",
  "model": "nomic-embed-text-v1.5"
}

Response:

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.1, 0.2, 0.3, ...]
    }
  ],
  "model": "nomic-embed-text-v1.5",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
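The endpoint can also be called without the OpenAI client. A minimal sketch using Python's requests library, assuming the request and response shapes shown above:

import requests

response = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={"input": "Your text to embed", "model": "nomic-embed-text-v1.5"},
)
response.raise_for_status()
vector = response.json()["data"][0]["embedding"]
print(len(vector))  # dimensionality of the returned embedding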

Example Usage

Using cURL:

curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The quick brown fox jumps over the lazy dog",
    "model": "nomic-embed-text-v1.5"
  }' | jq

Using Python OpenAI Client:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy"  # Not validated but required by client
)

response = client.embeddings.create(
    input="Your text here",
    model="nomic-embed-text-v1.5"
)

print(response.data[0].embedding)
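OpenAI's embeddings API also accepts a list of strings as input. Assuming this service mirrors that part of the spec (an assumption, not confirmed by this README), batching would look like:

response = client.embeddings.create(
    input=["first text", "second text"],  # list input, per OpenAI's spec (assumed supported here)
    model="nomic-embed-text-v1.5"
)

for item in response.data:
    print(item.index, len(item.embedding))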

Configuration

The service can be configured through environment variables:

  • SERVER_PORT: Port to run on (default: 8080)
  • RUST_LOG: Logging level (default: info)
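For example, if the server is launched with SERVER_PORT set to a non-default value, clients must target that port. A sketch using the Python client from above, where 3000 is just an illustrative value:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",  # matches SERVER_PORT=3000 on the server (illustrative)
    api_key="dummy"
)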

Integration

This service is designed to work seamlessly with the predict-otron-9000 main server, but can also be deployed independently for dedicated embeddings workloads.