# Embeddings Engine
A high-performance service that generates vector representations of text using state-of-the-art embedding models. This crate wraps the FastEmbed library and exposes OpenAI-compatible API endpoints.
## Overview
The embeddings-engine provides a standalone service for generating text embeddings that can be used for semantic search, similarity comparisons, and other NLP tasks. It's designed to be compatible with OpenAI's embeddings API format.
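The "similarity comparisons" use case boils down to comparing vectors, most commonly with cosine similarity. The sketch below is a minimal, dependency-free Python illustration of that computation; the two short vectors are placeholders, not real model output:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1].
    Values near 1 indicate semantically similar texts."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors for illustration; real ones come from /v1/embeddings.
print(cosine_similarity([0.1, 0.2, 0.3], [0.2, 0.1, 0.3]))
```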
## Features
- OpenAI-Compatible API: `/v1/embeddings` endpoint matching OpenAI's specification
- FastEmbed Integration: Powered by the FastEmbed library for high-quality embeddings
- Multiple Model Support: Support for various embedding models
- High Performance: Optimized for fast embedding generation
- Standalone Service: Can run independently or as part of the predict-otron-9000 platform
## Building and Running

### Prerequisites
- Rust toolchain
- Internet connection for initial model downloads
### Standalone Server

```shell
cargo run --bin embeddings-engine --release
```
The service starts on port 8080 by default; see Configuration below to change this.
## API Usage

### Generate Embeddings

**Endpoint:** `POST /v1/embeddings`

**Request Body:**
```json
{
  "input": "Your text to embed",
  "model": "nomic-embed-text-v1.5"
}
```
**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.1, 0.2, 0.3, ...]
    }
  ],
  "model": "nomic-embed-text-v1.5",
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0
  }
}
```
### Example Usage

**Using cURL:**
```shell
curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": "The quick brown fox jumps over the lazy dog",
    "model": "nomic-embed-text-v1.5"
  }' | jq
```
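Because the response follows the schema shown above, it can be post-processed with further `jq` filters. For example, this variant (assuming `jq` is installed) prints only the length of the returned vector, which is a quick way to confirm the model's embedding dimensionality:

```shell
curl -s http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "hello", "model": "nomic-embed-text-v1.5"}' \
  | jq '.data[0].embedding | length'
```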
**Using Python OpenAI Client:**
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="dummy",  # Not validated but required by the client
)

response = client.embeddings.create(
    input="Your text here",
    model="nomic-embed-text-v1.5",
)

print(response.data[0].embedding)
```
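As a follow-up sketch, the snippet below embeds two sentences and compares them with cosine similarity. It assumes the server is running locally on port 8080; the sample sentences and the `embed` helper are illustrative, not part of this crate's API:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="dummy")

def embed(text: str) -> list[float]:
    # One request per text; the request schema above shows a single string input.
    resp = client.embeddings.create(input=text, model="nomic-embed-text-v1.5")
    return resp.data[0].embedding

a = embed("The quick brown fox jumps over the lazy dog")
b = embed("A fast auburn fox leaps above a sleepy hound")

# Cosine similarity, as in the Overview sketch.
dot = sum(x * y for x, y in zip(a, b))
print(dot / ((sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5)))
```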
## Configuration
The service can be configured through environment variables:
- `SERVER_PORT`: Port to run on (default: 8080)
- `RUST_LOG`: Logging level (default: info)
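For example, to start the server on a different port with more verbose logging (a sketch combining the variables above with the run command from earlier):

```shell
SERVER_PORT=3000 RUST_LOG=debug cargo run --bin embeddings-engine --release
```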
## Integration
This service is designed to work seamlessly with the predict-otron-9000 main server, but can also be deployed independently for dedicated embeddings workloads.