update docs

This commit is contained in:
geoffsee
2025-08-28 12:54:09 -04:00
parent 0488bddfdb
commit d04340d9ac
18 changed files with 22 additions and 651 deletions


@@ -0,0 +1,4 @@
# Embeddings Engine
A high-performance text embeddings service that generates vector representations of text using state-of-the-art models.
This crate wraps the fastembed crate to provide embeddings and partially implements the OpenAI embeddings API specification.
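
For quick testing, here is a minimal request sketch. It assumes the service listens on `localhost:8080` (the port advertised in its Kubernetes metadata) and exposes an OpenAI-style `POST /v1/embeddings` route; the model name and the `reqwest`/`tokio`/`serde_json` dependencies are illustrative, not prescribed by this crate.

```rust
use serde_json::json;

// Hypothetical client sketch: the endpoint path, port, and model name are assumptions.
#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    let response = reqwest::Client::new()
        .post("http://localhost:8080/v1/embeddings")
        .json(&json!({
            "model": "nomic-embed-text-v1.5",
            "input": "The quick brown fox jumps over the lazy dog"
        }))
        .send()
        .await?
        .json::<serde_json::Value>()
        .await?;

    // In the OpenAI response shape, the vector appears under data[0].embedding.
    println!("{}", response["data"][0]["embedding"]);
    Ok(())
}
```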


@@ -0,0 +1,18 @@
[package]
name = "helm-chart-tool"
version = "0.1.0"
edition = "2021"
[workspace]
[[bin]]
name = "helm-chart-tool"
path = "src/main.rs"
[dependencies]
toml = "0.8"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
anyhow = "1.0"
clap = { version = "4.0", features = ["derive"] }
walkdir = "2.0"


@@ -0,0 +1,218 @@
# Helm Chart Tool
A Rust-based tool that automatically generates Helm charts from Cargo.toml metadata in Rust workspace projects.
## Overview
This tool scans a Rust workspace for crates containing Docker/Kubernetes metadata in their `Cargo.toml` files and generates a complete, production-ready Helm chart with deployments, services, ingress, and configuration templates.
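The discovery step reduces to a walk over the workspace looking for member `Cargo.toml` files; here is a condensed sketch of the logic in `src/main.rs` (`ServiceInfo` and `parse_cargo_toml` are defined there):

```rust
use walkdir::WalkDir;

// Condensed from src/main.rs: walk the workspace, skip the root manifest,
// and keep every crate whose Cargo.toml parses into a ServiceInfo
// (i.e. carries a [package.metadata.kube] section).
fn discover_services(workspace_root: &std::path::Path) -> Vec<ServiceInfo> {
    WalkDir::new(workspace_root)
        .into_iter()
        .filter_map(|entry| entry.ok())
        .filter(|entry| {
            entry.file_name() == "Cargo.toml"
                && entry.path() != workspace_root.join("Cargo.toml")
        })
        .filter_map(|entry| parse_cargo_toml(entry.path()).ok())
        .collect()
}
```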
## Features
- **Automatic Service Discovery**: Scans all `Cargo.toml` files in a workspace to find services with Kubernetes metadata
- **Complete Helm Chart Generation**: Creates Chart.yaml, values.yaml, deployment templates, service templates, ingress template, and helper templates
- **Metadata Extraction**: Uses `[package.metadata.kube]` sections from Cargo.toml files to extract:
  - Docker image names
  - Service ports
  - Replica counts
  - Service names
- **Production Ready**: Generated charts include health checks, resource limits, node selectors, affinity rules, and tolerations
- **Helm Best Practices**: Follows Helm chart conventions and passes `helm lint` validation
## Installation
Build the tool from source:
```bash
cd helm-chart-tool
cargo build --release
```
The binary will be available at `target/release/helm-chart-tool`.
## Usage
### Basic Usage
```bash
./target/release/helm-chart-tool --workspace /path/to/rust/workspace --output ./my-helm-chart
```
### Command Line Options
- `--workspace, -w PATH`: Path to the workspace root (default: `.`)
- `--output, -o PATH`: Output directory for the Helm chart (default: `./helm-chart`)
- `--name, -n NAME`: Name of the Helm chart (default: `predict-otron-9000`)
### Example
```bash
# Generate chart from current workspace
./target/release/helm-chart-tool
# Generate chart from specific workspace with custom output
./target/release/helm-chart-tool -w /path/to/my/workspace -o ./charts/my-app -n my-application
```
## Cargo.toml Metadata Format
The tool expects crates to have Kubernetes metadata in their `Cargo.toml` files:
```toml
[package]
name = "my-service"
version = "0.1.0"
# Required: Kubernetes metadata
[package.metadata.kube]
image = "ghcr.io/myorg/my-service:latest"
replicas = 1
port = 8080
# Optional: Docker Compose metadata (currently not used but parsed)
[package.metadata.compose]
image = "ghcr.io/myorg/my-service:latest"
port = 8080
```
### Metadata Fields
- `image` (required): Full Docker image name including registry and tag
- `port` (required): Port number the service listens on
- `replicas` (optional): Number of replicas to deploy; defaults to 1
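
Inside the tool, this table deserializes into a small serde struct (condensed from `src/main.rs`; only the fields the generator reads are modelled):

```rust
use serde::Deserialize;

// Condensed from src/main.rs: the [package.metadata.kube] table maps onto this struct.
#[derive(Debug, Deserialize)]
struct KubeMetadata {
    image: String,         // required: full image reference, e.g. ghcr.io/myorg/my-service:latest
    port: u16,             // required: container port the service listens on
    replicas: Option<u32>, // optional: the generator falls back to 1 when absent
}
```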
## Generated Chart Structure
The tool generates a complete Helm chart with the following structure:
```
helm-chart/
├── Chart.yaml                    # Chart metadata
├── values.yaml                   # Default configuration values
├── .helmignore                   # Files to ignore when packaging
└── templates/
    ├── _helpers.tpl              # Template helper functions
    ├── ingress.yaml              # Ingress configuration (optional)
    ├── {service}-deployment.yaml # Deployment for each service
    └── {service}-service.yaml    # Service for each service
```
### Generated Files
#### Chart.yaml
- Standard chart metadata (`apiVersion: v2`, the Helm 3 chart format)
- Includes keywords for AI/ML applications
- Maintainer information
#### values.yaml
- Individual service configurations
- Resource limits and requests
- Service types and ports
- Node selectors, affinity, and tolerations
- Global settings and ingress configuration
#### Deployment Templates
- Kubernetes Deployment manifests
- Health checks (liveness and readiness probes)
- Resource management
- Container port configuration from metadata
- Support for node selectors, affinity, and tolerations
#### Service Templates
- Kubernetes Service manifests
- ClusterIP services by default
- Port mapping from metadata
#### Ingress Template
- Optional ingress configuration
- Disabled by default
- Configurable through values.yaml
## Example Output
When run against the predict-otron-9000 workspace, the tool generates:
```bash
$ ./target/release/helm-chart-tool --workspace .. --output ../generated-helm-chart
Parsing workspace at: ..
Output directory: ../generated-helm-chart
Chart name: predict-otron-9000
Found 4 services:
- leptos-app: ghcr.io/geoffsee/leptos-app:latest (port 8788)
- inference-engine: ghcr.io/geoffsee/inference-service:latest (port 8080)
- embeddings-engine: ghcr.io/geoffsee/embeddings-service:latest (port 8080)
- predict-otron-9000: ghcr.io/geoffsee/predict-otron-9000:latest (port 8080)
Helm chart generated successfully!
```
## Validation
The generated charts pass Helm validation:
```bash
$ helm lint generated-helm-chart
==> Linting generated-helm-chart
[INFO] Chart.yaml: icon is recommended
1 chart(s) linted, 0 chart(s) failed
```
## Deployment
Deploy the generated chart:
```bash
# Install the chart
helm install my-release ./generated-helm-chart
# Upgrade the chart
helm upgrade my-release ./generated-helm-chart
# Uninstall the chart
helm uninstall my-release
```
### Customization
Customize the deployment by modifying `values.yaml`. Per-service keys use underscores in place of dashes (for example, `predict-otron-9000` becomes `predict_otron_9000`):
```yaml
# Enable ingress
ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: my-app.example.com

# Adjust resources for a specific service
predict_otron_9000:
  replicas: 3
  resources:
    limits:
      memory: "4Gi"
      cpu: "2000m"
    requests:
      memory: "2Gi"
      cpu: "1000m"
```
## Requirements
- Rust 2021+ (for building the tool)
- Helm 3.x (for deploying the generated charts)
- Kubernetes cluster (for deployment)
## Limitations
- Currently assumes every service exposes a `/health` endpoint for liveness and readiness probes
- Resource limits are hardcoded defaults (can be overridden in values.yaml)
- Ingress configuration is basic (can be customized through values.yaml)
## Contributing
1. Add new features to the tool
2. Test with various Cargo.toml metadata configurations
3. Validate generated charts with `helm lint`
4. Ensure charts deploy successfully to test clusters
## License
This tool is part of the predict-otron-9000 project and follows the same license terms.


@@ -0,0 +1,515 @@
use anyhow::{Context, Result};
use clap::{Arg, Command};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use walkdir::WalkDir;
#[derive(Debug, Deserialize)]
struct CargoToml {
    package: Option<Package>,
}

#[derive(Debug, Deserialize)]
struct Package {
    name: String,
    metadata: Option<Metadata>,
}

#[derive(Debug, Deserialize)]
struct Metadata {
    kube: Option<KubeMetadata>,
    compose: Option<ComposeMetadata>,
}

#[derive(Debug, Deserialize)]
struct KubeMetadata {
    image: String,
    replicas: Option<u32>,
    port: u16,
}

#[derive(Debug, Deserialize)]
struct ComposeMetadata {
    image: Option<String>,
    port: Option<u16>,
}

#[derive(Debug, Clone)]
struct ServiceInfo {
    name: String,
    image: String,
    port: u16,
    replicas: u32,
}
fn main() -> Result<()> {
    let matches = Command::new("helm-chart-tool")
        .about("Generate Helm charts from Cargo.toml metadata")
        .arg(
            Arg::new("workspace")
                .short('w')
                .long("workspace")
                .value_name("PATH")
                .help("Path to the workspace root")
                .default_value("."),
        )
        .arg(
            Arg::new("output")
                .short('o')
                .long("output")
                .value_name("PATH")
                .help("Output directory for the Helm chart")
                .default_value("./helm-chart"),
        )
        .arg(
            Arg::new("chart-name")
                .short('n')
                .long("name")
                .value_name("NAME")
                .help("Name of the Helm chart")
                .default_value("predict-otron-9000"),
        )
        .get_matches();

    let workspace_path = matches.get_one::<String>("workspace").unwrap();
    let output_path = matches.get_one::<String>("output").unwrap();
    let chart_name = matches.get_one::<String>("chart-name").unwrap();

    println!("Parsing workspace at: {}", workspace_path);
    println!("Output directory: {}", output_path);
    println!("Chart name: {}", chart_name);

    let services = discover_services(workspace_path)?;
    println!("Found {} services:", services.len());
    for service in &services {
        println!(" - {}: {} (port {})", service.name, service.image, service.port);
    }

    generate_helm_chart(output_path, chart_name, &services)?;
    println!("Helm chart generated successfully!");
    Ok(())
}
fn discover_services(workspace_path: &str) -> Result<Vec<ServiceInfo>> {
    let workspace_root = Path::new(workspace_path);
    let mut services = Vec::new();

    // Find all Cargo.toml files in the workspace
    for entry in WalkDir::new(workspace_root)
        .into_iter()
        .filter_map(|e| e.ok())
    {
        if entry.file_name() == "Cargo.toml" && entry.path() != workspace_root.join("Cargo.toml") {
            if let Ok(service_info) = parse_cargo_toml(entry.path()) {
                services.push(service_info);
            }
        }
    }

    Ok(services)
}
fn parse_cargo_toml(path: &Path) -> Result<ServiceInfo> {
    let content = fs::read_to_string(path)
        .with_context(|| format!("Failed to read Cargo.toml at {:?}", path))?;
    let cargo_toml: CargoToml = toml::from_str(&content)
        .with_context(|| format!("Failed to parse Cargo.toml at {:?}", path))?;

    let package = cargo_toml.package
        .ok_or_else(|| anyhow::anyhow!("No package section found in {:?}", path))?;
    let metadata = package.metadata
        .ok_or_else(|| anyhow::anyhow!("No metadata section found in {:?}", path))?;
    let kube_metadata = metadata.kube
        .ok_or_else(|| anyhow::anyhow!("No kube metadata found in {:?}", path))?;

    Ok(ServiceInfo {
        name: package.name,
        image: kube_metadata.image,
        port: kube_metadata.port,
        replicas: kube_metadata.replicas.unwrap_or(1),
    })
}
fn generate_helm_chart(output_path: &str, chart_name: &str, services: &[ServiceInfo]) -> Result<()> {
    let chart_dir = Path::new(output_path);
    let templates_dir = chart_dir.join("templates");

    // Create directories
    fs::create_dir_all(&templates_dir)?;

    // Generate Chart.yaml
    generate_chart_yaml(chart_dir, chart_name)?;

    // Generate values.yaml
    generate_values_yaml(chart_dir, services)?;

    // Generate templates for each service
    for service in services {
        generate_deployment_template(&templates_dir, service)?;
        generate_service_template(&templates_dir, service)?;
    }

    // Generate ingress template
    generate_ingress_template(&templates_dir, services)?;

    // Generate helper templates
    generate_helpers_template(&templates_dir)?;

    // Generate .helmignore
    generate_helmignore(chart_dir)?;

    Ok(())
}
fn generate_chart_yaml(chart_dir: &Path, chart_name: &str) -> Result<()> {
    let chart_yaml = format!(
        r#"apiVersion: v2
name: {}
description: A Helm chart for the predict-otron-9000 AI platform
type: application
version: 0.1.0
appVersion: "0.1.0"
keywords:
  - ai
  - llm
  - inference
  - embeddings
  - chat
maintainers:
  - name: predict-otron-9000-team
"#,
        chart_name
    );
    fs::write(chart_dir.join("Chart.yaml"), chart_yaml)?;
    Ok(())
}
fn generate_values_yaml(chart_dir: &Path, services: &[ServiceInfo]) -> Result<()> {
    let mut values = String::from(
        r#"# Default values for predict-otron-9000
# This is a YAML-formatted file.
global:
  imagePullPolicy: IfNotPresent
  serviceType: ClusterIP
# Ingress configuration
ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: predict-otron-9000.local
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: predict-otron-9000
              port:
                number: 8080
  tls: []
"#,
    );

    for service in services {
        let service_config = format!(
            r#"{}:
  image:
    repository: {}
    tag: "latest"
    pullPolicy: IfNotPresent
  replicas: {}
  service:
    type: ClusterIP
    port: {}
  resources:
    limits:
      memory: "1Gi"
      cpu: "1000m"
    requests:
      memory: "512Mi"
      cpu: "250m"
  nodeSelector: {{}}
  tolerations: []
  affinity: {{}}
"#,
            service.name.replace("-", "_"),
            service.image.split(':').next().unwrap_or(&service.image),
            service.replicas,
            service.port
        );
        values.push_str(&service_config);
    }

    fs::write(chart_dir.join("values.yaml"), values)?;
    Ok(())
}
fn generate_deployment_template(templates_dir: &Path, service: &ServiceInfo) -> Result<()> {
    let service_name_underscore = service.name.replace("-", "_");
    let deployment_template = format!(
        r#"apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{{{ include "predict-otron-9000.fullname" . }}}}-{}
  labels:
    {{{{- include "predict-otron-9000.labels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
spec:
  replicas: {{{{ .Values.{}.replicas }}}}
  selector:
    matchLabels:
      {{{{- include "predict-otron-9000.selectorLabels" . | nindent 6 }}}}
      app.kubernetes.io/component: {}
  template:
    metadata:
      labels:
        {{{{- include "predict-otron-9000.selectorLabels" . | nindent 8 }}}}
        app.kubernetes.io/component: {}
    spec:
      containers:
        - name: {}
          image: "{{{{ .Values.{}.image.repository }}}}:{{{{ .Values.{}.image.tag }}}}"
          imagePullPolicy: {{{{ .Values.{}.image.pullPolicy }}}}
          ports:
            - name: http
              containerPort: {}
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            {{{{- toYaml .Values.{}.resources | nindent 12 }}}}
      {{{{- with .Values.{}.nodeSelector }}}}
      nodeSelector:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
      {{{{- with .Values.{}.affinity }}}}
      affinity:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
      {{{{- with .Values.{}.tolerations }}}}
      tolerations:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
"#,
        service.name,
        service.name,
        service_name_underscore,
        service.name,
        service.name,
        service.name,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore,
        service.port,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore
    );

    let filename = format!("{}-deployment.yaml", service.name);
    fs::write(templates_dir.join(filename), deployment_template)?;
    Ok(())
}
fn generate_service_template(templates_dir: &Path, service: &ServiceInfo) -> Result<()> {
    let service_template = format!(
        r#"apiVersion: v1
kind: Service
metadata:
  name: {{{{ include "predict-otron-9000.fullname" . }}}}-{}
  labels:
    {{{{- include "predict-otron-9000.labels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
spec:
  type: {{{{ .Values.{}.service.type }}}}
  ports:
    - port: {{{{ .Values.{}.service.port }}}}
      targetPort: http
      protocol: TCP
      name: http
  selector:
    {{{{- include "predict-otron-9000.selectorLabels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
"#,
        service.name,
        service.name,
        service.name.replace("-", "_"),
        service.name.replace("-", "_"),
        service.name
    );

    let filename = format!("{}-service.yaml", service.name);
    fs::write(templates_dir.join(filename), service_template)?;
    Ok(())
}
fn generate_ingress_template(templates_dir: &Path, services: &[ServiceInfo]) -> Result<()> {
    let ingress_template = r#"{{- if .Values.ingress.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "predict-otron-9000.fullname" . }}
  labels:
    {{- include "predict-otron-9000.labels" . | nindent 4 }}
  {{- with .Values.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  {{- if .Values.ingress.className }}
  ingressClassName: {{ .Values.ingress.className }}
  {{- end }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- range .Values.ingress.tls }}
    - hosts:
        {{- range .hosts }}
        - {{ . | quote }}
        {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            {{- if .pathType }}
            pathType: {{ .pathType }}
            {{- end }}
            backend:
              service:
                name: {{ include "predict-otron-9000.fullname" $ }}-{{ .backend.service.name }}
                port:
                  number: {{ .backend.service.port.number }}
          {{- end }}
    {{- end }}
{{- end }}
"#;

    fs::write(templates_dir.join("ingress.yaml"), ingress_template)?;
    Ok(())
}
fn generate_helpers_template(templates_dir: &Path) -> Result<()> {
    let helpers_template = r#"{{/*
Expand the name of the chart.
*/}}
{{- define "predict-otron-9000.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "predict-otron-9000.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "predict-otron-9000.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Common labels
*/}}
{{- define "predict-otron-9000.labels" -}}
helm.sh/chart: {{ include "predict-otron-9000.chart" . }}
{{ include "predict-otron-9000.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{/*
Selector labels
*/}}
{{- define "predict-otron-9000.selectorLabels" -}}
app.kubernetes.io/name: {{ include "predict-otron-9000.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Create the name of the service account to use
*/}}
{{- define "predict-otron-9000.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "predict-otron-9000.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
"#;
    fs::write(templates_dir.join("_helpers.tpl"), helpers_template)?;
    Ok(())
}
fn generate_helmignore(chart_dir: &Path) -> Result<()> {
    let helmignore_content = r#"# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
"#;
    fs::write(chart_dir.join(".helmignore"), helmignore_content)?;
    Ok(())
}


@@ -1,295 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>OpenAI-Compatible API Tester</title>
<style>
body {
font-family: Arial, sans-serif;
max-width: 800px;
margin: 0 auto;
padding: 20px;
line-height: 1.6;
}
h1, h2 {
color: #333;
}
.container {
margin-bottom: 20px;
}
textarea {
width: 100%;
height: 150px;
padding: 10px;
margin-bottom: 10px;
border: 1px solid #ddd;
border-radius: 4px;
font-family: monospace;
}
button {
background-color: #4CAF50;
color: white;
padding: 10px 15px;
border: none;
border-radius: 4px;
cursor: pointer;
font-size: 16px;
}
button:hover {
background-color: #45a049;
}
pre {
background-color: #f5f5f5;
padding: 15px;
border-radius: 4px;
overflow-x: auto;
white-space: pre-wrap;
}
.response {
margin-top: 20px;
}
.error {
color: red;
}
.settings {
display: flex;
flex-wrap: wrap;
gap: 10px;
margin-bottom: 15px;
}
.settings div {
display: flex;
flex-direction: column;
}
label {
margin-bottom: 5px;
font-weight: bold;
}
input {
padding: 8px;
border: 1px solid #ddd;
border-radius: 4px;
}
.examples {
margin-top: 30px;
}
.example-btn {
background-color: #2196F3;
margin-right: 10px;
margin-bottom: 10px;
}
.example-btn:hover {
background-color: #0b7dda;
}
</style>
</head>
<body>
<h1>OpenAI-Compatible API Tester</h1>
<p>Use this page to test the OpenAI-compatible chat completions endpoint of the local inference engine.</p>
<div class="container">
<h2>Request Settings</h2>
<div class="settings">
<div>
<label for="serverUrl">Server URL:</label>
<input type="text" id="serverUrl" value="http://localhost:3777" />
</div>
<div>
<label for="model">Model:</label>
<input type="text" id="model" value="gemma-3-1b-it" />
</div>
<div>
<label for="maxTokens">Max Tokens:</label>
<input type="number" id="maxTokens" value="150" />
</div>
<div>
<label for="temperature">Temperature:</label>
<input type="number" id="temperature" value="0.7" step="0.1" min="0" max="2" />
</div>
<div>
<label for="topP">Top P:</label>
<input type="number" id="topP" value="0.9" step="0.1" min="0" max="1" />
</div>
</div>
<h2>Request Body</h2>
<textarea id="requestBody">{
"model": "gemma-3-1b-it",
"messages": [
{
"role": "user",
"content": "Hello, how are you today?"
}
],
"max_tokens": 150,
"temperature": 0.7,
"top_p": 0.9
}</textarea>
<button id="sendRequest">Send Request</button>
<div class="examples">
<h3>Example Requests</h3>
<button class="example-btn" id="example1">Basic Question</button>
<button class="example-btn" id="example2">Multi-turn Conversation</button>
<button class="example-btn" id="example3">Creative Writing</button>
<button class="example-btn" id="example4">Code Generation</button>
</div>
<div class="response">
<h2>Response</h2>
<pre id="responseOutput">Response will appear here...</pre>
</div>
</div>
<script>
document.addEventListener('DOMContentLoaded', function() {
// Update request body when settings change
const serverUrlInput = document.getElementById('serverUrl');
const modelInput = document.getElementById('model');
const maxTokensInput = document.getElementById('maxTokens');
const temperatureInput = document.getElementById('temperature');
const topPInput = document.getElementById('topP');
const requestBodyTextarea = document.getElementById('requestBody');
const responseOutput = document.getElementById('responseOutput');
// Function to update request body from settings
function updateRequestBodyFromSettings() {
try {
const requestBody = JSON.parse(requestBodyTextarea.value);
requestBody.model = modelInput.value;
requestBody.max_tokens = parseInt(maxTokensInput.value);
requestBody.temperature = parseFloat(temperatureInput.value);
requestBody.top_p = parseFloat(topPInput.value);
requestBodyTextarea.value = JSON.stringify(requestBody, null, 2);
} catch (error) {
console.error("Error updating request body:", error);
}
}
// Update settings when request body changes
function updateSettingsFromRequestBody() {
try {
const requestBody = JSON.parse(requestBodyTextarea.value);
if (requestBody.model) modelInput.value = requestBody.model;
if (requestBody.max_tokens) maxTokensInput.value = requestBody.max_tokens;
if (requestBody.temperature) temperatureInput.value = requestBody.temperature;
if (requestBody.top_p) topPInput.value = requestBody.top_p;
} catch (error) {
console.error("Error updating settings:", error);
}
}
// Add event listeners for settings changes
modelInput.addEventListener('change', updateRequestBodyFromSettings);
maxTokensInput.addEventListener('change', updateRequestBodyFromSettings);
temperatureInput.addEventListener('change', updateRequestBodyFromSettings);
topPInput.addEventListener('change', updateRequestBodyFromSettings);
// Add event listener for request body changes
requestBodyTextarea.addEventListener('blur', updateSettingsFromRequestBody);
// Send request button
document.getElementById('sendRequest').addEventListener('click', async function() {
try {
responseOutput.textContent = "Sending request...";
const serverUrl = serverUrlInput.value;
const endpoint = '/v1/chat/completions';
const url = serverUrl + endpoint;
const requestBody = JSON.parse(requestBodyTextarea.value);
const response = await fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify(requestBody)
});
const data = await response.json();
responseOutput.textContent = JSON.stringify(data, null, 2);
} catch (error) {
responseOutput.textContent = "Error: " + error.message;
responseOutput.classList.add('error');
}
});
// Example requests
document.getElementById('example1').addEventListener('click', function() {
requestBodyTextarea.value = JSON.stringify({
model: modelInput.value,
messages: [
{
role: "user",
content: "Who was the 16th president of the United States?"
}
],
max_tokens: parseInt(maxTokensInput.value),
temperature: parseFloat(temperatureInput.value),
top_p: parseFloat(topPInput.value)
}, null, 2);
});
document.getElementById('example2').addEventListener('click', function() {
requestBodyTextarea.value = JSON.stringify({
model: modelInput.value,
messages: [
{
role: "system",
content: "You are a helpful assistant that provides concise answers."
},
{
role: "user",
content: "What is machine learning?"
},
{
role: "assistant",
content: "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed."
},
{
role: "user",
content: "Give me an example of a machine learning algorithm."
}
],
max_tokens: parseInt(maxTokensInput.value),
temperature: parseFloat(temperatureInput.value),
top_p: parseFloat(topPInput.value)
}, null, 2);
});
document.getElementById('example3').addEventListener('click', function() {
requestBodyTextarea.value = JSON.stringify({
model: modelInput.value,
messages: [
{
role: "user",
content: "Write a short poem about artificial intelligence."
}
],
max_tokens: parseInt(maxTokensInput.value),
temperature: 0.9, // Higher temperature for creative tasks
top_p: 0.9
}, null, 2);
temperatureInput.value = 0.9;
});
document.getElementById('example4').addEventListener('click', function() {
requestBodyTextarea.value = JSON.stringify({
model: modelInput.value,
messages: [
{
role: "user",
content: "Write a Python function to calculate the Fibonacci sequence up to n terms."
}
],
max_tokens: parseInt(maxTokensInput.value),
temperature: 0.3, // Lower temperature for code generation
top_p: 0.9
}, null, 2);
temperatureInput.value = 0.3;
});
});
</script>
</body>
</html>


@@ -1,176 +0,0 @@
// Test requests for the OpenAI-compatible endpoint in the inference server
// This file contains IIFE (Immediately Invoked Function Expression) JavaScript requests
// to test the /v1/chat/completions endpoint
// Basic chat completion request
(async function testBasicChatCompletion() {
console.log("Test 1: Basic chat completion request");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it",
messages: [
{
role: "user",
content: "Who was the 16th president of the United States?"
}
],
max_tokens: 100
})
});
const data = await response.json();
console.log("Response:", JSON.stringify(data, null, 2));
} catch (error) {
console.error("Error:", error);
}
})();
// Multi-turn conversation
(async function testMultiTurnConversation() {
console.log("\nTest 2: Multi-turn conversation");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it",
messages: [
{
role: "system",
content: "You are a helpful assistant that provides concise answers."
},
{
role: "user",
content: "What is machine learning?"
},
{
role: "assistant",
content: "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed."
},
{
role: "user",
content: "Give me an example of a machine learning algorithm."
}
],
max_tokens: 150
})
});
const data = await response.json();
console.log("Response:", JSON.stringify(data, null, 2));
} catch (error) {
console.error("Error:", error);
}
})();
// Request with temperature and top_p parameters
(async function testTemperatureAndTopP() {
console.log("\nTest 3: Request with temperature and top_p parameters");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it",
messages: [
{
role: "user",
content: "Write a short poem about artificial intelligence."
}
],
max_tokens: 200,
temperature: 0.8,
top_p: 0.9
})
});
const data = await response.json();
console.log("Response:", JSON.stringify(data, null, 2));
} catch (error) {
console.error("Error:", error);
}
})();
// Request with streaming enabled
(async function testStreaming() {
console.log("\nTest 4: Request with streaming enabled");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it",
messages: [
{
role: "user",
content: "Explain quantum computing in simple terms."
}
],
max_tokens: 150,
stream: true
})
});
// Note: Streaming might not be implemented yet, this is to test the API's handling of the parameter
if (response.headers.get('content-type')?.includes('text/event-stream')) {
console.log("Streaming response detected. Reading stream...");
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
console.log("Chunk:", chunk);
}
} else {
const data = await response.json();
console.log("Non-streaming response:", JSON.stringify(data, null, 2));
}
} catch (error) {
console.error("Error:", error);
}
})();
// Request with a different model
(async function testDifferentModel() {
console.log("\nTest 5: Request with a different model");
try {
const response = await fetch('http://localhost:3777/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
},
body: JSON.stringify({
model: "gemma-2-2b-it", // Using a different model if available
messages: [
{
role: "user",
content: "What are the benefits of renewable energy?"
}
],
max_tokens: 150
})
});
const data = await response.json();
console.log("Response:", JSON.stringify(data, null, 2));
} catch (error) {
console.error("Error:", error);
}
})();
console.log("\nAll test requests have been sent. Check the server logs for more details.");
console.log("To run the server, use: cargo run --bin inference-engine -- --server");


@@ -0,0 +1,8 @@
# predict-otron-9000
This is an extensible axum/tokio server that combines [embeddings-engine](../embeddings-engine), [inference-engine](../inference-engine), and [leptos-app](../leptos-app).
# Notes
- When `server_mode` is Standalone (default), the instance contains all components necessary for inference.
- When `server_mode` is HighAvailability, inference and embeddings scale automatically, and requests are proxied to the inference and embeddings services via DNS.
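
A hypothetical sketch of what the two modes imply for request routing; the type and names below are illustrative only, not the actual implementation:

```rust
// Hypothetical illustration, not the real predict-otron-9000 code: Standalone serves
// everything in-process, while HighAvailability proxies to separately scaled services
// addressed by cluster DNS names.
enum ServerMode {
    Standalone,
    HighAvailability,
}

fn inference_upstream(mode: &ServerMode) -> &'static str {
    match mode {
        // All components live in this instance, so requests stay local.
        ServerMode::Standalone => "http://127.0.0.1:8080",
        // Inference runs as its own scalable service, reached via DNS.
        ServerMode::HighAvailability => "http://inference-engine:8080",
    }
}
```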