mirror of
https://github.com/geoffsee/predict-otron-9001.git
synced 2025-09-08 22:46:44 +00:00
update docs
This commit is contained in:
4
crates/embeddings-engine/README.md
Normal file
@@ -0,0 +1,4 @@
# Embeddings Engine

A high-performance text embeddings service that generates vector representations of text using state-of-the-art models.
This crate wraps the fastembed crate to provide embeddings and partially adapts the OpenAI API specification.
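Since the crate only partially follows the OpenAI surface, a request sketch may help illustrate the intended usage. Everything below is an assumption for illustration: the `/v1/embeddings` path, the local port, and the model id are not confirmed by this README, and the snippet relies on `reqwest` (with its `blocking` and `json` features) plus `serde_json`.

```rust
// Illustrative only: endpoint path, port, and model id are assumptions.
// Requires reqwest (features = ["blocking", "json"]) and serde_json.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = serde_json::json!({
        "model": "nomic-embed-text",        // hypothetical model id
        "input": "The quick brown fox"
    });

    let response = reqwest::blocking::Client::new()
        .post("http://localhost:8080/v1/embeddings") // assumed path and port
        .json(&body)
        .send()?
        .text()?;

    // An OpenAI-style response carries the vector under data[0].embedding.
    println!("{response}");
    Ok(())
}
```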
18
crates/helm-chart-tool/Cargo.toml
Normal file
@@ -0,0 +1,18 @@
[package]
name = "helm-chart-tool"
version = "0.1.0"
edition = "2021"

[workspace]

[[bin]]
name = "helm-chart-tool"
path = "src/main.rs"

[dependencies]
toml = "0.8"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
anyhow = "1.0"
clap = { version = "4.0", features = ["derive"] }
walkdir = "2.0"
218
crates/helm-chart-tool/README.md
Normal file
@@ -0,0 +1,218 @@
# Helm Chart Tool

A Rust-based tool that automatically generates Helm charts from Cargo.toml metadata in Rust workspace projects.

## Overview

This tool scans a Rust workspace for crates containing Docker/Kubernetes metadata in their `Cargo.toml` files and generates a complete, production-ready Helm chart with deployments, services, ingress, and configuration templates.

## Features

- **Automatic Service Discovery**: Scans all `Cargo.toml` files in a workspace to find services with Kubernetes metadata
- **Complete Helm Chart Generation**: Creates Chart.yaml, values.yaml, deployment templates, service templates, an ingress template, and helper templates
- **Metadata Extraction**: Uses `[package.metadata.kube]` sections from Cargo.toml files to extract:
  - Docker image names
  - Service ports
  - Replica counts
  - Service names
- **Production Ready**: Generated charts include health checks, resource limits, node selectors, affinity rules, and tolerations
- **Helm Best Practices**: Follows Helm chart conventions and passes `helm lint` validation

## Installation

Build the tool from source:

```bash
cd helm-chart-tool
cargo build --release
```

The binary will be available at `target/release/helm-chart-tool`.

## Usage

### Basic Usage

```bash
./target/release/helm-chart-tool --workspace /path/to/rust/workspace --output ./my-helm-chart
```

### Command Line Options

- `--workspace, -w PATH`: Path to the workspace root (default: `.`)
- `--output, -o PATH`: Output directory for the Helm chart (default: `./helm-chart`)
- `--name, -n NAME`: Name of the Helm chart (default: `predict-otron-9000`)

### Example

```bash
# Generate chart from current workspace
./target/release/helm-chart-tool

# Generate chart from specific workspace with custom output
./target/release/helm-chart-tool -w /path/to/my/workspace -o ./charts/my-app -n my-application
```

## Cargo.toml Metadata Format

The tool expects crates to have Kubernetes metadata in their `Cargo.toml` files:

```toml
[package]
name = "my-service"
version = "0.1.0"

# Required: Kubernetes metadata
[package.metadata.kube]
image = "ghcr.io/myorg/my-service:latest"
replicas = 1
port = 8080

# Optional: Docker Compose metadata (currently parsed but not used)
[package.metadata.compose]
image = "ghcr.io/myorg/my-service:latest"
port = 8080
```

### Required Fields

- `image`: Full Docker image name including registry and tag
- `port`: Port number the service listens on
- `replicas`: Number of replicas to deploy (optional, defaults to 1)

## Generated Chart Structure

The tool generates a complete Helm chart with the following structure:

```
helm-chart/
├── Chart.yaml                        # Chart metadata
├── values.yaml                       # Default configuration values
├── .helmignore                       # Files to ignore when packaging
└── templates/
    ├── _helpers.tpl                  # Template helper functions
    ├── ingress.yaml                  # Ingress configuration (optional)
    ├── {service}-deployment.yaml     # Deployment for each service
    └── {service}-service.yaml        # Service for each service
```

### Generated Files

#### Chart.yaml
- Standard Helm chart metadata (`apiVersion: v2`)
- Includes keywords for AI/ML applications
- Maintainer information

#### values.yaml
- Individual service configurations
- Resource limits and requests
- Service types and ports
- Node selectors, affinity, and tolerations
- Global settings and ingress configuration

#### Deployment Templates
- Kubernetes Deployment manifests
- Health checks (liveness and readiness probes)
- Resource management
- Container port configuration from metadata
- Support for node selectors, affinity, and tolerations

#### Service Templates
- Kubernetes Service manifests
- ClusterIP services by default
- Port mapping from metadata

#### Ingress Template
- Optional ingress configuration
- Disabled by default
- Configurable through values.yaml

## Example Output

When run against the predict-otron-9000 workspace, the tool generates:

```bash
$ ./target/release/helm-chart-tool --workspace .. --output ../generated-helm-chart
Parsing workspace at: ..
Output directory: ../generated-helm-chart
Chart name: predict-otron-9000
Found 4 services:
 - leptos-app: ghcr.io/geoffsee/leptos-app:latest (port 8788)
 - inference-engine: ghcr.io/geoffsee/inference-service:latest (port 8080)
 - embeddings-engine: ghcr.io/geoffsee/embeddings-service:latest (port 8080)
 - predict-otron-9000: ghcr.io/geoffsee/predict-otron-9000:latest (port 8080)
Helm chart generated successfully!
```

## Validation

The generated charts pass Helm validation:

```bash
$ helm lint generated-helm-chart
==> Linting generated-helm-chart
[INFO] Chart.yaml: icon is recommended
1 chart(s) linted, 0 chart(s) failed
```

## Deployment

Deploy the generated chart:

```bash
# Install the chart
helm install my-release ./generated-helm-chart

# Upgrade the chart
helm upgrade my-release ./generated-helm-chart

# Uninstall the chart
helm uninstall my-release
```

### Customization

Customize the deployment by modifying `values.yaml`:

```yaml
# Enable ingress
ingress:
  enabled: true
  className: "nginx"
  hosts:
    - host: my-app.example.com

# Adjust resources for a specific service
predict_otron_9000:
  replicas: 3
  resources:
    limits:
      memory: "4Gi"
      cpu: "2000m"
    requests:
      memory: "2Gi"
      cpu: "1000m"
```

## Requirements

- Rust 2021+ (for building the tool)
- Helm 3.x (for deploying the generated charts)
- Kubernetes cluster (for deployment)

## Limitations

- Currently assumes every service exposes a health check at the `/health` endpoint
- Resource limits are hardcoded defaults (can be overridden in values.yaml)
- Ingress configuration is basic (can be customized through values.yaml)

## Contributing

1. Add new features to the tool
2. Test with various Cargo.toml metadata configurations
3. Validate generated charts with `helm lint`
4. Ensure charts deploy successfully to test clusters

## License

This tool is part of the predict-otron-9000 project and follows the same license terms.
515
crates/helm-chart-tool/src/main.rs
Normal file
@@ -0,0 +1,515 @@
use anyhow::{Context, Result};
use clap::{Arg, Command};
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs;
use std::path::{Path, PathBuf};
use walkdir::WalkDir;

#[derive(Debug, Deserialize)]
struct CargoToml {
    package: Option<Package>,
}

#[derive(Debug, Deserialize)]
struct Package {
    name: String,
    metadata: Option<Metadata>,
}

#[derive(Debug, Deserialize)]
struct Metadata {
    kube: Option<KubeMetadata>,
    compose: Option<ComposeMetadata>,
}

#[derive(Debug, Deserialize)]
struct KubeMetadata {
    image: String,
    replicas: Option<u32>,
    port: u16,
}

#[derive(Debug, Deserialize)]
struct ComposeMetadata {
    image: Option<String>,
    port: Option<u16>,
}

#[derive(Debug, Clone)]
struct ServiceInfo {
    name: String,
    image: String,
    port: u16,
    replicas: u32,
}

fn main() -> Result<()> {
    let matches = Command::new("helm-chart-tool")
        .about("Generate Helm charts from Cargo.toml metadata")
        .arg(
            Arg::new("workspace")
                .short('w')
                .long("workspace")
                .value_name("PATH")
                .help("Path to the workspace root")
                .default_value("."),
        )
        .arg(
            Arg::new("output")
                .short('o')
                .long("output")
                .value_name("PATH")
                .help("Output directory for the Helm chart")
                .default_value("./helm-chart"),
        )
        .arg(
            Arg::new("chart-name")
                .short('n')
                .long("name")
                .value_name("NAME")
                .help("Name of the Helm chart")
                .default_value("predict-otron-9000"),
        )
        .get_matches();

    let workspace_path = matches.get_one::<String>("workspace").unwrap();
    let output_path = matches.get_one::<String>("output").unwrap();
    let chart_name = matches.get_one::<String>("chart-name").unwrap();

    println!("Parsing workspace at: {}", workspace_path);
    println!("Output directory: {}", output_path);
    println!("Chart name: {}", chart_name);

    let services = discover_services(workspace_path)?;
    println!("Found {} services:", services.len());
    for service in &services {
        println!(" - {}: {} (port {})", service.name, service.image, service.port);
    }

    generate_helm_chart(output_path, chart_name, &services)?;
    println!("Helm chart generated successfully!");

    Ok(())
}

fn discover_services(workspace_path: &str) -> Result<Vec<ServiceInfo>> {
    let workspace_root = Path::new(workspace_path);
    let mut services = Vec::new();

    // Find all Cargo.toml files in the workspace
    for entry in WalkDir::new(workspace_root)
        .into_iter()
        .filter_map(|e| e.ok())
    {
        if entry.file_name() == "Cargo.toml" && entry.path() != workspace_root.join("Cargo.toml") {
            if let Ok(service_info) = parse_cargo_toml(entry.path()) {
                services.push(service_info);
            }
        }
    }

    Ok(services)
}

fn parse_cargo_toml(path: &Path) -> Result<ServiceInfo> {
    let content = fs::read_to_string(path)
        .with_context(|| format!("Failed to read Cargo.toml at {:?}", path))?;

    let cargo_toml: CargoToml = toml::from_str(&content)
        .with_context(|| format!("Failed to parse Cargo.toml at {:?}", path))?;

    let package = cargo_toml.package
        .ok_or_else(|| anyhow::anyhow!("No package section found in {:?}", path))?;

    let metadata = package.metadata
        .ok_or_else(|| anyhow::anyhow!("No metadata section found in {:?}", path))?;

    let kube_metadata = metadata.kube
        .ok_or_else(|| anyhow::anyhow!("No kube metadata found in {:?}", path))?;

    Ok(ServiceInfo {
        name: package.name,
        image: kube_metadata.image,
        port: kube_metadata.port,
        replicas: kube_metadata.replicas.unwrap_or(1),
    })
}

fn generate_helm_chart(output_path: &str, chart_name: &str, services: &[ServiceInfo]) -> Result<()> {
    let chart_dir = Path::new(output_path);
    let templates_dir = chart_dir.join("templates");

    // Create directories
    fs::create_dir_all(&templates_dir)?;

    // Generate Chart.yaml
    generate_chart_yaml(chart_dir, chart_name)?;

    // Generate values.yaml
    generate_values_yaml(chart_dir, services)?;

    // Generate templates for each service
    for service in services {
        generate_deployment_template(&templates_dir, service)?;
        generate_service_template(&templates_dir, service)?;
    }

    // Generate ingress template
    generate_ingress_template(&templates_dir, services)?;

    // Generate helper templates
    generate_helpers_template(&templates_dir)?;

    // Generate .helmignore
    generate_helmignore(chart_dir)?;

    Ok(())
}

fn generate_chart_yaml(chart_dir: &Path, chart_name: &str) -> Result<()> {
    let chart_yaml = format!(
        r#"apiVersion: v2
name: {}
description: A Helm chart for the predict-otron-9000 AI platform
type: application
version: 0.1.0
appVersion: "0.1.0"
keywords:
  - ai
  - llm
  - inference
  - embeddings
  - chat
maintainers:
  - name: predict-otron-9000-team
"#,
        chart_name
    );

    fs::write(chart_dir.join("Chart.yaml"), chart_yaml)?;
    Ok(())
}

fn generate_values_yaml(chart_dir: &Path, services: &[ServiceInfo]) -> Result<()> {
    let mut values = String::from(
        r#"# Default values for predict-otron-9000
# This is a YAML-formatted file.

global:
  imagePullPolicy: IfNotPresent
  serviceType: ClusterIP

# Ingress configuration
ingress:
  enabled: false
  className: ""
  annotations: {}
  hosts:
    - host: predict-otron-9000.local
      paths:
        - path: /
          pathType: Prefix
          backend:
            service:
              name: predict-otron-9000
              port:
                number: 8080
  tls: []

"#,
    );

    for service in services {
        let service_config = format!(
            r#"{}:
  image:
    repository: {}
    tag: "latest"
    pullPolicy: IfNotPresent
  replicas: {}
  service:
    type: ClusterIP
    port: {}
  resources:
    limits:
      memory: "1Gi"
      cpu: "1000m"
    requests:
      memory: "512Mi"
      cpu: "250m"
  nodeSelector: {{}}
  tolerations: []
  affinity: {{}}

"#,
            service.name.replace("-", "_"),
            service.image.split(':').next().unwrap_or(&service.image),
            service.replicas,
            service.port
        );
        values.push_str(&service_config);
    }

    fs::write(chart_dir.join("values.yaml"), values)?;
    Ok(())
}

fn generate_deployment_template(templates_dir: &Path, service: &ServiceInfo) -> Result<()> {
    let service_name_underscore = service.name.replace("-", "_");
    let deployment_template = format!(
        r#"apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{{{ include "predict-otron-9000.fullname" . }}}}-{}
  labels:
    {{{{- include "predict-otron-9000.labels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
spec:
  replicas: {{{{ .Values.{}.replicas }}}}
  selector:
    matchLabels:
      {{{{- include "predict-otron-9000.selectorLabels" . | nindent 6 }}}}
      app.kubernetes.io/component: {}
  template:
    metadata:
      labels:
        {{{{- include "predict-otron-9000.selectorLabels" . | nindent 8 }}}}
        app.kubernetes.io/component: {}
    spec:
      containers:
        - name: {}
          image: "{{{{ .Values.{}.image.repository }}}}:{{{{ .Values.{}.image.tag }}}}"
          imagePullPolicy: {{{{ .Values.{}.image.pullPolicy }}}}
          ports:
            - name: http
              containerPort: {}
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
          resources:
            {{{{- toYaml .Values.{}.resources | nindent 12 }}}}
      {{{{- with .Values.{}.nodeSelector }}}}
      nodeSelector:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
      {{{{- with .Values.{}.affinity }}}}
      affinity:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
      {{{{- with .Values.{}.tolerations }}}}
      tolerations:
        {{{{- toYaml . | nindent 8 }}}}
      {{{{- end }}}}
"#,
        service.name,
        service.name,
        service_name_underscore,
        service.name,
        service.name,
        service.name,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore,
        service.port,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore,
        service_name_underscore
    );

    let filename = format!("{}-deployment.yaml", service.name);
    fs::write(templates_dir.join(filename), deployment_template)?;
    Ok(())
}

fn generate_service_template(templates_dir: &Path, service: &ServiceInfo) -> Result<()> {
    let service_template = format!(
        r#"apiVersion: v1
kind: Service
metadata:
  name: {{{{ include "predict-otron-9000.fullname" . }}}}-{}
  labels:
    {{{{- include "predict-otron-9000.labels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
spec:
  type: {{{{ .Values.{}.service.type }}}}
  ports:
    - port: {{{{ .Values.{}.service.port }}}}
      targetPort: http
      protocol: TCP
      name: http
  selector:
    {{{{- include "predict-otron-9000.selectorLabels" . | nindent 4 }}}}
    app.kubernetes.io/component: {}
"#,
        service.name,
        service.name,
        service.name.replace("-", "_"),
        service.name.replace("-", "_"),
        service.name
    );

    let filename = format!("{}-service.yaml", service.name);
    fs::write(templates_dir.join(filename), service_template)?;
    Ok(())
}

fn generate_ingress_template(templates_dir: &Path, services: &[ServiceInfo]) -> Result<()> {
    let ingress_template = r#"{{- if .Values.ingress.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "predict-otron-9000.fullname" . }}
  labels:
    {{- include "predict-otron-9000.labels" . | nindent 4 }}
  {{- with .Values.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
  {{- if .Values.ingress.className }}
  ingressClassName: {{ .Values.ingress.className }}
  {{- end }}
  {{- if .Values.ingress.tls }}
  tls:
    {{- range .Values.ingress.tls }}
    - hosts:
        {{- range .hosts }}
        - {{ . | quote }}
        {{- end }}
      secretName: {{ .secretName }}
    {{- end }}
  {{- end }}
  rules:
    {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
          {{- range .paths }}
          - path: {{ .path }}
            {{- if .pathType }}
            pathType: {{ .pathType }}
            {{- end }}
            backend:
              service:
                name: {{ include "predict-otron-9000.fullname" $ }}-{{ .backend.service.name }}
                port:
                  number: {{ .backend.service.port.number }}
          {{- end }}
    {{- end }}
{{- end }}
"#;

    fs::write(templates_dir.join("ingress.yaml"), ingress_template)?;
    Ok(())
}

fn generate_helpers_template(templates_dir: &Path) -> Result<()> {
    let helpers_template = r#"{{/*
Expand the name of the chart.
*/}}
{{- define "predict-otron-9000.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "predict-otron-9000.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "predict-otron-9000.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "predict-otron-9000.labels" -}}
helm.sh/chart: {{ include "predict-otron-9000.chart" . }}
{{ include "predict-otron-9000.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
*/}}
{{- define "predict-otron-9000.selectorLabels" -}}
app.kubernetes.io/name: {{ include "predict-otron-9000.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Create the name of the service account to use
*/}}
{{- define "predict-otron-9000.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "predict-otron-9000.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
"#;

    fs::write(templates_dir.join("_helpers.tpl"), helpers_template)?;
    Ok(())
}

fn generate_helmignore(chart_dir: &Path) -> Result<()> {
    let helmignore_content = r#"# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
"#;

    fs::write(chart_dir.join(".helmignore"), helmignore_content)?;
    Ok(())
}
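For reference, a rough sketch of how the `parse_cargo_toml` function above could be exercised by a unit test; this `#[cfg(test)]` module is an illustrative addition, not part of the commit, and the temp-directory setup is an assumption.

```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_kube_metadata() {
        // Write a minimal Cargo.toml with the [package.metadata.kube] section
        // the tool expects, then parse it back into a ServiceInfo.
        let dir = std::env::temp_dir().join("helm-chart-tool-test");
        std::fs::create_dir_all(&dir).unwrap();
        let manifest = dir.join("Cargo.toml");
        std::fs::write(
            &manifest,
            r#"
[package]
name = "my-service"
version = "0.1.0"

[package.metadata.kube]
image = "ghcr.io/myorg/my-service:latest"
replicas = 2
port = 8080
"#,
        )
        .unwrap();

        let info = parse_cargo_toml(&manifest).unwrap();
        assert_eq!(info.name, "my-service");
        assert_eq!(info.port, 8080);
        assert_eq!(info.replicas, 2);
    }
}
```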
@@ -1,295 +0,0 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="en">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<title>OpenAI-Compatible API Tester</title>
|
||||
<style>
|
||||
body {
|
||||
font-family: Arial, sans-serif;
|
||||
max-width: 800px;
|
||||
margin: 0 auto;
|
||||
padding: 20px;
|
||||
line-height: 1.6;
|
||||
}
|
||||
h1, h2 {
|
||||
color: #333;
|
||||
}
|
||||
.container {
|
||||
margin-bottom: 20px;
|
||||
}
|
||||
textarea {
|
||||
width: 100%;
|
||||
height: 150px;
|
||||
padding: 10px;
|
||||
margin-bottom: 10px;
|
||||
border: 1px solid #ddd;
|
||||
border-radius: 4px;
|
||||
font-family: monospace;
|
||||
}
|
||||
button {
|
||||
background-color: #4CAF50;
|
||||
color: white;
|
||||
padding: 10px 15px;
|
||||
border: none;
|
||||
border-radius: 4px;
|
||||
cursor: pointer;
|
||||
font-size: 16px;
|
||||
}
|
||||
button:hover {
|
||||
background-color: #45a049;
|
||||
}
|
||||
pre {
|
||||
background-color: #f5f5f5;
|
||||
padding: 15px;
|
||||
border-radius: 4px;
|
||||
overflow-x: auto;
|
||||
white-space: pre-wrap;
|
||||
}
|
||||
.response {
|
||||
margin-top: 20px;
|
||||
}
|
||||
.error {
|
||||
color: red;
|
||||
}
|
||||
.settings {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 10px;
|
||||
margin-bottom: 15px;
|
||||
}
|
||||
.settings div {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
}
|
||||
label {
|
||||
margin-bottom: 5px;
|
||||
font-weight: bold;
|
||||
}
|
||||
input {
|
||||
padding: 8px;
|
||||
border: 1px solid #ddd;
|
||||
border-radius: 4px;
|
||||
}
|
||||
.examples {
|
||||
margin-top: 30px;
|
||||
}
|
||||
.example-btn {
|
||||
background-color: #2196F3;
|
||||
margin-right: 10px;
|
||||
margin-bottom: 10px;
|
||||
}
|
||||
.example-btn:hover {
|
||||
background-color: #0b7dda;
|
||||
}
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>OpenAI-Compatible API Tester</h1>
|
||||
<p>Use this page to test the OpenAI-compatible chat completions endpoint of the local inference engine.</p>
|
||||
|
||||
<div class="container">
|
||||
<h2>Request Settings</h2>
|
||||
<div class="settings">
|
||||
<div>
|
||||
<label for="serverUrl">Server URL:</label>
|
||||
<input type="text" id="serverUrl" value="http://localhost:3777" />
|
||||
</div>
|
||||
<div>
|
||||
<label for="model">Model:</label>
|
||||
<input type="text" id="model" value="gemma-3-1b-it" />
|
||||
</div>
|
||||
<div>
|
||||
<label for="maxTokens">Max Tokens:</label>
|
||||
<input type="number" id="maxTokens" value="150" />
|
||||
</div>
|
||||
<div>
|
||||
<label for="temperature">Temperature:</label>
|
||||
<input type="number" id="temperature" value="0.7" step="0.1" min="0" max="2" />
|
||||
</div>
|
||||
<div>
|
||||
<label for="topP">Top P:</label>
|
||||
<input type="number" id="topP" value="0.9" step="0.1" min="0" max="1" />
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<h2>Request Body</h2>
|
||||
<textarea id="requestBody">{
|
||||
"model": "gemma-3-1b-it",
|
||||
"messages": [
|
||||
{
|
||||
"role": "user",
|
||||
"content": "Hello, how are you today?"
|
||||
}
|
||||
],
|
||||
"max_tokens": 150,
|
||||
"temperature": 0.7,
|
||||
"top_p": 0.9
|
||||
}</textarea>
|
||||
<button id="sendRequest">Send Request</button>
|
||||
|
||||
<div class="examples">
|
||||
<h3>Example Requests</h3>
|
||||
<button class="example-btn" id="example1">Basic Question</button>
|
||||
<button class="example-btn" id="example2">Multi-turn Conversation</button>
|
||||
<button class="example-btn" id="example3">Creative Writing</button>
|
||||
<button class="example-btn" id="example4">Code Generation</button>
|
||||
</div>
|
||||
|
||||
<div class="response">
|
||||
<h2>Response</h2>
|
||||
<pre id="responseOutput">Response will appear here...</pre>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<script>
|
||||
document.addEventListener('DOMContentLoaded', function() {
|
||||
// Update request body when settings change
|
||||
const serverUrlInput = document.getElementById('serverUrl');
|
||||
const modelInput = document.getElementById('model');
|
||||
const maxTokensInput = document.getElementById('maxTokens');
|
||||
const temperatureInput = document.getElementById('temperature');
|
||||
const topPInput = document.getElementById('topP');
|
||||
const requestBodyTextarea = document.getElementById('requestBody');
|
||||
const responseOutput = document.getElementById('responseOutput');
|
||||
|
||||
// Function to update request body from settings
|
||||
function updateRequestBodyFromSettings() {
|
||||
try {
|
||||
const requestBody = JSON.parse(requestBodyTextarea.value);
|
||||
requestBody.model = modelInput.value;
|
||||
requestBody.max_tokens = parseInt(maxTokensInput.value);
|
||||
requestBody.temperature = parseFloat(temperatureInput.value);
|
||||
requestBody.top_p = parseFloat(topPInput.value);
|
||||
requestBodyTextarea.value = JSON.stringify(requestBody, null, 2);
|
||||
} catch (error) {
|
||||
console.error("Error updating request body:", error);
|
||||
}
|
||||
}
|
||||
|
||||
// Update settings when request body changes
|
||||
function updateSettingsFromRequestBody() {
|
||||
try {
|
||||
const requestBody = JSON.parse(requestBodyTextarea.value);
|
||||
if (requestBody.model) modelInput.value = requestBody.model;
|
||||
if (requestBody.max_tokens) maxTokensInput.value = requestBody.max_tokens;
|
||||
if (requestBody.temperature) temperatureInput.value = requestBody.temperature;
|
||||
if (requestBody.top_p) topPInput.value = requestBody.top_p;
|
||||
} catch (error) {
|
||||
console.error("Error updating settings:", error);
|
||||
}
|
||||
}
|
||||
|
||||
// Add event listeners for settings changes
|
||||
modelInput.addEventListener('change', updateRequestBodyFromSettings);
|
||||
maxTokensInput.addEventListener('change', updateRequestBodyFromSettings);
|
||||
temperatureInput.addEventListener('change', updateRequestBodyFromSettings);
|
||||
topPInput.addEventListener('change', updateRequestBodyFromSettings);
|
||||
|
||||
// Add event listener for request body changes
|
||||
requestBodyTextarea.addEventListener('blur', updateSettingsFromRequestBody);
|
||||
|
||||
// Send request button
|
||||
document.getElementById('sendRequest').addEventListener('click', async function() {
|
||||
try {
|
||||
responseOutput.textContent = "Sending request...";
|
||||
const serverUrl = serverUrlInput.value;
|
||||
const endpoint = '/v1/chat/completions';
|
||||
const url = serverUrl + endpoint;
|
||||
|
||||
const requestBody = JSON.parse(requestBodyTextarea.value);
|
||||
|
||||
const response = await fetch(url, {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify(requestBody)
|
||||
});
|
||||
|
||||
const data = await response.json();
|
||||
responseOutput.textContent = JSON.stringify(data, null, 2);
|
||||
} catch (error) {
|
||||
responseOutput.textContent = "Error: " + error.message;
|
||||
responseOutput.classList.add('error');
|
||||
}
|
||||
});
|
||||
|
||||
// Example requests
|
||||
document.getElementById('example1').addEventListener('click', function() {
|
||||
requestBodyTextarea.value = JSON.stringify({
|
||||
model: modelInput.value,
|
||||
messages: [
|
||||
{
|
||||
role: "user",
|
||||
content: "Who was the 16th president of the United States?"
|
||||
}
|
||||
],
|
||||
max_tokens: parseInt(maxTokensInput.value),
|
||||
temperature: parseFloat(temperatureInput.value),
|
||||
top_p: parseFloat(topPInput.value)
|
||||
}, null, 2);
|
||||
});
|
||||
|
||||
document.getElementById('example2').addEventListener('click', function() {
|
||||
requestBodyTextarea.value = JSON.stringify({
|
||||
model: modelInput.value,
|
||||
messages: [
|
||||
{
|
||||
role: "system",
|
||||
content: "You are a helpful assistant that provides concise answers."
|
||||
},
|
||||
{
|
||||
role: "user",
|
||||
content: "What is machine learning?"
|
||||
},
|
||||
{
|
||||
role: "assistant",
|
||||
content: "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed."
|
||||
},
|
||||
{
|
||||
role: "user",
|
||||
content: "Give me an example of a machine learning algorithm."
|
||||
}
|
||||
],
|
||||
max_tokens: parseInt(maxTokensInput.value),
|
||||
temperature: parseFloat(temperatureInput.value),
|
||||
top_p: parseFloat(topPInput.value)
|
||||
}, null, 2);
|
||||
});
|
||||
|
||||
document.getElementById('example3').addEventListener('click', function() {
|
||||
requestBodyTextarea.value = JSON.stringify({
|
||||
model: modelInput.value,
|
||||
messages: [
|
||||
{
|
||||
role: "user",
|
||||
content: "Write a short poem about artificial intelligence."
|
||||
}
|
||||
],
|
||||
max_tokens: parseInt(maxTokensInput.value),
|
||||
temperature: 0.9, // Higher temperature for creative tasks
|
||||
top_p: 0.9
|
||||
}, null, 2);
|
||||
temperatureInput.value = 0.9;
|
||||
});
|
||||
|
||||
document.getElementById('example4').addEventListener('click', function() {
|
||||
requestBodyTextarea.value = JSON.stringify({
|
||||
model: modelInput.value,
|
||||
messages: [
|
||||
{
|
||||
role: "user",
|
||||
content: "Write a Python function to calculate the Fibonacci sequence up to n terms."
|
||||
}
|
||||
],
|
||||
max_tokens: parseInt(maxTokensInput.value),
|
||||
temperature: 0.3, // Lower temperature for code generation
|
||||
top_p: 0.9
|
||||
}, null, 2);
|
||||
temperatureInput.value = 0.3;
|
||||
});
|
||||
});
|
||||
</script>
|
||||
</body>
|
||||
</html>
|
@@ -1,176 +0,0 @@
|
||||
// Test requests for the OpenAI-compatible endpoint in the inference server
|
||||
// This file contains IIFE (Immediately Invoked Function Expression) JavaScript requests
|
||||
// to test the /v1/chat/completions endpoint
|
||||
|
||||
// Basic chat completion request
|
||||
(async function testBasicChatCompletion() {
|
||||
console.log("Test 1: Basic chat completion request");
|
||||
try {
|
||||
const response = await fetch('http://localhost:3777/v1/chat/completions', {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify({
|
||||
model: "gemma-2-2b-it",
|
||||
messages: [
|
||||
{
|
||||
role: "user",
|
||||
content: "Who was the 16th president of the United States?"
|
||||
}
|
||||
],
|
||||
max_tokens: 100
|
||||
})
|
||||
});
|
||||
|
||||
const data = await response.json();
|
||||
console.log("Response:", JSON.stringify(data, null, 2));
|
||||
} catch (error) {
|
||||
console.error("Error:", error);
|
||||
}
|
||||
})();
|
||||
|
||||
// Multi-turn conversation
|
||||
(async function testMultiTurnConversation() {
|
||||
console.log("\nTest 2: Multi-turn conversation");
|
||||
try {
|
||||
const response = await fetch('http://localhost:3777/v1/chat/completions', {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify({
|
||||
model: "gemma-2-2b-it",
|
||||
messages: [
|
||||
{
|
||||
role: "system",
|
||||
content: "You are a helpful assistant that provides concise answers."
|
||||
},
|
||||
{
|
||||
role: "user",
|
||||
content: "What is machine learning?"
|
||||
},
|
||||
{
|
||||
role: "assistant",
|
||||
content: "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed."
|
||||
},
|
||||
{
|
||||
role: "user",
|
||||
content: "Give me an example of a machine learning algorithm."
|
||||
}
|
||||
],
|
||||
max_tokens: 150
|
||||
})
|
||||
});
|
||||
|
||||
const data = await response.json();
|
||||
console.log("Response:", JSON.stringify(data, null, 2));
|
||||
} catch (error) {
|
||||
console.error("Error:", error);
|
||||
}
|
||||
})();
|
||||
|
||||
// Request with temperature and top_p parameters
|
||||
(async function testTemperatureAndTopP() {
|
||||
console.log("\nTest 3: Request with temperature and top_p parameters");
|
||||
try {
|
||||
const response = await fetch('http://localhost:3777/v1/chat/completions', {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify({
|
||||
model: "gemma-2-2b-it",
|
||||
messages: [
|
||||
{
|
||||
role: "user",
|
||||
content: "Write a short poem about artificial intelligence."
|
||||
}
|
||||
],
|
||||
max_tokens: 200,
|
||||
temperature: 0.8,
|
||||
top_p: 0.9
|
||||
})
|
||||
});
|
||||
|
||||
const data = await response.json();
|
||||
console.log("Response:", JSON.stringify(data, null, 2));
|
||||
} catch (error) {
|
||||
console.error("Error:", error);
|
||||
}
|
||||
})();
|
||||
|
||||
// Request with streaming enabled
|
||||
(async function testStreaming() {
|
||||
console.log("\nTest 4: Request with streaming enabled");
|
||||
try {
|
||||
const response = await fetch('http://localhost:3777/v1/chat/completions', {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify({
|
||||
model: "gemma-2-2b-it",
|
||||
messages: [
|
||||
{
|
||||
role: "user",
|
||||
content: "Explain quantum computing in simple terms."
|
||||
}
|
||||
],
|
||||
max_tokens: 150,
|
||||
stream: true
|
||||
})
|
||||
});
|
||||
|
||||
// Note: Streaming might not be implemented yet, this is to test the API's handling of the parameter
|
||||
if (response.headers.get('content-type')?.includes('text/event-stream')) {
|
||||
console.log("Streaming response detected. Reading stream...");
|
||||
const reader = response.body.getReader();
|
||||
const decoder = new TextDecoder();
|
||||
|
||||
while (true) {
|
||||
const { done, value } = await reader.read();
|
||||
if (done) break;
|
||||
|
||||
const chunk = decoder.decode(value);
|
||||
console.log("Chunk:", chunk);
|
||||
}
|
||||
} else {
|
||||
const data = await response.json();
|
||||
console.log("Non-streaming response:", JSON.stringify(data, null, 2));
|
||||
}
|
||||
} catch (error) {
|
||||
console.error("Error:", error);
|
||||
}
|
||||
})();
|
||||
|
||||
// Request with a different model
|
||||
(async function testDifferentModel() {
|
||||
console.log("\nTest 5: Request with a different model");
|
||||
try {
|
||||
const response = await fetch('http://localhost:3777/v1/chat/completions', {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
},
|
||||
body: JSON.stringify({
|
||||
model: "gemma-2-2b-it", // Using a different model if available
|
||||
messages: [
|
||||
{
|
||||
role: "user",
|
||||
content: "What are the benefits of renewable energy?"
|
||||
}
|
||||
],
|
||||
max_tokens: 150
|
||||
})
|
||||
});
|
||||
|
||||
const data = await response.json();
|
||||
console.log("Response:", JSON.stringify(data, null, 2));
|
||||
} catch (error) {
|
||||
console.error("Error:", error);
|
||||
}
|
||||
})();
|
||||
|
||||
console.log("\nAll test requests have been sent. Check the server logs for more details.");
|
||||
console.log("To run the server, use: cargo run --bin inference-engine -- --server");
|
8
crates/predict-otron-9000/README.md
Normal file
@@ -0,0 +1,8 @@
# predict-otron-9000

This is an extensible axum/tokio hybrid server combining [embeddings-engine](../embeddings-engine), [inference-engine](../inference-engine), and [leptos-app](../leptos-app).

## Notes

- When `server_mode` is Standalone (default), the instance contains all components necessary for inference.
- When `server_mode` is HighAvailability, inference and embeddings scale automatically, and requests are proxied to the inference and embeddings services via DNS.
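A minimal sketch of how the two modes might be wired as axum routers. The `ServerMode` enum, the `SERVER_MODE` environment variable, and the `*_router()` helpers below are hypothetical names used for illustration only; they are not the crate's actual API.

```rust
use axum::Router;

// Hypothetical mode switch; names are illustrative only.
#[derive(Debug, Clone, Copy)]
enum ServerMode {
    Standalone,
    HighAvailability,
}

impl ServerMode {
    // Assumption: the mode is selected via an environment variable.
    fn from_env() -> Self {
        match std::env::var("SERVER_MODE").as_deref() {
            Ok("HighAvailability") => ServerMode::HighAvailability,
            _ => ServerMode::Standalone, // default
        }
    }
}

fn build_router(mode: ServerMode) -> Router {
    match mode {
        // Standalone: mount the embeddings and inference routers in-process.
        ServerMode::Standalone => Router::new()
            .merge(embeddings_router())
            .merge(inference_router()),
        // HighAvailability: forward requests to separately scaled services
        // discovered via DNS (proxy handlers omitted in this sketch).
        ServerMode::HighAvailability => Router::new(),
    }
}

// Placeholders standing in for the real crate routers.
fn embeddings_router() -> Router { Router::new() }
fn inference_router() -> Router { Router::new() }
```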