mirror of https://github.com/geoffsee/predict-otron-9001.git synced 2025-09-08 22:46:44 +00:00

geoffsee 8338750beb Refactor apply_cached_repeat_penalty for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.

2025-08-27 16:15:01 -04:00


Testing Guide for Predict-otron-9000

This document provides comprehensive guidance on testing the Predict-otron-9000 system, including how to run existing tests and how to write new ones. The testing strategy covers different levels of testing from unit tests to performance evaluation.

Testing Overview
Unit Testing
Integration Testing
End-to-End Testing
Performance Testing
How to Run Existing Tests
Writing New Tests
Test Coverage

Testing Overview

Predict-otron-9000 follows a multi-layered testing approach to ensure the reliability and performance of its components:

Unit Tests: Test individual components in isolation
Integration Tests: Test interactions between components
End-to-End Tests: Test the complete system from user input to output
Performance Tests: Evaluate system performance under various conditions

Unit Testing

Unit tests focus on testing individual components in isolation. The project uses Rust's built-in testing framework with the #[test] attribute.

Inference Engine

The inference engine has dedicated unit tests in the tests directory:

text_generation_tests.rs: Tests for the text generation components
token_output_stream_tests.rs: Tests for token stream handling
model_tests.rs: Tests for model-related functionality

These tests focus on individual components like the Which enum, TokenOutputStream, and LogitsProcessor.

Embeddings Engine

The embeddings engine has unit tests embedded in the main source file:

Tests for the HTTP endpoints (test_root and test_embeddings_create)
Validation of response formats and embedding dimensions
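
As a rough illustration of what validating embedding dimensions can involve, the sketch below checks a vector's length and that every value is finite. The function name validate_embedding and its error strings are hypothetical and not taken from the project; the expected dimension is passed in rather than assumed.

```rust
/// Hypothetical helper (not part of the project): checks that an embedding
/// has the expected length and contains only finite values.
fn validate_embedding(embedding: &[f32], expected_dim: usize) -> Result<(), String> {
    if embedding.len() != expected_dim {
        return Err(format!(
            "expected {} dimensions, got {}",
            expected_dim,
            embedding.len()
        ));
    }
    // NaN or infinite values usually point to a numerical problem upstream.
    if let Some(pos) = embedding.iter().position(|v| !v.is_finite()) {
        return Err(format!("non-finite value at index {}", pos));
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_accepts_well_formed_embedding() {
        assert!(validate_embedding(&[0.1, -0.2, 0.3], 3).is_ok());
    }

    #[test]
    fn test_rejects_wrong_dimension() {
        assert!(validate_embedding(&[0.1, 0.2], 3).is_err());
    }

    #[test]
    fn test_rejects_nan() {
        assert!(validate_embedding(&[0.1, f32::NAN], 2).is_err());
    }
}
```

Returning a Result with a message, rather than a bare bool, lets a failing endpoint test report which property was violated.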

Running Unit Tests

To run unit tests for a specific crate:

# Run all tests for a specific crate
cd crates/inference-engine
cargo test

# Run a specific test
cargo test test_token_output_stream

# Run tests with output
cargo test -- --nocapture

Writing New Unit Tests

To add new unit tests:

For the inference engine, add test functions to the appropriate file in the tests directory
For the embeddings engine, add test functions to the tests module in main.rs

Example of a new unit test for the inference engine:

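// Hypothetical function under test; replace it with the item you are testing.
fn my_function(input: &str) -> String {
    input.to_uppercase()
}

#[test]
fn test_my_new_feature() {
    // Arrange: Set up the test data and the expected result
    let input = "Test input";
    let expected_output = "TEST INPUT";

    // Act: Call the function being tested
    let result = my_function(input);

    // Assert: Verify the results
    assert_eq!(result, expected_output);
}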

Integration Testing

Integration tests verify that different components work correctly together.

Current Integration Tests

The embeddings engine tests in main.rs double as integration tests because they exercise the HTTP API endpoints.

Writing New Integration Tests

To add new integration tests:

Create a new test file in the tests directory
Use the Axum testing utilities to simulate HTTP requests

Example of an integration test for the API:

use axum::body::{to_bytes, Body};
use axum::http::StatusCode;
use tower::ServiceExt; // provides `oneshot` (requires tower's "util" feature)

// Assumes axum 0.7 or later (where `to_bytes` takes a size limit) and a
// `create_app()` function that builds the application's Router.
#[tokio::test]
async fn test_chat_completions_endpoint() {
    // Arrange: Create a test app
    let app = create_app();

    // Create a test request
    let request_body = serde_json::json!({
        "model": "gemma-3-1b-it",
        "messages": [{"role": "user", "content": "Hello"}]
    });

    // Act: Send the request
    let response = app
        .oneshot(
            axum::http::Request::builder()
                .method(axum::http::Method::POST)
                .uri("/v1/chat/completions")
                .header("content-type", "application/json")
                .body(Body::from(request_body.to_string()))
                .unwrap(),
        )
        .await
        .unwrap();

    // Assert: Verify the response
    assert_eq!(response.status(), StatusCode::OK);

    // Verify response format
    let body = to_bytes(response.into_body(), usize::MAX).await.unwrap();
    let response_json: serde_json::Value = serde_json::from_slice(&body).unwrap();
    assert!(response_json.get("choices").is_some());
}

End-to-End Testing

End-to-end tests validate the entire system from client request to server response.

Manual End-to-End Testing

Start the server:

./run_server.sh

Use curl or other HTTP clients to test the endpoints:

# Test embeddings endpoint
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Hello, world!"}'

# Test chat completions endpoint
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-1b-it", "messages": [{"role": "user", "content": "Hello"}]}'

Automated End-to-End Testing

You can create automated end-to-end tests using shell scripts:

Create a new script in the project root:

#!/bin/bash
# e2e_test.sh

# Start the server in the background
./run_server.sh &
SERVER_PID=$!

# Stop the server on exit, including when a test fails
trap 'kill "$SERVER_PID" 2>/dev/null' EXIT

# Wait for server to start (increase this if model loading takes longer)
sleep 5

# Run tests
echo "Testing embeddings endpoint..."
curl -sS -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Test input"}' \
  -o /tmp/embeddings_response.json

# Validate response
if grep -q "embedding" /tmp/embeddings_response.json; then
  echo "Embeddings test passed"
else
  echo "Embeddings test failed"
  exit 1
fi

echo "All tests passed!"

Make the script executable and run it:

chmod +x e2e_test.sh
./e2e_test.sh
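
A fixed sleep before the first request can be too short on a slow machine and wastefully long on a fast one. If the end-to-end tests are written as a Rust binary instead of a shell script, one option is to poll the port until it accepts connections. The sketch below uses only the standard library; wait_for_port is a hypothetical helper name, and the retry intervals are arbitrary choices.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: polls `addr` until a TCP connection succeeds or
/// `timeout` elapses. Returns true if the port accepted a connection.
fn wait_for_port(addr: SocketAddr, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // A short per-attempt timeout keeps the loop responsive.
        if TcpStream::connect_timeout(&addr, Duration::from_millis(200)).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(Duration::from_millis(100));
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::net::TcpListener;

    #[test]
    fn test_detects_listening_port() {
        // Port 0 asks the OS for any free port, avoiding clashes with 8080.
        let listener = TcpListener::bind("127.0.0.1:0").unwrap();
        let addr = listener.local_addr().unwrap();
        assert!(wait_for_port(addr, Duration::from_secs(2)));
    }
}
```

An open port only shows that the process is listening. If the server loads models after binding, a test harness may still need to retry the first real request.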

Performance Testing

Performance testing evaluates the system's response time, throughput, and resource usage.

Existing Performance Tests

The project includes two performance testing scripts:

performance_test_embeddings.sh: Tests the embeddings engine with various input sizes
performance_test_inference.sh: Tests the inference engine with different prompt sizes

Running Performance Tests

Ensure the server is running, then execute the performance test scripts:

# Test embeddings performance
./performance_test_embeddings.sh

# Test inference performance
./performance_test_inference.sh

Creating New Performance Tests

To create new performance tests:

Use the existing scripts as templates
Modify the test parameters (iterations, input sizes, etc.)
Add specific metrics you want to measure

Example of a new performance test focusing on concurrent requests:

#!/bin/bash
# concurrent_performance_test.sh

SERVER_URL="http://localhost:8080"
CONCURRENT_REQUESTS=10
TEST_INPUT="This is a test input for concurrent performance testing."

echo "Testing with $CONCURRENT_REQUESTS concurrent requests..."

# Function to send a single request
send_request() {
    curl -s -X POST \
        -H "Content-Type: application/json" \
        -d "{\"model\": \"text-embedding-3-small\", \"input\": \"$TEST_INPUT\"}" \
        "$SERVER_URL/v1/embeddings" > /dev/null
    echo "Request completed"
}

# Start server if not running
# [server startup code here]

# Send concurrent requests
# Note: %N (nanoseconds) requires GNU date; BSD/macOS date does not support it
start_time=$(date +%s.%N)

for i in $(seq 1 $CONCURRENT_REQUESTS); do
    send_request &
done

# Wait for all requests to complete
wait

end_time=$(date +%s.%N)
elapsed=$(echo "$end_time - $start_time" | bc)

echo "All $CONCURRENT_REQUESTS requests completed in ${elapsed}s"
# Requests overlap, so elapsed / count reflects throughput, not the latency of a single request
echo "Elapsed time divided by request count: $(echo "$elapsed / $CONCURRENT_REQUESTS" | bc -l)s"
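
Dividing total elapsed time by the number of concurrent requests does not give the latency of any individual request. A sketch of per-request measurement in Rust follows, using only the standard library. The names LatencySummary, summarize, and run_concurrently are hypothetical, and the work closure stands in for an HTTP call, which would need a client crate this sketch does not assume.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Summary statistics over a set of request latencies.
#[derive(Debug, PartialEq)]
struct LatencySummary {
    mean: Duration,
    p95: Duration,
    max: Duration,
}

/// Computes mean, 95th percentile (nearest-rank method), and maximum.
/// Returns None for an empty sample.
fn summarize(latencies: &[Duration]) -> Option<LatencySummary> {
    if latencies.is_empty() {
        return None;
    }
    let mut sorted = latencies.to_vec();
    sorted.sort();
    let total: Duration = sorted.iter().sum();
    let mean = total / sorted.len() as u32;
    // Nearest-rank percentile: ceil(0.95 * n), then converted to a zero-based index.
    let rank = (sorted.len() * 95 + 99) / 100;
    let p95 = sorted[rank - 1];
    let max = *sorted.last().unwrap();
    Some(LatencySummary { mean, p95, max })
}

/// Runs `work` once on each of `n` threads and returns every call's latency.
fn run_concurrently<F>(n: usize, work: F) -> Vec<Duration>
where
    F: Fn() + Send + Clone + 'static,
{
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let work = work.clone();
            thread::spawn(move || {
                let start = Instant::now();
                work();
                start.elapsed()
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

A caller would pass a closure that performs one request, then feed the returned latencies to summarize. Reporting a high percentile next to the mean makes slow outliers visible that an average would hide.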

How to Run Existing Tests

Running All Tests

To run all tests in the project:

# From the project root
cargo test --workspace

Running Specific Tests

To run tests for a specific crate:

cargo test -p inference-engine
cargo test -p embeddings-engine

To run a specific test:

cargo test -p inference-engine test_token_output_stream

Running Tests with Output

To see the output of tests, including println! statements:

cargo test -- --nocapture
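
By default the test harness captures standard output from passing tests and shows it only for failing ones. The sketch below illustrates the kind of test where --nocapture makes a difference; count_tokens is a hypothetical function used only for this example.

```rust
/// Hypothetical function under test, used only to illustrate captured output.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_count_tokens_prints_debug_output() {
        let n = count_tokens("hello predict otron");
        // Hidden when the test passes, unless run with `cargo test -- --nocapture`.
        println!("token count = {}", n);
        assert_eq!(n, 3);
    }
}
```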

Running Performance Tests

# Make sure server is running
./run_server.sh &

# Run performance tests
./performance_test_embeddings.sh
./performance_test_inference.sh

Writing New Tests

Test Organization

Unit Tests: Place in the tests directory or in a tests module within the source file
Integration Tests: Create in the tests directory with a focus on component interactions
End-to-End Tests: Implement as shell scripts or separate Rust binaries
Performance Tests: Create shell scripts that measure specific performance metrics
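
The two placements for unit tests differ in what they can see. A tests module inside a source file can reach private items, while files in the tests directory are compiled as separate crates and see only the public API. A minimal sketch of the inline form follows; normalize_prompt is a hypothetical function, not one from the project.

```rust
/// Hypothetical function standing in for real crate code.
pub fn normalize_prompt(prompt: &str) -> String {
    // Collapse runs of whitespace into single spaces and trim both ends.
    prompt.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Inline unit tests: compiled only under `cargo test`, with access to
// private items of the parent module through `use super::*`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_normalize_prompt_collapses_whitespace() {
        assert_eq!(normalize_prompt("  hello   world \n"), "hello world");
    }
}

// A file under tests/ would instead import the function through the crate's
// public path and could not call private helpers.
```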

Test Naming Conventions

Use descriptive test names that indicate what is being tested
Prefix test functions with test_
For complex tests, use comments to explain the test purpose

Test Best Practices

Arrange-Act-Assert: Structure tests with clear setup, action, and verification phases
Independence: Tests should not depend on each other
Determinism: Tests should produce the same result every time
Focused Scope: Each test should verify a single behavior
Error Messages: Use descriptive assertions that explain the expected vs. actual results

Example of a well-structured test:

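// Hypothetical stand-ins so the example compiles; replace them with the real model type.
struct TestModel;

impl TestModel {
    fn embed(&self, _input: &str) -> Vec<f32> {
        vec![0.0; 768]
    }
}

fn create_test_model() -> TestModel {
    TestModel
}

#[test]
fn test_embedding_dimension_matches_specification() {
    // Arrange: Set up the test environment
    let model = create_test_model();
    let input = "Test input";

    // Act: Generate the embedding
    let embedding = model.embed(input);

    // Assert: Verify the dimension
    assert_eq!(
        embedding.len(),
        768,
        "Embedding dimension should be 768, but got {}",
        embedding.len()
    );
}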
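
The determinism guideline matters most for code that samples, such as text generation. One common approach is to pass the random seed in explicitly so a test can fix it. The sketch below uses a tiny hand-written generator to stay dependency-free; Lcg and sample_tokens are hypothetical names, and this generator is not what the project uses for sampling.

```rust
/// Minimal linear congruential generator, used here only so the sketch has
/// no external dependencies. It is not suitable for real sampling.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        // Multiplier and increment from Knuth's MMIX generator.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Hypothetical sampler: draws `count` token ids below `vocab_size`.
/// `vocab_size` must be non-zero.
fn sample_tokens(seed: u64, vocab_size: u32, count: usize) -> Vec<u32> {
    let mut rng = Lcg(seed);
    (0..count).map(|_| rng.next_u32() % vocab_size).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_same_seed_gives_same_tokens() {
        // With the seed fixed, two runs must agree exactly.
        assert_eq!(sample_tokens(42, 1000, 8), sample_tokens(42, 1000, 8));
    }

    #[test]
    fn test_tokens_stay_within_vocabulary() {
        assert!(sample_tokens(7, 50, 100).iter().all(|&t| t < 50));
    }
}
```

Tests written this way assert properties that hold for the fixed seed instead of relying on values from the system clock or a global random source.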

Test Coverage

The project currently has test coverage for:

Inference Engine: Basic unit tests for key components
Embeddings Engine: API endpoint tests
Performance: Scripts for benchmarking both engines

Areas that could benefit from additional testing:

Main Server Component: The predict-otron-9000 crate has limited test coverage
Error Handling: Tests for error conditions and edge cases
Concurrency: Testing behavior under concurrent load
Long-Running Tests: Stability tests for extended operation

To improve test coverage:

Use cargo tarpaulin or similar tools to measure code coverage
Identify uncovered code paths
Add tests for error conditions and edge cases
Implement integration tests for the main server component
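
Tests for error conditions are easier to write when failures are returned as typed errors rather than panics. The following is a sketch of that pattern under assumed names: RequestError and validate_input are hypothetical and do not come from the project's code.

```rust
use std::fmt;

/// Hypothetical error type for request validation.
#[derive(Debug, PartialEq)]
enum RequestError {
    EmptyInput,
    TooLong { len: usize, max: usize },
}

impl fmt::Display for RequestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RequestError::EmptyInput => write!(f, "input is empty"),
            RequestError::TooLong { len, max } => {
                write!(f, "input has {} characters, maximum is {}", len, max)
            }
        }
    }
}

/// Hypothetical validator: trims the input and enforces a character limit.
fn validate_input(input: &str, max_len: usize) -> Result<&str, RequestError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(RequestError::EmptyInput);
    }
    let len = trimmed.chars().count();
    if len > max_len {
        return Err(RequestError::TooLong { len, max: max_len });
    }
    Ok(trimmed)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_whitespace_only_input_is_rejected() {
        assert_eq!(validate_input("   ", 10), Err(RequestError::EmptyInput));
    }

    #[test]
    fn test_input_at_the_limit_is_accepted() {
        // Edge case: exactly max_len characters after trimming.
        assert_eq!(validate_input(" abcde ", 5), Ok("abcde"));
    }

    #[test]
    fn test_input_over_the_limit_reports_lengths() {
        assert_eq!(
            validate_input("abcdef", 5),
            Err(RequestError::TooLong { len: 6, max: 5 })
        );
    }
}
```

Deriving PartialEq on the error type lets tests compare whole Result values with assert_eq!, which covers both the failure variant and its payload.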

By following this testing guide, you can ensure that the Predict-otron-9000 system maintains its reliability, performance, and correctness as it evolves.
