# Testing Guide for Predict-otron-9000
This document provides comprehensive guidance on testing the Predict-otron-9000 system, including how to run existing tests and how to write new ones. The testing strategy covers multiple levels, from unit tests to performance evaluation.
## Table of Contents
- [Testing Overview](#testing-overview)
- [Unit Testing](#unit-testing)
- [Integration Testing](#integration-testing)
- [End-to-End Testing](#end-to-end-testing)
- [Performance Testing](#performance-testing)
- [How to Run Existing Tests](#how-to-run-existing-tests)
- [Writing New Tests](#writing-new-tests)
- [Test Coverage](#test-coverage)
## Testing Overview
Predict-otron-9000 follows a multi-layered testing approach to ensure the reliability and performance of its components:
1. **Unit Tests**: Test individual components in isolation
2. **Integration Tests**: Test interactions between components
3. **End-to-End Tests**: Test the complete system from user input to output
4. **Performance Tests**: Evaluate system performance under various conditions
## Unit Testing
Unit tests focus on testing individual components in isolation. The project uses Rust's built-in testing framework with the `#[test]` attribute.
### Inference Engine
The inference engine has dedicated unit tests in the `tests` directory:
- `text_generation_tests.rs`: Tests for the text generation components
- `token_output_stream_tests.rs`: Tests for token stream handling
- `model_tests.rs`: Tests for model-related functionality
These tests focus on individual components like the `Which` enum, `TokenOutputStream`, and `LogitsProcessor`.
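Since each of these files is a separate test target under `tests/`, it can also be run on its own with Cargo's `--test` flag (target names follow the file names listed above):
```bash
cd crates/inference-engine

# Run only the token stream tests
cargo test --test token_output_stream_tests

# Run only the model tests, with test output shown
cargo test --test model_tests -- --nocapture
```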
### Embeddings Engine
The embeddings engine has unit tests embedded in the main source file:
- Tests for HTTP endpoints (`test_root` and `test_embeddings_create`)
- Validates response formats and embedding dimensions
### Running Unit Tests
To run unit tests for a specific crate:
```bash
# Run all tests for a specific crate
cd crates/inference-engine
cargo test
# Run a specific test
cargo test test_token_output_stream
# Run tests with output
cargo test -- --nocapture
```
### Writing New Unit Tests
To add new unit tests:
1. For the inference engine, add test functions to the appropriate file in the `tests` directory
2. For the embeddings engine, add test functions to the `tests` module in `main.rs`
Example of a new unit test for the inference engine:
```rust
#[test]
fn test_my_new_feature() {
    // Arrange: Set up the test data
    let input = "Test input";

    // Act: Call the function being tested
    let result = my_function(input);

    // Assert: Verify the results
    assert_eq!(result, expected_output);
}
```
## Integration Testing
Integration tests verify that different components work correctly together.
### Current Integration Tests
- The embeddings engine tests in `main.rs` double as integration tests, since they exercise the HTTP API endpoints
### Writing New Integration Tests
To add new integration tests:
1. Create a new test file in the `tests` directory
2. Use the Axum testing utilities to simulate HTTP requests
Example of an integration test for the API:
```rust
use axum::body::{to_bytes, Body};
use axum::http::{Method, Request, StatusCode};
use tower::ServiceExt; // provides `oneshot` for calling the router directly

#[tokio::test]
async fn test_chat_completions_endpoint() {
    // Arrange: Create a test app (create_app is assumed to build the service's router)
    let app = create_app();

    // Create a test request
    let request_body = serde_json::json!({
        "model": "gemma-3-1b-it",
        "messages": [{"role": "user", "content": "Hello"}]
    });

    // Act: Send the request
    let response = app
        .oneshot(
            Request::builder()
                .method(Method::POST)
                .uri("/v1/chat/completions")
                .header("content-type", "application/json")
                .body(Body::from(request_body.to_string()))
                .unwrap(),
        )
        .await
        .unwrap();

    // Assert: Verify the response
    assert_eq!(response.status(), StatusCode::OK);

    // Verify response format
    let body = to_bytes(response.into_body(), usize::MAX).await.unwrap();
    let response_json: serde_json::Value = serde_json::from_slice(&body).unwrap();
    assert!(response_json.get("choices").is_some());
}
```
## End-to-End Testing
End-to-end tests validate the entire system from client request to server response.
### Manual End-to-End Testing
1. Start the server:
```bash
./run_server.sh
```
2. Use curl or other HTTP clients to test the endpoints:
```bash
# Test embeddings endpoint
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Hello, world!"}'

# Test chat completions endpoint
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-1b-it", "messages": [{"role": "user", "content": "Hello"}]}'
```
### Automated End-to-End Testing
You can create automated end-to-end tests using shell scripts:
1. Create a new script in the project root:
```bash
#!/bin/bash
# e2e_test.sh

# Start the server in the background
./run_server.sh &
SERVER_PID=$!

# Stop the server when the script exits, even if a test fails
trap 'kill $SERVER_PID' EXIT

# Wait for the server to start
sleep 5

# Run tests
echo "Testing embeddings endpoint..."
curl -s -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Test input"}' \
  -o /tmp/embeddings_response.json

# Validate response
if grep -q "embedding" /tmp/embeddings_response.json; then
  echo "Embeddings test passed"
else
  echo "Embeddings test failed"
  exit 1
fi

echo "All tests passed!"
```
2. Make the script executable and run it:
```bash
chmod +x e2e_test.sh
./e2e_test.sh
```
## Performance Testing
Performance testing evaluates the system's response time, throughput, and resource usage.
### Existing Performance Tests
The project includes two performance testing scripts (a quick manual timing check is sketched after this list):
1. `performance_test_embeddings.sh`: Tests the embeddings engine with various input sizes
2. `performance_test_inference.sh`: Tests the inference engine with different prompt sizes
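Before running the full scripts, you can sanity-check the latency of a single request with curl's built-in timing output; a minimal sketch, with the endpoint and payload taken from earlier examples in this guide:
```bash
# Print only the total request time for one embeddings call
curl -s -o /dev/null -w "Total time: %{time_total}s\n" \
  -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Hello, world!"}'
```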
### Running Performance Tests
Ensure the server is running, then execute the performance test scripts:
```bash
# Test embeddings performance
./performance_test_embeddings.sh
# Test inference performance
./performance_test_inference.sh
```
### Creating New Performance Tests
To create new performance tests:
1. Use the existing scripts as templates
2. Modify the test parameters (iterations, input sizes, etc.)
3. Add specific metrics you want to measure
Example of a new performance test focusing on concurrent requests:
```bash
#!/bin/bash
# concurrent_performance_test.sh

SERVER_URL="http://localhost:8080"
CONCURRENT_REQUESTS=10
TEST_INPUT="This is a test input for concurrent performance testing."

echo "Testing with $CONCURRENT_REQUESTS concurrent requests..."

# Function to send a single request
send_request() {
  curl -s -X POST \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"text-embedding-3-small\", \"input\": \"$TEST_INPUT\"}" \
    "$SERVER_URL/v1/embeddings" > /dev/null
  echo "Request completed"
}

# Start server if not running
# [server startup code here]

# Send concurrent requests
start_time=$(date +%s.%N)

for i in $(seq 1 $CONCURRENT_REQUESTS); do
  send_request &
done

# Wait for all requests to complete
wait

end_time=$(date +%s.%N)
elapsed=$(echo "$end_time - $start_time" | bc)

echo "All $CONCURRENT_REQUESTS requests completed in ${elapsed}s"
echo "Average time per request: $(echo "$elapsed / $CONCURRENT_REQUESTS" | bc -l)s"
```
## How to Run Existing Tests
### Running All Tests
To run all tests in the project:
```bash
# From the project root
cargo test --workspace
```
### Running Specific Tests
To run tests for a specific crate:
```bash
cargo test -p inference-engine
cargo test -p embeddings-engine
```
To run a specific test:
```bash
cargo test -p inference-engine test_token_output_stream
```
### Running Tests with Output
To see the output of tests, including `println!` statements:
```bash
cargo test -- --nocapture
```
### Running Performance Tests
```bash
# Make sure server is running
./run_server.sh &
# Run performance tests
./performance_test_embeddings.sh
./performance_test_inference.sh
```
## Writing New Tests
### Test Organization
- **Unit Tests**: Place in the `tests` directory or in a `tests` module within the source file
- **Integration Tests**: Create in the `tests` directory with a focus on component interactions
- **End-to-End Tests**: Implement as shell scripts or separate Rust binaries
- **Performance Tests**: Create shell scripts that measure specific performance metrics (a layout sketch covering all of the above follows this list)
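The organization above maps onto a repository layout roughly like the following (paths are illustrative, based on the crates and scripts mentioned in this guide):
```
crates/
  inference-engine/
    src/
    tests/                      # unit and integration tests (e.g. model_tests.rs)
  embeddings-engine/
    src/main.rs                 # contains an embedded #[cfg(test)] tests module
e2e_test.sh                     # end-to-end shell scripts at the project root
performance_test_embeddings.sh  # performance scripts at the project root
performance_test_inference.sh
```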
### Test Naming Conventions
- Use descriptive test names that indicate what is being tested
- Prefix test functions with `test_`
- For complex tests, use comments to explain the test purpose
### Test Best Practices
1. **Arrange-Act-Assert**: Structure tests with clear setup, action, and verification phases
2. **Independence**: Tests should not depend on each other
3. **Determinism**: Tests should produce the same result every time
4. **Focused Scope**: Each test should verify a single behavior
5. **Error Messages**: Use descriptive assertions that explain the expected vs. actual results
Example of a well-structured test:
```rust
#[test]
fn test_embedding_dimension_matches_specification() {
    // Arrange: Set up the test environment
    let model = create_test_model();
    let input = "Test input";

    // Act: Generate the embedding
    let embedding = model.embed(input);

    // Assert: Verify the dimension
    assert_eq!(
        embedding.len(),
        768,
        "Embedding dimension should be 768, but got {}",
        embedding.len()
    );
}
```
## Test Coverage
The project currently has test coverage for:
- **Inference Engine**: Basic unit tests for key components
- **Embeddings Engine**: API endpoint tests
- **Performance**: Scripts for benchmarking both engines
Areas that could benefit from additional testing:
1. **Main Server Component**: The `predict-otron-9000` crate has limited test coverage
2. **Error Handling**: Tests for error conditions and edge cases
3. **Concurrency**: Testing behavior under concurrent load
4. **Long-Running Tests**: Stability tests for extended operation
To improve test coverage:
1. Use `cargo tarpaulin` or similar tools to measure code coverage (a sketch follows this list)
2. Identify uncovered code paths
3. Add tests for error conditions and edge cases
4. Implement integration tests for the main server component
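As a sketch of step 1 (a tooling suggestion, not something the project currently ships), `cargo-tarpaulin` can be installed and run against the whole workspace; note that tarpaulin primarily supports Linux:
```bash
# Install the coverage tool (one-time setup)
cargo install cargo-tarpaulin

# Measure coverage across the workspace and write an HTML report to ./coverage
cargo tarpaulin --workspace --out Html --output-dir coverage
```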
---
By following this testing guide, you can ensure that the Predict-otron-9000 system maintains its reliability, performance, and correctness as it evolves.