# Testing Guide for Predict-otron-9000

This document provides comprehensive guidance on testing the Predict-otron-9000 system, covering how to run the existing tests and how to write new ones. The testing strategy spans several levels, from unit tests to performance evaluation.

## Table of Contents

- [Testing Overview](#testing-overview)
- [Unit Testing](#unit-testing)
- [Integration Testing](#integration-testing)
- [End-to-End Testing](#end-to-end-testing)
- [Performance Testing](#performance-testing)
- [How to Run Existing Tests](#how-to-run-existing-tests)
- [Writing New Tests](#writing-new-tests)
- [Test Coverage](#test-coverage)

## Testing Overview

Predict-otron-9000 follows a multi-layered testing approach to ensure the reliability and performance of its components:

1. **Unit Tests**: Test individual components in isolation
2. **Integration Tests**: Test interactions between components
3. **End-to-End Tests**: Test the complete system from user input to output
4. **Performance Tests**: Evaluate system performance under various conditions

## Unit Testing

Unit tests focus on testing individual components in isolation. The project uses Rust's built-in testing framework with the `#[test]` attribute.

### Inference Engine

The inference engine has dedicated unit tests in the `tests` directory:

- `text_generation_tests.rs`: Tests for the text generation components
- `token_output_stream_tests.rs`: Tests for token stream handling
- `model_tests.rs`: Tests for model-related functionality

These tests focus on individual components like the `Which` enum, `TokenOutputStream`, and `LogitsProcessor`.
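
For instance, a deterministic sampling test could look like the following. This is a minimal sketch that assumes the engine uses candle's `LogitsProcessor` (`candle_transformers::generation::LogitsProcessor`), which falls back to greedy (argmax) sampling when no temperature or top-p is configured:

```rust
use candle_core::{Device, Tensor};
use candle_transformers::generation::LogitsProcessor;

#[test]
fn test_logits_processor_greedy_sampling() {
    // With no temperature or top-p, LogitsProcessor falls back to
    // argmax sampling, so the largest logit must always be selected.
    let logits = Tensor::new(&[0.1f32, 0.2, 3.0, 0.4], &Device::Cpu).unwrap();
    let mut processor = LogitsProcessor::new(42, None, None);

    let token = processor.sample(&logits).unwrap();

    assert_eq!(token, 2, "greedy sampling should pick the largest logit");
}
```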

### Embeddings Engine

The embeddings engine has unit tests embedded in the main source file:

- Tests for the HTTP endpoints (`test_root` and `test_embeddings_create`)
- Validation of response formats and embedding dimensions

### Running Unit Tests

To run unit tests for a specific crate:

```bash
# Run all tests for a specific crate
cd crates/inference-engine
cargo test

# Run a specific test
cargo test test_token_output_stream

# Run tests with output
cargo test -- --nocapture
```

### Writing New Unit Tests

To add new unit tests:

1. For the inference engine, add test functions to the appropriate file in the `tests` directory
2. For the embeddings engine, add test functions to the `tests` module in `main.rs`

Example of a new unit test for the inference engine (`my_function` and `expected_output` are placeholders for the code under test):

```rust
#[test]
fn test_my_new_feature() {
    // Arrange: set up the test data
    let input = "Test input";

    // Act: call the function being tested
    let result = my_function(input);

    // Assert: verify the results
    assert_eq!(result, expected_output);
}
```

## Integration Testing

Integration tests verify that different components work correctly together.

### Current Integration Tests

- The embeddings engine tests in `main.rs` function as integration tests by testing the HTTP API endpoints

### Writing New Integration Tests

To add new integration tests:

1. Create a new test file in the `tests` directory
2. Use the Axum testing utilities to simulate HTTP requests

Example of an integration test for the API:

```rust
use axum::body::{to_bytes, Body};
use axum::http::StatusCode;
use tower::ServiceExt; // provides the `oneshot` method

#[tokio::test]
async fn test_chat_completions_endpoint() {
    // Arrange: create a test app
    let app = create_app();

    // Create a test request
    let request_body = serde_json::json!({
        "model": "gemma-3-1b-it",
        "messages": [{"role": "user", "content": "Hello"}]
    });

    // Act: send the request
    let response = app
        .oneshot(
            axum::http::Request::builder()
                .method(axum::http::Method::POST)
                .uri("/v1/chat/completions")
                .header("content-type", "application/json")
                .body(Body::from(request_body.to_string()))
                .unwrap(),
        )
        .await
        .unwrap();

    // Assert: verify the response
    assert_eq!(response.status(), StatusCode::OK);

    // Verify the response format
    let body = to_bytes(response.into_body(), usize::MAX).await.unwrap();
    let response_json: serde_json::Value = serde_json::from_slice(&body).unwrap();
    assert!(response_json.get("choices").is_some());
}
```

## End-to-End Testing

End-to-end tests validate the entire system from client request to server response.

### Manual End-to-End Testing

1. Start the server:

```bash
./run_server.sh
```

2. Use curl or another HTTP client to test the endpoints:

```bash
# Test the embeddings endpoint
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Hello, world!"}'

# Test the chat completions endpoint
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma-3-1b-it", "messages": [{"role": "user", "content": "Hello"}]}'
```

### Automated End-to-End Testing

You can create automated end-to-end tests as shell scripts:

1. Create a new script in the project root:

```bash
#!/bin/bash
# e2e_test.sh

# Start the server in the background
./run_server.sh &
SERVER_PID=$!

# Wait for the server to start
sleep 5

# Run tests
echo "Testing embeddings endpoint..."
curl -s -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Test input"}' \
  -o /tmp/embeddings_response.json

# Validate the response
if grep -q "embedding" /tmp/embeddings_response.json; then
  echo "Embeddings test passed"
else
  echo "Embeddings test failed"
  kill $SERVER_PID
  exit 1
fi

# Clean up
kill $SERVER_PID
echo "All tests passed!"
```

2. Make the script executable and run it:

```bash
chmod +x e2e_test.sh
./e2e_test.sh
```

## Performance Testing

Performance testing evaluates the system's response time, throughput, and resource usage.

### Existing Performance Tests

The project includes two performance testing scripts (a minimal timing sketch follows the list):

1. `performance_test_embeddings.sh`: Tests the embeddings engine with various input sizes
2. `performance_test_inference.sh`: Tests the inference engine with different prompt sizes
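
The sketch below shows the kind of single-request latency measurement such scripts can build on, using curl's built-in `%{time_total}` timing variable; the endpoint and payload are illustrative:

```bash
#!/bin/bash
# latency_probe.sh - illustrative single-request timing sketch

SERVER_URL="http://localhost:8080"

# -w "%{time_total}" prints curl's measured total transfer time
time_total=$(curl -s -o /dev/null -w "%{time_total}" \
  -X POST "$SERVER_URL/v1/embeddings" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "timing probe"}')

echo "Request completed in ${time_total}s"
```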

### Running Performance Tests

Ensure the server is running, then execute the performance test scripts:

```bash
# Test embeddings performance
./performance_test_embeddings.sh

# Test inference performance
./performance_test_inference.sh
```

### Creating New Performance Tests

To create new performance tests:

1. Use the existing scripts as templates
2. Modify the test parameters (iterations, input sizes, etc.)
3. Add the specific metrics you want to measure

Example of a new performance test focusing on concurrent requests:

```bash
#!/bin/bash
# concurrent_performance_test.sh

SERVER_URL="http://localhost:8080"
CONCURRENT_REQUESTS=10
TEST_INPUT="This is a test input for concurrent performance testing."

echo "Testing with $CONCURRENT_REQUESTS concurrent requests..."

# Function to send a single request
send_request() {
  curl -s -X POST \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"text-embedding-3-small\", \"input\": \"$TEST_INPUT\"}" \
    "$SERVER_URL/v1/embeddings" > /dev/null
  echo "Request completed"
}

# Start the server if it is not already running
# [server startup code here]

# Send concurrent requests
start_time=$(date +%s.%N)

for i in $(seq 1 $CONCURRENT_REQUESTS); do
  send_request &
done

# Wait for all requests to complete
wait

end_time=$(date +%s.%N)
elapsed=$(echo "$end_time - $start_time" | bc)

echo "All $CONCURRENT_REQUESTS requests completed in ${elapsed}s"
echo "Average time per request: $(echo "$elapsed / $CONCURRENT_REQUESTS" | bc -l)s"
```

## How to Run Existing Tests

### Running All Tests

To run all tests in the project:

```bash
# From the project root
cargo test --workspace
```

### Running Specific Tests

To run tests for a specific crate:

```bash
cargo test -p inference-engine
cargo test -p embeddings-engine
```

To run a specific test:

```bash
cargo test -p inference-engine test_token_output_stream
```

### Running Tests with Output

To see the output of tests, including `println!` statements:

```bash
cargo test -- --nocapture
```

### Running Performance Tests

```bash
# Make sure the server is running
./run_server.sh &

# Run the performance tests
./performance_test_embeddings.sh
./performance_test_inference.sh
```

## Writing New Tests

### Test Organization

- **Unit Tests**: Place in the `tests` directory or in a `tests` module within the source file
- **Integration Tests**: Create in the `tests` directory with a focus on component interactions
- **End-to-End Tests**: Implement as shell scripts or separate Rust binaries
- **Performance Tests**: Create shell scripts that measure specific performance metrics

### Test Naming Conventions

- Use descriptive test names that indicate what is being tested
- Prefix test functions with `test_`
- For complex tests, use comments to explain the test's purpose

### Test Best Practices

1. **Arrange-Act-Assert**: Structure tests with clear setup, action, and verification phases
2. **Independence**: Tests should not depend on each other
3. **Determinism**: Tests should produce the same result every time
4. **Focused Scope**: Each test should verify a single behavior
5. **Error Messages**: Use descriptive assertions that explain the expected vs. actual results

Example of a well-structured test:

```rust
#[test]
fn test_embedding_dimension_matches_specification() {
    // Arrange: set up the test environment
    let model = create_test_model();
    let input = "Test input";

    // Act: generate the embedding
    let embedding = model.embed(input);

    // Assert: verify the dimension
    assert_eq!(
        embedding.len(),
        768,
        "Embedding dimension should be 768, but got {}",
        embedding.len()
    );
}
```

## Test Coverage

The project currently has test coverage for:

- **Inference Engine**: Basic unit tests for key components
- **Embeddings Engine**: API endpoint tests
- **Performance**: Scripts for benchmarking both engines

Areas that could benefit from additional testing:

1. **Main Server Component**: The `predict-otron-9000` crate has limited test coverage
2. **Error Handling**: Tests for error conditions and edge cases (see the sketch below)
3. **Concurrency**: Testing behavior under concurrent load
4. **Long-Running Tests**: Stability tests for extended operation

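For example, an error-handling test for the chat endpoint might assert that malformed input is rejected. This is a sketch that reuses the `create_app` helper from the integration example above; the exact status code depends on how the handler is implemented:

```rust
use axum::body::Body;
use axum::http::Request;
use tower::ServiceExt; // provides the `oneshot` method

#[tokio::test]
async fn test_chat_completions_rejects_malformed_json() {
    let app = create_app();

    // Send a request body that is not valid JSON
    let response = app
        .oneshot(
            Request::builder()
                .method(axum::http::Method::POST)
                .uri("/v1/chat/completions")
                .header("content-type", "application/json")
                .body(Body::from("not json"))
                .unwrap(),
        )
        .await
        .unwrap();

    // Axum's JSON extractor typically rejects this with a 4xx status
    assert!(response.status().is_client_error());
}
```
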
To improve test coverage:

1. Use `cargo tarpaulin` or a similar tool to measure code coverage (see the example below)
2. Identify uncovered code paths
3. Add tests for error conditions and edge cases
4. Implement integration tests for the main server component
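
For example, a typical `cargo tarpaulin` invocation might look like this (a sketch; tarpaulin is an external tool that must be installed first and is best supported on x86_64 Linux):

```bash
# Install the coverage tool (one-time)
cargo install cargo-tarpaulin

# Measure coverage across the workspace and write an HTML report
cargo tarpaulin --workspace --out Html
```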

---

By following this testing guide, you can ensure that the Predict-otron-9000 system maintains its reliability, performance, and correctness as it evolves.