Refactor apply_cached_repeat_penalty for optimized caching and reuse, add extensive unit tests, and integrate special handling for gemma-specific models.

Removed `test_request.sh`, deprecated functionality, and unused imports; introduced a new CLI tool (`cli.ts`) for testing inference engine and adjusted handling of non-streaming/streaming chat completions. - Add CPU fallback support for text generation when primary device is unsupported - Introduce `execute_with_fallback` method to handle device compatibility and shape mismatch errors - Extend unit tests to reproduce tensor shape mismatch errors specific to model configurations - Increase HTTP timeout limits in `curl_chat_stream.sh` script for reliable API testing chat completion endpoint functions with gemma3 (no streaming) Add benchmarking guide with HTML reporting, Leptos chat crate, and middleware for metrics tracking
2025-09-08 22:46:44 +00:00 · 2025-08-26 01:30:26 -04:00
parent 7dd23213c9
commit 8338750beb
64 changed files with 14997 additions and 220 deletions
--- a/docs/TESTING.md
+++ b/docs/TESTING.md
@@ -0,0 +1,392 @@
+# Testing Guide for Predict-otron-9000
+
+This document provides comprehensive guidance on testing the Predict-otron-9000 system, including how to run existing tests and how to write new ones. The testing strategy covers different levels of testing from unit tests to performance evaluation.
+
+## Table of Contents
+
+- [Testing Overview](#testing-overview)
+- [Unit Testing](#unit-testing)
+- [Integration Testing](#integration-testing)
+- [End-to-End Testing](#end-to-end-testing)
+- [Performance Testing](#performance-testing)
+- [How to Run Existing Tests](#how-to-run-existing-tests)
+- [Writing New Tests](#writing-new-tests)
+- [Test Coverage](#test-coverage)
+
+## Testing Overview
+
+Predict-otron-9000 follows a multi-layered testing approach to ensure the reliability and performance of its components:
+
+1. **Unit Tests**: Test individual components in isolation
+2. **Integration Tests**: Test interactions between components
+3. **End-to-End Tests**: Test the complete system from user input to output
+4. **Performance Tests**: Evaluate system performance under various conditions
+
+## Unit Testing
+
+Unit tests focus on testing individual components in isolation. The project uses Rust's built-in testing framework with the `#[test]` attribute.
+
+### Inference Engine
+
+The inference engine has dedicated unit tests in the `tests` directory:
+
+- `text_generation_tests.rs`: Tests for the text generation components
+- `token_output_stream_tests.rs`: Tests for token stream handling
+- `model_tests.rs`: Tests for model-related functionality
+
+These tests focus on individual components like the `Which` enum, `TokenOutputStream`, and `LogitsProcessor`.
+
+### Embeddings Engine
+
+The embeddings engine has unit tests embedded in the main source file:
+
+- Tests for HTTP endpoints (`test_root` and `test_embeddings_create`)
+- Validates response formats and embedding dimensions
+
+### Running Unit Tests
+
+To run unit tests for a specific crate:
+
+```bash
+# Run all tests for a specific crate
+cd crates/inference-engine
+cargo test
+
+# Run a specific test
+cargo test test_token_output_stream
+
+# Run tests with output
+cargo test -- --nocapture
+```
+
+### Writing New Unit Tests
+
+To add new unit tests:
+
+1. For the inference engine, add test functions to the appropriate file in the `tests` directory
+2. For the embeddings engine, add test functions to the `tests` module in `main.rs`
+
+Example of a new unit test for the inference engine:
+
+```rust
+#[test]
+fn test_my_new_feature() {
+    // Arrange: Set up the test data
+    let input = "Test input";
+    
+    // Act: Call the function being tested
+    let result = my_function(input);
+    
+    // Assert: Verify the results
+    assert_eq!(result, expected_output);
+}
+```
+
+## Integration Testing
+
+Integration tests verify that different components work correctly together. 
+
+### Current Integration Tests
+
+- The embeddings engine tests in `main.rs` function as integration tests by testing the HTTP API endpoints
+
+### Writing New Integration Tests
+
+To add new integration tests:
+
+1. Create a new test file in the `tests` directory
+2. Use the Axum testing utilities to simulate HTTP requests
+
+Example of an integration test for the API:
+
+```rust
+#[tokio::test]
+async fn test_chat_completions_endpoint() {
+    // Arrange: Create a test app
+    let app = create_app();
+    
+    // Create a test request
+    let request_body = serde_json::json!({
+        "model": "gemma-3-1b-it",
+        "messages": [{"role": "user", "content": "Hello"}]
+    });
+    
+    // Act: Send the request
+    let response = app
+        .oneshot(
+            axum::http::Request::builder()
+                .method(axum::http::Method::POST)
+                .uri("/v1/chat/completions")
+                .header("content-type", "application/json")
+                .body(Body::from(request_body.to_string()))
+                .unwrap(),
+        )
+        .await
+        .unwrap();
+    
+    // Assert: Verify the response
+    assert_eq!(response.status(), StatusCode::OK);
+    
+    // Verify response format
+    let body = to_bytes(response.into_body(), usize::MAX).await.unwrap();
+    let response_json: serde_json::Value = serde_json::from_slice(&body).unwrap();
+    assert!(response_json.get("choices").is_some());
+}
+```
+
+## End-to-End Testing
+
+End-to-end tests validate the entire system from client request to server response.
+
+### Manual End-to-End Testing
+
+1. Start the server:
+```bash
+./run_server.sh
+```
+
+2. Use curl or other HTTP clients to test the endpoints:
+
+```bash
+# Test embeddings endpoint
+curl -X POST http://localhost:8080/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{"model": "text-embedding-3-small", "input": "Hello, world!"}'
+
+# Test chat completions endpoint
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gemma-3-1b-it", "messages": [{"role": "user", "content": "Hello"}]}'
+```
+
+### Automated End-to-End Testing
+
+You can create automated end-to-end tests using shell scripts:
+
+1. Create a new script in the project root:
+
+```bash
+#!/bin/bash
+# e2e_test.sh
+
+# Start the server in the background
+./run_server.sh &
+SERVER_PID=$!
+
+# Wait for server to start
+sleep 5
+
+# Run tests
+echo "Testing embeddings endpoint..."
+curl -X POST http://localhost:8080/v1/embeddings \
+  -H "Content-Type: application/json" \
+  -d '{"model": "text-embedding-3-small", "input": "Test input"}' \
+  -o /tmp/embeddings_response.json
+
+# Validate response
+if grep -q "embedding" /tmp/embeddings_response.json; then
+  echo "Embeddings test passed"
+else
+  echo "Embeddings test failed"
+  exit 1
+fi
+
+# Clean up
+kill $SERVER_PID
+echo "All tests passed!"
+```
+
+2. Make the script executable and run it:
+
+```bash
+chmod +x e2e_test.sh
+./e2e_test.sh
+```
+
+## Performance Testing
+
+Performance testing evaluates the system's response time, throughput, and resource usage.
+
+### Existing Performance Tests
+
+The project includes two performance testing scripts:
+
+1. `performance_test_embeddings.sh`: Tests the embeddings engine with various input sizes
+2. `performance_test_inference.sh`: Tests the inference engine with different prompt sizes
+
+### Running Performance Tests
+
+Ensure the server is running, then execute the performance test scripts:
+
+```bash
+# Test embeddings performance
+./performance_test_embeddings.sh
+
+# Test inference performance
+./performance_test_inference.sh
+```
+
+### Creating New Performance Tests
+
+To create new performance tests:
+
+1. Use the existing scripts as templates
+2. Modify the test parameters (iterations, input sizes, etc.)
+3. Add specific metrics you want to measure
+
+Example of a new performance test focusing on concurrent requests:
+
+```bash
+#!/bin/bash
+# concurrent_performance_test.sh
+
+SERVER_URL="http://localhost:8080"
+CONCURRENT_REQUESTS=10
+TEST_INPUT="This is a test input for concurrent performance testing."
+
+echo "Testing with $CONCURRENT_REQUESTS concurrent requests..."
+
+# Function to send a single request
+send_request() {
+    curl -s -X POST \
+        -H "Content-Type: application/json" \
+        -d "{\"model\": \"text-embedding-3-small\", \"input\": \"$TEST_INPUT\"}" \
+        "$SERVER_URL/v1/embeddings" > /dev/null
+    echo "Request completed"
+}
+
+# Start server if not running
+# [server startup code here]
+
+# Send concurrent requests
+start_time=$(date +%s.%N)
+
+for i in $(seq 1 $CONCURRENT_REQUESTS); do
+    send_request &
+done
+
+# Wait for all requests to complete
+wait
+
+end_time=$(date +%s.%N)
+elapsed=$(echo "$end_time - $start_time" | bc)
+
+echo "All $CONCURRENT_REQUESTS requests completed in ${elapsed}s"
+echo "Average time per request: $(echo "$elapsed / $CONCURRENT_REQUESTS" | bc -l)s"
+```
+
+## How to Run Existing Tests
+
+### Running All Tests
+
+To run all tests in the project:
+
+```bash
+# From the project root
+cargo test --workspace
+```
+
+### Running Specific Tests
+
+To run tests for a specific crate:
+
+```bash
+cargo test -p inference-engine
+cargo test -p embeddings-engine
+```
+
+To run a specific test:
+
+```bash
+cargo test -p inference-engine test_token_output_stream
+```
+
+### Running Tests with Output
+
+To see the output of tests, including `println!` statements:
+
+```bash
+cargo test -- --nocapture
+```
+
+### Running Performance Tests
+
+```bash
+# Make sure server is running
+./run_server.sh &
+
+# Run performance tests
+./performance_test_embeddings.sh
+./performance_test_inference.sh
+```
+
+## Writing New Tests
+
+### Test Organization
+
+- **Unit Tests**: Place in the `tests` directory or in a `tests` module within the source file
+- **Integration Tests**: Create in the `tests` directory with a focus on component interactions
+- **End-to-End Tests**: Implement as shell scripts or separate Rust binaries
+- **Performance Tests**: Create shell scripts that measure specific performance metrics
+
+### Test Naming Conventions
+
+- Use descriptive test names that indicate what is being tested
+- Prefix test functions with `test_`
+- For complex tests, use comments to explain the test purpose
+
+### Test Best Practices
+
+1. **Arrange-Act-Assert**: Structure tests with clear setup, action, and verification phases
+2. **Independence**: Tests should not depend on each other
+3. **Determinism**: Tests should produce the same result every time
+4. **Focused Scope**: Each test should verify a single behavior
+5. **Error Messages**: Use descriptive assertions that explain the expected vs. actual results
+
+Example of a well-structured test:
+
+```rust
+#[test]
+fn test_embedding_dimension_matches_specification() {
+    // Arrange: Set up the test environment
+    let model = create_test_model();
+    let input = "Test input";
+    
+    // Act: Generate the embedding
+    let embedding = model.embed(input);
+    
+    // Assert: Verify the dimension
+    assert_eq!(
+        embedding.len(), 
+        768, 
+        "Embedding dimension should be 768, but got {}", 
+        embedding.len()
+    );
+}
+```
+
+## Test Coverage
+
+The project currently has test coverage for:
+
+- **Inference Engine**: Basic unit tests for key components
+- **Embeddings Engine**: API endpoint tests
+- **Performance**: Scripts for benchmarking both engines
+
+Areas that could benefit from additional testing:
+
+1. **Main Server Component**: The `predict-otron-9000` crate has limited test coverage
+2. **Error Handling**: Tests for error conditions and edge cases
+3. **Concurrency**: Testing behavior under concurrent load
+4. **Long-Running Tests**: Stability tests for extended operation
+
+To improve test coverage:
+
+1. Use `cargo tarpaulin` or similar tools to measure code coverage
+2. Identify uncovered code paths
+3. Add tests for error conditions and edge cases
+4. Implement integration tests for the main server component
+
+---
+
+By following this testing guide, you can ensure that the Predict-otron-9000 system maintains its reliability, performance, and correctness as it evolves.