Testing Strategy


This page documents the testing approach for the rust-sec-fetcher codebase, covering both Rust unit tests and Python integration tests. The testing strategy emphasizes isolated unit testing with mocking for Rust components and end-to-end integration testing for the Python ML pipeline. For information about the CI/CD automation of these tests, see CI/CD Pipeline.


Test Architecture Overview

The codebase employs a dual-layer testing strategy that mirrors its dual-language architecture. Rust components use unit tests with HTTP mocking, while Python components use containerized integration tests with real database instances.

graph TB
    subgraph "Rust Unit Tests"
        SecClientTest["sec_client_tests.rs\nTest SecClient HTTP layer"]
        DistillTest["distill_us_gaap_fundamental_concepts_tests.rs\nTest concept transformation"]
        ConfigTest["config_manager_tests.rs\nTest configuration loading"]
    end

    subgraph "Test Infrastructure"
        MockitoServer["mockito::Server\nHTTP mock server"]
        TempDir["tempfile::TempDir\nTemporary config files"]
    end

    subgraph "Python Integration Tests"
        IntegrationScript["us_gaap_store_integration_test.sh\nTest orchestration"]
        PytestRunner["pytest\nTest execution"]
    end

    subgraph "Docker Test Environment"
        MySQLContainer["us_gaap_test_db\nMySQL 8.0 container"]
        SimdRDriveContainer["simd-r-drive-ws-server\nWebSocket server"]
        SQLSchema["us_gaap_schema_2025.sql\nDatabase schema fixture"]
    end

    SecClientTest --> MockitoServer
    ConfigTest --> TempDir
    IntegrationScript --> MySQLContainer
    IntegrationScript --> SimdRDriveContainer
    IntegrationScript --> SQLSchema
    IntegrationScript --> PytestRunner
    PytestRunner --> MySQLContainer
    PytestRunner --> SimdRDriveContainer

Test Component Relationships

Sources: tests/sec_client_tests.rs:1-159 tests/distill_us_gaap_fundamental_concepts_tests.rs:1-1275 tests/config_manager_tests.rs:1-95 python/narrative_stack/us_gaap_store_integration_test.sh:1-39


Rust Unit Testing Strategy

The Rust test suite focuses on three critical areas: HTTP client behavior, data transformation correctness, and configuration management. Tests are isolated using mocking frameworks and temporary file systems.

SecClient HTTP Testing

The SecClient tests verify HTTP operations, retry logic, and authentication header generation using the mockito library for HTTP mocking.

| Test Function | Purpose | Mock Behavior |
|---|---|---|
| test_user_agent | Validates User-Agent header format | N/A (direct method call) |
| test_invalid_email_panic | Ensures invalid emails cause panic | N/A (panic verification) |
| test_fetch_json_without_retry_success | Tests successful JSON fetch | Returns 200 with valid JSON |
| test_fetch_json_with_retry_success | Tests successful fetch with retry available | Returns 200 immediately |
| test_fetch_json_with_retry_failure | Verifies retry exhaustion | Returns 500 three times |
| test_fetch_json_with_retry_backoff | Tests retry with eventual success | Returns 500 once, then 200 |

User Agent Validation Test Pattern

Sources: tests/sec_client_tests.rs:6-21

The test creates a minimal AppConfig with only the email field set tests/sec_client_tests.rs:8-10, constructs a ConfigManager and SecClient tests/sec_client_tests.rs:10-12, and then verifies that the User-Agent string matches the expected pattern, including the package version taken from CARGO_PKG_VERSION tests/sec_client_tests.rs:14-20.
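
A minimal sketch of this pattern is shown below. The AppConfig field names, the ConfigManager and SecClient constructors, and the exact User-Agent format are assumptions for illustration, not the actual test code.

// Hypothetical sketch of the User-Agent validation pattern described above.
// Constructor names and the User-Agent layout are assumed.
#[test]
fn user_agent_format_sketch() {
    let mut config = AppConfig::default();
    config.email = Some("test@example.com".to_string());

    let config_manager = ConfigManager::from(config);
    let client = SecClient::new(&config_manager).expect("client should build");

    // The User-Agent is expected to embed the crate version at compile time.
    let expected = format!(
        "rust-sec-fetcher/{} (+test@example.com)",
        env!("CARGO_PKG_VERSION")
    );
    assert_eq!(client.user_agent(), expected);
}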

HTTP Retry and Backoff Testing

The retry logic tests use mockito::Server to simulate various HTTP response scenarios:

Sources: tests/sec_client_tests.rs:64-158

graph TB
    SetupMock["Setup mockito::Server"]
    ConfigureResponse["Configure mock response\nStatus, Body, Expect count"]
    CreateClient["Create SecClient\nwith max_retries config"]
    FetchJSON["Call client.fetch_json()"]
    VerifyResult["Verify result:\nSuccess or Error"]
    VerifyCallCount["Verify mock was called\nexpected number of times"]

    SetupMock --> ConfigureResponse
    ConfigureResponse --> CreateClient
    CreateClient --> FetchJSON
    FetchJSON --> VerifyResult
    FetchJSON --> VerifyCallCount

The retry failure test configures a mock to return HTTP 500 and expects exactly 3 calls (the initial request plus 2 retries) tests/sec_client_tests.rs:96-103, with max_retries = 2 set in the config tests/sec_client_tests.rs:106. The test then verifies that the result is an error once retries are exhausted tests/sec_client_tests.rs:111-119.

The backoff test simulates transient failures by creating two mock endpoints: one that returns 500 once, and another that returns 200 tests/sec_client_tests.rs:126-140. This validates that the retry mechanism recovers from temporary failures.

Sources: tests/sec_client_tests.rs:94-158
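
The following is a hedged sketch of the retry-exhaustion pattern using the mockito API. The AppConfig fields, the SecClient constructor, and the fetch_json signature are assumptions based on the description above.

// Hypothetical sketch of the retry-exhaustion test; crate-specific names are assumed.
#[tokio::test]
async fn fetch_json_retry_failure_sketch() {
    let mut server = mockito::Server::new_async().await;

    // Always fail: the client should give up after the initial call plus 2 retries.
    let mock = server
        .mock("GET", "/files/company_tickers.json")
        .with_status(500)
        .expect(3)
        .create_async()
        .await;

    let mut config = AppConfig::default();
    config.email = Some("test@example.com".to_string());
    config.max_retries = Some(2);

    let client = SecClient::new(&ConfigManager::from(config)).unwrap();
    let url = format!("{}/files/company_tickers.json", server.url());

    let result = client.fetch_json(&url).await;
    assert!(result.is_err());

    // mockito checks the expected call count when the mock is asserted.
    mock.assert_async().await;
}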

US GAAP Concept Transformation Testing

The distill_us_gaap_fundamental_concepts_tests.rs file contains 1,275 lines of comprehensive tests covering all 64 FundamentalConcept variants. These tests verify the complex mapping logic that normalizes diverse US GAAP terminology into a standardized taxonomy.

Test Coverage by Concept Category

| Category | Test Functions | Concept Variants Tested | Lines |
|---|---|---|---|
| Assets & Liabilities | 6 | 9 | 5-148 |
| Income Statement | 15 | 25 | 20-464 |
| Cash Flow | 12 | 15 | 550-683 |
| Equity | 7 | 13 | 150-286 |
| Revenue Variations | 2 (with 57+ assertions) | 45+ | 954-1188 |

Hierarchical Mapping Test Pattern

The tests verify that specific concepts map to both their precise category and parent categories:

Sources: tests/distill_us_gaap_fundamental_concepts_tests.rs:131-139

graph TB
    InputConcept["Input: 'AssetsCurrent'"]
    CallDistill["Call distill_us_gaap_fundamental_concepts()"]
    ExpectVector["Expect Vec with 2 elements"]
    CheckSpecific["Assert contains:\nFundamentalConcept::CurrentAssets"]
    CheckParent["Assert contains:\nFundamentalConcept::Assets"]

    InputConcept --> CallDistill
    CallDistill --> ExpectVector
    ExpectVector --> CheckSpecific
    ExpectVector --> CheckParent

The test for AssetsCurrent verifies that it maps to both CurrentAssets (the specific category) and Assets (the parent category) tests/distill_us_gaap_fundamental_concepts_tests.rs:132-138. This hierarchical pattern appears throughout the test suite for concepts with parent-child relationships.
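
A condensed sketch of this assertion style is shown below, assuming the function returns a plain Vec<FundamentalConcept> as the diagram suggests; the real signature may differ.

// Hypothetical sketch of the hierarchical mapping assertion described above.
#[test]
fn assets_current_maps_to_specific_and_parent() {
    let result = distill_us_gaap_fundamental_concepts("AssetsCurrent");

    // Two elements: the precise category plus its parent category.
    assert_eq!(result.len(), 2);
    assert!(result.contains(&FundamentalConcept::CurrentAssets));
    assert!(result.contains(&FundamentalConcept::Assets));
}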

Synonym Consolidation Testing

Multiple test functions verify that synonym variations of the same concept map to a single canonical form:

Sources: tests/distill_us_gaap_fundamental_concepts_tests.rs:80-112

graph LR
    CostOfRevenue["CostOfRevenue"]
    CostOfGoodsAndServicesSold["CostOfGoodsAndServicesSold"]
    CostOfServices["CostOfServices"]
    CostOfGoodsSold["CostOfGoodsSold"]
    CostOfGoodsSoldExcluding["CostOfGoodsSold...\nExcludingDepreciation"]
    CostOfGoodsSoldElectric["CostOfGoodsSoldElectric"]
    Canonical["FundamentalConcept::\nCostOfRevenue"]

    CostOfRevenue --> Canonical
    CostOfGoodsAndServicesSold --> Canonical
    CostOfServices --> Canonical
    CostOfGoodsSold --> Canonical
    CostOfGoodsSoldExcluding --> Canonical
    CostOfGoodsSoldElectric --> Canonical

The test_cost_of_revenue function contains 6 assertions verifying that different US GAAP terminology variants all map to FundamentalConcept::CostOfRevenue tests/distill_us_gaap_fundamental_concepts_tests.rs:81-111. This pattern ensures consistent normalization across different reporting styles.
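
A compact sketch of the synonym checks, under the same assumed return type as the previous sketch:

// Hypothetical sketch: several cost-of-revenue synonyms should all include
// the canonical CostOfRevenue variant in their mapping.
#[test]
fn cost_of_revenue_synonyms_share_canonical_concept() {
    for concept in [
        "CostOfRevenue",
        "CostOfGoodsAndServicesSold",
        "CostOfServices",
        "CostOfGoodsSold",
        "CostOfGoodsSoldElectric",
    ] {
        let result = distill_us_gaap_fundamental_concepts(concept);
        assert!(result.contains(&FundamentalConcept::CostOfRevenue));
    }
}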

Industry-Specific Revenue Testing

The revenue tests demonstrate the most complex mapping scenario, where 57+ industry-specific revenue concepts all normalize to FundamentalConcept::Revenues:

Sources: tests/distill_us_gaap_fundamental_concepts_tests.rs:954-1188

graph TB
    subgraph "Standard Revenue Terms"
        Revenues["Revenues"]
        SalesRevenueNet["SalesRevenueNet"]
        SalesRevenueServicesNet["SalesRevenueServicesNet"]
    end

    subgraph "Industry-Specific Terms"
        HealthCare["HealthCareOrganizationRevenue"]
        RealEstate["RealEstateRevenueNet"]
        OilGas["OilAndGasRevenue"]
        Financial["FinancialServicesRevenue"]
        Advertising["AdvertisingRevenue"]
        Subscription["SubscriptionRevenue"]
        Mining["RevenueMineralSales"]
    end

    subgraph "Specialized Terms"
        Franchisor["FranchisorRevenue"]
        Admissions["AdmissionsRevenue"]
        Licenses["LicensesRevenue"]
        Royalty["RoyaltyRevenue"]
        Clearing["ClearingFeesRevenue"]
        Passenger["PassengerRevenue"]
    end

    Canonical["FundamentalConcept::Revenues"]

    Revenues --> Canonical
    SalesRevenueNet --> Canonical
    SalesRevenueServicesNet --> Canonical
    HealthCare --> Canonical
    RealEstate --> Canonical
    OilGas --> Canonical
    Financial --> Canonical
    Advertising --> Canonical
    Subscription --> Canonical
    Mining --> Canonical
    Franchisor --> Canonical
    Admissions --> Canonical
    Licenses --> Canonical
    Royalty --> Canonical
    Clearing --> Canonical
    Passenger --> Canonical

The test_revenues function contains 47 distinct assertions, each verifying that a different industry-specific revenue term maps to the canonical FundamentalConcept::Revenues tests/distill_us_gaap_fundamental_concepts_tests.rs:955-1188. Some terms also map to more specific sub-categories, such as InterestAndDividendIncomeOperating mapping to both InterestAndDividendIncomeOperating and Revenues tests/distill_us_gaap_fundamental_concepts_tests.rs:989-994.

Configuration Manager Testing

The config_manager_tests.rs file tests configuration loading, validation, and error handling using temporary files created with the tempfile crate.

Temporary Config File Test Pattern

graph TB
    CreateTempDir["tempfile::tempdir()"]
    CreatePath["dir.path().join('config.toml')"]
    WriteContents["Write TOML contents\nto file"]
    LoadConfig["ConfigManager::from_config(path)"]
    AssertValues["Assert config values\nmatch expectations"]
    DropTempDir["Drop TempDir\n(automatic cleanup)"]

    CreateTempDir --> CreatePath
    CreatePath --> WriteContents
    WriteContents --> LoadConfig
    LoadConfig --> AssertValues
    AssertValues --> DropTempDir

Sources: tests/config_manager_tests.rs:8-57

The helper function create_temp_config creates a temporary directory and writes the config contents to a config.toml file tests/config_manager_tests.rs:9-17. The function returns both the TempDir (to prevent premature cleanup) and the PathBuf tests/config_manager_tests.rs:16. Tests explicitly drop the TempDir at the end to ensure cleanup tests/config_manager_tests.rs:56.
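
A sketch of such a helper is shown below; the exact signature in config_manager_tests.rs may differ.

use std::fs;
use std::io::Write;
use std::path::PathBuf;
use tempfile::TempDir;

// Hypothetical sketch of the create_temp_config helper described above.
// Returning the TempDir keeps the directory alive until the caller drops it.
fn create_temp_config(contents: &str) -> (TempDir, PathBuf) {
    let dir = tempfile::tempdir().expect("failed to create temp dir");
    let path = dir.path().join("config.toml");

    let mut file = fs::File::create(&path).expect("failed to create config.toml");
    writeln!(file, "{}", contents).expect("failed to write config contents");

    (dir, path)
}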

Invalid Configuration Key Detection

The test suite verifies that unknown configuration keys are rejected with helpful error messages:

| Test | Config Content | Expected Behavior |
|---|---|---|
| test_load_custom_config | Valid keys only | Success, values loaded |
| test_load_non_existent_config | N/A (file doesn't exist) | Error returned |
| test_fails_on_invalid_key | Contains invalid_key | Error with valid keys listed |

Sources: tests/config_manager_tests.rs:68-94

The invalid key test verifies that the error message documents the valid configuration keys and their types tests/config_manager_tests.rs:88-93 (a sketch of this assertion follows the list below):

  • email (String | Null)
  • max_concurrent (Integer | Null)
  • max_retries (Integer | Null)
  • min_delay_ms (Integer | Null)
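
A minimal sketch of that assertion, assuming ConfigManager::from_config returns a Result whose error type implements Display, and reusing the hypothetical create_temp_config helper from above:

// Hypothetical sketch: loading a config with an unknown key should fail,
// and the error text should enumerate the valid keys.
#[test]
fn invalid_key_error_lists_valid_keys() {
    let (dir, path) = create_temp_config("invalid_key = 123\n");

    let err = ConfigManager::from_config(&path)
        .expect_err("unknown keys should be rejected");
    let message = err.to_string();

    for expected in ["email", "max_concurrent", "max_retries", "min_delay_ms"] {
        assert!(message.contains(expected), "missing key doc: {expected}");
    }

    drop(dir); // explicit cleanup, mirroring the real tests
}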

Python Integration Testing Strategy

The Python testing strategy uses Docker Compose to orchestrate a complete test environment with MySQL database and WebSocket services, enabling end-to-end validation of the data pipeline.

graph TB
    ScriptStart["us_gaap_store_integration_test.sh"]
    SetupEnv["Set environment variables\nPROJECT_NAME=us_gaap_it\nCOMPOSE=docker compose -p ..."]
    ActivateVenv["source .venv/bin/activate"]
    InstallDeps["uv pip install -e . --group dev"]
    RegisterTrap["trap 'cleanup' EXIT"]
    StartContainers["docker compose up -d\n--profile test"]
    WaitMySQL["Wait for MySQL ping\nmysqladmin ping loop"]
    CreateDB["CREATE DATABASE us_gaap_test"]
    LoadSchema["mysql < us_gaap_schema_2025.sql"]
    RunPytest["pytest -s -v\ntest_us_gaap_store.py"]
    Cleanup["Cleanup trap:\ndocker compose down\n--volumes --remove-orphans"]

    ScriptStart --> SetupEnv
    SetupEnv --> ActivateVenv
    ActivateVenv --> InstallDeps
    InstallDeps --> RegisterTrap
    RegisterTrap --> StartContainers
    StartContainers --> WaitMySQL
    WaitMySQL --> CreateDB
    CreateDB --> LoadSchema
    LoadSchema --> RunPytest
    RunPytest --> Cleanup

Integration Test Orchestration Flow

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Docker Compose Test Profile

The script uses an isolated Docker Compose project name to prevent conflicts with development containers python/narrative_stack/us_gaap_store_integration_test.sh:8-9:

  • PROJECT_NAME="us_gaap_it"
  • COMPOSE="docker compose -p $PROJECT_NAME --profile test"

The --profile test flag ensures that only test-specific services are started python/narrative_stack/us_gaap_store_integration_test.sh:23, which include:

  • db_test (MySQL container named us_gaap_test_db)
  • simd_r_drive_ws_server_test (WebSocket server)

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:8-23

MySQL Test Database Setup

The integration test script performs a multi-step database initialization:

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:25-35

The script waits for MySQL to be ready using a mysqladmin ping loop python/narrative_stack/us_gaap_store_integration_test.sh:25-28.

It then creates the test database python/narrative_stack/us_gaap_store_integration_test.sh:30-31 and loads the schema from a fixture file python/narrative_stack/us_gaap_store_integration_test.sh:33-35.

Test Cleanup and Isolation

The script registers a cleanup trap to ensure test containers are always removed python/narrative_stack/us_gaap_store_integration_test.sh:14-19.

This trap executes on both successful completion and script failure, ensuring:

  • All containers in the us_gaap_it project are stopped and removed
  • All volumes are deleted (--volumes)
  • Orphaned containers from previous runs are cleaned up (--remove-orphans)

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:14-19

Pytest Execution Environment

The test environment is configured with specific paths and options python/narrative_stack/us_gaap_store_integration_test.sh:37-38:

| Configuration | Value | Purpose |
|---|---|---|
| PYTHONPATH | src | Ensure module imports work |
| pytest flags | -s -v | Show output, verbose mode |
| Test path | tests/integration/test_us_gaap_store.py | Integration test module |

The -s flag disables output capturing, allowing print statements and logs to display immediately during test execution, which is useful for debugging long-running integration tests. The -v flag enables verbose mode with detailed test function names.

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:37-38


Test Fixtures and Mocking Patterns

The codebase uses different mocking strategies appropriate to each language and component.

Rust Mocking with mockito

The mockito library provides HTTP server mocking for testing the SecClient:

graph TB
    CreateServer["mockito::Server::new_async()"]
    ConfigureMock["server.mock(method, path)\n.with_status(code)\n.with_body(json)\n.expect(count)"]
    CreateAsyncMock["create_async().await"]
    GetServerUrl["server.url()"]
    MakeRequest["client.fetch_json(url)"]
    VerifyMock["Mock automatically verifies\nexpected call count"]

    CreateServer --> ConfigureMock
    ConfigureMock --> CreateAsyncMock
    CreateAsyncMock --> GetServerUrl
    GetServerUrl --> MakeRequest
    MakeRequest --> VerifyMock

Sources: tests/sec_client_tests.rs:36-62

Mock configuration example from test_fetch_json_without_retry_success tests/sec_client_tests.rs:39-45 (a sketch follows the list below):

  • Method: GET
  • Path: /files/company_tickers.json
  • Status: 200
  • Header: Content-Type: application/json
  • Body: JSON string with sample ticker data
  • Call using server.url() to get the mock endpoint
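
A short sketch of that success-case configuration, with the sample body abbreviated; the ticker JSON, the SecClient constructor, and the fetch_json return type are assumptions.

// Hypothetical sketch of the success-case mock described by the list above.
#[tokio::test]
async fn fetch_json_success_sketch() {
    let mut server = mockito::Server::new_async().await;
    let mock = server
        .mock("GET", "/files/company_tickers.json")
        .with_status(200)
        .with_header("Content-Type", "application/json")
        .with_body(r#"{"0": {"cik_str": 320193, "ticker": "AAPL"}}"#)
        .expect(1)
        .create_async()
        .await;

    let mut config = AppConfig::default();
    config.email = Some("test@example.com".to_string());
    let client = SecClient::new(&ConfigManager::from(config)).unwrap();

    // server.url() yields the base URL of the mock endpoint.
    let url = format!("{}/files/company_tickers.json", server.url());
    let json = client.fetch_json(&url).await.expect("request should succeed");
    assert_eq!(json["0"]["ticker"], "AAPL");

    mock.assert_async().await;
}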

Temporary File Fixtures

Configuration tests use the tempfile crate to create isolated test environments:

graph LR
    CreateTempDir["tempfile::tempdir()"]
    GetPath["dir.path().join('config.toml')"]
    CreateFile["fs::File::create(path)"]
    WriteContents["writeln!(file, contents)"]
    ReturnBoth["Return (TempDir, PathBuf)"]

    CreateTempDir --> GetPath
    GetPath --> CreateFile
    CreateFile --> WriteContents
    WriteContents --> ReturnBoth

Sources: tests/config_manager_tests.rs:8-17

The create_temp_config helper returns both the TempDir and the PathBuf to prevent premature cleanup tests/config_manager_tests.rs:16. The TempDir must remain in scope until after the test completes, as dropping it deletes the temporary directory.

SQL Schema Fixtures

The Python integration tests use a versioned SQL schema file as a fixture:

| File | Purpose | Usage |
|---|---|---|
| tests/integration/assets/us_gaap_schema_2025.sql | Define us_gaap_test database structure | Loaded via docker exec python/narrative_stack/us_gaap_store_integration_test.sh:33-35 |

This approach ensures tests run against a known database structure that matches production schema, enabling reliable testing of database operations and query logic.

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:33-35


Test Execution Commands

The following table summarizes how to run different test suites:

| Test Suite | Command | Working Directory | Prerequisites |
|---|---|---|---|
| All Rust unit tests | cargo test | Repository root | Rust toolchain |
| Specific Rust test file | cargo test --test sec_client_tests | Repository root | Rust toolchain |
| Single Rust test function | cargo test test_user_agent | Repository root | Rust toolchain |
| Python integration tests | ./us_gaap_store_integration_test.sh | python/narrative_stack/ | Docker, Git LFS, uv |
| Python integration with pytest | pytest -s -v tests/integration/ | python/narrative_stack/ | Docker, containers running |

Prerequisites for Integration Tests

The Python integration test script requires python/narrative_stack/us_gaap_store_integration_test.sh:1-2:

  • Git LFS enabled and updated (for large test fixtures)
  • Docker with Docker Compose v2
  • Python virtual environment with uv package manager
  • Test data files committed with Git LFS

Sources: tests/sec_client_tests.rs:1-159 tests/distill_us_gaap_fundamental_concepts_tests.rs:1-1275 tests/config_manager_tests.rs:1-95 python/narrative_stack/us_gaap_store_integration_test.sh:1-39