This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Development Guide

Purpose and Scope

This guide provides an overview of development practices, code organization, and workflows for contributing to the rust-sec-fetcher project. It covers environment setup, code organization principles, development workflows, and common development tasks.

Development Environment Setup

Prerequisites

The project requires the following tools installed:

| Tool | Purpose | Version Requirement |
| --- | --- | --- |
| Rust | Core application development | 1.87+ |
| Python | ML pipeline and preprocessing | 3.8+ |
| Docker | Integration testing and services | Latest stable |
| Git LFS | Large file support for test assets | Latest stable |
| MySQL | Database for US GAAP storage | 5.7+ or 8.0+ |

Rust Development Setup

  1. Clone the repository and navigate to the root directory.

  2. Build the Rust application, for example with cargo build.

  3. Run the tests to verify the setup, for example with cargo test.

The Rust workspace is configured in Cargo.toml with all necessary dependencies declared. Key development dependencies include:

  • mockito for HTTP mocking in tests
  • tempfile for temporary file/directory creation in tests
  • tokio test macros for async test support

Python Development Setup

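  1. Create and activate a virtual environment.

  2. Install dependencies using uv (the integration test script uses uv pip install -e . --group dev).

  3. Verify the installation by running the integration test script, python/narrative_stack/us_gaap_store_integration_test.sh (requires Docker).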

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Configuration Setup

The application requires a configuration file at ~/.config/sec-fetcher/config.toml or a custom path specified via command-line argument. Minimum configuration:

For non-interactive testing, construct AppConfig directly in test code, as shown in tests/config_manager_tests.rs:36-57.

Sources: tests/config_manager_tests.rs:36-57 tests/sec_client_tests.rs:8-20

Code Organization and Architecture

Repository Structure

Sources: src/network/sec_client.rs:1-181 tests/config_manager_tests.rs:1-95 tests/sec_client_tests.rs:1-159 python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Module Dependency Flow

The dependency flow follows a layered architecture:

  1. Configuration Layer: ConfigManager loads settings from TOML files and credentials from the keyring
  2. Network Layer: SecClient wraps the HTTP client with caching and throttling middleware
  3. Data Fetching Layer: Network module functions fetch raw data from SEC APIs
  4. Transformation Layer: Transformers normalize raw data into standardized concepts
  5. Model Layer: Data structures represent domain entities
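
The layering above can be pictured as a chain of plain function calls, each depending only on the layer before it. The following is a minimal sketch, not the project's code: every type and function here (the single-field `AppConfig`, `SecClient::get`, `fetch_raw`, `transform`, `Fact`) is a hypothetical stand-in, and the network call is replaced by a canned string so the example runs offline.

```rust
// Hypothetical stand-ins for each layer; none of these mirror the real crate's types.

// 1. Configuration layer: settings the client needs.
#[derive(Debug, Clone)]
struct AppConfig {
    email: String,
}

// 2. Network layer: owns the configuration and performs requests.
struct SecClient {
    config: AppConfig,
}

impl SecClient {
    fn from_config(config: AppConfig) -> Self {
        SecClient { config }
    }

    fn user_agent(&self) -> String {
        format!("example-app ({})", self.config.email)
    }

    // Stubbed request: returns canned "tag=value" text instead of calling the SEC API.
    fn get(&self, _url: &str) -> Result<String, String> {
        Ok("AssetsCurrent=100".to_string())
    }
}

// 5. Model layer: a domain entity produced by the pipeline.
#[derive(Debug, PartialEq)]
struct Fact {
    concept: String,
    value: i64,
}

// 3. Data fetching layer: asks the client for raw data.
fn fetch_raw(client: &SecClient, url: &str) -> Result<String, String> {
    client.get(url)
}

// 4. Transformation layer: normalizes raw text into a model value.
fn transform(raw: &str) -> Result<Fact, String> {
    let (tag, value) = raw.split_once('=').ok_or("missing '=' separator")?;
    let value = value.parse::<i64>().map_err(|e| e.to_string())?;
    Ok(Fact { concept: tag.to_string(), value })
}

fn main() {
    let client = SecClient::from_config(AppConfig { email: "dev@example.com".to_string() });
    println!("user agent: {}", client.user_agent());
    match fetch_raw(&client, "https://example.invalid/facts").and_then(|raw| transform(&raw)) {
        Ok(fact) => println!("{} = {}", fact.concept, fact.value),
        Err(err) => eprintln!("pipeline failed: {}", err),
    }
}
```

Because each layer only sees the output of the one before it, a transformer can be tested with a plain string and no client at all.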

Sources: src/network/sec_client.rs:1-181 tests/config_manager_tests.rs:1-95

Development Workflow

Standard Development Cycle

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Running Tests Locally

Rust Unit Tests

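Run all Rust tests with cargo test.

Run a specific test module by naming its file, for example cargo test --test config_manager_tests.

Run with test output visible by passing --nocapture after the separator, for example cargo test -- --nocapture.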

Test Structure Mapping:

| Test File | Component Tested | Key Test Functions |
| --- | --- | --- |
| tests/config_manager_tests.rs | ConfigManager | test_load_custom_config, test_load_non_existent_config, test_fails_on_invalid_key |
| tests/sec_client_tests.rs | SecClient | test_user_agent, test_fetch_json_without_retry_success, test_fetch_json_with_retry_failure |

Sources: tests/config_manager_tests.rs:1-95 tests/sec_client_tests.rs:1-159

Python Integration Tests

Integration tests require Docker services and are run via the provided shell script.

This script performs the following steps as defined in python/narrative_stack/us_gaap_store_integration_test.sh:1-39:

  1. Activates Python virtual environment
  2. Installs dependencies with uv pip install -e . --group dev
  3. Starts Docker Compose services (db_test, simd_r_drive_ws_server_test)
  4. Waits for MySQL availability
  5. Creates us_gaap_test database
  6. Loads schema from tests/integration/assets/us_gaap_schema_2025.sql
  7. Runs pytest integration tests
  8. Tears down containers on exit

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Writing Tests

Unit Test Pattern (Rust)

The codebase follows standard Rust testing patterns with mockito for HTTP mocking:

Key patterns demonstrated in tests/sec_client_tests.rs:35-62:

  • Use #[tokio::test] for async tests
  • Create mockito::Server for HTTP endpoint mocking
  • Construct AppConfig programmatically for test isolation
  • Use ConfigManager::from_app_config() to bypass file system dependencies
  • Assert on specific JSON fields in responses
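
To make the mocking idea concrete without external crates, here is a standard-library-only sketch of what a mock HTTP server does: bind to a free local port, answer one request with a canned JSON body, and hand the raw request back for inspection. This is a stand-in for mockito, not its API; `spawn_mock_server`, `http_get`, the request path, and the JSON body are all hypothetical.

```rust
use std::io::{Read, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

// Starts a one-shot local HTTP server that answers a single request with `body`.
// Returns the "127.0.0.1:PORT" address and a handle that yields the raw request text.
fn spawn_mock_server(body: &'static str) -> (String, thread::JoinHandle<String>) {
    // Port 0 asks the OS for any free port, so parallel tests do not collide.
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind failed");
    let addr = listener.local_addr().expect("no local address").to_string();
    let handle = thread::spawn(move || {
        let (mut stream, _) = listener.accept().expect("accept failed");
        // Read until the blank line that ends the request headers.
        let mut request = Vec::new();
        let mut buf = [0u8; 512];
        while !request.windows(4).any(|w| w == b"\r\n\r\n") {
            let n = stream.read(&mut buf).expect("read failed");
            if n == 0 {
                break;
            }
            request.extend_from_slice(&buf[..n]);
        }
        let response = format!(
            "HTTP/1.1 200 OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
            body.len(),
            body
        );
        stream.write_all(response.as_bytes()).expect("write failed");
        String::from_utf8_lossy(&request).to_string()
    });
    (addr, handle)
}

// Minimal HTTP GET over a raw socket; returns only the response body.
fn http_get(addr: &str, path: &str, user_agent: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    let request = format!(
        "GET {} HTTP/1.1\r\nHost: {}\r\nUser-Agent: {}\r\nConnection: close\r\n\r\n",
        path, addr, user_agent
    );
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    let body = response
        .split_once("\r\n\r\n")
        .map(|(_, body)| body.to_string())
        .unwrap_or_default();
    Ok(body)
}

fn main() {
    let (addr, server) = spawn_mock_server("{\"cik\": 320193}");
    let body = http_get(&addr, "/example.json", "test-agent (dev@example.com)").expect("request failed");
    let request = server.join().expect("server thread panicked");
    println!("body: {}", body);
    println!("request line: {}", request.lines().next().unwrap_or(""));
}
```

In the project's own tests, mockito plays the role of `spawn_mock_server`, and the assertions target fields of the parsed JSON rather than substrings.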

Sources: tests/sec_client_tests.rs:35-62

Test Fixture Pattern

The codebase uses temporary directories for file-based tests:

This pattern ensures test isolation and automatic cleanup, as shown in tests/config_manager_tests.rs:8-17.
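
The cleanup guarantee comes from tying the directory's lifetime to a value that deletes it on drop. The sketch below imitates that behaviour with the standard library only; it is not the tempfile crate's implementation, and `ScratchDir`, `write_config`, and the `email` key in the sample file are hypothetical names chosen for illustration.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

// RAII guard: creates a uniquely named directory and deletes it when dropped.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(prefix: &str) -> std::io::Result<Self> {
        // Process id plus a nanosecond timestamp keeps names distinct across test runs.
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos())
            .unwrap_or(0);
        let path = std::env::temp_dir().join(format!("{}-{}-{}", prefix, process::id(), nanos));
        fs::create_dir_all(&path)?;
        Ok(ScratchDir { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because drop cannot return them.
        let _ = fs::remove_dir_all(&self.path);
    }
}

// Writes a config file into the scratch directory and returns its path.
fn write_config(dir: &ScratchDir, contents: &str) -> std::io::Result<PathBuf> {
    let file = dir.path().join("config.toml");
    fs::write(&file, contents)?;
    Ok(file)
}

fn main() -> std::io::Result<()> {
    let dir = ScratchDir::new("sec-fetcher-example")?;
    let file = write_config(&dir, "email = \"dev@example.com\"\n")?;
    println!("wrote {}", file.display());
    Ok(())
    // `dir` is dropped here, which removes the directory and the file inside it.
}
```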

Sources: tests/config_manager_tests.rs:8-17

Error Case Testing

Test error conditions explicitly:

This test from tests/sec_client_tests.rs:93-120 verifies retry behavior by expecting exactly 3 HTTP requests (initial + 2 retries) before failing.
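
The "initial + 2 retries" arithmetic is easy to get off by one, so a small counting example helps. The helper below is a hypothetical, synchronous sketch of a retry loop (the project's retries are handled by its throttling middleware, whose internals are not shown here); it only demonstrates why `max_retries = 2` produces exactly three calls.

```rust
// Calls `op` once, then retries up to `max_retries` more times while it fails.
// Returns the last error if every attempt fails.
fn fetch_with_retry<T, E, F>(max_retries: u32, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Retries exhausted: surface the most recent error.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}

fn main() {
    let mut requests = 0;
    let result: Result<(), String> = fetch_with_retry(2, |_attempt| {
        requests += 1;
        Err("HTTP 500".to_string())
    });
    // One initial request plus two retries.
    println!("result = {:?}, requests = {}", result, requests);
}
```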

Sources: tests/sec_client_tests.rs:93-120

Common Development Tasks

Adding a New SEC Data Endpoint

To add support for fetching a new SEC data endpoint:

  1. Add URL enum variant in src/models/url.rs
  2. Create fetch function in src/network/ following the pattern of existing functions
  3. Define data models in src/models/ for the response structure
  4. Add transformation logic in src/transformers/ if normalization is needed
  5. Write unit tests in tests/ using mockito::Server for mocking
  6. Update main.rs to integrate the new endpoint into the processing pipeline

Example function signature pattern:
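
As a rough illustration of steps 1 to 3, the sketch below adds a URL variant, a small response model, and a fetch function that takes a client by reference and returns a `Result`. It is an assumption-laden sketch: the `Url` variants, `JsonSource` trait, `FetchError`, `CompanyFactsSummary`, and `fetch_company_facts` are hypothetical, and the real fetch functions presumably are async and take the project's `SecClient` rather than a trait object.

```rust
// Step 1: hypothetical URL enum; the real variants live in src/models/url.rs.
enum Url {
    CompanyTickers,
    // Variant added for the new endpoint.
    CompanyFacts { cik: u64 },
}

impl Url {
    fn value(&self) -> String {
        match self {
            Url::CompanyTickers => "https://www.sec.gov/files/company_tickers.json".to_string(),
            // CIKs are zero-padded to ten digits in this URL scheme.
            Url::CompanyFacts { cik } => {
                format!("https://data.sec.gov/api/xbrl/companyfacts/CIK{:010}.json", cik)
            }
        }
    }
}

#[derive(Debug, PartialEq)]
enum FetchError {
    Http(u16),
    Parse(String),
}

// Stand-in for the HTTP client so the sketch runs without network access.
trait JsonSource {
    fn get_text(&self, url: &str) -> Result<String, FetchError>;
}

// Step 3: hypothetical data model for the response.
#[derive(Debug, PartialEq)]
struct CompanyFactsSummary {
    cik: u64,
    payload_len: usize,
}

// Step 2: fetch function: client by reference, typed arguments, Result return.
fn fetch_company_facts(client: &dyn JsonSource, cik: u64) -> Result<CompanyFactsSummary, FetchError> {
    let url = Url::CompanyFacts { cik }.value();
    let body = client.get_text(&url)?;
    if body.trim().is_empty() {
        return Err(FetchError::Parse("empty response body".to_string()));
    }
    Ok(CompanyFactsSummary { cik, payload_len: body.len() })
}

// Canned client used in place of a mock server.
struct CannedSource {
    status: u16,
    body: &'static str,
}

impl JsonSource for CannedSource {
    fn get_text(&self, _url: &str) -> Result<String, FetchError> {
        if self.status == 200 {
            Ok(self.body.to_string())
        } else {
            Err(FetchError::Http(self.status))
        }
    }
}

fn main() {
    println!("tickers url: {}", Url::CompanyTickers.value());
    let client = CannedSource { status: 200, body: "{\"facts\":{}}" };
    match fetch_company_facts(&client, 320193) {
        Ok(summary) => println!("cik {} returned {} bytes", summary.cik, summary.payload_len),
        Err(err) => eprintln!("fetch failed: {:?}", err),
    }
}
```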

Adding a New FundamentalConcept Mapping

The distill_us_gaap_fundamental_concepts function maps raw SEC concept names to the FundamentalConcept enum. To add a new concept:

  1. Add enum variant to FundamentalConcept in src/models/fundamental_concept.rs
  2. Update the match arms in src/transformers/distill_us_gaap_fundamental_concepts.rs
  3. Add test case to verify the mapping in tests/distill_tests.rs

See the existing mapping patterns in the transformer module for hierarchical mappings (concepts that map to multiple parent categories).
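
A hierarchical mapping can be sketched as a match that returns every category a raw concept belongs to, most specific first. The function name comes from the text above, but the signature, the four enum variants, and the chosen US GAAP tags are assumptions made for illustration; consult the transformer module for the real mappings.

```rust
// Hypothetical subset of the enum; the real one lives in src/models/fundamental_concept.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FundamentalConcept {
    Assets,
    CurrentAssets,
    Liabilities,
    CurrentLiabilities,
}

// Assumed signature: maps a raw US GAAP tag to zero or more fundamental concepts.
fn distill_us_gaap_fundamental_concepts(us_gaap_concept: &str) -> Option<Vec<FundamentalConcept>> {
    match us_gaap_concept {
        "Assets" => Some(vec![FundamentalConcept::Assets]),
        // Hierarchical mapping: the specific category first, then its parent.
        "AssetsCurrent" => Some(vec![
            FundamentalConcept::CurrentAssets,
            FundamentalConcept::Assets,
        ]),
        "Liabilities" => Some(vec![FundamentalConcept::Liabilities]),
        "LiabilitiesCurrent" => Some(vec![
            FundamentalConcept::CurrentLiabilities,
            FundamentalConcept::Liabilities,
        ]),
        // Unmapped tags are reported as None rather than guessed.
        _ => None,
    }
}

fn main() {
    for tag in ["Assets", "AssetsCurrent", "SomethingUnmapped"] {
        println!("{} -> {:?}", tag, distill_us_gaap_fundamental_concepts(tag));
    }
}
```

Adding a concept then means one new variant, one new match arm, and one new assertion in the test file.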

Modifying HTTP Client Behavior

The SecClient is configured in src/network/sec_client.rs:21-89. Key configuration points:

| Configuration | Location | Purpose |
| --- | --- | --- |
| CachePolicy | src/network/sec_client.rs:45-50 | Controls cache TTL and behavior |
| ThrottlePolicy | src/network/sec_client.rs:53-59 | Controls rate limiting and retries |
| User-Agent | src/network/sec_client.rs:91-108 | Constructs SEC-compliant User-Agent header |

To modify throttling behavior, adjust the ThrottlePolicy parameters:

  • base_delay_ms: Minimum delay between requests
  • max_concurrent: Maximum concurrent requests
  • max_retries: Number of retry attempts on failure
  • adaptive_jitter_ms: Random jitter to prevent thundering herd
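
To see how these four parameters might interact, the sketch below defines a local struct with the same field names and computes a per-attempt delay. The field names come from the list above; the field types, the exponential backoff formula, and the `delay_for_attempt` and `should_retry` helpers are assumptions for illustration and may not match the middleware's actual behaviour. The jitter sample is passed in so the example stays deterministic.

```rust
use std::time::Duration;

// Local stand-in with the parameter names listed above; the types are assumed.
#[derive(Debug, Clone)]
struct ThrottlePolicy {
    base_delay_ms: u64,
    max_concurrent: usize,
    max_retries: usize,
    adaptive_jitter_ms: u64,
}

impl ThrottlePolicy {
    // Assumed scheme: the base delay doubles per retry, plus jitter bounded by adaptive_jitter_ms.
    // `jitter_sample` would normally come from a random number generator.
    fn delay_for_attempt(&self, attempt: u32, jitter_sample: u64) -> Duration {
        let backoff = self.base_delay_ms.saturating_mul(1u64 << attempt.min(16));
        let jitter = if self.adaptive_jitter_ms == 0 {
            0
        } else {
            jitter_sample % (self.adaptive_jitter_ms + 1)
        };
        Duration::from_millis(backoff.saturating_add(jitter))
    }

    // True while another retry is still allowed after `attempt` failed retries.
    fn should_retry(&self, attempt: usize) -> bool {
        attempt < self.max_retries
    }
}

fn main() {
    let policy = ThrottlePolicy {
        base_delay_ms: 500,
        max_concurrent: 1,
        max_retries: 2,
        adaptive_jitter_ms: 100,
    };
    println!("max concurrent requests: {}", policy.max_concurrent);
    for attempt in 0..3u32 {
        println!(
            "attempt {}: wait {:?}, retry allowed afterwards: {}",
            attempt,
            policy.delay_for_attempt(attempt, 30),
            policy.should_retry(attempt as usize)
        );
    }
}
```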

Sources: src/network/sec_client.rs:21-89

Working with Caches

The system uses two cache types managed by the Caches module:

  1. HTTP Cache: Stores raw HTTP responses with configurable TTL (default: 1 week)
  2. Preprocessor Cache: Stores transformed/preprocessed data

Cache instances are accessed via Caches::get_http_cache_store(), as shown in src/network/sec_client.rs:73.

During development, you may need to clear caches when testing data transformations. Cache data is persisted via the simd-r-drive backend.
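
The effect of a TTL, and of clearing the cache during development, can be shown with a small in-memory model. This sketch is not the project's cache: the real store is persisted through simd-r-drive, whereas `TtlCache` and its methods are hypothetical and keep everything in a `HashMap`. Timestamps are passed in explicitly so expiry can be demonstrated without waiting.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// One week, matching the default TTL mentioned above.
const ONE_WEEK: Duration = Duration::from_secs(7 * 24 * 60 * 60);

// Hypothetical in-memory cache keyed by URL; each entry remembers when it was stored.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    fn insert_at(&mut self, url: &str, body: &str, now: Instant) {
        self.entries.insert(url.to_string(), (now, body.to_string()));
    }

    // Returns the cached body only while the entry is younger than the TTL.
    fn get_at(&self, url: &str, now: Instant) -> Option<&str> {
        match self.entries.get(url) {
            Some((stored, body)) if now.duration_since(*stored) < self.ttl => Some(body.as_str()),
            _ => None,
        }
    }

    // Drops every entry, e.g. before re-testing a data transformation.
    fn clear(&mut self) {
        self.entries.clear();
    }
}

fn main() {
    let mut cache = TtlCache::new(ONE_WEEK);
    let stored_at = Instant::now();
    cache.insert_at("https://example.invalid/a.json", "{}", stored_at);
    let an_hour_later = stored_at + Duration::from_secs(3600);
    println!("after one hour: {:?}", cache.get_at("https://example.invalid/a.json", an_hour_later));
    println!("after one week: {:?}", cache.get_at("https://example.invalid/a.json", stored_at + ONE_WEEK));
    cache.clear();
    println!("after clear: {:?}", cache.get_at("https://example.invalid/a.json", an_hour_later));
}
```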

Sources: src/network/sec_client.rs:73

Code Quality Standards

TODO Comments and Technical Debt

The codebase uses TODO comments to mark areas for improvement. Examples from src/network/sec_client.rs:

When adding TODO comments:

  1. Be specific about what needs to be done
  2. Include context about why it's not done now
  3. Reference related issues if applicable

Panic vs Result

The codebase follows Rust best practices:

  • Use Result<T, E> for recoverable errors
  • Use panic! only for non-recoverable errors or programming errors

Example from src/network/sec_client.rs:95-98:

This panics because an invalid email makes all SEC API calls fail, representing a configuration error rather than a runtime error.
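
The distinction can be sketched side by side: a configuration mistake aborts immediately, while a failed request is returned to the caller. Everything below is hypothetical, including the deliberately simple email check, the User-Agent format, and the `check_status` helper; it illustrates the policy, not the code in sec_client.rs.

```rust
// Hypothetical, deliberately simple plausibility check (not a full RFC 5322 validator).
fn is_plausible_email(email: &str) -> bool {
    match email.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
        }
        None => false,
    }
}

// Configuration error: no request can succeed with a bad contact email, so panic early.
fn build_user_agent(app_name: &str, version: &str, email: &str) -> String {
    if !is_plausible_email(email) {
        panic!("Invalid email format: {}", email);
    }
    format!("{}/{} ({})", app_name, version, email)
}

// Runtime error: a failed request is expected occasionally, so return a Result.
fn check_status(status: u16) -> Result<u16, String> {
    if (200..300).contains(&status) {
        Ok(status)
    } else {
        Err(format!("request failed with HTTP status {}", status))
    }
}

fn main() {
    println!("{}", build_user_agent("example-app", "0.1.0", "dev@example.com"));
    match check_status(503) {
        Ok(status) => println!("ok: {}", status),
        Err(err) => println!("recoverable: {}", err),
    }
}
```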

Sources: src/network/sec_client.rs:95-98

Error Validation in Tests

Configuration validation is tested by verifying error messages contain expected content, as shown in tests/config_manager_tests.rs:68-94:

This pattern ensures configuration errors are informative to users.
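
A test in this style asserts on the content of the error message, not only on the fact that an error occurred. The sketch below uses a hypothetical `validate_keys` function and a made-up list of allowed keys; the project's real validation and key names may differ.

```rust
// Hypothetical set of accepted configuration keys.
const ALLOWED_KEYS: [&str; 3] = ["email", "max_concurrent", "max_retries"];

// Rejects the first unknown key with a message that names it and lists the valid ones.
fn validate_keys<'a>(keys: impl IntoIterator<Item = &'a str>) -> Result<(), String> {
    for key in keys {
        if !ALLOWED_KEYS.iter().any(|allowed| *allowed == key) {
            return Err(format!(
                "unknown configuration key `{}`; valid keys are: {}",
                key,
                ALLOWED_KEYS.join(", ")
            ));
        }
    }
    Ok(())
}

fn main() {
    match validate_keys(vec!["email", "emial"]) {
        Ok(()) => println!("configuration accepted"),
        Err(message) => {
            // A test would assert on these same properties of the message.
            assert!(message.contains("emial"));
            assert!(message.contains("valid keys are"));
            println!("rejected: {}", message);
        }
    }
}
```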

Sources: tests/config_manager_tests.rs:68-94

Integration Test Architecture

The integration test script from python/narrative_stack/us_gaap_store_integration_test.sh:1-39 orchestrates:

  1. Python environment setup with dependencies
  2. Docker Compose service startup (isolated project name: us_gaap_it)
  3. MySQL container health check via mysqladmin ping
  4. Database creation and schema loading
  5. pytest execution with verbose output
  6. Automatic cleanup via EXIT trap

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Best Practices Summary

| Practice | Implementation | Reference |
| --- | --- | --- |
| Test isolation | Use temporary directories and AppConfig::default() | tests/config_manager_tests.rs:9-17 |
| HTTP mocking | Use mockito::Server for endpoint simulation | tests/sec_client_tests.rs:37-45 |
| Async testing | Use #[tokio::test] attribute | tests/sec_client_tests.rs:35 |
| Error handling | Prefer Result<T, E> over panic | src/network/sec_client.rs:140-165 |
| Configuration | Use ConfigManager::from_app_config() in tests | tests/sec_client_tests.rs:10 |
| Integration testing | Use Docker Compose with isolated project names | python/narrative_stack/us_gaap_store_integration_test.sh:8 |
| Cleanup | Use trap handlers for guaranteed cleanup | python/narrative_stack/us_gaap_store_integration_test.sh:14-19 |

Sources: tests/config_manager_tests.rs:9-17 tests/sec_client_tests.rs:35-62 src/network/sec_client.rs:140-165 python/narrative_stack/us_gaap_store_integration_test.sh:1-39