This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Development Guide

Purpose and Scope

This guide describes how to contribute to the rust-sec-fetcher project. It covers environment setup, how the code is organized, the development and testing workflow, and common development tasks.

For detailed information about specific development topics, see:

Testing strategies and test fixtures: Testing Strategy
Continuous integration and automated testing: CI/CD Pipeline
Docker container configuration: Docker Deployment

Development Environment Setup

Prerequisites

The project requires the following tools installed:

Tool	Purpose	Version Requirement
Rust	Core application development	1.87+
Python	ML pipeline and preprocessing	3.8+
Docker	Integration testing and services	Latest stable
Git LFS	Large file support for test assets	Latest stable
MySQL	Database for US GAAP storage	5.7+ or 8.0+

Rust Development Setup

Clone the repository and navigate to the root directory.
Build the Rust application: `cargo build`
Run tests to verify setup: `cargo test`

The Rust workspace is configured in Cargo.toml with all necessary dependencies declared. Key development dependencies include:

mockito for HTTP mocking in tests
tempfile for temporary file/directory creation in tests
tokio test macros for async test support

Python Development Setup

Create a virtual environment:
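Install dependencies using uv: `uv pip install -e . --group dev`
Verify installation by running the integration test script (requires Docker): `python/narrative_stack/us_gaap_store_integration_test.sh`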

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Configuration Setup

The application requires a configuration file at ~/.config/sec-fetcher/config.toml or a custom path specified via command-line argument. Minimum configuration:

For non-interactive testing, construct AppConfig directly in test code, as shown in tests/config_manager_tests.rs:36-57.

Sources: tests/config_manager_tests.rs:36-57 tests/sec_client_tests.rs:8-20

Code Organization and Architecture

Repository Structure

Sources: src/network/sec_client.rs:1-181 tests/config_manager_tests.rs:1-95 tests/sec_client_tests.rs:1-159 python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Module Dependency Flow

The dependency flow follows a layered architecture:

Configuration Layer: ConfigManager loads settings from TOML files and credentials from the keyring
Network Layer: SecClient wraps the HTTP client with caching and throttling middleware
Data Fetching Layer: Network module functions fetch raw data from SEC APIs
Transformation Layer: Transformers normalize raw data into standardized concepts
Model Layer: Data structures represent domain entities

Sources: src/network/sec_client.rs:1-181 tests/config_manager_tests.rs:1-95

Development Workflow

Standard Development Cycle

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Running Tests Locally

Rust Unit Tests

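Run all Rust tests with cargo: `cargo test`

Run a specific test file, for example the ConfigManager tests: `cargo test --test config_manager_tests`

Run with test output visible: `cargo test -- --nocapture`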

Test Structure Mapping:

Test File	Component Tested	Key Test Functions
`tests/config_manager_tests.rs`	`ConfigManager`	`test_load_custom_config`, `test_load_non_existent_config`, `test_fails_on_invalid_key`
`tests/sec_client_tests.rs`	`SecClient`	`test_user_agent`, `test_fetch_json_without_retry_success`, `test_fetch_json_with_retry_failure`

Sources: tests/config_manager_tests.rs:1-95 tests/sec_client_tests.rs:1-159

Python Integration Tests

Integration tests require Docker services. Run them via the provided shell script, python/narrative_stack/us_gaap_store_integration_test.sh.

This script performs the following steps as defined in python/narrative_stack/us_gaap_store_integration_test.sh:1-39:

Activates Python virtual environment
Installs dependencies with uv pip install -e . --group dev
Starts Docker Compose services (db_test, simd_r_drive_ws_server_test)
Waits for MySQL availability
Creates us_gaap_test database
Loads schema from tests/integration/assets/us_gaap_schema_2025.sql
Runs pytest integration tests
Tears down containers on exit

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Writing Tests

Unit Test Pattern (Rust)

The codebase follows standard Rust testing patterns with mockito for HTTP mocking:

Key patterns demonstrated in tests/sec_client_tests.rs:35-62:

Use #[tokio::test] for async tests
Create mockito::Server for HTTP endpoint mocking
Construct AppConfig programmatically for test isolation
Use ConfigManager::from_app_config() to bypass file system dependencies
Assert on specific JSON fields in responses

Sources: tests/sec_client_tests.rs:35-62

Test Fixture Pattern

The codebase uses temporary directories for file-based tests:

This pattern ensures test isolation and automatic cleanup, as shown in tests/config_manager_tests.rs:8-17.
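The isolation-and-cleanup idea can be sketched with the standard library alone. This is not the project's fixture code: the project's tests use the tempfile crate, and the `ScratchDir` type and `write_and_read_config` helper below are hypothetical names introduced only for illustration. The sketch creates a uniquely named directory under the system temp directory and removes it when the guard is dropped.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Counter that keeps directory names unique within one process.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Hypothetical std-only stand-in for a temporary directory guard.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(prefix: &str) -> std::io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        let name = format!("{prefix}-{}-{id}", std::process::id());
        let path = std::env::temp_dir().join(name);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    // Cleanup runs even when a test returns early or panics.
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Writes a config file into a scratch directory, reads it back,
/// and returns the contents plus the (now removed) directory path.
fn write_and_read_config(contents: &str) -> std::io::Result<(String, PathBuf)> {
    let dir = ScratchDir::new("sec-fetcher-test")?;
    let file = dir.path().join("config.toml");
    fs::write(&file, contents)?;
    let read_back = fs::read_to_string(&file)?;
    Ok((read_back, dir.path().to_path_buf()))
}

fn main() -> std::io::Result<()> {
    let (text, dir) = write_and_read_config("email = \"dev@example.com\"\n")?;
    println!("read {} bytes; directory still exists: {}", text.len(), dir.exists());
    Ok(())
}
```

Because the directory is tied to the guard's lifetime, each test gets its own files and nothing is left behind for the next run.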

Sources: tests/config_manager_tests.rs:8-17

Error Case Testing

Test error conditions explicitly:

This test from tests/sec_client_tests.rs:93-120 verifies retry behavior by expecting exactly 3 HTTP requests (initial + 2 retries) before failing.
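The "initial + 2 retries" arithmetic can be illustrated without any HTTP machinery. The sketch below is an assumption-laden stand-in, not the SecClient retry logic: `fetch_with_retry` and `count_attempts_on_failure` are hypothetical helpers, and the failing operation is a closure rather than a mocked endpoint.

```rust
/// Hypothetical helper: calls `op` once, then retries up to `max_retries`
/// more times, returning the last error if every attempt fails.
fn fetch_with_retry<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut retries_used = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if retries_used >= max_retries => return Err(err),
            Err(_) => retries_used += 1,
        }
    }
}

/// Runs an always-failing operation and reports how many times it was called.
fn count_attempts_on_failure(max_retries: u32) -> u32 {
    let mut calls = 0;
    let result: Result<(), &str> = fetch_with_retry(max_retries, || {
        calls += 1;
        Err("HTTP 500")
    });
    assert!(result.is_err());
    calls
}

fn main() {
    // With 2 retries configured, the operation is attempted 3 times in total.
    println!("attempts: {}", count_attempts_on_failure(2));
}
```

Asserting on the exact call count, rather than only on the final error, is what distinguishes "failed after retrying" from "failed immediately".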

Sources: tests/sec_client_tests.rs:93-120

Common Development Tasks

Adding a New SEC Data Endpoint

To add support for fetching a new SEC data endpoint:

Add URL enum variant in src/models/url.rs
Create fetch function in src/network/ following the pattern of existing functions
Define data models in src/models/ for the response structure
Add transformation logic in src/transformers/ if normalization is needed
Write unit tests in tests/ using mockito::Server for mocking
Update main.rs to integrate the new endpoint into the processing pipeline

Example function signature pattern:

Adding a New FundamentalConcept Mapping

The distill_us_gaap_fundamental_concepts function maps raw SEC concept names to the FundamentalConcept enum. To add a new concept:

Add enum variant to FundamentalConcept in src/models/fundamental_concept.rs
Update the match arms in src/transformers/distill_us_gaap_fundamental_concepts.rs
Add test case to verify the mapping in tests/distill_tests.rs

See the existing mapping patterns in the transformer module for hierarchical mappings (concepts that map to multiple parent categories).
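As a rough illustration of a hierarchical mapping, the sketch below maps one raw name to several categories. Everything in it is an assumption made for the example: `ConceptSketch` and `distill_sketch` are hypothetical names, and the variants and match arms are not taken from the real FundamentalConcept enum or transformer.

```rust
/// Hypothetical subset of concept categories; the real enum defines its own variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConceptSketch {
    Assets,
    CurrentAssets,
    Liabilities,
}

/// Maps a raw concept name to its categories, most specific first.
/// Returns `None` for names that have no mapping.
fn distill_sketch(raw_name: &str) -> Option<Vec<ConceptSketch>> {
    match raw_name {
        "Assets" => Some(vec![ConceptSketch::Assets]),
        // Hierarchical mapping: a specific name also belongs to its parent category.
        "AssetsCurrent" => Some(vec![ConceptSketch::CurrentAssets, ConceptSketch::Assets]),
        "Liabilities" => Some(vec![ConceptSketch::Liabilities]),
        _ => None,
    }
}

fn main() {
    for name in ["Assets", "AssetsCurrent", "NotAConcept"] {
        println!("{name} -> {:?}", distill_sketch(name));
    }
}
```

A test for a new mapping would then assert both the specific category and every parent it is expected to roll up into.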

Modifying HTTP Client Behavior

The SecClient is configured in src/network/sec_client.rs:21-89. Key configuration points:

Configuration	Location	Purpose
`CachePolicy`	src/network/sec_client.rs:45-50	Controls cache TTL and behavior
`ThrottlePolicy`	src/network/sec_client.rs:53-59	Controls rate limiting and retries
User-Agent	src/network/sec_client.rs:91-108	Constructs SEC-compliant User-Agent header

To modify throttling behavior, adjust the ThrottlePolicy parameters:

base_delay_ms: Minimum delay between requests
max_concurrent: Maximum concurrent requests
max_retries: Number of retry attempts on failure
adaptive_jitter_ms: Random jitter to prevent thundering herd
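To make the interaction of these parameters concrete, here is a minimal sketch. It assumes that jitter is added on top of the base delay and that `max_retries` counts attempts after the first one; neither assumption is confirmed by the source. `ThrottleSettings` is a hypothetical stand-in that only borrows the field names listed above, the field types are guesses, and the values in `main` are arbitrary.

```rust
/// Hypothetical stand-in for the throttle settings; the real ThrottlePolicy
/// may use different field types or semantics.
#[derive(Debug, Clone, PartialEq)]
struct ThrottleSettings {
    base_delay_ms: u64,
    max_concurrent: usize,
    max_retries: u32,
    adaptive_jitter_ms: u64,
}

impl ThrottleSettings {
    /// Smallest and largest delay between requests, assuming the jitter
    /// is a random amount in `0..=adaptive_jitter_ms` added to the base delay.
    fn delay_range_ms(&self) -> (u64, u64) {
        (self.base_delay_ms, self.base_delay_ms + self.adaptive_jitter_ms)
    }

    /// Total attempts per request: the initial try plus the retries.
    fn max_attempts(&self) -> u32 {
        self.max_retries + 1
    }
}

fn main() {
    // Arbitrary example values, not the project's defaults.
    let settings = ThrottleSettings {
        base_delay_ms: 500,
        max_concurrent: 1,
        max_retries: 2,
        adaptive_jitter_ms: 250,
    };
    let (min_ms, max_ms) = settings.delay_range_ms();
    println!(
        "{} concurrent request(s), {min_ms}-{max_ms} ms between requests, {} attempts at most",
        settings.max_concurrent,
        settings.max_attempts()
    );
}
```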

Sources: src/network/sec_client.rs:21-89

Working with Caches

The system uses two cache types managed by the Caches module:

HTTP Cache: Stores raw HTTP responses with a TTL set through CachePolicy (default: 1 week)
Preprocessor Cache: Stores transformed/preprocessed data

Cache instances are accessed via Caches::get_http_cache_store(), as shown in src/network/sec_client.rs:73.

During development, you may need to clear caches when testing data transformations. Cache data is persisted via the simd-r-drive backend.

Sources: src/network/sec_client.rs:73

Code Quality Standards

TODO Comments and Technical Debt

The codebase uses TODO comments to mark areas for improvement. Examples from src/network/sec_client.rs:

src/network/sec_client.rs:46: Cache TTL should be configurable
src/network/sec_client.rs:57: Adaptive jitter should be configurable
src/network/sec_client.rs:100: Repository URL should be included in User-Agent

When adding TODO comments:

Be specific about what needs to be done
Include context about why it's not done now
Reference related issues if applicable

Panic vs Result

The codebase follows Rust best practices:

Use Result<T, E> for recoverable errors
Use panic! only for non-recoverable errors or programming errors

Example from src/network/sec_client.rs:95-98:

This panics because an invalid email makes all SEC API calls fail, representing a configuration error rather than a runtime error.
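The split between the two styles can be sketched as follows. This is not the code at src/network/sec_client.rs:95-98: `validate_email` and `build_user_agent` are hypothetical helpers, the validation rule is deliberately simplistic, and the User-Agent format shown is invented for the example.

```rust
/// Recoverable check: callers can inspect the error and decide what to do.
fn validate_email(email: &str) -> Result<(), String> {
    match email.split_once('@') {
        Some((local, domain)) if !local.is_empty() && domain.contains('.') => Ok(()),
        _ => Err(format!("invalid email address: {email:?}")),
    }
}

/// Non-recoverable path: an invalid email is treated as a configuration
/// error, so the function panics instead of returning a `Result`.
fn build_user_agent(app_name: &str, email: &str) -> String {
    if let Err(reason) = validate_email(email) {
        panic!("cannot build User-Agent: {reason}");
    }
    // Illustrative format only; the real header layout may differ.
    format!("{app_name} ({email})")
}

fn main() {
    println!("{}", build_user_agent("sec-fetcher", "dev@example.com"));
    println!("{:?}", validate_email("not-an-email"));
}
```

Keeping the check itself as a `Result`-returning function leaves the choice of whether to panic to the caller that knows the error cannot be recovered from.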

Sources: src/network/sec_client.rs:95-98

Error Validation in Tests

Configuration validation is tested by verifying error messages contain expected content, as shown in tests/config_manager_tests.rs:68-94:

This pattern ensures configuration errors are informative to users.
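A minimal sketch of this style of assertion, using only the standard library. The `check_config_keys` function, the `ALLOWED_KEYS` list, and the error wording are all hypothetical; the real ConfigManager parses TOML and defines its own keys and messages.

```rust
// Hypothetical set of accepted keys, for illustration only.
const ALLOWED_KEYS: [&str; 2] = ["email", "max_retries"];

/// Rejects `key = value` lines whose key is not in `ALLOWED_KEYS`,
/// naming both the offending key and the valid alternatives.
fn check_config_keys(source: &str) -> Result<(), String> {
    for line in source.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let key = line.split('=').next().unwrap_or("").trim();
        if !ALLOWED_KEYS.contains(&key) {
            return Err(format!(
                "unknown config key `{key}`; valid keys: {}",
                ALLOWED_KEYS.join(", ")
            ));
        }
    }
    Ok(())
}

fn main() {
    let result = check_config_keys("email = \"dev@example.com\"\nbogus_key = 1\n");
    println!("{result:?}");
}
```

A test would then check the message content rather than only `is_err()`, for example asserting that the error text contains both `bogus_key` and at least one valid key name.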

Sources: tests/config_manager_tests.rs:68-94

Integration Test Architecture

The integration test script from python/narrative_stack/us_gaap_store_integration_test.sh:1-39 orchestrates:

Python environment setup with dependencies
Docker Compose service startup (isolated project name: us_gaap_it)
MySQL container health check via mysqladmin ping
Database creation and schema loading
pytest execution with verbose output
Automatic cleanup via EXIT trap

Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39

Best Practices Summary

Practice	Implementation	Reference
Test isolation	Use temporary directories and `AppConfig::default()`	tests/config_manager_tests.rs:9-17
HTTP mocking	Use `mockito::Server` for endpoint simulation	tests/sec_client_tests.rs:37-45
Async testing	Use `#[tokio::test]` attribute	tests/sec_client_tests.rs:35
Error handling	Prefer `Result<T, E>` over panic	src/network/sec_client.rs:140-165
Configuration	Use `ConfigManager::from_app_config()` in tests	tests/sec_client_tests.rs:10
Integration testing	Use Docker Compose with isolated project names	python/narrative_stack/us_gaap_store_integration_test.sh:8
Cleanup	Use trap handlers for guaranteed cleanup	python/narrative_stack/us_gaap_store_integration_test.sh:14-19

Sources: tests/config_manager_tests.rs:9-17 tests/sec_client_tests.rs:35-62 src/network/sec_client.rs:140-165 python/narrative_stack/us_gaap_store_integration_test.sh:1-39
