This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Development Guide
Relevant source files
- python/narrative_stack/us_gaap_store_integration_test.sh
- src/network/sec_client.rs
- tests/config_manager_tests.rs
- tests/sec_client_tests.rs
Purpose and Scope
This guide provides an overview of development practices, code organization, and workflows for contributing to the rust-sec-fetcher project. It covers environment setup, code organization principles, development workflows, and common development tasks.
For detailed information about specific development topics, see:
- Testing strategies and test fixtures: Testing Strategy
- Continuous integration and automated testing: CI/CD Pipeline
- Docker container configuration: Docker Deployment
Development Environment Setup
Prerequisites
The project requires the following tools installed:
| Tool | Purpose | Version Requirement |
|---|---|---|
| Rust | Core application development | 1.87+ |
| Python | ML pipeline and preprocessing | 3.8+ |
| Docker | Integration testing and services | Latest stable |
| Git LFS | Large file support for test assets | Latest stable |
| MySQL | Database for US GAAP storage | 5.7+ or 8.0+ |
Rust Development Setup
- Clone the repository and navigate to the root directory.
- Build the Rust application (for example, with `cargo build`).
- Run tests to verify setup (for example, with `cargo test`).
The Rust workspace is configured in Cargo.toml with all necessary dependencies declared. Key development dependencies include:
- `mockito` for HTTP mocking in tests
- `tempfile` for temporary file/directory creation in tests
- `tokio` test macros for async test support
Python Development Setup
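- Create a virtual environment.
- Install dependencies using `uv` (the integration test script runs `uv pip install -e . --group dev`).
- Verify installation by running the integration tests (requires Docker) via `python/narrative_stack/us_gaap_store_integration_test.sh`.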
Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39
Configuration Setup
The application requires a configuration file at ~/.config/sec-fetcher/config.toml or a custom path specified via command-line argument. Minimum configuration:
For non-interactive testing, construct AppConfig directly in test code, as shown in tests/config_manager_tests.rs:36-57.
Sources: tests/config_manager_tests.rs:36-57 tests/sec_client_tests.rs:8-20
Code Organization and Architecture
Repository Structure
Sources: src/network/sec_client.rs:1-181 tests/config_manager_tests.rs:1-95 tests/sec_client_tests.rs:1-159 python/narrative_stack/us_gaap_store_integration_test.sh:1-39
Module Dependency Flow
The dependency flow follows a layered architecture:
- Configuration Layer: `ConfigManager` loads settings from TOML files and credentials from keyring
- Network Layer: `SecClient` wraps HTTP client with caching and throttling middleware
- Data Fetching Layer: Network module functions fetch raw data from SEC APIs
- Transformation Layer: Transformers normalize raw data into standardized concepts
- Model Layer: Data structures represent domain entities
Sources: src/network/sec_client.rs:1-181 tests/config_manager_tests.rs:1-95
Development Workflow
Standard Development Cycle
Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39
Running Tests Locally
Rust Unit Tests
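Run all Rust tests with cargo: `cargo test`.
Run specific test modules by naming the integration test target, for example `cargo test --test sec_client_tests`.
Run with output visibility by passing `--nocapture` to the test binary: `cargo test -- --nocapture`.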
Test Structure Mapping:
| Test File | Tests Component | Key Test Functions |
|---|---|---|
| `tests/config_manager_tests.rs` | ConfigManager | test_load_custom_config, test_load_non_existent_config, test_fails_on_invalid_key |
| `tests/sec_client_tests.rs` | SecClient | test_user_agent, test_fetch_json_without_retry_success, test_fetch_json_with_retry_failure |
Sources: tests/config_manager_tests.rs:1-95 tests/sec_client_tests.rs:1-159
Python Integration Tests
Integration tests require Docker services. Run via the provided shell script:
This script performs the following steps as defined in python/narrative_stack/us_gaap_store_integration_test.sh:1-39:
- Activates Python virtual environment
- Installs dependencies with `uv pip install -e . --group dev`
- Starts Docker Compose services (`db_test`, `simd_r_drive_ws_server_test`)
- Waits for MySQL availability
- Creates `us_gaap_test` database
- Loads schema from `tests/integration/assets/us_gaap_schema_2025.sql`
- Runs pytest integration tests
- Tears down containers on exit
Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39
Writing Tests
Unit Test Pattern (Rust)
The codebase follows standard Rust testing patterns with mockito for HTTP mocking:
Key patterns demonstrated in tests/sec_client_tests.rs:35-62:
- Use `#[tokio::test]` for async tests
- Create `mockito::Server` for HTTP endpoint mocking
- Construct `AppConfig` programmatically for test isolation
- Use `ConfigManager::from_app_config()` to bypass file system dependencies
- Assert on specific JSON fields in responses
Sources: tests/sec_client_tests.rs:35-62
Test Fixture Pattern
The codebase uses temporary directories for file-based tests:
This pattern ensures test isolation and automatic cleanup, as shown in tests/config_manager_tests.rs:8-17.
Sources: tests/config_manager_tests.rs:8-17
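The fixture idea can be illustrated without any external crate. The following is a minimal sketch, not the project's code: `ScratchDir` and `write_config` are hypothetical names, and the type is a std-only stand-in for what `tempfile` provides in the real tests. It shows the two properties the pattern relies on: a unique directory per test, and cleanup tied to the value going out of scope.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical std-only stand-in for a temporary directory guard:
/// a uniquely named directory that is deleted when the value is dropped.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    fn new(label: &str) -> io::Result<Self> {
        // A process-wide counter keeps names unique across tests in one run.
        static COUNTER: AtomicU32 = AtomicU32::new(0);
        let n = COUNTER.fetch_add(1, Ordering::Relaxed);
        let name = format!("{}_{}_{}", label, std::process::id(), n);
        let path = std::env::temp_dir().join(name);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored so that drop never panics.
        let _ = fs::remove_dir_all(&self.path);
    }
}

/// Writes `contents` to `config.toml` inside the scratch directory.
fn write_config(dir: &ScratchDir, contents: &str) -> io::Result<PathBuf> {
    let file = dir.path().join("config.toml");
    fs::write(&file, contents)?;
    Ok(file)
}
```

A test would create a `ScratchDir`, write a config file into it, and pass the resulting path to the code under test. Because cleanup happens in `Drop`, the directory is removed even when an assertion fails and the test unwinds.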
Error Case Testing
Test error conditions explicitly:
This test from tests/sec_client_tests.rs:93-120 verifies retry behavior by expecting exactly 3 HTTP requests (initial + 2 retries) before failing.
Sources: tests/sec_client_tests.rs:93-120
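The "initial + 2 retries" arithmetic can be checked with a small std-only sketch. `fetch_with_retry` is a hypothetical helper, not the `SecClient` implementation; it assumes that `max_retries` counts only the attempts made after the first call fails, which is what the expected count of 3 requests implies.

```rust
/// Calls `op` once, then up to `max_retries` more times while it keeps failing.
/// Returns the first success, or the last error once retries are exhausted.
fn fetch_with_retry<T, E>(
    max_retries: u32,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut retries_used = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // No retries left: surface the most recent error to the caller.
            Err(err) if retries_used >= max_retries => return Err(err),
            // Otherwise consume one retry and call `op` again.
            Err(_) => retries_used += 1,
        }
    }
}

#[test]
fn always_failing_endpoint_is_called_three_times() {
    let mut calls = 0;
    let result: Result<(), &str> = fetch_with_retry(2, || {
        calls += 1;
        Err("HTTP 500")
    });
    assert!(result.is_err());
    // One initial attempt plus two retries.
    assert_eq!(calls, 3);
}
```

Counting calls in a closure plays the same role as asserting the hit count on a mocked endpoint: the test fails if the client gives up early or retries more often than configured.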
Common Development Tasks
Adding a New SEC Data Endpoint
To add support for fetching a new SEC data endpoint:
- Add URL enum variant in `src/models/url.rs`
- Create fetch function in `src/network/` following the pattern of existing functions
- Define data models in `src/models/` for the response structure
- Add transformation logic in `src/transformers/` if normalization is needed
- Write unit tests in `tests/` using `mockito::Server` for mocking
- Update main.rs to integrate the new endpoint into the processing pipeline
Example function signature pattern:
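As a sketch only, the first two steps might take a shape like the following. Every name here (`Url::CompanyFacts`, `FetchText`, `fetch_company_facts`, `EchoClient`) is hypothetical and chosen for illustration; the project's actual enum, client type, and error type may differ. The sketch assumes the SEC company facts URL format with the CIK zero-padded to 10 digits.

```rust
/// Hypothetical counterpart of the URL enum in `src/models/url.rs`.
enum Url {
    /// XBRL company facts for one CIK (Central Index Key).
    CompanyFacts(u64),
}

impl Url {
    fn value(&self) -> String {
        match self {
            // SEC data URLs use the CIK zero-padded to 10 digits.
            Url::CompanyFacts(cik) => format!(
                "https://data.sec.gov/api/xbrl/companyfacts/CIK{:010}.json",
                cik
            ),
        }
    }
}

/// Hypothetical seam over the HTTP client so fetch functions can be stubbed.
trait FetchText {
    fn fetch_text(&self, url: &str) -> Result<String, String>;
}

/// Signature pattern: borrow the client, take the identifying parameters,
/// and return a `Result` so callers decide how to handle failures.
fn fetch_company_facts(client: &impl FetchText, cik: u64) -> Result<String, String> {
    client.fetch_text(&Url::CompanyFacts(cik).value())
}

/// Test double that simply echoes the requested URL back as the body.
struct EchoClient;

impl FetchText for EchoClient {
    fn fetch_text(&self, url: &str) -> Result<String, String> {
        Ok(url.to_string())
    }
}
```

Keeping URL construction in the enum means a new endpoint touches one match arm, and the fetch function stays a thin wrapper that is easy to cover with a mocked server.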
Adding a New FundamentalConcept Mapping
The distill_us_gaap_fundamental_concepts function maps raw SEC concept names to the FundamentalConcept enum. To add a new concept:
- Add enum variant to `FundamentalConcept` in `src/models/fundamental_concept.rs`
- Update the match arms in `src/transformers/distill_us_gaap_fundamental_concepts.rs`
- Add test case to verify the mapping in `tests/distill_tests.rs`
See the existing mapping patterns in the transformer module for hierarchical mappings (concepts that map to multiple parent categories).
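To make the idea of a hierarchical mapping concrete, here is a minimal sketch. The variants, the function name `distill_sketch`, and the return type are assumptions for illustration and are not taken from the transformer module; only the general approach (a match from raw concept names to one or more enum variants) follows the description above.

```rust
/// Hypothetical subset of the `FundamentalConcept` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FundamentalConcept {
    Assets,
    CurrentAssets,
    Liabilities,
}

/// Maps a raw US GAAP concept name to the fundamental concepts it belongs to.
/// The most specific concept comes first, followed by its parent categories.
/// Returns `None` for names that have no mapping.
fn distill_sketch(raw_concept: &str) -> Option<Vec<FundamentalConcept>> {
    match raw_concept {
        "Assets" => Some(vec![FundamentalConcept::Assets]),
        // Hierarchical mapping: current assets also count toward total assets.
        "AssetsCurrent" => Some(vec![
            FundamentalConcept::CurrentAssets,
            FundamentalConcept::Assets,
        ]),
        "Liabilities" => Some(vec![FundamentalConcept::Liabilities]),
        _ => None,
    }
}
```

Under this shape, adding a concept means adding one enum variant, one match arm, and one test that pins down both the direct mapping and any parent categories.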
Modifying HTTP Client Behavior
The SecClient is configured in src/network/sec_client.rs:21-89. Key configuration points:
| Configuration | Location | Purpose |
|---|---|---|
| `CachePolicy` | src/network/sec_client.rs:45-50 | Controls cache TTL and behavior |
| `ThrottlePolicy` | src/network/sec_client.rs:53-59 | Controls rate limiting and retries |
| User-Agent | src/network/sec_client.rs:91-108 | Constructs SEC-compliant User-Agent header |
To modify throttling behavior, adjust the ThrottlePolicy parameters:
- `base_delay_ms`: Minimum delay between requests
- `max_concurrent`: Maximum concurrent requests
- `max_retries`: Number of retry attempts on failure
- `adaptive_jitter_ms`: Random jitter to prevent thundering herd
Sources: src/network/sec_client.rs:21-89
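How these four parameters could interact is sketched below. The struct only reuses the field names listed above; the field types, the `delay_for` and `max_attempts` helpers, and the way jitter is added to the base delay are assumptions for illustration, not the behavior of the real `ThrottlePolicy`.

```rust
use std::time::Duration;

/// Hypothetical mirror of the throttle settings; field types are assumed.
#[derive(Debug, Clone)]
struct ThrottlePolicy {
    base_delay_ms: u64,
    max_concurrent: usize,
    max_retries: usize,
    adaptive_jitter_ms: u64,
}

impl ThrottlePolicy {
    /// Delay before a request: the base delay plus a bounded jitter.
    /// `random` is supplied by the caller so results are reproducible in tests.
    fn delay_for(&self, random: u64) -> Duration {
        // The jitter always falls in the range 0..=adaptive_jitter_ms.
        let jitter = random % (self.adaptive_jitter_ms + 1);
        Duration::from_millis(self.base_delay_ms + jitter)
    }

    /// Total attempts for one request: the initial call plus the retries.
    fn max_attempts(&self) -> usize {
        self.max_retries + 1
    }
}
```

Passing the random value in, rather than generating it inside the method, keeps the delay calculation deterministic under test while still spreading requests out in production.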
Working with Caches
The system uses two cache types managed by the Caches module:
- HTTP Cache : Stores raw HTTP responses with configurable TTL (default: 1 week)
- Preprocessor Cache : Stores transformed/preprocessed data
Cache instances are accessed via Caches::get_http_cache_store(), as shown in src/network/sec_client.rs:73.
During development, you may need to clear caches when testing data transformations. Cache data is persisted via the simd-r-drive backend.
Sources: src/network/sec_client.rs:73
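The TTL behavior of the HTTP cache can be pictured with a small in-memory model. This is a sketch under stated assumptions: `TtlCache` and its methods are hypothetical, it keeps entries in a `HashMap` rather than in the simd-r-drive backend, and it treats an entry as expired once its age reaches the TTL.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// One week, matching the default TTL described for the HTTP cache.
const ONE_WEEK: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Hypothetical in-memory cache keyed by URL; each entry remembers when it was stored.
struct TtlCache {
    ttl: Duration,
    entries: HashMap<String, (Instant, String)>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Stores `body` under `url`, stamped with the caller-supplied time.
    fn put_at(&mut self, url: &str, body: &str, now: Instant) {
        self.entries.insert(url.to_string(), (now, body.to_string()));
    }

    /// Returns the cached body only while the entry is younger than the TTL.
    fn get_at(&self, url: &str, now: Instant) -> Option<&str> {
        self.entries.get(url).and_then(|(stored_at, body)| {
            if now.duration_since(*stored_at) < self.ttl {
                Some(body.as_str())
            } else {
                None
            }
        })
    }

    /// Drops every entry, as one might do before re-testing a transformation.
    fn clear(&mut self) {
        self.entries.clear();
    }
}
```

Taking the current time as a parameter lets a test simulate six days or a full week passing without sleeping.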
Code Quality Standards
TODO Comments and Technical Debt
The codebase uses TODO comments to mark areas for improvement. Examples from src/network/sec_client.rs:
- src/network/sec_client.rs:46: Cache TTL should be configurable
- src/network/sec_client.rs:57: Adaptive jitter should be configurable
- src/network/sec_client.rs:100: Repository URL should be included in User-Agent
When adding TODO comments:
- Be specific about what needs to be done
- Include context about why it's not done now
- Reference related issues if applicable
Panic vs Result
The codebase follows Rust best practices:
- Use `Result<T, E>` for recoverable errors
- Use `panic!` only for non-recoverable errors or programming errors
Example from src/network/sec_client.rs:95-98:
This panics because an invalid email makes all SEC API calls fail, representing a configuration error rather than a runtime error.
Sources: src/network/sec_client.rs:95-98
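The distinction can be sketched side by side. The code below is illustrative rather than a copy of `SecClient`: the helper names, the email check, and the User-Agent layout are all assumptions. It contrasts a configuration error that panics with a recoverable input error that is returned as a `Result`.

```rust
use std::num::ParseIntError;

/// Very loose plausibility check, used here only for illustration.
fn is_plausible_email(email: &str) -> bool {
    match email.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty()
                && domain.contains('.')
                && !domain.starts_with('.')
                && !domain.ends_with('.')
        }
        None => false,
    }
}

/// Configuration error: without a valid contact email no request can succeed,
/// so this hypothetical builder panics instead of returning an error.
fn build_user_agent(app: &str, version: &str, email: &str) -> String {
    if !is_plausible_email(email) {
        panic!("invalid contact email in configuration: {}", email);
    }
    // The exact header layout is an assumption for this sketch.
    format!("{}/{} (+{})", app, version, email)
}

/// Recoverable error: a malformed CIK in one input row should not abort the run,
/// so the caller receives a `Result` and can skip or report the row.
fn parse_cik(raw: &str) -> Result<u64, ParseIntError> {
    raw.trim().parse::<u64>()
}
```

A caller can match on `parse_cik` and continue with the next record, whereas a failure in `build_user_agent` stops the program at startup, before any requests are sent.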
Error Validation in Tests
Configuration validation is tested by verifying error messages contain expected content, as shown in tests/config_manager_tests.rs:68-94:
This pattern ensures configuration errors are informative to users.
Sources: tests/config_manager_tests.rs:68-94
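A minimal sketch of this style of test follows. `validate_keys`, `KNOWN_KEYS`, and the simple `key = value` line format are hypothetical stand-ins for the real configuration loader; the point is the assertion style, which checks that the message names the offending key and lists the valid ones.

```rust
/// Hypothetical set of accepted keys; the real configuration schema may differ.
const KNOWN_KEYS: [&str; 2] = ["email", "max_retries"];

/// Rejects any `key = value` line whose key is not in `KNOWN_KEYS`.
/// Blank lines and `#` comments are skipped.
fn validate_keys(config_text: &str) -> Result<(), String> {
    for line in config_text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let key = line.split('=').next().unwrap_or("").trim();
        if !KNOWN_KEYS.contains(&key) {
            return Err(format!(
                "unknown configuration key `{}`; valid keys: {}",
                key,
                KNOWN_KEYS.join(", ")
            ));
        }
    }
    Ok(())
}

#[test]
fn rejects_misspelled_key_with_helpful_message() {
    let err = validate_keys("emial = \"dev@example.com\"").unwrap_err();
    // The message should name the bad key and point at the valid ones.
    assert!(err.contains("emial"), "unexpected message: {}", err);
    assert!(err.contains("valid keys"), "unexpected message: {}", err);
}
```

Asserting on fragments of the message, rather than the whole string, keeps the test stable when the wording is polished later.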
Integration Test Architecture
The integration test script from python/narrative_stack/us_gaap_store_integration_test.sh:1-39 orchestrates:
- Python environment setup with dependencies
- Docker Compose service startup (isolated project name: `us_gaap_it`)
- MySQL container health check via `mysqladmin ping`
- Database creation and schema loading
- pytest execution with verbose output
- Automatic cleanup via EXIT trap
Sources: python/narrative_stack/us_gaap_store_integration_test.sh:1-39
Best Practices Summary
| Practice | Implementation | Reference |
|---|---|---|
| Test isolation | Use temporary directories and AppConfig::default() | tests/config_manager_tests.rs:9-17 |
| HTTP mocking | Use mockito::Server for endpoint simulation | tests/sec_client_tests.rs:37-45 |
| Async testing | Use #[tokio::test] attribute | tests/sec_client_tests.rs:35 |
| Error handling | Prefer Result<T, E> over panic | src/network/sec_client.rs:140-165 |
| Configuration | Use ConfigManager::from_app_config() in tests | tests/sec_client_tests.rs:10 |
| Integration testing | Use Docker Compose with isolated project names | python/narrative_stack/us_gaap_store_integration_test.sh:8 |
| Cleanup | Use trap handlers for guaranteed cleanup | python/narrative_stack/us_gaap_store_integration_test.sh:14-19 |
Sources: tests/config_manager_tests.rs:9-17 tests/sec_client_tests.rs:35-62 src/network/sec_client.rs:140-165 python/narrative_stack/us_gaap_store_integration_test.sh:1-39