This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Getting Started
This page guides you through installing, configuring, and running the rust-sec-fetcher application. It covers building the Rust binary, setting up required credentials, and executing your first data fetch. For detailed configuration options, see Configuration System. For comprehensive examples, see Running Examples.
The rust-sec-fetcher is the Rust component of a dual-language system. It fetches and transforms SEC financial data into structured CSV files. The companion Python system (narrative_stack) processes these files for machine learning applications.
Prerequisites
Before installation, ensure you have:
| Requirement | Purpose | Notes |
|---|---|---|
| Rust 1.87+ | Compile sec-fetcher | Edition 2021 features required |
| Email Address | SEC EDGAR API access | Required by SEC for API identification |
| 4+ GB Disk Space | Cache and CSV storage | Default location: data/ directory |
| Internet Connection | SEC API access | Throttled to 1 request/second |
Optional Components:
- Python 3.8+ for ML pipeline (narrative_stack)
- Docker for simd-r-drive WebSocket server (Docker Deployment)
- MySQL for US GAAP data storage (Database Integration)
Sources: Cargo.toml:1-45
Installation
Clone Repository
Build from Source
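Running cargo build --release produces the release binary; a plain cargo build produces the debug binary. The compiled binary will be located at:
- Debug: target/debug/sec-fetcher
- Release: target/release/sec-fetcher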
Verify Installation
If successful, the config example (cargo run --example config) loads the configuration and displays it in JSON format. If no configuration exists, it prompts for your email address in interactive mode.
Installation Flow Diagram
graph TB
Clone["Clone Repository\nrust-sec-fetcher"]
Build["cargo build --release"]
Binary["Binary Created\ntarget/release/sec-fetcher"]
Config["Configuration Setup\nConfigManager::load()"]
Verify["Run Example\ncargo run --example config"]
Clone --> Build
Build --> Binary
Binary --> Config
Config --> Verify
ConfigFile["Configuration File\nsec_fetcher_config.toml"]
Credential["Email Credential\nCredentialManager"]
Config --> ConfigFile
Config --> Credential
Verify --> Success["Display AppConfig\nJSON Output"]
Verify --> Error["Missing Email\nPrompt in Interactive Mode"]
Sources: Cargo.toml:1-6 src/config/config_manager.rs:20-23 examples/config.rs:1-17
Basic Configuration
The application uses a TOML configuration file combined with system credential storage for the required email address.
Configuration File Location
The ConfigManager searches for configuration files in this order:
- System Config Directory: Platform-specific location returned by ConfigManager::get_suggested_system_path()
  - Linux: ~/.config/sec-fetcher/config.toml
  - macOS: ~/Library/Application Support/sec-fetcher/config.toml
  - Windows: C:\Users\<User>\AppData\Roaming\sec-fetcher\config.toml
- Current Directory: sec_fetcher_config.toml (fallback)
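The search order above can be sketched as a "first existing candidate wins" lookup. This is a minimal illustration, not the actual ConfigManager code: the function name first_existing_config and the example home directory are assumptions, and the existence check is passed in as a closure so the ordering can be exercised without touching the disk.

```rust
use std::path::{Path, PathBuf};

/// Returns the first candidate for which `exists` reports true.
/// Hypothetical helper; the real lookup lives in ConfigManager.
fn first_existing_config(
    candidates: &[PathBuf],
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    candidates.iter().find(|p| exists(p.as_path())).cloned()
}

fn main() {
    // Assumed candidates: the Linux system path (with a made-up home
    // directory), then the current-directory fallback.
    let candidates = vec![
        PathBuf::from("/home/user/.config/sec-fetcher/config.toml"),
        PathBuf::from("sec_fetcher_config.toml"),
    ];
    match first_existing_config(&candidates, |p| p.exists()) {
        Some(path) => println!("using config at {}", path.display()),
        None => println!("no config file found"),
    }
}
```

Because the system path is listed first, a sec_fetcher_config.toml in the current directory is only picked up when no system-level file exists.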
Configuration Fields
The AppConfig structure src/config/app_config.rs:15-32 supports the following fields:
| Field | Type | Default | Description |
|---|---|---|---|
| email | Option<String> | None | Required - Your email for SEC API identification |
| max_concurrent | Option<usize> | 1 | Maximum concurrent requests |
| min_delay_ms | Option<u64> | 1000 | Minimum delay between requests (milliseconds) |
| max_retries | Option<usize> | 5 | Maximum retry attempts for failed requests |
| cache_base_dir | Option<PathBuf> | "data" | Base directory for caching and CSV output |
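To make the defaults concrete, here is a rough sketch of a struct that mirrors the documented fields and overlays user-supplied values on top of the defaults. ConfigSketch and merged_with are hypothetical names for illustration only; the real AppConfig in src/config/app_config.rs may be organized differently.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the documented fields; not the crate's AppConfig.
#[derive(Debug, Clone, PartialEq)]
struct ConfigSketch {
    email: Option<String>,
    max_concurrent: Option<usize>,
    min_delay_ms: Option<u64>,
    max_retries: Option<usize>,
    cache_base_dir: Option<PathBuf>,
}

impl Default for ConfigSketch {
    /// Defaults copied from the table above.
    fn default() -> Self {
        Self {
            email: None,
            max_concurrent: Some(1),
            min_delay_ms: Some(1000),
            max_retries: Some(5),
            cache_base_dir: Some(PathBuf::from("data")),
        }
    }
}

impl ConfigSketch {
    /// Overlays `user` on `self`; fields the user left unset keep their value.
    fn merged_with(self, user: ConfigSketch) -> ConfigSketch {
        ConfigSketch {
            email: user.email.or(self.email),
            max_concurrent: user.max_concurrent.or(self.max_concurrent),
            min_delay_ms: user.min_delay_ms.or(self.min_delay_ms),
            max_retries: user.max_retries.or(self.max_retries),
            cache_base_dir: user.cache_base_dir.or(self.cache_base_dir),
        }
    }
}

fn main() {
    // A user file that sets only the email and a slower request pace.
    let user = ConfigSketch {
        email: Some("you@example.com".to_string()),
        max_concurrent: None,
        min_delay_ms: Some(2000),
        max_retries: None,
        cache_base_dir: None,
    };
    let effective = ConfigSketch::default().merged_with(user);
    println!("{effective:#?}");
}
```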
Example Configuration File
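Create sec_fetcher_config.toml and set at least the email key (email = "..."); any field you omit falls back to the default listed in the table above.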
Email Credential Setup
The SEC EDGAR API requires an email address in the User-Agent header. The application manages this through the CredentialManager:
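Interactive Mode (when running from terminal): if no email is configured, the application prompts for one via CredentialManager::from_prompt() and then reads it back from the system credential manager.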
Non-Interactive Mode (CI/CD, background processes):
- Email must be pre-configured in sec_fetcher_config.toml
- Or stored in system credential manager via prior interactive session
Configuration Loading Flow Diagram
graph TB
Start["ConfigManager::load()"]
PathCheck{"Config Path\nExists?"}
LoadFile["Config::builder()\nadd_source(File)"]
DefaultConfig["AppConfig::default()"]
MergeUser["settings.merge(user_settings)"]
EmailCheck{"Email\nConfigured?"}
InteractiveCheck{"is_interactive_mode()?"}
Prompt["CredentialManager::from_prompt()"]
KeyringGet["credential_manager.get_credential()"]
Error["Error: Could not obtain email"]
InitCaches["Caches::init(config_manager)"]
Complete["ConfigManager Instance"]
Start --> PathCheck
PathCheck -->|Yes| LoadFile
PathCheck -->|No Fallback| LoadFile
LoadFile --> DefaultConfig
DefaultConfig --> MergeUser
MergeUser --> EmailCheck
EmailCheck -->|Missing| InteractiveCheck
EmailCheck -->|Present| InitCaches
InteractiveCheck -->|Yes| Prompt
InteractiveCheck -->|No| Error
Prompt --> KeyringGet
KeyringGet -->|Success| InitCaches
KeyringGet -->|Failure| Error
InitCaches --> Complete
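The email branch of the flow above can be sketched as a small decision function. This is an illustration under assumptions: EmailSource and resolve_email are hypothetical names, the prompt closure stands in for the CredentialManager prompt plus keyring read, and the non-interactive branch follows the diagram as drawn (it fails immediately rather than consulting a previously stored credential).

```rust
/// Where the email came from (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum EmailSource {
    ConfigFile(String),
    Prompted(String),
}

/// Mirrors the EmailCheck and InteractiveCheck nodes of the diagram.
fn resolve_email(
    configured: Option<String>,
    interactive: bool,
    prompt: impl FnOnce() -> Option<String>,
) -> Result<EmailSource, String> {
    match configured {
        // Email present in the merged settings: nothing else to do.
        Some(email) => Ok(EmailSource::ConfigFile(email)),
        // Missing, but a terminal is attached: ask the user.
        None if interactive => prompt()
            .map(EmailSource::Prompted)
            .ok_or_else(|| "Could not obtain email credential".to_string()),
        // Missing and non-interactive: fail, as in the diagram.
        None => Err("Could not obtain email credential".to_string()),
    }
}

fn main() {
    let from_file = resolve_email(Some("you@example.com".to_string()), false, || None);
    println!("{from_file:?}");

    let missing = resolve_email(None, false, || None);
    println!("{missing:?}");
}
```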
Sources: src/config/config_manager.rs:20-86 src/config/app_config.rs:15-54 Cargo.toml:20
Running Your First Data Fetch
Example: Configuration Display
The simplest example, run with cargo run --example config, displays the loaded configuration.
Code Structure examples/config.rs:1-17:
- ConfigManager::load() - Loads configuration from file + credentials
- config_manager.get_config() - Retrieves AppConfig reference
- config.pretty_print() - Serializes to formatted JSON
Expected Output:
Example: Lookup CIK by Ticker
Fetch the Central Index Key (CIK) for a company ticker symbol:
This example demonstrates:
- SecClient initialization with throttling
- fetch_company_tickers() - Downloads SEC company tickers JSON
- fetch_cik_by_ticker_symbol() - Maps ticker → CIK
- Caching behavior (subsequent runs use cached data)
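The ticker → CIK mapping step can be illustrated without any network access. The sketch below uses a small in-memory map as a stand-in for the downloaded company tickers JSON; cik_for_ticker and format_cik are hypothetical helpers, not the crate's fetch_cik_by_ticker_symbol(). The 10-digit zero padding reflects how CIKs appear in EDGAR data API paths.

```rust
use std::collections::HashMap;

/// Looks up a CIK by ticker, ignoring case and surrounding whitespace.
fn cik_for_ticker(tickers: &HashMap<String, u64>, symbol: &str) -> Option<u64> {
    tickers.get(&symbol.trim().to_uppercase()).copied()
}

/// Formats a CIK in its 10-digit zero-padded form.
fn format_cik(cik: u64) -> String {
    format!("{cik:010}")
}

fn main() {
    // Stand-in for the company tickers download; two well-known entries.
    let mut tickers = HashMap::new();
    tickers.insert("AAPL".to_string(), 320193);
    tickers.insert("MSFT".to_string(), 789019);

    match cik_for_ticker(&tickers, "aapl") {
        Some(cik) => println!("AAPL -> CIK {}", format_cik(cik)),
        None => println!("ticker not found"),
    }
}
```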
Example: Fetch NPORT Filing
Download and parse an NPORT-P investment company filing:
This example shows:
- Fetching XML filing by accession number
- Parsing NportInvestment data structures
- CSV output to data/fund-holdings/{A-Z}/ directories
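As a rough sketch of how an output path under data/fund-holdings/{A-Z}/ might be derived, the helper below buckets a ticker by its first letter. Both the bucketing rule and the TICKER_holdings.csv file name are assumptions inferred from the directory layout on this page; holdings_csv_path is a hypothetical function, not part of the crate.

```rust
use std::path::{Path, PathBuf};

/// Builds `<base>/fund-holdings/<first letter>/<TICKER>_holdings.csv`.
/// Returns None when the ticker does not start with an ASCII letter.
fn holdings_csv_path(base: &Path, ticker: &str) -> Option<PathBuf> {
    let upper = ticker.trim().to_ascii_uppercase();
    let first = upper.chars().next().filter(|c| c.is_ascii_uppercase())?;
    Some(
        base.join("fund-holdings")
            .join(first.to_string())
            .join(format!("{upper}_holdings.csv")),
    )
}

fn main() {
    // "data" is the documented default for cache_base_dir.
    if let Some(path) = holdings_csv_path(Path::new("data"), "aapl") {
        println!("{}", path.display());
    }
}
```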
For a detailed walkthrough of all examples, see Running Examples.
Example Execution Flow Diagram
Sources: examples/config.rs:1-17 src/config/config_manager.rs:20-23 Cargo.toml:28-29
Data Output Structure
The application organizes fetched data into a structured directory hierarchy:
data/
├── http_cache/                  # HTTP response cache (simd-r-drive)
│   └── sec.gov/
│       └── *.bin                # Cached API responses
│
├── fund-holdings/               # NPORT filing data by ticker
│   ├── A/
│   │   ├── AAPL_holdings.csv
│   │   └── AMZN_holdings.csv
│   ├── B/
│   │   └── MSFT_holdings.csv
│   └── ...                      # A-Z directories
│
└── us-gaap/                     # US GAAP fundamental data
    ├── AAPL_fundamentals.csv
    ├── MSFT_fundamentals.csv
    └── ...
CSV File Formats
US GAAP Fundamentals (us-gaap/*.csv):
- Ticker symbol
- Filing date
- Fiscal period
- FundamentalConcept (64 normalized concepts)
- Value
- Units
- Accession number
NPORT Holdings (fund-holdings/{A-Z}/*.csv):
- Fund CIK
- Investment ticker symbol
- Investment name
- Balance (shares)
- Value (USD)
- Percentage of portfolio
- Asset category
- Issuer category
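To show how the NPORT holdings columns above could map onto one CSV line, here is a minimal sketch. HoldingRow, its field names and types, and the sample values are all assumptions for illustration; the crate's actual NportInvestment type and CSV writer may differ. Fields are quoted only when they contain a comma, quote, or line break.

```rust
/// Hypothetical row type following the column list above.
struct HoldingRow {
    fund_cik: u64,
    ticker: String,
    name: String,
    balance: f64,
    value_usd: f64,
    pct_of_portfolio: f64,
    asset_category: String,
    issuer_category: String,
}

/// Quotes a text field when it contains a comma, quote, or line break.
fn csv_field(raw: &str) -> String {
    if raw.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", raw.replace('"', "\"\""))
    } else {
        raw.to_string()
    }
}

impl HoldingRow {
    /// Joins the columns in the documented order.
    fn to_csv_line(&self) -> String {
        [
            self.fund_cik.to_string(),
            csv_field(&self.ticker),
            csv_field(&self.name),
            self.balance.to_string(),
            self.value_usd.to_string(),
            self.pct_of_portfolio.to_string(),
            csv_field(&self.asset_category),
            csv_field(&self.issuer_category),
        ]
        .join(",")
    }
}

fn main() {
    // Made-up sample values, not real filing data.
    let row = HoldingRow {
        fund_cik: 1234567,
        ticker: "XYZ".to_string(),
        name: "Example Corp, Inc.".to_string(),
        balance: 100.0,
        value_usd: 15000.0,
        pct_of_portfolio: 1.5,
        asset_category: "EC".to_string(),
        issuer_category: "CORP".to_string(),
    };
    println!("{}", row.to_csv_line());
}
```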
Data Flow from API to CSV Diagram
Sources: src/config/app_config.rs:31-44 Cargo.toml:24
Cache Behavior
The application implements two-tier caching to minimize redundant API calls:
HTTP Cache
- Storage: simd-r-drive key-value store (Cargo.toml:36)
- Location: {cache_base_dir}/http_cache/
- TTL: 1 week (168 hours)
- Scope: Raw HTTP responses from SEC API
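The one-week TTL amounts to a simple age comparison. The sketch below is only an illustration of that check; HTTP_CACHE_TTL and is_fresh are hypothetical names, and how the real cache records entry age is not covered on this page.

```rust
use std::time::Duration;

/// 1 week expressed as 168 hours, matching the documented TTL.
const HTTP_CACHE_TTL: Duration = Duration::from_secs(168 * 60 * 60);

/// An entry is served from cache only while it is younger than the TTL.
fn is_fresh(age: Duration) -> bool {
    age < HTTP_CACHE_TTL
}

fn main() {
    let one_day = Duration::from_secs(24 * 60 * 60);
    let eight_days = Duration::from_secs(8 * 24 * 60 * 60);
    println!("1 day old: fresh = {}", is_fresh(one_day));
    println!("8 days old: fresh = {}", is_fresh(eight_days));
}
```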
Preprocessor Cache
- Storage: In-memory DashMap with persistent backup
- Scope: Transformed data structures (after distill_us_gaap_fundamental_concepts)
- Purpose: Skip expensive concept normalization on repeated runs
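The idea behind the preprocessor cache, computing a transformed value once and reusing it afterwards, can be sketched with the standard library alone. This example swaps in std's HashMap for DashMap and omits the persistent backup; PreprocessorCache, get_or_compute, and the placeholder normalization closure are hypothetical and do not reflect the crate's actual types.

```rust
use std::collections::HashMap;

/// Minimal single-threaded memoization cache (stand-in for the DashMap-based one).
struct PreprocessorCache {
    entries: HashMap<String, Vec<String>>,
    misses: usize,
}

impl PreprocessorCache {
    fn new() -> Self {
        Self { entries: HashMap::new(), misses: 0 }
    }

    /// Returns the cached value for `key`, running `compute` only on a miss.
    fn get_or_compute(
        &mut self,
        key: &str,
        compute: impl FnOnce(&str) -> Vec<String>,
    ) -> &Vec<String> {
        if !self.entries.contains_key(key) {
            self.misses += 1;
            let value = compute(key);
            self.entries.insert(key.to_string(), value);
        }
        &self.entries[key]
    }
}

fn main() {
    let mut cache = PreprocessorCache::new();
    // Placeholder for the expensive normalization step.
    let normalize = |raw: &str| vec![raw.to_lowercase()];

    cache.get_or_compute("Revenues", normalize);
    cache.get_or_compute("Revenues", normalize); // served from the cache
    println!("computed {} time(s)", cache.misses);
}
```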
Cache Initialization : The Caches::init() function src/config/config_manager.rs:98-100 is called automatically during ConfigManager construction.
For detailed caching architecture, see Caching & Storage System.
Sources: Cargo.toml:14-37 src/config/config_manager.rs:98-100
Troubleshooting Common Issues
| Issue | Cause | Solution |
|---|---|---|
| "Could not obtain email credential" | No email configured in non-interactive mode | Add email = "..." to config file or run interactively once |
| "Config path does not exist" | Invalid custom config path | Check path spelling or omit to use defaults |
| "unknown field" in config | Typo in TOML key name | Run cargo run --example config to see valid keys |
| Rate limit errors from SEC | min_delay_ms too low | Increase to 1000+ ms (the default of 1 request/second stays well below the 10 requests/second ceiling in SEC's fair access guidance) |
| Cache directory permission denied | Insufficient filesystem permissions | Change cache_base_dir to writable location |
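For the rate-limit row above, it can help to see what min_delay_ms means in practice: before each request, the client waits out whatever part of the minimum delay has not yet elapsed. The sketch below illustrates that arithmetic only; remaining_delay is a hypothetical helper and is not how SecClient's throttling is necessarily implemented.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Time still to wait before the next request may be sent.
fn remaining_delay(since_last: Duration, min_delay_ms: u64) -> Duration {
    Duration::from_millis(min_delay_ms).saturating_sub(since_last)
}

fn main() {
    // A short delay keeps the demo quick; the documented default is 1000 ms.
    let min_delay_ms = 50;
    let mut last_sent: Option<Instant> = None;

    for request in 1..=3 {
        if let Some(sent_at) = last_sent {
            thread::sleep(remaining_delay(sent_at.elapsed(), min_delay_ms));
        }
        last_sent = Some(Instant::now());
        println!("sending request {request}");
    }
}
```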
Debug Configuration Issues:
The AppConfig::get_valid_keys() function src/config/app_config.rs:62-77 dynamically generates a list of valid configuration fields with their expected types using JSON schema introspection.
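A simplified picture of the "unknown field" check: compare the keys found in a config file against the known field names. Unlike the schema-driven get_valid_keys(), this sketch hardcodes the five documented keys, and unknown_keys is a hypothetical helper rather than part of the crate.

```rust
/// The documented configuration keys, hardcoded for illustration.
const VALID_KEYS: [&str; 5] = [
    "email",
    "max_concurrent",
    "min_delay_ms",
    "max_retries",
    "cache_base_dir",
];

/// Returns the keys that are not part of the documented set, in input order.
fn unknown_keys<'a>(keys: &[&'a str]) -> Vec<&'a str> {
    keys.iter()
        .copied()
        .filter(|k| !VALID_KEYS.contains(k))
        .collect()
}

fn main() {
    // "max_retrys" is a deliberate typo to trigger the check.
    let found = ["email", "max_retrys", "min_delay_ms"];
    for key in unknown_keys(&found) {
        println!("unknown field: {key}");
    }
}
```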
Sources: src/config/config_manager.rs:49-77 src/config/app_config.rs:62-77
Next Steps
Now that you have the application configured and running, explore these topics:
- Configuration System - Deep dive into AppConfig, ConfigManager, credential management, and TOML structure
- Running Examples - Comprehensive walkthrough of all example programs (config.rs, funds.rs, lookup_cik.rs, nport_filing.rs, us_gaap_human_readable.rs)
- Network Layer & SecClient - Understanding HTTP client, throttling policies, and retry logic
- US GAAP Concept Transformation - How 57+ revenue variations are normalized into 64 standardized concepts
- Main Application Flow - Complete data fetching workflow from main.rs
For Python ML pipeline setup, see Python narrative_stack System.
Sources: examples/config.rs:1-17 src/config/config_manager.rs:1-121 src/config/app_config.rs:1-159