
This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Getting Started


This page guides you through installing, configuring, and running the rust-sec-fetcher application. It covers building the Rust binary, setting up required credentials, and executing your first data fetch. For detailed configuration options, see Configuration System. For comprehensive examples, see Running Examples.

The rust-sec-fetcher is the Rust component of a dual-language system. It fetches and transforms SEC financial data into structured CSV files. The companion Python system (narrative_stack) processes these files for machine learning applications.


Prerequisites

Before installation, ensure you have:

Requirement | Purpose | Notes
--- | --- | ---
Rust 1.87+ | Compile sec-fetcher | Edition 2021 features required
Email address | SEC EDGAR API access | Required by the SEC to identify API clients
4+ GB disk space | Cache and CSV storage | Default location: data/ directory
Internet connection | SEC API access | Throttled to 1 request/second by default

Optional Components:

Sources: Cargo.toml:1-45


Installation

Clone Repository

Build from Source

Build with cargo build --release (or cargo build for a debug build). The compiled binary is located at:

  • Debug: target/debug/sec-fetcher
  • Release: target/release/sec-fetcher

Verify Installation

Run cargo run --example config. If successful, this loads the configuration and displays it as JSON. If no email address is configured, the example prompts for one when running in interactive mode.

Installation Flow Diagram

```mermaid
graph TB
    Clone["Clone Repository\nrust-sec-fetcher"]
    Build["cargo build --release"]
    Binary["Binary Created\ntarget/release/sec-fetcher"]
    Config["Configuration Setup\nConfigManager::load()"]
    Verify["Run Example\ncargo run --example config"]
    Clone --> Build
    Build --> Binary
    Binary --> Config
    Config --> Verify

    ConfigFile["Configuration File\nsec_fetcher_config.toml"]
    Credential["Email Credential\nCredentialManager"]
    Config --> ConfigFile
    Config --> Credential

    Verify --> Success["Display AppConfig\nJSON Output"]
    Verify --> Error["Missing Email\nPrompt in Interactive Mode"]
```

Sources: Cargo.toml:1-6 src/config/config_manager.rs:20-23 examples/config.rs:1-17


Basic Configuration

The application uses a TOML configuration file combined with system credential storage for the required email address.

Configuration File Location

The ConfigManager searches for configuration files in this order:

  1. System config directory: the platform-specific location returned by ConfigManager::get_suggested_system_path()

    • Linux: ~/.config/sec-fetcher/config.toml
    • macOS: ~/Library/Application Support/sec-fetcher/config.toml
    • Windows: C:\Users\<User>\AppData\Roaming\sec-fetcher\config.toml
  2. Current directory: sec_fetcher_config.toml (fallback)
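The search order above amounts to "the first file that exists wins". The sketch below assumes the system path has already been resolved (for example, to whatever ConfigManager::get_suggested_system_path() reports); find_config and candidate_paths are hypothetical helpers, not functions from the crate.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: return the first candidate that exists as a regular file.
/// Candidates are checked in priority order.
fn find_config(candidates: &[PathBuf]) -> Option<PathBuf> {
    candidates.iter().find(|p| p.is_file()).cloned()
}

/// Hypothetical: the system config file first, then the
/// `sec_fetcher_config.toml` fallback in the current directory.
fn candidate_paths(system_config_file: &Path, cwd: &Path) -> Vec<PathBuf> {
    vec![
        system_config_file.to_path_buf(),
        cwd.join("sec_fetcher_config.toml"),
    ]
}

fn main() {
    let cwd = std::env::current_dir().expect("current directory should be readable");
    // Placeholder system path; a real lookup would be platform-specific.
    let system = PathBuf::from("/nonexistent/sec-fetcher/config.toml");
    match find_config(&candidate_paths(&system, &cwd)) {
        Some(p) => println!("using {}", p.display()),
        None => println!("no config file found; using defaults"),
    }
}
```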

Configuration Fields

The AppConfig structure (src/config/app_config.rs:15-32) supports the following fields:

Field | Type | Default | Description
--- | --- | --- | ---
email | Option<String> | None | Required: your email for SEC API identification
max_concurrent | Option<usize> | 1 | Maximum concurrent requests
min_delay_ms | Option<u64> | 1000 | Minimum delay between requests, in milliseconds
max_retries | Option<usize> | 5 | Maximum retry attempts for failed requests
cache_base_dir | Option<PathBuf> | "data" | Base directory for caching and CSV output
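Because every field is an Option, a user file can set any subset of them and the table defaults fill the rest. A minimal sketch of that overlay, mirroring the settings.merge(user_settings) step in the loading flow: a value the user sets wins, otherwise the default applies. AppConfigSketch and its methods are hypothetical; the real AppConfig in src/config/app_config.rs may be structured differently.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the fields in the table above.
#[derive(Debug, Clone, Default, PartialEq)]
struct AppConfigSketch {
    email: Option<String>,
    max_concurrent: Option<usize>,
    min_delay_ms: Option<u64>,
    max_retries: Option<usize>,
    cache_base_dir: Option<PathBuf>,
}

impl AppConfigSketch {
    /// Defaults as listed in the table; email deliberately has none.
    fn defaults() -> Self {
        Self {
            email: None,
            max_concurrent: Some(1),
            min_delay_ms: Some(1000),
            max_retries: Some(5),
            cache_base_dir: Some(PathBuf::from("data")),
        }
    }

    /// Overlay user settings: any field the user set replaces the current value.
    fn merge(self, user: Self) -> Self {
        Self {
            email: user.email.or(self.email),
            max_concurrent: user.max_concurrent.or(self.max_concurrent),
            min_delay_ms: user.min_delay_ms.or(self.min_delay_ms),
            max_retries: user.max_retries.or(self.max_retries),
            cache_base_dir: user.cache_base_dir.or(self.cache_base_dir),
        }
    }
}

fn main() {
    let user = AppConfigSketch {
        email: Some("you@example.com".to_string()),
        min_delay_ms: Some(1500),
        ..Default::default()
    };
    let effective = AppConfigSketch::defaults().merge(user);
    println!("{:?}", effective);
}
```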

Example Configuration File

Create sec_fetcher_config.toml:
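The example file itself did not survive here. As a hedged stand-in, the sketch below embeds a plausible sec_fetcher_config.toml (keys from the field table, placeholder values) and checks it with a deliberately tiny, hypothetical parser that rejects unknown keys, the situation behind the "unknown field" error in Troubleshooting. The application builds its configuration with Config::builder(), per the loading flow diagram; parse_flat is not part of it and is not a TOML parser.

```rust
use std::collections::HashMap;

/// Placeholder contents for sec_fetcher_config.toml.
const EXAMPLE_CONFIG: &str = r#"
# Required: identifies you to the SEC
email = "you@example.com"
max_concurrent = 1
min_delay_ms = 1000
max_retries = 5
cache_base_dir = "data"
"#;

const VALID_KEYS: [&str; 5] = [
    "email", "max_concurrent", "min_delay_ms", "max_retries", "cache_base_dir",
];

/// Hypothetical parser for flat `key = value` lines only.
fn parse_flat(text: &str) -> Result<HashMap<String, String>, String> {
    let mut out = HashMap::new();
    for (i, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue; // skip blank lines and comments
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `key = value`", i + 1))?;
        let key = key.trim();
        if !VALID_KEYS.contains(&key) {
            return Err(format!("unknown field `{}`", key));
        }
        // Strip surrounding double quotes from string values.
        let value = value.trim().trim_matches('"').to_string();
        out.insert(key.to_string(), value);
    }
    Ok(out)
}

fn main() {
    let parsed = parse_flat(EXAMPLE_CONFIG).expect("example config should parse");
    println!("{:?}", parsed.get("email"));
}
```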

Email Credential Setup

The SEC EDGAR API requires an email address in the User-Agent header. The application manages this through the CredentialManager:

Interactive Mode (when running from a terminal):

Non-Interactive Mode (CI/CD, background processes):

  • Email must be pre-configured in sec_fetcher_config.toml
  • Or stored in system credential manager via prior interactive session
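The order described above can be sketched as: config file first, then a previously stored credential, then a prompt only in interactive mode, otherwise an error. resolve_email and its closure parameters are hypothetical stand-ins for ConfigManager and CredentialManager, and "interactive" is approximated here by checking whether stdin is a terminal, which may not match the crate's is_interactive_mode().

```rust
use std::io::IsTerminal;

/// Hypothetical email resolution.
/// `stored` stands in for a keyring lookup; `prompt` for asking the user.
fn resolve_email(
    config_email: Option<&str>,
    stored: impl FnOnce() -> Option<String>,
    interactive: bool,
    prompt: impl FnOnce() -> Option<String>,
) -> Result<String, String> {
    if let Some(email) = config_email {
        return Ok(email.to_string()); // 1. config file wins
    }
    if let Some(email) = stored() {
        return Ok(email); // 2. credential saved by an earlier interactive session
    }
    if interactive {
        if let Some(email) = prompt() {
            return Ok(email); // 3. ask the user, only when a terminal is attached
        }
    }
    Err("Could not obtain email credential".to_string())
}

fn main() {
    let interactive = std::io::stdin().is_terminal();
    let result = resolve_email(None, || None, interactive, || None);
    println!("{:?}", result);
}
```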

Configuration Loading Flow Diagram

```mermaid
graph TB
    Start["ConfigManager::load()"]
    PathCheck{"Config Path\nExists?"}
    LoadFile["Config::builder()\nadd_source(File)"]
    DefaultConfig["AppConfig::default()"]
    MergeUser["settings.merge(user_settings)"]
    EmailCheck{"Email\nConfigured?"}
    InteractiveCheck{"is_interactive_mode()?"}
    Prompt["CredentialManager::from_prompt()"]
    KeyringGet["credential_manager.get_credential()"]
    Error["Error: Could not obtain email"]
    InitCaches["Caches::init(config_manager)"]
    Complete["ConfigManager Instance"]

    Start --> PathCheck
    PathCheck -->|Yes| LoadFile
    PathCheck -->|No Fallback| LoadFile
    LoadFile --> DefaultConfig
    DefaultConfig --> MergeUser

    MergeUser --> EmailCheck
    EmailCheck -->|Missing| InteractiveCheck
    EmailCheck -->|Present| InitCaches

    InteractiveCheck -->|Yes| Prompt
    InteractiveCheck -->|No| Error

    Prompt --> KeyringGet
    KeyringGet -->|Success| InitCaches
    KeyringGet -->|Failure| Error

    InitCaches --> Complete
```

Sources: src/config/config_manager.rs:20-86 src/config/app_config.rs:15-54 Cargo.toml:20


Running Your First Data Fetch

Example: Configuration Display

The simplest example displays the loaded configuration. Run it with cargo run --example config.

Code structure (examples/config.rs:1-17):

  1. ConfigManager::load() - Loads configuration from file + credentials
  2. config_manager.get_config() - Retrieves AppConfig reference
  3. config.pretty_print() - Serializes to formatted JSON

Expected Output:

Example: Lookup CIK by Ticker

Fetch the Central Index Key (CIK) for a company ticker symbol:

This example demonstrates:

  • SecClient initialization with throttling
  • fetch_company_tickers() - Downloads SEC company tickers JSON
  • fetch_cik_by_ticker_symbol() - Maps ticker → CIK
  • Caching behavior (subsequent runs use cached data)
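Once fetch_company_tickers() has downloaded the SEC ticker list, the lookup itself is a map from ticker to CIK. A minimal sketch, assuming the tickers are already loaded into (ticker, CIK) pairs; TickerIndex and padded_cik are hypothetical names, not the crate's API. The 10-digit zero-padding matches how CIKs appear in EDGAR URLs such as data.sec.gov/submissions/CIK##########.json.

```rust
use std::collections::HashMap;

/// Hypothetical ticker -> CIK index built from already-downloaded data.
struct TickerIndex {
    by_ticker: HashMap<String, u64>,
}

impl TickerIndex {
    fn new(pairs: &[(&str, u64)]) -> Self {
        let by_ticker = pairs
            .iter()
            .map(|(t, cik)| (t.to_ascii_uppercase(), *cik))
            .collect();
        Self { by_ticker }
    }

    /// Case-insensitive lookup: tickers are stored upper-cased.
    fn cik_for(&self, ticker: &str) -> Option<u64> {
        self.by_ticker.get(&ticker.to_ascii_uppercase()).copied()
    }
}

/// Format a CIK zero-padded to 10 digits.
fn padded_cik(cik: u64) -> String {
    format!("{:010}", cik)
}

fn main() {
    let index = TickerIndex::new(&[("AAPL", 320193)]);
    if let Some(cik) = index.cik_for("aapl") {
        println!("AAPL -> CIK{}", padded_cik(cik));
    }
}
```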

Example: Fetch NPORT Filing

Download and parse an NPORT-P investment company filing:

This example shows:

  • Fetching XML filing by accession number
  • Parsing NportInvestment data structures
  • CSV output to data/fund-holdings/{A-Z}/ directories
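The {A-Z} layout buckets each output file by the first letter of its ticker. A sketch of that path computation, assuming the file naming shown under Data Output Structure ({TICKER}_holdings.csv under cache_base_dir); holdings_csv_path is a hypothetical helper, and how the application handles tickers that do not start with a letter is not documented here, so the sketch simply returns None.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical: {cache_base_dir}/fund-holdings/{first letter}/{TICKER}_holdings.csv
fn holdings_csv_path(cache_base_dir: &Path, ticker: &str) -> Option<PathBuf> {
    let ticker = ticker.trim().to_ascii_uppercase();
    let first = ticker.chars().next()?; // empty ticker -> None
    if !first.is_ascii_alphabetic() {
        return None; // only A-Z buckets are assumed to exist
    }
    Some(
        cache_base_dir
            .join("fund-holdings")
            .join(first.to_string())
            .join(format!("{}_holdings.csv", ticker)),
    )
}

fn main() {
    if let Some(path) = holdings_csv_path(Path::new("data"), "aapl") {
        println!("{}", path.display());
    }
}
```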

For detailed walkthrough of all examples, see Running Examples.

Example Execution Flow Diagram

Sources: examples/config.rs:1-17 src/config/config_manager.rs:20-23 Cargo.toml:28-29


Data Output Structure

The application organizes fetched data into a structured directory hierarchy:

data/
├── http_cache/              # HTTP response cache (simd-r-drive)
│   └── sec.gov/
│       └── *.bin            # Cached API responses
│
├── fund-holdings/           # NPORT filing data by ticker
│   ├── A/
│   │   ├── AAPL_holdings.csv
│   │   └── AMZN_holdings.csv
│   ├── M/
│   │   └── MSFT_holdings.csv
│   └── ...                  # A-Z directories
│
└── us-gaap/                 # US GAAP fundamental data
    ├── AAPL_fundamentals.csv
    ├── MSFT_fundamentals.csv
    └── ...

CSV File Formats

US GAAP Fundamentals (us-gaap/*.csv):

  • Ticker symbol
  • Filing date
  • Fiscal period
  • FundamentalConcept (64 normalized concepts)
  • Value
  • Units
  • Accession number
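Text fields such as concept names can contain commas or quotes, so each row has to be escaped before it is written. A sketch of one row serialized with RFC 4180-style quoting; FundamentalRow, its field names, and the sample values are hypothetical placeholders that follow the column list above, not the crate's actual types or one of its 64 concept names.

```rust
/// Hypothetical row type following the column list above.
struct FundamentalRow<'a> {
    ticker: &'a str,
    filing_date: &'a str,
    fiscal_period: &'a str,
    concept: &'a str,
    value: f64,
    units: &'a str,
    accession_number: &'a str,
}

/// Quote a field when it contains a comma, quote, or line break,
/// doubling any embedded quotes (RFC 4180 style).
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

impl FundamentalRow<'_> {
    fn to_csv_line(&self) -> String {
        [
            csv_field(self.ticker),
            csv_field(self.filing_date),
            csv_field(self.fiscal_period),
            csv_field(self.concept),
            self.value.to_string(),
            csv_field(self.units),
            csv_field(self.accession_number),
        ]
        .join(",")
    }
}

fn main() {
    let row = FundamentalRow {
        ticker: "AAPL",
        filing_date: "2024-01-01",
        fiscal_period: "FY",
        concept: "Revenues",
        value: 1234.5,
        units: "USD",
        accession_number: "0000000000-00-000000",
    };
    println!("{}", row.to_csv_line());
}
```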

NPORT Holdings (fund-holdings/{A-Z}/*.csv):

  • Fund CIK
  • Investment ticker symbol
  • Investment name
  • Balance (shares)
  • Value (USD)
  • Percentage of portfolio
  • Asset category
  • Issuer category

Data Flow from API to CSV Diagram

Sources: src/config/app_config.rs:31-44 Cargo.toml:24


Cache Behavior

The application uses two cache layers: an HTTP cache that avoids redundant API calls and a preprocessor cache that avoids repeating expensive transformations.

HTTP Cache

  • Storage: simd-r-drive key-value store (Cargo.toml:36)
  • Location: {cache_base_dir}/http_cache/
  • TTL: 1 week (168 hours)
  • Scope: Raw HTTP responses from the SEC API
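The one-week TTL reduces to an age check on each cached entry. A sketch using std::time only; is_fresh is hypothetical, how simd-r-drive entries actually record their timestamps is not described here, and treating future timestamps (clock skew) as fresh is a choice made for this illustration.

```rust
use std::time::{Duration, SystemTime};

/// One week, matching the TTL stated above (168 hours).
const HTTP_CACHE_TTL: Duration = Duration::from_secs(168 * 60 * 60);

/// Hypothetical: an entry is reusable if it was stored less than `ttl` ago.
fn is_fresh(stored_at: SystemTime, now: SystemTime, ttl: Duration) -> bool {
    match now.duration_since(stored_at) {
        Ok(age) => age < ttl,
        // stored_at is later than now (clock skew): treat as fresh
        Err(_) => true,
    }
}

fn main() {
    let stored_at = SystemTime::now();
    println!("fresh: {}", is_fresh(stored_at, SystemTime::now(), HTTP_CACHE_TTL));
}
```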

Preprocessor Cache

  • Storage: In-memory DashMap with persistent backup
  • Scope: Transformed data structures (after distill_us_gaap_fundamental_concepts)
  • Purpose: Skip expensive concept normalization on repeated runs
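The preprocessor cache is memoization: compute once per key, reuse afterward. This sketch uses std's HashMap as a single-threaded stand-in for DashMap (which adds concurrent access) and omits the persistent backup; PreprocessorCache, get_or_compute, and the "Revenues" placeholder are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical memoizing cache keyed by a string such as a ticker.
struct PreprocessorCache<V> {
    entries: HashMap<String, V>,
    computations: usize, // how many times the expensive step actually ran
}

impl<V: Clone> PreprocessorCache<V> {
    fn new() -> Self {
        Self { entries: HashMap::new(), computations: 0 }
    }

    /// Return the cached value for `key`, running `compute` only on a miss.
    fn get_or_compute(&mut self, key: &str, compute: impl FnOnce() -> V) -> V {
        if let Some(v) = self.entries.get(key) {
            return v.clone();
        }
        self.computations += 1;
        let v = compute();
        self.entries.insert(key.to_string(), v.clone());
        v
    }
}

fn main() {
    let mut cache = PreprocessorCache::new();
    for _ in 0..3 {
        // Placeholder for an expensive step such as concept normalization.
        let concepts = cache.get_or_compute("AAPL", || vec!["Revenues".to_string()]);
        assert_eq!(concepts.len(), 1);
    }
    println!("expensive step ran {} time(s)", cache.computations);
}
```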

Cache initialization: Caches::init() (src/config/config_manager.rs:98-100) is called automatically during ConfigManager construction.

For detailed caching architecture, see Caching & Storage System.

Sources: Cargo.toml:14-37 src/config/config_manager.rs:98-100


Troubleshooting Common Issues

Issue | Cause | Solution
--- | --- | ---
"Could not obtain email credential" | No email configured in non-interactive mode | Add email = "..." to the config file, or run interactively once
"Config path does not exist" | Invalid custom config path | Check the path spelling, or omit it to use the defaults
"unknown field" in config | Typo in a TOML key name | Run cargo run --example config to see the valid keys
Rate limit errors from SEC | min_delay_ms too low | Increase min_delay_ms (default 1000 ms) and keep max_concurrent low; SEC's fair-access policy allows at most 10 requests per second
Cache directory permission denied | Insufficient filesystem permissions | Set cache_base_dir to a writable location
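Rate limiting of this kind comes down to enforcing a minimum gap between requests and bounding retries. The sketch below is a hypothetical, blocking illustration of min_delay_ms and max_retries (interpreted here as retries after the first attempt); it is not the SecClient implementation, which also has to honor max_concurrent.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical throttle: consecutive calls to `wait` are at least
/// `min_delay` apart.
struct Throttle {
    min_delay: Duration,
    last: Option<Instant>,
}

impl Throttle {
    fn new(min_delay_ms: u64) -> Self {
        Self { min_delay: Duration::from_millis(min_delay_ms), last: None }
    }

    /// Sleep only for the remaining part of the gap, then record the time.
    fn wait(&mut self) {
        if let Some(last) = self.last {
            let elapsed = last.elapsed();
            if elapsed < self.min_delay {
                thread::sleep(self.min_delay - elapsed);
            }
        }
        self.last = Some(Instant::now());
    }
}

/// Run `op`, retrying up to `max_retries` more times; every attempt is throttled.
fn with_retries<T, E>(
    throttle: &mut Throttle,
    max_retries: usize,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        throttle.wait();
        match op() {
            Ok(v) => return Ok(v),
            Err(e) if attempt >= max_retries => return Err(e),
            Err(_) => attempt += 1,
        }
    }
}

fn main() {
    // 1000 ms between requests: at most one request per second.
    let mut throttle = Throttle::new(1000);
    let result: Result<&str, String> = with_retries(&mut throttle, 5, || Ok("response body"));
    println!("{:?}", result);
}
```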

Debug Configuration Issues:

The AppConfig::get_valid_keys() function (src/config/app_config.rs:62-77) dynamically generates a list of valid configuration fields and their expected types using JSON schema introspection.

Sources: src/config/config_manager.rs:49-77 src/config/app_config.rs:62-77


Next Steps

Now that you have the application configured and running, explore these topics:

For Python ML pipeline setup, see Python narrative_stack System.

Sources: examples/config.rs:1-17 src/config/config_manager.rs:1-121 src/config/app_config.rs:1-159