This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Rust sec-fetcher Application
Purpose and Scope
This page provides an architectural overview of the Rust sec-fetcher application, which is responsible for fetching financial data from the SEC EDGAR API and transforming it into structured formats. The application serves as the high-performance data collection and preprocessing layer in a larger system that combines Rust’s safety and speed for I/O operations with Python’s machine learning capabilities.
This page covers the high-level architecture, module organization, and data flow patterns.
Sources: src/lib.rs:1-12 src/network.rs:1-47
Application Architecture
The sec-fetcher application is built around a modular architecture that separates concerns into distinct layers: configuration, networking, data transformation, and storage. The core design principle is to fetch data from SEC APIs with robust error handling and caching, transform it into a standardized format (often using polars DataFrames), and output structured data for downstream consumption.
graph TB
subgraph "src/lib.rs Module Organization"
config["config\nConfigManager, AppConfig"]
enums["enums\nFundamentalConcept, Url\nCacheNamespacePrefix"]
models["models\nTicker, CikSubmission\nNportInvestment, AccessionNumber"]
network["network\nSecClient, fetch_* functions"]
ops["ops\nrender_filing, fetch_and_render"]
parsers["parsers\nXML/JSON parsing utilities"]
caches["caches\nInternal caching infrastructure"]
views["views\nMarkdownView, EmbeddingTextView"]
utils["utils\nVecExtensions, helpers"]
end
subgraph "External Dependencies"
reqwest["reqwest\nHTTP client"]
polars["polars\nDataFrame operations"]
simd["simd-r-drive\nDrive-based cache storage"]
tokio["tokio\nAsync runtime"]
serde["serde\nSerialization"]
end
config --> caches
network --> config
network --> caches
network --> models
network --> enums
network --> parsers
ops --> network
ops --> views
parsers --> models
network --> reqwest
network --> simd
network --> tokio
network --> polars
models --> serde
Module Structure
The application is organized into several core modules as declared in the library root:
| Module | Purpose | Key Components |
|---|---|---|
| config | Configuration management and credential handling | ConfigManager, AppConfig |
| enums | Type-safe enumerations for domain concepts | FundamentalConcept, Url, CacheNamespacePrefix, FormType |
| models | Data structures representing SEC entities | Ticker, CikSubmission, NportInvestment, AccessionNumber |
| network | HTTP client and data fetching functions | SecClient, fetch_company_tickers, fetch_us_gaap_fundamentals |
| ops | Higher-level business logic and workflows | render_filing, fetch_and_render, diff_holdings |
| parsers | XML/JSON parsing utilities | parse_us_gaap_fundamentals, parse_cik_submissions_json |
| caches | Internal caching infrastructure | Caches (singleton), HTTP cache, preprocessor cache |
| views | Rendering logic for filing data | MarkdownView, EmbeddingTextView |
| normalize | Data normalization and cleaning | Pct type, 13F normalization logic |
Sources: src/lib.rs:1-12 src/network.rs:1-47 src/network/fetch_us_gaap_fundamentals.rs:1-108
Data Flow Architecture
Request-Response Flow with Caching
The data flow follows a pipeline pattern:
- Request Initiation: High-level operations in ops or CLI binaries call specific fetching functions like fetch_us_gaap_fundamentals src/network/fetch_us_gaap_fundamentals.rs:54-58
- Client Middleware: SecClient applies throttling and caching policies before making HTTP requests src/network/sec_client.rs:1-10
- Cache Check: The system checks simd-r-drive storage for cached responses based on CacheNamespacePrefix.
- API Request: If a cache miss occurs, the request is sent to the SEC EDGAR API (e.g., the CompanyFacts endpoint src/network/fetch_us_gaap_fundamentals.rs:62-67).
- Parsing: Raw JSON/XML is converted into structured models or DataFrames via the parsers module src/network/fetch_us_gaap_fundamentals.rs:69
- Enrichment: Data is often cross-referenced; for example, fundamentals are joined with submission data to resolve primary document URLs src/network/fetch_us_gaap_fundamentals.rs:74-105
Sources: src/network/fetch_us_gaap_fundamentals.rs:54-108 src/network.rs:1-47
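The cache check and API request steps above can be sketched with the standard library alone. This is an illustrative sketch, not the crate's implementation: NamespacePrefix, ResponseCache, and get_or_fetch are hypothetical names, an in-memory HashMap stands in for simd-r-drive storage, and a closure stands in for the HTTP call made through SecClient.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `CacheNamespacePrefix` enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum NamespacePrefix {
    CompanyFacts,
    Submissions,
}

/// In-memory stand-in for the persistent cache (assumption: the real
/// application stores responses in simd-r-drive, not in a HashMap).
#[derive(Default)]
pub struct ResponseCache {
    entries: HashMap<(NamespacePrefix, String), String>,
}

impl ResponseCache {
    /// Returns the cached body for `(namespace, url)` if present.
    /// On a miss, calls `fetch` and stores the body only when it succeeds,
    /// so failed requests are retried on the next call.
    pub fn get_or_fetch<F>(
        &mut self,
        namespace: NamespacePrefix,
        url: &str,
        fetch: F,
    ) -> Result<String, String>
    where
        F: FnOnce(&str) -> Result<String, String>,
    {
        let key = (namespace, url.to_string());
        if let Some(body) = self.entries.get(&key) {
            return Ok(body.clone()); // cache hit: no request is made
        }
        let body = fetch(url)?; // cache miss: perform the request
        self.entries.insert(key, body.clone());
        Ok(body)
    }
}
```

Keying the cache on both the namespace and the URL keeps responses from different endpoints separate even when the request strings coincide, which is one plausible reason to carry a namespace prefix at all.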
Key Dependencies and Technology Stack
The application leverages modern Rust crates for performance and reliability:
| Category | Crate | Purpose |
|---|---|---|
| Async Runtime | tokio | Asynchronous I/O and task scheduling. |
| HTTP Client | reqwest | Underlying HTTP engine for SecClient. |
| Data Frames | polars | High-performance data manipulation, especially for US GAAP data src/network/fetch_us_gaap_fundamentals.rs:9 |
| Caching | simd-r-drive | WebSocket-based key-value storage for persistent caching. |
| Serialization | serde | JSON/CSV serialization and deserialization. |
| XML Parsing | quick-xml | Fast parsing for SEC XML filings (13F, N-PORT, Form 4). |
Sources: src/network/fetch_us_gaap_fundamentals.rs:1-10 src/lib.rs:1-12
Module Interaction Patterns
US GAAP Data Retrieval Example
The interaction between modules is best exemplified by the US GAAP fundamentals retrieval process:
- Network Module: fetch_us_gaap_fundamentals is called src/network/fetch_us_gaap_fundamentals.rs:54
- Models Module: It uses Cik::get_company_cik_by_ticker_symbol to resolve the ticker src/network/fetch_us_gaap_fundamentals.rs:60
- Enums Module: It constructs the target URL using Url::CompanyFacts src/network/fetch_us_gaap_fundamentals.rs:62
- Parsers Module: It delegates the raw JSON to parsers::parse_us_gaap_fundamentals src/network/fetch_us_gaap_fundamentals.rs:69
- Network (Sub-call): It calls fetch_cik_submissions to enrich the data with filing URLs src/network/fetch_us_gaap_fundamentals.rs:74
Sources: src/network/fetch_us_gaap_fundamentals.rs:54-108
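The ticker resolution and URL construction steps can be illustrated with a small sketch. The names EdgarUrl, value, and cik_for_ticker are hypothetical and do not reflect the crate's actual Url or Cik APIs. The URL patterns themselves follow the public SEC EDGAR convention of zero-padding the CIK to ten digits.

```rust
/// Hypothetical, simplified stand-in for the crate's `Url` enum.
pub enum EdgarUrl {
    CompanyFacts { cik: u64 },
    Submissions { cik: u64 },
}

impl EdgarUrl {
    /// Renders the endpoint URL, zero-padding the CIK to ten digits.
    pub fn value(&self) -> String {
        match self {
            EdgarUrl::CompanyFacts { cik } => format!(
                "https://data.sec.gov/api/xbrl/companyfacts/CIK{:010}.json",
                cik
            ),
            EdgarUrl::Submissions { cik } => format!(
                "https://data.sec.gov/submissions/CIK{:010}.json",
                cik
            ),
        }
    }
}

/// Hypothetical ticker lookup over an in-memory table of (symbol, CIK)
/// pairs. The match is case-insensitive; `None` means the ticker is unknown.
pub fn cik_for_ticker(table: &[(&str, u64)], symbol: &str) -> Option<u64> {
    table
        .iter()
        .find(|(ticker, _)| ticker.eq_ignore_ascii_case(symbol))
        .map(|(_, cik)| *cik)
}
```

With a table containing ("AAPL", 320193), looking up "aapl" yields Some(320193), and EdgarUrl::CompanyFacts { cik: 320193 }.value() produces a URL ending in CIK0000320193.json. Modelling endpoints as enum variants keeps URL formatting in one place, so the fetch functions never assemble strings by hand.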
Error Handling Strategy
The application uses a layered error handling approach:
- Network Layer: Handles transient HTTP errors and rate limiting via retries and throttling.
- Parsing Layer: Returns specific error types (e.g., CikError, AccessionNumberError) when SEC data doesn’t match expected formats.
- Operations Layer: Often implements “non-fatal” logic, where a failure to fetch secondary data (like submissions for URL enrichment) results in a warning rather than a process crash src/network/fetch_us_gaap_fundamentals.rs:101-105
Sources: src/network/fetch_us_gaap_fundamentals.rs:101-105 src/models.rs:1-5
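The “non-fatal” pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: FetchError, Fundamentals, and fetch_with_enrichment are hypothetical names, closures stand in for the primary fetch and the secondary submissions fetch, and warnings are collected in a Vec rather than sent to whatever logging facility the application actually uses.

```rust
use std::fmt;

/// Hypothetical error type; the crate defines its own (e.g. `CikError`).
#[derive(Debug, PartialEq)]
pub enum FetchError {
    InvalidCik(String),
    Network(String),
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::InvalidCik(raw) => write!(f, "invalid CIK: {}", raw),
            FetchError::Network(msg) => write!(f, "network error: {}", msg),
        }
    }
}

impl std::error::Error for FetchError {}

/// Hypothetical result shape: primary rows plus optional enrichment.
#[derive(Debug, PartialEq)]
pub struct Fundamentals {
    pub rows: Vec<String>,
    pub filing_urls: Option<Vec<String>>,
    pub warnings: Vec<String>,
}

/// A failure in `primary` aborts the call; a failure in `secondary`
/// is downgraded to a warning and the enrichment is left empty.
pub fn fetch_with_enrichment<P, S>(primary: P, secondary: S) -> Result<Fundamentals, FetchError>
where
    P: FnOnce() -> Result<Vec<String>, FetchError>,
    S: FnOnce() -> Result<Vec<String>, FetchError>,
{
    let rows = primary()?; // fatal: nothing useful to return without rows
    let mut warnings = Vec::new();
    let filing_urls = match secondary() {
        Ok(urls) => Some(urls),
        Err(err) => {
            warnings.push(format!("URL enrichment skipped: {}", err));
            None
        }
    };
    Ok(Fundamentals { rows, filing_urls, warnings })
}
```

Representing the enrichment as an Option makes the degraded case explicit in the type, so downstream code has to decide what to do when filing URLs are absent.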