This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Rust sec-fetcher Application

Purpose and Scope

This page provides an architectural overview of the Rust sec-fetcher application, which is responsible for fetching financial data from the SEC EDGAR API and transforming it into structured formats. The application serves as the high-performance data collection and preprocessing layer in a larger system that combines Rust’s safety and speed for I/O operations with Python’s machine learning capabilities.

This page covers the high-level architecture, module organization, and data flow patterns. Individual components are described in more detail on their own pages.

Sources: src/lib.rs:1-12 src/network.rs:1-47

Application Architecture

The sec-fetcher application is built around a modular architecture that separates concerns into distinct layers: configuration, networking, data transformation, and storage. The core design principle is to fetch data from SEC APIs with robust error handling and caching, transform it into a standardized format (often using polars DataFrames), and output structured data for downstream consumption.

graph TB
    subgraph "src/lib.rs Module Organization"
        config["config\nConfigManager, AppConfig"]
        enums["enums\nFundamentalConcept, Url\nCacheNamespacePrefix"]
        models["models\nTicker, CikSubmission\nNportInvestment, AccessionNumber"]
        network["network\nSecClient, fetch_* functions"]
        ops["ops\nrender_filing, fetch_and_render"]
        parsers["parsers\nXML/JSON parsing utilities"]
        caches["caches\nInternal caching infrastructure"]
        views["views\nMarkdownView, EmbeddingTextView"]
        utils["utils\nVecExtensions, helpers"]
    end

    subgraph "External Dependencies"
        reqwest["reqwest\nHTTP client"]
        polars["polars\nDataFrame operations"]
        simd["simd-r-drive\nDrive-based cache storage"]
        tokio["tokio\nAsync runtime"]
        serde["serde\nSerialization"]
    end

    config --> caches
    network --> config
    network --> caches
    network --> models
    network --> enums
    network --> parsers
    ops --> network
    ops --> views
    parsers --> models

    network --> reqwest
    network --> simd
    network --> tokio
    network --> polars
    models --> serde

Module Structure

The application is organized into several core modules as declared in the library root:

| Module | Purpose | Key Components |
| --- | --- | --- |
| config | Configuration management and credential handling | ConfigManager, AppConfig |
| enums | Type-safe enumerations for domain concepts | FundamentalConcept, Url, CacheNamespacePrefix, FormType |
| models | Data structures representing SEC entities | Ticker, CikSubmission, NportInvestment, AccessionNumber |
| network | HTTP client and data fetching functions | SecClient, fetch_company_tickers, fetch_us_gaap_fundamentals |
| ops | Higher-level business logic and workflows | render_filing, fetch_and_render, diff_holdings |
| parsers | XML/JSON parsing utilities | parse_us_gaap_fundamentals, parse_cik_submissions_json |
| caches | Internal caching infrastructure | Caches (singleton), HTTP cache, preprocessor cache |
| views | Rendering logic for filing data | MarkdownView, EmbeddingTextView |
| normalize | Data normalization and cleaning | Pct type, 13F normalization logic |

Sources: src/lib.rs:1-12 src/network.rs:1-47 src/network/fetch_us_gaap_fundamentals.rs:1-108

Data Flow Architecture

Request-Response Flow with Caching

The data flow follows a pipeline pattern:

  1. Request Initiation: High-level operations in ops or CLI binaries call specific fetching functions such as fetch_us_gaap_fundamentals (src/network/fetch_us_gaap_fundamentals.rs:54-58).
  2. Client Middleware: SecClient applies throttling and caching policies before making HTTP requests (src/network/sec_client.rs:1-10).
  3. Cache Check: The system checks simd-r-drive storage for cached responses based on CacheNamespacePrefix.
  4. API Request: On a cache miss, the request is sent to the SEC EDGAR API (e.g., the CompanyFacts endpoint, src/network/fetch_us_gaap_fundamentals.rs:62-67).
  5. Parsing: Raw JSON/XML is converted into structured models or DataFrames via the parsers module (src/network/fetch_us_gaap_fundamentals.rs:69).
  6. Enrichment: Data is often cross-referenced; for example, fundamentals are joined with submission data to resolve primary document URLs (src/network/fetch_us_gaap_fundamentals.rs:74-105).

Sources: src/network/fetch_us_gaap_fundamentals.rs:54-108 src/network.rs:1-47
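
Steps 2 through 4 amount to a cache-aside lookup: check the cache, go to the network only on a miss, then store the response. The sketch below illustrates that pattern with the standard library only. CachingClient, MemoryCache, Transport, CacheNamespace, and CountingTransport are hypothetical names introduced for illustration; they are not the actual SecClient, simd-r-drive storage, or CacheNamespacePrefix APIs, and throttling is left out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `CacheNamespacePrefix` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum CacheNamespace {
    HttpResponse,
}

/// Hypothetical abstraction over "something that performs an HTTP GET".
pub trait Transport {
    fn get(&mut self, url: &str) -> Result<String, String>;
}

/// In-memory stand-in for the persistent cache (an assumption for this sketch).
#[derive(Default)]
pub struct MemoryCache {
    entries: HashMap<(CacheNamespace, String), String>,
}

impl MemoryCache {
    pub fn get(&self, ns: CacheNamespace, key: &str) -> Option<&String> {
        self.entries.get(&(ns, key.to_string()))
    }

    pub fn put(&mut self, ns: CacheNamespace, key: &str, value: String) {
        self.entries.insert((ns, key.to_string()), value);
    }
}

/// Wraps a transport with a cache-aside policy.
pub struct CachingClient<T: Transport> {
    transport: T,
    cache: MemoryCache,
}

impl<T: Transport> CachingClient<T> {
    pub fn new(transport: T) -> Self {
        Self { transport, cache: MemoryCache::default() }
    }

    pub fn transport(&self) -> &T {
        &self.transport
    }

    pub fn fetch(&mut self, url: &str) -> Result<String, String> {
        // Cache check: return the stored body without touching the network.
        if let Some(hit) = self.cache.get(CacheNamespace::HttpResponse, url) {
            return Ok(hit.clone());
        }
        // Cache miss: perform the request, then remember the response.
        let body = self.transport.get(url)?;
        self.cache.put(CacheNamespace::HttpResponse, url, body.clone());
        Ok(body)
    }
}

/// Test double that counts how many requests reach the "network".
pub struct CountingTransport {
    pub calls: usize,
}

impl Transport for CountingTransport {
    fn get(&mut self, url: &str) -> Result<String, String> {
        self.calls += 1;
        Ok(format!("body for {}", url))
    }
}
```

Fetching the same URL twice through a CachingClient built on CountingTransport leaves the call counter at 1, which is the behavior the Cache Check step describes.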

Key Dependencies and Technology Stack

The application leverages modern Rust crates for performance and reliability:

| Category | Crate | Purpose |
| --- | --- | --- |
| Async Runtime | tokio | Asynchronous I/O and task scheduling. |
| HTTP Client | reqwest | Underlying HTTP engine for SecClient. |
| Data Frames | polars | High-performance data manipulation, especially for US GAAP data (src/network/fetch_us_gaap_fundamentals.rs:9). |
| Caching | simd-r-drive | WebSocket-based key-value storage for persistent caching. |
| Serialization | serde | JSON/CSV serialization and deserialization. |
| XML Parsing | quick-xml | Fast parsing for SEC XML filings (13F, N-PORT, Form 4). |

Sources: src/network/fetch_us_gaap_fundamentals.rs:1-10 src/lib.rs:1-12

Module Interaction Patterns

US GAAP Data Retrieval Example

The interaction between modules is best exemplified by the US GAAP fundamentals retrieval process:

  1. Network Module: fetch_us_gaap_fundamentals is called (src/network/fetch_us_gaap_fundamentals.rs:54).
  2. Models Module: It uses Cik::get_company_cik_by_ticker_symbol to resolve the ticker (src/network/fetch_us_gaap_fundamentals.rs:60).
  3. Enums Module: It constructs the target URL using Url::CompanyFacts (src/network/fetch_us_gaap_fundamentals.rs:62).
  4. Parsers Module: It delegates the raw JSON to parsers::parse_us_gaap_fundamentals (src/network/fetch_us_gaap_fundamentals.rs:69).
  5. Network (Sub-call): It calls fetch_cik_submissions to enrich the data with filing URLs (src/network/fetch_us_gaap_fundamentals.rs:74).

Sources: src/network/fetch_us_gaap_fundamentals.rs:54-108
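
Steps 2 and 3 above (ticker resolution and URL construction) can be sketched as follows. This is a simplified illustration, not the crate's code: the Cik newtype, the single-variant Url enum, and the resolve_cik and company_facts_url_for helpers are assumptions made for the example, and the ticker table is an in-memory map rather than data fetched from the SEC. The URL shape follows the public SEC CompanyFacts endpoint, which expects the CIK zero-padded to 10 digits.

```rust
use std::collections::HashMap;

/// Hypothetical simplified CIK newtype; the real `Cik` model may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cik(pub u64);

impl Cik {
    /// SEC data URLs use the CIK zero-padded to 10 digits.
    pub fn to_padded_string(self) -> String {
        format!("{:010}", self.0)
    }
}

/// Hypothetical stand-in for the `Url` enum; only one variant is sketched.
pub enum Url {
    CompanyFacts(Cik),
}

impl Url {
    /// Render the variant into a concrete request URL.
    pub fn value(&self) -> String {
        match self {
            Url::CompanyFacts(cik) => format!(
                "https://data.sec.gov/api/xbrl/companyfacts/CIK{}.json",
                cik.to_padded_string()
            ),
        }
    }
}

/// Stand-in for ticker resolution: a case-insensitive lookup in a local table.
pub fn resolve_cik(tickers: &HashMap<String, Cik>, symbol: &str) -> Option<Cik> {
    tickers.get(&symbol.to_uppercase()).copied()
}

/// Resolve the ticker, then build the CompanyFacts URL for it.
pub fn company_facts_url_for(
    tickers: &HashMap<String, Cik>,
    symbol: &str,
) -> Result<String, String> {
    let cik = resolve_cik(tickers, symbol)
        .ok_or_else(|| format!("unknown ticker: {}", symbol))?;
    Ok(Url::CompanyFacts(cik).value())
}
```

Keeping URL construction inside an enum means call sites pass typed values (a Cik) rather than assembling strings by hand, so padding rules live in one place.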

Error Handling Strategy

The application uses a layered error handling approach:

  • Network Layer: Handles transient HTTP errors and rate limiting via retries and throttling.
  • Parsing Layer: Returns specific error types (e.g., CikError, AccessionNumberError) when SEC data doesn’t match expected formats.
  • Operations Layer: Often implements “non-fatal” logic, where a failure to fetch secondary data (like submissions for URL enrichment) results in a warning rather than a process crash (src/network/fetch_us_gaap_fundamentals.rs:101-105).

Sources: src/network/fetch_us_gaap_fundamentals.rs:101-105 src/models.rs:1-5
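
The "non-fatal" behavior of the operations layer can be illustrated with a small sketch: a failure in the primary fetch propagates as an error, while a failure in the secondary enrichment step is logged and the result is returned without the enriched field. FetchError, Fundamentals, and fetch_with_enrichment are hypothetical names for this example; the crate's actual error types (such as CikError) and its logging mechanism are not reproduced here.

```rust
use std::fmt;

/// Hypothetical error type covering the network and parsing layers.
#[derive(Debug)]
pub enum FetchError {
    Network(String),
    Parse(String),
}

impl fmt::Display for FetchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FetchError::Network(msg) => write!(f, "network error: {}", msg),
            FetchError::Parse(msg) => write!(f, "parse error: {}", msg),
        }
    }
}

impl std::error::Error for FetchError {}

/// Hypothetical result shape: primary rows plus optional enrichment.
#[derive(Debug, PartialEq)]
pub struct Fundamentals {
    pub rows: Vec<String>,
    /// `None` when enrichment failed and was skipped.
    pub filing_urls: Option<Vec<String>>,
}

/// Run the primary fetch (fatal on error) and the enrichment step (non-fatal).
pub fn fetch_with_enrichment<P, E>(primary: P, enrich: E) -> Result<Fundamentals, FetchError>
where
    P: FnOnce() -> Result<Vec<String>, FetchError>,
    E: FnOnce() -> Result<Vec<String>, FetchError>,
{
    // A failure here aborts the whole operation.
    let rows = primary()?;

    // A failure here is downgraded to a warning on stderr.
    let filing_urls = match enrich() {
        Ok(urls) => Some(urls),
        Err(err) => {
            eprintln!("warning: URL enrichment skipped: {}", err);
            None
        }
    };

    Ok(Fundamentals { rows, filing_urls })
}
```

Modeling the enriched field as an Option makes the degraded case visible in the type, so downstream code has to decide what to do when URLs are missing.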
