This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Data Fetching Functions

Purpose and Scope

This document describes the data fetching functions in the Rust sec-fetcher application. These functions provide the interface for retrieving financial data from the SEC EDGAR API, including company tickers, CIK lookups, submissions, company descriptions, SIC codes, EDGAR master index, NPORT filings, US GAAP fundamentals, and investment company datasets.

For information about the underlying HTTP client, throttling, and caching infrastructure, see [3.1 Network Layer & SecClient](https://github.com/jzombie/rust-sec-fetcher/blob/345ac64c/3.1 Network Layer & SecClient). For details on how US GAAP data is transformed after fetching, see [3.4 US GAAP Concept Transformation](https://github.com/jzombie/rust-sec-fetcher/blob/345ac64c/3.4 US GAAP Concept Transformation). For information about the data structures returned by these functions, see [3.5 Data Models & Enumerations](https://github.com/jzombie/rust-sec-fetcher/blob/345ac64c/3.5 Data Models & Enumerations).

Overview of Data Fetching Architecture

The network module provides specialized fetching functions that retrieve different types of financial data from the SEC EDGAR API. Each function accepts a SecClient reference and returns structured data types or Polars DataFrames.

Data Flow: Natural Language to Code Entity Space

The following diagram maps high-level data requirements to specific Rust functions and their corresponding SEC EDGAR endpoints.

Diagram: Mapping Data Requirements to Code Entities

graph TB
    subgraph "Data Requirement (Natural Language)"
        ReqTickers["'Get all stock symbols'"]
        ReqCIK["'Find CIK for AAPL'"]
        ReqDesc["'What does this company do?'"]
        ReqGAAP["'Get revenue and net income'"]
        ReqFeed["'What was filed today?'"]
        ReqFunds["'Search for mutual fund CIKs'"]
    end

    subgraph "Code Entity Space (Functions)"
        fetch_company_tickers["fetch_company_tickers()\nsrc/network/fetch_company_tickers.rs"]
        fetch_cik_by_ticker_symbol["fetch_cik_by_ticker_symbol()\nsrc/network/fetch_cik_by_ticker_symbol.rs"]
        fetch_company_description["fetch_company_description()\nsrc/network/fetch_company_description.rs"]
        fetch_us_gaap_fundamentals["fetch_us_gaap_fundamentals()\nsrc/network/fetch_us_gaap_fundamentals.rs"]
        parse_edgar_atom_feed["parse_edgar_atom_feed()\nsrc/network/fetch_edgar_feed.rs"]
        fetch_investment_company_series_and_class_dataset["fetch_investment_company_series_and_class_dataset()\nsrc/network/fetch_investment_company_series_and_class_dataset.rs"]
    end

    subgraph "EDGAR API Endpoints (Url Enum)"
        Url_CompanyTickersJson["Url::CompanyTickersJson"]
        Url_CompanyFacts["Url::CompanyFacts(cik)"]
        Url_CikAccessionDocument["Url::CikAccessionDocument"]
        Url_InvestmentCompanySeriesAndClassDataset["Url::InvestmentCompanySeriesAndClassDataset(year)"]
    end

    ReqTickers --> fetch_company_tickers
    ReqCIK --> fetch_cik_by_ticker_symbol
    ReqDesc --> fetch_company_description
    ReqGAAP --> fetch_us_gaap_fundamentals
    ReqFeed --> parse_edgar_atom_feed
    ReqFunds --> fetch_investment_company_series_and_class_dataset

    fetch_company_tickers --> Url_CompanyTickersJson
    fetch_us_gaap_fundamentals --> Url_CompanyFacts
    fetch_company_description --> Url_CikAccessionDocument
    fetch_investment_company_series_and_class_dataset --> Url_InvestmentCompanySeriesAndClassDataset

Sources: src/network/fetch_company_tickers.rs:58-61 src/network/fetch_cik_by_ticker_symbol.rs:53-56 src/network/fetch_company_description.rs:35-38 src/network/fetch_us_gaap_fundamentals.rs:54-58 src/network/fetch_investment_company_series_and_class_dataset.rs:43-45 src/network/fetch_edgar_feed.rs:118

Company Ticker and CIK Resolution

Function: fetch_company_tickers

Retrieves operating-company equity tickers. It supports merging primary listings with derived instruments such as warrants, units, and preferreds (src/network/fetch_company_tickers.rs:8-18).

Function: fetch_cik_by_ticker_symbol

Resolves a ticker to a 10-digit CIK. It checks operating companies first, then falls back to the investment company dataset for mutual funds and ETFs (src/network/fetch_cik_by_ticker_symbol.rs:27-36).
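The lookup order described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `format_cik`, `resolve_cik`, and the two in-memory maps are hypothetical stand-ins for the operating-company ticker list and the investment company dataset.

```rust
use std::collections::HashMap;

/// Formats a numeric CIK as a zero-padded 10-digit string.
fn format_cik(cik: u64) -> String {
    format!("{:010}", cik)
}

/// Hypothetical resolver: consults operating companies first and only
/// falls back to the investment company map when the ticker is absent.
fn resolve_cik(
    ticker: &str,
    operating: &HashMap<String, u64>,
    funds: &HashMap<String, u64>,
) -> Option<String> {
    // Normalize so that "aapl" and " AAPL " resolve the same way.
    let key = ticker.trim().to_uppercase();
    operating
        .get(&key)
        .or_else(|| funds.get(&key))
        .map(|cik| format_cik(*cik))
}
```

With an operating-company map containing `"AAPL" -> 320193`, `resolve_cik("aapl", ...)` returns `Some("0000320193")`. A ticker found in neither map yields `None`.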

Fuzzy Matching: Ticker::get_by_fuzzy_matched_name

Performs weighted token overlap matching to resolve company names to tickers (src/models/ticker.rs:38-42). It uses a TOKEN_MATCH_THRESHOLD of 0.6 and applies boosts for exact matches and common stock (src/models/ticker.rs:27-32).
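To make the idea of threshold-gated token overlap concrete, here is a simplified sketch. Only the 0.6 threshold comes from the text above. The scoring formula, the `EXACT_MATCH_BOOST` value, and the function names are assumptions chosen for illustration, and the common-stock boost and per-token weights are omitted.

```rust
use std::collections::HashSet;

/// Threshold named in the documentation.
const TOKEN_MATCH_THRESHOLD: f64 = 0.6;
/// Hypothetical boost applied when the token sets are identical.
const EXACT_MATCH_BOOST: f64 = 1.0;

/// Splits a name into lowercase alphanumeric tokens.
fn tokenize(name: &str) -> HashSet<String> {
    name.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Fraction of query tokens present in the candidate, plus a boost
/// when both names reduce to the same token set.
fn match_score(query: &str, candidate: &str) -> f64 {
    let q = tokenize(query);
    let c = tokenize(candidate);
    if q.is_empty() {
        return 0.0;
    }
    let overlap = q.intersection(&c).count() as f64 / q.len() as f64;
    if q == c {
        overlap + EXACT_MATCH_BOOST
    } else {
        overlap
    }
}

/// Returns the ticker of the highest-scoring (ticker, name) pair that
/// clears the threshold, or None if no candidate does.
fn best_match<'a>(query: &str, candidates: &[(&'a str, &'a str)]) -> Option<&'a str> {
    candidates
        .iter()
        .map(|(ticker, name)| (*ticker, match_score(query, name)))
        .filter(|(_, score)| *score >= TOKEN_MATCH_THRESHOLD)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(ticker, _)| ticker)
}
```

The boost matters when several names contain every query token: without it, "Apple Inc" would score the same against "Apple Inc." and "Apple Hospitality REIT Inc".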

Sources: src/network/fetch_company_tickers.rs:58-61 src/network/fetch_cik_by_ticker_symbol.rs:53-72 src/models/ticker.rs:35-136

Company Profiles and Descriptions

Function: fetch_company_description

Extracts the “Item 1. Business” section from a company’s most recent 10-K filing (src/network/fetch_company_description.rs:11-13).

Implementation Strategy:

  1. Locates all occurrences of “Item 1” and “Item 1A” using regex (src/network/fetch_company_description.rs:81-82).
  2. Identifies the “real” section by finding the largest HTML byte gap between an Item 1 and its subsequent Item 1A, which avoids Table of Contents entries (src/network/fetch_company_description.rs:90-96).
  3. Uses html2text for tag stripping and entity decoding (src/network/fetch_company_description.rs:103-105).
  4. Skips short heading lines and truncates at a sentence boundary near 800 characters (src/network/fetch_company_description.rs:109-133).
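Steps 1, 2, and 4 above can be sketched with the standard library alone. This is an approximation under stated assumptions: it uses plain substring search instead of the regex the source mentions, it skips the html2text step entirely, and the helper names (`find_headings`, `largest_item1_span`, `truncate_at_sentence`) are hypothetical.

```rust
/// Byte offsets of "item 1" headings in lowercase text. With `want_1a`
/// set, only "item 1a" matches; otherwise "item 1a" and "item 10" to
/// "item 19" are excluded.
fn find_headings(lower: &str, want_1a: bool) -> Vec<usize> {
    lower
        .match_indices("item 1")
        .filter(|(pos, m)| {
            let next = lower.as_bytes().get(pos + m.len()).copied();
            if want_1a {
                next == Some(b'a')
            } else {
                !matches!(next, Some(b'a') | Some(b'0'..=b'9'))
            }
        })
        .map(|(pos, _)| pos)
        .collect()
}

/// Pairs each "Item 1" with the next "Item 1A" after it and keeps the
/// pair with the largest byte gap. Table of Contents entries sit close
/// together, so the widest gap is assumed to be the real section.
fn largest_item1_span(html: &str) -> Option<(usize, usize)> {
    // ASCII lowercasing preserves byte offsets into the original string.
    let lower = html.to_ascii_lowercase();
    let starts = find_headings(&lower, false);
    let ends = find_headings(&lower, true);
    starts
        .iter()
        .filter_map(|&s| ends.iter().find(|&&e| e > s).map(|&e| (s, e)))
        .max_by_key(|(s, e)| e - s)
}

/// Cuts `text` to at most `limit` characters, preferring the last
/// sentence end (". ") inside that window over a hard cut.
fn truncate_at_sentence(text: &str, limit: usize) -> String {
    if text.chars().count() <= limit {
        return text.to_string();
    }
    let head: String = text.chars().take(limit).collect();
    match head.rfind(". ") {
        Some(idx) => head[..=idx].to_string(),
        None => head,
    }
}
```

In a full pipeline, the slice `&html[start..end]` returned by `largest_item1_span` would be converted to plain text before `truncate_at_sentence(text, 800)` is applied.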

Function: fetch_sic_codes

Fetches the complete SEC SIC code list from siccodes.htm (src/network/fetch_sic_codes.rs:11-15). It parses the HTML table rows into SicCode models containing the code, office, and industry title (src/network/fetch_sic_codes.rs:63-71).
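The row-to-model step might look roughly like the sketch below. The `SicCode` name comes from the text, but its field names and types are assumptions, and the naive `<td>` splitting stands in for whatever HTML parsing the crate actually performs. It expects lowercase tags and cells without nested markup, and it does no entity decoding.

```rust
/// Assumed shape of the model; the real struct may differ.
#[derive(Debug, PartialEq)]
struct SicCode {
    code: u16,
    office: String,
    industry_title: String,
}

/// Extracts the trimmed inner text of every <td> cell in one table row.
fn cells(row: &str) -> Vec<String> {
    row.split("<td")
        .skip(1) // everything before the first <td is not a cell
        .filter_map(|chunk| {
            let start = chunk.find('>')? + 1;
            let end = chunk.find("</td>")?;
            Some(chunk.get(start..end)?.trim().to_string())
        })
        .collect()
}

/// Builds a SicCode from a row with exactly three cells. Header rows
/// and rows with a non-numeric code yield None.
fn parse_row(row: &str) -> Option<SicCode> {
    let c = cells(row);
    if c.len() != 3 {
        return None;
    }
    Some(SicCode {
        code: c[0].parse().ok()?,
        office: c[1].clone(),
        industry_title: c[2].clone(),
    })
}
```

Returning `Option` lets a caller run `filter_map(parse_row)` over all rows and silently drop the header row.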

Sources: src/network/fetch_company_description.rs:35-68 src/network/fetch_sic_codes.rs:33-45

US GAAP Fundamentals

Function: fetch_us_gaap_fundamentals

Fetches all XBRL-tagged financial data for a company as a structured DataFrame (src/network/fetch_us_gaap_fundamentals.rs:11-12).

Data Flow:

  1. Resolves Ticker to CIK (src/network/fetch_us_gaap_fundamentals.rs:60).
  2. Fetches JSON from the companyfacts endpoint (src/network/fetch_us_gaap_fundamentals.rs:67).
  3. Accession Resolution: Calls fetch_cik_submissions to map accession numbers to primary document URLs (e.g., “aapl-20241228.htm”) (src/network/fetch_us_gaap_fundamentals.rs:74-81).
  4. Updates the filing_url column in the resulting DataFrame (src/network/fetch_us_gaap_fundamentals.rs:99).
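The accession resolution in steps 3 and 4 amounts to a keyed lookup followed by URL construction. The sketch below assumes the public EDGAR archive path layout (`/Archives/edgar/data/{cik}/{accession without dashes}/{document}`). The function names and the `HashMap` standing in for the submissions data are hypothetical, and the crate may build these URLs differently, for example through its Url enum.

```rust
use std::collections::HashMap;

/// Builds an EDGAR archive URL for a filing's primary document.
/// The folder name is the accession number with dashes removed.
fn filing_url(cik: u64, accession: &str, primary_document: &str) -> String {
    let folder: String = accession.chars().filter(|c| *c != '-').collect();
    format!(
        "https://www.sec.gov/Archives/edgar/data/{}/{}/{}",
        cik, folder, primary_document
    )
}

/// Produces one optional URL per accession number, in input order.
/// Accessions missing from `primary_docs` map to None, mirroring a
/// nullable filing_url column.
fn resolve_filing_urls(
    cik: u64,
    accessions: &[&str],
    primary_docs: &HashMap<String, String>,
) -> Vec<Option<String>> {
    accessions
        .iter()
        .map(|acc| primary_docs.get(*acc).map(|doc| filing_url(cik, acc, doc)))
        .collect()
}
```

Keeping the result aligned with the input order means it can be written back as a column without a join.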

Sources: src/network/fetch_us_gaap_fundamentals.rs:54-108

Investment Company Datasets

Function: fetch_investment_company_series_and_class_dataset

Retrieves the annual CSV of registered investment company share classes (src/network/fetch_investment_company_series_and_class_dataset.rs:22-32).

Year Fallback and Caching:

Diagram: Investment Company Dataset Fetch Sequence
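Because the dataset URL is parameterized by year, a year fallback can be pictured as trying the most recent year first and stepping backwards until a fetch succeeds. The sketch below is an assumption about the general shape only: `fetch_with_year_fallback`, its `max_attempts` parameter, and the closure-based fetcher are hypothetical, and caching of the resolved year is not shown.

```rust
/// Calls `fetch` for `start_year`, then for each earlier year, stopping
/// at the first success or after `max_attempts` tries. Returns the year
/// that worked together with its data.
fn fetch_with_year_fallback<T, E>(
    start_year: u32,
    max_attempts: u32,
    mut fetch: impl FnMut(u32) -> Result<T, E>,
) -> Option<(u32, T)> {
    (0..max_attempts)
        .map(|offset| start_year.saturating_sub(offset))
        // find_map is lazy, so no further years are tried after a hit.
        .find_map(|year| fetch(year).ok().map(|data| (year, data)))
}
```

Returning the resolved year alongside the data lets a caller remember it and skip the failed attempts on the next call.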

Sources: src/network/fetch_investment_company_series_and_class_dataset.rs:43-80 src/network/fetch_investment_company_series_and_class_dataset.rs:92-117

Real-time Feeds

Function: parse_edgar_atom_feed

Parses the EDGAR Atom XML feed into FeedEntry items (src/network/fetch_edgar_feed.rs:112-118).

Sources: src/network/fetch_edgar_feed.rs:46-109 src/network/fetch_edgar_feed.rs:118-155
