This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Data Fetching Functions
Relevant source files
- examples/fuzzy_match_company.rs
- src/lib.rs
- src/models/nport_investment.rs
- src/models/ticker.rs
- src/network/fetch_cik_by_ticker_symbol.rs
- src/network/fetch_company_description.rs
- src/network/fetch_company_tickers.rs
- src/network/fetch_edgar_feed.rs
- src/network/fetch_investment_company_series_and_class_dataset.rs
- src/network/fetch_sic_codes.rs
- src/network/fetch_us_gaap_fundamentals.rs
Purpose and Scope
This document describes the data fetching functions in the Rust sec-fetcher application. These functions provide the interface for retrieving financial data from the SEC EDGAR API, including company tickers, CIK lookups, submissions, company descriptions, SIC codes, EDGAR master index, NPORT filings, US GAAP fundamentals, and investment company datasets.
For information about the underlying HTTP client, throttling, and caching infrastructure, see [3.1 Network Layer & SecClient](https://github.com/jzombie/rust-sec-fetcher/blob/345ac64c/3.1 Network Layer & SecClient). For details on how US GAAP data is transformed after fetching, see [3.4 US GAAP Concept Transformation](https://github.com/jzombie/rust-sec-fetcher/blob/345ac64c/3.4 US GAAP Concept Transformation). For information about the data structures returned by these functions, see [3.5 Data Models & Enumerations](https://github.com/jzombie/rust-sec-fetcher/blob/345ac64c/3.5 Data Models & Enumerations).
Overview of Data Fetching Architecture
The network module provides specialized fetching functions that retrieve different types of financial data from the SEC EDGAR API. Each function accepts a SecClient reference and returns structured data types or Polars DataFrames.
Data Flow: Natural Language to Code Entity Space
The following diagram maps high-level data requirements to specific Rust functions and their corresponding SEC EDGAR endpoints.
Diagram: Mapping Data Requirements to Code Entities
graph TB
subgraph "Data Requirement (Natural Language)"
ReqTickers["'Get all stock symbols'"]
ReqCIK["'Find CIK for AAPL'"]
ReqDesc["'What does this company do?'"]
ReqGAAP["'Get revenue and net income'"]
ReqFeed["'What was filed today?'"]
ReqFunds["'Search for mutual fund CIKs'"]
end
subgraph "Code Entity Space (Functions)"
fetch_company_tickers["fetch_company_tickers()\nsrc/network/fetch_company_tickers.rs"]
fetch_cik_by_ticker_symbol["fetch_cik_by_ticker_symbol()\nsrc/network/fetch_cik_by_ticker_symbol.rs"]
fetch_company_description["fetch_company_description()\nsrc/network/fetch_company_description.rs"]
fetch_us_gaap_fundamentals["fetch_us_gaap_fundamentals()\nsrc/network/fetch_us_gaap_fundamentals.rs"]
parse_edgar_atom_feed["parse_edgar_atom_feed()\nsrc/network/fetch_edgar_feed.rs"]
fetch_investment_company_series_and_class_dataset["fetch_investment_company_series_and_class_dataset()\nsrc/network/fetch_investment_company_series_and_class_dataset.rs"]
end
subgraph "EDGAR API Endpoints (Url Enum)"
Url_CompanyTickersJson["Url::CompanyTickersJson"]
Url_CompanyFacts["Url::CompanyFacts(cik)"]
Url_CikAccessionDocument["Url::CikAccessionDocument"]
Url_InvestmentCompanySeriesAndClassDataset["Url::InvestmentCompanySeriesAndClassDataset(year)"]
end
ReqTickers --> fetch_company_tickers
ReqCIK --> fetch_cik_by_ticker_symbol
ReqDesc --> fetch_company_description
ReqGAAP --> fetch_us_gaap_fundamentals
ReqFeed --> parse_edgar_atom_feed
ReqFunds --> fetch_investment_company_series_and_class_dataset
fetch_company_tickers --> Url_CompanyTickersJson
fetch_us_gaap_fundamentals --> Url_CompanyFacts
fetch_company_description --> Url_CikAccessionDocument
fetch_investment_company_series_and_class_dataset --> Url_InvestmentCompanySeriesAndClassDataset
Sources: src/network/fetch_company_tickers.rs:58-61 src/network/fetch_cik_by_ticker_symbol.rs:53-56 src/network/fetch_company_description.rs:35-38 src/network/fetch_us_gaap_fundamentals.rs:54-58 src/network/fetch_investment_company_series_and_class_dataset.rs:43-45 src/network/fetch_edgar_feed.rs:118
Company Ticker and CIK Resolution
Function: fetch_company_tickers
Retrieves operating-company equity tickers. It supports merging primary listings with derived instruments (warrants, units, preferreds) src/network/fetch_company_tickers.rs:8-18
- Primary Source: company_tickers.json src/network/fetch_company_tickers.rs:24-25
- Derived Source: ticker.txt (if include_derived_instruments is true) src/network/fetch_company_tickers.rs:27-32
- Backfilling: Uses the CIK-to-name mapping from the JSON source to enrich text-only derived entries src/network/fetch_company_tickers.rs:84-88
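The merge-and-backfill step can be illustrated with a minimal, standard-library-only sketch. The `TickerRow` struct and `merge_with_backfill` function are hypothetical names invented for this example and are not the crate's actual types; skipping derived rows whose symbol already appears in the primary list is also an assumption, not confirmed by the source.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified ticker record (not the crate's `Ticker` model).
#[derive(Debug, Clone, PartialEq)]
pub struct TickerRow {
    pub cik: u64,
    pub symbol: String,
    pub company_name: Option<String>,
}

/// Merge primary rows (which carry names) with derived rows (symbol and CIK
/// only), filling in each derived row's name from the primary CIK-to-name map.
pub fn merge_with_backfill(primary: Vec<TickerRow>, derived: Vec<TickerRow>) -> Vec<TickerRow> {
    // Build the CIK -> company name lookup from the primary rows.
    let names: HashMap<u64, String> = primary
        .iter()
        .filter_map(|row| row.company_name.clone().map(|name| (row.cik, name)))
        .collect();

    // Assumption: a derived row never replaces a primary row with the same symbol.
    let mut seen: HashSet<String> = primary.iter().map(|row| row.symbol.clone()).collect();

    let mut merged = primary;
    for mut row in derived {
        if !seen.insert(row.symbol.clone()) {
            continue; // symbol already present
        }
        if row.company_name.is_none() {
            // Backfill the missing name; stays `None` when the CIK is unknown.
            row.company_name = names.get(&row.cik).cloned();
        }
        merged.push(row);
    }
    merged
}
```

A derived warrant row that shares a CIK with a primary listing picks up that listing's company name, while a derived row with an unknown CIK keeps `None`.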
Function: fetch_cik_by_ticker_symbol
Resolves a ticker to a 10-digit CIK. It prioritizes operating companies before falling back to the investment company dataset for mutual funds/ETFs src/network/fetch_cik_by_ticker_symbol.rs:27-36
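The lookup order and the 10-digit formatting can be sketched as follows. This is an illustration under stated assumptions: `format_cik` and `resolve_cik` are hypothetical helpers, the two maps stand in for the operating-company and investment-company datasets, and the upper-casing of the input symbol is an assumption rather than confirmed behavior.

```rust
use std::collections::HashMap;

/// Format a numeric CIK as a 10-digit, zero-padded string.
pub fn format_cik(cik: u64) -> String {
    format!("{:010}", cik)
}

/// Look the symbol up among operating companies first, then fall back to
/// the fund map. Both maps are hypothetical stand-ins for the real datasets.
pub fn resolve_cik(
    symbol: &str,
    operating: &HashMap<String, u64>,
    funds: &HashMap<String, u64>,
) -> Option<String> {
    // Assumption: symbols are compared case-insensitively via upper-casing.
    let key = symbol.trim().to_uppercase();
    operating
        .get(&key)
        .or_else(|| funds.get(&key))
        .map(|cik| format_cik(*cik))
}
```

With an operating map that contains AAPL mapped to 320193, `resolve_cik("aapl", ...)` returns `Some("0000320193")`. The fund map is consulted only when the operating map has no entry for the symbol.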
Fuzzy Matching: Ticker::get_by_fuzzy_matched_name
Performs weighted token overlap matching to resolve company names to tickers src/models/ticker.rs:38-42. It uses a TOKEN_MATCH_THRESHOLD of 0.6 and applies boosts for exact matches and common stock src/models/ticker.rs:27-32
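To make the idea of weighted token overlap concrete, here is a rough sketch. Only the 0.6 threshold comes from the text above. The weighting scheme (each query token weighted by its byte length), the size of the exact-match boost, and the function names are assumptions made for illustration; the common-stock boost is omitted, and the crate's actual scoring may differ.

```rust
use std::collections::HashSet;

/// Threshold stated in the text above.
const TOKEN_MATCH_THRESHOLD: f64 = 0.6;

/// Lower-case the name and split it on non-alphanumeric characters.
fn tokenize(name: &str) -> Vec<String> {
    name.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Fraction of the query's token weight that also appears in the candidate.
/// Assumption: a token's weight is its byte length.
pub fn overlap_score(query: &str, candidate: &str) -> f64 {
    let query_tokens = tokenize(query);
    let candidate_tokens: HashSet<String> = tokenize(candidate).into_iter().collect();
    let total: usize = query_tokens.iter().map(|t| t.len()).sum();
    if total == 0 {
        return 0.0;
    }
    let matched: usize = query_tokens
        .iter()
        .filter(|t| candidate_tokens.contains(*t))
        .map(|t| t.len())
        .sum();
    matched as f64 / total as f64
}

/// Return the best candidate at or above the threshold, if any.
pub fn best_match<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
    let mut best: Option<(&'a str, f64)> = None;
    for &candidate in candidates {
        let mut score = overlap_score(query, candidate);
        if score < TOKEN_MATCH_THRESHOLD {
            continue;
        }
        // Hypothetical boost so an exact (case-insensitive) match always wins.
        if candidate.eq_ignore_ascii_case(query) {
            score += 1.0;
        }
        if best.map_or(true, |(_, s)| score > s) {
            best = Some((candidate, score));
        }
    }
    best.map(|(candidate, _)| candidate)
}
```

Weighting by token length lets distinctive long tokens count for more than short suffixes such as "inc", which is one plausible reason to prefer a weighted overlap to a plain token count.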
Sources: src/network/fetch_company_tickers.rs:58-61 src/network/fetch_cik_by_ticker_symbol.rs:53-72 src/models/ticker.rs:35-136
Company Profiles and Descriptions
Function: fetch_company_description
Extracts the “Item 1. Business” section from a company’s most recent 10-K filing src/network/fetch_company_description.rs:11-13
Implementation Strategy:
- Locates all occurrences of “Item 1” and “Item 1A” using regex src/network/fetch_company_description.rs:81-82
- Identifies the “real” section by finding the largest HTML byte gap between an Item 1 and its subsequent Item 1A (avoiding Table of Contents entries) src/network/fetch_company_description.rs:90-96
- Uses html2text for tag stripping and entity decoding src/network/fetch_company_description.rs:103-105
- Skips short heading lines and truncates at a sentence boundary near 800 characters src/network/fetch_company_description.rs:109-133
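The largest-gap heuristic and the sentence-boundary truncation can be sketched with the standard library alone. This is not the crate's implementation: it swaps the regex for a case-sensitive substring search, leaves out the html2text step, and uses hypothetical function names. Cutting at the last ". " before the limit is one assumed reading of "near 800 characters".

```rust
/// Byte offsets of "Item 1" headings (excluding "Item 1A", "Item 10", "Item 1B", ...)
/// and of "Item 1A" headings, both in document order.
fn find_markers(html: &str) -> (Vec<usize>, Vec<usize>) {
    let mut item1 = Vec::new();
    let mut item1a = Vec::new();
    for (pos, m) in html.match_indices("Item 1") {
        match html[pos + m.len()..].chars().next() {
            Some('A') => item1a.push(pos),
            Some(c) if c.is_ascii_alphanumeric() => {} // some other item heading
            _ => item1.push(pos),
        }
    }
    (item1, item1a)
}

/// Pair each "Item 1" with the next "Item 1A" and keep the pair with the
/// largest byte gap. Table-of-contents entries sit close together, so the
/// real section usually wins.
pub fn largest_item1_span(html: &str) -> Option<(usize, usize)> {
    let (item1, item1a) = find_markers(html);
    item1
        .iter()
        .filter_map(|&start| {
            item1a
                .iter()
                .copied()
                .find(|&end| end > start)
                .map(|end| (start, end))
        })
        .max_by_key(|&(start, end)| end - start)
}

/// Truncate `text` to at most `limit` bytes, preferring the last sentence
/// end (". ") before the limit.
pub fn truncate_at_sentence(text: &str, limit: usize) -> &str {
    if text.len() <= limit {
        return text;
    }
    // Back off to a char boundary so slicing cannot panic on multi-byte text.
    let mut cut = limit;
    while !text.is_char_boundary(cut) {
        cut -= 1;
    }
    match text[..cut].rfind(". ") {
        Some(idx) => &text[..=idx],
        None => &text[..cut],
    }
}
```

In a filing whose table of contents lists "Item 1" and "Item 1A" a few bytes apart, the pair spanning the actual business description has a far larger gap, so `largest_item1_span` selects it.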
Function: fetch_sic_codes
Fetches the complete SEC SIC code list from siccodes.htm src/network/fetch_sic_codes.rs:11-15. It parses the HTML table rows into SicCode models containing the code, office, and industry title src/network/fetch_sic_codes.rs:63-71
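As a rough illustration of turning table rows into records, the sketch below uses naive string splitting. It assumes each data row holds exactly three `<td>` cells in the order code, office, industry title. `SicRow` and `parse_sic_rows` are hypothetical names, and the crate's own parsing approach may differ.

```rust
/// Hypothetical, simplified stand-in for the crate's `SicCode` model.
#[derive(Debug, PartialEq)]
pub struct SicRow {
    pub code: u16,
    pub office: String,
    pub industry_title: String,
}

/// Extract the trimmed text of every `<td>` cell in one row fragment.
fn cells(row: &str) -> Vec<String> {
    row.split("<td")
        .skip(1)
        .filter_map(|chunk| {
            let start = chunk.find('>')? + 1;
            let end = chunk.find("</td>")?;
            Some(chunk.get(start..end)?.trim().to_string())
        })
        .collect()
}

/// Parse rows with exactly three cells and a numeric code; header rows
/// (which use `<th>`) and malformed rows are skipped.
pub fn parse_sic_rows(html: &str) -> Vec<SicRow> {
    html.split("<tr")
        .skip(1)
        .filter_map(|row| {
            let c = cells(row);
            if c.len() != 3 {
                return None;
            }
            Some(SicRow {
                code: c[0].parse().ok()?,
                office: c[1].clone(),
                industry_title: c[2].clone(),
            })
        })
        .collect()
}
```

String splitting is brittle against nested tags or attributes containing `>`, so a real HTML parser is the safer choice outside of a sketch.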
Sources: src/network/fetch_company_description.rs:35-68 src/network/fetch_sic_codes.rs:33-45
US GAAP Fundamentals
Function: fetch_us_gaap_fundamentals
Fetches all XBRL-tagged financial data for a company as a structured DataFrame src/network/fetch_us_gaap_fundamentals.rs:11-12
Data Flow:
- Resolves Ticker to CIK src/network/fetch_us_gaap_fundamentals.rs:60
- Fetches JSON from the companyfacts endpoint src/network/fetch_us_gaap_fundamentals.rs:67
- Accession Resolution: Calls fetch_cik_submissions to map accession numbers to primary document URLs (e.g., “aapl-20241228.htm”) src/network/fetch_us_gaap_fundamentals.rs:74-81
- Updates the filing_url column in the resulting DataFrame src/network/fetch_us_gaap_fundamentals.rs:99
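The accession resolution step can be pictured as a lookup from accession number to primary document name, followed by URL assembly. The sketch below uses the public EDGAR archive path layout (CIK, then the accession number without dashes, then the document name). The function names and the plain `HashMap` standing in for the submissions data are hypothetical, and the real code works on a DataFrame column instead of a slice.

```rust
use std::collections::HashMap;

/// Build an EDGAR archive URL for a filing's primary document.
/// The folder name is the accession number with its dashes removed.
pub fn filing_url(cik: u64, accession: &str, primary_document: &str) -> String {
    let folder: String = accession.chars().filter(|c| *c != '-').collect();
    format!(
        "https://www.sec.gov/Archives/edgar/data/{}/{}/{}",
        cik, folder, primary_document
    )
}

/// Map each accession number to its filing URL; accessions without a known
/// primary document yield `None`. `primary_docs` is a hypothetical stand-in
/// for the accession -> document mapping derived from the submissions data.
pub fn resolve_filing_urls(
    cik: u64,
    accessions: &[&str],
    primary_docs: &HashMap<String, String>,
) -> Vec<Option<String>> {
    accessions
        .iter()
        .map(|acc| primary_docs.get(*acc).map(|doc| filing_url(cik, acc, doc)))
        .collect()
}
```

Returning `Option<String>` per row keeps facts whose accession number is missing from the submissions data, leaving their URL empty instead of dropping the row.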
Sources: src/network/fetch_us_gaap_fundamentals.rs:54-108
Investment Company Datasets
Function: fetch_investment_company_series_and_class_dataset
Retrieves the annual CSV of registered investment company share classes src/network/fetch_investment_company_series_and_class_dataset.rs:22-32
Year Fallback and Caching:
- It attempts to fetch the current year’s file. If it receives a 404, it decrements the year and retries until it finds a valid file (minimum year 2024) src/network/fetch_investment_company_series_and_class_dataset.rs:39-66
- The successful year is stored in the preprocessor_cache with a 1-week TTL to avoid repeated 404s src/network/fetch_investment_company_series_and_class_dataset.rs:68-73
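The fallback loop and the cached year can be sketched as below. The minimum year and the one-week TTL come from the text above; everything else is an assumption for illustration. `YearCache` is an in-memory stand-in for the preprocessor cache, `FetchError` is a hypothetical error type, and the `fetch` closure replaces the real HTTP call.

```rust
use std::time::{Duration, Instant};

const MIN_YEAR: i32 = 2024; // minimum year stated in the text
const CACHE_TTL: Duration = Duration::from_secs(7 * 24 * 60 * 60); // 1 week

/// Hypothetical error type: `NotFound` models an HTTP 404.
#[derive(Debug, PartialEq)]
pub enum FetchError {
    NotFound,
    Other(String),
}

/// In-memory stand-in for the cache that remembers the last good year.
pub struct YearCache {
    entry: Option<(i32, Instant)>,
}

impl YearCache {
    pub fn new() -> Self {
        Self { entry: None }
    }

    /// Return the cached year unless the entry is older than the TTL.
    pub fn get(&self) -> Option<i32> {
        self.entry
            .filter(|(_, stored_at)| stored_at.elapsed() < CACHE_TTL)
            .map(|(year, _)| year)
    }

    pub fn set(&mut self, year: i32) {
        self.entry = Some((year, Instant::now()));
    }
}

/// Start from the cached year (or the current year), step back one year on
/// each 404, and stop at `MIN_YEAR`. Non-404 errors are returned immediately.
pub fn fetch_latest<F>(
    current_year: i32,
    cache: &mut YearCache,
    mut fetch: F,
) -> Result<(i32, String), FetchError>
where
    F: FnMut(i32) -> Result<String, FetchError>,
{
    let mut year = cache.get().unwrap_or(current_year);
    loop {
        match fetch(year) {
            Ok(body) => {
                cache.set(year);
                return Ok((year, body));
            }
            Err(FetchError::NotFound) if year > MIN_YEAR => year -= 1,
            Err(e) => return Err(e),
        }
    }
}
```

Early in a calendar year the current year's file may not be published yet, so the first call pays for one or more 404s. Later calls within the TTL start directly from the cached year.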
Diagram: Investment Company Dataset Fetch Sequence
Sources: src/network/fetch_investment_company_series_and_class_dataset.rs:43-80 src/network/fetch_investment_company_series_and_class_dataset.rs:92-117
Real-time Feeds
Function: parse_edgar_atom_feed
Parses the EDGAR Atom XML feed into FeedEntry items src/network/fetch_edgar_feed.rs:112-118
- Extraction Logic: Uses regex to pull the CIK from the archive URL src/network/fetch_edgar_feed.rs:70-74, the company name from the title src/network/fetch_edgar_feed.rs:78-82, and 8-K item codes from the summary text src/network/fetch_edgar_feed.rs:93-97
- HTML Handling : Strips HTML tags from the summary to extract the filing date and item codes src/network/fetch_edgar_feed.rs:86-91
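The field extraction can be approximated without regex, as in the sketch below. It assumes feed titles shaped like "8-K - Apple Inc. (0000320193) (Filer)" and summaries that list entries such as "Item 2.02: ...". The function names are hypothetical, plain string operations are swapped in for the regex patterns the crate uses, and the tag stripping is deliberately naive.

```rust
/// Pull the CIK from a URL containing ".../edgar/data/<cik>/...".
pub fn cik_from_archive_url(url: &str) -> Option<u64> {
    let rest = url.split("/edgar/data/").nth(1)?;
    rest.split('/').next()?.parse().ok()
}

/// Pull the company name from a title like "8-K - Apple Inc. (0000320193) (Filer)".
/// Naive: a name that itself contains " (" would be cut short.
pub fn company_from_title(title: &str) -> Option<&str> {
    let (_, rest) = title.split_once(" - ")?;
    let name = rest.split(" (").next()?.trim();
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}

/// Drop everything between '<' and '>' (no entity decoding).
pub fn strip_tags(html: &str) -> String {
    let mut out = String::new();
    let mut in_tag = false;
    for c in html.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            _ if !in_tag => out.push(c),
            _ => {}
        }
    }
    out
}

/// Collect item codes such as "2.02" from text containing "Item 2.02: ...".
pub fn item_codes(summary_text: &str) -> Vec<String> {
    summary_text
        .split("Item ")
        .skip(1)
        .filter_map(|chunk| {
            let code: String = chunk
                .chars()
                .take_while(|c| c.is_ascii_digit() || *c == '.')
                .collect();
            let code = code.trim_end_matches('.').to_string();
            // Keep only codes of the form "<digits>.<digits>".
            if code.contains('.') {
                Some(code)
            } else {
                None
            }
        })
        .collect()
}
```

Stripping tags before scanning for item codes mirrors the order described above: the summary arrives as HTML, and the codes are easier to find once the markup is gone.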
Sources: src/network/fetch_edgar_feed.rs:46-109 src/network/fetch_edgar_feed.rs:118-155