This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Data Models & Enumerations
Relevant source files
- examples/fuzzy_match_company.rs
- src/config/config_manager.rs
- src/config/credential_manager.rs
- src/enums.rs
- src/enums/form_type_enum.rs
- src/enums/fundamental_concept_enum.rs
- src/models.rs
- src/models/accession_number.rs
- src/models/cik.rs
- src/models/investment_company.rs
- src/models/nport_investment.rs
- src/models/thirteenf_holding.rs
- src/models/ticker.rs
- src/network/fetch_cik_by_ticker_symbol.rs
- src/network/fetch_company_tickers.rs
Purpose and Scope
This page documents the core data structures and enumerations used throughout the rust-sec-fetcher application. These models represent SEC financial data, including company identifiers, filing metadata, investment holdings, and financial concepts. The data models are defined across the src/models/ directory and re-exported from src/models.rs:1-18, while enumerations are managed in src/enums.rs:1-15.
Sources: src/models.rs:1-18 src/enums.rs:1-15
SEC Identifier Models
The system uses three primary identifier types to reference companies and filings within the SEC EDGAR system.
Ticker
The Ticker struct represents a company’s stock ticker symbol along with its SEC identifiers. It is the primary structure for mapping human-readable symbols to regulatory keys.
Structure:
- cik: Cik - The company’s Central Index Key src/models/ticker.rs:21
- symbol: TickerSymbol - Stock ticker symbol (e.g., “AAPL”) src/models/ticker.rs:22
- company_name: String - Full company name src/models/ticker.rs:23
- origin: TickerOrigin - Source of the ticker data (Primary vs Derived) src/models/ticker.rs:24
Fuzzy Matching: The Ticker model includes a fuzzy matching engine in get_by_fuzzy_matched_name src/models/ticker.rs:38-136. It uses tokenization, SIMD-accelerated cleaning src/models/ticker.rs:148-204, and weighted scoring (e.g., EXACT_MATCH_BOOST, PREFERRED_STOCK_PENALTY) to resolve company names to CIKs src/models/ticker.rs:27-33.
Sources: src/models/ticker.rs:19-35 src/models/ticker.rs:38-136
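The weighted-scoring idea described above can be illustrated with a minimal sketch. This is not the crate's implementation: the constant names come from this page, but their values, the helper functions tokenize, score, and best_match, and the scoring rules are all assumptions made for illustration. The sketch also uses plain scalar string handling rather than SIMD-accelerated cleaning.

```rust
// Hypothetical weights: the constant names appear in the documentation,
// but these values are invented for this sketch.
const EXACT_MATCH_BOOST: f64 = 10.0;
const PREFERRED_STOCK_PENALTY: f64 = 2.0;

/// Splits a name on non-alphanumeric characters and uppercases each token.
fn tokenize(name: &str) -> Vec<String> {
    name.split(|c: char| !c.is_ascii_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_ascii_uppercase())
        .collect()
}

/// Scores a candidate company name against a query (illustrative rules only).
fn score(query: &str, candidate: &str) -> f64 {
    let q = tokenize(query);
    let c = tokenize(candidate);
    if q.is_empty() || c.is_empty() {
        return 0.0;
    }
    // Base score: one point per query token that appears in the candidate.
    let mut s = q.iter().filter(|t| c.contains(*t)).count() as f64;
    // Reward candidates whose token sequence equals the query's.
    if q == c {
        s += EXACT_MATCH_BOOST;
    }
    // Penalize preferred-stock entries unless the query asked for them.
    let wants_preferred = q.iter().any(|t| t == "PREFERRED");
    if !wants_preferred && c.iter().any(|t| t == "PREFERRED") {
        s -= PREFERRED_STOCK_PENALTY;
    }
    s
}

/// Returns the highest-scoring candidate with a positive score, if any.
pub fn best_match<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .map(|c| (score(query, c), c))
        .filter(|(s, _)| *s > 0.0)
        // Scores are always finite here, so partial_cmp cannot return None.
        .max_by(|a, b| a.0.partial_cmp(&b.0).unwrap())
        .map(|(_, c)| c)
}
```

Calling best_match("apple inc", &["Apple Inc.", "Apple Hospitality REIT Preferred", "Microsoft Corp"]) selects "Apple Inc." in this sketch, because the exact-token match is boosted while the preferred-stock entry is penalized below zero.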
Cik (Central Index Key)
The Cik struct represents a 10-digit SEC identifier that uniquely identifies a company or entity. CIKs are permanent and never reused src/models/cik.rs:11-36.
Structure:
- value: u64 - The numeric CIK value src/models/cik.rs:39
Key Characteristics:
- Formatting: Always zero-padded to 10 digits when displayed (e.g., 320193 → "0000320193") src/models/cik.rs:66-69
- Resolution: get_company_cik_by_ticker_symbol handles the logic of resolving derived instruments (warrants, units) back to their parent registrant’s CIK src/models/cik.rs:143-167
Sources: src/models/cik.rs:37-40 src/models/cik.rs:143-167
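The zero-padding rule can be sketched with Rust's standard formatting width specifier. The struct below is a simplified stand-in, not the crate's Cik: only the value: u64 field is taken from this page, and implementing the padding through Display is an assumption.

```rust
use std::fmt;

/// Simplified stand-in for the crate's `Cik`; only `value` is documented above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Cik {
    pub value: u64,
}

impl fmt::Display for Cik {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `{:010}` pads the number on the left with zeros to a width of 10.
        write!(f, "{:010}", self.value)
    }
}
```

With this sketch, Cik { value: 320193 }.to_string() yields "0000320193", matching the example in the list above.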
AccessionNumber
The AccessionNumber struct represents a unique identifier for SEC filings. Each accession number is exactly 18 digits and encodes the filer’s CIK, filing year, and sequence number.
Format: XXXXXXXXXX-YY-NNNNNN src/models/accession_number.rs:11-14
Key Methods:
- from_str(accession_str: &str) - Parses from string, handling both dashed and plain formats src/models/accession_number.rs:80-112
- to_string() - Returns the canonical dash-separated format src/models/accession_number.rs:179-181
Sources: src/models/accession_number.rs:35-187
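As a rough illustration of the XXXXXXXXXX-YY-NNNNNN layout, the following sketch parses both the dashed and the plain 18-digit form and prints the canonical dashed form. The field names (cik, year, sequence) and the String error type are assumptions for this example, and the parser is more lenient than a production one would be: it ignores where the dashes sit.

```rust
use std::fmt;
use std::str::FromStr;

/// Illustrative stand-in; the real struct's fields are not shown on this page.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AccessionNumber {
    pub cik: u64,      // first 10 digits
    pub year: u8,      // next 2 digits
    pub sequence: u32, // last 6 digits
}

impl FromStr for AccessionNumber {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Drop dashes so both input formats reduce to 18 plain digits.
        let digits: String = s.chars().filter(|c| *c != '-').collect();
        if digits.len() != 18 || !digits.bytes().all(|b| b.is_ascii_digit()) {
            return Err(format!("expected 18 digits, got {:?}", s));
        }
        // Byte slicing is safe here because every character is an ASCII digit.
        Ok(AccessionNumber {
            cik: digits[0..10].parse::<u64>().map_err(|e| e.to_string())?,
            year: digits[10..12].parse::<u8>().map_err(|e| e.to_string())?,
            sequence: digits[12..18].parse::<u32>().map_err(|e| e.to_string())?,
        })
    }
}

impl fmt::Display for AccessionNumber {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Canonical form: 10-digit CIK, 2-digit year, 6-digit sequence.
        write!(f, "{:010}-{:02}-{:06}", self.cik, self.year, self.sequence)
    }
}
```

In this sketch, "0000320193-23-000106" and "000032019323000106" parse to the same value, and to_string() returns the dashed form for both.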
SEC Identifier Relationships
Sources: src/models/ticker.rs:20-25 src/models/cik.rs:143-167 src/models/accession_number.rs:35-40
Filing Data Structures
NportInvestment
The NportInvestment struct represents a single investment holding from an NPORT-P filing. It includes both raw data from the SEC and “mapped” fields enriched by the fetcher.
Key Fields:
- Mapped Data: mapped_ticker_symbol, mapped_company_name, mapped_company_cik_number src/models/nport_investment.rs:14-16
- Identifiers: name, lei, cusip, isin src/models/nport_investment.rs:18-22
- Financials: balance, val_usd, and pct_val (stored as a normalized Pct type) src/models/nport_investment.rs:24-35
Sources: src/models/nport_investment.rs:9-41
ThirteenfHolding
The ThirteenfHolding struct represents a row in a Form 13F-HR information table. Unlike raw XML data, these fields are stored in normalized form src/models/thirteenf_holding.rs:4-9.
- value_usd: Normalized to actual dollars (correcting pre-2023 “thousands” reporting) src/models/thirteenf_holding.rs:17-21
- weight_pct: Portfolio weight on a 0–100 scale src/models/thirteenf_holding.rs:30-32
Sources: src/models/thirteenf_holding.rs:10-33
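The two normalizations can be sketched as follows. Both function names and the reported_in_thousands flag are hypothetical. How the crate decides whether a filing used thousands is not described on this page, so the caller supplies that decision here. Plain u64 and f64 are used only to keep the example free of dependencies.

```rust
/// Converts a raw 13F `value` figure to whole dollars.
/// `reported_in_thousands` is a hypothetical flag supplied by the caller.
pub fn normalize_value_usd(raw_value: u64, reported_in_thousands: bool) -> u64 {
    if reported_in_thousands {
        // Saturate instead of overflowing on implausibly large inputs.
        raw_value.saturating_mul(1_000)
    } else {
        raw_value
    }
}

/// Portfolio weight on a 0-100 scale; `None` when the portfolio total is zero.
pub fn weight_pct(value_usd: u64, total_usd: u64) -> Option<f64> {
    if total_usd == 0 {
        return None;
    }
    Some(value_usd as f64 / total_usd as f64 * 100.0)
}
```

For example, a raw value of 1_500 reported in thousands becomes 1_500_000 dollars, and a 250-dollar position in a 1_000-dollar portfolio has a weight of 25.0.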
InvestmentCompany
The InvestmentCompany struct represents mutual funds and ETFs. It is primarily used to resolve tickers that do not appear in the standard operating company list src/models/investment_company.rs:6-49.
- get_fund_cik_by_ticker_symbol: Searches the series/class dataset for fund CIKs src/models/investment_company.rs:52-67
Sources: src/models/investment_company.rs:52-67 src/network/fetch_cik_by_ticker_symbol.rs:67-69
Enumerations
FundamentalConcept
The FundamentalConcept enum defines 64 standardized financial concepts (e.g., Assets, NetIncomeLoss, Revenues). It is the backbone of the US GAAP transformation pipeline src/enums/fundamental_concept_enum.rs:1-72.
FormType
The FormType enum covers SEC forms explicitly handled by the library, such as TenK (“10-K”), EightK (“8-K”), and Sc13G (“SCHEDULE 13G”) src/enums/form_type_enum.rs:65-200. It uses strum for case-insensitive parsing and provides the canonical EDGAR string via as_edgar_str src/enums/form_type_enum.rs:56-63.
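The pairing of case-insensitive parsing with a canonical string accessor can be sketched without strum. The three variants and their EDGAR strings are taken from the description above. The ALL constant, the hand-written FromStr implementation, and the exact signature of as_edgar_str are assumptions, since the crate derives its parsing with strum instead.

```rust
use std::str::FromStr;

/// A three-variant subset of the form types named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FormType {
    TenK,
    EightK,
    Sc13G,
}

impl FormType {
    /// Hypothetical helper listing every variant in this sketch.
    const ALL: [FormType; 3] = [FormType::TenK, FormType::EightK, FormType::Sc13G];

    /// Canonical EDGAR spelling (signature assumed for illustration).
    pub fn as_edgar_str(&self) -> &'static str {
        match self {
            FormType::TenK => "10-K",
            FormType::EightK => "8-K",
            FormType::Sc13G => "SCHEDULE 13G",
        }
    }
}

impl FromStr for FormType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let wanted = s.trim();
        // Compare against each canonical string, ignoring ASCII case.
        Self::ALL
            .iter()
            .copied()
            .find(|f| f.as_edgar_str().eq_ignore_ascii_case(wanted))
            .ok_or_else(|| format!("unrecognized form type: {}", s))
    }
}
```

In this sketch, "10-k".parse::<FormType>() returns Ok(FormType::TenK), and an unknown string returns an error rather than a fallback variant.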
CacheNamespacePrefix
Defines the organizational structure of the simd-r-drive cache.
- CompanyTickerFuzzyMatch: Used to cache expensive fuzzy matching results src/models/ticker.rs:15
- CompanyTickers: Used for the raw ticker dataset.
Url
A centralized registry of SEC EDGAR endpoints, such as CompanyTickersJson and CompanyTickersTxt src/network/fetch_company_tickers.rs:62-73.
TickerOrigin
Distinguishes between PrimaryListing (from company_tickers.json) and DerivedInstrument (from ticker.txt, including warrants and preferreds) src/network/fetch_company_tickers.rs:22-32.
Data Flow & Relationships
The following diagram connects the natural-language task of searching for a company to the specific code entities involved.
graph LR
subgraph Input["Natural Language Space"]
Query["'Apple' or 'AAPL'"]
end
subgraph Logic["Code Entity Space"]
SClient["SecClient"]
FCT["fetch_company_tickers"]
T_Fuzzy["Ticker::get_by_fuzzy_matched_name"]
C_Lookup["Cik::get_company_cik_by_ticker_symbol"]
subgraph Models["Data Models"]
M_Ticker["Ticker"]
M_Cik["Cik"]
M_Origin["TickerOrigin"]
end
end
Query --> T_Fuzzy
SClient --> FCT
FCT --> M_Ticker
T_Fuzzy --> M_Ticker
M_Ticker --> M_Origin
M_Ticker --> C_Lookup
C_Lookup --> M_Cik
Sources: src/network/fetch_company_tickers.rs:58-65 src/models/ticker.rs:38-42 src/models/cik.rs:143-146 examples/fuzzy_match_company.rs:35-75
Implementation Details: Precision & Normalization
The system prioritizes financial accuracy by using specialized types:
- rust_decimal::Decimal: Used for all currency and balance fields to avoid floating-point errors src/models/nport_investment.rs:25-30
- Pct: A custom wrapper for percentage values (0-100 scale) used in portfolio weighting src/models/nport_investment.rs:35 src/models/thirteenf_holding.rs:32
Sources: src/models/nport_investment.rs:2-35 src/models/thirteenf_holding.rs:1-32
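The floating-point problem that motivates a decimal type can be demonstrated with the standard library alone. The sketch below does not use rust_decimal: integer cents serve as a simple fixed-point stand-in for a decimal type, and both function names are hypothetical.

```rust
/// Sums amounts stored as binary floating-point dollars.
pub fn sum_f64(amounts: &[f64]) -> f64 {
    amounts.iter().sum()
}

/// Sums amounts stored as integer cents, a fixed-point stand-in for a decimal type.
pub fn sum_cents(amounts: &[i64]) -> i64 {
    amounts.iter().sum()
}
```

Here sum_f64(&[0.1, 0.2]) does not compare equal to 0.3, because neither 0.1 nor 0.2 has an exact binary representation, while sum_cents(&[10, 20]) is exactly 30. A decimal type avoids the same rounding drift for currency and balance fields.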