This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Data Models & Enumerations

Purpose and Scope

This page documents the core data structures and enumerations used throughout the rust-sec-fetcher application. These models represent SEC financial data, including company identifiers, filing metadata, investment holdings, and financial concepts. The data models are defined across the src/models/ directory and centralized in src/models.rs:1-18, while enumerations are managed in src/enums.rs:1-15.

Sources: src/models.rs:1-18 src/enums.rs:1-15


SEC Identifier Models

The system uses three primary identifier types to reference companies and filings within the SEC EDGAR system.

Ticker

The Ticker struct represents a company’s stock ticker symbol along with its SEC identifiers. It is the primary structure for mapping human-readable symbols to regulatory keys.

Structure:

Fuzzy Matching: The Ticker model includes a fuzzy matching engine in get_by_fuzzy_matched_name (src/models/ticker.rs:38-136). It uses tokenization, SIMD-accelerated cleaning (src/models/ticker.rs:148-204), and weighted scoring (e.g., EXACT_MATCH_BOOST, PREFERRED_STOCK_PENALTY) to resolve company names to CIKs (src/models/ticker.rs:27-33).

Sources: src/models/ticker.rs:19-35 src/models/ticker.rs:38-136
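
The tokenize-then-score idea described above can be illustrated with a minimal sketch. This is not the crate's implementation: the constant names EXACT_MATCH_BOOST and PREFERRED_STOCK_PENALTY come from this page, but their values, the Jaccard base score, the scalar (non-SIMD) cleaning step, and the helper names tokenize, score, and best_match are all assumptions made for illustration.

```rust
use std::collections::HashSet;

// Constant names appear on this page; the values are hypothetical.
const EXACT_MATCH_BOOST: f64 = 10.0;
const PREFERRED_STOCK_PENALTY: f64 = 2.0;

/// Uppercases ASCII letters, turns everything that is not an ASCII
/// letter or digit into a space, and splits the result into tokens.
/// (Scalar stand-in for the SIMD-accelerated cleaning step.)
fn tokenize(name: &str) -> Vec<String> {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { ' ' })
        .collect::<String>()
        .split_whitespace()
        .map(str::to_string)
        .collect()
}

/// Scores a candidate company name against a query (hypothetical weighting).
fn score(query: &str, candidate: &str) -> f64 {
    let q = tokenize(query);
    let c = tokenize(candidate);
    if q.is_empty() || c.is_empty() {
        return 0.0;
    }
    let q_set: HashSet<&str> = q.iter().map(String::as_str).collect();
    let c_set: HashSet<&str> = c.iter().map(String::as_str).collect();

    // Base score: Jaccard similarity over distinct tokens.
    let shared = q_set.intersection(&c_set).count() as f64;
    let union = q_set.union(&c_set).count() as f64;
    let mut s = shared / union;

    // Reward candidates whose token sequence equals the query's.
    if q == c {
        s += EXACT_MATCH_BOOST;
    }
    // Penalize preferred-share listings unless the query asked for one.
    if c_set.contains("PREFERRED") && !q_set.contains("PREFERRED") {
        s -= PREFERRED_STOCK_PENALTY;
    }
    s
}

/// Returns the highest-scoring candidate, ignoring non-positive scores.
fn best_match<'a>(query: &str, candidates: &[&'a str]) -> Option<&'a str> {
    candidates
        .iter()
        .copied()
        .map(|c| (score(query, c), c))
        .filter(|(s, _)| *s > 0.0)
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, c)| c)
}
```

With these illustrative weights, the query "apple inc" ranks a candidate named "Apple Inc." above one that merely shares the token "APPLE", and a preferred-share listing is pushed down unless the query mentions it.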

Cik (Central Index Key)

The Cik struct represents a 10-digit SEC identifier that uniquely identifies a company or entity. CIKs are permanent and never reused (src/models/cik.rs:11-36).

Structure:

Key Characteristics:

  • Formatting: Always zero-padded to 10 digits when displayed, e.g., 320193 → "0000320193" (src/models/cik.rs:66-69).
  • Resolution: get_company_cik_by_ticker_symbol handles the logic of resolving derived instruments (warrants, units) back to their parent registrant’s CIK (src/models/cik.rs:143-167).

Sources: src/models/cik.rs:37-40 src/models/cik.rs:143-167
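
The zero-padding rule can be sketched with a small stand-in type. The field name value and the constructors from_u64 and parse are assumptions for illustration; the crate's actual Cik layout and API may differ.

```rust
use std::fmt;

/// Hypothetical stand-in for the crate's `Cik`; the real layout may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Cik {
    value: u64,
}

impl Cik {
    /// Largest value that still fits in 10 decimal digits.
    const MAX: u64 = 9_999_999_999;

    /// Builds a CIK, rejecting values wider than 10 digits.
    pub fn from_u64(value: u64) -> Option<Self> {
        (value <= Self::MAX).then_some(Cik { value })
    }

    /// Parses either the padded ("0000320193") or unpadded ("320193") form.
    pub fn parse(s: &str) -> Option<Self> {
        let s = s.trim();
        if s.is_empty() || s.len() > 10 || !s.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        s.parse::<u64>().ok().and_then(Self::from_u64)
    }
}

impl fmt::Display for Cik {
    /// Always renders the value zero-padded to 10 digits.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:010}", self.value)
    }
}
```

Storing the numeric value and padding only in Display keeps equality and hashing independent of how the identifier was originally written.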

AccessionNumber

The AccessionNumber struct represents a unique identifier for SEC filings. Each accession number is exactly 18 digits and encodes the filer’s CIK, filing year, and sequence number.

Format: XXXXXXXXXX-YY-NNNNNN, where X is the filer’s CIK, Y the two-digit filing year, and N the sequence number (src/models/accession_number.rs:11-14).

Key Methods:

Sources: src/models/accession_number.rs:35-187
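
As a rough illustration of the XXXXXXXXXX-YY-NNNNNN layout, the sketch below splits an accession number into its three parts and formats it back. The field names, the parse and to_dashed methods, and the acceptance of an undashed 18-digit form are assumptions, not the crate's confirmed API; the sample number used in the note that follows is made up.

```rust
/// Hypothetical stand-in for the crate's `AccessionNumber`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AccessionNumber {
    pub cik: u64,      // 10-digit filer CIK
    pub year: u8,      // two-digit filing year
    pub sequence: u32, // six-digit sequence number
}

impl AccessionNumber {
    /// Parses "XXXXXXXXXX-YY-NNNNNN" or the same 18 digits without dashes.
    pub fn parse(s: &str) -> Option<Self> {
        // ASCII-only input makes the byte-offset slicing below safe.
        if !s.is_ascii() {
            return None;
        }
        let digits = match s.len() {
            18 => s.to_string(),
            20 => {
                let b = s.as_bytes();
                if b[10] != b'-' || b[13] != b'-' {
                    return None;
                }
                format!("{}{}{}", &s[..10], &s[11..13], &s[14..])
            }
            _ => return None,
        };
        if !digits.bytes().all(|b| b.is_ascii_digit()) {
            return None;
        }
        Some(AccessionNumber {
            cik: digits[..10].parse().ok()?,
            year: digits[10..12].parse().ok()?,
            sequence: digits[12..].parse().ok()?,
        })
    }

    /// Renders the canonical dashed form with zero padding restored.
    pub fn to_dashed(&self) -> String {
        format!("{:010}-{:02}-{:06}", self.cik, self.year, self.sequence)
    }
}
```

Under these assumptions, parsing the illustrative value 0000320193-24-000123 yields CIK 320193, year 24, and sequence 123, and to_dashed reproduces the original string.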

SEC Identifier Relationships

Sources: src/models/ticker.rs:20-25 src/models/cik.rs:143-167 src/models/accession_number.rs:35-40


Filing Data Structures

NportInvestment

The NportInvestment struct represents a single investment holding from an NPORT-P filing. It includes both raw data from the SEC and “mapped” fields enriched by the fetcher.

Key Fields:

Sources: src/models/nport_investment.rs:9-41

ThirteenfHolding

The ThirteenfHolding struct represents a row in a Form 13F-HR information table. Unlike raw XML data, these fields are stored in normalized form (src/models/thirteenf_holding.rs:4-9).

Sources: src/models/thirteenf_holding.rs:10-33

InvestmentCompany

The InvestmentCompany struct represents mutual funds and ETFs. It is primarily used to resolve tickers that do not appear in the standard operating company list (src/models/investment_company.rs:6-49).

Sources: src/models/investment_company.rs:52-67 src/network/fetch_cik_by_ticker_symbol.rs:67-69


Enumerations

FundamentalConcept

The FundamentalConcept enum defines 64 standardized financial concepts (e.g., Assets, NetIncomeLoss, Revenues). It is the backbone of the US GAAP transformation pipeline (src/enums/fundamental_concept_enum.rs:1-72).

FormType

The FormType enum covers SEC forms explicitly handled by the library, such as TenK (“10-K”), EightK (“8-K”), and Sc13G (“SCHEDULE 13G”) (src/enums/form_type_enum.rs:65-200). It uses strum for case-insensitive parsing and provides the canonical EDGAR string via as_edgar_str (src/enums/form_type_enum.rs:56-63).
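
The pairing of case-insensitive parsing with a canonical EDGAR string can be sketched with the standard library alone. The three variants and the as_edgar_str name come from this page; the hand-written FromStr impl stands in for the strum derive the crate uses, and the error type is an assumption.

```rust
use std::str::FromStr;

/// Three of the variants named on this page; the real enum has many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FormType {
    TenK,
    EightK,
    Sc13G,
}

impl FormType {
    /// Canonical EDGAR spelling of the form type.
    pub fn as_edgar_str(&self) -> &'static str {
        match self {
            FormType::TenK => "10-K",
            FormType::EightK => "8-K",
            FormType::Sc13G => "SCHEDULE 13G",
        }
    }
}

// Hand-written stand-in for the strum-derived parser (assumption).
impl FromStr for FormType {
    type Err = String;

    /// Matches the input against each canonical string, ignoring ASCII case
    /// and surrounding whitespace.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        const ALL: [FormType; 3] = [FormType::TenK, FormType::EightK, FormType::Sc13G];
        ALL.iter()
            .copied()
            .find(|f| f.as_edgar_str().eq_ignore_ascii_case(s.trim()))
            .ok_or_else(|| format!("unrecognized form type: {s}"))
    }
}
```

Deriving the parser from as_edgar_str keeps a single source of truth for the string form, so "10-k".parse::<FormType>() and FormType::TenK.as_edgar_str() cannot drift apart.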

CacheNamespacePrefix

Defines the organizational structure of the simd-r-drive cache.

  • CompanyTickerFuzzyMatch: Used to cache expensive fuzzy matching results (src/models/ticker.rs:15).
  • CompanyTickers: Used for the raw ticker dataset.

Url

The Url enum is a centralized registry of SEC EDGAR endpoints, such as CompanyTickersJson and CompanyTickersTxt (src/network/fetch_company_tickers.rs:62-73).

TickerOrigin

The TickerOrigin enum distinguishes between PrimaryListing (from company_tickers.json) and DerivedInstrument (from ticker.txt, including warrants and preferreds) (src/network/fetch_company_tickers.rs:22-32).
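
One way to picture how an origin tag supports ticker-to-CIK resolution is the sketch below. The variant names come from this page; the Ticker fields, the cik_for_symbol helper, and the "prefer primary listings" rule are assumptions for illustration and are not taken from get_company_cik_by_ticker_symbol.

```rust
/// Variant names as given on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TickerOrigin {
    PrimaryListing,
    DerivedInstrument,
}

/// Hypothetical, simplified ticker record.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ticker {
    pub symbol: String,
    pub cik: u64,
    pub origin: TickerOrigin,
}

/// Looks up a symbol case-insensitively. When a symbol appears in both
/// datasets, the primary listing wins (assumed tie-break rule).
pub fn cik_for_symbol(tickers: &[Ticker], symbol: &str) -> Option<u64> {
    tickers
        .iter()
        .filter(|t| t.symbol.eq_ignore_ascii_case(symbol))
        .min_by_key(|t| match t.origin {
            TickerOrigin::PrimaryListing => 0,
            TickerOrigin::DerivedInstrument => 1,
        })
        .map(|t| t.cik)
}
```

Because a derived instrument carries its parent registrant's CIK, a warrant symbol and the common-stock symbol resolve to the same identifier in this sketch.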


Data Flow & Relationships

The following diagram bridges the natural language concepts of “Searching for a Company” to the specific code entities involved.

graph LR
    subgraph Input["Natural Language Space"]
        Query["'Apple' or 'AAPL'"]
    end

    subgraph Logic["Code Entity Space"]
        SClient["SecClient"]
        FCT["fetch_company_tickers"]
        T_Fuzzy["Ticker::get_by_fuzzy_matched_name"]
        C_Lookup["Cik::get_company_cik_by_ticker_symbol"]
        subgraph Models["Data Models"]
            M_Ticker["Ticker"]
            M_Cik["Cik"]
            M_Origin["TickerOrigin"]
        end
    end

    Query --> T_Fuzzy
    SClient --> FCT
    FCT --> M_Ticker
    T_Fuzzy --> M_Ticker
    M_Ticker --> M_Origin
    M_Ticker --> C_Lookup
    C_Lookup --> M_Cik

Sources: src/network/fetch_company_tickers.rs:58-65 src/models/ticker.rs:38-42 src/models/cik.rs:143-146 examples/fuzzy_match_company.rs:35-75

Implementation Details: Precision & Normalization

The system prioritizes financial accuracy by using specialized types:

Sources: src/models/nport_investment.rs:2-35 src/models/thirteenf_holding.rs:1-32
