This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

US GAAP Concept Transformation

Purpose and Scope

This page documents the US GAAP concept transformation system, which normalizes raw financial concept names from SEC EDGAR filings into a standardized taxonomy. The core functionality is provided by the distill_us_gaap_fundamental_concepts function, which maps the diverse US GAAP terminology (57+ revenue variations, 6 cost variants, multiple equity representations) onto a consistent set of 71 FundamentalConcept enum variants (src/enums/fundamental_concept_enum.rs:5-71).

For information about fetching US GAAP data from the SEC API, see Data Fetching Functions. For details on the data models that use these concepts, see US GAAP Concept Transformation. For the Python ML pipeline that processes the transformed concepts, see Python narrative_stack System.


System Overview

The transformation system acts as a critical normalization layer between raw SEC EDGAR filings and downstream data processing. Companies report financial data using various US GAAP concept names (e.g., Revenues, SalesRevenueNet, HealthCareOrganizationRevenue), and this system ensures all variations map to consistent concept identifiers.

Data Flow: Natural Language to Code Entity Space

The following diagram shows how raw concept names from financial filings (natural language) are mapped to the internal code entities used for processing.

Sources: src/enums/fundamental_concept_enum.rs:5-71 src/enums.rs:7-8

graph TB
    subgraph "Natural Language Space (SEC Filings)"
        RawConcepts["Raw US GAAP Concept Names\n'Revenues'\n'SalesRevenueNet'\n'AssetsCurrent'"]
    end

    subgraph "Code Entity Space (rust-sec-fetcher)"
        DistillFn["distill_us_gaap_fundamental_concepts\n(Function)"]
        FCEnum["FundamentalConcept\n(Enum)"]
        Assets["FundamentalConcept::Assets"]
        CurrentAssets["FundamentalConcept::CurrentAssets"]
        Revenues["FundamentalConcept::Revenues"]
    end

    RawConcepts -->|Input: &str| DistillFn
    DistillFn -->|Output: Option<Vec<FC>>| FCEnum
    FCEnum --> Assets
    FCEnum --> CurrentAssets
    FCEnum --> Revenues

The FundamentalConcept Taxonomy

The FundamentalConcept enum defines 71 standardized financial concept variants organized into four main categories: Balance Sheet, Income Statement, Cash Flow, and Equity (src/enums/fundamental_concept_enum.rs:5-71). Each variant represents a normalized concept that may be mapped from multiple raw US GAAP names.

Sources: src/enums/fundamental_concept_enum.rs:5-71

graph TB
    subgraph "FundamentalConcept Enum Variants"
        Root["FundamentalConcept\n(71 total variants)"]
    end

    subgraph "Balance Sheet"
        Assets["Assets"]
        CurrentAssets["CurrentAssets"]
        NoncurrentAssets["NoncurrentAssets"]
        Liabilities["Liabilities"]
        CurrentLiabilities["CurrentLiabilities"]
        NoncurrentLiabilities["NoncurrentLiabilities"]
        LiabilitiesAndEquity["LiabilitiesAndEquity"]
    end

    subgraph "Income Statement"
        Revenues["Revenues"]
        CostOfRevenue["CostOfRevenue"]
        GrossProfit["GrossProfit"]
        OperatingExpenses["OperatingExpenses"]
        OperatingIncomeLoss["OperatingIncomeLoss"]
        NetIncomeLoss["NetIncomeLoss"]
        InterestExpenseOperating["InterestExpenseOperating"]
        ResearchAndDevelopment["ResearchAndDevelopment"]
    end

    subgraph "Cash Flow"
        NetCashFlow["NetCashFlow"]
        NetCashFlowFromOperatingActivities["NetCashFlowFromOperatingActivities"]
        NetCashFlowFromInvestingActivities["NetCashFlowFromInvestingActivities"]
        NetCashFlowFromFinancingActivities["NetCashFlowFromFinancingActivities"]
    end

    Root --> Assets
    Root --> Liabilities
    Root --> Revenues
    Root --> NetIncomeLoss
    Root --> NetCashFlow
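
As a rough illustration of this taxonomy, the sketch below declares a small subset of the variants named in the diagram. The derives, the Statement enum, and the statement_of helper are assumptions made for illustration only; the actual definition lives in src/enums/fundamental_concept_enum.rs and may differ.

```rust
// Hypothetical subset of the taxonomy; the real enum has many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FundamentalConcept {
    Assets,
    CurrentAssets,
    Liabilities,
    Revenues,
    CostOfRevenue,
    NetIncomeLoss,
    NetCashFlow,
}

// Illustrative grouping that mirrors the diagram's subgraphs (not from the source).
#[derive(Debug, PartialEq, Eq)]
pub enum Statement {
    BalanceSheet,
    IncomeStatement,
    CashFlow,
}

// Returns the financial statement a concept belongs to in this sketch.
pub fn statement_of(concept: FundamentalConcept) -> Statement {
    use FundamentalConcept::*;
    match concept {
        Assets | CurrentAssets | Liabilities => Statement::BalanceSheet,
        Revenues | CostOfRevenue | NetIncomeLoss => Statement::IncomeStatement,
        NetCashFlow => Statement::CashFlow,
    }
}
```

Because the match is exhaustive, adding a variant to the sketch without assigning it a statement would fail to compile, which is one reason an enum suits a closed taxonomy like this.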

Mapping Pattern Types

The transformation system implements four distinct mapping patterns to handle the diverse ways companies report financial concepts.

Pattern 1: One-to-One Mapping

Simple direct mappings where a single US GAAP concept name maps to exactly one FundamentalConcept variant.

| Raw US GAAP Concept | FundamentalConcept Output |
| --- | --- |
| Assets | vec![Assets] |
| Liabilities | vec![Liabilities] |
| GrossProfit | vec![GrossProfit] |
| OperatingIncomeLoss | vec![OperatingIncomeLoss] |
| CommitmentsAndContingencies | vec![CommitmentsAndContingencies] |
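
The one-to-one rows can be sketched as a plain string match. This is a minimal illustration, not the crate's implementation: distill_one_to_one is a hypothetical stand-in for the relevant arms of distill_us_gaap_fundamental_concepts, and the enum is cut down to the five variants listed above.

```rust
// Reduced, hypothetical version of the enum covering only the rows above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FundamentalConcept {
    Assets,
    Liabilities,
    GrossProfit,
    OperatingIncomeLoss,
    CommitmentsAndContingencies,
}

// Hypothetical stand-in: each raw name yields exactly one variant,
// and any name not listed here is treated as unmapped.
fn distill_one_to_one(raw: &str) -> Option<Vec<FundamentalConcept>> {
    use FundamentalConcept::*;
    match raw {
        "Assets" => Some(vec![Assets]),
        "Liabilities" => Some(vec![Liabilities]),
        "GrossProfit" => Some(vec![GrossProfit]),
        "OperatingIncomeLoss" => Some(vec![OperatingIncomeLoss]),
        "CommitmentsAndContingencies" => Some(vec![CommitmentsAndContingencies]),
        _ => None,
    }
}
```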

Pattern 2: Hierarchical Mapping

Specific concepts map to multiple variants, including both the specific concept and parent categories. This enables queries at different levels of granularity.

| Raw US GAAP Concept | FundamentalConcept Output (Ordered) |
| --- | --- |
| AssetsCurrent | vec![CurrentAssets, Assets] |
| StockholdersEquity | vec![EquityAttributableToParent, Equity] |
| NetIncomeLoss | vec![NetIncomeLossAttributableToParent, NetIncomeLoss] |
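
To show how hierarchical output supports queries at different levels, the sketch below reproduces the three rows above and adds a membership check. Both distill_hierarchical and rolls_up_to are hypothetical names introduced here; only the raw names and output vectors come from the table.

```rust
// Reduced, hypothetical version of the enum covering only the rows above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FundamentalConcept {
    Assets,
    CurrentAssets,
    Equity,
    EquityAttributableToParent,
    NetIncomeLoss,
    NetIncomeLossAttributableToParent,
}

// Hypothetical stand-in: the specific variant comes first, then its parent.
fn distill_hierarchical(raw: &str) -> Option<Vec<FundamentalConcept>> {
    use FundamentalConcept::*;
    match raw {
        "AssetsCurrent" => Some(vec![CurrentAssets, Assets]),
        "StockholdersEquity" => Some(vec![EquityAttributableToParent, Equity]),
        "NetIncomeLoss" => Some(vec![NetIncomeLossAttributableToParent, NetIncomeLoss]),
        _ => None,
    }
}

// True when the raw name maps to `target` at any level of the hierarchy.
fn rolls_up_to(raw: &str, target: FundamentalConcept) -> bool {
    distill_hierarchical(raw).map_or(false, |concepts| concepts.contains(&target))
}
```

With this shape, a query for Assets matches a fact reported as AssetsCurrent, while a query for CurrentAssets matches it as well but at the narrower level.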

Pattern 3: Synonym Consolidation

Multiple US GAAP concept names that represent the same financial concept are consolidated into a single FundamentalConcept variant. For example, CostOfGoodsAndServicesSold, CostOfServices, and CostOfGoodsSold all map to FundamentalConcept::CostOfRevenue.
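
In Rust, synonym consolidation of this kind is naturally written with an or-pattern in a match arm. The following is a sketch under that assumption; distill_cost is a hypothetical name, and the source does not confirm that the real function is structured this way.

```rust
// Reduced, hypothetical version of the enum with a single variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FundamentalConcept {
    CostOfRevenue,
}

// Hypothetical stand-in: three raw synonyms collapse into one variant.
fn distill_cost(raw: &str) -> Option<Vec<FundamentalConcept>> {
    match raw {
        "CostOfGoodsAndServicesSold" | "CostOfServices" | "CostOfGoodsSold" => {
            Some(vec![FundamentalConcept::CostOfRevenue])
        }
        _ => None,
    }
}
```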

Pattern 4: Industry-Specific Revenue Mapping

The system handles dozens of industry-specific revenue variations (e.g., HealthCareOrganizationRevenue, OilAndGasRevenue, ElectricUtilityRevenue), mapping them all to the Revenues concept.

Sources: src/enums/fundamental_concept_enum.rs:5-71
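
When the list of synonyms grows long, as with industry-specific revenue names, a lookup table is one alternative to a large match arm. The sketch below uses that table-driven approach purely for illustration; it is not necessarily how the crate implements the mapping. INDUSTRY_REVENUE_NAMES and distill_revenue are hypothetical, and the table contains only the names mentioned on this page.

```rust
// Reduced, hypothetical version of the enum with a single variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FundamentalConcept {
    Revenues,
}

// Only the raw names mentioned on this page; the real list is far longer.
const INDUSTRY_REVENUE_NAMES: &[&str] = &[
    "Revenues",
    "SalesRevenueNet",
    "HealthCareOrganizationRevenue",
    "OilAndGasRevenue",
    "ElectricUtilityRevenue",
];

// Hypothetical stand-in: any name found in the table maps to Revenues.
fn distill_revenue(raw: &str) -> Option<Vec<FundamentalConcept>> {
    if INDUSTRY_REVENUE_NAMES.iter().any(|&name| name == raw) {
        Some(vec![FundamentalConcept::Revenues])
    } else {
        None
    }
}
```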


The distill_us_gaap_fundamental_concepts Function

The core transformation function accepts a string representation of a US GAAP concept name and returns an Option<Vec<FundamentalConcept>>. The return type is an Option because not all US GAAP concepts are mapped, and a Vec because some concepts map to multiple standardized variants.
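
A caller therefore has to handle both the unmapped case and the multi-variant case. The sketch below shows one way to do so, assuming a function with the signature shape described above; distill_sketch and describe are hypothetical names, and the three mappings are taken from examples on this page.

```rust
// Reduced, hypothetical version of the enum.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FundamentalConcept {
    Revenues,
    CurrentAssets,
    Assets,
}

// Hypothetical stand-in with the same input and output types as the real function.
fn distill_sketch(raw: &str) -> Option<Vec<FundamentalConcept>> {
    match raw {
        "Revenues" | "SalesRevenueNet" => Some(vec![FundamentalConcept::Revenues]),
        "AssetsCurrent" => Some(vec![
            FundamentalConcept::CurrentAssets,
            FundamentalConcept::Assets,
        ]),
        _ => None, // not every US GAAP concept is mapped
    }
}

// Formats the result, covering both the Some and the None branch.
fn describe(raw: &str) -> String {
    match distill_sketch(raw) {
        Some(concepts) => format!("{raw} -> {concepts:?}"),
        None => format!("{raw} -> unmapped"),
    }
}
```

For example, describe("SalesRevenueNet") yields "SalesRevenueNet -> [Revenues]", and an unknown name yields the "unmapped" form rather than panicking.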

Implementation Logic

The function serves as the primary entry point for the transformation logic. It is utilized by higher-level operations to normalize data before it is stored or used for training.

Sources: src/enums/fundamental_concept_enum.rs:5-71 src/enums.rs:7-8

graph LR
    subgraph "Input"
        RawStr["&str: 'SalesRevenueNet'"]
    end

    subgraph "distill_us_gaap_fundamental_concepts"
        Match["Pattern Match Engine"]
    end

    subgraph "Output"
        Result["Some(vec![FundamentalConcept::Revenues])"]
    end

    RawStr --> Match
    Match --> Result
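
A higher-level normalization step might apply the function across a batch of reported facts. The following is a sketch of that idea only: normalize, distill_sketch, and the (name, value) fact representation are assumptions for illustration and do not reflect the crate's actual data models.

```rust
use std::collections::BTreeMap;

// Reduced, hypothetical version of the enum; Ord is derived so it can key a BTreeMap.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FundamentalConcept {
    Assets,
    CurrentAssets,
    Revenues,
}

// Hypothetical stand-in for distill_us_gaap_fundamental_concepts.
fn distill_sketch(raw: &str) -> Option<Vec<FundamentalConcept>> {
    use FundamentalConcept::*;
    match raw {
        "Revenues" | "SalesRevenueNet" => Some(vec![Revenues]),
        "AssetsCurrent" => Some(vec![CurrentAssets, Assets]),
        "Assets" => Some(vec![Assets]),
        _ => None,
    }
}

// Groups each raw (name, value) fact under every concept it maps to.
// Facts whose names are unmapped are skipped.
fn normalize(facts: &[(&str, f64)]) -> BTreeMap<FundamentalConcept, Vec<f64>> {
    let mut out: BTreeMap<FundamentalConcept, Vec<f64>> = BTreeMap::new();
    for &(raw, value) in facts {
        if let Some(concepts) = distill_sketch(raw) {
            for concept in concepts {
                out.entry(concept).or_default().push(value);
            }
        }
    }
    out
}
```

Values are collected rather than summed here, since a hierarchical mapping places the same value under both a specific concept and its parent, and how to aggregate them is a downstream decision.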

Summary

The US GAAP concept transformation system provides:

  1. Standardization: Maps hundreds of raw US GAAP concept names to 71 standardized FundamentalConcept variants (src/enums/fundamental_concept_enum.rs:5-71).
  2. Flexibility: Supports four mapping patterns (one-to-one, hierarchical, synonym consolidation, industry-specific) to handle diverse reporting practices.
  3. Queryability: Hierarchical mappings enable queries at multiple granularity levels (e.g., query for all Assets or specifically CurrentAssets).
  4. Integration: Serves as the normalization layer between the SEC EDGAR API and downstream data processing and ML pipelines.

Sources: src/enums/fundamental_concept_enum.rs:5-71 src/enums.rs:7-8
