This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
US GAAP Concept Transformation
Purpose and Scope
This page documents the US GAAP concept transformation system, which normalizes raw financial concept names from SEC EDGAR filings into a standardized taxonomy. The core functionality is provided by the distill_us_gaap_fundamental_concepts function, which maps the diverse US GAAP terminology (57+ revenue variations, 6 cost variants, multiple equity representations) into a consistent set of 71 FundamentalConcept enum variants (src/enums/fundamental_concept_enum.rs:5-71).
For information about fetching US GAAP data from the SEC API, see Data Fetching Functions. For details on the data models that use these concepts, see US GAAP Concept Transformation. For the Python ML pipeline that processes the transformed concepts, see Python narrative_stack System.
System Overview
The transformation system acts as a critical normalization layer between raw SEC EDGAR filings and downstream data processing. Companies report financial data using various US GAAP concept names (e.g., Revenues, SalesRevenueNet, HealthCareOrganizationRevenue), and this system ensures all variations map to consistent concept identifiers.
Data Flow: Natural Language to Code Entity Space
The following diagram bridges the gap between the natural language of financial reporting and the internal code entities used for processing.
Sources: src/enums/fundamental_concept_enum.rs:5-71 src/enums.rs:7-8
graph TB
subgraph "Natural Language Space (SEC Filings)"
RawConcepts["Raw US GAAP Concept Names\n'Revenues'\n'SalesRevenueNet'\n'AssetsCurrent'"]
end
subgraph "Code Entity Space (rust-sec-fetcher)"
DistillFn["distill_us_gaap_fundamental_concepts\n(Function)"]
FCEnum["FundamentalConcept\n(Enum)"]
Assets["FundamentalConcept::Assets"]
CurrentAssets["FundamentalConcept::CurrentAssets"]
Revenues["FundamentalConcept::Revenues"]
end
RawConcepts -->|Input: &str| DistillFn
DistillFn -->|Output: Option<Vec<FC>>| FCEnum
FCEnum --> Assets
FCEnum --> CurrentAssets
FCEnum --> Revenues
The FundamentalConcept Taxonomy
The FundamentalConcept enum defines 71 standardized financial concept variants organized into main categories: Balance Sheet, Income Statement, Cash Flow, and Equity classifications (src/enums/fundamental_concept_enum.rs:5-71). Each variant represents a normalized concept that may map from multiple raw US GAAP names.
Sources: src/enums/fundamental_concept_enum.rs:5-71
graph TB
subgraph "FundamentalConcept Enum Variants"
Root["FundamentalConcept\n(71 total variants)"]
end
subgraph "Balance Sheet"
Assets["Assets"]
CurrentAssets["CurrentAssets"]
NoncurrentAssets["NoncurrentAssets"]
Liabilities["Liabilities"]
CurrentLiabilities["CurrentLiabilities"]
NoncurrentLiabilities["NoncurrentLiabilities"]
LiabilitiesAndEquity["LiabilitiesAndEquity"]
end
subgraph "Income Statement"
Revenues["Revenues"]
CostOfRevenue["CostOfRevenue"]
GrossProfit["GrossProfit"]
OperatingExpenses["OperatingExpenses"]
OperatingIncomeLoss["OperatingIncomeLoss"]
NetIncomeLoss["NetIncomeLoss"]
InterestExpenseOperating["InterestExpenseOperating"]
ResearchAndDevelopment["ResearchAndDevelopment"]
end
subgraph "Cash Flow"
NetCashFlow["NetCashFlow"]
NetCashFlowFromOperatingActivities["NetCashFlowFromOperatingActivities"]
NetCashFlowFromInvestingActivities["NetCashFlowFromInvestingActivities"]
NetCashFlowFromFinancingActivities["NetCashFlowFromFinancingActivities"]
end
Root --> Assets
Root --> Liabilities
Root --> Revenues
Root --> NetIncomeLoss
Root --> NetCashFlow
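To make the grouping in the diagram concrete, here is a minimal sketch of a reduced enum with a category helper. It covers only a handful of the 71 variants, and the derives, the StatementCategory enum, and the category method are assumptions made for illustration; they are not taken from the project source.

```rust
/// Reduced stand-in for the real enum: only a few of the 71 variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FundamentalConcept {
    // Balance sheet
    Assets,
    CurrentAssets,
    Liabilities,
    // Income statement
    Revenues,
    CostOfRevenue,
    NetIncomeLoss,
    // Cash flow
    NetCashFlow,
    NetCashFlowFromOperatingActivities,
}

/// Hypothetical grouping type; not part of the documented API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StatementCategory {
    BalanceSheet,
    IncomeStatement,
    CashFlow,
}

impl FundamentalConcept {
    /// Returns the statement group this variant is drawn under in the diagram.
    pub fn category(self) -> StatementCategory {
        use FundamentalConcept::*;
        match self {
            Assets | CurrentAssets | Liabilities => StatementCategory::BalanceSheet,
            Revenues | CostOfRevenue | NetIncomeLoss => StatementCategory::IncomeStatement,
            NetCashFlow | NetCashFlowFromOperatingActivities => StatementCategory::CashFlow,
        }
    }
}
```

Because the match has no wildcard arm, the compiler would reject this helper if a variant were added without assigning it a category.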
Mapping Pattern Types
The transformation system implements four distinct mapping patterns to handle the diverse ways companies report financial concepts.
Pattern 1: One-to-One Mapping
Simple direct mappings where a single US GAAP concept name maps to exactly one FundamentalConcept variant.
| Raw US GAAP Concept | FundamentalConcept Output |
|---|---|
| Assets | vec![Assets] |
| Liabilities | vec![Liabilities] |
| GrossProfit | vec![GrossProfit] |
| OperatingIncomeLoss | vec![OperatingIncomeLoss] |
| CommitmentsAndContingencies | vec![CommitmentsAndContingencies] |
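The one-to-one rows above could be expressed as a plain string match. The following is a sketch under stated assumptions: map_one_to_one is a hypothetical helper name, and the enum is cut down to the five variants in the table.

```rust
/// Reduced stand-in for the real enum (only the variants from the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FundamentalConcept {
    Assets,
    Liabilities,
    GrossProfit,
    OperatingIncomeLoss,
    CommitmentsAndContingencies,
}

/// Hypothetical helper covering only the one-to-one rows of the table.
pub fn map_one_to_one(raw: &str) -> Option<Vec<FundamentalConcept>> {
    use FundamentalConcept::*;
    let concept = match raw {
        "Assets" => Assets,
        "Liabilities" => Liabilities,
        "GrossProfit" => GrossProfit,
        "OperatingIncomeLoss" => OperatingIncomeLoss,
        "CommitmentsAndContingencies" => CommitmentsAndContingencies,
        // Anything else is treated as unmapped in this sketch.
        _ => return None,
    };
    // One raw name yields exactly one variant.
    Some(vec![concept])
}
```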
Pattern 2: Hierarchical Mapping
Specific concepts map to multiple variants, including both the specific concept and parent categories. This enables queries at different levels of granularity.
| Raw US GAAP Concept | FundamentalConcept Output (Ordered) |
|---|---|
| AssetsCurrent | vec![CurrentAssets, Assets] |
| StockholdersEquity | vec![EquityAttributableToParent, Equity] |
| NetIncomeLoss | vec![NetIncomeLossAttributableToParent, NetIncomeLoss] |
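A hierarchical mapping returns the specific variant first and its parent second, which is what lets a caller query at either level. The sketch below mirrors the three rows above; map_hierarchical and rolls_up_to are hypothetical names, and the enum is reduced to the variants involved.

```rust
/// Reduced stand-in for the real enum (only the variants from the table).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FundamentalConcept {
    Assets,
    CurrentAssets,
    Equity,
    EquityAttributableToParent,
    NetIncomeLoss,
    NetIncomeLossAttributableToParent,
}

/// Hypothetical helper: specific concept first, parent category second.
pub fn map_hierarchical(raw: &str) -> Option<Vec<FundamentalConcept>> {
    use FundamentalConcept::*;
    match raw {
        "AssetsCurrent" => Some(vec![CurrentAssets, Assets]),
        "StockholdersEquity" => Some(vec![EquityAttributableToParent, Equity]),
        "NetIncomeLoss" => Some(vec![NetIncomeLossAttributableToParent, NetIncomeLoss]),
        _ => None,
    }
}

/// Hypothetical query helper: true if `raw` maps to `target` at any level.
pub fn rolls_up_to(raw: &str, target: FundamentalConcept) -> bool {
    map_hierarchical(raw).map_or(false, |concepts| concepts.contains(&target))
}
```

With this shape, a query for Assets matches a fact reported as AssetsCurrent, while a query for CurrentAssets matches the same fact at the finer level.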
Pattern 3: Synonym Consolidation
Multiple US GAAP concept names that represent the same financial concept are consolidated into a single FundamentalConcept variant. For example, CostOfGoodsAndServicesSold, CostOfServices, and CostOfGoodsSold all map to FundamentalConcept::CostOfRevenue.
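Synonym consolidation fits naturally into an or-pattern in a match arm. This sketch uses only the three cost names mentioned above (the page counts six cost variants in total, and the others are not listed here); map_cost_synonyms is a hypothetical name.

```rust
/// Reduced stand-in for the real enum (single variant for this example).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FundamentalConcept {
    CostOfRevenue,
}

/// Hypothetical helper: several raw names collapse into one variant.
pub fn map_cost_synonyms(raw: &str) -> Option<Vec<FundamentalConcept>> {
    match raw {
        // Each alternative in the or-pattern is a synonym for the same concept.
        "CostOfGoodsAndServicesSold" | "CostOfServices" | "CostOfGoodsSold" => {
            Some(vec![FundamentalConcept::CostOfRevenue])
        }
        _ => None,
    }
}
```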
Pattern 4: Industry-Specific Revenue Mapping
The system handles dozens of industry-specific revenue variations (e.g., HealthCareOrganizationRevenue, OilAndGasRevenue, ElectricUtilityRevenue), mapping them all to the Revenues concept.
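When the list of synonyms grows long, a lookup table can be easier to scan than a match arm. The sketch below assumes a small constant slice of revenue names taken from this page; REVENUE_NAMES and map_revenue_variant are hypothetical, and the real function may well use a different structure.

```rust
/// Reduced stand-in for the real enum (single variant for this example).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FundamentalConcept {
    Revenues,
}

/// Illustrative subset only; the page describes 57+ revenue variations.
const REVENUE_NAMES: &[&str] = &[
    "SalesRevenueNet",
    "HealthCareOrganizationRevenue",
    "OilAndGasRevenue",
    "ElectricUtilityRevenue",
];

/// Hypothetical helper: any listed revenue name maps to `Revenues`.
pub fn map_revenue_variant(raw: &str) -> Option<Vec<FundamentalConcept>> {
    if REVENUE_NAMES.contains(&raw) {
        Some(vec![FundamentalConcept::Revenues])
    } else {
        None
    }
}
```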
Sources: src/enums/fundamental_concept_enum.rs:5-71
The distill_us_gaap_fundamental_concepts Function
The core transformation function accepts a string representation of a US GAAP concept name and returns an Option<Vec<FundamentalConcept>>. The return type is an Option because not all US GAAP concepts are mapped, and a Vec because some concepts map to multiple standardized variants.
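A caller has to handle both layers of that return type: None for an unmapped name, and a vector that may hold one or several variants. The following sketch uses a stand-in named distill_sketch with the same shape as described above; it is not the project's function, and its three mappings are drawn from examples on this page.

```rust
/// Reduced stand-in for the real enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FundamentalConcept {
    Assets,
    CurrentAssets,
    Revenues,
}

/// Hypothetical stand-in with the return shape described above.
pub fn distill_sketch(raw: &str) -> Option<Vec<FundamentalConcept>> {
    use FundamentalConcept::*;
    match raw {
        "Assets" => Some(vec![Assets]),
        "AssetsCurrent" => Some(vec![CurrentAssets, Assets]),
        "SalesRevenueNet" => Some(vec![Revenues]),
        _ => None,
    }
}

/// Shows how a caller might branch on the two layers of the result.
pub fn describe(raw: &str) -> String {
    match distill_sketch(raw) {
        // Unmapped names are reported rather than treated as errors.
        None => format!("{raw}: unmapped"),
        // Mapped names may carry one or several standardized variants.
        Some(concepts) => format!("{raw}: {concepts:?}"),
    }
}
```

Returning None instead of an error keeps unmapped names cheap to skip, which suits a pipeline that sees many raw concepts it does not need.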
Implementation Logic
The function serves as the primary entry point for the transformation logic. It is utilized by higher-level operations to normalize data before it is stored or used for training.
Sources: src/enums/fundamental_concept_enum.rs:5-71 src/enums.rs:7-8
graph LR
subgraph "Input"
RawStr["&str: 'SalesRevenueNet'"]
end
subgraph "distill_us_gaap_fundamental_concepts"
Match["Pattern Match Engine"]
end
subgraph "Output"
Result["Some(vec![FundamentalConcept::Revenues])"]
end
RawStr --> Match
Match --> Result
Summary
The US GAAP concept transformation system provides:
- Standardization: Maps hundreds of raw US GAAP concept names to 71 standardized FundamentalConcept variants (src/enums/fundamental_concept_enum.rs:5-71).
- Flexibility: Supports four mapping patterns (one-to-one, hierarchical, synonyms, industry-specific) to handle diverse reporting practices.
- Queryability: Hierarchical mappings enable queries at multiple granularity levels (e.g., query for all Assets or specifically CurrentAssets).
- Integration: Serves as the critical normalization layer between the SEC EDGAR API and downstream data processing/ML pipelines.
Sources: src/enums/fundamental_concept_enum.rs:5-71 src/enums.rs:7-8