This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

CLI Binaries

This page documents the standalone binary programs provided by the rust-sec-fetcher crate. These tools are located in src/bin/ and serve specialized purposes ranging from test fixture maintenance and enum validation to bulk data extraction for machine learning pipelines.

1. refresh-test-fixtures

The refresh-test-fixtures utility automates the retrieval of real SEC EDGAR data to serve as test fixtures. It ensures that integration tests operate against authentic, version-controlled data without requiring live network access during test execution.

Purpose and Usage

This binary should be run whenever new test cases are added or when existing fixtures need to be updated to reflect modern EDGAR schema changes (e.g., the 2023 change in 13F value reporting).

Implementation Details

The program iterates through a hardcoded manifest of Fixture structs (src/bin/refresh_test_fixtures.rs:55-63). Each fixture defines a TickerSymbol, an output filename, and a FixtureKind, which determines the specific SEC endpoint to hit.
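A minimal sketch of what such a manifest might look like. The field names, the FixtureKind variants, the example tickers and filenames, and the endpoint_label helper are all assumptions made for illustration; the real definitions live in src/bin/refresh_test_fixtures.rs and may differ.

```rust
/// Hypothetical stand-in for the crate's `FixtureKind`; the variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FixtureKind {
    CompanyFacts,
    Submissions,
    ThirteenF,
}

/// One manifest entry: which ticker to fetch, where to save it, and what kind of data.
#[derive(Debug, Clone, Copy)]
pub struct Fixture {
    /// The real code uses a `TickerSymbol` type; a string slice keeps the sketch self-contained.
    pub ticker: &'static str,
    pub output_filename: &'static str,
    pub kind: FixtureKind,
}

/// Hardcoded manifest (illustrative entries only, not the project's actual fixtures).
pub const FIXTURES: &[Fixture] = &[
    Fixture { ticker: "AAPL", output_filename: "aapl_company_facts.json", kind: FixtureKind::CompanyFacts },
    Fixture { ticker: "MSFT", output_filename: "msft_submissions.json", kind: FixtureKind::Submissions },
    Fixture { ticker: "BRK-B", output_filename: "brk_b_13f.xml", kind: FixtureKind::ThirteenF },
];

/// Maps a fixture kind to a human-readable label for the endpoint family it would hit.
pub fn endpoint_label(kind: FixtureKind) -> &'static str {
    match kind {
        FixtureKind::CompanyFacts => "company facts",
        FixtureKind::Submissions => "submissions",
        FixtureKind::ThirteenF => "13F filing",
    }
}

/// Walks the manifest and describes what would be fetched, without any network access.
pub fn plan() -> Vec<String> {
    FIXTURES
        .iter()
        .map(|f| format!("{} -> {} ({})", f.ticker, f.output_filename, endpoint_label(f.kind)))
        .collect()
}
```

Keeping the manifest as a constant slice means adding a test case is a one-line change, and the match on FixtureKind forces every new kind to be handled explicitly.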

Key Components:

Fixture Generation Flow

Sources: src/bin/refresh_test_fixtures.rs:90-173 src/bin/refresh_test_fixtures.rs:178-240


2. check-form-type-coverage

This binary validates the completeness of the FormType enum against actual data in the EDGAR Master Index. It performs both “Forward” and “Reverse” coverage checks.

Coverage Logic

  1. Forward Check: Ensures every variant defined in the FormType enum (that isn’t marked as retired) actually appears in recent SEC filings (src/bin/check_form_type_coverage.rs:16-19).
  2. Reverse Check: Identifies any form types appearing frequently in the most recent quarter (above MINIMUM_FILINGS_THRESHOLD) that are not currently represented in the enum (src/bin/check_form_type_coverage.rs:20-22).
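The two checks can be sketched with plain sets and counters. This is an illustration under assumptions, not the binary's code: form types are modeled as strings rather than FormType variants, the CoverageReport type and check_coverage function are hypothetical, and the threshold value is a placeholder because this page does not state the real one.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Placeholder value; the real constant is defined in check_form_type_coverage.rs.
pub const MINIMUM_FILINGS_THRESHOLD: usize = 2;

/// Hypothetical result type holding the findings of both checks.
#[derive(Debug, PartialEq, Eq)]
pub struct CoverageReport {
    /// Forward check: active enum variants never seen in the scanned filings.
    pub unseen_variants: Vec<String>,
    /// Reverse check: frequent form types (with counts) that the enum lacks.
    pub missing_from_enum: Vec<(String, usize)>,
}

pub fn check_coverage(
    active_variants: &[&str],      // non-retired enum variants, as form-type strings
    lookback_forms: &[&str],       // form types from every scanned quarter
    latest_quarter_forms: &[&str], // form types from the most recent quarter only
) -> CoverageReport {
    let seen: BTreeSet<&str> = lookback_forms.iter().copied().collect();
    let known: BTreeSet<&str> = active_variants.iter().copied().collect();

    // Forward: every active variant should appear at least once in the lookback window.
    let unseen_variants = active_variants
        .iter()
        .copied()
        .filter(|v| !seen.contains(v))
        .map(String::from)
        .collect();

    // Reverse: count filings per form type in the latest quarter.
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for form in latest_quarter_forms.iter().copied() {
        *counts.entry(form).or_insert(0) += 1;
    }
    // Keep form types above the threshold that have no enum variant.
    let missing_from_enum = counts
        .into_iter()
        .filter(|(form, count)| *count > MINIMUM_FILINGS_THRESHOLD && !known.contains(form))
        .map(|(form, count)| (form.to_string(), count))
        .collect();

    CoverageReport { unseen_variants, missing_from_enum }
}
```

The ordered collections keep the report deterministic, which helps when the output is compared across runs.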

Usage

Technical Implementation

The program calculates the last_completed_quarter (src/bin/check_form_type_coverage.rs:51-61) and scans backwards up to MAX_LOOKBACK_QUARTERS quarters (default 8; src/bin/check_form_type_coverage.rs:34). It uses fetch_edgar_master_index to retrieve the list of all filings for those periods (src/bin/check_form_type_coverage.rs:110).
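The quarter arithmetic can be sketched as follows. The Quarter type and the function signatures here are assumptions made for illustration; the actual last_completed_quarter presumably reads the current date itself, whereas this version takes the year and month as parameters so it stays dependency-free and testable.

```rust
/// Hypothetical calendar-quarter type (quarter is 1..=4).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Quarter {
    pub year: i32,
    pub quarter: u8,
}

impl Quarter {
    /// The quarter immediately before this one, rolling over the year boundary.
    pub fn previous(self) -> Quarter {
        if self.quarter == 1 {
            Quarter { year: self.year - 1, quarter: 4 }
        } else {
            Quarter { year: self.year, quarter: self.quarter - 1 }
        }
    }
}

/// The most recent quarter that has fully ended, given today's year and month (1..=12).
pub fn last_completed_quarter(year: i32, month: u32) -> Quarter {
    assert!((1..=12).contains(&month), "month must be in 1..=12");
    let current = Quarter { year, quarter: ((month - 1) / 3 + 1) as u8 };
    current.previous()
}

/// Quarters to scan, newest first, starting at `start` and going back `max_lookback` quarters.
pub fn lookback_quarters(start: Quarter, max_lookback: usize) -> Vec<Quarter> {
    std::iter::successors(Some(start), |q| Some(q.previous()))
        .take(max_lookback)
        .collect()
}
```

For example, a run in February 2024 would start at Q4 2023, and a lookback of 8 would end at Q1 2022.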

Sources: src/bin/check_form_type_coverage.rs:1-40 src/bin/check_form_type_coverage.rs:72-146


3. pull-us-gaap-bulk

The pull-us-gaap-bulk binary is the primary data ingestion tool for the narrative_stack ML pipeline. It performs a massive sweep of all primary-listed companies and extracts their XBRL fundamentals.

Purpose and Usage

It fetches CompanyFacts for every ticker and flattens the complex JSON structure into a tabular CSV format suitable for training autoencoders or dimensionality reduction models.
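To illustrate the flattening step, the sketch below walks a nested concept → unit → observations structure and emits one CSV row per observation. The types, the flatten_to_csv function, and the column layout are assumptions; the real binary parses the CompanyFacts JSON and may produce a different (for example, wider) table.

```rust
use std::collections::BTreeMap;

/// One reported value for a concept in a given unit (hypothetical, simplified).
#[derive(Debug, Clone)]
pub struct Observation {
    pub end: String, // period end date, e.g. "2023-09-30"
    pub val: f64,
}

/// unit name (e.g. "USD") -> observations
pub type Units = BTreeMap<String, Vec<Observation>>;
/// concept name (e.g. "Revenues") -> units
pub type Concepts = BTreeMap<String, Units>;

/// Flattens the nested structure into long-format CSV: one row per observation.
/// Assumes no field contains a comma, so no quoting is performed.
pub fn flatten_to_csv(ticker: &str, concepts: &Concepts) -> String {
    let mut out = String::from("ticker,concept,unit,end,value\n");
    for (concept, units) in concepts {
        for (unit, observations) in units {
            for obs in observations {
                out.push_str(&format!(
                    "{},{},{},{},{}\n",
                    ticker, concept, unit, obs.end, obs.val
                ));
            }
        }
    }
    out
}
```

A long format like this is easy to append to and to pivot later; a model that expects one column per concept would need a pivot step on top.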

Data Flow and Constraints

Sources: src/bin/pulls/us_gaap_bulk.rs:1-33 src/bin/pulls/us_gaap_bulk.rs:45-95


4. pull-fund-holdings

This binary targets the investment management domain, specifically fetching N-PORT holdings for all registered investment companies (ETFs, Mutual Funds).

Purpose and Usage

It iterates through the SEC’s investment company dataset, finds the latest N-PORT-P (monthly portfolio holdings) filing for each fund, and exports the holdings to CSV.

Implementation Logic

  1. Dataset Retrieval: Calls fetch_investment_company_series_and_class_dataset to get the master list of funds (src/bin/pulls/fund_holdings.rs:74-75).
  2. Filing Discovery: For each fund ticker, it resolves the CIK and calls fetch_nport_filings to find the most recent submission (src/bin/pulls/fund_holdings.rs:96-114).
  3. Holdings Extraction: It fetches the specific N-PORT XML, parses the investment table, and normalizes the data (src/bin/pulls/fund_holdings.rs:126-134).
  4. Partitioned Storage: To avoid directories with tens of thousands of files, it organizes output by the first letter of the ticker (e.g., data/fund-holdings/S/SPY.csv) (src/bin/pulls/fund_holdings.rs:137-155).
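The partitioned layout from the last step can be sketched as a small path helper. The function name partitioned_csv_path is hypothetical, and trimming and upper-casing the ticker are assumptions added here; the source only shows the resulting layout (data/fund-holdings/S/SPY.csv).

```rust
use std::path::PathBuf;

/// Builds `<base_dir>/<first letter>/<TICKER>.csv`.
/// Returns `None` for an empty ticker, which has no first letter to partition on.
pub fn partitioned_csv_path(base_dir: &str, ticker: &str) -> Option<PathBuf> {
    // Normalization is an assumption of this sketch, not confirmed by the source.
    let ticker = ticker.trim().to_ascii_uppercase();
    let first = ticker.chars().next()?;

    let mut path = PathBuf::from(base_dir);
    path.push(first.to_string());
    path.push(format!("{ticker}.csv"));
    Some(path)
}
```

A caller would typically create the parent directory (for example with std::fs::create_dir_all) before writing the file.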

Fund Processing Pipeline

Sources: src/bin/pulls/fund_holdings.rs:1-38 src/bin/pulls/fund_holdings.rs:74-157
