This documentation is part of the "Projects with Books" initiative at zenOSmosis.

The source code for this project is available on GitHub.

Caching & Storage System

Purpose and Scope

This document describes the caching and storage infrastructure used by the Rust sec-fetcher application to minimize redundant API requests and improve performance. The system implements a two-tier caching architecture with persistent storage backed by simd-r-drive.

The caching system is isolated per ConfigManager instance, so different environments (e.g., production and unit tests) keep separate cache state and tests cannot pollute one another's caches.

Overview

The caching system provides two distinct cache layers managed by the Caches struct:

  1. HTTP Cache: Stores raw HTTP responses from SEC EDGAR API requests to avoid re-downloading immutable filing data.
  2. Preprocessor Cache: Stores transformed and processed data structures (e.g., mapping tables, calculated values, or TTL-based metadata).

Both caches use the simd-r-drive key-value storage backend with persistent file-based storage.

Sources: src/caches.rs:1-14 src/caches.rs:25-51


Caching Architecture

The following diagram illustrates the caching architecture and its integration with the configuration and network layers:

Sources: src/caches.rs:11-14 src/caches.rs:29-51 src/network/fetch_investment_company_series_and_class_dataset.rs:43-46

graph TB
    subgraph "Initialization Space"
        ConfigMgr["ConfigManager"]
        CachesStruct["Caches Struct"]
        OpenFn["Caches::open(base_path)"]
    end

    subgraph "Code Entity Space: Caches Module"
        HTTP_DS["http_cache: Arc<DataStore>"]
        PRE_DS["preprocessor_cache: Arc<DataStore>"]
    end

    subgraph "File System (On-Disk)"
        HTTP_File["http_storage_cache.bin"]
        PRE_File["preprocessor_cache.bin"]
    end

    subgraph "Network Integration"
        SecClient["SecClient"]
        FetchInv["fetch_investment_company_..."]
    end

    ConfigMgr -->|provides path| OpenFn
    OpenFn -->|instantiates| CachesStruct
    CachesStruct --> HTTP_DS
    CachesStruct --> PRE_DS

    HTTP_DS -->|persists to| HTTP_File
    PRE_DS -->|persists to| PRE_File

    SecClient -->|uses| HTTP_DS
    FetchInv -->|uses| PRE_DS

Implementation Details

The Caches Struct

Unlike previous versions that used global OnceLock statics, the current implementation encapsulates the storage logic within the Caches struct. This allows for better dependency injection and testing isolation.

| Method | Description |
|---|---|
| open(base: &Path) | Creates the directory if missing and opens two DataStore files: http_storage_cache.bin and preprocessor_cache.bin. |
| get_http_cache_store() | Returns an Arc<DataStore> for the HTTP response cache. |
| get_preprocessor_cache() | Returns an Arc<DataStore> for the preprocessor/metadata cache. |

Sources: src/caches.rs:25-59
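
The method table above can be sketched with the standard library alone. This is a minimal illustration, not the project's code: the DataStore type here is a hypothetical stand-in that only creates and remembers its backing file (the real one comes from simd-r-drive), and the field layout of Caches is assumed from the architecture diagram.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical stand-in for simd_r_drive::DataStore. It only creates and
/// remembers its backing file so the sketch compiles without external crates.
#[derive(Debug)]
pub struct DataStore {
    path: PathBuf,
}

impl DataStore {
    /// Creates the backing file if it does not exist yet.
    pub fn open(path: &Path) -> io::Result<Self> {
        OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { path: path.to_path_buf() })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

/// Owns one handle per cache layer; no global statics are involved.
pub struct Caches {
    http_cache: Arc<DataStore>,
    preprocessor_cache: Arc<DataStore>,
}

impl Caches {
    /// Creates `base` if missing, then opens both store files inside it.
    pub fn open(base: &Path) -> io::Result<Self> {
        fs::create_dir_all(base)?;
        Ok(Self {
            http_cache: Arc::new(DataStore::open(&base.join("http_storage_cache.bin"))?),
            preprocessor_cache: Arc::new(DataStore::open(&base.join("preprocessor_cache.bin"))?),
        })
    }

    /// Returns a shared handle to the HTTP response cache.
    pub fn get_http_cache_store(&self) -> Arc<DataStore> {
        Arc::clone(&self.http_cache)
    }

    /// Returns a shared handle to the preprocessor/metadata cache.
    pub fn get_preprocessor_cache(&self) -> Arc<DataStore> {
        Arc::clone(&self.preprocessor_cache)
    }
}
```

Because every Caches value is opened against its own base path, a test can call Caches::open on a temporary directory and never touch the files a production instance uses.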

CacheNamespacePrefix

To prevent key collisions within a single DataStore, the system uses CacheNamespacePrefix. This enum provides a distinct prefix for each type of cached data, and each prefix is hashed with simd_r_drive::utils::NamespaceHasher to scope the keys stored under it.

Common namespaces include:

  • LatestFundsYear: Used to track the most recent available year for investment company datasets.

Sources: src/network/fetch_investment_company_series_and_class_dataset.rs:11-15 src/network/fetch_investment_company_series_and_class_dataset.rs:47-48
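
To illustrate why a hashed prefix prevents collisions, here is a minimal sketch of namespaced key derivation. It is an assumption-laden stand-in: it uses the standard library's DefaultHasher rather than the real NamespaceHasher, the byte strings behind each prefix are invented, and the ExampleOther variant and the namespaced_key function are hypothetical names added only for the demonstration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CacheNamespacePrefix {
    LatestFundsYear,
    /// Hypothetical variant, present only to show that prefixes differ.
    ExampleOther,
}

impl CacheNamespacePrefix {
    /// Assumed byte representation of each prefix (not taken from the source).
    pub fn as_bytes(self) -> &'static [u8] {
        match self {
            Self::LatestFundsYear => b"latest_funds_year",
            Self::ExampleOther => b"example_other",
        }
    }
}

/// Builds a 16-byte key: 8 bytes of hashed prefix followed by 8 bytes of
/// hashed key. Stand-in for NamespaceHasher, not its actual algorithm.
pub fn namespaced_key(prefix: CacheNamespacePrefix, key: &[u8]) -> Vec<u8> {
    let mut prefix_hasher = DefaultHasher::new();
    prefix.as_bytes().hash(&mut prefix_hasher);

    let mut key_hasher = DefaultHasher::new();
    key.hash(&mut key_hasher);

    let mut out = Vec::with_capacity(16);
    out.extend_from_slice(&prefix_hasher.finish().to_be_bytes());
    out.extend_from_slice(&key_hasher.finish().to_be_bytes());
    out
}
```

The same logical key stored under two prefixes yields two different store keys, so unrelated features can share one DataStore file without overwriting each other's entries.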


Preprocessor Cache Usage

The preprocessor cache is used for logic that requires persistence but isn’t a direct 1:1 mapping of an HTTP response. A primary example is the “Year-Fallback Logic” used when fetching investment company datasets.

sequenceDiagram
    participant App as Fetch Logic
    participant PreCache as Preprocessor Cache
    participant SEC as SEC EDGAR API
    
    App->>PreCache: read_with_ttl(Namespace: LatestFundsYear)
    alt Cache Hit
        PreCache-->>App: Return cached year (e.g., 2024)
    else Cache Miss
        App->>App: Default to Utc::now().year()
    end
    
    loop Fallback Logic
        App->>SEC: GET Dataset for Year
        alt 200 OK
            SEC-->>App: CSV Data
            App->>PreCache: write_with_ttl(year, TTL: 1 week)
            Note over App: Break Loop
        else 404 Not Found
            App->>App: decrement year
        end
    end

Data Flow: Investment Company Dataset Fetching

Implementation Details:

Sources: src/network/fetch_investment_company_series_and_class_dataset.rs:46-73
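
The year-fallback flow in the sequence diagram can be sketched as follows. This is a simplified model under stated assumptions: TtlCache is a hypothetical in-memory stand-in for the preprocessor cache, its read_with_ttl and write_with_ttl signatures are guesses rather than the StorageCacheExt API, the fetch closure stands in for the SEC request (Some for 200 OK, None for 404), and the min_year floor is an added safeguard so the loop always terminates. The caller passes the current year instead of calling Utc::now().year() to keep the sketch free of external crates.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory stand-in for the preprocessor cache.
#[derive(Default)]
pub struct TtlCache {
    entries: HashMap<String, (i32, Instant)>,
}

impl TtlCache {
    /// Returns the value only if it exists and has not expired.
    pub fn read_with_ttl(&self, key: &str) -> Option<i32> {
        match self.entries.get(key) {
            Some((value, expires_at)) if Instant::now() < *expires_at => Some(*value),
            _ => None,
        }
    }

    /// Stores the value together with its expiry instant.
    pub fn write_with_ttl(&mut self, key: &str, value: i32, ttl: Duration) {
        self.entries.insert(key.to_string(), (value, Instant::now() + ttl));
    }
}

pub const LATEST_FUNDS_YEAR: &str = "LatestFundsYear";
const ONE_WEEK: Duration = Duration::from_secs(7 * 24 * 60 * 60);

/// Starts from the cached year (or `current_year` on a miss) and walks
/// backwards until `fetch` succeeds or `min_year` is passed.
pub fn fetch_latest_dataset<F>(
    cache: &mut TtlCache,
    current_year: i32,
    min_year: i32,
    mut fetch: F,
) -> Option<(i32, String)>
where
    F: FnMut(i32) -> Option<String>,
{
    let mut year = cache
        .read_with_ttl(LATEST_FUNDS_YEAR)
        .unwrap_or(current_year);

    while year >= min_year {
        if let Some(csv) = fetch(year) {
            // Remember the year that worked so later runs skip the probing.
            cache.write_with_ttl(LATEST_FUNDS_YEAR, year, ONE_WEEK);
            return Some((year, csv));
        }
        year -= 1; // 404: try the previous year
    }
    None
}
```

Caching only the resolved year, rather than the dataset itself, keeps the entry small while still saving the failed probe requests on subsequent runs within the TTL window.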


HTTP Cache & SecClient

The SecClient utilizes the http_cache provided by the Caches struct. This integration typically happens during the construction of the SecClient via the ConfigManager.
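
As a rough illustration of a client that receives its cache at construction time, the sketch below uses a cache-aside lookup. Everything in it is hypothetical: HttpCache is an in-memory map standing in for the Arc<DataStore> handle, and the new and get_cached methods are invented for the example and do not reflect the real SecClient API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory stand-in for the shared HTTP cache handle.
pub type HttpCache = Arc<Mutex<HashMap<String, Vec<u8>>>>;

pub struct SecClient {
    http_cache: HttpCache,
}

impl SecClient {
    /// The cache handle is injected, so tests can pass an isolated one.
    pub fn new(http_cache: HttpCache) -> Self {
        Self { http_cache }
    }

    /// Returns the cached body for `url`, or calls `download` and stores
    /// the result. `download` stands in for the real network request.
    pub fn get_cached<F>(&self, url: &str, download: F) -> Vec<u8>
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        // The lock guard is released at the end of this statement.
        let cached = self.http_cache.lock().unwrap().get(url).cloned();
        if let Some(body) = cached {
            return body;
        }

        let body = download(url);
        self.http_cache
            .lock()
            .unwrap()
            .insert(url.to_string(), body.clone());
        body
    }
}
```

Since the handle is shared through an Arc, several components can hold the same cache, while a separately constructed handle stays fully independent.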

Storage Characteristics (simd-r-drive)

The underlying storage provided by simd-r-drive offers:

  • High Performance: Optimized for fast key-value lookups.
  • Atomic Operations: Ensures data integrity during writes.
  • Simplicity: Single-file binary format (.bin) per store.

Sources: src/caches.rs:31-46


Integration Summary

| Component | Role | File Reference |
|---|---|---|
| Caches | Owner of DataStore handles | src/caches.rs:11-14 |
| simd_r_drive::DataStore | Low-level storage engine | src/caches.rs:1 |
| NamespaceHasher | Scopes keys within a DataStore | src/network/fetch_investment_company_series_and_class_dataset.rs:11-15 |
| StorageCacheExt | Provides read_with_ttl and write_with_ttl | src/network/fetch_investment_company_series_and_class_dataset.rs:7 |

Sources: src/caches.rs:1-60 src/network/fetch_investment_company_series_and_class_dataset.rs:1-73
