This documentation is part of the "Projects with Books" initiative at zenOSmosis.
The source code for this project is available on GitHub.
Caching & Storage System
Purpose and Scope
This document describes the caching and storage infrastructure used by the Rust sec-fetcher application to minimize redundant API requests and improve performance. The system implements a two-tier caching architecture with persistent storage backed by simd-r-drive.
The caching system is designed to be isolated per ConfigManager instance, ensuring that different environments (e.g., production vs. unit tests) do not suffer from cross-test cache pollution.
Overview
The caching system provides two distinct cache layers managed by the Caches struct:
- HTTP Cache: Stores raw HTTP responses from SEC EDGAR API requests to avoid re-downloading immutable filing data.
- Preprocessor Cache: Stores transformed and processed data structures (e.g., mapping tables, calculated values, or TTL-based metadata).
Both caches use the simd-r-drive key-value storage backend with persistent file-based storage.
Sources: src/caches.rs:1-14 src/caches.rs:25-51
Caching Architecture
The following diagram illustrates the caching architecture and its integration with the configuration and network layers:
Sources: src/caches.rs:11-14 src/caches.rs:29-51 src/network/fetch_investment_company_series_and_class_dataset.rs:43-46
```mermaid
graph TB
    subgraph "Initialization Space"
        ConfigMgr["ConfigManager"]
        CachesStruct["Caches Struct"]
        OpenFn["Caches::open(base_path)"]
    end
    subgraph "Code Entity Space: Caches Module"
        HTTP_DS["http_cache: Arc<DataStore>"]
        PRE_DS["preprocessor_cache: Arc<DataStore>"]
    end
    subgraph "File System (On-Disk)"
        HTTP_File["http_storage_cache.bin"]
        PRE_File["preprocessor_cache.bin"]
    end
    subgraph "Network Integration"
        SecClient["SecClient"]
        FetchInv["fetch_investment_company_..."]
    end
    ConfigMgr -->|provides path| OpenFn
    OpenFn -->|instantiates| CachesStruct
    CachesStruct --> HTTP_DS
    CachesStruct --> PRE_DS
    HTTP_DS -->|persists to| HTTP_File
    PRE_DS -->|persists to| PRE_File
    SecClient -->|uses| HTTP_DS
    FetchInv -->|uses| PRE_DS
```
Implementation Details
The Caches Struct
Unlike previous versions that used global OnceLock statics, the current implementation encapsulates the storage logic within the Caches struct. This allows for better dependency injection and testing isolation.
| Method | Description |
|---|---|
| open(base: &Path) | Creates the directory if missing and opens two DataStore files: http_storage_cache.bin and preprocessor_cache.bin. |
| get_http_cache_store() | Returns an Arc<DataStore> for the HTTP response cache. |
| get_preprocessor_cache() | Returns an Arc<DataStore> for the preprocessor/metadata cache. |
Sources: src/caches.rs:25-59
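The method table above can be illustrated with a minimal, std-only sketch. It is not the crate's code: the DataStore below is a hypothetical stand-in that only creates and holds its backing file, because the real simd_r_drive::DataStore API is not reproduced on this page. The directory creation, the two file names, and the Arc-returning accessors follow the table.

```rust
use std::fs::{self, File, OpenOptions};
use std::io;
use std::path::{Path, PathBuf};
use std::sync::Arc;

/// Hypothetical stand-in for simd_r_drive::DataStore: it only creates and
/// holds the single backing file; no key-value logic is modeled.
#[derive(Debug)]
pub struct DataStore {
    path: PathBuf,
    _file: File,
}

impl DataStore {
    /// Opens the backing file, creating it if it does not exist.
    fn open(path: &Path) -> io::Result<Self> {
        let file = OpenOptions::new().create(true).append(true).open(path)?;
        Ok(Self { path: path.to_path_buf(), _file: file })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

/// Owns one shared handle per cache tier.
pub struct Caches {
    http_cache: Arc<DataStore>,
    preprocessor_cache: Arc<DataStore>,
}

impl Caches {
    /// Creates `base` if missing, then opens both store files inside it.
    pub fn open(base: &Path) -> io::Result<Self> {
        fs::create_dir_all(base)?;
        let http = DataStore::open(&base.join("http_storage_cache.bin"))?;
        let pre = DataStore::open(&base.join("preprocessor_cache.bin"))?;
        Ok(Self {
            http_cache: Arc::new(http),
            preprocessor_cache: Arc::new(pre),
        })
    }

    /// Cheap clone of the handle; every caller sees the same store.
    pub fn get_http_cache_store(&self) -> Arc<DataStore> {
        Arc::clone(&self.http_cache)
    }

    pub fn get_preprocessor_cache(&self) -> Arc<DataStore> {
        Arc::clone(&self.preprocessor_cache)
    }
}

fn main() -> io::Result<()> {
    // A unit test would pass its own temporary directory here.
    let base = std::env::temp_dir().join("sec_fetcher_caches_demo");
    let caches = Caches::open(&base)?;
    println!("HTTP cache: {}", caches.get_http_cache_store().path().display());
    println!("Preprocessor cache: {}", caches.get_preprocessor_cache().path().display());
    Ok(())
}
```

Because every Caches value is opened from its own base path, a test can hand it a fresh temporary directory and get stores that share nothing with production or with other tests, which is the per-ConfigManager isolation described in Purpose and Scope.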
CacheNamespacePrefix
To prevent key collisions within a single DataStore, the system utilizes CacheNamespacePrefix. This enum provides distinct prefixes for different types of cached data, which are then hashed using simd_r_drive::utils::NamespaceHasher.
Common namespaces include:
- LatestFundsYear: Used to track the most recent available year for investment company datasets.
Sources: src/network/fetch_investment_company_series_and_class_dataset.rs:11-15 src/network/fetch_investment_company_series_and_class_dataset.rs:47-48
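The namespacing idea can be sketched as follows. This is an illustration under stated assumptions, not the NamespaceHasher implementation: it hashes the namespace and the key separately with 64-bit FNV-1a and concatenates the two digests into a 16-byte key. The byte string chosen for LatestFundsYear and the helper names fnv1a64 and namespaced_key are hypothetical.

```rust
/// Hypothetical mirror of CacheNamespacePrefix; only the variant named on
/// this page is shown, and its byte string is an illustrative choice.
#[derive(Clone, Copy, Debug)]
pub enum CacheNamespacePrefix {
    LatestFundsYear,
}

impl CacheNamespacePrefix {
    pub fn as_bytes(self) -> &'static [u8] {
        match self {
            CacheNamespacePrefix::LatestFundsYear => &b"latest_funds_year"[..],
        }
    }
}

/// 64-bit FNV-1a, chosen here because its output is fixed by definition and
/// therefore safe to persist in on-disk keys.
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Builds a 16-byte key: hash(namespace) followed by hash(key), so the same
/// logical key under two namespaces yields two different stored keys.
pub fn namespaced_key(namespace: &[u8], key: &[u8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    out[..8].copy_from_slice(&fnv1a64(namespace).to_le_bytes());
    out[8..].copy_from_slice(&fnv1a64(key).to_le_bytes());
    out
}

fn main() {
    let ns = CacheNamespacePrefix::LatestFundsYear.as_bytes();
    let key = namespaced_key(ns, b"year");
    let hex: String = key.iter().map(|b| format!("{b:02x}")).collect();
    println!("{hex}");
}
```

A stable hash such as FNV-1a is used in the sketch because the keys are persisted to disk; std's DefaultHasher does not guarantee that its output stays the same across Rust releases.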
Preprocessor Cache Usage
The preprocessor cache is used for logic that requires persistence but isn’t a direct 1:1 mapping of an HTTP response. A primary example is the “Year-Fallback Logic” used when fetching investment company datasets.
```mermaid
sequenceDiagram
    participant App as Fetch Logic
    participant PreCache as Preprocessor Cache
    participant SEC as SEC EDGAR API
    App->>PreCache: read_with_ttl(Namespace: LatestFundsYear)
    alt Cache Hit
        PreCache-->>App: Return cached year (e.g., 2024)
    else Cache Miss
        App->>App: Default to Utc::now().year()
    end
    loop Fallback Logic
        App->>SEC: GET Dataset for Year
        alt 200 OK
            SEC-->>App: CSV Data
            App->>PreCache: write_with_ttl(year, TTL: 1 week)
            Note over App: Break loop
        else 404 Not Found
            App->>App: decrement year
        end
    end
```
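The fallback loop in the diagram can be expressed as a small, dependency-free function. In this sketch the current year is passed in rather than read from Utc::now(), the HTTP request is abstracted as a closure, and min_year is an assumed lower bound (not described in the source) that keeps the loop from running forever. FetchOutcome and fetch_with_year_fallback are hypothetical names.

```rust
/// Result of one dataset request, reduced to what the loop needs.
#[derive(Debug)]
pub enum FetchOutcome {
    Found(String),
    NotFound,
}

/// Starts at the cached year (or `current_year` on a cache miss) and steps
/// back one year per 404 until a dataset is found or `min_year` is passed.
/// Returns the year that succeeded, which the caller would then write back
/// to the preprocessor cache with a one-week TTL.
pub fn fetch_with_year_fallback<F>(
    cached_year: Option<i32>,
    current_year: i32,
    min_year: i32,
    mut fetch: F,
) -> Option<(i32, String)>
where
    F: FnMut(i32) -> FetchOutcome,
{
    let mut year = cached_year.unwrap_or(current_year);
    while year >= min_year {
        match fetch(year) {
            FetchOutcome::Found(csv) => return Some((year, csv)),
            FetchOutcome::NotFound => year -= 1,
        }
    }
    None
}

fn main() {
    // Simulated server: datasets exist only up to 2024.
    let result = fetch_with_year_fallback(None, 2026, 2000, |year| {
        if year <= 2024 {
            FetchOutcome::Found(format!("csv for {year}"))
        } else {
            FetchOutcome::NotFound
        }
    });
    println!("{result:?}");
}
```

On a cache hit the loop usually succeeds on its first request, which is the point of persisting the year: later runs skip the 404 round-trips that the first run had to make.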
Data Flow: Investment Company Dataset Fetching
Implementation Details:
- Function: fetch_investment_company_series_and_class_dataset (src/network/fetch_investment_company_series_and_class_dataset.rs:43-80)
- Namespace: CacheNamespacePrefix::LatestFundsYear (src/network/fetch_investment_company_series_and_class_dataset.rs:11-15)
- TTL: Hardcoded to 1 week (604,800 seconds) for the fallback-year metadata (src/network/fetch_investment_company_series_and_class_dataset.rs:71)
Sources: src/network/fetch_investment_company_series_and_class_dataset.rs:46-73
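The TTL behavior behind read_with_ttl and write_with_ttl can be modeled by storing an absolute expiry time next to each value. The sketch below is an in-memory model, not StorageCacheExt itself: the TtlStore type, the _at method names, and the explicit now_secs parameter (which makes expiry testable without waiting) are all hypothetical. The constant spells out the one-week TTL as 7 * 24 * 60 * 60 = 604,800 seconds.

```rust
use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

/// One week in seconds, matching the TTL described above.
pub const LATEST_FUNDS_YEAR_TTL_SECS: u64 = 7 * 24 * 60 * 60;

/// In-memory model of a TTL-aware store: each entry keeps its value and the
/// absolute Unix time (in seconds) at which it expires.
#[derive(Default)]
pub struct TtlStore {
    entries: HashMap<Vec<u8>, (Vec<u8>, u64)>,
}

impl TtlStore {
    /// Stores `value` so that it expires `ttl_secs` after `now_secs`.
    pub fn write_with_ttl_at(&mut self, key: &[u8], value: &[u8], ttl_secs: u64, now_secs: u64) {
        let expires_at = now_secs.saturating_add(ttl_secs);
        self.entries.insert(key.to_vec(), (value.to_vec(), expires_at));
    }

    /// Returns the value only while it is still fresh; expired entries read as a miss.
    pub fn read_with_ttl_at(&self, key: &[u8], now_secs: u64) -> Option<&[u8]> {
        match self.entries.get(key) {
            Some((value, expires_at)) if now_secs < *expires_at => Some(value.as_slice()),
            _ => None,
        }
    }
}

/// Current Unix time in seconds (0 if the clock is before the epoch).
fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

fn main() {
    let mut store = TtlStore::default();
    let now = now_secs();
    store.write_with_ttl_at(b"latest_funds_year", b"2024", LATEST_FUNDS_YEAR_TTL_SECS, now);
    let year = store
        .read_with_ttl_at(b"latest_funds_year", now)
        .and_then(|bytes| std::str::from_utf8(bytes).ok())
        .and_then(|s| s.parse::<i32>().ok());
    println!("{year:?}");
}
```

Treating an expired entry as a plain miss keeps the caller simple: after a week the fetch logic falls back to the current year again, so a newly published dataset is eventually picked up.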
HTTP Cache & SecClient
The SecClient utilizes the http_cache provided by the Caches struct. This integration typically happens during the construction of the SecClient via the ConfigManager.
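The HTTP cache follows a get-or-fetch pattern: look up the stored response for a request and only go to the network on a miss. The sketch below shows that pattern together with constructor injection of the cache handle. Everything in it is hypothetical: CachingClient stands in for SecClient, a Mutex-protected HashMap keyed by URL stands in for the http_cache DataStore, and the network call is a closure. The real cache key scheme is not described on this page.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical shared response cache standing in for the http_cache DataStore.
pub type HttpCache = Arc<Mutex<HashMap<String, Vec<u8>>>>;

/// Minimal client that receives its cache at construction time
/// (dependency injection) instead of reaching for a global.
pub struct CachingClient {
    cache: HttpCache,
}

impl CachingClient {
    pub fn new(cache: HttpCache) -> Self {
        Self { cache }
    }

    /// Returns the cached body for `url`, calling `fetch` only on a miss.
    pub fn get<F>(&self, url: &str, fetch: F) -> Vec<u8>
    where
        F: FnOnce(&str) -> Vec<u8>,
    {
        // The lock guard is dropped at the end of this statement.
        let cached = self.cache.lock().unwrap().get(url).cloned();
        if let Some(body) = cached {
            return body;
        }
        // Fetch without holding the lock, then record the response.
        let body = fetch(url);
        self.cache.lock().unwrap().insert(url.to_string(), body.clone());
        body
    }
}

fn main() {
    let cache: HttpCache = Default::default();
    let client = CachingClient::new(Arc::clone(&cache));
    let url = "https://example.com/dataset.csv";
    let first = client.get(url, |u| format!("downloaded {u}").into_bytes());
    let second = client.get(url, |_| unreachable!("second call should hit the cache"));
    assert_eq!(first, second);
    println!("{} cached entries", cache.lock().unwrap().len());
}
```

Because the lock is released before fetch runs, two concurrent misses for the same URL may both hit the network; the later insert simply overwrites the earlier one. Injecting the cache also means a test can construct a client over an empty cache and get no cross-test pollution.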
Storage Characteristics (simd-r-drive)
The underlying storage provided by simd-r-drive offers:
- High Performance: Optimized for fast key-value lookups.
- Atomic Operations: Ensures data integrity during writes.
- Simplicity: Single-file binary format (.bin) per store.
Sources: src/caches.rs:31-46
Integration Summary
| Component | Role | File Reference |
|---|---|---|
| Caches | Owner of DataStore handles | src/caches.rs:11-14 |
| simd_r_drive::DataStore | Low-level storage engine | src/caches.rs:1 |
| NamespaceHasher | Scopes keys within a DataStore | src/network/fetch_investment_company_series_and_class_dataset.rs:11-15 |
| StorageCacheExt | Provides read_with_ttl and write_with_ttl | src/network/fetch_investment_company_series_and_class_dataset.rs:7 |
Sources: src/caches.rs:1-60 src/network/fetch_investment_company_series_and_class_dataset.rs:1-73