Executive Summary
This document describes an optimized approach for retrieving time series event counter data stored in multiple granularities (5-second, 1-minute, 1-hour, 1-day, 1-week, 1-month) when querying with timezone-specific ranges. The solution minimizes data retrieval overhead while ensuring precise boundary coverage through intelligent granularity selection.
Problem Statement
Time series data is stored in UTC-aligned segments across six different granularities:
- 5 seconds: Ultra-fine resolution for precise measurements
- 1 minute: Fine resolution for short-term analysis
- 1 hour: Standard resolution for medium-term analysis
- 1 day: Coarse resolution for long-term trends
- 1 week: Weekly aggregations
- 1 month: Monthly aggregations
Challenge: When users query data using local timezone ranges, we need to:
- Convert timezone-specific queries to UTC
- Map to appropriate storage segments
- Minimize the number of segments retrieved
- Ensure precise boundary coverage without data gaps
Solution Overview
The Mixed Granularity Optimization approach uses different granularities strategically:
- Finest granularity (5-second) only for partial segments at boundaries
- Medium granularity (1-minute/1-hour) to fill gaps efficiently
- Coarsest appropriate granularity for the bulk of the range
Detailed Example Analysis
Input Query
Range: 2024-10-21T19:00:25 IST to 2024-10-23T15:00:30 IST
Duration: 44 hours, 0 minutes, 5 seconds (start inclusive, end exclusive)
Step 1: Timezone Conversion
Timezone | Start Time | End Time |
---|---|---|
IST (Input) | 2024-10-21T19:00:25 | 2024-10-23T15:00:30 |
UTC (Storage) | 2024-10-21T13:30:25 | 2024-10-23T09:30:30 |
IST is UTC+05:30 and does not observe daylight saving time, so subtracting 5 hours 30 minutes converts both boundaries to UTC.
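The conversion can be checked with Python's standard `datetime` module. This is a minimal sketch: because IST has a fixed offset, it uses a fixed-offset `timezone` instead of a tz database entry, and `to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# IST is a fixed UTC+05:30 offset with no daylight saving time, so a
# fixed-offset tzinfo is enough; no tz database lookup is needed.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def to_utc(local: datetime) -> datetime:
    """Convert an offset-aware timestamp to UTC (hypothetical helper)."""
    if local.tzinfo is None:
        raise ValueError("timestamp has no timezone")
    return local.astimezone(timezone.utc)


start_utc = to_utc(datetime(2024, 10, 21, 19, 0, 25, tzinfo=IST))
end_utc = to_utc(datetime(2024, 10, 23, 15, 0, 30, tzinfo=IST))
print(start_utc.isoformat())  # 2024-10-21T13:30:25+00:00
print(end_utc.isoformat())    # 2024-10-23T09:30:30+00:00
print(end_utc - start_utc)    # 1 day, 20:00:05
```

The difference printed last is the 44:00:05 duration stated above.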
Step 2: Granularity Selection Strategy
For a 44-hour range, 1-hour segments serve as the primary granularity. Finer granularities are still needed at both ends so that the boundaries are matched exactly:
UTC Range: 13:30:25 ───────────────────────── 09:30:30
            ↓                                      ↓
Segments:  [5s][1m][────── 1h segments ──────][1m][5s]
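The split shown in the diagram can be produced by a recursive planner: take the coarsest level that fits at least one whole segment inside the range, then plan the leftover head and tail with the next finer levels. The sketch below is a minimal illustration under stated assumptions, not the production algorithm: it works on UTC datetimes, supports only the fixed-length 1-hour, 1-minute and 5-second levels, and assumes both boundaries fall on 5-second marks. `plan_segments`, `LEVELS` and `_utc` are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical level table: (name, size in seconds), coarsest first.
# Only fixed-length levels are included; 1-week and 1-month segments are not
# whole multiples from the Unix epoch and would need calendar-aware logic.
LEVELS = [("1h", 3600), ("1m", 60), ("5s", 5)]


def _utc(ts: int) -> datetime:
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def plan_segments(start: datetime, end: datetime, levels=LEVELS):
    """Return chronologically ordered (level, segment_start) pairs that cover
    [start, end) exactly. Boundaries must fall on multiples of the finest level."""
    if start >= end:
        return []
    if not levels:
        raise ValueError("boundaries are not aligned to the finest level")
    (name, size), finer = levels[0], levels[1:]
    s, e = int(start.timestamp()), int(end.timestamp())
    first = -(-s // size) * size  # start rounded up to this level
    last = e // size * size       # end rounded down to this level
    if first >= last:
        # No whole segment of this level fits; fall back to finer levels.
        return plan_segments(start, end, finer)
    head = plan_segments(start, _utc(first), finer)
    body = [(name, _utc(t)) for t in range(first, last, size)]
    tail = plan_segments(_utc(last), end, finer)
    return head + body + tail


start = datetime(2024, 10, 21, 13, 30, 25, tzinfo=timezone.utc)
end = datetime(2024, 10, 23, 9, 30, 30, tzinfo=timezone.utc)
plan = plan_segments(start, end)
print(len(plan))                          # 115
print(Counter(name for name, _ in plan))  # Counter({'1m': 59, '1h': 43, '5s': 13})
```

Because the head is planned before the body and the tail after it, the returned list is already in time order, which keeps later merging simple.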
Step 3: Segment Breakdown
Phase 1: Start Boundary Precision (13:30:25 → 13:31:00)
Granularity: 5-second segments for sub-minute precision
Segment Timestamp | Coverage |
---|---|
2024-10-21T13:30:25Z | 25-30 seconds |
2024-10-21T13:30:30Z | 30-35 seconds |
2024-10-21T13:30:35Z | 35-40 seconds |
2024-10-21T13:30:40Z | 40-45 seconds |
2024-10-21T13:30:45Z | 45-50 seconds |
2024-10-21T13:30:50Z | 50-55 seconds |
2024-10-21T13:30:55Z | 55-60 seconds |
Total: 7 five-second segments (35 seconds coverage)
Phase 2: Hour Completion (13:31:00 → 14:00:00)
Granularity: 1-minute segments to reach hour boundary
Time Range | Segment Count |
---|---|
13:31:00 → 14:00:00 | 29 one-minute segments |
Phase 3: Bulk Data Retrieval (Oct 21 14:00:00 → Oct 23 09:00:00)
Granularity: 1-hour segments for maximum efficiency
Day | Hour Segments | Time Range |
---|---|---|
Oct 21 | 10 segments | 14:00 → 24:00 |
Oct 22 | 24 segments | 00:00 → 24:00 |
Oct 23 | 9 segments | 00:00 → 09:00 |
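Total: 43 one-hour segments (43 hours coverage). Because Oct 22 is a complete UTC day (00:00 → 24:00), a single 1-day segment could replace its 24 hourly segments, lowering the overall total from 115 to 92; this example keeps 1-hour segments throughout.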
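The Solution Overview calls for the coarsest appropriate granularity for the bulk of the range, while this example uses 1-hour segments throughout Phase 3. A minimal sketch, assuming hour-aligned UTC inputs, of how the Phase 3 span could instead use a 1-day segment for each complete UTC day; `split_bulk` is a hypothetical helper. For 14:00 Oct 21 → 09:00 Oct 23 it yields 19 hourly segments and one daily segment (Oct 22).

```python
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)
DAY = timedelta(days=1)


def split_bulk(start: datetime, end: datetime):
    """Cover an hour-aligned UTC range [start, end) with 1-day segments for
    complete UTC days and 1-hour segments for the partial days around them."""
    for bound in (start, end):
        if bound.minute or bound.second or bound.microsecond:
            raise ValueError("range boundaries must be aligned to an hour")
    segments = []
    t = start
    while t < end:
        if t.hour == 0 and t + DAY <= end:
            segments.append(("1d", t))  # a whole UTC day fits
            t += DAY
        else:
            segments.append(("1h", t))  # partial day: use hourly segments
            t += HOUR
    return segments


bulk = split_bulk(datetime(2024, 10, 21, 14, tzinfo=timezone.utc),
                  datetime(2024, 10, 23, 9, tzinfo=timezone.utc))
print(sum(g == "1h" for g, _ in bulk), sum(g == "1d" for g, _ in bulk))  # 19 1
```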
Phase 4: End Boundary Approach (09:00:00 → 09:30:00)
Granularity: 1-minute segments for sub-hour precision
Time Range | Segment Count |
---|---|
09:00:00 → 09:30:00 | 30 one-minute segments |
Phase 5: End Boundary Precision (09:30:00 → 09:30:30)
Granularity: 5-second segments for sub-minute precision
Segment Timestamp | Coverage |
---|---|
2024-10-23T09:30:00Z | 00-05 seconds |
2024-10-23T09:30:05Z | 05-10 seconds |
2024-10-23T09:30:10Z | 10-15 seconds |
2024-10-23T09:30:15Z | 15-20 seconds |
2024-10-23T09:30:20Z | 20-25 seconds |
2024-10-23T09:30:25Z | 25-30 seconds |
Total: 6 five-second segments (30 seconds coverage)
Step 4: Final Segment Summary
Granularity | Segment Count | Total Duration | Usage Purpose |
---|---|---|---|
5-second | 13 | 65 seconds | Boundary precision |
1-minute | 59 | 59 minutes | Gap filling |
1-hour | 43 | 43 hours | Bulk retrieval |
TOTAL | 115 | 44:00:05 | Complete coverage |
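A cheap safeguard for the "no data gaps" requirement is to check that a planned segment list tiles the UTC range exactly. The sketch below assumes segments are (granularity, start) pairs with the fixed sizes used in this example; `check_coverage`, `run` and `SIZES` are hypothetical names. It rebuilds the five phases from their first segment and count, then confirms they cover 44:00:05 with no gaps or overlaps.

```python
from datetime import datetime, timedelta, timezone

# Fixed segment sizes used in this example (hypothetical lookup table).
SIZES = {"5s": timedelta(seconds=5), "1m": timedelta(minutes=1),
         "1h": timedelta(hours=1)}


def check_coverage(segments, start, end):
    """Raise ValueError unless the (granularity, start) pairs tile [start, end)
    with no gaps or overlaps; otherwise return the covered duration."""
    cursor = start
    for name, seg_start in sorted(segments, key=lambda seg: seg[1]):
        if seg_start != cursor:
            kind = "gap" if seg_start > cursor else "overlap"
            raise ValueError(f"{kind} at {min(seg_start, cursor).isoformat()}")
        cursor = seg_start + SIZES[name]
    if cursor != end:
        raise ValueError(f"coverage stops at {cursor.isoformat()}, not {end.isoformat()}")
    return cursor - start


def run(name, first, count):
    """`count` consecutive segments of one granularity starting at `first`."""
    return [(name, first + i * SIZES[name]) for i in range(count)]


utc = timezone.utc
START = datetime(2024, 10, 21, 13, 30, 25, tzinfo=utc)
END = datetime(2024, 10, 23, 9, 30, 30, tzinfo=utc)
segments = (run("5s", START, 7)                                          # Phase 1
            + run("1m", datetime(2024, 10, 21, 13, 31, tzinfo=utc), 29)  # Phase 2
            + run("1h", datetime(2024, 10, 21, 14, tzinfo=utc), 43)      # Phase 3
            + run("1m", datetime(2024, 10, 23, 9, tzinfo=utc), 30)       # Phase 4
            + run("5s", datetime(2024, 10, 23, 9, 30, tzinfo=utc), 6))   # Phase 5
print(len(segments), check_coverage(segments, START, END))  # 115 1 day, 20:00:05
```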
Efficiency Analysis
Comparison with Alternative Approaches
Strategy | Total Segments | Data Fetched Outside Range | Boundary Precision |
---|---|---|---|
All 5-second | 31,681 | None | Exact |
All 1-minute | 2,641 | 55 seconds | Rounded to minutes |
All 1-hour | 45 | 59 minutes 55 seconds | Rounded to hours |
Mixed Granularity | 115 | None | Exact |
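The single-granularity counts can be recomputed by rounding the start down and the end up to the chosen granularity, which also shows how much data each strategy reads outside the requested range. A minimal sketch; `single_level_cost` is a hypothetical helper and, as elsewhere, boundaries are whole UTC seconds. For this query it gives 31,681, 2,641 and 45 segments, with 0, 55 and 3,595 seconds fetched outside the range.

```python
from datetime import datetime, timezone


def single_level_cost(start: datetime, end: datetime, size: int):
    """Segments needed to cover [start, end) with one fixed granularity of
    `size` seconds, plus the seconds of data fetched outside the range."""
    s, e = int(start.timestamp()), int(end.timestamp())
    first = s // size * size       # start rounded down
    last = -(-e // size) * size    # end rounded up
    return (last - first) // size, (s - first) + (last - e)


start = datetime(2024, 10, 21, 13, 30, 25, tzinfo=timezone.utc)
end = datetime(2024, 10, 23, 9, 30, 30, tzinfo=timezone.utc)
for name, size in [("5s", 5), ("1m", 60), ("1h", 3600)]:
    count, extra = single_level_cost(start, end, size)
    print(f"All {name}: {count} segments, {extra} s outside the range")
```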
Performance Benefits
- Read Reduction: 115 segments instead of 31,681, i.e. 99.6% fewer than the all-5-second approach
- Storage I/O Optimization: Bulk reads for majority of data
- Memory Efficiency: Fewer objects to process and aggregate
- Network Efficiency: Fewer database queries/API calls
- Processing Speed: Less data parsing and aggregation overhead
Implementation Considerations
Algorithm Complexity
- Time Complexity: O(n) where n is the number of segments
- Space Complexity: O(n) for segment list storage
- Preprocessing: Constant time timezone conversion
Edge Cases Handled
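- Daylight Saving Time Transitions: UTC storage keeps segment boundaries free of DST shifts; local query times that fall in a DST gap (nonexistent) or overlap (ambiguous) must still be resolved or rejected during conversion to UTC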
- Month Boundary Variations: Proper handling of different month lengths
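- Leap Seconds: Segment boundaries follow the storage layer's UTC timestamps; with POSIX-style timestamps, which do not represent leap seconds, every day has exactly 86,400 seconds and segments stay aligned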
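- Sub-Second and Non-Aligned Boundaries: The finest stored resolution is 5 seconds, so boundaries that are not on a 5-second mark (including sub-second timestamps) cannot be matched exactly and must be rounded or rejected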
- Cross-Year Queries: Year boundaries handled seamlessly
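Storing data in UTC does not remove DST handling from the conversion step: a local wall-clock time can be repeated (when clocks fall back) or skipped (when they spring forward). A minimal sketch using Python's `zoneinfo` with PEP 495 `fold` semantics (it needs the system tz database or the `tzdata` package) that classifies such times so the query layer can reject them or choose an offset explicitly; `classify_local_time` is a hypothetical helper, and America/New_York is used only because it observes DST.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def classify_local_time(naive: datetime, tz: ZoneInfo) -> str:
    """Return 'ok', 'ambiguous' (repeated wall time) or 'nonexistent'
    (skipped wall time) for a naive local timestamp in tz."""
    early = naive.replace(tzinfo=tz, fold=0)
    late = naive.replace(tzinfo=tz, fold=1)
    if early.utcoffset() == late.utcoffset():
        return "ok"  # only one possible offset: a normal local time
    # The two fold values disagree, so this is a DST transition. A time in a
    # skipped hour does not survive a round trip through UTC.
    round_trip = early.astimezone(timezone.utc).astimezone(tz)
    if round_trip.replace(tzinfo=None) != naive:
        return "nonexistent"
    return "ambiguous"


ny = ZoneInfo("America/New_York")
print(classify_local_time(datetime(2024, 11, 3, 1, 30), ny))  # ambiguous
print(classify_local_time(datetime(2024, 3, 10, 2, 30), ny))  # nonexistent
print(classify_local_time(datetime(2024, 10, 21, 19, 0, 25), ZoneInfo("Asia/Kolkata")))  # ok
```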
Error Scenarios
Scenario | Handling Strategy |
---|---|
Invalid timezone | Reject query with clear error message |
Future date ranges | Allow but warn about potential data gaps |
Extremely long ranges | Auto-upgrade to coarser granularities |
Storage unavailability | Graceful degradation to available granularities |
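The first two error scenarios can be handled at parse time. A minimal sketch, assuming the query arrives as naive local timestamps plus an IANA timezone name; `validate_query` is hypothetical, and the future-range warning uses Python's `warnings` module as a stand-in for whatever warning channel the API exposes.

```python
import warnings
from datetime import datetime, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError


def validate_query(start_local: datetime, end_local: datetime, tz_name: str):
    """Return the query range in UTC, or raise ValueError with a clear message.

    Timestamps are naive local times in tz_name. A range reaching past the
    current time is allowed but triggers a warning, since data may be missing.
    """
    try:
        tz = ZoneInfo(tz_name)
    except (ZoneInfoNotFoundError, ValueError) as exc:
        raise ValueError(f"unknown timezone: {tz_name!r}") from exc
    start = start_local.replace(tzinfo=tz).astimezone(timezone.utc)
    end = end_local.replace(tzinfo=tz).astimezone(timezone.utc)
    if start >= end:
        raise ValueError("query start must be earlier than query end")
    if end > datetime.now(timezone.utc):
        warnings.warn("query range extends into the future; data may be incomplete")
    return start, end


print(validate_query(datetime(2024, 10, 21, 19, 0, 25),
                     datetime(2024, 10, 23, 15, 0, 30), "Asia/Kolkata"))
```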
Technical Architecture
Data Storage Schema
time_series_5s/YYYY/MM/DD/HH/mm_ss.parquet
time_series_1m/YYYY/MM/DD/HH/mm.parquet
time_series_1h/YYYY/MM/DD/HH.parquet
time_series_1d/YYYY/MM/DD.parquet
time_series_1w/YYYY/WW.parquet
time_series_1mo/YYYY/MM.parquet
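Each planned segment maps to one object key in this layout. A sketch of that mapping, assuming `mm_ss` means minute and second, and that `YYYY/WW` for weekly segments is the ISO 8601 week-numbering year and week (the schema does not state which week convention it uses); `segment_path` and `FORMATS` are hypothetical names.

```python
from datetime import datetime, timezone

# strftime patterns for the fixed layouts (%M is minutes, %m is months).
FORMATS = {
    "5s": "time_series_5s/%Y/%m/%d/%H/%M_%S",
    "1m": "time_series_1m/%Y/%m/%d/%H/%M",
    "1h": "time_series_1h/%Y/%m/%d/%H",
    "1d": "time_series_1d/%Y/%m/%d",
    "1mo": "time_series_1mo/%Y/%m",
}


def segment_path(granularity: str, start: datetime) -> str:
    """Build the storage key for a segment starting at `start`, in UTC."""
    t = start.astimezone(timezone.utc)
    if granularity == "1w":
        # Assumption: YYYY/WW is the ISO 8601 week-numbering year and week.
        year, week, _ = t.isocalendar()
        return f"time_series_1w/{year:04d}/{week:02d}.parquet"
    if granularity not in FORMATS:
        raise ValueError(f"unknown granularity: {granularity!r}")
    return t.strftime(FORMATS[granularity]) + ".parquet"


print(segment_path("5s", datetime(2024, 10, 21, 13, 30, 25, tzinfo=timezone.utc)))
# time_series_5s/2024/10/21/13/30_25.parquet
```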
Query Optimization Pipeline
- Parse user input (timestamp + timezone)
- Convert to UTC boundaries
- Analyze range duration for primary granularity
- Generate mixed granularity segment list
- Parallelize data retrieval across granularities
- Aggregate and merge results
- Convert back to user’s timezone for response
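The parallel retrieval and merge steps can be sketched with a thread pool: because event counters are additive, merging the segments of a single total is a plain sum. `retrieve_total` is a hypothetical helper, and the in-memory dictionary below stands in for real Parquet reads with illustrative values only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def retrieve_total(paths: Iterable[str], fetch: Callable[[str], int],
                   max_workers: int = 8) -> int:
    """Read segment counters concurrently and merge them into one total.

    `fetch` is a hypothetical reader returning the event count stored in one
    segment; summing is a valid merge because event counters are additive.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return sum(pool.map(fetch, paths))


# In-memory stand-in for the storage layer (illustrative values only).
fake_store = {
    "time_series_5s/2024/10/21/13/30_25.parquet": 2,
    "time_series_1m/2024/10/21/13/31.parquet": 17,
    "time_series_1h/2024/10/21/14.parquet": 940,
}
print(retrieve_total(fake_store, fake_store.__getitem__))  # 959
```

Threads suit this step because segment reads are I/O-bound; per-bucket results (rather than a single total) would need a keyed merge instead of `sum`.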
Monitoring and Metrics
Key Performance Indicators
- Segment Retrieval Count: Average segments per query
- Data Transfer Volume: Bytes transferred per time unit queried
- Query Response Time: End-to-end latency
- Cache Hit Rate: Percentage of segments served from cache
- Granularity Distribution: Usage patterns across different granularities
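A per-query record of segment count, cache hits, bytes and granularity mix is enough to derive most of the KPIs above. A minimal sketch; `QueryMetrics` and its fields are hypothetical, and response time would be measured separately (for example with `time.perf_counter`).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class QueryMetrics:
    """Hypothetical per-query counters from which the KPIs can be derived."""
    segments: int = 0
    cache_hits: int = 0
    bytes_transferred: int = 0
    by_granularity: Counter = field(default_factory=Counter)

    def record(self, granularity: str, nbytes: int, from_cache: bool) -> None:
        """Account for one retrieved segment."""
        self.segments += 1
        self.cache_hits += int(from_cache)
        self.bytes_transferred += nbytes
        self.by_granularity[granularity] += 1

    @property
    def cache_hit_rate(self) -> float:
        return self.cache_hits / self.segments if self.segments else 0.0

    def bytes_per_second_queried(self, range_seconds: float) -> float:
        """Data transfer volume: bytes moved per second of queried time."""
        return self.bytes_transferred / range_seconds


m = QueryMetrics()
for granularity, nbytes, cached in [("5s", 200, False), ("1h", 800, True),
                                    ("1h", 800, True), ("1m", 200, True)]:
    m.record(granularity, nbytes, cached)
print(m.segments, m.cache_hit_rate)  # 4 0.75
```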
Performance Targets
- Sub-second response for ranges < 1 day
- < 5 second response for ranges < 1 week
- < 30 second response for ranges < 1 month
- 95% cache hit rate for frequently accessed recent data
- < 1000 segments for any single query
Conclusion
The Mixed Granularity Optimization approach balances precision and performance for time series data retrieval. By choosing the granularity separately for each portion of the query range, we achieve:
- Exact precision at query boundaries aligned to 5 seconds (the finest stored resolution)
- Maximum efficiency for bulk data retrieval
- Minimal resource utilization across storage, network, and processing layers
- Scalable architecture that handles queries from seconds to months
This approach enables responsive time series analytics while maintaining cost-effective infrastructure scaling as data volumes grow.