Time Series Data Retrieval with Mixed Granularity Optimization

Executive Summary

This document describes an optimized approach for retrieving time series event counter data stored in multiple granularities (5-second, 1-minute, 1-hour, 1-day, 1-week, 1-month) when querying with timezone-specific ranges. The solution minimizes data retrieval overhead while ensuring precise boundary coverage through intelligent granularity selection.

Problem Statement

Time series data is stored in UTC-aligned segments across six different granularities:

  • 5 seconds: Ultra-fine resolution for precise measurements
  • 1 minute: Fine resolution for short-term analysis
  • 1 hour: Standard resolution for medium-term analysis
  • 1 day: Coarse resolution for long-term trends
  • 1 week: Weekly aggregations
  • 1 month: Monthly aggregations

Challenge: When users query data using local timezone ranges, we need to:

  1. Convert timezone-specific queries to UTC
  2. Map to appropriate storage segments
  3. Minimize the number of segments retrieved
  4. Ensure precise boundary coverage without data gaps

Solution Overview

The Mixed Granularity Optimization approach uses different granularities strategically:

  • Finest granularity (5-second) only for partial segments at boundaries
  • Medium granularity (1-minute/1-hour) to fill gaps efficiently
  • Coarsest appropriate granularity for the bulk of the range

Detailed Example Analysis

Input Query

Range: 2024-10-21T19:00:25 IST to 2024-10-23T15:00:30 IST
Duration: 44 hours, 0 minutes, 5 seconds

Step 1: Timezone Conversion

| Timezone | Start Time | End Time |
| --- | --- | --- |
| IST (Input) | 2024-10-21T19:00:25 | 2024-10-23T15:00:30 |
| UTC (Storage) | 2024-10-21T13:30:25 | 2024-10-23T09:30:30 |

IST = UTC + 5:30, so we subtract 5:30 to convert to UTC
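The conversion can be sketched with Python's standard `zoneinfo` module (the `to_utc` helper name is mine, not from this document; IST has no DST, so the offset is always +5:30):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

def to_utc(local_str: str, tz_name: str) -> datetime:
    """Interpret a naive local timestamp in the given zone, then convert to UTC."""
    naive = datetime.fromisoformat(local_str)
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(ZoneInfo("UTC"))

start_utc = to_utc("2024-10-21T19:00:25", "Asia/Kolkata")
end_utc = to_utc("2024-10-23T15:00:30", "Asia/Kolkata")
# start_utc is 2024-10-21 13:30:25 UTC, end_utc is 2024-10-23 09:30:30 UTC
```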

Step 2: Granularity Selection Strategy

For a 44-hour range, the optimal primary granularity is 1-hour segments. However, we need mixed granularities for precise boundary handling:

UTC Range: 13:30:25 ──────────────────────────── 09:30:30
           ↓                                    ↓
Segments:  [5s][1m][────── 1h segments ──────][1m][5s]
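Choosing the primary granularity from the range duration might look like this sketch. The duration thresholds below are illustrative assumptions of mine; this document only establishes that a 44-hour range maps to 1-hour segments:

```python
from datetime import timedelta

def pick_primary_granularity(duration: timedelta) -> str:
    """Pick the coarsest granularity that still gives a reasonable segment count.
    Thresholds are illustrative assumptions, not values from the design."""
    if duration < timedelta(minutes=10):
        return "5s"
    if duration < timedelta(hours=4):
        return "1m"
    if duration < timedelta(days=14):
        return "1h"
    if duration < timedelta(weeks=12):
        return "1d"
    if duration < timedelta(days=365):
        return "1w"
    return "1mo"

assert pick_primary_granularity(timedelta(hours=44, seconds=5)) == "1h"
```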

Step 3: Segment Breakdown

Phase 1: Start Boundary Precision (13:30:25 → 13:31:00)

Granularity: 5-second segments for sub-minute precision

| Segment Timestamp | Coverage |
| --- | --- |
| 2024-10-21T13:30:25Z | 25-30 seconds |
| 2024-10-21T13:30:30Z | 30-35 seconds |
| 2024-10-21T13:30:35Z | 35-40 seconds |
| 2024-10-21T13:30:40Z | 40-45 seconds |
| 2024-10-21T13:30:45Z | 45-50 seconds |
| 2024-10-21T13:30:50Z | 50-55 seconds |
| 2024-10-21T13:30:55Z | 55-60 seconds |

Total: 7 five-second segments (35 seconds coverage)

Phase 2: Hour Completion (13:31:00 → 14:00:00)

Granularity: 1-minute segments to reach hour boundary

| Time Range | Segment Count |
| --- | --- |
| 13:31:00 → 14:00:00 | 29 one-minute segments |

Phase 3: Bulk Data Retrieval (14:00:00 → 09:00:00)

Granularity: 1-hour segments for maximum efficiency

| Day | Hour Segments | Time Range |
| --- | --- | --- |
| Oct 21 | 10 segments | 14:00 → 23:59 |
| Oct 22 | 24 segments | 00:00 → 23:59 |
| Oct 23 | 9 segments | 00:00 → 08:59 |

Total: 43 one-hour segments (43 hours coverage)

Phase 4: End Boundary Approach (09:00:00 → 09:30:00)

Granularity: 1-minute segments for sub-hour precision

| Time Range | Segment Count |
| --- | --- |
| 09:00:00 → 09:30:00 | 30 one-minute segments |

Phase 5: End Boundary Precision (09:30:00 → 09:30:30)

Granularity: 5-second segments for sub-minute precision

| Segment Timestamp | Coverage |
| --- | --- |
| 2024-10-23T09:30:00Z | 00-05 seconds |
| 2024-10-23T09:30:05Z | 05-10 seconds |
| 2024-10-23T09:30:10Z | 10-15 seconds |
| 2024-10-23T09:30:15Z | 15-20 seconds |
| 2024-10-23T09:30:20Z | 20-25 seconds |
| 2024-10-23T09:30:25Z | 25-30 seconds |

Total: 6 five-second segments (30 seconds coverage)

Step 4: Final Segment Summary

| Granularity | Segment Count | Total Duration | Usage Purpose |
| --- | --- | --- | --- |
| 5-second | 13 | 65 seconds | Boundary precision |
| 1-minute | 59 | 59 minutes | Gap filling |
| 1-hour | 43 | 43 hours | Bulk retrieval |
| TOTAL | 115 | 44:00:05 | Complete coverage |
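The five-phase breakdown can be reproduced with a short sketch. The helper names and the epoch-based alignment trick are my own choices; the phase structure (5s → 1m → 1h bulk → 1m → 5s) is the one described above:

```python
from datetime import datetime, timedelta

FIVE_SEC = timedelta(seconds=5)
ONE_MIN = timedelta(minutes=1)
ONE_HOUR = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1)

def ceil_to(t: datetime, step: timedelta) -> datetime:
    """Round t up to the next multiple of step, measured from the epoch."""
    rem = (t - EPOCH) % step
    return t if rem == timedelta(0) else t + step - rem

def floor_to(t: datetime, step: timedelta) -> datetime:
    """Round t down to the previous multiple of step."""
    return t - (t - EPOCH) % step

def segments(start: datetime, end: datetime, step: timedelta) -> list:
    """Segment start timestamps covering [start, end) at one granularity."""
    out, t = [], start
    while t < end:
        out.append(t)
        t += step
    return out

def mixed_plan(start: datetime, end: datetime) -> dict:
    """Boundary-out plan: 5s and 1m partials at the edges, 1h for the bulk."""
    a_min, a_hr = ceil_to(start, ONE_MIN), ceil_to(start, ONE_HOUR)
    b_hr, b_min = floor_to(end, ONE_HOUR), floor_to(end, ONE_MIN)
    return {
        "5s": segments(start, a_min, FIVE_SEC) + segments(b_min, end, FIVE_SEC),
        "1m": segments(a_min, a_hr, ONE_MIN) + segments(b_hr, b_min, ONE_MIN),
        "1h": segments(a_hr, b_hr, ONE_HOUR),
    }

plan = mixed_plan(datetime(2024, 10, 21, 13, 30, 25),
                  datetime(2024, 10, 23, 9, 30, 30))
counts = {g: len(s) for g, s in plan.items()}
# counts == {"5s": 13, "1m": 59, "1h": 43}, i.e. 115 segments in total
```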

Efficiency Analysis

Comparison with Alternative Approaches

| Strategy | Total Segments | Efficiency | Precision |
| --- | --- | --- | --- |
| All 5-second | 31,681 | Very Poor | Perfect |
| All 1-minute | 2,641 | Poor | Good |
| All 1-hour | 45 | Good | Poor |
| Mixed Granularity | 115 | Excellent | Perfect |

Performance Benefits

  1. Data Transfer Reduction: 99.6% fewer segments than all-5-second approach
  2. Storage I/O Optimization: Bulk reads for majority of data
  3. Memory Efficiency: Fewer objects to process and aggregate
  4. Network Efficiency: Fewer database queries/API calls
  5. Processing Speed: Less data parsing and aggregation overhead

Implementation Considerations

Algorithm Complexity

  • Time Complexity: O(n) where n is the number of segments
  • Space Complexity: O(n) for segment list storage
  • Preprocessing: Constant time timezone conversion

Edge Cases Handled

  1. Daylight Saving Time Transitions: UTC storage eliminates DST complexity
  2. Month Boundary Variations: Proper handling of different month lengths
  3. Leap Seconds: UTC-based segments handle leap second adjustments
  4. Sub-Second Precision: 5-second granularity provides adequate precision
  5. Cross-Year Queries: Year boundaries handled seamlessly

Error Scenarios

| Scenario | Handling Strategy |
| --- | --- |
| Invalid timezone | Reject query with a clear error message |
| Future date ranges | Allow but warn about potential data gaps |
| Extremely long ranges | Auto-upgrade to coarser granularities |
| Storage unavailability | Graceful degradation to available granularities |
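The invalid-timezone case can be handled at the edge of the pipeline; a minimal sketch using `zoneinfo` (the `validate_timezone` helper is mine, not from this document):

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def validate_timezone(tz_name: str) -> ZoneInfo:
    """Reject unknown timezone names with a clear error before any conversion."""
    try:
        return ZoneInfo(tz_name)
    except ZoneInfoNotFoundError:
        raise ValueError(f"Unknown timezone: {tz_name!r}")
```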

Technical Architecture

Data Storage Schema

time_series_5s/YYYY/MM/DD/HH/mm_ss.parquet
time_series_1m/YYYY/MM/DD/HH/mm.parquet  
time_series_1h/YYYY/MM/DD/HH.parquet
time_series_1d/YYYY/MM/DD.parquet
time_series_1w/YYYY/WW.parquet
time_series_1mo/YYYY/MM.parquet
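Mapping a UTC segment timestamp to its storage path might look like the following sketch. The layout strings come from the schema above; the function name is mine, and interpreting `WW` as the ISO week number is an assumption:

```python
from datetime import datetime

# Path layouts from the storage schema above; WW is assumed to be the ISO week.
LAYOUTS = {
    "5s":  "time_series_5s/{y:04d}/{mo:02d}/{d:02d}/{h:02d}/{mi:02d}_{s:02d}.parquet",
    "1m":  "time_series_1m/{y:04d}/{mo:02d}/{d:02d}/{h:02d}/{mi:02d}.parquet",
    "1h":  "time_series_1h/{y:04d}/{mo:02d}/{d:02d}/{h:02d}.parquet",
    "1d":  "time_series_1d/{y:04d}/{mo:02d}/{d:02d}.parquet",
    "1w":  "time_series_1w/{y:04d}/{w:02d}.parquet",
    "1mo": "time_series_1mo/{y:04d}/{mo:02d}.parquet",
}

def segment_path(gran: str, ts: datetime) -> str:
    """Render the storage path for one segment; unused fields are ignored."""
    _, iso_week, _ = ts.isocalendar()
    return LAYOUTS[gran].format(y=ts.year, mo=ts.month, d=ts.day,
                                h=ts.hour, mi=ts.minute, s=ts.second,
                                w=iso_week)

ts = datetime(2024, 10, 21, 13, 30, 25)
# segment_path("1h", ts) yields "time_series_1h/2024/10/21/13.parquet"
```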

Query Optimization Pipeline

  1. Parse user input (timestamp + timezone)
  2. Convert to UTC boundaries
  3. Analyze range duration for primary granularity
  4. Generate mixed granularity segment list
  5. Parallelize data retrieval across granularities
  6. Aggregate and merge results
  7. Convert back to user’s timezone for response
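Steps 5 and 6 of the pipeline can be sketched as below. All names here are illustrative stand-ins, not from this document; `fetch_segment` is a stub where real parquet reads would go:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_segment(path: str) -> int:
    """Stub: a real implementation would read one parquet segment and
    return its event counter total. Here each segment contributes 1."""
    return 1

def run_query(segment_paths: list[str]) -> int:
    # Step 5: retrieve segments in parallel across granularities.
    with ThreadPoolExecutor(max_workers=8) as pool:
        counts = list(pool.map(fetch_segment, segment_paths))
    # Step 6: aggregate and merge the per-segment results.
    return sum(counts)
```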

Monitoring and Metrics

Key Performance Indicators

  • Segment Retrieval Count: Average segments per query
  • Data Transfer Volume: Bytes transferred per time unit queried
  • Query Response Time: End-to-end latency
  • Cache Hit Rate: Percentage of segments served from cache
  • Granularity Distribution: Usage patterns across different granularities

Performance Targets

  • Sub-second response for ranges < 1 day
  • < 5 second response for ranges < 1 week
  • < 30 second response for ranges < 1 month
  • 95% cache hit rate for frequently accessed recent data
  • < 1000 segments for any single query

Conclusion

The Mixed Granularity Optimization approach provides an optimal balance between precision and performance for time series data retrieval. By intelligently selecting granularities based on the specific requirements of each portion of the query range, we achieve:

  • Perfect precision at query boundaries
  • Maximum efficiency for bulk data retrieval
  • Minimal resource utilization across storage, network, and processing layers
  • Scalable architecture that handles queries from seconds to months

This approach enables responsive time series analytics while maintaining cost-effective infrastructure scaling as data volumes grow.
