Time Series Data Retrieval with Mixed Granularity Optimization

Executive Summary

This document describes an optimized approach for retrieving time series event counter data stored in multiple granularities (5-second, 1-minute, 1-hour, 1-day, 1-week, 1-month) when querying with timezone-specific ranges. The solution minimizes data retrieval overhead while ensuring precise boundary coverage through intelligent granularity selection.

Problem Statement

Time series data is stored in UTC-aligned segments across six different granularities:

  • 5 seconds: Ultra-fine resolution for precise measurements
  • 1 minute: Fine resolution for short-term analysis
  • 1 hour: Standard resolution for medium-term analysis
  • 1 day: Coarse resolution for long-term trends
  • 1 week: Weekly aggregations
  • 1 month: Monthly aggregations

Challenge: When users query data using local timezone ranges, we need to:

  1. Convert timezone-specific queries to UTC
  2. Map to appropriate storage segments
  3. Minimize the number of segments retrieved
  4. Ensure precise boundary coverage without data gaps

Solution Overview

The Mixed Granularity Optimization approach uses different granularities strategically:

  • Finest granularity (5-second) only for partial segments at boundaries
  • Medium granularity (1-minute/1-hour) to fill gaps efficiently
  • Coarsest appropriate granularity for the bulk of the range

Detailed Example Analysis

Input Query

Range: 2024-10-21T19:00:25 IST to 2024-10-23T15:00:30 IST
Duration: 44 hours, 0 minutes, 5 seconds

Step 1: Timezone Conversion

| Timezone | Start Time | End Time |
| --- | --- | --- |
| IST (Input) | 2024-10-21T19:00:25 | 2024-10-23T15:00:30 |
| UTC (Storage) | 2024-10-21T13:30:25 | 2024-10-23T09:30:30 |

IST = UTC + 5:30, so we subtract 5:30 to convert to UTC
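The conversion can be sketched with Python's standard `zoneinfo` module (the `to_utc` helper name is mine, not from this document; IST has no DST, so the offset is always +5:30):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

def to_utc(local_str: str, tz_name: str) -> datetime:
    """Interpret a naive local timestamp in the given zone, then convert to UTC."""
    naive = datetime.fromisoformat(local_str)
    return naive.replace(tzinfo=ZoneInfo(tz_name)).astimezone(ZoneInfo("UTC"))

start_utc = to_utc("2024-10-21T19:00:25", "Asia/Kolkata")
end_utc = to_utc("2024-10-23T15:00:30", "Asia/Kolkata")
# start_utc is 2024-10-21 13:30:25 UTC, end_utc is 2024-10-23 09:30:30 UTC
```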

Step 2: Granularity Selection Strategy

For a 44-hour range, the optimal primary granularity is 1-hour segments. However, we need mixed granularities for precise boundary handling:

UTC Range: 13:30:25 ──────────────────────────── 09:30:30
           ↓                                    ↓
Segments:  [5s][1m][────── 1h segments ──────][1m][5s]
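Choosing the primary granularity from the range duration might look like this sketch. The duration thresholds below are illustrative assumptions of mine; this document only establishes that a 44-hour range maps to 1-hour segments:

```python
from datetime import timedelta

def pick_primary_granularity(duration: timedelta) -> str:
    """Pick the coarsest granularity that still gives a reasonable segment count.
    Thresholds are illustrative assumptions, not values from the design."""
    if duration < timedelta(minutes=10):
        return "5s"
    if duration < timedelta(hours=4):
        return "1m"
    if duration < timedelta(days=14):
        return "1h"
    if duration < timedelta(weeks=12):
        return "1d"
    if duration < timedelta(days=365):
        return "1w"
    return "1mo"

assert pick_primary_granularity(timedelta(hours=44, seconds=5)) == "1h"
```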

Step 3: Segment Breakdown

Phase 1: Start Boundary Precision (13:30:25 → 13:31:00)

Granularity: 5-second segments for sub-minute precision

| Segment Timestamp | Coverage |
| --- | --- |
| 2024-10-21T13:30:25Z | 25-30 seconds |
| 2024-10-21T13:30:30Z | 30-35 seconds |
| 2024-10-21T13:30:35Z | 35-40 seconds |
| 2024-10-21T13:30:40Z | 40-45 seconds |
| 2024-10-21T13:30:45Z | 45-50 seconds |
| 2024-10-21T13:30:50Z | 50-55 seconds |
| 2024-10-21T13:30:55Z | 55-60 seconds |

Total: 7 five-second segments (35 seconds coverage)

Phase 2: Hour Completion (13:31:00 → 14:00:00)

Granularity: 1-minute segments to reach hour boundary

| Time Range | Segment Count |
| --- | --- |
| 13:31:00 → 14:00:00 | 29 one-minute segments |

Phase 3: Bulk Data Retrieval (14:00:00 → 09:00:00)

Granularity: 1-hour segments for maximum efficiency

| Day | Hour Segments | Time Range |
| --- | --- | --- |
| Oct 21 | 10 segments | 14:00 → 23:59 |
| Oct 22 | 24 segments | 00:00 → 23:59 |
| Oct 23 | 9 segments | 00:00 → 08:59 |

Total: 43 one-hour segments (43 hours coverage)

Phase 4: End Boundary Approach (09:00:00 → 09:30:00)

Granularity: 1-minute segments for sub-hour precision

| Time Range | Segment Count |
| --- | --- |
| 09:00:00 → 09:30:00 | 30 one-minute segments |

Phase 5: End Boundary Precision (09:30:00 → 09:30:30)

Granularity: 5-second segments for sub-minute precision

| Segment Timestamp | Coverage |
| --- | --- |
| 2024-10-23T09:30:00Z | 00-05 seconds |
| 2024-10-23T09:30:05Z | 05-10 seconds |
| 2024-10-23T09:30:10Z | 10-15 seconds |
| 2024-10-23T09:30:15Z | 15-20 seconds |
| 2024-10-23T09:30:20Z | 20-25 seconds |
| 2024-10-23T09:30:25Z | 25-30 seconds |

Total: 6 five-second segments (30 seconds coverage)

Step 4: Final Segment Summary

| Granularity | Segment Count | Total Duration | Usage Purpose |
| --- | --- | --- | --- |
| 5-second | 13 | 65 seconds | Boundary precision |
| 1-minute | 59 | 59 minutes | Gap filling |
| 1-hour | 43 | 43 hours | Bulk retrieval |
| TOTAL | 115 | 44:00:05 | Complete coverage |
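The five-phase breakdown can be reproduced with a short sketch. The helper names and the epoch-based alignment trick are my own choices; the phase structure (5s → 1m → 1h bulk → 1m → 5s) is the one described above:

```python
from datetime import datetime, timedelta

FIVE_SEC = timedelta(seconds=5)
ONE_MIN = timedelta(minutes=1)
ONE_HOUR = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1)

def ceil_to(t: datetime, step: timedelta) -> datetime:
    """Round t up to the next multiple of step, measured from the epoch."""
    rem = (t - EPOCH) % step
    return t if rem == timedelta(0) else t + step - rem

def floor_to(t: datetime, step: timedelta) -> datetime:
    """Round t down to the previous multiple of step."""
    return t - (t - EPOCH) % step

def segments(start: datetime, end: datetime, step: timedelta) -> list:
    """Segment start timestamps covering [start, end) at one granularity."""
    out, t = [], start
    while t < end:
        out.append(t)
        t += step
    return out

def mixed_plan(start: datetime, end: datetime) -> dict:
    """Boundary-out plan: 5s and 1m partials at the edges, 1h for the bulk."""
    a_min, a_hr = ceil_to(start, ONE_MIN), ceil_to(start, ONE_HOUR)
    b_hr, b_min = floor_to(end, ONE_HOUR), floor_to(end, ONE_MIN)
    return {
        "5s": segments(start, a_min, FIVE_SEC) + segments(b_min, end, FIVE_SEC),
        "1m": segments(a_min, a_hr, ONE_MIN) + segments(b_hr, b_min, ONE_MIN),
        "1h": segments(a_hr, b_hr, ONE_HOUR),
    }

plan = mixed_plan(datetime(2024, 10, 21, 13, 30, 25),
                  datetime(2024, 10, 23, 9, 30, 30))
counts = {g: len(s) for g, s in plan.items()}
# counts == {"5s": 13, "1m": 59, "1h": 43}, i.e. 115 segments in total
```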

Efficiency Analysis

Comparison with Alternative Approaches

| Strategy | Total Segments | Efficiency | Precision |
| --- | --- | --- | --- |
| All 5-second | 31,681 | Very Poor | Perfect |
| All 1-minute | 2,641 | Poor | Good |
| All 1-hour | 45 | Good | Poor |
| Mixed Granularity | 115 | Excellent | Perfect |

Performance Benefits

  1. Data Transfer Reduction: 99.6% fewer segments than all-5-second approach
  2. Storage I/O Optimization: Bulk reads for majority of data
  3. Memory Efficiency: Fewer objects to process and aggregate
  4. Network Efficiency: Fewer database queries/API calls
  5. Processing Speed: Less data parsing and aggregation overhead

Implementation Considerations

Algorithm Complexity

  • Time Complexity: O(n) where n is the number of segments
  • Space Complexity: O(n) for segment list storage
  • Preprocessing: Constant time timezone conversion

Edge Cases Handled

  1. Daylight Saving Time Transitions: UTC storage eliminates DST complexity
  2. Month Boundary Variations: Proper handling of different month lengths
  3. Leap Seconds: UTC-based segments handle leap second adjustments
  4. Sub-Second Precision: 5-second granularity provides adequate precision
  5. Cross-Year Queries: Year boundaries handled seamlessly

Error Scenarios

| Scenario | Handling Strategy |
| --- | --- |
| Invalid timezone | Reject query with a clear error message |
| Future date ranges | Allow but warn about potential data gaps |
| Extremely long ranges | Auto-upgrade to coarser granularities |
| Storage unavailability | Graceful degradation to available granularities |
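The invalid-timezone case can be handled at the edge of the pipeline; a minimal sketch using `zoneinfo` (the `validate_timezone` helper is mine, not from this document):

```python
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

def validate_timezone(tz_name: str) -> ZoneInfo:
    """Reject unknown timezone names with a clear error before any conversion."""
    try:
        return ZoneInfo(tz_name)
    except ZoneInfoNotFoundError:
        raise ValueError(f"Unknown timezone: {tz_name!r}")
```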

Technical Architecture

Data Storage Schema

time_series_5s/YYYY/MM/DD/HH/mm_ss.parquet
time_series_1m/YYYY/MM/DD/HH/mm.parquet  
time_series_1h/YYYY/MM/DD/HH.parquet
time_series_1d/YYYY/MM/DD.parquet
time_series_1w/YYYY/WW.parquet
time_series_1mo/YYYY/MM.parquet
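Mapping a UTC segment timestamp to its storage path might look like the following sketch. The layout strings come from the schema above; the function name is mine, and interpreting `WW` as the ISO week number is an assumption:

```python
from datetime import datetime

# Path layouts from the storage schema above; WW is assumed to be the ISO week.
LAYOUTS = {
    "5s":  "time_series_5s/{y:04d}/{mo:02d}/{d:02d}/{h:02d}/{mi:02d}_{s:02d}.parquet",
    "1m":  "time_series_1m/{y:04d}/{mo:02d}/{d:02d}/{h:02d}/{mi:02d}.parquet",
    "1h":  "time_series_1h/{y:04d}/{mo:02d}/{d:02d}/{h:02d}.parquet",
    "1d":  "time_series_1d/{y:04d}/{mo:02d}/{d:02d}.parquet",
    "1w":  "time_series_1w/{y:04d}/{w:02d}.parquet",
    "1mo": "time_series_1mo/{y:04d}/{mo:02d}.parquet",
}

def segment_path(gran: str, ts: datetime) -> str:
    """Render the storage path for one segment; unused fields are ignored."""
    _, iso_week, _ = ts.isocalendar()
    return LAYOUTS[gran].format(y=ts.year, mo=ts.month, d=ts.day,
                                h=ts.hour, mi=ts.minute, s=ts.second,
                                w=iso_week)

ts = datetime(2024, 10, 21, 13, 30, 25)
# segment_path("1h", ts) yields "time_series_1h/2024/10/21/13.parquet"
```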

Query Optimization Pipeline

  1. Parse user input (timestamp + timezone)
  2. Convert to UTC boundaries
  3. Analyze range duration for primary granularity
  4. Generate mixed granularity segment list
  5. Parallelize data retrieval across granularities
  6. Aggregate and merge results
  7. Convert back to user’s timezone for response
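Steps 5 and 6 of the pipeline can be sketched as below. All names here are illustrative stand-ins, not from this document; `fetch_segment` is a stub where real parquet reads would go:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_segment(path: str) -> int:
    """Stub: a real implementation would read one parquet segment and
    return its event counter total. Here each segment contributes 1."""
    return 1

def run_query(segment_paths: list[str]) -> int:
    # Step 5: retrieve segments in parallel across granularities.
    with ThreadPoolExecutor(max_workers=8) as pool:
        counts = list(pool.map(fetch_segment, segment_paths))
    # Step 6: aggregate and merge the per-segment results.
    return sum(counts)
```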

Monitoring and Metrics

Key Performance Indicators

  • Segment Retrieval Count: Average segments per query
  • Data Transfer Volume: Bytes transferred per time unit queried
  • Query Response Time: End-to-end latency
  • Cache Hit Rate: Percentage of segments served from cache
  • Granularity Distribution: Usage patterns across different granularities

Performance Targets

  • Sub-second response for ranges < 1 day
  • < 5 second response for ranges < 1 week
  • < 30 second response for ranges < 1 month
  • 95% cache hit rate for frequently accessed recent data
  • < 1000 segments for any single query

Conclusion

The Mixed Granularity Optimization approach provides an optimal balance between precision and performance for time series data retrieval. By intelligently selecting granularities based on the specific requirements of each portion of the query range, we achieve:

  • Perfect precision at query boundaries
  • Maximum efficiency for bulk data retrieval
  • Minimal resource utilization across storage, network, and processing layers
  • Scalable architecture that handles queries from seconds to months

This approach enables responsive time series analytics while maintaining cost-effective infrastructure scaling as data volumes grow.
