amazon web services - DynamoDB Schema Design


I'm thinking of using Amazon AWS DynamoDB for a project I'm working on. Here's the gist of the situation:

I'm going to be gathering a ton of energy usage data from hundreds of machines (energy readings taken around every 5 minutes). Each machine is in a zone, and each zone is in a network.

I'm then going to roll these individual readings up by zone and network, by hour and by day.

My thinking is that by doing this, I'll be able to perform one query against the network_day table and quickly return the energy usage for any given day.

Here's my schema at this point:

table_name      | hash_key   | range_key  | attributes
----------------|------------|------------|-----------
machine_reading | machine.id | epoch      | energy_use
machine_hour    | machine.id | epoch_hour | energy_use
machine_day     | machine.id | epoch_day  | energy_use
zone_hour       | machine.id | epoch_hour | energy_use
zone_day        | machine.id | epoch_day  | energy_use
network_hour    | machine.id | epoch_hour | energy_use
network_day     | machine.id | epoch_day  | energy_use
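
For reference, here is a minimal sketch of how one of these tables could be defined with boto3 (the AWS SDK for Python); the snake_case attribute names are illustrative stand-ins for the dotted names above, and the capacity numbers are placeholders:

import boto3

dynamodb = boto3.resource("dynamodb", region_name="us-east-1")

# machine_hour: hash (partition) key machine.id, range (sort) key epoch_hour
table = dynamodb.create_table(
    TableName="machine_hour",
    KeySchema=[
        {"AttributeName": "machine_id", "KeyType": "HASH"},
        {"AttributeName": "epoch_hour", "KeyType": "RANGE"},
    ],
    AttributeDefinitions=[
        {"AttributeName": "machine_id", "AttributeType": "S"},
        {"AttributeName": "epoch_hour", "AttributeType": "N"},
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)
table.wait_until_exists()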

I'm not seeing great performance in my tests when I run the rollup cronjob, so I'm wondering if someone with more experience could comment on my key design? The only experience I have so far is with RDS, and I'm trying to learn DynamoDB.

Edit:

Here's the basic structure of the cronjob I'm using for the rollups:

foreach network
  foreach zone
    foreach machine
      add_unprocessed_readings_to_dynamo()
      roll_up_fixture_hours_to_dynamo()
      roll_up_fixture_days_to_dynamo()
    end
    roll_up_zone_hours_to_dynamo()
    roll_up_zone_days_to_dynamo()
  end
  roll_up_network_hours_to_dynamo()
  roll_up_network_days_to_dynamo()
end

I use the previous function's values in Dynamo for the next rollup, i.e.

  • I use zone hours to roll up zone days
  • I use zone days to roll up network days

This is (I think) causing a lot of unnecessary reads/writes. Right now I can manage with low throughputs because my sample size is only 100 readings. My concerns begin when this scales to what is expected to be around 9,000,000 readings.
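
To make the chaining concrete, here is a hedged sketch of one zone-day rollup step, assuming boto3 and illustrative attribute names (zone_id, epoch_hour, epoch_day, energy_use); it reads the hourly rows for one day and writes a single daily row:

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
zone_hour = dynamodb.Table("zone_hour")
zone_day = dynamodb.Table("zone_day")

def roll_up_zone_day(zone_id, epoch_day):
    # Fetch the hourly rows for this zone and day in one query; a day
    # is at most 24 items, so a single page is enough.
    resp = zone_hour.query(
        KeyConditionExpression=Key("zone_id").eq(zone_id)
        & Key("epoch_hour").between(epoch_day, epoch_day + 86399)
    )
    total = sum(item["energy_use"] for item in resp["Items"])
    # Write one daily row instead of re-reading the raw readings.
    zone_day.put_item(
        Item={"zone_id": zone_id, "epoch_day": epoch_day, "energy_use": total}
    )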

First things first: time series data in DynamoDB is hard to get right, but not impossible.

DynamoDB uses the hash key to shard your data, so using machine.id means you are going to have hot keys. However, this is a function of the amount of data and what you expect your IOPS to be. DynamoDB doesn't create a second shard until you push past 1000 read or write IOPS. If you expect to stay below that level you may be fine, but if you expect to scale beyond it you may want to redesign, specifically to include a date component in the hash key to break things up.
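
As an illustration of folding a date component into the hash key to spread writes across partitions (the "machine_id#YYYY-MM-DD" format here is an assumption, not something DynamoDB prescribes):

from datetime import datetime, timezone

def sharded_hash_key(machine_id: str, epoch: int) -> str:
    # Each machine-day lands on its own hash key, so one machine's
    # stream of writes no longer concentrates on a single partition.
    day = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{machine_id}#{day}"

print(sharded_hash_key("machine-42", 1357000000))  # machine-42#2013-01-01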

Regarding performance, are you hitting your provisioned read or write throughput level? If so, raise them to some crazy high level and re-run your test until the bottleneck becomes your code. This could be as simple as setting the throughput levels appropriately.
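
With boto3 that bump is a single call (the capacity numbers below are placeholders, and the table spends a moment in UPDATING state while the change applies):

import boto3

client = boto3.client("dynamodb")
client.update_table(
    TableName="machine_reading",
    ProvisionedThroughput={
        "ReadCapacityUnits": 1000,
        "WriteCapacityUnits": 1000,
    },
)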

However, regarding your actual code: without seeing the actual DynamoDB queries you are performing, one possible issue would be reading too much data. Make sure you are not reading more data than you need from DynamoDB. Since your range key is a date field, use a range conditional (not a filter) to reduce the number of records you need to read.
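
A sketch of the contrast, again assuming boto3 and the illustrative attribute names used above; every item DynamoDB reads consumes capacity, whether or not your code keeps it:

import boto3
from boto3.dynamodb.conditions import Key

machine_hour = boto3.resource("dynamodb").Table("machine_hour")
day_start, day_end = 1356998400, 1357084799  # one UTC day, inclusive

# Wasteful: read every row under the hash key, then discard most of
# them in application code.
resp = machine_hour.query(
    KeyConditionExpression=Key("machine_id").eq("machine-42")
)
items = [i for i in resp["Items"] if day_start <= i["epoch_hour"] <= day_end]

# Better: push the date range into the key condition so DynamoDB only
# reads (and bills) the items actually needed.
resp = machine_hour.query(
    KeyConditionExpression=Key("machine_id").eq("machine-42")
    & Key("epoch_hour").between(day_start, day_end)
)
items = resp["Items"]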

Also make sure your code executes the rollup using multiple threads. If you are not able to saturate the DynamoDB provisioned capacity, the issue may not be DynamoDB, it may be your code. By performing the rollups using multiple threads in parallel you should be able to see some performance gains.
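
One way to parallelize, sketched under the assumption of a per-machine rollup helper (roll_up_machine here is a hypothetical stand-in for the rollup work above, and the pool size is a tunable placeholder):

from concurrent.futures import ThreadPoolExecutor, as_completed

def roll_up_machine(machine_id: str) -> None:
    # Placeholder for the per-machine work: add unprocessed readings,
    # roll up hours, roll up days.
    ...

def roll_up_all_machines(machine_ids):
    # Machines are independent, so their rollups can run concurrently;
    # tune max_workers against the tables' provisioned throughput.
    with ThreadPoolExecutor(max_workers=16) as pool:
        futures = {pool.submit(roll_up_machine, m): m for m in machine_ids}
        for future in as_completed(futures):
            future.result()  # surface any per-machine failure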

