sql - Select first row from group each by with count using Big Query -


i've got on 500million rows stored in bigquery represent exact position of device @ time (irregular).

i'm trying find fast , efficient way determine first , last seen position of device.

so far, have working join it's taking on 10 mins complete (unless limit in query). tried dense_rank query can't sort count out (and don't understand tbh).

i have client_id, device_id (which fixed , represents location within building) , timestamp.

first did group client_id , device_id validate should expect. tried joining table using min , max timestamp:

select count(firstset.device_id), firstset.device_id (   select client_id, device_id, created_at   [mytable.visitsv3]   secret = 'xxx'   group each client_id, device_id, created_at   order client_id, created_at asc   limit 1000 ) firstset inner join (   select client_id, device_id, min(created_at)   [mytable.visitsv3]   secret = 'xxx'   group each client_id, device_id, created_at   limit 1000 ) secondset on firstset.device_id = secondset.device_id group firstset.device_id order 1 desc limit 25 

i'm new world appreciate advice. said, it's more performance need run queries live.

there's loads of tutorials same concept - i'm hoping can optimise things.

--- edit ---

output format should so:

|count|device_id| ----------------- |10000|      123| | 9000|      345| |  800|      234| 

--- in context ---

we're trying achieve number of things our dataset (not in single query of course). including:

  • create heatmap of popular entry , exit points.

initially needs totals long-term, we've got understand entry / exit points busy @ times of day. , use correlation functions predictive analysis.

  • create paths of clients move through buildings.

once have entry points, need map paths clients take. again, need predict behaviour per day / hour etc.

initially, need popular entry / exit points. reason, grouped clients , did count of devices appeared on.

the data looks this:

client_id,device_id,created_at,start,end,duration,lat,lng f047ca72e,0013ba30,2015-06-22 10:00:32 utc,2015-06-22 09:30:31 utc,2015-06-22 09:30:32 utc,1,xx,yy f40d8632f,00a30e00,2015-06-22 10:00:29 utc,2015-06-22 09:30:26 utc,2015-06-22 09:30:26 utc,0,xx,yy b808aa3e0,00138b20,2015-06-22 10:00:27 utc,2015-06-22 09:30:25 utc,2015-06-22 09:30:25 utc,0,xx,yy a0e532e96,00a33600,2015-06-22 10:00:34 utc,2015-06-22 09:30:24 utc,2015-06-22 09:30:25 utc,1,xx,yy d4f6f8d50,00149150,2015-06-22 10:00:34 utc,2015-06-22 09:30:22 utc,2015-06-22 09:30:22 utc,0,xx,yy 70124eb7e,00a350a0,2015-06-22 10:00:33 utc,2015-06-22 09:30:14 utc,2015-06-22 09:30:14 utc,0,xx,yy 200af2a9e,00149090,2015-06-22 10:00:33 utc,2015-06-22 09:30:13 utc,2015-06-22 09:30:12 utc,0,xx,yy 20ae4884a,0013efc0,2015-06-22 10:00:30 utc,2015-06-22 09:30:13 utc,2015-06-22 09:30:13 utc,0,xx,yy ccd9bb01c,0013efc0,2015-06-22 10:00:30 utc,2015-06-22 09:30:12 utc,2015-06-22 09:30:06 utc,0,xx,yy 8cce1f24e,0004165e,2015-06-22 10:00:12 utc,2015-06-22 09:30:12 utc,2015-06-22 09:30:12 utc,0,xx,yy 

the count of clients per device in someways irrelevant normalised heatmap regardless. need them ranked can see popular , least popular.

let me rephrase how understand setup: - devices installed @ fixed locations throughout buildings - clients (people) move through building @ when pass nearby device, event recorded - time when client client_id passes device device_id recorded in timestamp created_at

therefore, first created_at timestamp given client when entered building, , corresponding device_id entry point. find it, can use

select    client_id,    first_value(device_id)       over(partition client_id order created_at asc)    entry_device_id 

in order find 10 popular entries, can use top count(*), i.e.

select top(entry_device_id, 10), count(*) 

putting together:

select top(entry_device_id, 10), count(*) (   select      client_id,      first_value(device_id)         over(partition client_id order created_at asc)      entry_device_id   [mytable.visitsv3] ) 

same things exit points, time use last_value window function:

select top(exit_device_id, 10), count(*) (   select      client_id,      last_value(device_id)         over(partition client_id order created_at asc)      exit_device_id   [mytable.visitsv3] ) 

Comments

Popular posts from this blog

powershell Start-Process exit code -1073741502 when used with Credential from a windows service environment -

twig - Using Twigbridge in a Laravel 5.1 Package -

c# - LINQ join Entities from HashSet's, Join vs Dictionary vs HashSet performance -