sql - Select first row from group each by with count using Big Query -
i've got on 500million rows stored in bigquery represent exact position of device @ time (irregular).
i'm trying find fast , efficient way determine first , last seen position of device.
so far, have working join it's taking on 10 mins complete (unless limit in query). tried dense_rank query can't sort count out (and don't understand tbh).
i have client_id, device_id (which fixed , represents location within building) , timestamp.
first did group client_id , device_id validate should expect. tried joining table using min , max timestamp:
select count(firstset.device_id), firstset.device_id ( select client_id, device_id, created_at [mytable.visitsv3] secret = 'xxx' group each client_id, device_id, created_at order client_id, created_at asc limit 1000 ) firstset inner join ( select client_id, device_id, min(created_at) [mytable.visitsv3] secret = 'xxx' group each client_id, device_id, created_at limit 1000 ) secondset on firstset.device_id = secondset.device_id group firstset.device_id order 1 desc limit 25
i'm new world appreciate advice. said, it's more performance need run queries live.
there's loads of tutorials same concept - i'm hoping can optimise things.
--- edit ---
output format should so:
|count|device_id| ----------------- |10000| 123| | 9000| 345| | 800| 234|
--- in context ---
we're trying achieve number of things our dataset (not in single query of course). including:
- create heatmap of popular entry , exit points.
initially needs totals long-term, we've got understand entry / exit points busy @ times of day. , use correlation functions predictive analysis.
- create paths of clients move through buildings.
once have entry points, need map paths clients take. again, need predict behaviour per day / hour etc.
initially, need popular entry / exit points. reason, grouped clients , did count of devices appeared on.
the data looks this:
client_id,device_id,created_at,start,end,duration,lat,lng f047ca72e,0013ba30,2015-06-22 10:00:32 utc,2015-06-22 09:30:31 utc,2015-06-22 09:30:32 utc,1,xx,yy f40d8632f,00a30e00,2015-06-22 10:00:29 utc,2015-06-22 09:30:26 utc,2015-06-22 09:30:26 utc,0,xx,yy b808aa3e0,00138b20,2015-06-22 10:00:27 utc,2015-06-22 09:30:25 utc,2015-06-22 09:30:25 utc,0,xx,yy a0e532e96,00a33600,2015-06-22 10:00:34 utc,2015-06-22 09:30:24 utc,2015-06-22 09:30:25 utc,1,xx,yy d4f6f8d50,00149150,2015-06-22 10:00:34 utc,2015-06-22 09:30:22 utc,2015-06-22 09:30:22 utc,0,xx,yy 70124eb7e,00a350a0,2015-06-22 10:00:33 utc,2015-06-22 09:30:14 utc,2015-06-22 09:30:14 utc,0,xx,yy 200af2a9e,00149090,2015-06-22 10:00:33 utc,2015-06-22 09:30:13 utc,2015-06-22 09:30:12 utc,0,xx,yy 20ae4884a,0013efc0,2015-06-22 10:00:30 utc,2015-06-22 09:30:13 utc,2015-06-22 09:30:13 utc,0,xx,yy ccd9bb01c,0013efc0,2015-06-22 10:00:30 utc,2015-06-22 09:30:12 utc,2015-06-22 09:30:06 utc,0,xx,yy 8cce1f24e,0004165e,2015-06-22 10:00:12 utc,2015-06-22 09:30:12 utc,2015-06-22 09:30:12 utc,0,xx,yy
the count of clients per device in someways irrelevant normalised heatmap regardless. need them ranked can see popular , least popular.
let me rephrase how understand setup: - devices installed @ fixed locations throughout buildings - clients (people) move through building @ when pass nearby device, event recorded - time when client client_id passes device device_id recorded in timestamp created_at
therefore, first created_at timestamp given client when entered building, , corresponding device_id entry point. find it, can use
select client_id, first_value(device_id) over(partition client_id order created_at asc) entry_device_id
in order find 10 popular entries, can use top count(*), i.e.
select top(entry_device_id, 10), count(*)
putting together:
select top(entry_device_id, 10), count(*) ( select client_id, first_value(device_id) over(partition client_id order created_at asc) entry_device_id [mytable.visitsv3] )
same things exit points, time use last_value window function:
select top(exit_device_id, 10), count(*) ( select client_id, last_value(device_id) over(partition client_id order created_at asc) exit_device_id [mytable.visitsv3] )
Comments
Post a Comment