hadoop - Hive - Hashtag Counting
I am stuck counting hashtags in HiveQL. The problem: each row holds several hashtags in this format:
jurassicworld;movie;night;dino
jurassicworld;book;yourtickets;movie
jurassicworld;movie
I looked at https://cwiki.apache.org/confluence/display/hive/languagemanual+udf for Hive, but I could not find a function that lets me choose a delimiter (;) to separate these hashtags and then count them.
My result should look like this:
+---------------+-----------+
| hashtag       | count     |
+---------------+-----------+
| jurassicworld | 300       |
| movie         | 200       |
| night         | 100       |
| dino          | 250       |
| book          | 50        |
| etc...        | 100       |
+---------------+-----------+
I created the following dummy table, deli:
hive> describe deli;
OK
row1    string    None
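I used the following query, which splits each row on ';' and turns every resulting hashtag into its own row before grouping:
select hashtag, count(*) as data from deli lateral view explode(split(row1,'\\;')) t1 as hashtag group by hashtag;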
It gives me the following result:
book 1
dino 1
jurassicworld 2
jurassicworld 1
movie 3
night 1
yourtickets 1
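To sanity-check counts like these outside Hive, here is a minimal Python sketch of what the query does: split each row on ';', flatten the pieces (the explode step), and count per hashtag. The function name count_hashtags and the strip() call are my own assumptions, not part of the query. I added the whitespace stripping because a hashtag that shows up twice in the output may come from values that differ only by stray whitespace, which I cannot confirm from the data shown here.

```python
from collections import Counter


def count_hashtags(rows, delimiter=";"):
    """Count hashtags across rows, mirroring split + explode + group by."""
    counts = Counter()
    for row in rows:
        # split() plays the role of Hive's split(); the inner loop is the explode step
        for tag in row.split(delimiter):
            tag = tag.strip()  # assumption: surrounding whitespace is not significant
            if tag:  # skip empty pieces such as those produced by ";;"
                counts[tag] += 1
    return counts


# The three sample rows from the question
rows = [
    "jurassicworld;movie;night;dino",
    "jurassicworld;book;yourtickets;movie",
    "jurassicworld;movie",
]

for tag, n in sorted(count_hashtags(rows).items()):
    print(tag, n)
```

On the three sample rows this counts jurassicworld and movie three times each and every other hashtag once.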