hadoop - Hive - Hashtag Counting


I'm stuck counting hashtags in HiveQL. The problem: I have hashtags stored in this format, several per row:


jurassicworld;movie;night;dino

jurassicworld;book;yourtickets;movie

jurassicworld;movie


I looked at https://cwiki.apache.org/confluence/display/hive/languagemanual+udf, but I couldn't find a function that lets me choose a delimiter (;) to separate these hashtags and then count them.

My result should look like this:

+---------------+-----------+
| hashtag       | count     |
+---------------+-----------+
| jurassicworld | 300       |
| movie         | 200       |
| night         | 100       |
| dino          | 250       |
| book          | 50        |
| etc...        | 100       |
+---------------+-----------+

I have created the following dummy table, deli:

hive> describe deli;
OK
row1                    string                  None

I used the following query:

select hashtag, count(*)
from deli
lateral view explode(split(row1, '\\;')) t1 as hashtag
group by hashtag;
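To check what the query should return for the dummy data, here is a minimal Python sketch of the same pipeline, assuming `deli` holds exactly the three sample rows shown earlier. `str.split(";")` plays the role of `split(row1, '\\;')` (Hive's `split` takes a Java regex, Python's is a literal split, which is equivalent for `;`), looping over the pieces stands in for `explode`, and `collections.Counter` does the `group by hashtag` / `count(*)`. The function name `count_hashtags` is hypothetical.

```python
from collections import Counter

# The three sample rows, assumed to be the contents of deli.row1.
rows = [
    "jurassicworld;movie;night;dino",
    "jurassicworld;book;yourtickets;movie",
    "jurassicworld;movie",
]

def count_hashtags(rows, delimiter=";"):
    """Mimic split(row1, ';') + explode + GROUP BY hashtag + count(*)."""
    counts = Counter()
    for row in rows:
        # split() turns one row into a list; iterating it is the "explode" step
        for tag in row.split(delimiter):
            counts[tag] += 1  # count(*) per distinct hashtag value
    return counts

for tag, n in sorted(count_hashtags(rows).items()):
    print(tag, n)
```

With clean input, every hashtag should appear exactly once in the output, with jurassicworld and movie both at 3.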

It gives me the following result:

book    1
dino    1
jurassicworld   2
jurassicworld   1
movie   3
night   1
yourtickets     1
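jurassicworld showing up as two groups means the two values are not byte-identical. One plausible cause, and this is an assumption since the raw data isn't shown, is an invisible character attached to one of them, such as a leading space or a UTF-8 byte-order mark (U+FEFF) at the very start of the file, which would only affect the first token of the first row. The Python sketch below reproduces that hypothetical case and shows that normalizing each tag before counting merges the groups. `normalize` and `count_hashtags` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical raw rows: the first starts with an invisible BOM (U+FEFF),
# one assumed way "jurassicworld" could end up as two distinct group keys.
raw_rows = [
    "\ufeffjurassicworld;movie;night;dino",
    "jurassicworld;book;yourtickets;movie",
    "jurassicworld;movie",
]

def normalize(tag):
    """Drop any BOM characters and surrounding whitespace from one hashtag."""
    return tag.replace("\ufeff", "").strip()

def count_hashtags(rows, clean=True):
    """Split each row on ';' and count tags, optionally normalizing them first."""
    counts = Counter()
    for row in rows:
        for tag in row.split(";"):
            counts[normalize(tag) if clean else tag] += 1
    return counts

raw = count_hashtags(raw_rows, clean=False)
cleaned = count_hashtags(raw_rows, clean=True)
print(raw["jurassicworld"], raw["\ufeffjurassicworld"])  # 2 1
print(cleaned["jurassicworld"])                           # 3
```

In Hive, the analogous step would be applying the same normalization to `hashtag` in both the `select` list and the `group by` (for example `trim()`, which removes leading and trailing spaces, or `regexp_replace()` for other characters). Selecting `length(hashtag)` for the jurassicworld rows would show whether hidden characters are actually present.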
