hadoop - Hive - Hashtag Counting


I am stuck counting hashtags in HiveQL. The problem: each row holds several hashtags in this format:


jurassicworld;movie;night;dino

jurassicworld;book;yourtickets;movie

jurassicworld;movie


I looked at https://cwiki.apache.org/confluence/display/hive/languagemanual+udf, but I could not find a Hive function that lets me choose a delimiter (;) to separate these hashtags and then count them.

My result should look like this:

+---------------+-----------+
| hashtag       | count     |
+---------------+-----------+
| jurassicworld | 300       |
| movie         | 200       |
| night         | 100       |
| dino          | 250       |
| book          | 50        |
| etc...        | 100       |
+---------------+-----------+

I have created the following dummy table, deli:

hive> describe deli;
OK
row1                    string                  None

I used the following query:

select hashtag, count(*) as data from deli lateral view explode(split(row1,'\\;')) t1 as hashtag group by hashtag;

It gives me the following result:

book    1
dino    1
jurassicworld   2
jurassicworld   1
movie   3
night   1
yourtickets     1
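Note that jurassicworld appears twice in this output, which suggests the two keys differ in some way that is not visible, such as trailing whitespace or letter case. The post does not say which, so this is an assumption. A minimal sketch of one way to normalize each tag before grouping is below. The aliases tag, x and cnt are illustrative names, not from the original query.

```sql
-- Sketch only: assumes the duplicate key comes from stray whitespace or case.
select hashtag, count(*) as cnt
from (
  -- split each row on ';', emit one row per tag, then normalize the tag
  select trim(lower(tag)) as hashtag
  from deli
  lateral view explode(split(row1, '\\;')) t1 as tag
) x
where hashtag <> ''   -- drop empty tags left by a leading or trailing ';'
group by hashtag;
```

If the duplicate row persists after this, the tags probably differ by a non-printing character that trim does not remove, and inspecting length(hashtag) per key may help narrow it down.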
