R data.table: subgroup weighted percent of group


I have a data.table like this:

    library(data.table)
    widgets <- data.table(serial_no = 1:100,
                          color  = rep_len(c("red", "green", "blue", "black"), length.out = 100),
                          style  = rep_len(c("round", "pointy", "flat"), length.out = 100),
                          weight = rep_len(1:5, length.out = 100))

Although I'm not sure this is the data.table way, I can calculate subgroup frequency by group using table and length in a single step, for example to answer the question "What percent of red widgets are round?"

Edit: this code does not provide the right answer.

    # example A
    widgets[, list(style = unique(style),
                   style_pct_of_color_by_count =
                     as.numeric(table(style)/length(style))), by = color]

    #    color  style style_pct_of_color_by_count
    # 1:   red  round                        0.32
    # 2:   red pointy                        0.32
    # 3:   red   flat                        0.36
    # 4: green pointy                        0.32
    # ...
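The mismatch in example A appears to come from ordering: table(style) reports its counts in alphabetical order of the style names, while unique(style) returns the styles in order of first appearance, so counts end up paired with the wrong labels. A minimal sketch of one way to keep them aligned is to take the labels from the table itself (the names `tab`, `fixed_a` and `pct` are illustrative and not from the original post):

```r
library(data.table)

widgets <- data.table(
  serial_no = 1:100,
  color  = rep_len(c("red", "green", "blue", "black"), length.out = 100),
  style  = rep_len(c("round", "pointy", "flat"), length.out = 100),
  weight = rep_len(1:5, length.out = 100)
)

fixed_a <- widgets[, {
  tab <- table(style)                 # counts, ordered alphabetically by style
  list(style = names(tab),            # labels taken from the same table
       pct   = as.numeric(tab) / .N)  # share of this color's rows
}, by = color]

fixed_a[color == "red"]
```

With this data, 9 of the 25 red widgets are round, so the round row for red comes out as 0.36 rather than 0.32.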

But I can't use that approach to answer questions like "By weight, what percent of red widgets are round?" I can only come up with a two-step approach:

    # example B
    widgets[, list(cs_weight = sum(weight)), by = list(color, style)][
      , list(style, style_pct_of_color_by_weight = cs_weight/sum(cs_weight)), by = color]

    #    color  style style_pct_of_color_by_weight
    # 1:   red  round                    0.3466667
    # 2:   red pointy                    0.3466667
    # 3:   red   flat                    0.3066667
    # 4: green pointy                    0.3333333
    # ...

I'm looking for a single-step approach to B, and to A if it is improvable, in an explanation that deepens my understanding of data.table syntax for by-group operations. Please note that this question is different from "Weighted sum of variables by groups with data.table" because mine involves subgroups and avoiding multiple steps. TYVM.

This is a single step:

    # A
    widgets[, {
        totwt = .N                                 # rows in this color
        .SD[, .(frac = .N/totwt), by = style]      # share of rows per style
    }, by = color]

    #     color  style frac
    #  1:   red  round 0.36
    #  2:   red pointy 0.32
    #  3:   red   flat 0.32
    #  4: green pointy 0.36
    #  5: green   flat 0.32
    #  6: green  round 0.32
    #  7:  blue   flat 0.36
    #  8:  blue  round 0.32
    #  9:  blue pointy 0.32
    # 10: black  round 0.36
    # 11: black pointy 0.32
    # 12: black   flat 0.32

    # B
    widgets[, {
        totwt = sum(weight)                                # total weight of this color
        .SD[, .(frac = sum(weight)/totwt), by = style]     # share of weight per style
    }, by = color]

    #     color  style      frac
    #  1:   red  round 0.3466667
    #  2:   red pointy 0.3466667
    #  3:   red   flat 0.3066667
    #  4: green pointy 0.3333333
    #  5: green   flat 0.3200000
    #  6: green  round 0.3466667
    #  7:  blue   flat 0.3866667
    #  8:  blue  round 0.2933333
    #  9:  blue pointy 0.3200000
    # 10: black  round 0.3733333
    # 11: black pointy 0.3333333
    # 12: black   flat 0.2933333

How it works: construct the denominator for the top-level group (color) before going to the finer group (color with style) to tabulate.
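As a quick sanity check on that logic, the fractions within each color should add up to 1, because the denominator is that color's total weight. A small sketch that stores the result of approach B and sums it back up (the names `res` and `check` are illustrative):

```r
library(data.table)

widgets <- data.table(
  serial_no = 1:100,
  color  = rep_len(c("red", "green", "blue", "black"), length.out = 100),
  style  = rep_len(c("round", "pointy", "flat"), length.out = 100),
  weight = rep_len(1:5, length.out = 100)
)

res <- widgets[, {
  totwt <- sum(weight)                              # denominator: total weight of this color
  .SD[, .(frac = sum(weight) / totwt), by = style]  # numerator: weight of each style in this color
}, by = color]

# Each color's fractions should sum to 1
check <- res[, .(total = sum(frac)), by = color]
check
```

Here `.SD` is the subset of rows for the current color, so the inner `by = style` only ever tabulates within that color, while `totwt` is computed once per color before the inner grouping runs.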


Alternatives. If the styles repeat within each color and this is only for display purposes, try a table:

    # A
    widgets[,
      prop.table(table(color, style), 1)
    ]
    #        style
    # color   flat pointy round
    #   black 0.32   0.32  0.36
    #   blue  0.36   0.32  0.32
    #   green 0.32   0.36  0.32
    #   red   0.32   0.32  0.36

    # B
    widgets[, rep(1L, sum(weight)), by = .(color, style)][,
      prop.table(table(color, style), 1)
    ]

    #        style
    # color        flat    pointy     round
    #   black 0.2933333 0.3333333 0.3733333
    #   blue  0.3866667 0.3200000 0.2933333
    #   green 0.3200000 0.3333333 0.3466667
    #   red   0.3066667 0.3466667 0.3466667

For B, this expands the data so that there is one observation for each unit of weight. With large data, such an expansion would be a bad idea (since it costs so much memory). Also, weight has to be an integer; otherwise, its sum will be silently truncated to an integer (e.g., try rep(1, 2.5) # [1] 1 1).
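If the expansion is a concern, one possible alternative (not part of the original answer) is base R's xtabs, which sums the left-hand-side variable within each cell, so no rows are replicated and non-integer weights should work as well. A minimal sketch, with `wt_tab` as an illustrative name:

```r
library(data.table)

widgets <- data.table(
  serial_no = 1:100,
  color  = rep_len(c("red", "green", "blue", "black"), length.out = 100),
  style  = rep_len(c("round", "pointy", "flat"), length.out = 100),
  weight = rep_len(1:5, length.out = 100)
)

# Total weight in each color/style cell, without replicating rows
wt_tab <- xtabs(weight ~ color + style, data = widgets)

# Row-wise shares: each style's weight as a fraction of its color's weight
prop.table(wt_tab, 1)
```

For this data the red/round cell works out to 26/75, about 0.3466667, the same figure the expanded-table version reports.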

