R data.table: subgroup weighted percent of group
I have a data.table like:
    library(data.table)
    widgets <- data.table(serial_no = 1:100,
                          color = rep_len(c("red","green","blue","black"), length.out = 100),
                          style = rep_len(c("round","pointy","flat"), length.out = 100),
                          weight = rep_len(1:5, length.out = 100))
Although I'm not sure this is the most data.table way, I can calculate the subgroup frequency within a group using table and length in a single step, for example to answer the question "What percent of red widgets are round?"
Edit: this code does not provide the right answer.
    # Example A
    widgets[, list(style = unique(style),
                   style_pct_of_color_by_count =
                     as.numeric(table(style) / length(style))), by = color]
    #    color  style style_pct_of_color_by_count
    # 1:   red  round                        0.32
    # 2:   red pointy                        0.32
    # 3:   red   flat                        0.36
    # 4: green pointy                        0.32
    # ...
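A small sketch of why Example A mislabels its rows, using a made-up vector (`styles_in_group` is a hypothetical name, not from the question): `unique()` returns values in order of first appearance, while `table()` sorts a character vector's levels alphabetically, so pairing the two position by position can attach a proportion to the wrong style.

```r
# Hypothetical styles for one color group, in row order
styles_in_group <- c("round", "pointy", "flat", "round")

# unique() keeps the order of first appearance
first_seen <- unique(styles_in_group)

# table() on a character vector orders its categories alphabetically
tabulated <- names(table(styles_in_group))

print(first_seen)
print(tabulated)
```

Because the two orderings differ, the counts from `table()` do not line up with the labels from `unique()`.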
But I can't use that approach to answer questions like "By weight, what percent of red widgets are round?" I can only come up with a two-step approach:
    # Example B
    widgets[, list(cs_weight = sum(weight)), by = list(color, style)][
      , list(style, style_pct_of_color_by_weight = cs_weight / sum(cs_weight)), by = color]
    #    color  style style_pct_of_color_by_weight
    # 1:   red  round                    0.3466667
    # 2:   red pointy                    0.3466667
    # 3:   red   flat                    0.3066667
    # 4: green pointy                    0.3333333
    # ...
I'm looking for a single-step approach to B, and to A if it is improvable, with an explanation that deepens my understanding of data.table syntax for by-group operations. Please note that this question is different from "Weighted sum of variables by groups with data.table" because mine involves subgroups and avoiding multiple steps. TYVM.
This can be done in a single step:
    # A
    widgets[, {
      totwt = .N                                   # rows in this color
      .SD[, .(frac = .N / totwt), by = style]      # share of rows per style
    }, by = color]
    #     color  style frac
    #  1:   red  round 0.36
    #  2:   red pointy 0.32
    #  3:   red   flat 0.32
    #  4: green pointy 0.36
    #  5: green   flat 0.32
    #  6: green  round 0.32
    #  7:  blue   flat 0.36
    #  8:  blue  round 0.32
    #  9:  blue pointy 0.32
    # 10: black  round 0.36
    # 11: black pointy 0.32
    # 12: black   flat 0.32

    # B
    widgets[, {
      totwt = sum(weight)                                # total weight in this color
      .SD[, .(frac = sum(weight) / totwt), by = style]   # share of weight per style
    }, by = color]
    #     color  style      frac
    #  1:   red  round 0.3466667
    #  2:   red pointy 0.3466667
    #  3:   red   flat 0.3066667
    #  4: green pointy 0.3333333
    #  5: green   flat 0.3200000
    #  6: green  round 0.3466667
    #  7:  blue   flat 0.3866667
    #  8:  blue  round 0.2933333
    #  9:  blue pointy 0.3200000
    # 10: black  round 0.3733333
    # 11: black pointy 0.3333333
    # 12: black   flat 0.2933333
How it works: construct the denominator for the top-level group (color) before going to the finer group (color with style) to tabulate.
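To see the denominator-first idea in isolation, here is a minimal sketch on a tiny made-up table. The `toy` data and its values are hypothetical (not from the question); only the pattern of computing the total inside the outer `by` and then grouping `.SD` again is taken from the answer.

```r
library(data.table)

# Hypothetical toy data: two colors, two styles, small weights
toy <- data.table(
  color  = c("red", "red", "red", "blue", "blue"),
  style  = c("round", "round", "flat", "round", "flat"),
  weight = c(1, 2, 1, 1, 3)
)

res <- toy[, {
  totwt <- sum(weight)                               # denominator: one total per color
  .SD[, .(frac = sum(weight) / totwt), by = style]   # numerator: one sum per style within that color
}, by = color]

print(res)
```

Within each color, the `frac` values should add up to 1, which is a quick sanity check on any result of this shape.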
Alternatives. If the styles repeat within each color and this is just for display purposes, try a table:
    # A
    widgets[, prop.table(table(color, style), 1)]
    #        style
    # color   flat pointy round
    #   black 0.32   0.32  0.36
    #   blue  0.36   0.32  0.32
    #   green 0.32   0.36  0.32
    #   red   0.32   0.32  0.36

    # B
    widgets[, rep(1L, sum(weight)), by = .(color, style)][
      , prop.table(table(color, style), 1)]
    #        style
    # color        flat    pointy     round
    #   black 0.2933333 0.3333333 0.3733333
    #   blue  0.3866667 0.3200000 0.2933333
    #   green 0.3200000 0.3333333 0.3466667
    #   red   0.3066667 0.3466667 0.3466667
For B, this expands the data so that there is one observation for each unit of weight. With large data, such an expansion would be a bad idea (since it costs so much memory). Also, weight has to be an integer; otherwise, its sum will be silently truncated to an integer (e.g., try rep(1,2.5) # [1] 1 1).
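One possible way around both caveats, offered as a sketch rather than as part of the original answer, is base R's `xtabs()`, which sums a left-hand-side variable within each cell instead of counting rows. This swaps in a different technique from the `rep()` expansion: no rows are duplicated, and non-integer weights are summed as they are. The name `wt_tab` is an assumption introduced here for illustration.

```r
library(data.table)

# Same data as in the question
widgets <- data.table(
  serial_no = 1:100,
  color  = rep_len(c("red", "green", "blue", "black"), length.out = 100),
  style  = rep_len(c("round", "pointy", "flat"), length.out = 100),
  weight = rep_len(1:5, length.out = 100)
)

# Sum weight within each color/style cell, then convert each row to proportions
wt_tab <- prop.table(xtabs(weight ~ color + style, data = widgets), 1)

print(wt_tab)
```

The rows of `wt_tab` are colors and each row sums to 1, so it should be readable in the same way as the table for B.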