R - Calculate mean for multiple columns in data.frame
I was just wondering whether it is possible to calculate the means of multiple columns using the mean function.
For example,
mean(iris[,1])
works, but not
mean(iris[,1:4])
I tried:
mean(iris[,c(1:4)])
but got this warning message:
Warning message: In mean.default(iris[, 1:4]) : argument is not numeric or logical: returning NA
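The warning comes from the fact that a data frame is a list of columns rather than a numeric vector. mean() has no data frame method, so the call falls through to mean.default(), which returns NA for anything that is not numeric or logical. A small base-R sketch of this (the variable names are just for illustration):

```r
# A data frame is a list of columns, not a numeric vector
num_cols <- iris[, 1:4]
is.list(num_cols)          # TRUE

# mean() dispatches to mean.default(), which returns NA (with a warning)
# for anything that is not a numeric or logical vector
res <- suppressWarnings(mean(num_cols))
is.na(res)                 # TRUE

# Flattening with unlist() runs without a warning, but it gives the grand
# mean of all 600 values, not one mean per column
grand <- mean(unlist(num_cols))
grand
```

So mean() on its own can only ever return a single number; getting one value per column needs something that loops over the columns.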
I know I can use lapply(iris[,1:4], mean) or sapply(iris[,1:4], mean) instead.
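For reference, a minimal sketch of how those alternatives differ in what they return (vapply() is not mentioned in the question; it is included here only as a type-checked variant of sapply()):

```r
cols <- iris[, 1:4]

# lapply() always returns a list with one element per column
as_list <- lapply(cols, mean)

# sapply() simplifies that list to a named numeric vector
as_vec <- sapply(cols, mean)

# vapply() gives the same vector, but also checks that every result is a
# single double, so the return type does not depend on the input
as_vec2 <- vapply(cols, mean, numeric(1))

str(as_list)
as_vec
```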
Try colMeans. Note that all the columns must be numeric; for larger datasets you can add a test that keeps only the numeric columns:
colMeans(iris[sapply(iris, is.numeric)])
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width
#     5.843333     3.057333     3.758000     1.199333
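If the numeric test is needed often, it could be wrapped in a small function. numeric_col_means() below is a hypothetical helper, not part of base R. It uses vapply() instead of sapply() so the column index is always a logical vector, even for a data frame with no columns, and it passes na.rm through to colMeans():

```r
# Hypothetical helper: column means of only the numeric columns of df
numeric_col_means <- function(df, na.rm = FALSE) {
  # One TRUE/FALSE per column; vapply() guarantees a logical vector
  is_num <- vapply(df, is.numeric, logical(1))
  colMeans(df[is_num], na.rm = na.rm)
}

numeric_col_means(iris)    # Species (a factor) is skipped

# With missing values, na.rm = TRUE drops them column by column
with_na <- data.frame(x = c(1, NA, 3), y = c(2, 4, 6), z = c("a", "b", "c"))
numeric_col_means(with_na)                 # x is NA
numeric_col_means(with_na, na.rm = TRUE)   # x = 2, y = 4
```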
Benchmark
It seems to take longer with dplyr and data.table. Perhaps someone can replicate these findings for veracity.
library(microbenchmark)
library(dplyr)
library(data.table)

microbenchmark(
  plafort = colMeans(big.df[sapply(big.df, is.numeric)]),
  carlos  = colMeans(Filter(is.numeric, big.df)),
  # unlike plafort/carlos, the next two also apply mean() to the character column V1
  cdtable = big.dt[, lapply(.SD, mean)],
  # summarise_each() and funs() are deprecated in current dplyr
  cdplyr  = big.df %>% summarise_each(funs(mean))
)
# Unit: milliseconds
#     expr       min        lq     mean    median       uq       max
#  plafort  9.862934 10.506778 12.07027 10.699616 11.16404  31.23927
#   carlos  9.215143  9.557987 11.30063  9.843197 10.21821  65.21379
#  cdtable 57.157250 64.866996 78.72452 67.633433 87.52451 264.60453
#   cdplyr 62.933293 67.853312 81.77382 71.296555 91.44994 182.36578
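Data:
library(data.table)  # for as.data.table()
# 1000 x 1000 numeric block plus one character column
m <- matrix(1:1e6, 1000)
m2 <- matrix(rep('a', 1000), ncol = 1)
big.df <- as.data.frame(cbind(m2, m), stringsAsFactors = FALSE)
big.df[, -1] <- lapply(big.df[, -1], as.numeric)
big.dt <- as.data.table(big.df)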
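For anyone replicating this, a base-R-only sketch of a correctness check before timing anything. It builds a deliberately small, hypothetical small.df the same way as big.df (the 10 x 10 size is arbitrary) and checks that the two fastest expressions skip the character column and agree:

```r
# Scaled-down version of big.df: a 10 x 10 numeric block plus one
# character column (size chosen only to make the check quick)
m <- matrix(1:100, 10)
m2 <- matrix(rep("a", 10), ncol = 1)
small.df <- as.data.frame(cbind(m2, m), stringsAsFactors = FALSE)
small.df[, -1] <- lapply(small.df[, -1], as.numeric)

# The two base-R expressions from the benchmark
plafort <- colMeans(small.df[sapply(small.df, is.numeric)])
carlos  <- colMeans(Filter(is.numeric, small.df))

# Both should drop the character column V1 and return identical means
identical(plafort, carlos)
plafort
```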