R - Sum of all pairwise row products as a two-way matrix
I'm looking at high-throughput gene data and doing a type of correlation analysis based on Bayesian statistics. One of the things I need to do is find every pairwise combination of products in the dataset and then find the sum of each resultant row.
So, for example, take the high-throughput dataset matrix dataset:
    (dataset <- structure(list(`condition 1` = c(1L, 3L, 2L, 2L),
                               `condition 2` = c(2L, 1L, 7L, 2L),
                               `condition 3` = c(4L, 1L, 2L, 5L)),
                          .Names = c("condition 1", "condition 2", "condition 3"),
                          class = "data.frame",
                          row.names = c("gene a", "gene b", "gene c", "gene d")))
           condition 1 condition 2 condition 3
    gene a           1           2           4
    gene b           3           1           1
    gene c           2           7           2
    gene d           2           2           5
First I want to multiply every possible pair of rows to get the following matrix, called comb:

                  condition 1 condition 2 condition 3
    gene a gene a           1           4          16
    gene a gene b           3           2           4
    gene a gene c           2          14           8
    gene a gene d           2           4          20
    gene b gene b           9           1           1
    gene b gene c           6           7           2
    gene b gene d           6           2           5
    gene c gene c           4          49           4
    gene c gene d           4          14          10
    gene d gene d           4           4          25
After that I want to find the row sums for each product and get the sums in the form of a matrix (which I will call combsums):

           gene a gene b gene c gene d
    gene a     NA      9     24     26
    gene b      9     NA     15     13
    gene c     24     15     NA     28
    gene d     26     13     28     NA
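When I tried it, the best I could come up with was:

    # All pairs of distinct row indices, one pair per column
    combs <- combn(seq_len(nrow(dataset)), 2)
    # Element-wise product of each pair of rows
    comb <- dataset[combs[1, ], ] * dataset[combs[2, ], ]
    # Label each product row with the two gene names it came from
    rownames(comb) <- apply(combn(rownames(dataset), 2), 2, paste, collapse = " ")
    combsums <- rowSums(comb)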
which gives me the sums as a list, such as below:

                  [,1]
    gene a gene b    9
    gene a gene c   24
    gene a gene d   26
    gene b gene c   15
    gene b gene d   13
    gene c gene d   28
Unfortunately, I want this as a two-way matrix and not a list, so this doesn't quite work. If anyone could suggest a way to get the sums as a matrix, that would be a great help.
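For illustration, here is one way the list of sums could be reshaped into the two-way layout described above. This is only a sketch and not part of the original post: it relies on the observation that combn() emits pairs in the order (1,2), (1,3), (1,4), (2,3), ..., which is also the column-major order of a matrix's lower triangle. The name pair_sums is made up for the example.

```r
# Rebuild the example data (values taken from the question).
dataset <- data.frame(
  `condition 1` = c(1L, 3L, 2L, 2L),
  `condition 2` = c(2L, 1L, 7L, 2L),
  `condition 3` = c(4L, 1L, 2L, 5L),
  row.names = c("gene a", "gene b", "gene c", "gene d"),
  check.names = FALSE
)

# Pairwise sums in combn() order: (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
combs <- combn(seq_len(nrow(dataset)), 2)
pair_sums <- rowSums(dataset[combs[1, ], ] * dataset[combs[2, ], ])

# Start from an all-NA square matrix so the diagonal stays NA.
n <- nrow(dataset)
combsums <- matrix(NA_real_, n, n,
                   dimnames = list(rownames(dataset), rownames(dataset)))

# The lower triangle in column-major order is (2,1), (3,1), (4,1), (3,2), ...
# which lines up with combn() order, so the sums can be assigned directly.
combsums[lower.tri(combsums)] <- pair_sums
# Mirror the lower triangle into the upper triangle.
combsums[upper.tri(combsums)] <- t(combsums)[upper.tri(combsums)]

combsums
```

The ordering assumption is worth re-checking if the pairs are ever generated some other way, since a different order would silently put sums in the wrong cells.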
If speed is an important factor (e.g. if you're processing a huge matrix), you might find an Rcpp implementation helpful. It fills only the upper triangular portion of the matrix.
    library(Rcpp)
    cppFunction(
    "NumericMatrix josilberrcpp(NumericMatrix x) {
      const int nr = x.nrow();
      const int nc = x.ncol();
      NumericMatrix y(nr, nr);  // zero-initialised nr x nr result
      for (int col = 0; col < nc; ++col) {
        for (int i = 0; i < nr; ++i) {
          // j starts at i, so only the diagonal and upper triangle are filled
          for (int j = i; j < nr; ++j) {
            y(i, j) += x(i, col) * x(j, col);
          }
        }
      }
      return y;
    }")
    josilberrcpp(as.matrix(dataset))
    #      [,1] [,2] [,3] [,4]
    # [1,]   21    9   24   26
    # [2,]    0   11   15   13
    # [3,]    0    0   57   28
    # [4,]    0    0    0   33
Benchmarking is provided in the other answer. Note that the benchmarking does not include the compile time of cppFunction, which can be quite significant. Therefore this implementation is useful for large inputs or when you need to use the function many times.
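When the compile time is not worth paying, the same numbers can be cross-checked in base R. The sketch below is an assumption on my part rather than something from the answer: tcrossprod(x) computes x %*% t(x), whose (i, j) entry is the sum over columns of x[i, ] * x[j, ], i.e. the quantity the loop above accumulates. The names full and upper are made up for the example.

```r
# Example data as a plain numeric matrix (filled column by column).
x <- matrix(c(1, 3, 2, 2,
              2, 1, 7, 2,
              4, 1, 2, 5),
            nrow = 4,
            dimnames = list(paste("gene", letters[1:4]),
                            paste("condition", 1:3)))

# Full symmetric matrix of pairwise row-product sums.
full <- tcrossprod(x)

# Zero out the lower triangle to get the same layout the Rcpp function returns.
upper <- full
upper[lower.tri(upper)] <- 0

# Blank the diagonal to get the NA-diagonal layout asked for in the question.
diag(full) <- NA

upper
full
```

Unlike the loop, this computes both triangles, so it does roughly twice the arithmetic; whether that matters depends on the input size.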