R function that calculates the correlation between two columns of a data frame if a condition is met
I've been trying to make this function work for days.
The function should behave as follows:
- go through the specified CSV files (selected by id)
- read each CSV and rbind them into one data frame
- remove rows that have NA values and keep the complete ones (condition 1)
- then, if data[row_number, col_1] + data[row_number, col_2] > threshold, keep the row; otherwise delete it
- finally, correlate element 1 with element 2 and return the list of correlations.
My code:
corr <- function(directory, threshold = 0, id = 1:332) {
  file.list <- list.files(directory, full.names = TRUE)
  dat <- data.frame()
  for (i in id) {
    dat <- rbind(dat, read.csv(file.list[i]))
  }
  complete_rows <- dat[complete.cases(dat), ]
  z <- data.frame()
  z <- complete_rows[, 2:3]
  y <- data.frame()
  y <- rowSums(z) > threshold
  x <- data.frame()
  x <- z[y, 1:2]
  for (i in x[1:nrow(x), ]) {
    cor(x[i, 1], x[i, 2], method = c("pearson"))
  }
}
My problem is in step 5: correlating the two elements and returning the correlations.
Thanks in advance.
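Try this:
corr <- function(directory, threshold = 0) {
  files <- list.files(directory, full.names = TRUE)
  # Read each file and drop the rows that contain missing values
  dat2 <- lapply(files, function(x) na.omit(read.csv(x)))
  # Number of complete rows in each file
  size <- unlist(lapply(dat2, nrow))
  # One correlation per file that has more than `threshold` complete rows
  cors <- lapply(dat2[size > threshold], function(x) cor(x['nitrate'], x['sulfate']))
  res <- unname(unlist(cors))
  res  # return the vector visibly
}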
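Note that the answer above compares the threshold with the number of complete rows in each file, not with the sum of the two columns as the question describes. The loop in the question fails for a different reason: cor() needs whole columns, so correlating one cell with another (x[i, 1] against x[i, 2]) cannot give a usable value. If the row-sum condition is really what is wanted, a minimal sketch could look like the one below. The name corr_by_file is hypothetical, the columns are assumed to be called sulfate and nitrate (as in the answer), and returning one correlation per file is an assumption about what "list of correlations" means.

```r
# Hypothetical variant: one correlation per file, using the row-sum
# condition from the question. Assumes columns "sulfate" and "nitrate".
corr_by_file <- function(directory, threshold = 0, id = NULL) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # Optionally restrict to the files selected by position
  if (!is.null(id)) files <- files[id]

  one_file <- function(path) {
    dat <- read.csv(path)
    # Condition 1: keep only rows without missing values
    dat <- dat[complete.cases(dat), ]
    # Condition 2: keep rows whose two measurements sum above the threshold
    keep <- dat$sulfate + dat$nitrate > threshold
    dat <- dat[keep, ]
    # cor() needs at least two observations; otherwise report NA
    if (nrow(dat) < 2) return(NA_real_)
    cor(dat$sulfate, dat$nitrate, method = "pearson")
  }

  # Numeric vector with one entry per file (numeric(0) if no files match)
  vapply(files, one_file, numeric(1), USE.NAMES = FALSE)
}
```

A call such as corr_by_file(directory, threshold = 5, id = 1:10) would then return a numeric vector with one correlation per selected file, with NA for files that have fewer than two rows left after filtering.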