R - How do I select values from a list based on character length?
I am working on a program that follows this basic logic:
- Read 3 CSV files: the first 2 contain keywords, and the last one contains exclusion words.
- Combine the 2 keyword lists into one, capitalizing the keywords and removing keywords shorter than 3 characters.
- Sort the keyword list and remove duplicate keywords.
- Capitalize all of the words in the exclusion list.
- Remove the keywords that have a match in the exclusion list.
It is step 2 that I am having trouble with. I've tried quite a few solutions, but nothing is working. Here is my code:
    # Read in the individual data sets
    set1=read.csv("set1.csv",header=FALSE,sep=",")
    set2=read.csv("set2.csv",header=FALSE,sep=",")
    exclude_list=read.csv("exclude.csv",header=FALSE,sep=",")

    # Create a new set as the aggregate of the keyword sets,
    # capitalizing the keywords and excluding keywords
    # less than 2 characters in length
    set_agg=rbind(set1,set2)
    keywords=set_agg[c("V1")]
    keywords = as.data.frame(sapply(keywords, toupper))
    # ??? what goes here ???

    # Sort and remove duplicate keywords from the keyword list
    as.data.frame(keywords[order(keywords$V1),])
    keywords=unique(keywords)

    # Modify and capitalize the exclusion list
    exclude_list=as.data.frame(exclude_list[c("V1")])
    exclude_list=as.data.frame(sapply(exclude_list, toupper))

    # Remove keywords matching the exclude list
    `%ni%` <- Negate(`%in%`)
    keywords=subset(keywords, V1 %ni% exclude_list$V1)
    return(keywords)
For reference, the CSV files are formatted like this:
word1, word2, word3, etc...
You can do this by indexing with sapply on the length of the keywords:
    keywords[sapply(keywords[,1], nchar) > 2,]
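As a small illustration of that indexing, here is a sketch on made-up data. The sample words are hypothetical, and the column name V1 is assumed to match what read.csv produces with header=FALSE. Since nchar() is already vectorised it can be applied to the column directly, and drop = FALSE is one way to keep the result a one-column data frame rather than letting it collapse to a plain vector.

```r
# Hypothetical sample data standing in for the combined keyword sets
keywords <- data.frame(V1 = c("AB", "CAT", "R", "DOGS"),
                       stringsAsFactors = FALSE)

# nchar() is vectorised, so it works on the whole column at once
long_enough <- nchar(keywords$V1) > 2

# drop = FALSE keeps a one-column data frame instead of a character vector
keywords <- keywords[long_enough, , drop = FALSE]
print(keywords)
```

If the column has been read in as a factor (the default in older versions of R), it would probably need as.character() around it before calling nchar().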
Update: here is a full version that is a little simpler, using vectors:
    ## assuming you have keywords and exclude_list stored as vectors
    keywords <- sapply(unique(sort(c(set1, set2))), toupper)
    keywords <- keywords[nchar(keywords) > 2]
    keywords <- setdiff(keywords, sapply(exclude_list, toupper))
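To see the vector approach end to end, here is a self-contained sketch with made-up input. The contents of set1, set2 and exclude_list are hypothetical stand-ins for what would be read from the CSV files. It is a slight variation rather than the exact code above: toupper() is called directly because it is vectorised, and the capitalizing happens before unique() so that words differing only in case are treated as duplicates.

```r
# Hypothetical sample vectors; in practice these would come from the CSV files
set1 <- c("apple", "to", "pear")
set2 <- c("Pear", "fig", "an", "kiwi")
exclude_list <- c("fig")

# Combine both keyword sets and capitalize them
keywords <- toupper(c(set1, set2))

# Keep only keywords with at least 3 characters
keywords <- keywords[nchar(keywords) > 2]

# Remove duplicates, then sort
keywords <- sort(unique(keywords))

# Remove anything that appears in the capitalized exclusion list
keywords <- setdiff(keywords, toupper(exclude_list))
print(keywords)
```

With this sample input, "to" and "an" are dropped for being too short, "pear" and "Pear" collapse into one entry, and "fig" is removed by the exclusion list.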