r - How do I select values from a list based on character length? -


I am working on a program that follows this basic logic:

  1. Read in 3 CSV files; the first 2 contain keywords, and the last 1 contains exclusion words.
  2. Combine the 2 keyword lists into 1, capitalizing the keywords and removing keywords with fewer than 3 characters.
  3. Sort and remove duplicate keywords from the keyword list.
  4. Capitalize all of the words in the exclusion list.
  5. Remove keywords that have a match in the exclusion list.

It is step 2 that I am having trouble with. I've tried quite a few solutions, but nothing is working. Here is my code:

    # Read in the individual data sets
    set1 = read.csv("set1.csv", header=FALSE, sep=",")
    set2 = read.csv("set2.csv", header=FALSE, sep=",")
    exclude_list = read.csv("exclude.csv", header=FALSE, sep=",")

    # Create a new set that is the aggregate of the keyword sets,
    # capitalizing keywords and excluding keywords
    # shorter than 3 characters in length
    set_agg = rbind(set1, set2)
    keywords = set_agg[c("V1")]
    keywords = as.data.frame(sapply(keywords, toupper))

    # ??? what goes here ???

    # Sort and remove duplicate keywords from the keyword list
    as.data.frame(keywords[order(keywords$V1),])
    keywords = unique(keywords)

    # Modify and capitalize the exclusion list
    exclude_list = as.data.frame(exclude_list[c("V1")])
    exclude_list = as.data.frame(sapply(exclude_list, toupper))

    # Remove keywords matching the exclude list
    `%ni%` <- Negate(`%in%`)
    keywords = subset(keywords, V1 %ni% exclude_list$V1)

    return(keywords)

For reference, the CSV files are formatted like this:

word1, word2, word3, etc... 

You can do this by indexing with sapply on the length of the keywords:

    keywords[sapply(keywords[,1], nchar) > 2,]
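To make the indexing concrete, here is a minimal sketch on made-up data. The words are invented for illustration, and the column name V1 is what read.csv assigns when header=FALSE. Since nchar is vectorized, the sapply call is optional. Two things to watch for: row-indexing a single-column data frame returns a plain vector unless you pass drop = FALSE, and if the column is a factor (the default for strings before R 4.0) it likely needs as.character() first.

```r
# Hypothetical sample data standing in for the uppercased keyword column
keywords <- data.frame(V1 = c("R", "SQL", "GO", "PYTHON", "C"),
                       stringsAsFactors = FALSE)

# Logical mask: TRUE where the keyword has more than 2 characters
long_enough <- nchar(keywords$V1) > 2

# drop = FALSE keeps the result a one-column data frame
keywords <- keywords[long_enough, , drop = FALSE]

print(keywords$V1)
```

Only the keywords with 3 or more characters remain, and keywords$V1 is still available for the later subset step.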

Update: here is a full version that is a little simpler, using vectors:

    ## Assuming you have the keywords and exclude_list stored as vectors
    keywords <- sapply(unique(sort(c(set1, set2))), toupper)
    keywords <- keywords[nchar(keywords) > 2]
    keywords <- setdiff(keywords, sapply(exclude_list, toupper))
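For a runnable end-to-end check, the sketch below wraps the same steps in a helper and runs it on made-up vectors. The function name clean_keywords, its min_chars argument, and the sample words are all hypothetical, not part of the original answer. It also capitalizes before de-duplicating, a small reordering so that "apple" and "Apple" collapse into one entry. If each CSV file holds all of its words on one line, something like scan("set1.csv", what = "", sep = ",", strip.white = TRUE) may be one way to get such a vector, though that depends on the actual file layout.

```r
# Hypothetical helper; the name and arguments are illustrative only
clean_keywords <- function(set1, set2, exclude_list, min_chars = 3) {
  keywords <- toupper(c(set1, set2))                  # combine and capitalize
  keywords <- keywords[nchar(keywords) >= min_chars]  # drop short keywords
  keywords <- sort(unique(keywords))                  # sort and de-duplicate
  setdiff(keywords, toupper(exclude_list))            # remove excluded words
}

# Made-up input vectors
set1 <- c("apple", "kiwi", "an")
set2 <- c("Apple", "fig", "to", "banana")
exclude_list <- c("fig")

result <- clean_keywords(set1, set2, exclude_list)
print(result)
```

With this input, the short words "an" and "to" are dropped, the two spellings of "apple" merge, and "FIG" is removed by the exclusion list.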
