r - Removing duplicate records from an .xdf file


I want to remove duplicate records from a large .xdf file, trans.xdf. Here are the file details:

    File name: /poc/revor/data/trans.xdf
    Number of observations: 1000000000
    Number of variables: 5
    Number of blocks: 40
    Compression type: zlib
    Variable information:
    Var 1: card_id, Type: character
    Var 2: se_no, Type: character
    Var 3: r12m_cv, Type: numeric, Low/High: (-2348.7600, 40587.3900)
    Var 4: r12m_roc, Type: numeric, Low/High: (0.0000, 231.0000)
    Var 5: prod_grp_cd, Type: character

Below is a sample of the data in the file:

    card_id             se_no       r12m_cv  r12m_roc  prod_grp_cd
    900000999000000000  1045815024  110      1         1
    900000999000000000  1052487253  247.52   2         1
    900000999000000000  9999999999  38.72    1         1
    900000999000000000  1090389768  1679.96  16        1
    900000999000000000  1091226035  0        1         1
    900000999000000000  1091241208  538.68   4         1
    900000999000000000  9999999999  83       1         1
    900000999000000000  1091468041  148.4    3         1
    900000999000000000  1092640358  3.13     1         1
    900000999000000000  1093468692  546.29   1         1

I have tried using the rxDataStep function with its transformFunc parameter to call unique() on the .xdf file. Below is the code for this:

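    # Transform function intended to drop duplicate records
    uniq_dat <- function(datalist) {
        datalist <- unique(datalist)
        return(datalist)
    }

    # Read trans.xdf, apply the transform, and write back to the same file
    rxDataStepXdf(inFile = "/poc/revor/data/trans.xdf",
                  outFile = "/poc/revor/data/trans.xdf",
                  transformFunc = uniq_dat,
                  overwrite = TRUE)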

But I am getting the error below:

    Error in unique(datalist) : object 'datalist' not found
    Error in transformation function: Error in unique(datalist) : object 'datalist' not found
    Error in rxCall("rxDataStep", params) :

Could someone point out the mistake I am making here, or tell me whether there is a better way to remove duplicate records from an .xdf file? I am avoiding loading the data into an in-memory data frame because the data is pretty huge.

I am running the above code in a Revolution R environment on HDFS.

If the same result can be obtained with another approach, an example of it would be appreciated.
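For illustration only, here is a minimal base R sketch of what chunk-wise de-duplication has to do. A transform function sees one chunk of the file at a time, so calling unique() inside it could at best remove duplicates within a chunk, not across chunks. The sketch keeps a running set of row keys already written and drops any row whose key was seen before. The names `make_key` and `dedupe_chunks` and the toy data are hypothetical and are not part of RevoScaleR. It also assumes the set of distinct keys fits in memory, which may not hold for a billion rows.

```r
# Hypothetical helper: build one string key per row from all columns.
make_key <- function(df) {
  do.call(paste, c(unname(as.list(df)), sep = "\r"))
}

# Hypothetical helper: de-duplicate rows across a list of data frame chunks.
dedupe_chunks <- function(chunks) {
  seen <- character(0)                    # keys emitted so far, shared across chunks
  out <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    chunk <- chunks[[i]]
    keys <- make_key(chunk)
    # Keep a row only if it is new within this chunk AND new across earlier chunks.
    is_new <- !duplicated(keys) & !(keys %in% seen)
    seen <- c(seen, keys[is_new])
    out[[i]] <- chunk[is_new, , drop = FALSE]
  }
  result <- do.call(rbind, out)
  rownames(result) <- NULL
  result
}

# Made-up toy data: the row ("b", 2) appears in both chunks.
chunks <- list(
  data.frame(id = c("a", "b"), value = c(1, 2), stringsAsFactors = FALSE),
  data.frame(id = c("b", "c", "c"), value = c(2, 3, 3), stringsAsFactors = FALSE)
)

result <- dedupe_chunks(chunks)

# For comparison: unique() applied to each chunk separately keeps ("b", 2) twice.
per_chunk <- do.call(rbind, lapply(chunks, unique))
```

In this toy run `result` keeps one row per distinct record, while `per_chunk` still contains the record that straddles the two chunks. When the key set is too large for memory, a sort-based approach (sort on the key columns, then drop adjacent duplicates) is the usual alternative.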

Thanks in advance :)

Cheers,

Amit

