scala - How to transform Array[RDD[Row]] to SchemaRDD -- OR -- how to split a SchemaRDD so that the results are SchemaRDDs?


I want to use the Pipeline implementation in MLlib. To use a pipeline, a sequence of LabeledDocument should be passed to the pipeline as a SchemaRDD.

I create the SchemaRDD as follows:

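    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint

    val data = sc.textFile("/test.csv")
    val parsedData = data.map { line =>
      val parts = line.split(',')
      // First column is the label; the remaining columns are the features.
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
    }.cache()
    val rddSchema = parsedData.toSchemaRDD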

I want to split the new rddSchema into training (80%) and test (20%) sets. If I use randomSplit, it returns Array[RDD[Row]] instead of SchemaRDD.

Problem: how do I transform Array[RDD[Row]] into SchemaRDD?

-- OR --

How do I split a SchemaRDD so that the results are SchemaRDDs?

I would appreciate any help.

I know this is old, but did you try splitting the RDD before converting it:

    val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0)
    val test = splits(1)
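
To spell out both routes from the question, here is a minimal sketch. It assumes a Spark 1.2-era setup with a SQLContext, and I have not checked it against other versions. The inline sample lines stand in for /test.csv, and the names trainingA, testA, trainingB and testB are made up for illustration. Route A splits the typed RDD first and converts each part. Route B splits the SchemaRDD and re-attaches its schema to each RDD[Row] with applySchema.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SQLContext

// Assumption: a local context for a standalone run.
// In spark-shell, drop the next two lines and reuse the existing `sc`.
val conf = new SparkConf().setAppName("schemardd-split-sketch").setMaster("local[2]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// Hypothetical sample lines standing in for /test.csv (label first, then features).
val lines = sc.parallelize(Seq(
  "1.0,0.5,1.5", "0.0,2.0,3.0", "1.0,0.1,0.2", "0.0,4.0,5.0", "1.0,0.3,0.9"
))
val parsedData = lines.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}.cache()

// Route A: split the typed RDD first, then convert each part to a SchemaRDD.
val Array(trainPoints, testPoints) = parsedData.randomSplit(Array(0.8, 0.2), seed = 11L)
val trainingA = sqlContext.createSchemaRDD(trainPoints)
val testA = sqlContext.createSchemaRDD(testPoints)

// Route B: split the SchemaRDD (this yields RDD[Row]) and re-attach the original schema.
val rddSchema = sqlContext.createSchemaRDD(parsedData)
val Array(trainRows, testRows) = rddSchema.randomSplit(Array(0.8, 0.2), seed = 11L)
val trainingB = sqlContext.applySchema(trainRows, rddSchema.schema)
val testB = sqlContext.applySchema(testRows, rddSchema.schema)
```

Route A is probably the simpler choice when the typed RDD is still around, since nothing has to be rebuilt from rows. Route B is the one to reach for when only the SchemaRDD is available.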
