scala - How to transform Array[RDD[Row]] to SchemaRDD -- OR -- how to split a SchemaRDD so that the results are SchemaRDDs?
I want to use the pipeline implementation in MLlib. To use a pipeline, a sequence of LabeledDocument has to be passed to the pipeline as a SchemaRDD.
I create the SchemaRDD as follows:
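    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    // Assumes an existing SQLContext named sqlContext (not shown in the original post)
    import sqlContext.createSchemaRDD

    val data = sc.textFile("/test.csv")
    val parsedData = data.map { line =>
      val parts = line.split(',')
      // First column is the label; the remaining columns are parsed as numeric features
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
    }.cache()
    val rddSchema = parsedData.toSchemaRDD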
I want to split the new rddSchema into training (80%) and test (20%) sets. If I use randomSplit, it returns an Array[RDD[Row]] instead of SchemaRDDs.
Problem: how can I transform an Array[RDD[Row]] into SchemaRDDs
-- or --
how can I split a SchemaRDD so that the results are SchemaRDDs?
I would appreciate any help.
I know this is old, but did you try:
    val splits = parsedData.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0)
    val test = splits(1)
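If the SchemaRDD already exists, another option may be to split it directly and re-attach its schema to each part. This is only a minimal sketch, assuming a Spark 1.x release from the SchemaRDD era (for example 1.2), where SchemaRDD extends RDD[Row] and SQLContext offers applySchema(rowRDD, schema). The names SplitSchemaRDD and split are hypothetical and not part of Spark.

```scala
import org.apache.spark.sql.{SQLContext, SchemaRDD}

object SplitSchemaRDD {
  // Hypothetical helper: randomly split a SchemaRDD by the given weights,
  // then turn each resulting RDD[Row] back into a SchemaRDD by reusing
  // the schema of the original data.
  def split(sqlContext: SQLContext,
            data: SchemaRDD,
            weights: Array[Double],
            seed: Long): Array[SchemaRDD] =
    data.randomSplit(weights, seed).map { rows =>
      sqlContext.applySchema(rows, data.schema)
    }
}
```

With the question's names, an 80/20 split could then look like `val Array(training, test) = SplitSchemaRDD.split(sqlContext, rddSchema, Array(0.8, 0.2), seed = 11L)`. Splitting parsedData first and calling toSchemaRDD on each part, as suggested above, should give an equivalent result.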