Spark implicit RDD conversion doesn't work
I have what seems to be a similar issue to Spark sorting of delimited data, but the accepted solution does not resolve the issue for me.
I'm trying to apply combineByKey on a simple RDD:
package foo

import org.apache.spark._
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext._

object HelloTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(sparkConf)
    val input = sc.textFile("/path/to/test.txt")
    val result = input.combineByKey(
      (v) => (v, 1),
      (acc: (Int, Int), v) => (acc._1 + v, acc._2 + 1),
      (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
    ).map { case (key, value) => (key, value._1 / value._2.toFloat) }
    result.collectAsMap().map(println(_))
    sc.stop()
  }
}
I get the following error while compiling:
$ scalac -cp /path/to/scala-2.10/spark-assembly-1.4.0-SNAPSHOT-hadoop2.2.0.jar -sourcepath src/ -d bin src/foo/HelloTest.scala
error: value combineByKey is not a member of org.apache.spark.rdd.RDD[String]
Interestingly, the combineByKey function is not described here: https://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs but it is in the working with key/value pairs section of the Learning Spark book.
So the problem seems to be that your input is un-keyed. When you read in the input text file you get an RDD of strings, and for combineByKey, or any of the similar functions, to work it needs to be an RDD of key-value pairs. Map the lines into pairs first (see the sketch below). Hope this helps, and glad to see a Learning Spark reader :)
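A rough sketch of that fix, assuming each line of test.txt is a whitespace-separated key and integer value such as "foo 42" (the split and toInt below are assumptions about your data format, so adjust the parsing to whatever the file actually contains):

package foo

import org.apache.spark.{SparkConf, SparkContext}

object HelloTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Test")
    val sc = new SparkContext(sparkConf)

    // Parse each line into a (key, value) pair first -- this is the missing step.
    // The whitespace split and the Int value are assumptions about the file format.
    val keyed = sc.textFile("/path/to/test.txt").map { line =>
      val parts = line.split("\\s+")
      (parts(0), parts(1).toInt)
    }

    // Because the element type is now a tuple, the implicit conversion to
    // PairRDDFunctions applies and combineByKey compiles.
    val result = keyed.combineByKey(
      (v: Int) => (v, 1),                                     // create combiner: (sum, count)
      (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),  // merge a value into a combiner
      (acc1: (Int, Int), acc2: (Int, Int)) =>                 // merge two combiners
        (acc1._1 + acc2._1, acc1._2 + acc2._2)
    ).map { case (key, (sum, count)) => (key, sum / count.toFloat) }

    result.collectAsMap().foreach(println)
    sc.stop()
  }
}

The (sum, count) combiner keeps a running total and element count per key, so the final map can compute a per-key average in a single pass.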