scala - Spark implicit RDD conversion doesn't work -


I have what seems to be a similar issue to Spark sorting of delimited data, but the accepted solution does not resolve the issue for me.

I'm trying to apply combineByKey on a simple RDD:

    package foo

    import org.apache.spark._
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext._

    object HelloTest {
      def main(args: Array[String]) {
        val sparkConf = new SparkConf().setAppName("test")
        val sc = new SparkContext(sparkConf)
        val input = sc.textFile("/path/to/test.txt")
        val result = input.combineByKey(
          (v) => (v, 1),
          (acc: (Int, Int), v) => (acc._1 + v, acc._2 + 1),
          (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
        ).map { case (key, value) => (key, value._1 / value._2.toFloat) }
        result.collectAsMap().map(println(_))
        sc.stop()
      }
    }

I get the following error while compiling:

    $ scalac -cp /path/to/scala-2.10/spark-assembly-1.4.0-SNAPSHOT-hadoop2.2.0.jar -sourcepath src/ -d bin src/foo/HelloTest.scala
    error: value combineByKey is not a member of org.apache.spark.rdd.RDD[String]

Interestingly, the combineByKey function is not described here: https://spark.apache.org/docs/latest/programming-guide.html#working-with-key-value-pairs but it is covered in the working with key/value pairs section of the Learning Spark book.

So the problem seems to be that your input is un-keyed. When you read in the input text file you get an RDD of strings, and for combineByKey, or any of the similar functions, to work you need an RDD of key-value pairs. A sketch of the fix is shown below. Hope that helps, and glad to see a Learning Spark reader :)
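
Here is a minimal sketch of that fix, assuming each line of test.txt looks like `key value` with a single space delimiter and a numeric second field (both the delimiter and the field layout are assumptions about your data). Mapping each line into a pair first makes combineByKey available:

    package foo

    import org.apache.spark._
    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext._

    object HelloTest {
      def main(args: Array[String]) {
        val sparkConf = new SparkConf().setAppName("test")
        val sc = new SparkContext(sparkConf)

        // Assumption: each line is "key value" with a numeric value, e.g. "apple 3".
        // Mapping to (String, Int) produces a pair RDD, so the implicit conversion
        // to PairRDDFunctions applies and combineByKey becomes available.
        val input = sc.textFile("/path/to/test.txt").map { line =>
          val fields = line.split(" ")
          (fields(0), fields(1).toInt)
        }

        // Per-key average: the combiner tracks (sum, count), the final map divides.
        val result = input.combineByKey(
          (v) => (v, 1),
          (acc: (Int, Int), v: Int) => (acc._1 + v, acc._2 + 1),
          (acc1: (Int, Int), acc2: (Int, Int)) => (acc1._1 + acc2._1, acc1._2 + acc2._2)
        ).map { case (key, value) => (key, value._1 / value._2.toFloat) }

        result.collectAsMap().foreach(println)
        sc.stop()
      }
    }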

