R - SparkR and Packages
How does one call packages from Spark to be utilized for data operations with R?

For example, I am trying to access test.csv in HDFS as below:
Sys.setenv(SPARK_HOME = "/opt/spark14")
library(SparkR)
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)
flights <- read.df(sqlContext, "hdfs://sandbox.hortonworks.com:8020/user/root/test.csv", "com.databricks.spark.csv", header = "true")
but I am getting the error below:

Caused by: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
I tried loading the CSV package with the option below:

Sys.setenv('SPARKR_SUBMIT_ARGS' = '--packages com.databricks:spark-csv_2.10:1.0.3')
but I am getting the error below while loading sqlContext:

Launching java with spark-submit command /opt/spark14/bin/spark-submit --packages com.databricks:spark-csv_2.10:1.0.3 /tmp/Rtmpuvwoky/backend_port95332e5267b
Error: Cannot load main class from JAR file:/tmp/Rtmpuvwoky/backend_port95332e5267b
Any help is highly appreciated.
So it looks like by setting SPARKR_SUBMIT_ARGS you are overriding the default value, which is sparkr-shell. You could probably do the same thing and just append sparkr-shell to the end of your SPARKR_SUBMIT_ARGS. This seems unnecessarily complex compared to depending on jars, so I've created a JIRA to track the issue (and I'll try and fix it if the SparkR people agree with me): https://issues.apache.org/jira/browse/SPARK-8506
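A minimal sketch of that workaround, reusing the package version and HDFS path from the question:

# Keep the --packages flag but append the default sparkr-shell argument
# that setting SPARKR_SUBMIT_ARGS would otherwise override.
Sys.setenv('SPARKR_SUBMIT_ARGS' = '--packages com.databricks:spark-csv_2.10:1.0.3 sparkr-shell')

library(SparkR)
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# With spark-csv now on the classpath, the original read.df call
# should be able to resolve the data source.
flights <- read.df(sqlContext, "hdfs://sandbox.hortonworks.com:8020/user/root/test.csv", "com.databricks.spark.csv", header = "true")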
Note: another option is using the sparkR command with --packages com.databricks:spark-csv_2.10:1.0.3, since that should work.
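For example, launching the shell directly (assuming the same SPARK_HOME as in the question):

/opt/spark14/bin/sparkR --packages com.databricks:spark-csv_2.10:1.0.3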