Cannot read a Cassandra column family on a Spark cluster using the spark-cassandra-connector in Java
Here is my code to read a column family using the Spark Cassandra connector:
import static com.datastax.spark.connector.japi.CassandraJavaUtil.*;

import com.datastax.spark.connector.japi.SparkContextJavaFunctions;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;

public class Main {
    private static final String HOST = "spark://sparkmaster:7077";
    // private static final String HOST = "local[4]";

    private static final String APP_NAME = "Cassandra Spark WordCount";

    public static void main(String... args) {
        String[] jars = {
            "./build/libs/CassandraSparkMapReduce-1.0-SNAPSHOT.jar"
        };

        SparkConf conf = new SparkConf(true)
                .set("spark.cassandra.connection.host", "107.108.214.154")
                .set("spark.executor.userClassPathFirst", "true")
                .setJars(jars);

        SparkContext sc = new SparkContext(HOST, APP_NAME, conf);
        SparkContextJavaFunctions context = javaFunctions(sc);

        JavaRDD<String> rdd = context
                .cassandraTable("wordcount", "input")
                .map(row -> row.toString());

        System.out.println(rdd.toArray());
    }
}
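As a side note, toArray() is deprecated in the Spark 1.x Java API in favour of collect(). A minimal sketch of the same read using collect() is below; it reuses the keyspace, table and contact point from the snippet above, while the class name and the local master URL are placeholders of mine, not part of the original job.

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;

import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaRDD;

public class ReadRowsSketch {
    public static void main(String... args) {
        // Same Cassandra contact point as in the question.
        SparkConf conf = new SparkConf(true)
                .set("spark.cassandra.connection.host", "107.108.214.154");
        SparkContext sc = new SparkContext("local[4]", "Cassandra read sketch", conf);

        // Read the same keyspace/table and stringify each CassandraRow.
        JavaRDD<String> rows = javaFunctions(sc)
                .cassandraTable("wordcount", "input")
                .map(row -> row.toString());

        // collect() brings the rows back to the driver; fine for small tables only.
        List<String> collected = rows.collect();
        collected.forEach(System.out::println);

        sc.stop();
    }
}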
And here is my build.gradle file, to build and run the application:
group 'in.suyash.tests'
version '1.0-SNAPSHOT'

apply plugin: 'java'
apply plugin: 'application'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    compile group: 'org.apache.spark', name: 'spark-core_2.10', version: '1.4.0'

    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.10', version: '1.4.0-M1'
    compile group: 'com.datastax.spark', name: 'spark-cassandra-connector-java_2.10', version: '1.4.0-M1'

    testCompile group: 'junit', name: 'junit', version: '4.11'
}

sourceSets {
    main {
        java {
            srcDir './'
        }
    }
}

mainClassName = 'Main'

// http://stackoverflow.com/a/14441628/3673043
jar {
    doFirst {
        from {
            configurations.compile.collect {
                it.isDirectory() ? it : zipTree(it)
            }
        }
    }
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
}
I execute the job by first doing gradle build to build the jar, and then gradle run. The job fails, and looking at the stderr on the executors, I am getting the following exception:
java.lang.ClassCastException: org.apache.spark.scheduler.ResultTask cannot be cast to org.apache.spark.scheduler.Task
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
I have a 3-node setup, in which one node acts as the Spark master node, while the other two are Spark worker nodes and also form the Cassandra ring. I can execute the job locally if I change the Spark host, but on the cluster I get this weird exception, which I cannot find anywhere else. Versions:
- Spark: 1.4.0
- Cassandra: 2.1.6
- spark-cassandra-connector: 1.4.0-M1
EDIT
I can't exactly say how I resolved this, but I removed all Java installations from the nodes, restarted everything and installed a fresh copy of jdk1.8.0_45, started the cluster again, and now the job completes successfully. Any explanation of this behaviour is welcome.
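Since the problem disappeared after reinstalling the JDK, one thing worth checking on a cluster like this is whether the driver and the executors are running the same JVM. Below is a small hypothetical diagnostic sketch (the class name and the master URL are placeholders, not part of the original job) that asks each executor to report its java.version:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Hypothetical diagnostic: report the JVM version used by the driver and by each executor.
public class JvmVersionCheck {
    public static void main(String... args) {
        SparkConf conf = new SparkConf()
                .setAppName("JVM version check")
                .setMaster("spark://sparkmaster:7077"); // placeholder master URL

        JavaSparkContext sc = new JavaSparkContext(conf);

        // JVM version seen by the driver.
        System.out.println("driver java.version = " + System.getProperty("java.version"));

        // Run a trivial task on the executors and collect the distinct JVM versions they report.
        List<String> executorVersions = sc
                .parallelize(Arrays.asList(1, 2, 3, 4), 4)
                .map(i -> System.getProperty("java.version"))
                .distinct()
                .collect();
        System.out.println("executor java.version(s) = " + executorVersions);

        sc.stop();
    }
}

If the driver and the executors print different versions, that mismatch is a plausible source of serialization and class-loading oddities like the ClassCastException above, and reinstalling a single consistent JDK (as done here) would remove it.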