jdbc - Connecting from Spark/pyspark to PostgreSQL
I've installed Spark on a Windows machine and want to use it via Spyder. After some troubleshooting, the basics seem to work:

    import os
    # Raw string so that backslashes such as "\a" are not read as escape sequences
    os.environ["SPARK_HOME"] = r"d:\analytics\spark\spark-1.4.0-bin-hadoop2.6"

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext

    # Run locally with 8 worker threads
    spark_config = SparkConf().setMaster("local[8]")
    sc = SparkContext(conf=spark_config)
    sqlContext = SQLContext(sc)

    textFile = sc.textFile("d:\\analytics\\spark\\spark-1.4.0-bin-hadoop2.6\\readme.md")
    textFile.count()
    textFile.filter(lambda line: "spark" in line).count()
    sc.stop()
This runs as expected. I now want to connect to a PostgreSQL 9.3 database running on the same server. I have downloaded the JDBC driver and put it in the folder d:\analytics\spark\spark_jars. I've then created a new file d:\analytics\spark\spark-1.4.0-bin-hadoop2.6\conf\spark-defaults.conf containing this line:

    spark.driver.extraClassPath 'd:\\analytics\\spark\\spark_jars\\postgresql-9.3-1103.jdbc41.jar'
I then ran the following code to test the connection:

    import os
    # Raw string so that backslashes such as "\a" are not read as escape sequences
    os.environ["SPARK_HOME"] = r"d:\analytics\spark\spark-1.4.0-bin-hadoop2.6"

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext

    spark_config = SparkConf().setMaster("local[8]")
    sc = SparkContext(conf=spark_config)
    sqlContext = SQLContext(sc)

    # Load the "pubs" table through the JDBC data source
    df = sqlContext.load(
        source="jdbc",
        url="jdbc:postgresql://[hostname]/[database]?user=[username]&password=[password]",
        dbtable="pubs"
    )
    sc.stop()
But I am getting the following error:

    Py4JJavaError: An error occurred while calling o22.load.
    : java.sql.SQLException: No suitable driver found for jdbc:postgresql://uklonana01/stonegate?user=analytics&password=pmoe8jyd
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:118)
        at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:128)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:113)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Unknown Source)
How can I check whether I've downloaded the right .jar file, or where else the error might come from?
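On the first part of the question, one way to sanity-check the downloaded file is to open the .jar as a zip archive and look for the driver class. This is only a rough sketch: the helper name `inspect_jdbc_jar` is made up for illustration, and it assumes the PostgreSQL driver class is `org.postgresql.Driver` and that a JDBC 4 jar lists it in `META-INF/services/java.sql.Driver`. It says nothing about whether Spark actually has the jar on its classpath.

```python
import zipfile

# Assumption: the PostgreSQL JDBC driver class is org.postgresql.Driver and
# JDBC 4 jars register it in META-INF/services/java.sql.Driver.
DRIVER_CLASS_ENTRY = "org/postgresql/Driver.class"
SERVICE_ENTRY = "META-INF/services/java.sql.Driver"


def inspect_jdbc_jar(jar_path):
    """Return a small report on whether jar_path looks like a PostgreSQL JDBC jar."""
    report = {"is_zip": False, "has_driver_class": False, "registered_drivers": []}
    if not zipfile.is_zipfile(jar_path):
        # Missing file, truncated download, or e.g. an HTML error page saved as .jar
        return report
    report["is_zip"] = True
    with zipfile.ZipFile(jar_path) as jar:
        names = set(jar.namelist())
        report["has_driver_class"] = DRIVER_CLASS_ENTRY in names
        if SERVICE_ENTRY in names:
            text = jar.read(SERVICE_ENTRY).decode("utf-8")
            # Keep non-empty, non-comment lines: each one names a driver class
            report["registered_drivers"] = [
                line.strip()
                for line in text.splitlines()
                if line.strip() and not line.strip().startswith("#")
            ]
    return report
```

If `is_zip` and `has_driver_class` both come back true, the download itself is probably fine and the "No suitable driver" error more likely comes from the jar not being visible to the JVM that Spark starts.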
Remove spark-defaults.conf and add SPARK_CLASSPATH to the system environment in Python like this:

    os.environ["SPARK_CLASSPATH"] = 'path\\to\\postgresql-9.3-1101.jdbc41.jar'
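As far as I understand, the variable has to be set before the SparkContext is created, because that is when the JVM is launched and inherits the environment. A minimal sketch of a helper that checks the jar exists before setting it follows; the function names `build_spark_classpath` and `set_spark_classpath` are hypothetical, and joining several jars with `os.pathsep` assumes the usual Java classpath separator for the platform.

```python
import os


def build_spark_classpath(jar_paths):
    """Join jar paths with the platform classpath separator, rejecting missing files."""
    missing = [p for p in jar_paths if not os.path.isfile(p)]
    if missing:
        raise IOError("JDBC jar(s) not found: " + ", ".join(missing))
    return os.pathsep.join(os.path.abspath(p) for p in jar_paths)


def set_spark_classpath(jar_paths, environ=os.environ):
    """Set SPARK_CLASSPATH in environ; call this before creating the SparkContext."""
    environ["SPARK_CLASSPATH"] = build_spark_classpath(jar_paths)
    return environ["SPARK_CLASSPATH"]


# Hypothetical usage, with the placeholder path from the answer above:
# set_spark_classpath(['path\\to\\postgresql-9.3-1101.jdbc41.jar'])
# sc = SparkContext(conf=spark_config)
```

Failing early on a missing file makes a mistyped path show up as a Python error instead of the less obvious "No suitable driver found" from the JVM.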