jdbc - Connecting from Spark/pyspark to PostgreSQL


I've installed Spark on a Windows machine and want to use it via Spyder. After troubleshooting the basics, it seems to work:

    import os
    # raw string so the Windows backslashes are not treated as escape sequences
    os.environ["SPARK_HOME"] = r"d:\analytics\spark\spark-1.4.0-bin-hadoop2.6"

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext

    # run Spark locally with 8 worker threads
    spark_config = SparkConf().setMaster("local[8]")
    sc = SparkContext(conf=spark_config)
    sqlContext = SQLContext(sc)

    textFile = sc.textFile("d:\\analytics\\spark\\spark-1.4.0-bin-hadoop2.6\\readme.md")
    textFile.count()
    textFile.filter(lambda line: "spark" in line).count()

    sc.stop()

This runs as expected. I now want to connect to a Postgres 9.3 database running on the same server. I downloaded the JDBC driver from here and put it in the folder d:\analytics\spark\spark_jars. I also created a new file d:\analytics\spark\spark-1.4.0-bin-hadoop2.6\conf\spark-defaults.conf containing this line:

    spark.driver.extraClassPath        'd:\\analytics\\spark\\spark_jars\\postgresql-9.3-1103.jdbc41.jar'
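One thing that may be worth ruling out with this file: Spark reads spark-defaults.conf through java.util.Properties, which, as far as I know, turns a doubled backslash into a single one but keeps quote characters, so the single quotes would end up inside the classpath entry. The sketch below is a simplified, hypothetical checker (the function names are made up for illustration); it only understands whitespace-separated key/value lines, # comments and doubled backslashes, not the full properties syntax.

```python
def read_spark_defaults(text):
    """Parse spark-defaults.conf-style text into a dict (hypothetical helper).

    Simplified: one whitespace-separated "key value" pair per line, blank
    lines and '#' comments skipped, doubled backslashes collapsed to one.
    '=' / ':' separators and line continuations are NOT handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        props[parts[0]] = value.replace("\\\\", "\\")
    return props


def quoted_values(props):
    """Return the keys whose value is wrapped in matching quote characters."""
    return [key for key, value in props.items()
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\""]


if __name__ == "__main__":
    conf = r"spark.driver.extraClassPath  'd:\\analytics\\spark\\spark_jars\\postgresql-9.3-1103.jdbc41.jar'"
    props = read_spark_defaults(conf)
    print(quoted_values(props))  # ['spark.driver.extraClassPath']
```

If the check flags the line, dropping the quotes (keeping the doubled backslashes) would be the first thing to try.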

I then ran the following code to test the connection:

    import os
    os.environ["SPARK_HOME"] = r"d:\analytics\spark\spark-1.4.0-bin-hadoop2.6"

    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SQLContext

    spark_config = SparkConf().setMaster("local[8]")
    sc = SparkContext(conf=spark_config)
    sqlContext = SQLContext(sc)

    # read the "pubs" table through Spark's JDBC data source
    df = (sqlContext
          .load(source="jdbc",
                url="jdbc:postgresql://[hostname]/[database]?user=[username]&password=[password]",
                dbtable="pubs")
    )
    sc.stop()
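The url argument in that call carries the credentials as query parameters. If a real password contains characters such as & or =, it would likely be split in the wrong place unless it is percent-encoded. A minimal sketch of a hypothetical helper (not part of PySpark) that builds such a URL with the standard library; host, database and credentials are placeholders:

```python
from urllib.parse import urlencode


def postgres_jdbc_url(host, database, user, password, port=None):
    """Build a jdbc:postgresql:// URL with credentials (hypothetical helper).

    The query string is produced by urlencode, so special characters in the
    user name or password are percent-encoded.
    """
    netloc = host if port is None else "%s:%d" % (host, port)
    query = urlencode({"user": user, "password": password})
    return "jdbc:postgresql://%s/%s?%s" % (netloc, database, query)


if __name__ == "__main__":
    print(postgres_jdbc_url("myhost", "mydb", "analytics", "p&w=d", port=5432))
```

The returned string would then be passed as url= in the load call.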

but I get the following error:

    Py4JJavaError: An error occurred while calling o22.load.
    : java.sql.SQLException: No suitable driver found for jdbc:postgresql://uklonana01/stonegate?user=analytics&password=[password]
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at org.apache.spark.sql.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:118)
        at org.apache.spark.sql.jdbc.JDBCRelation.<init>(JDBCRelation.scala:128)
        at org.apache.spark.sql.jdbc.DefaultSource.createRelation(JDBCRelation.scala:113)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:207)
        at java.lang.Thread.run(Unknown Source)

How can I check whether I've downloaded the right .jar file, or where else the error might come from?
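A .jar file is a zip archive, so one way to check the download without starting Spark is to look inside it for the PostgreSQL driver class, org.postgresql.Driver. The helper below is a hypothetical sketch using only the standard library. A truncated or corrupt download would typically fail to open at all with zipfile.BadZipFile. Finding the class only shows that the jar is a PostgreSQL driver, not that it is on the classpath of the JVM that PySpark starts.

```python
import zipfile

DRIVER_CLASS = "org.postgresql.Driver"


def inspect_jdbc_jar(jar_path, driver_class=DRIVER_CLASS):
    """Check a jar for a JDBC driver class (hypothetical helper).

    Returns (has_class, service_entries):
      has_class       -- True if the .class file for driver_class is present
      service_entries -- class names listed in META-INF/services/java.sql.Driver,
                         the file JDBC 4 drivers use to register themselves
                         (empty list if the file is absent)
    Raises zipfile.BadZipFile if the file is not a valid archive.
    """
    class_entry = driver_class.replace(".", "/") + ".class"
    service_file = "META-INF/services/java.sql.Driver"
    with zipfile.ZipFile(jar_path) as jar:
        names = set(jar.namelist())
        has_class = class_entry in names
        services = []
        if service_file in names:
            text = jar.read(service_file).decode("utf-8")
            services = [line.strip() for line in text.splitlines()
                        if line.strip() and not line.strip().startswith("#")]
    return has_class, services
```

For example, inspect_jdbc_jar(r"d:\analytics\spark\spark_jars\postgresql-9.3-1103.jdbc41.jar") should report True for the class if the jar is the PostgreSQL driver.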

Remove spark-defaults.conf and instead set SPARK_CLASSPATH in the environment from Python, like this:

    import os
    # set this before the SparkContext is created: it is read when PySpark launches the JVM
    os.environ["SPARK_CLASSPATH"] = 'path\\to\\postgresql-9.3-1101.jdbc41.jar'
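Assigning SPARK_CLASSPATH outright replaces anything already in the variable. A hypothetical variant that prepends the jar instead, joining entries with os.pathsep (';' on Windows, ':' elsewhere); the function name is made up for illustration. As far as I know, Spark 1.x also logs a deprecation warning for SPARK_CLASSPATH and points to spark.driver.extraClassPath or --driver-class-path instead.

```python
import os


def prepend_to_spark_classpath(jar_path, env=None):
    """Put jar_path at the front of SPARK_CLASSPATH (hypothetical helper).

    Existing entries are kept and the jar is not added twice. env defaults
    to os.environ; call this before creating the SparkContext.
    Returns the new value of SPARK_CLASSPATH.
    """
    env = os.environ if env is None else env
    entries = [e for e in env.get("SPARK_CLASSPATH", "").split(os.pathsep) if e]
    if jar_path not in entries:
        entries.insert(0, jar_path)
    env["SPARK_CLASSPATH"] = os.pathsep.join(entries)
    return env["SPARK_CLASSPATH"]
```

Passing a plain dict as env makes it easy to try out without touching the real environment.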
