File Processing with Spark and Cassandra


I'm working on loading a table from a Cassandra cluster into a Spark cluster using the DataStax Cassandra Spark connector. The Spark program performs a simple MapReduce job that counts the number of rows in the Cassandra table. Everything is set up and run locally.
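For context, here is a minimal sketch of such a count job in Scala, assuming a local setup; the keyspace and table names (test, files) are placeholders, not from the original post:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    object CassandraRowCount {
      def main(args: Array[String]): Unit = {
        // Local setup; the connection host and keyspace/table names are placeholders.
        val conf = new SparkConf()
          .setAppName("CassandraRowCount")
          .setMaster("local[2]")
          .set("spark.cassandra.connection.host", "127.0.0.1")
        val sc = new SparkContext(conf)

        // cassandraTable is added to SparkContext by the connector's implicits.
        val rowCount = sc.cassandraTable("test", "files").count()
        println(s"Row count: $rowCount")

        sc.stop()
      }
    }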

The Spark program works with a small Cassandra table that has only a string key column. When it loads a table that has two columns, a string id and a blob holding file data, I get several errors: a futures timeout error in the Spark workers, and a Java out-of-memory exception on the stdout of the driver program.
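For reference, the failing table presumably has a shape like the one sketched below (a reconstruction, not taken from the post), and a job touching the blob column might, for example, sum the sizes of the stored files (reusing the SparkContext from the sketch above):

    // Assumed schema for the failing table (reconstruction):
    //   CREATE TABLE test.files (id text PRIMARY KEY, data blob);

    // Sum the sizes of all stored blobs; getBytes returns a java.nio.ByteBuffer.
    val totalBytes = sc.cassandraTable("test", "files")
      .map(row => row.getBytes("data").remaining().toLong)
      .fold(0L)(_ + _)
    println(s"Total blob bytes: $totalBytes")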

My question is whether Spark can load rows containing blobs of around 1 MB from Cassandra and perform MapReduce jobs on them, or whether such rows are supposed to be divided into smaller pieces before being processed by a Spark MapReduce job.

Originally, I was using 'sbt run' to start the application.
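A build definition for such a project might look roughly like this; the artifact names are real, but the project name and versions are illustrative guesses (connector 1.6.x pairs with Spark 1.6.x):

    // Hypothetical build.sbt for the project described above.
    name := "cassandra-row-count"
    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      // For spark-submit you would typically mark spark-core as "provided"
      // and build an assembly jar; for 'sbt run' it must stay on the classpath.
      "org.apache.spark" %% "spark-core" % "1.6.1",
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0"
    )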

Once I was able to use spark-submit to launch the application, it worked fine. So yes, files under 10 MB can be stored in a column of type blob; the Spark MapReduce job ran over 200 rows.
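A plausible launch command is sketched below, assuming a fat jar built with 'sbt assembly'; the class and jar names are placeholders. One likely explanation for the difference, though the post does not state it, is that 'sbt run' executes the driver inside sbt's own JVM with its default heap, whereas spark-submit applies Spark's driver and executor memory settings.

    # Placeholders throughout; build the fat jar first with 'sbt assembly'.
    spark-submit \
      --class CassandraRowCount \
      --master local[2] \
      --driver-memory 2g \
      target/scala-2.11/cassandra-row-count-assembly-0.1.jar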

