File Processing with Spark and Cassandra


I'm working on loading a table from a Cassandra cluster into a Spark cluster using the DataStax Cassandra Spark connector. The Spark program performs a simple MapReduce job that counts the number of rows in the Cassandra table. Everything is set up and run locally.
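For context, here is a minimal sketch of such a count job in Scala, assuming a local setup; the keyspace and table names (test, files) are placeholders, not from the original post:

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    object CassandraRowCount {
      def main(args: Array[String]): Unit = {
        // Local setup; the connection host and keyspace/table names are placeholders.
        val conf = new SparkConf()
          .setAppName("CassandraRowCount")
          .setMaster("local[2]")
          .set("spark.cassandra.connection.host", "127.0.0.1")
        val sc = new SparkContext(conf)

        // cassandraTable is added to SparkContext by the connector's implicits.
        val rowCount = sc.cassandraTable("test", "files").count()
        println(s"Row count: $rowCount")

        sc.stop()
      }
    }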

The Spark program works with a small Cassandra table that has only a string key column. When it loads a table that has two columns, a string id and a blob holding file data, I get several errors: a futures timeout error in the Spark workers, and a Java out-of-memory exception on the stdout of the driver program.
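For reference, the failing table presumably has a shape like the one sketched below (a reconstruction, not taken from the post), and a job touching the blob column might, for example, sum the sizes of the stored files (reusing the SparkContext from the sketch above):

    // Assumed schema for the failing table (reconstruction):
    //   CREATE TABLE test.files (id text PRIMARY KEY, data blob);

    // Sum the sizes of all stored blobs; getBytes returns a java.nio.ByteBuffer.
    val totalBytes = sc.cassandraTable("test", "files")
      .map(row => row.getBytes("data").remaining().toLong)
      .fold(0L)(_ + _)
    println(s"Total blob bytes: $totalBytes")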

My question is whether Spark can load rows containing blobs of around 1 MB from Cassandra and perform MapReduce jobs on them, or whether such rows are supposed to be divided into smaller pieces before being processed by a Spark MapReduce job.

Originally, I was using 'sbt run' to start the application.
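A build definition for such a project might look roughly like this; the artifact names are real, but the project name and versions are illustrative guesses (connector 1.6.x pairs with Spark 1.6.x):

    // Hypothetical build.sbt for the project described above.
    name := "cassandra-row-count"
    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      // For spark-submit you would typically mark spark-core as "provided"
      // and build an assembly jar; for 'sbt run' it must stay on the classpath.
      "org.apache.spark" %% "spark-core" % "1.6.1",
      "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0"
    )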

Once I was able to use spark-submit to launch the application, it worked fine. So yes, files under 10 MB can be stored in a column of type blob; the Spark MapReduce job ran over 200 rows.
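A plausible launch command is sketched below, assuming a fat jar built with 'sbt assembly'; the class and jar names are placeholders. One likely explanation for the difference, though the post does not state it, is that 'sbt run' executes the driver inside sbt's own JVM with its default heap, whereas spark-submit applies Spark's driver and executor memory settings.

    # Placeholders throughout; build the fat jar first with 'sbt assembly'.
    spark-submit \
      --class CassandraRowCount \
      --master local[2] \
      --driver-memory 2g \
      target/scala-2.11/cassandra-row-count-assembly-0.1.jar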

