Python: multiple writes to different files


import pickle
import time

def save_dict(name, dict_to_save):
    stime = time.time()
    with open(name, 'wb') as output:
        pickle.dump(dict_to_save, output, 1)
    print 'done. (%.3f secs)' % (time.time() - stime)

class simpleobject(object):

    def __init__(self, name):
        self.name = name
        return

obj_dict1 = {}
obj_dict2 = {}
obj_dict3 = {}
for i in range(90000):
    if i < 30000:
        obj_dict1[i] = simpleobject(i)
    elif i < 60000:
        obj_dict2[i] = simpleobject(i)
    else:
        obj_dict3[i] = simpleobject(i)

save_dict('zzz.1', obj_dict1)
save_dict('zzz.2', obj_dict2)
save_dict('zzz.3', obj_dict3)

output:

done. (1.997 secs)
done. (2.067 secs)
done. (2.020 secs)

I want the writes to happen in parallel, so I've tried using threads:

import pickle
import time
import threading

def save_dict(name, dict_to_save):
    stime = time.time()
    with open(name, 'wb') as output:
        pickle.dump(dict_to_save, output, 1)
    print 'done. (%.3f secs)' % (time.time() - stime)

class simpleobject(object):

    def __init__(self, name):
        self.name = name
        return

obj_dict1 = {}
obj_dict2 = {}
obj_dict3 = {}
for i in range(90000):
    if i < 30000:
        obj_dict1[i] = simpleobject(i)
    elif i < 60000:
        obj_dict2[i] = simpleobject(i)
    else:
        obj_dict3[i] = simpleobject(i)

names = ['zzz.1', 'zzz.2', 'zzz.3']
dicts = [obj_dict1, obj_dict2, obj_dict3]
thrs = [threading.Thread(target=save_dict, args=(info, data)) for (info, data) in zip(names, dicts)]
for thr in thrs:
    thr.start()
for thr in thrs:
    thr.join()

output:

done. (10.761 secs)
done. (11.283 secs)
done. (11.286 secs)

but it took even more time; I assume this is due to the GIL?

I've tried using multiprocessing, but I got:

  file "multiwrite.py", line 30, in <module>     pool = multiprocessing.pool(processes=4)   file "/usr/lib64/python2.6/multiprocessing/__init__.py", line 227, in pool     return pool(processes, initializer, initargs)   file "/usr/lib64/python2.6/multiprocessing/pool.py", line 84, in __init__     self._setup_queues()   file "/usr/lib64/python2.6/multiprocessing/pool.py", line 131, in _setup_queues     self._inqueue = simplequeue()   file "/usr/lib64/python2.6/multiprocessing/queues.py", line 328, in __init__     self._rlock = lock()   file "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 117, in __init__     semlock.__init__(self, semaphore, 1, 1)   file "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 49, in __init__     sl = self._semlock = _multiprocessing.semlock(kind, value, maxvalue) oserror: [errno 13] permission denied 

So I've tried to use os.fork(), but I wasn't successful at it.
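For reference, here is a minimal sketch of the direction I was going with fork (an illustration assuming a POSIX system, not the exact code I ran): each child process writes one file and exits, and the parent waits for all of them.

import os
import pickle

def save_dict(name, dict_to_save):
    with open(name, 'wb') as output:
        pickle.dump(dict_to_save, output, 1)

def save_all_forked(names, dicts):
    pids = []
    for name, data in zip(names, dicts):
        pid = os.fork()
        if pid == 0:
            # child: write one file, then exit without running parent cleanup
            save_dict(name, data)
            os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)   # parent: wait for every child to finish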

Any suggestions on how to make the writes happen in parallel?

Writing several files simultaneously only makes sense if you spend more time computing the data than writing it, or if the files are on different physical devices.

Both HDDs and SSDs work much better with sequential access. Doing interleaved I/O hurts performance (think of the write head constantly repositioning).

This is the probable cause. Go for sequential, streamed I/O where possible.

Also, instead of being I/O-bound, your task may be CPU-bound, and then Python's threading can hurt due to lock contention.
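If the bottleneck really is CPU-bound pickling rather than the disk, separate processes can sidestep the GIL where threads cannot. Here is a minimal sketch using multiprocessing.Process, assuming multiprocessing can create its synchronization primitives on your system (your traceback suggests that is currently failing):

import multiprocessing
import pickle

def save_dict(name, dict_to_save):
    # same helper as in the question
    with open(name, 'wb') as output:
        pickle.dump(dict_to_save, output, 1)

def save_all_mp(names, dicts):
    # one worker process per file; unlike threads, processes do not share the GIL
    procs = [multiprocessing.Process(target=save_dict, args=(n, d))
             for n, d in zip(names, dicts)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()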

Your program creates a relatively small amount of data and writes it to files. Chances are the OS first takes the data into the file system cache entirely, and only then writes it out. Most of the time in your code may be spent in pickle, which is CPU-bound and executes in one thread at a time. I've seen this in practice, and it is quite noticeable on complicated object graphs, even though your data is simple.
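To check where the time actually goes, you can time the serialization separately from the write. A minimal sketch (cPickle is the C implementation of pickle that ships with Python 2.x; substitute plain pickle if you prefer):

import cPickle as pickle   # C implementation, usually much faster on Python 2.x
import time

def profiled_save(name, dict_to_save):
    t0 = time.time()
    payload = pickle.dumps(dict_to_save, 1)   # CPU-bound: serialize in memory
    t1 = time.time()
    with open(name, 'wb') as output:
        output.write(payload)                 # I/O-bound: hand the bytes to the OS
    t2 = time.time()
    print 'pickle: %.3f secs, write: %.3f secs' % (t1 - t0, t2 - t1)

If the pickle figure dominates, threads will not help because of the GIL; if the write figure dominates, parallel writes to the same disk will not help either, since the device prefers sequential access.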

