Python: multiple writes to different files
import pickle
import time

def save_dict(name, dict_to_save):
    stime = time.time()
    with open(name, 'wb') as output:
        pickle.dump(dict_to_save, output, 1)
    print 'done. (%.3f secs)' % (time.time() - stime)

class SimpleObject(object):
    def __init__(self, name):
        self.name = name
        return

obj_dict1 = {}
obj_dict2 = {}
obj_dict3 = {}

for i in range(90000):
    if i < 30000:
        obj_dict1[i] = SimpleObject(i)
    elif i < 60000:
        obj_dict2[i] = SimpleObject(i)
    else:
        obj_dict3[i] = SimpleObject(i)

save_dict('zzz.1', obj_dict1)
save_dict('zzz.2', obj_dict2)
save_dict('zzz.3', obj_dict3)
output:
done. (1.997 secs)
done. (2.067 secs)
done. (2.020 secs)
I want the writes to happen in parallel, so I've tried using threads:
import pickle
import time
import threading

def save_dict(name, dict_to_save):
    stime = time.time()
    with open(name, 'wb') as output:
        pickle.dump(dict_to_save, output, 1)
    print 'done. (%.3f secs)' % (time.time() - stime)

class SimpleObject(object):
    def __init__(self, name):
        self.name = name
        return

obj_dict1 = {}
obj_dict2 = {}
obj_dict3 = {}

for i in range(90000):
    if i < 30000:
        obj_dict1[i] = SimpleObject(i)
    elif i < 60000:
        obj_dict2[i] = SimpleObject(i)
    else:
        obj_dict3[i] = SimpleObject(i)

names = ['zzz.1', 'zzz.2', 'zzz.3']
dicts = [obj_dict1, obj_dict2, obj_dict3]

thrs = [threading.Thread(target=save_dict, args=(info, data))
        for (info, data) in zip(names, dicts)]
for thr in thrs:
    thr.start()
for thr in thrs:
    thr.join()
output:
done. (10.761 secs)
done. (11.283 secs)
done. (11.286 secs)
But it took even more time; I'm assuming that's due to the GIL?
I've tried using multiprocessing, but got:
file "multiwrite.py", line 30, in <module> pool = multiprocessing.pool(processes=4) file "/usr/lib64/python2.6/multiprocessing/__init__.py", line 227, in pool return pool(processes, initializer, initargs) file "/usr/lib64/python2.6/multiprocessing/pool.py", line 84, in __init__ self._setup_queues() file "/usr/lib64/python2.6/multiprocessing/pool.py", line 131, in _setup_queues self._inqueue = simplequeue() file "/usr/lib64/python2.6/multiprocessing/queues.py", line 328, in __init__ self._rlock = lock() file "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 117, in __init__ semlock.__init__(self, semaphore, 1, 1) file "/usr/lib64/python2.6/multiprocessing/synchronize.py", line 49, in __init__ sl = self._semlock = _multiprocessing.semlock(kind, value, maxvalue) oserror: [errno 13] permission denied
So I've then tried to use os.fork(), but wasn't successful at it.
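For reference, the shape of a fork-based version would be roughly this (only a sketch of the idea, not my working code; save_all_forked is just an illustrative name, and it is Unix-only):

import os
import pickle

def save_dict(name, dict_to_save):
    with open(name, 'wb') as output:
        pickle.dump(dict_to_save, output, 1)

def save_all_forked(names, dicts):
    # Fork one child per file; each child writes its dict and exits.
    pids = []
    for name, data in zip(names, dicts):
        pid = os.fork()
        if pid == 0:
            save_dict(name, data)
            os._exit(0)   # leave the child without running the parent's remaining code
        pids.append(pid)
    # Parent: wait for every child to finish.
    for pid in pids:
        os.waitpid(pid, 0)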
Any suggestions on how to make the writes happen in parallel?
Writing several files simultaneously only makes sense if you spend more time computing the data than writing it, or if the files live on different physical devices.
Both HDDs and SSDs work much better with sequential access; doing interleaved I/O hurts performance (think of the write head constantly repositioning).
This is the probable cause. Go with sequential, streamed I/O where possible.
Also, if the task is CPU-bound rather than I/O-bound, Python's threading can actively hurt because of GIL contention.
Your program creates a relatively small amount of data and writes it to files. Chances are the OS first takes the data into the file system cache entirely and only then writes it out, so most of the time in your code may be spent in pickle, which is CPU-bound and executes one thread at a time. I've seen this in practice, and it's quite noticeable on complicated object graphs, even though your data are simple.
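If pickling really is the bottleneck, one way to sidestep the GIL without the Pool constructor that hit the permission error above is to start plain multiprocessing.Process workers, one per file. A minimal sketch under that assumption (on Unix each forked child inherits its dictionary directly, so the arguments are not copied through a queue):

import multiprocessing
import pickle
import time

def save_dict(name, dict_to_save):
    stime = time.time()
    with open(name, 'wb') as output:
        pickle.dump(dict_to_save, output, 1)
    print('done. (%.3f secs)' % (time.time() - stime))

class SimpleObject(object):
    def __init__(self, name):
        self.name = name

if __name__ == '__main__':
    # Same data as in the question, built in one pass.
    dicts = [{}, {}, {}]
    for i in range(90000):
        dicts[i // 30000][i] = SimpleObject(i)
    names = ['zzz.1', 'zzz.2', 'zzz.3']

    # One worker process per file: each child pickles and writes its own
    # dictionary, outside the parent's GIL.
    procs = [multiprocessing.Process(target=save_dict, args=(name, data))
             for name, data in zip(names, dicts)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Whether this actually helps depends on whether pickling or the disk dominates; on a single disk the three writes will still end up interleaved.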