java - Data inconsistency using ConcurrentHashMap
The count changes on every run for the same set of files. The following code is still not data consistent. How can I make it thread safe? It is simple word count code.
package concurrenthashmapdemo;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class FileReaderTask implements Runnable {
    private String filePath;
    private String fileName;
    private ConcurrentMap<String, Integer> wordCountMap;

    public FileReaderTask(String filePath, String fileName, ConcurrentMap<String, Integer> wordCountMap) {
        this.filePath = filePath;
        this.fileName = fileName;
        this.wordCountMap = wordCountMap;
    }

    public void run() {
        File jobFile = new File(filePath + fileName);
        try {
            BufferedReader bReader = new BufferedReader(new FileReader(jobFile));
            String line = "";
            while ((line = bReader.readLine()) != null) {
                String[] strArray = line.split(" ");
                for (String str : strArray) {
                    if (wordCountMap.containsKey(str)) {
                        wordCountMap.replace(str.trim(), wordCountMap.get(str.trim()) + 1);
                    } else {
                        wordCountMap.putIfAbsent(str.trim(), 1);
                    }
                }
            }
            //Thread.sleep(10000);
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

public class Main {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> wordCountMap = new ConcurrentHashMap<String, Integer>();
        File fileDir = new File("c://job_files");
        Thread[] threads = new Thread[fileDir.listFiles().length];
        for (int i = 0; i < threads.length; i++) {
            FileReaderTask frt = new FileReaderTask("c:/job_files/", fileDir.listFiles()[i].getName(), wordCountMap);
            threads[i] = new Thread(frt);
            threads[i].start();
        }
        for (int i = 0; i < threads.length; i++) {
            try {
                threads[i].join();
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            String key = entry.getKey();
            System.out.println(key + " - - " + wordCountMap.get(key));
        }
        System.out.println("main");
    }
}
The concurrent containers ensure their own internal consistency (for example, not adding the same key twice), but they do nothing to protect the stored values. Your code as it stands has a race condition: another thread can increment the counter between your call to get() and your call to replace(). Your replace() then puts a stale value in the map, losing the increment performed by the other thread.
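To make the lost update concrete, here is a small sketch that replays one bad interleaving of two threads inside a single thread, so the result is the same on every run. The class name LostUpdateDemo, the method replayLostUpdate and the key "word" are made up for illustration and are not part of the question's code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class LostUpdateDemo {

    // Replays, step by step, what can happen when threads A and B both
    // run "get, add one, replace" on the same key at the same time.
    static int replayLostUpdate() {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        counts.put("word", 1);

        Integer seenByA = counts.get("word");   // A reads 1
        Integer seenByB = counts.get("word");   // B reads 1 before A writes

        counts.replace("word", seenByB + 1);    // B writes 2
        counts.replace("word", seenByA + 1);    // A also writes 2, B's increment is lost

        return counts.get("word");
    }

    public static void main(String[] args) {
        // Two increments on top of 1 should give 3.
        System.out.println("expected 3, got " + replayLostUpdate());
    }
}
```

Each individual map call here is thread safe; the count is wrong only because the read and the write are two separate steps.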
You need to make the increment atomic. Something like this, which uses the three-argument version of replace() that ensures the value in the map is still the same before performing the replacement:
str = str.trim();
while (true) {
    Integer oldValue = wordCountMap.putIfAbsent(str, 1);
    if (oldValue != null) {
        if (wordCountMap.replace(str, oldValue, oldValue + 1))
            break; // incremented existing count
    } else {
        break; // added new count of 1
    }
}
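If you want to check the retry loop in isolation, the following is a minimal self-contained sketch. It wraps the same putIfAbsent/replace loop in a helper and hammers one key from several threads. The names AtomicIncrementDemo, increment and countWithThreads, and the thread and iteration counts, are assumptions chosen for the demo, not part of the original program.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AtomicIncrementDemo {

    // Same idea as the loop above: retry until either the key is inserted
    // with 1, or the compare-and-replace of oldValue -> oldValue + 1 succeeds.
    static void increment(ConcurrentMap<String, Integer> map, String key) {
        while (true) {
            Integer oldValue = map.putIfAbsent(key, 1);
            if (oldValue == null) {
                return; // added new count of 1
            }
            if (map.replace(key, oldValue, oldValue + 1)) {
                return; // incremented existing count
            }
            // Another thread changed the value in between; try again.
        }
    }

    // Starts threadCount threads that each increment the same key
    // incrementsPerThread times, then returns the final count.
    static int countWithThreads(int threadCount, int incrementsPerThread) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    increment(counts, "word");
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counts.get("word");
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(8, 10000)); // prints 80000
    }
}
```

On Java 8 or later, wordCountMap.merge(str, 1, Integer::sum) should give the same atomic increment in a single call, since ConcurrentHashMap performs merge atomically.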