java - Data inconsistency using ConcurrentHashMap

September 15, 2012

The word counts change on every run for the same set of files. The following code uses a ConcurrentHashMap, but the data is still not consistent. How do I make it thread safe? It is a simple word count program.

    package concurrenthashmapdemo;

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    class FileReaderTask implements Runnable {
        private String filePath;
        private String fileName;
        private ConcurrentMap<String, Integer> wordCountMap;

        public FileReaderTask(String filePath, String fileName,
                ConcurrentMap<String, Integer> wordCountMap) {
            this.filePath = filePath;
            this.fileName = fileName;
            this.wordCountMap = wordCountMap;
        }

        public void run() {
            File jobFile = new File(filePath + fileName);
            try {
                BufferedReader bReader = new BufferedReader(new FileReader(jobFile));
                String line = "";
                while ((line = bReader.readLine()) != null) {
                    String[] strArray = line.split(" ");
                    for (String str : strArray) {
                        if (wordCountMap.containsKey(str)) {
                            wordCountMap.replace(str.trim(),
                                    wordCountMap.get(str.trim()) + 1);
                        } else {
                            wordCountMap.putIfAbsent(str.trim(), 1);
                        }
                    }
                }
                //Thread.sleep(10000);
            } catch (Exception e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

    public class Main {
        public static void main(String[] args) {
            ConcurrentMap<String, Integer> wordCountMap = new ConcurrentHashMap<String, Integer>();
            File fileDir = new File("c://job_files");
            Thread[] threads = new Thread[fileDir.listFiles().length];
            for (int i = 0; i < threads.length; i++) {
                FileReaderTask frt = new FileReaderTask("c:/job_files/", fileDir.listFiles()[i].getName(), wordCountMap);
                threads[i] = new Thread(frt);
                threads[i].start();
            }

            //
            for (int i = 0; i < threads.length; i++) {
                try {
                    threads[i].join();
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }

            for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
                String key = entry.getKey();
                System.out.println(key + " - - " + wordCountMap.get(key));
            }
            System.out.println("main");
        }
    }

The concurrent containers ensure their own internal consistency (for example, they never add the same key twice), but they do nothing to protect the stored values. Your code as it stands has a race condition. Another thread can increment the counter between your call to get and your call to replace. Your replace then puts a stale value in the map, losing the increment performed by the other thread.
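To make the lost update concrete, here is a minimal sketch that replays the problematic interleaving on a single thread, so the outcome is deterministic. The class name LostUpdateDemo, the method replayInterleaving, and the two imaginary threads A and B are illustrative assumptions, not part of the original code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class LostUpdateDemo {
    // Replays, on one thread, an interleaving of two imaginary threads A and B
    // that both run the get-then-replace increment from the question.
    static int replayInterleaving() {
        ConcurrentMap<String, Integer> wordCountMap = new ConcurrentHashMap<String, Integer>();
        wordCountMap.put("word", 1);

        int seenByA = wordCountMap.get("word"); // A reads 1
        int seenByB = wordCountMap.get("word"); // B reads 1 before A has written

        wordCountMap.replace("word", seenByA + 1); // A writes 2
        wordCountMap.replace("word", seenByB + 1); // B also writes 2, so A's increment is lost

        return wordCountMap.get("word");
    }

    public static void main(String[] args) {
        // Two increments on a count of 1 should give 3, but this prints 2.
        System.out.println(replayInterleaving());
    }
}
```

Each individual map call is thread safe here. The problem is that the read and the write are two separate calls, and nothing stops another thread from running in between them.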

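You need to make your increment atomic. Something like this would do it. It uses the three-argument version of replace, which performs the replacement only if the value in the map is still the one you read, and retries otherwise: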

    str = str.trim();
    while (true) {
        Integer oldValue = wordCountMap.putIfAbsent(str, 1);
        if (oldValue != null) {
            if (wordCountMap.replace(str, oldValue, oldValue + 1))
                break; // incremented existing count
        } else {
            break; // added new count of 1
        }
    }
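As a rough way to check the retry loop above under contention, the sketch below wraps it in a helper and has several threads increment the same key. The names AtomicIncrementDemo, incrementWithRetry, incrementWithMerge and hammer are made up for illustration. The merge variant is an alternative that assumes Java 8 or later, which postdates this post; ConcurrentHashMap documents merge as being performed atomically.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class AtomicIncrementDemo {
    // The retry loop from the answer, wrapped in a hypothetical helper.
    static void incrementWithRetry(ConcurrentMap<String, Integer> map, String key) {
        while (true) {
            Integer oldValue = map.putIfAbsent(key, 1);
            if (oldValue == null) {
                return; // added a new count of 1
            }
            if (map.replace(key, oldValue, oldValue + 1)) {
                return; // incremented the existing count
            }
            // Another thread changed the value first; read it again and retry.
        }
    }

    // Java 8+ alternative (an assumption about the runtime): merge does the
    // whole read-modify-write as one atomic operation.
    static void incrementWithMerge(ConcurrentMap<String, Integer> map, String key) {
        map.merge(key, 1, Integer::sum);
    }

    // Starts threadCount threads that each increment the same key perThread
    // times, waits for all of them, and returns the final count.
    static int hammer(int threadCount, int perThread, boolean useMerge) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    if (useMerge) {
                        incrementWithMerge(map, "word");
                    } else {
                        incrementWithRetry(map, "word");
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return map.get("word");
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10000, false)); // retry loop: 40000
        System.out.println(hammer(4, 10000, true));  // merge: 40000
    }
}
```

Both variants should report exactly threadCount * perThread, because no increment can be overwritten by a stale value. Storing AtomicInteger values in the map is another common option, but it changes the map's value type, so it is not shown here.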
