java - How is Guava Splitter.onPattern(..).split() different from String.split(..)? -


I recently harnessed the power of a look-ahead regular expression to split a String:

"abc8".split("(?=\\d)|\\W")

If printed to the console, this expression returns:

[abc, 8] 

Very pleased with this result, I wanted to transfer it to Guava for further development, which looked like this:

Splitter.onPattern("(?=\\d)|\\W").split("abc8")

To my surprise, the output changed to:

[abc] 

Why?

You found a bug!

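// s is a Splitter built from the pattern in the question
Splitter s = Splitter.onPattern("(?=\\d)|\\W");
System.out.println(s.split("abc82")); // [abc, 8]
System.out.println(s.split("abc8"));  // [abc]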
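To see what both splitters have to work with, it helps to list where the pattern actually matches. The sketch below uses only java.util.regex; the class and method names (LookaheadMatchDemo, matchSpans) are made up for illustration. It shows that the only match in "abc8" is an empty one at index 3, right before the digit, which is where String.split cuts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Guava or the JDK.
class LookaheadMatchDemo {
    // Returns every match of regex in input as "start-end" (end exclusive).
    static List<String> matchSpans(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        String regex = "(?=\\d)|\\W";
        // A single zero-width match just before the '8'.
        System.out.println(matchSpans("abc8", regex));            // [3-3]
        // String.split cuts at that position and keeps both sides.
        System.out.println(Arrays.toString("abc8".split(regex))); // [abc, 8]
    }
}
```

So the separator is zero-width and sits directly in front of the last character, which is exactly the situation the code below mishandles.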

This is the method that Splitter uses to actually split Strings (Splitter.SplittingIterator::computeNext):

@Override
protected String computeNext() {
  /*
   * The returned string will be from the end of the last match to the
   * beginning of the next one. nextStart is the start position of the
   * returned substring, while offset is the place to start looking for a
   * separator.
   */
  int nextStart = offset;
  while (offset != -1) {
    int start = nextStart;
    int end;

    int separatorPosition = separatorStart(offset);

    if (separatorPosition == -1) {
      end = toSplit.length();
      offset = -1;
    } else {
      end = separatorPosition;
      offset = separatorEnd(separatorPosition);
    }

    if (offset == nextStart) {
      /*
       * This occurs when some pattern has an empty match, even if it
       * doesn't match the empty string -- for example, if it requires
       * lookahead or the like. The offset must be increased to look for
       * separators beyond this point, without changing the start position
       * of the next returned substring -- so nextStart stays the same.
       */
      offset++;
      if (offset >= toSplit.length()) {
        offset = -1;
      }
      continue;
    }

    while (start < end && trimmer.matches(toSplit.charAt(start))) {
      start++;
    }
    while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
      end--;
    }

    if (omitEmptyStrings && start == end) {
      // Don't include the (unused) separator in next split string.
      nextStart = offset;
      continue;
    }

    if (limit == 1) {
      // The limit has been reached, return the rest of the string as the
      // final item.  This is tested after empty string removal so that
      // empty strings do not count towards the limit.
      end = toSplit.length();
      offset = -1;
      // Since we may have changed the end, we need to trim it again.
      while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
        end--;
      }
    } else {
      limit--;
    }

    return toSplit.subSequence(start, end).toString();
  }
  return endOfData();
}

The area of interest is:

if (offset == nextStart) {
  /*
   * This occurs when some pattern has an empty match, even if it
   * doesn't match the empty string -- for example, if it requires
   * lookahead or the like. The offset must be increased to look for
   * separators beyond this point, without changing the start position
   * of the next returned substring -- so nextStart stays the same.
   */
  offset++;
  if (offset >= toSplit.length()) {
    offset = -1;
  }
  continue;
}

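This logic works great, unless the empty match happens right before the last character of the String. In that case, offset++ lands exactly on toSplit.length(), the >= check sets offset to -1, and the loop exits without ever returning that last character. What this part should look like is (notice >= -> >):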

if (offset == nextStart) {
  /*
   * This occurs when some pattern has an empty match, even if it
   * doesn't match the empty string -- for example, if it requires
   * lookahead or the like. The offset must be increased to look for
   * separators beyond this point, without changing the start position
   * of the next returned substring -- so nextStart stays the same.
   */
  offset++;
  if (offset > toSplit.length()) {
    offset = -1;
  }
  continue;
}
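The effect of that one-character change can be reproduced without Guava on the classpath. What follows is only a simplified model of the loop above, written against java.util.regex: it leaves out the trimmer, limit and omitEmptyStrings handling, and the names SplitterSkipDemo, split and fixedCheck are made up for this sketch rather than taken from Guava.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified model of Splitter.SplittingIterator::computeNext.
class SplitterSkipDemo {
    // fixedCheck == false uses the original '>=' test, true uses '>'.
    static List<String> split(String toSplit, String regex, boolean fixedCheck) {
        Matcher matcher = Pattern.compile(regex).matcher(toSplit);
        List<String> parts = new ArrayList<>();
        int offset = 0;
        while (offset != -1) {
            int nextStart = offset; // start of the next returned piece
            String piece = null;
            while (offset != -1) {
                int end;
                if (matcher.find(offset)) {
                    end = matcher.start();   // plays the role of separatorStart
                    offset = matcher.end();  // plays the role of separatorEnd
                } else {
                    end = toSplit.length();
                    offset = -1;
                }
                if (offset == nextStart) {
                    // Empty match at the current start: look further along.
                    offset++;
                    boolean pastEnd = fixedCheck
                            ? offset > toSplit.length()
                            : offset >= toSplit.length();
                    if (pastEnd) {
                        offset = -1;
                    }
                    continue;
                }
                piece = toSplit.substring(nextStart, end);
                break;
            }
            if (piece == null) {
                break; // corresponds to endOfData()
            }
            parts.add(piece);
        }
        return parts;
    }

    public static void main(String[] args) {
        String regex = "(?=\\d)|\\W";
        System.out.println(split("abc8", regex, false));  // [abc]
        System.out.println(split("abc82", regex, false)); // [abc, 8]
        System.out.println(split("abc8", regex, true));   // [abc, 8]
        System.out.println(split("abc82", regex, true));  // [abc, 8, 2]
    }
}
```

With >=, the model gives up as soon as offset reaches the length of the String, so the last character is never returned; with >, one more search runs from the end of the String, finds no separator, and the remaining piece is returned.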
