java - How is Guava Splitter.onPattern(..).split() different from String.split(..)?
I recently harnessed the power of a look-ahead regular expression to split a String:

    "abc8".split("(?=\\d)|\\W")

If printed to the console, this expression returns:

    [abc, 8]

Very pleased with this result, I wanted to transfer it to Guava for further development, which looked like this:

    Splitter.onPattern("(?=\\d)|\\W").split("abc8")

To my surprise, the output changed to:

    [abc]

Why?
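The question hinges on the look-ahead alternative producing empty (zero-width) matches. The sketch below uses only java.util.regex to list every match as a start-end pair; the class name LookaheadMatches and the helper matchSpans are made up for illustration. For "abc8" there is a single empty match at index 3, just before the digit, which is why String.split yields [abc, 8].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadMatches {
    // Returns each match of the regex in the input as "start-end".
    // A zero-width (empty) match has start == end.
    static List<String> matchSpans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // The look-ahead matches the empty position right before each digit;
        // \W never matches here because the inputs contain only word characters.
        System.out.println(matchSpans("(?=\\d)|\\W", "abc8"));  // [3-3]
        System.out.println(matchSpans("(?=\\d)|\\W", "abc82")); // [3-3, 4-4]
    }
}
```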
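You found a bug!

    Splitter s = Splitter.onPattern("(?=\\d)|\\W");
    System.out.println(s.split("abc82")); // [abc, 8]
    System.out.println(s.split("abc8"));  // [abc]

In both cases the piece after the last empty match (the final digit) is silently dropped.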
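For contrast, the JDK's String.split with the same pattern keeps the piece after the last zero-width match. A minimal check, where the class name JdkSplitContrast and the helper jdkSplit are illustrative only:

```java
import java.util.Arrays;
import java.util.List;

class JdkSplitContrast {
    static final String REGEX = "(?=\\d)|\\W";

    // Splits with String.split, which returns the text after the last
    // zero-width match as a final piece.
    static List<String> jdkSplit(String input) {
        return Arrays.asList(input.split(REGEX));
    }

    public static void main(String[] args) {
        System.out.println(jdkSplit("abc82")); // [abc, 8, 2]
        System.out.println(jdkSplit("abc8"));  // [abc, 8]
    }
}
```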
This is the method that Splitter uses to split Strings (Splitter.SplittingIterator::computeNext):
    @Override
    protected String computeNext() {
      /*
       * The returned string will be from the end of the last match to the
       * beginning of the next one. nextStart is the start position of the
       * returned substring, while offset is the place to start looking for a
       * separator.
       */
      int nextStart = offset;
      while (offset != -1) {
        int start = nextStart;
        int end;
        int separatorPosition = separatorStart(offset);
        if (separatorPosition == -1) {
          end = toSplit.length();
          offset = -1;
        } else {
          end = separatorPosition;
          offset = separatorEnd(separatorPosition);
        }
        if (offset == nextStart) {
          /*
           * This occurs when some pattern has an empty match, even if it
           * doesn't match the empty string -- for example, if it requires
           * lookahead or the like. The offset must be increased to look for
           * separators beyond this point, without changing the start position
           * of the next returned substring -- so nextStart stays the same.
           */
          offset++;
          if (offset >= toSplit.length()) {
            offset = -1;
          }
          continue;
        }
        while (start < end && trimmer.matches(toSplit.charAt(start))) {
          start++;
        }
        while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
          end--;
        }
        if (omitEmptyStrings && start == end) {
          // Don't include the (unused) separator in the next split string.
          nextStart = offset;
          continue;
        }
        if (limit == 1) {
          // The limit has been reached, return the rest of the string as the
          // final item. This is tested after empty string removal so that
          // empty strings do not count towards the limit.
          end = toSplit.length();
          offset = -1;
          // Since we may have changed the end, we need to trim it again.
          while (end > start && trimmer.matches(toSplit.charAt(end - 1))) {
            end--;
          }
        } else {
          limit--;
        }
        return toSplit.subSequence(start, end).toString();
      }
      return endOfData();
    }
The area of interest is:

    if (offset == nextStart) {
      /*
       * This occurs when some pattern has an empty match, even if it
       * doesn't match the empty string -- for example, if it requires
       * lookahead or the like. The offset must be increased to look for
       * separators beyond this point, without changing the start position
       * of the next returned substring -- so nextStart stays the same.
       */
      offset++;
      if (offset >= toSplit.length()) {
        offset = -1;
      }
      continue;
    }
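This logic works well unless the empty match falls just before the last character of the String. In that case the increment makes offset equal to toSplit.length(), the >= check then sets offset to -1, and the loop exits, so the remaining text from nextStart onward (the "8" in "abc8") is never returned. This part should look like this instead (note the change from >= to >):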
    if (offset == nextStart) {
      /*
       * This occurs when some pattern has an empty match, even if it
       * doesn't match the empty string -- for example, if it requires
       * lookahead or the like. The offset must be increased to look for
       * separators beyond this point, without changing the start position
       * of the next returned substring -- so nextStart stays the same.
       */
      offset++;
      if (offset > toSplit.length()) {
        offset = -1;
      }
      continue;
    }
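To check this analysis without pulling in Guava, here is a minimal sketch that re-implements only the offset/nextStart loop of computeNext() on top of java.util.regex. It is not Guava's code: trimming, omitEmptyStrings and limit are left out, and the class name EmptyMatchSplit and the boolean fixed flag are hypothetical. With fixed set to false it uses the original >= check; with true it uses >.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchSplit {
    // Simplified sketch of the computeNext() loop (no trimmer, omitEmptyStrings
    // or limit). fixed == false uses ">=" in the end check, fixed == true uses ">".
    static List<String> split(String regex, String toSplit, boolean fixed) {
        Matcher matcher = Pattern.compile(regex).matcher(toSplit);
        List<String> pieces = new ArrayList<>();
        int offset = 0;
        // Each pass of the outer loop corresponds to one computeNext() call.
        while (offset != -1) {
            int nextStart = offset;
            String piece = null;
            while (offset != -1) {
                int start = nextStart;
                int end;
                if (matcher.find(offset)) {   // separatorStart(offset)
                    end = matcher.start();
                    offset = matcher.end();   // separatorEnd(...)
                } else {
                    end = toSplit.length();
                    offset = -1;
                }
                if (offset == nextStart) {
                    // Empty match at the start of the piece: search further on.
                    offset++;
                    boolean pastEnd = fixed ? offset > toSplit.length()
                                            : offset >= toSplit.length();
                    if (pastEnd) {
                        offset = -1;
                    }
                    continue;
                }
                piece = toSplit.substring(start, end);
                break;
            }
            if (piece == null) {
                break; // corresponds to endOfData()
            }
            pieces.add(piece);
        }
        return pieces;
    }

    public static void main(String[] args) {
        String regex = "(?=\\d)|\\W";
        System.out.println(split(regex, "abc8", false));  // [abc]
        System.out.println(split(regex, "abc8", true));   // [abc, 8]
        System.out.println(split(regex, "abc82", false)); // [abc, 8]
        System.out.println(split(regex, "abc82", true));  // [abc, 8, 2]
    }
}
```

The > variant is safe because Matcher.find(int) accepts any start index from 0 up to and including the input length. Searching from toSplit.length() is legal and simply finds no further separator, so the final piece is returned before offset becomes -1.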