Wednesday, March 30, 2011

Test, Test, Test

I am working on some of the relational code in our server and decided to test performance of a small function to see which alternative was faster.  The function tests if a string contains all whitespace and I thought one alternative may be much faster than the other. 

The first alternative uses regular expressions.  It does look more elegant, but I didn't look at the Java source code to see what it is actually doing.  Further, I don't use regular expressions much and thought I would take the opportunity to see if it would help me in my quest for performance.  Regular expressions can do some complex processing, so it makes sense they may not be real fast, but for some reason I thought this option may be better. 

The second alternative merely trims the string and evaluates whether the length of the trimmed string is zero (in which case, of course, the original string was all spaces).  This alternative creates a new String object each time it is called and thus I thought it may have some performance problems.

Here is the code:

public static void main(String[] args) {
  performanceTest("   ", 1000000, 1);
  performanceTest("   ", 1000000, 2);
  performanceTest(" bogus ", 1000000, 1);
  performanceTest(" bogus ", 1000000, 2);
}

static boolean isWhiteSpaces( String s ) {
  return s != null && s.matches("\\s*");
}

static boolean isWhiteSpaces2( String s ) {
  return s.trim().length() == 0;
}

public static void performanceTest(String s, int loops, int version) {
  long start = System.currentTimeMillis();

  for (int i = 0; i < loops; i++) {
    if (version == 1) {
      isWhiteSpaces(s);
    } else {
      isWhiteSpaces2(s);
    }
  }

  System.out.println("Total time for " + loops + 
                     " loops using algorithm #" + 
                     version + ": " + 
                 (System.currentTimeMillis() - start));
}

I was surprised at the difference in the performance:
Total time for 1000000 loops using algorithm #1: 1042
Total time for 1000000 loops using algorithm #2: 41
Total time for 1000000 loops using algorithm #1: 991
Total time for 1000000 loops using algorithm #2: 41
The second option, which trims the string, then compares the length, wins hands down.  Now I know which option I will use in my code.

3 comments:

Robb Salzmann said...

Hi Tim,

That makes sense - great tip. I think beneath the covers Java stores Strings as char arrays, so anytime you can get away with just using .length() to satisfy your parse, it should be as fast as myArray.length . Java (documentation claim that it will) try to optimize string mangling in its regex libraries, but for something simple like this, your idea is elegant and _easy_!. Leave regex's for the PERL nerds :)

Tim Tow said...

Actually, the trim() method returns a new String object as the String object itself is immutable. Generally, the best practice is to not create new objects due to performance reasons, but in this case, it seems to be the much more performant option. As I say, test, test, test!

Tim

Jason said...

Tim, you're absolutely right. Using regex's in this case is slow because the regex parser is a relatively heavy object. Using trim() creates a new String object which is also relatively heavy. Yet there is even another way to go that gets what you want without creating an intermediary String. It is more code than your solution, but under the covers it is probably less code, and my quick and dirty performance test indicates that it's a winner on my machine.

I have introduced a third test to your code, and to make the performance comparison even more dramatic, increased the iterations by a factor of 10. Here is the new function:


static boolean isWhiteSpaces3(String s) {
for (int i = 0; i < s.length(); i++) {
if (s.charAt(i) != ' ') {
return false;
}
}
return true;
}

And the results on my machine:

Total time for 10000000 loops using algorithm #1: 2722
Total time for 10000000 loops using algorithm #2: 92
Total time for 10000000 loops using algorithm #3: 66
Total time for 10000000 loops using algorithm #1: 2694
Total time for 10000000 loops using algorithm #2: 135
Total time for 10000000 loops using algorithm #3: 50

Note that, of course, my solution is ever so slightly different than yours because it only looks for a space character, not any sort of white space (tab, carriage return, new line).