Java's String Class Contains Something Utterly Evil

   Before I unveil the atrocities of the String class in Java, we are going to do a simple test. To get the most accurate results, try to do it as if it were a normal task at work. Since you know this is a test, you will probably not do it exactly like a normal task, but at least try not to read the source code of the Java methods you are going to call. So here is the task: write a method that, given a string, replaces all occurrences of the substring "..." with the string ",,,". For example, given the input " a.,...b...c." the method should return " a.,,,,b,,,c.".



Ready?



Was it easy?



Did you need to fix it after the initial test run?



Did you need to consult the documentation?



Well, I believe this is enough scrolling already :)



   If you have a little experience, you are probably thinking: "What a noob question. Of course I will remember to escape the '.' because replaceAll takes a regex, and yes, I DO know that when escaping characters in a regex inside a Java string literal I need two backslashes instead of one."

   Chances are most Java developers wrote something like this:

   input.replaceAll("\\.\\.\\.", ",,,")
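To see why the escaping matters, here is a small sketch comparing the escaped pattern with an unescaped "..." that is easy to type by accident. The class and method names are made up for illustration; the only real API involved is String.replaceAll.

```java
public class EscapeDemo {
    // Unescaped: in a regex "." matches ANY character, so "..." eats any three characters
    static String unescaped(String input) {
        return input.replaceAll("...", ",,,");
    }

    // "\\." in Java source becomes the regex \. which matches only a literal dot
    static String escaped(String input) {
        return input.replaceAll("\\.\\.\\.", ",,,");
    }

    public static void main(String[] args) {
        String input = " a.,...b...c.";
        System.out.println("[" + unescaped(input) + "]"); // prints a run of 12 commas followed by one dot
        System.out.println("[" + escaped(input) + "]");   // [ a.,,,,b,,,c.]
    }
}
```

The 13-character input is consumed in four three-character chunks by the unescaped pattern, which is why only the final dot survives.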

   Maybe you are an even more advanced developer and you are thinking: "This is ugly. Of course I would use the quote method of the Pattern class to avoid all these stupid backslashes." Maybe you wrote code like this:

   input.replaceAll(Pattern.quote("..."), ",,,")
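Here is a small sketch of what Pattern.quote actually produces, assuming the text being quoted contains no "\E". It wraps the text in \Q...\E, which the regex engine treats as a literal run. The class and method names are illustrative.

```java
import java.util.regex.Pattern;

public class QuoteDemo {
    // Pattern.quote wraps its argument in \Q...\E so every character inside is literal
    static String quotedPattern(String literal) {
        return Pattern.quote(literal);
    }

    static String replaceQuoted(String input) {
        return input.replaceAll(quotedPattern("..."), ",,,");
    }

    public static void main(String[] args) {
        System.out.println(quotedPattern("..."));                       // \Q...\E
        System.out.println("[" + replaceQuoted(" a.,...b...c.") + "]"); // [ a.,,,,b,,,c.]
    }
}
```

Note that Pattern.quote only protects the pattern. The replacement argument of replaceAll is still interpreted.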

   I have asked several professional Java developers with years of experience this question, and all of them went for one of the above answers. If you did too, you failed the test. In fact, it is not your fault. It is the String public API that failed; you just did the most logical thing.

   The great evil, the unspeakable Lovecraftian horror of the String class, is the replaceAll method. First of all, what evil mind decided to pollute the String public API with regular expressions? When working with String I expect simple string operations. If I wanted regular expressions, I would head for the Pattern class. Simple substring replacement is needed much more often than regex replacement, yet replaceAll makes you do all this escaping. By the way, did you know that the replacement string needs escaping too? In the replacement, "$" introduces a group reference and "\" is an escape character, so every literal "$" needs a couple of backslashes in front of it in Java source. Try leaving them out and see what happens. If for some bizarre reason you do believe that regular expressions belong in String's public API, then why stop at replaceAll, replaceFirst, split and matches? Why not an overload of substring that returns the substring matched by a regular expression?
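Here is a minimal sketch of the replacement-string trap. It uses a made-up price string with a trailing "$" as the replacement. The helper names are hypothetical; Matcher.quoteReplacement is the real JDK method for escaping a replacement string.

```java
import java.util.regex.Matcher;

public class DollarDemo {
    // Unquoted: in replaceAll's replacement "$" starts a group reference,
    // so a trailing "$" with no group number makes the call throw IllegalArgumentException
    static String rawReplace(String input, String replacement) {
        return input.replaceAll("\\.\\.\\.", replacement);
    }

    // Quoted: Matcher.quoteReplacement escapes "$" and "\" so they are inserted literally
    static String quotedReplace(String input, String replacement) {
        return input.replaceAll("\\.\\.\\.", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        try {
            System.out.println(rawReplace("price: ...", "5 US$"));
        } catch (IllegalArgumentException e) {
            System.out.println("unquoted replacement failed: " + e.getMessage());
        }
        System.out.println(quotedReplace("price: ...", "5 US$")); // price: 5 US$
    }
}
```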

   However, this is not the big problem. The big problem is the name of this method, which obscures the much more useful replace method. If you want to replace all occurrences of a substring in a string, you are probably heading for replace, but the moment you see replaceAll pop up in auto complete you head straight for it. It makes sense, right? Once again, I will try to guess what the Java developer reading this is thinking right now: "Wait! What replace method are you talking about? It does not do the job!" I am referring to the overload of replace added in Java 1.5, more than six years ago, with the signature public String replace(CharSequence target, CharSequence replacement). You have never heard of CharSequence? It turns out it is an interface implemented by String, StringBuilder and some others. This is cool from an OOP point of view, but it does not make the method easy to spot when you have two strings and are scanning auto complete for a suitable method. This is especially true when replaceAll, which conveniently takes two strings, is right beneath it. Here is what most developers would probably have done if replaceAll did not exist or had a different name, like replaceRegex:

   input.replace("...", ",,,")

Nice and simple.
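For completeness, here is a sketch of the whole task using the CharSequence overload. replaceDots is just a hypothetical name for the method the test asked for. The second call shows that, because the parameters are declared as CharSequence, a StringBuilder can be passed where a String would normally go.

```java
public class ReplaceDots {
    // replace(CharSequence, CharSequence), Java 1.5+: neither argument is a regex,
    // so no escaping is needed in the target or in the replacement
    static String replaceDots(String input) {
        return input.replace("...", ",,,");
    }

    public static void main(String[] args) {
        System.out.println("[" + replaceDots(" a.,...b...c.") + "]"); // [ a.,,,,b,,,c.]

        // Any CharSequence works as the target, e.g. a StringBuilder
        StringBuilder target = new StringBuilder("...");
        System.out.println("x...y".replace(target, ",,,")); // x,,,y
    }
}
```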

   At first I was going to point out the inevitable, albeit small, performance hit of spinning up a regular expression engine and instantiating all the necessary classes just to do a simple string replace. As it turns out, though, the replace implementation itself just calls replaceAll with escaped arguments: it compiles the target with the Pattern.LITERAL flag and passes the replacement through Matcher.quoteReplacement. A special implementation of the regular expression engine is used when the pattern contains only literal characters and no special ones. Even so, it still instantiates a double-digit number of objects and the call stack goes quite deep. This is not necessarily what you want from a simple replace method in something as fundamental as the String class.
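To illustrate that a literal replace does not need any regex machinery, here is a sketch of a hypothetical literalReplace helper built only on indexOf and StringBuilder. It is not the JDK's code, and I have not benchmarked it against replace. It just shows the kind of simple loop one might expect behind a plain substring replace.

```java
public class LiteralReplace {
    // Hypothetical helper (not a JDK method): replaces every non-overlapping
    // occurrence of target, scanning left to right with indexOf.
    // No Pattern or Matcher objects are created.
    static String literalReplace(String input, String target, String replacement) {
        if (target.isEmpty()) {
            // Out of scope for this sketch; String.replace inserts the replacement
            // at every position (including start and end) when the target is empty
            throw new IllegalArgumentException("target must not be empty");
        }
        int match = input.indexOf(target);
        if (match < 0) {
            return input; // nothing to replace, reuse the original string
        }
        StringBuilder sb = new StringBuilder(input.length());
        int from = 0;
        while (match >= 0) {
            sb.append(input, from, match); // text before the match
            sb.append(replacement);        // replacement instead of the match
            from = match + target.length();
            match = input.indexOf(target, from);
        }
        sb.append(input, from, input.length()); // tail after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = " a.,...b...c.";
        String mine = literalReplace(input, "...", ",,,");
        System.out.println("[" + mine + "]");                         // [ a.,,,,b,,,c.]
        System.out.println(mine.equals(input.replace("...", ",,,"))); // true
    }
}
```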

   When the replaceAll method was added in Java 1.4, probably no one gave it much thought. I doubt the creators of Java expected that the platform would become so big and be used by so many developers. They probably had a deadline to meet, so do not blame them too much. Just use the replace method, and stop writing stupid articles all over the Internet on how to replace strings using replaceAll. Right now the Google results for "replace substring java" make no mention of the replace method. Several mention replaceAll and some provide their own implementations of replace, but I hope this will change in the future.

   Please post how you solved the problem in the comments and be honest! I am curious what the results will be.
Tags:   english programming 
Posted by:   Stilgar
13:11 28.02.2011

Comments:

Posted by   JOKe (Unregistered)   on   15:17 28.02.2011

ok, let me shed some light, because I was not one of the Java developers you asked about replace.. :( huh.

so...
the replace method with 2 chars: since the beginning of time :)
the replace method with 2 CharSequences: since 1.5 !!!! not many people are aware of it, this is true.
replaceAll: since 1.4 ! it was called "All" because there was already a replace method with 2 chars at that time, so they chose a new name. (keep in mind Pattern and the regexp shits are also since 1.4).

so in 1.5 I think some developer from Sun decided that these 2 methods don't do the simple task you are talking about, so he added a replace with this CharSequence... clear, right?

Posted by   Stilgar   on   15:36 28.02.2011

WTF? They added replaceAll in 1.4? How were they replacing substrings before that?

Anyway my point is that the bad naming and some other unfortunate circumstances obscure the much more useful replace method.

Posted by   Stilgar   on   17:04 28.02.2011

BTW in Ruby you do it like this:
input["..."] = ",,,"
(the string is mutated, although this replaces only the first occurrence; input.gsub!("...", ",,,") replaces all of them)

Posted by   JOKe (Unregistered)   on   17:19 28.02.2011

Stilgar: yeah, the bad naming is real, yep.
How were they doing it before that?
Probably with code. Yeah, I know it sounds scary... code... but :) still..

p.s. did you check the content of the replace method?
public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

:D

Posted by   Stilgar   on   17:22 28.02.2011

Yeah I do mention it in the article :)

Posted by   Ricardo (Unregistered)   on   19:22 28.02.2011

I must admit that my try was:
String text = " a.,...b...c.";
String newText = text.replaceAll("...", ",,,");
System.out.println("The Text:" + text);
System.out.println("The repl:" + newText);

When I saw the result, I was "WTF!?"

Posted by   Stilgar   on   19:25 28.02.2011

Ricardo how did you fix it after you saw the result?

Posted by   Anon (Unregistered)   on   19:51 28.02.2011

Perhaps this shows that I program the wrong way for Java (I'm not what you would call good with Java), but I had to look stuff up in Google to answer your question (I googled for "Java string"). I landed at http://download.oracle.com/javase/1.5.0/docs/api/java/lang/String.html and scrolled down to the replace section (I guess I thought it would be called substring or something like that). I saw both replace methods, but the first only took chars. I clicked on the CharSequence link of the second replace method, and that in turn told me it was implemented by String (among other things). I didn't know any better, so I thought I would just use that.

After stumbling (I had forgotten to start my class with class) I ended up with the following hack:
       public static void main(String args[]) {
               String replaced = args[0].replace("...", ",,,");
               System.out.println(replaced);
       }

Typed up in vim without autocomplete plugins. I thought your post was going to be all about how using replace was a trap, but it turns out it was replaceAll you didn't like. Perhaps your complaint is about IDEs very occasionally leading programmers the wrong way?

Posted by   Stilgar   on   20:02 28.02.2011

I admire people who always read the documentation before starting to code :) I personally love IDEs and believe languages like C# and Java are a great boost to productivity, mainly because they are friendly to IDEs. They are also supposed to protect stupid or lazy programmers who don't read the documentation. This is why I believe the existence of this method is a failure.

I wonder what you would have done if you scrolled to the replaceAll method first :)

Posted by   Anon (Unregistered)   on   21:57 28.02.2011

> I admire people who always read the documentation before starting to code :)

I had no choice (I wasn't using an IDE) - I didn't know what the method was called so I *had* to look it up. I didn't look up System.out :)

> I wonder what you would have done if you scrolled to the replaceAll method first :)

Well, the very "first" replace on the documentation page was actually unsuitable because it only accepted chars... I did notice replaceAll too (it was very close on the page), but I thought a straight substitution function would be easier, so I went with the non-regex version (and I thought you were going to point out a bug in the naive usage of replace). I suspect I would have kept looking for a non-regex replace even if I had seen replaceAll first. I've seen simple replacement functions in other languages, so I would have had a reason to keep looking (I was actually looking for something called substring initially, which shows how unfamiliar with Java I am).

This sort of problem traps you if you don't know the area at all (you have never programmed in it, so you depend on the first hit) or if you are an expert (you know the area inside out but your knowledge is a bit stale, so again you depend on the first hit).

Posted by   Stilgar   on   22:26 28.02.2011

Yeah, I wonder how many will take the path you did and how many will use replaceAll. The people I asked went for replaceAll straight away. Normally you don't think much and don't google when you have such a simple task. This is exactly the kind of problem auto complete is supposed to help you solve.

Posted by   Conclusion (Unregistered)   on   18:54 01.03.2011

Java is crap

Posted by   JC (Unregistered)   on   20:31 02.03.2011

I just did:
       System.out.println(" a.,...b...c.".replaceAll("\\.{3}", ",,,"));

What I find shocking is that someone is writing code without actually taking time to understand the API. That's fine for a college student, but you should really learn your tools.

Posted by   Stilgar   on   20:48 02.03.2011

@JC shocked by yourself? :)

Posted by   JOKe (Unregistered)   on   13:55 15.03.2011

..................... lol great there is one strange method :D wtf who cares :D
