Before I unveil the atrocities of the String class in Java, we are going to do a simple test. To get the most accurate results, try to do it as if it were a normal task at work. The very fact that you know this is a test means you will probably not treat it like one, but at least try not to read the source code of the Java methods you are going to call. So here is the task: write a method that, given a string, replaces all occurrences of the substring "..." with the string ",,,". For example, given the input "a.,...b...c." the method should return "a.,,,,b,,,c.".
Ready?
Was it easy?
Did you need to fix it after the initial test run?
Did you need to consult the documentation?
Well I believe this is enough scrolling already :)
If you have a little bit of experience you are probably thinking "What a noob question. Of course I will remember to escape the '.' because replaceAll takes a regex and yes I DO know that when I am escaping characters in regex I do it with two backslashes instead of one."
Chances are most Java developers wrote something like this:
input.replaceAll("\\.\\.\\.", ",,,")
Maybe you are an even more advanced developer and you are thinking "This is ugly. Of course I would use the quote method of the Pattern class to avoid all these stupid backslashes". Maybe you wrote code like this:
input.replaceAll(Pattern.quote("..."), ",,,")
I have asked several professional Java developers with multiple years of experience this question and all of them went for one of the above answers. If you did too, you failed the test. In fact, it is not your fault. It is the String public API that failed. You just did the most logical thing.
The great evil, the unspeakable Lovecraftian horror of the String class is the method replaceAll. First of all, what evil mind decided to pollute the String public API with a regular expression? When working with String I expect simple string operations. If I wanted to use regular expressions I would head for the Pattern class. Simple substring replace is done much more often than regex replace, yet with replaceAll you need to do all this escaping. By the way, did you know that the replacement string needs escaping too? Try using replaceAll to add a couple of backslashes in front of every "$" in a string and see what happens. If for some bizarre reason you do believe that regular expressions belong in String's public API, then why stop at replaceAll? Why not an overload of substring that returns a substring matched by a regular expression?
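The point about the replacement string can be seen in a small sketch. The class and method names below are made up for illustration; the only library calls are String.replaceAll, String.replace and Matcher.quoteReplacement. It assumes a replacement of "$5", which replaceAll reads as a reference to capture group 5 rather than as text.

```java
import java.util.regex.Matcher;

public class ReplacementEscapingDemo {

    // Regex replace with the replacement quoted, so '$' and '\' are taken literally.
    static String regexReplaceQuoted(String input, String regex, String replacement) {
        return input.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    // Returns true if replaceAll rejects the raw replacement string at runtime.
    static boolean rawReplacementFails(String input, String regex, String replacement) {
        try {
            input.replaceAll(regex, replacement);
            return false;
        } catch (RuntimeException e) {
            // "$5" refers to a capture group that the pattern does not have.
            return true;
        }
    }

    public static void main(String[] args) {
        String input = "cost: X";
        System.out.println(rawReplacementFails(input, "X", "$5")); // true
        System.out.println(input.replaceAll("X", "\\$5"));         // cost: $5
        System.out.println(regexReplaceQuoted(input, "X", "$5"));  // cost: $5
        System.out.println(input.replace("X", "$5"));              // cost: $5
    }
}
```

Only the last call needs no escaping at all, which is rather the point.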
However, this is not the big problem. The big problem is the name of this method. The name obscures the much more useful replace method. If you want to replace all occurrences of a substring in a string you are probably heading for replace, but the moment you see replaceAll pop up in auto-complete you immediately head for it. It makes sense, right? Once again I will try to guess what the Java developer reading this is thinking right now: "Wait! What replace method are you talking about? It does not do the job!" I am referring to the overload of the replace method that was added in Java 1.5, more than 6 years ago, with the signature public String replace(CharSequence target, CharSequence replacement). You have never heard of CharSequence? It turns out it is an interface implemented by String, StringBuilder and some others. This is cool from an OOP point of view, but it is not the friendliest way to find the method when you have two strings and are looking for a suitable method in auto-complete, especially when replaceAll, which conveniently takes two strings, is right beneath it. Here is what most developers would probably have done if replaceAll did not exist or was called something different like replaceRegex:
input.replace("...", ",,,")
Nice and simple.
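To put the candidates side by side, here is a minimal sketch that runs each of them on the example input from the task. The class and method names are invented for this illustration. The first method is the unescaped regex that the "noob" version would produce, where "." matches any character.

```java
import java.util.regex.Pattern;

public class ReplaceDotsDemo {

    // Unescaped regex: "..." matches ANY three characters.
    static String naiveRegex(String input) {
        return input.replaceAll("...", ",,,");
    }

    // Regex with every dot escaped by hand.
    static String escapedRegex(String input) {
        return input.replaceAll("\\.\\.\\.", ",,,");
    }

    // Regex with the target quoted by Pattern.quote.
    static String quotedRegex(String input) {
        return input.replaceAll(Pattern.quote("..."), ",,,");
    }

    // Plain substring replace, no regex syntax involved.
    static String plainReplace(String input) {
        return input.replace("...", ",,,");
    }

    public static void main(String[] args) {
        String input = "a.,...b...c.";
        System.out.println(naiveRegex(input));   // ,,,,,,,,,,,,
        System.out.println(escapedRegex(input)); // a.,,,,b,,,c.
        System.out.println(quotedRegex(input));  // a.,,,,b,,,c.
        System.out.println(plainReplace(input)); // a.,,,,b,,,c.
    }
}
```

The last three give the same result; only the last one reads like the task description.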
At first I was going to point out the inevitable, albeit small, performance hit of spinning up a regular expression engine and instantiating all the necessary classes to do a simple string replace with a regular expression, but as it turns out the replace implementation just calls replaceAll, escaping the arguments. While a special implementation of the regular expression engine is used when the regular expression contains only character literals and no special characters, it still instantiates a double-digit number of objects and the call stack goes quite deep. This is not necessarily what you want to do for a simple replace method in something as fundamental as the String class.
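For a rough idea of what "calls replaceAll escaping the arguments" amounts to, here is a sketch built only from public API: Pattern.LITERAL, Matcher.replaceAll and Matcher.quoteReplacement. It is not copied from the JDK source, and the class and method names are hypothetical. It just shows how a literal replace can be routed through the regex machinery, under the assumption that the JDK version described above does something along these lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralReplaceSketch {

    // Literal substring replace expressed through the regex classes.
    // Pattern.LITERAL makes the target match character by character,
    // and quoteReplacement keeps '$' and '\' in the replacement literal.
    static String replaceLiteral(String input, CharSequence target, CharSequence replacement) {
        return Pattern.compile(target.toString(), Pattern.LITERAL)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }

    public static void main(String[] args) {
        System.out.println(replaceLiteral("a.,...b...c.", "...", ",,,")); // a.,,,,b,,,c.
    }
}
```

Even in this short form, one call creates a Pattern, a Matcher and a quoted copy of the replacement before any character is replaced.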
When the replaceAll method was first written, back in Java 1.4 when regular expressions were added to the platform, probably no one gave it much thought. I doubt the creators of Java expected that the platform would become so big and be used by so many developers. They probably had a deadline to meet, so do not blame them too much. Just use the replace method and stop writing stupid articles all over the Internet on how to replace strings using replaceAll. Right now the Google results for "replace substring java" make no mention of the replace method. Several mention replaceAll and some of them provide alternative implementations of replace, but I hope this will change in the future.
Please post how you solved the problem in the comments and be honest! I am curious what the results will be.