An Easy Way to Improve Code Readability

   Readable code has a lot of properties. Following conventions, consistent formatting, proper naming and even architecture affect readability. There is one aspect of readable code that I find often neglected and even broken on purpose for dubious reasons. Good code reads almost like a natural language sentence. While I am all for conventions and formatting I think staying closer to natural language is the most impactful feature of readable code.

Consider the following code:

if(refreshButton.IsVisible)
   refreshButton.Height = 100;

See how it reads like a sentence:

"If refresh button is visible make the height of the refresh button 100"

or

"If refresh button is visible set the height of the refresh button to 100"

Readable Code Is in English

Your programming language is in English and your standard library is in English, so your own code should be in English too. Consider what happens if I use my native Bulgarian:

   бутонОпресняване.Height = 100;

It makes it hard even for a native Bulgarian to read when we start mixing languages. The code would become even harder to read if some methods and properties were in Bulgarian and others (from the framework) in English. Note that this is Bulgarian, a language with sentence structure relatively similar to English. I cannot even imagine what happens if someone tries to do the same thing with Arabic, Chinese or Hindi. Anyone who wants to become a programmer should learn (written, technical) English. It is not optional, and this is why I do not write programming articles in Bulgarian.

Readable Code Avoids Unnatural Sentences (When Possible)

Some programmers will write the code above like this:

if(refreshButton.IsVisible == true)
   refreshButton.Height = 100;

The equivalent sentence would be:
"If it is true that refresh button is visible make the height of the refresh button 100"

Nobody talks like this, and there is no reason to write it in code unless you are using a language where the comparison is technically significant because of some weird automatic conversion (JavaScript, for example). If there is a technical reason to compare with a boolean, please do so, but I have seen people claim that comparing to true/false makes the code more readable and that code should be written like this in languages like Java or C#. I strongly disagree. I believe they have simply internalized the flaws of specific languages. Which brings us to...


Consider this code:

decimal grossSalary = netSalary + incomeTax;

I believe no one would ever think that it would be better as

decimal grossSalary = SumOf(netSalary, incomeTax);

or

decimal grossSalary = netSalary plus incomeTax;

The reason that we find the first version to be more readable is that we have internalized Math notation. All the way through school we study this DSL and it feels as natural to us as natural languages and in fact we prefer it for Math-related tasks. This applies not only to Math but also to all terminology specific to certain domains. Feel free to name a variable representing time in a Physics computation t. Everyone who knows anything about Physics knows that t means time and we actually talk like this even outside the domain of Physics ("T minus 5").

   Internalizing also applies to programming language syntax and libraries. We have internalized the dot as meaning "'s": person.Salary means "person's salary" or alternatively "the salary of the person". We have internalized "=" to mean "assign" or "becomes". We have internalized "i" as the loop variable (index), and calling it something else will not improve readability. What is more, what are you going to do when you need two dimensions and Math tells you a matrix is indexed with "i" and "j"?
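A short Java sketch of the "i" and "j" convention. The MatrixDemo class and its transpose helper are hypothetical names used only for illustration; the point is that the index names mirror the Math notation for a matrix element, so longer names would add nothing.

```java
import java.util.Arrays;

// Hypothetical demo class; i indexes rows and j indexes columns, as in Math.
final class MatrixDemo {
    // Returns a new matrix whose element [j][i] is the input element [i][j].
    static int[][] transpose(int[][] matrix) {
        int rows = matrix.length;
        int columns = rows == 0 ? 0 : matrix[0].length;
        int[][] result = new int[columns][rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                result[j][i] = matrix[i][j];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] matrix = {{1, 2, 3}, {4, 5, 6}};
        // prints [[1, 4], [2, 5], [3, 6]]
        System.out.println(Arrays.deepToString(transpose(matrix)));
    }
}
```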


   Conventions are of course important. They help us internalize things faster, and we can recognize things by their name, casing and indentation. The well-known conventions are already quite suitable for writing code that looks like an English sentence. Name your classes with nouns, name your methods with verbs, name your variables and method parameters with nouns, avoid abbreviations unless they are common in the problem domain (e.g. HTML), etc. However, we should not be dogmatic about naming conventions, because sometimes a convention prevents the most natural English sentence in a specific scenario. For example, the API of the Ninject IoC container is, in my opinion, a thing of beauty that happens to violate a lot of conventions and even abuse some language features to achieve that English-like code:
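A rough Java sketch of this style of API, for illustration only. It is not Ninject: the Binding class and its bind, to and inRequestScope methods are hypothetical and simply record what they are told, but the call chain shows how a method named with a preposition can read like a sentence.

```java
// Hypothetical mini binding API for illustration; it only records what it is told.
final class Binding<T> {
    final Class<T> service;
    Class<? extends T> implementation;
    String scope = "transient";

    private Binding(Class<T> service) {
        this.service = service;
    }

    // A verb, as the usual conventions suggest.
    static <T> Binding<T> bind(Class<T> service) {
        return new Binding<>(service);
    }

    // A preposition instead of a verb, so the chain reads as a sentence.
    Binding<T> to(Class<? extends T> implementation) {
        this.implementation = implementation;
        return this;
    }

    // No verb here either: "in request scope".
    Binding<T> inRequestScope() {
        this.scope = "request";
        return this;
    }
}

final class FluentDemo {
    public static void main(String[] args) {
        // Reads as: "bind CharSequence to String in request scope".
        Binding<CharSequence> binding =
                Binding.bind(CharSequence.class).to(String.class).inRequestScope();
        // prints CharSequence -> String (request)
        System.out.println(binding.service.getSimpleName() + " -> "
                + binding.implementation.getSimpleName() + " (" + binding.scope + ")");
    }
}
```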


"To" and "InRequestScope" are certainly not properly named methods according to conventions since there are no verbs in their names but the end result is quite readable. The moral of the story is that we can achieve greater readability by having not only good naming conventions but also violating them if we can find a better way to achieve code that looks more like natural language. Another example I have seen in practice is some Java tools and Java programmers insisting on naming booleans with names such as "isExist". Obviously a simple "exists" wins although it violates the convention.

How Can Languages Help?

   If you had asked me when I was younger what single feature made C# better than Java, I would probably have talked about how lambdas let you write beautiful functional code. Now that I am older I have quite a different answer. The most important feature is properties. Consider this code:

if(person.Age > 18)  
   person.IdentificationCard = new IdentificationCard(person.Name);

We can easily read it as "If person's age is greater than 18 person's identification card becomes a new identification card with person's name".

Now consider this Java version:

if(person.getAge() > 18)
   person.setIdentificationCard(new IdentificationCard(person.getName()));

How do we read it? "If the person's age we get is greater than 18 set the identification card of the person with new identification card with the name of the person we get"? Even if we accept that we have internalized the get/set convention enough (which I have failed to do even after all these years), the code still has an insane number of parentheses that disrupt the flow of my reading. As a matter of fact, it has almost as many parentheses as the Lisp version: 10 for Java versus 12 for Lisp. Tell me again how Lisp has too many parentheses but Java is fine!

(if (> (person-age person) 18) (set-identification-card person (make-identification-card (person-name person))))

   So one thing languages can do is introduce features that let us express common concepts in a way that is closer to natural language and has less syntax, or alternatively make the syntax map to a common, well-known DSL such as Math notation. Operator overloading is a fine example of the latter and properties are an example of the former.
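Java is a handy illustration of what is lost without operator overloading. A minimal sketch, assuming salaries are held in java.math.BigDecimal (the SalaryDemo class and its method names are hypothetical): the primitive version keeps the Math notation, while the BigDecimal version is forced into the method-call shape.

```java
import java.math.BigDecimal;

// Hypothetical demo class; only BigDecimal comes from the standard library.
final class SalaryDemo {
    // Primitives support operators, so the sum keeps the familiar Math notation.
    static long grossInCents(long netSalaryInCents, long incomeTaxInCents) {
        return netSalaryInCents + incomeTaxInCents;
    }

    // BigDecimal cannot use + in Java, so the same sum turns into a method call.
    static BigDecimal gross(BigDecimal netSalary, BigDecimal incomeTax) {
        return netSalary.add(incomeTax);
    }

    public static void main(String[] args) {
        System.out.println(grossInCents(100_000, 25_050)); // prints 125050
        System.out.println(gross(new BigDecimal("1000.00"), new BigDecimal("250.50"))); // prints 1250.50
    }
}
```

In C#, decimal overloads +, which is why the grossSalary line earlier can be written in plain Math notation.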

   However, the natural language aspect of readability is not the only concern with source code, and not even the only concern with readability. For example, removing static typing can reduce readability by not giving the reader enough information about the types of the objects, but this is actually rare. What is not rare is that dynamic typing makes the tooling inferior and makes it impossible to report a certain class of errors at compile time, which is certainly a loss that may not be worth the improvement in readability.

   In addition, natural languages can be ambiguous, and even if the actual programming language is not ambiguous, code may be confusing for the reader without additional syntax. For example, I can never remember whether && or || has higher precedence, and therefore code that does not add parentheses to boolean expressions involving both && and || is less readable to me (and, I assume, to most programmers). The same can happen in a language with too few braces and parentheses. I have also seen a presentation (or maybe an article?) by the great Walter Bright where he claims that redundant syntax is needed for more accurate error reporting. I certainly would not want to deal with weird compiler errors reported 20 lines after the actual place where I made the error just so I can get rid of the semicolon at the end of each statement.

   Another example here is Ruby's unless statement, which is a reversed if. It works well to make the code read more like a natural language, but I am not sure I agree with the existence of what is effectively a second if statement just to avoid the admittedly bad if(!something). On the other hand, I really wish the creators of C had used "and" and "or" instead of && and ||. After so many years I still find the keyword approach significantly more readable (and I see it in languages like SQL). I can see no downside to using keywords for these operators instead of symbols.
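For the record, in Java (as in C and C#) && binds tighter than ||. A minimal sketch, using a hypothetical PrecedenceDemo class, of how the unparenthesized form groups and why spelling the grouping out spares the reader from having to remember the rule:

```java
// Hypothetical demo class comparing the implicit and explicit groupings.
final class PrecedenceDemo {
    // No parentheses: && binds tighter than ||, so this means a || (b && c).
    static boolean implicit(boolean a, boolean b, boolean c) {
        return a || b && c;
    }

    // Same meaning as implicit(), with the grouping spelled out for the reader.
    static boolean explicit(boolean a, boolean b, boolean c) {
        return a || (b && c);
    }

    // The other grouping, which a reader might wrongly assume.
    static boolean otherGrouping(boolean a, boolean b, boolean c) {
        return (a || b) && c;
    }

    public static void main(String[] args) {
        System.out.println(implicit(true, false, false));      // prints true
        System.out.println(explicit(true, false, false));      // prints true
        System.out.println(otherGrouping(true, false, false)); // prints false
    }
}
```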

   So here you have it - a way to make code more readable that results in significant improvements without much investment. You do not need to have decades of experience to be able to recognize natural sentences so even beginners can aim for improvements in readability. However do not get stuck chasing this type of readability at all costs because you may end up sacrificing too much time or other aspects of quality code. Just be sure to pick the low-hanging fruit.
Tags:   english programming 
Posted by:   Stilgar
16:33 14.05.2016



Posted by   wqw (Unregistered)   on   17:25 14.05.2016

Was the && vs || precedence a lie to enhance the story? Just map these to * and + to leverage existing knowledge from school. Then using keywords for and/or is like using 3 mult 5 plus 4 in math -- yes, much more readable for 6-year-olds.

Btw, the original symbols were not doubled -- the bitwise & and |

Posted by   Stilgar   on   18:18 14.05.2016

No, I really cannot remember which one has precedence. If I map them to * and + I would need to think about why one maps to * and the other to + which is far from obvious.

Posted by   Guest (Unregistered)   on   12:44 19.05.2016

Probably a typo, there is no language called Hindu. It is Hindi.

Posted by   Stilgar   on   10:48 20.05.2016

Thanks. Fixed.

