Programming Language Typing Disciplines

   Recently I ran into another case of developers who were confused by the static vs. dynamic and weak vs. strong classifications of type systems. And you know what? This is actually OK. Usually one can be a good developer in one kind of language and understand how it works without knowing every kind of type system or the terminology of the subject. However, I still find it useful to try to explain it, even though there are other and probably better resources out there. It is especially useful since most of my readers use C#, and C# developers can benefit the most from this knowledge because C# combines several typing approaches.

   

Dynamic vs. Static


   The most popular way to classify type systems is by when type checks are done. If type checks take place at compile time, the language is said to be statically typed. If type checks are done at runtime, the language is said to be dynamically typed. Languages can employ a mix of static and dynamic typing: they can check some types at compile time and others at runtime, or they can let the developer pick when the types of specific variables should be checked, which is the case with C#'s "dynamic" feature.
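
   The mix of the two approaches in C# can be sketched roughly as follows. This is a minimal illustration, assuming C# 9 or later so that top-level statements are available; the member name Foo is made up on purpose so that the call cannot succeed.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

string text = "hello";
// Statically typed: the next line would be rejected at compile time,
// because string has no member named Foo (a made-up name).
// text.Foo();

dynamic value = text;          // opt in to runtime checking for this variable
int length = value.Length;     // looked up at runtime on System.String

string outcome;
try
{
    value.Foo();               // compiles, but the check is deferred to runtime
    outcome = "call succeeded";
}
catch (RuntimeBinderException)
{
    outcome = "runtime binder rejected the call";
}

Console.WriteLine(length);
Console.WriteLine(outcome);
```

   The same mistake is a compiler error for the string variable and a runtime exception for the dynamic one.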

   When checks are done at compile time, some common coding errors can be avoided. The compiler will not let values of incompatible types be used as method arguments, as operands or in variable assignments. What is more, if a change is made to some part of the code, all parts that use this code will be checked, and if the change breaks the client code the compiler will issue an error. Statically typed code is to some extent self-documenting. If the types are known at compile time, tools like IDEs can provide information via features like IntelliSense (autocomplete). Developers do not need to consult the documentation to know what the return type of a method is or what the types of the arguments are. This matters because developers do not like writing documentation, and even when they do write it, documentation can "rot" because the code can be changed without the documentation being updated. Finally, tools for code completion, refactoring and static analysis are much easier to develop and become much more powerful for statically typed languages.

   However, static typing has its downsides. Sometimes writing in a statically typed language is like having an argument with the compiler. Sometimes the developer knows things that the compiler does not, and then the developer has to spell this information out for the compiler even though it is not needed for the problem being solved. This information has to be present in the source code, and it can make the code longer, more complex and harder to read than the same code written in a dynamically typed language. Another problem is that sometimes the data being read is inherently typeless. A good example is XML data that does not have a schema. In this case static typing does not provide additional value, but developers still need to work with a syntax tailored for static typing, which makes the code much harder to read. In order to preserve type information, statically typed languages need a number of complex features. For example, in order to develop a simple reusable data structure like a list that works with any data type and preserves type information, you need to use generics or templates. Generics bring their own limitations, and more features like covariance and contravariance are needed to work around them. All these features add complexity to the language.
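
   To make the point about generics more concrete, here is a small sketch using the standard .NET collections. It assumes C# 9 or later (top-level statements); the variable names are only illustrative.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Without generics the element type is lost: everything is stored as object
// and the caller has to cast the value back.
ArrayList untyped = new ArrayList();
untyped.Add("bar");
string fromUntyped = (string)untyped[0];   // cast is checked only at runtime

// With generics the element type is part of the list's static type.
List<string> typed = new List<string>();
typed.Add("bar");
string fromTyped = typed[0];               // no cast needed
// typed.Add(42);                          // would not compile

// A List<string> is not a List<object>, but the covariant interface
// IEnumerable<out T> allows this conversion.
IEnumerable<object> objects = typed;
// List<object> broken = typed;            // would not compile

foreach (object item in objects)
{
    Console.WriteLine(item);
}
```

   The last few lines show the kind of limitation that covariance was added to work around: the conversion is allowed only for the read-only interface, not for the list type itself.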

   On the other side of the coin are dynamically typed languages. Code in dynamically typed languages is usually shorter and easier to read. In a dynamically typed language code can automatically become polymorphic and generic without the need to declare interfaces and write complex generic code. Developers can just call a method on a variable, and if the method is present on the object it will be called.
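
   C#'s "dynamic" feature can be used to get a feel for this style. The sketch below assumes C# 9 or later (top-level statements); Describe is a hypothetical helper written only for this illustration.

```csharp
using System;

// Describe accepts anything; the Length member is looked up at runtime.
// No interface or generic constraint ties the argument types together.
static string Describe(dynamic thing)
{
    int length = thing.Length;
    return "length " + length;
}

Console.WriteLine(Describe("hello"));      // string has a Length property
Console.WriteLine(Describe(new int[3]));   // int[] has a Length property too
```

   string and int[] share no common interface that exposes Length, yet the same code works for both because the member is resolved by name when the call is executed.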

   The problem arises when you try to access a member that does not exist. The program fails at runtime and the users get errors. You are probably thinking that the developer can just run the program and check if it works, but the developer cannot always exercise every possible code path, whereas the compiler of a statically typed language checks all of them. What is more, sometimes a developer will change code that is used by some other code and will not remember to check the client code, or may not know that the client code exists at all. Dynamically typed languages limit the power of tools like IntelliSense and refactoring tools, and the code needs more documentation, which has to be kept up to date. The absence of type information also prevents certain optimizations by the compiler, which can lead to greater memory consumption and slower execution compared to statically typed languages.
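
   The "code path that nobody ran" problem might look like this. It is a sketch that again uses C#'s "dynamic" to imitate a dynamically typed language and assumes C# 9 or later; Summarize is a hypothetical helper and the misspelled member is deliberate.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

// The misspelled member is reached only for long inputs, so a quick manual
// run with short inputs would never reveal the problem.
static string Summarize(dynamic text)
{
    if (text.Length <= 10)
    {
        return text;
    }
    return text.Substrin(0, 10);   // typo: should be Substring
}

Console.WriteLine(Summarize("short"));

try
{
    Console.WriteLine(Summarize("a considerably longer input"));
}
catch (RuntimeBinderException ex)
{
    Console.WriteLine("Failed only at runtime: " + ex.Message);
}
```

   If the parameter were declared as string, the typo would be reported by the compiler before the program ever ran.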

   

Weak vs. Strong


   Strong typing and weak typing are loosely defined terms and their meaning can vary. It is usually accepted that a weakly typed language will allow operations to succeed on values whose types are not exactly the expected ones. In the C programming language one can take an array of 8 chars (bytes), cast a pointer to it to a pointer to double and use the result in operations that require a double. In C# the equivalent cast is rejected by the compiler (outside of unsafe code). This is why it is said that C has weak typing and C# has strong typing.
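
   On the C# side this can be sketched as follows, assuming C# 9 or later (top-level statements). The value 42.5 is arbitrary; BitConverter is the standard .NET class for this kind of conversion.

```csharp
using System;

byte[] raw = BitConverter.GetBytes(42.5);   // the 8 bytes behind a double

// double broken = (double)raw;             // would not compile: no such conversion

// Reinterpreting the bytes has to be requested explicitly through an API call.
double restored = BitConverter.ToDouble(raw, 0);

Console.WriteLine(raw.Length);
Console.WriteLine(restored);
```

   The reinterpretation is still possible, but the type system does not let it happen through a plain cast.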

   Another definition is that the shape of an object in a strongly typed language cannot change at runtime. For example, in C# the developer cannot add a method to an ordinary object at runtime, but in JavaScript it is possible and done quite often.

   It seems like even more definitions exist.

   Strongly typed languages are generally considered to protect developers from some errors, while weakly typed languages provide additional flexibility.

   I should also point out that strong typing is not a binary option. There are various degrees of strength of a type system. For example, C# is considered a strongly typed language but it allows implicit conversions from int to double. Consider the following example:

   42 + 42.0

This code will successfully compile and run in C#. In F#, on the other hand, this code requires an explicit conversion from int to float (F#'s name for double), like this:

   float 42 + 42.0

This is why F# has a stronger type system than C#, despite the fact that both languages are strongly typed.
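
   The C# half of this comparison can be checked with a few lines. This is a minimal sketch assuming C# 9 or later (top-level statements); the variable names are illustrative.

```csharp
using System;

int whole = 42;
double fraction = 42.0;

var sum = whole + fraction;        // whole is implicitly widened to double
Console.WriteLine(sum.GetType());  // the type of sum is System.Double

// The opposite direction can lose information, so C# demands an explicit cast.
// int narrowed = fraction;        // would not compile
int narrowed = (int)fraction;
Console.WriteLine(narrowed);
```

   So even within C# the implicit conversion goes only one way, which is one more sign that strength is a matter of degree.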

   It is worth noting that static/dynamic typing and strong/weak typing are independent of each other. All combinations are possible, and in fact popular languages provide examples of all of them:

C and C++ – statically and weakly typed
C# and Java – statically and strongly typed
JavaScript – dynamically and weakly typed
Python – dynamically and strongly typed

   

Type Inference


   While not really part of the type system, type inference is a closely related feature. All it means is that, in a statically typed language, the compiler does not require the developer to explicitly state the type of a variable if it can be inferred from the context. Consider the following C# code:

   var foo = "bar";

What are the possible types for foo that would allow the program to compile? It could be object, string or one of the interfaces that string implements, but the compiler infers string, the type of the initializer expression, because the most derived type is the most useful. This way the above code becomes equivalent to:

   string foo = "bar";
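
   That var is not a "variable of any type" can be verified with a short sketch like this one, assuming C# 9 or later (top-level statements):

```csharp
using System;

var foo = "bar";

// foo is a string at compile time, so string members are available
// without any cast.
int length = foo.Length;

// foo = 42;                       // would not compile: foo is a string

object boxed = foo;                // declaring the type explicitly can widen it
// int broken = boxed.Length;      // would not compile: object has no Length

Console.WriteLine(length);
Console.WriteLine(boxed.GetType());
```

   The explicitly declared object variable knows less about the value than the inferred one, which is why inferring the most derived type is the useful choice.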

   This concept can be expanded further by allowing method arguments, generic arguments, method return types and many more to be inferred. One of the more interesting extensions of the concept is anonymous types.

   var foo = new { Bar = "bar" };

This code creates a variable of a type with one string property named Bar. The type does not have a name but it is still statically typed like any other type. Using inference it is possible to compose whole programs with very complex types that are statically typed without writing a single type name. This approach is heavily used in languages like F# and Haskell.
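
   A slightly larger sketch shows how anonymous types and inference compose. It assumes C# 9 or later (top-level statements) and uses LINQ from the standard library; the sample words and property names are made up for the illustration.

```csharp
using System;
using System.Linq;

var foo = new { Bar = "bar" };
string bar = foo.Bar;              // checked at compile time
// int broken = foo.Baz;           // would not compile: no such property

// No type name is written in this query, yet every element has
// statically known Name and Length properties.
var words = new[] { "static", "dynamic" };
var described = words.Select(w => new { Name = w, Length = w.Length }).ToList();

foreach (var item in described)
{
    Console.WriteLine(item.Name + ": " + item.Length);
}
```

   Misspelling Name or Length inside the loop would be a compiler error, even though the element type was never declared anywhere.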

   Type inference can reduce the amount of code and the negotiations with the compiler, thus getting some of the benefits of dynamically typed languages while retaining all of the benefits of statically typed languages. Type checks are still done at compile time, and IntelliSense and refactoring tools are still as powerful as they are when types are declared explicitly.

   I hope this helps people get into the world of typing disciplines and provides a good introduction for further research. The field has a lot of subtleties and is a subject of ongoing research and experimentation both in academia and in mainstream programming languages. I personally have demonstrated misunderstanding of the matter in the past, to the point of public embarrassment. In fact I hope this is not another embarrassment.

   Update 29.10.2010:

   People have provided related posts on the subject:
   http://james-iry.blogspot.com/2010/05/types-la-chart.html
   http://blog.steveklabnik.com/what-to-know-before-debating-type-systems-0

Both posts are more detailed and go deeper than mine, but be warned that they are longer, especially the second one.
Tags:   english programming 
Last edited by:   Stilgar
on   17:21 29.10.2010
Posted by:   Stilgar
14:03 28.10.2010

Comments:


Posted by   Guest (Unregistered)   on   15:53 28.10.2010

I define "weakly typed" languages as those in which data structures can become corrupted. To wit, I have a completely objective way to determine if a language is weakly typed:

Are buffer overflows possible?

If the answer is yes, then the memory addresses you are writing over clearly don't know their own type nor can they enforce type safety. Therefore the language is weakly typed.

Java is the only language I know that defies this definition. And that is because its implementation of generics allows you to place strings in a List<Integer> in such a way that no runtime exception is thrown until you later try to read from the now-corrupted list.

Posted by   Stilgar   on   16:44 28.10.2010

Your definition is valid but it would classify JavaScript as a strongly typed language, and it seems like most people agree that it is not. Also, in Java you don't get a buffer overflow but a class cast exception.

Posted by   Guest (Unregistered)   on   21:52 28.10.2010

Perhaps you should read Chris Smith's seminal article. It's no longer online, but you can grab the cache or read a re-post here:
http://blog.steveklabnik.com/what-to-know-before-debating-type-systems-0

Posted by   Stilgar   on   17:56 29.10.2010

Thanks for the link. I've added it to the article.

Posted by   ikirachen (Unregistered)   on   14:52 27.12.2010

Posted by   Stilgar   on   19:43 27.12.2010

@ikirachen I was taking this talk seriously up to the point where he claimed that generics were a bad thing to add to Java. The talk seems biased and you can see some people criticizing him in the comments.

His theory that dynamic languages can get faster and can have better tooling than static languages falls short because of the simple fact that optimizations and tooling support are based on the information that the tools (IDEs or JIT compilers) have about the program. However, all the information available from dynamic languages is available in static languages as well. Static languages can never have less information, they can only have more, therefore they will always be faster and have better tooling. One must admit, though, that the gap is getting smaller and smaller.

The article was an interesting read, especially the part describing modern optimizations for dynamic languages.
