After a long delay I am proud (well, not really) to present part 6 of the “What Is New in C# 3.0” series. This is arguably the most difficult part of all, but you only need to fully understand how expression trees work if you are going to develop LINQ data providers or similar frameworks. I personally am nowhere near fully understanding how expression trees work, but learning the principles they rely on will help you and me use them more effectively.
Expression trees are closely related to lambda expressions (if you do not know what lambda expressions are, check out part 5 of the series). In fact, an expression tree is a way to represent a lambda expression as a tree data structure. As you probably remember, a lambda expression can be compiled to a delegate, but depending on the context it can instead be compiled to an instance of the Expression<TDelegate> class, where TDelegate is a delegate type whose signature the lambda expression matches. Here is an example:
Expression<Func<int, int>> exp = x => x * 2;
Here we are using the Func delegate, which is designed with lambda expressions in mind, but we could use any delegate type as the generic parameter. We can also pass lambda expressions to methods with parameters of type Expression<TDelegate>. The compiler decides whether the lambda expression should be compiled to a delegate or to an expression tree from the type it is being assigned or passed to.
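To make the "depending on the context" part concrete, here is a minimal sketch in which the same lambda is passed to two methods. DescribeDelegate and DescribeTree are hypothetical helpers invented for this illustration; the only difference between them is the parameter type, and that alone determines what the compiler produces.

```csharp
using System;
using System.Linq.Expressions;

static class TargetTypeDemo
{
    // Hypothetical helper: the lambda arrives as a compiled delegate,
    // so all we can do with it is call it.
    public static string DescribeDelegate(Func<int, int> f)
    {
        return "delegate result: " + f(21);
    }

    // Hypothetical helper: the same lambda arrives as an expression tree,
    // so we can look at its structure instead of running it.
    public static string DescribeTree(Expression<Func<int, int>> e)
    {
        return "tree body node: " + e.Body.NodeType;
    }

    static void Main()
    {
        Console.WriteLine(DescribeDelegate(x => x * 2));
        Console.WriteLine(DescribeTree(x => x * 2));
    }
}
```

The two helpers have different names on purpose: if they were overloads of one method, the call with a bare lambda would be ambiguous.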
Let's see what is in the exp variable. Every expression node has a Type property that returns the .NET type the expression evaluates to. For the body of our lambda (exp.Body) that is System.Int32 (the same as typeof(int)); for the lambda node itself it is the delegate type, Func<int, int>. The NodeType property holds one of the values of the ExpressionType enumeration, which specify what exactly the expression does. This is necessary because the expression classes are quite general. For example, a BinaryExpression represents any expression with two operands, which includes addition, subtraction, division and many more; the ExpressionType value identifies the exact operation. The enumeration includes every basic expression I can think of. I guess they made it following the C# specification for expressions, so practically every C# expression can be described in an expression tree. The Parameters collection holds the parameters of the lambda; in our case it has one member, x. It consists of ParameterExpression items, and since the ParameterExpression class derives from Expression it has the Type and NodeType properties plus a Name property.
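A short sketch of reading those properties follows. InspectDemo and Summarize are names made up for this example; the properties themselves (NodeType, Type, Body, Parameters, Name) are the ones discussed above. One detail worth noticing is that Type on the lambda node is the delegate type, while Type on the body is the int result.

```csharp
using System;
using System.Linq.Expressions;

static class InspectDemo
{
    // Hypothetical helper: collects a few properties of a lambda expression tree
    // into one line of text so they are easy to print or compare.
    public static string Summarize(Expression<Func<int, int>> exp)
    {
        ParameterExpression first = exp.Parameters[0];
        return exp.NodeType                               // kind of the outermost node
            + "|" + (exp.Type == typeof(Func<int, int>))  // lambda's Type is the delegate type
            + "|" + exp.Body.Type.FullName                // type the body evaluates to
            + "|" + exp.Parameters.Count                  // number of parameters
            + "|" + first.Name                            // name of the first parameter
            + "|" + first.Type.FullName;                  // its declared type
    }

    static void Main()
    {
        Expression<Func<int, int>> exp = x => x * 2;
        Console.WriteLine(Summarize(exp));
    }
}
```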
This is a list of the expression classes (copied from MSDN):
BinaryExpression
ConditionalExpression
ConstantExpression
InvocationExpression
LambdaExpression
MemberExpression
MethodCallExpression
NewExpression
NewArrayExpression
MemberInitExpression
ListInitExpression
ParameterExpression
TypeBinaryExpression
UnaryExpression
Together with the ExpressionType enumeration this covers all C# expressions (I think). We have expressions for calling methods, creating instances, representing constants, etc. Each of these classes has the properties needed to fully represent what it is designed to represent. For example, the MethodCallExpression contains a MethodInfo object, so the method can be invoked through reflection. In our example the body of the lambda is a BinaryExpression whose NodeType is ExpressionType.Multiply. Its Left property is a ParameterExpression representing x and its Right property is a ConstantExpression with the value 2. Instead of using a lambda expression you can build the same tree by hand in plain C# 2.0 syntax, using the factory methods of the Expression class:
ParameterExpression p = Expression.Parameter(typeof(int), "x"); // declare the parameter
Expression<Func<int, int>> exp = Expression.Lambda<Func<int, int>>(
    Expression.Multiply(
        p,                       // the left expression of the multiply expression
        Expression.Constant(2)), // the right expression
    p);                          // the parameters of the lambda (outermost) expression, in our case only p
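As a sanity check, the hand-built tree can be walked and compared with the one the compiler produces from the lambda. This is only a sketch: ManualTreeDemo, Build and Describe are hypothetical names, and the casts assume the exact shape described above (a multiplication of a parameter by a constant), so they would throw for any other expression.

```csharp
using System;
using System.Linq.Expressions;

static class ManualTreeDemo
{
    // Builds x => x * 2 by hand, using the same factory calls as the listing above.
    public static Expression<Func<int, int>> Build()
    {
        ParameterExpression p = Expression.Parameter(typeof(int), "x");
        return Expression.Lambda<Func<int, int>>(
            Expression.Multiply(p, Expression.Constant(2)),
            p);
    }

    // Hypothetical helper: assumes the body is "parameter OP constant"
    // and renders the three pieces as text.
    public static string Describe(Expression<Func<int, int>> exp)
    {
        BinaryExpression body = (BinaryExpression)exp.Body;
        ParameterExpression left = (ParameterExpression)body.Left;
        ConstantExpression right = (ConstantExpression)body.Right;
        return left.Name + " " + body.NodeType + " " + right.Value;
    }

    static void Main()
    {
        Expression<Func<int, int>> fromLambda = x => x * 2;
        Console.WriteLine(Describe(Build()));
        Console.WriteLine(Describe(fromLambda));
    }
}
```

Both calls describe the same structure, which is the point: the lambda syntax is a shorthand for the factory calls.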
Note that when a lambda expression is compiled to a delegate it can have a statement body (a block in braces). When it is compiled to an expression tree, however, the body has to be a single expression. So this
Func<int, int> exp = x => { if (x > 0) return 1; else return -1; };
is legal in C# 3.0 but this
Expression<Func<int, int>> exp = x => { if (x > 0) return 1; else return -1; };
will result in a compile-time error.
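For this particular case there is a simple way around the restriction: the if/else can be rewritten with the conditional operator, which is an expression and therefore fits in an expression tree (it shows up as a ConditionalExpression). A minimal sketch, with SignDemo and its members being names invented for the illustration:

```csharp
using System;
using System.Linq.Expressions;

static class SignDemo
{
    // The same logic as the statement lambda, written as a single expression.
    public static readonly Expression<Func<int, int>> Sign = x => x > 0 ? 1 : -1;

    // Reports the node kind of the lambda body.
    public static string BodyKind()
    {
        return Sign.Body.NodeType.ToString();
    }

    // Compiles the tree and runs it on the given input.
    public static int Evaluate(int input)
    {
        return Sign.Compile()(input);
    }

    static void Main()
    {
        Console.WriteLine(BodyKind());
        Console.WriteLine(Evaluate(5));
        Console.WriteLine(Evaluate(-3));
    }
}
```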
You may already be wondering what the purpose of all this is. What we have here is something like a mini compiler. We get the parser for free (in the C# compiler), which builds the expression tree for us, and we can implement a code generator or an interpreter that uses the tree. For example, LINQ to SQL generates T-SQL queries based on the expression tree. Another example is the Compile method.
Func<int, int> compiledExp = exp.Compile();
The Compile method generates a delegate equivalent to what the C# compiler would have generated if we had compiled the lambda expression directly, but it uses the expression tree to generate the IL code at run time. I do not know how code generation/interpretation performs, but I suspect that when the tree is built in advance it is relatively fast, so the power that expression trees give us is worth the performance cost. What is more, lambda expressions are expected to be short, so there is not much to be generated.
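Since Compile does its work at run time, one cautious pattern is to call it once and keep the resulting delegate around instead of recompiling on every use. The sketch below assumes that is acceptable for your scenario; CompileDemo and Apply are hypothetical names, and no performance numbers are implied.

```csharp
using System;
using System.Linq.Expressions;

static class CompileDemo
{
    // The tree is built once, when the class is first used.
    static readonly Expression<Func<int, int>> Exp = x => x * 2;

    // Compiled once and reused; static initializers run in the order written,
    // so Exp is already set when this line executes.
    static readonly Func<int, int> Compiled = Exp.Compile();

    // Calls the cached delegate like any other delegate.
    public static int Apply(int input)
    {
        return Compiled(input);
    }

    static void Main()
    {
        Console.WriteLine(Apply(21));
    }
}
```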
I know this is complicated and I am not sure that I have represented everything right but as I said in the beginning, you do not need to understand the details. What you need to know is:
- Lambda expressions can be compiled to a data structure called an expression tree.
- Frameworks can traverse the expression trees and use them for code generation, interpretation or analysis.
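To give a feeling for what "traversing the tree" means, here is a toy translator that turns a small expression tree into text. It is only a sketch and has nothing to do with how LINQ to SQL is actually implemented: TinyTranslator is a made-up class, it handles just parameters, constants and four arithmetic operators, and it throws for everything else.

```csharp
using System;
using System.Linq.Expressions;

static class TinyTranslator
{
    // Recursively renders a (very limited) expression tree as infix text.
    public static string Translate(Expression node)
    {
        switch (node.NodeType)
        {
            case ExpressionType.Lambda:
                // Only the body is rendered; the parameter list is skipped.
                return Translate(((LambdaExpression)node).Body);
            case ExpressionType.Parameter:
                return ((ParameterExpression)node).Name;
            case ExpressionType.Constant:
                return Convert.ToString(((ConstantExpression)node).Value);
            case ExpressionType.Add:
                return Binary(node, "+");
            case ExpressionType.Subtract:
                return Binary(node, "-");
            case ExpressionType.Multiply:
                return Binary(node, "*");
            case ExpressionType.Divide:
                return Binary(node, "/");
            default:
                throw new NotSupportedException("Unsupported node: " + node.NodeType);
        }
    }

    // Renders both operands and wraps the result in parentheses
    // so the nesting of the tree stays visible in the text.
    static string Binary(Expression node, string op)
    {
        BinaryExpression b = (BinaryExpression)node;
        return "(" + Translate(b.Left) + " " + op + " " + Translate(b.Right) + ")";
    }

    static void Main()
    {
        Expression<Func<int, int>> exp = x => x * 2 + 1;
        Console.WriteLine(Translate(exp));
    }
}
```

A real provider does the same kind of walk, only with many more node types and a much more interesting target language than parenthesized arithmetic.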
That is all for now. Stay tuned for part 7 that will deal with LINQ query syntax and you will be able to see all the new features in action.