What Is New in C# 3.0 - Part 7 (LINQ Syntax)

   Here is the cherry on the cake, the reason for all the new features in C# 3.0, the purpose of human evolution and the reason Google created the Universe in the first place. LINQ (Language Integrated Query) is a set of C# language features for manipulating data. A long time ago Anders (Hallowed be his name) reasoned that as more and more applications interact with different data stores and manipulate large sets of data, it would be really cool to make data manipulation a first-class language feature. The most popular way for applications to interact with data is through databases, but data comes in many more forms, for example XML documents. So the first problem was to define what data is.

   Anders (Hallowed be his name) in his infinite wisdom did not launch into philosophical arguments about what data is but instead provided us with an easy to understand and very concrete definition of data.

   Anders' (Hallowed be his name) definition of data (write this in your notebooks):
Data is something that is IEnumerable. (i.e. implements this interface)

   From then on it was clear that LINQ should work on everything that is IEnumerable (in practice the generic IEnumerable<T>; a non-generic IEnumerable has to go through Cast<T>() or OfType<T>() first). If you, like most developers, are used to thinking about data as a database table, you can imagine that the IEnumerable is a table, every enumerated item is a row and every property is a column. Another consideration is that developers are used to manipulating data with SQL. SQL could not be used directly in LINQ, but Anders (Hallowed be his name) kept the keywords, so 'select', 'from', 'where', 'join', etc. in LINQ have meanings similar to their SQL meanings.
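   To make the "everything that is IEnumerable" idea concrete, here is a minimal sketch in which the same query runs over an array, a List<int> and even a string. The DataSources class and the EvenNumbers helper are names made up for this illustration, not part of LINQ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DataSources
{
    // Hypothetical helper: accepts any IEnumerable<int>, whatever the concrete type is.
    public static IEnumerable<int> EvenNumbers(IEnumerable<int> source)
    {
        return from n in source
               where n % 2 == 0
               select n;
    }

    static void Main()
    {
        int[] array = { 1, 2, 3, 4 };
        List<int> list = new List<int> { 5, 6, 7, 8 };

        foreach (int n in EvenNumbers(array)) Console.WriteLine(n); // prints 2 and 4
        foreach (int n in EvenNumbers(list)) Console.WriteLine(n);  // prints 6 and 8

        // A string is an IEnumerable<char>, so it can be queried as well.
        var vowels = from c in "language integrated query"
                     where "aeiou".IndexOf(c) >= 0
                     select c;
        Console.WriteLine(vowels.Count()); // prints 10
    }
}
```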

   Let's define some classes to use in our example:

   class Mlass
   {
       public int MID { get; set; }
       public string Malue { get; set; }
   }

   class MexampleMlass
   {
       public int MID { get; set; }
       public DateTime Mate { get; set; }
   }

   We will start with one of the simplest things that satisfies the definition of data – an array. Let's define two arrays to use in our examples (so I will not need to copy and paste them in every example):

           Mlass[] marrayOfMlass = {
               new Mlass { MID = 1, Malue = "a" },
               new Mlass { MID = 2, Malue = "b" },
               new Mlass { MID = 3, Malue = "c" },
               new Mlass { MID = 4, Malue = "d" },
               new Mlass { MID = 5, Malue = "e" } };

           MexampleMlass[] mexampleMarray = {
               new MexampleMlass { MID = 1, Mate = DateTime.Now },
               new MexampleMlass { MID = 1, Mate = DateTime.Now.AddDays(1) },
               new MexampleMlass { MID = 2, Mate = DateTime.Now.AddDays(2) },
               new MexampleMlass { MID = 2, Mate = DateTime.Now.AddDays(3) },
               new MexampleMlass { MID = 3, Mate = DateTime.Now.AddDays(4) } };

   And here is our first query (DRUM ROLL!):

           var mesult = from m in marrayOfMlass
                        where m.MID > 2
                        select m;

           foreach (var mar in mesult)
           {
               Console.WriteLine("MID is {0}, Malue is {1}", mar.MID, mar.Malue);
           }

   The output will be:

MID is 3, Malue is c
MID is 4, Malue is d
MID is 5, Malue is e

   I bet everyone familiar with SQL can understand this simple query. The syntax looks very much like reversed SQL but is definitely different, so let's dissect it.

   In the 'from' part you specify the IEnumerable to be used (in our case marrayOfMlass) and a name for the currently selected element (in our case m). Unlike regular SQL, you need to specify a temporary name because the IEnumerable and its items are not interchangeable the way a table and its rows are in SQL: a table and its rows share the same columns, while a collection and its items have different properties. The temporary name is very similar to the name of the current item in a foreach loop. Its purpose is practically the same. Then comes the 'where' part, which accepts an expression with a boolean result. The last one is the 'select' part, where you say what will go into the result. Here it is the whole m object (yes, we can select something else). The type of what we select becomes the generic argument of the IEnumerable that is the result. The concrete type of the result (mesult) is some really complex type from the LINQ classes, but what is important to us is that it is an IEnumerable<Mlass>, so we can iterate through it later. The type of the items returned from the iteration is the same as what is selected, so 'mar' in the foreach will be of type Mlass (including full IntelliSense).
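   As a small illustration of "selecting something else", the sketch below selects only the Malue property, so the result becomes an IEnumerable<string> instead of an IEnumerable<Mlass>. The ProjectionExample class and the ValuesAbove helper are hypothetical names introduced just for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Mlass
{
    public int MID { get; set; }
    public string Malue { get; set; }
}

static class ProjectionExample
{
    // Hypothetical helper wrapping the query so the result type is visible.
    public static IEnumerable<string> ValuesAbove(IEnumerable<Mlass> source, int minId)
    {
        // Selecting m.Malue instead of m changes the element type to string.
        return from m in source
               where m.MID > minId
               select m.Malue;
    }

    static void Main()
    {
        Mlass[] marrayOfMlass = {
            new Mlass { MID = 1, Malue = "a" },
            new Mlass { MID = 2, Malue = "b" },
            new Mlass { MID = 3, Malue = "c" },
            new Mlass { MID = 4, Malue = "d" },
            new Mlass { MID = 5, Malue = "e" } };

        // The loop variable is a plain string now, not an Mlass.
        foreach (string malue in ValuesAbove(marrayOfMlass, 2))
        {
            Console.WriteLine("Malue is {0}", malue); // prints c, d and e
        }
    }
}
```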

   Before we look at what happens behind the curtain, let's look at a more complex example. How about this:

           var mesult2 = from mlass in marrayOfMlass
                         join mexample in mexampleMarray
                         on mlass.MID equals mexample.MID into mm
                         from mexample in mm
                         where mlass.Malue != "a" && mexample.Mate > DateTime.Now.AddDays(1)
                         orderby mlass.Malue descending
                         select new { mlass.Malue, mexample.Mate };

           foreach (var mar2 in mesult2)
           {
               Console.WriteLine("Malue is {0}, Mate is {1}", mar2.Malue, mar2.Mate);
           }

   This joins the two arrays on equal MIDs, then filters on the Malue (!= “a”) and the Mate ( > DateTime.Now.AddDays(1)). The 'into mm' part collects the matching mexampleMarray items for each mlass into a group, and the second 'from' flattens those groups back into pairs, so the net effect here is the same as a plain inner join. The first condition is satisfied by all but the first element of marrayOfMlass and the second is satisfied by all but the first two elements of mexampleMarray. After the join we sort the result by Malue descending and select some fields from both classes. The output is something like:

Malue is c, Mate is 24.3.2008 22:35
Malue is b, Mate is 22.3.2008 22:35
Malue is b, Mate is 23.3.2008 22:35

(The dates will be different, of course.)

   We have selected an anonymous type (select new) and now you can see why they are needed. If we did not have anonymous types, we would have to define a class for every LINQ query that selects properties from different classes, and we would have to modify it every time we modified the query. We need type inference to use anonymous types: we cannot name them, so we cannot explicitly write the generic parameter of the IEnumerable and have to use 'var' for the result. Also, in order to instantiate an anonymous type we need object initializers because we cannot explicitly call the constructor.
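   To see what anonymous types save us, here is a rough sketch of the same kind of join written both ways. MalueAndMate is a hypothetical class invented for this comparison, the join is a simplified one without the filter and ordering, and the dates are fixed values chosen only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Mlass
{
    public int MID { get; set; }
    public string Malue { get; set; }
}

class MexampleMlass
{
    public int MID { get; set; }
    public DateTime Mate { get; set; }
}

// Hypothetical result class: without anonymous types we would have to
// write (and keep updating) one of these for every query shape.
class MalueAndMate
{
    public string Malue { get; set; }
    public DateTime Mate { get; set; }
}

static class NamedResultExample
{
    public static IEnumerable<MalueAndMate> Pair(Mlass[] mlasses, MexampleMlass[] mexamples)
    {
        // The element type has a name, so the result type can be spelled out.
        return from mlass in mlasses
               join mexample in mexamples on mlass.MID equals mexample.MID
               select new MalueAndMate { Malue = mlass.Malue, Mate = mexample.Mate };
    }

    static void Main()
    {
        Mlass[] mlasses = {
            new Mlass { MID = 1, Malue = "a" },
            new Mlass { MID = 2, Malue = "b" } };
        MexampleMlass[] mexamples = {
            new MexampleMlass { MID = 1, Mate = new DateTime(2008, 3, 20) },
            new MexampleMlass { MID = 2, Mate = new DateTime(2008, 3, 22) } };

        foreach (MalueAndMate pair in Pair(mlasses, mexamples))
            Console.WriteLine("Malue is {0}, Mate is {1}", pair.Malue, pair.Mate);

        // The anonymous version needs no extra class, but its type has no
        // name we can write down, so 'var' is the only option.
        var anonymous = from mlass in mlasses
                        join mexample in mexamples on mlass.MID equals mexample.MID
                        select new { mlass.Malue, mexample.Mate };

        foreach (var pair in anonymous)
            Console.WriteLine("Malue is {0}, Mate is {1}", pair.Malue, pair.Mate);
    }
}
```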

   Most keywords in LINQ are actually aliases for methods defined in the System.Linq.Enumerable static class. Guess what! These methods are extension methods for IEnumerable<T>. Let's return to the first example. Because it is so simple, it uses only one method (the from and select are not methods in this context) – the Where method.

           var mesult = from m in marrayOfMlass
                        where m.MID > 2
                        select m;

can also be written as

           var mesult = marrayOfMlass.Where(m => m.MID > 2);

   Yes, that is right. The parameter of the Where method is a Func<TSource, bool> delegate (in our case inferred to be Func<Mlass, bool>), and that means a lambda expression. The boolean expression in the where clause and all the other expressions that appear in LINQ queries are actually lambdas! See the pieces falling into place. In the second query we used type inference, object initializers, anonymous types, extension methods (through keywords) and lambda expressions, and anonymous types get their properties generated for them much like automatic properties. When you look at the method-call version of the first example you may think that the query syntax is unnecessary or even harmful (it is longer), but how would the second example look with plain method calls?

   Because of the heavy use of anonymous types and especially closures in the second example, it is possible but very hard to write it in old school C# (with methods). The Join method alone takes four arguments, three of them lambda expressions, and the Where method needs access to both range variables (mlass and mexample), so they have to be carried along in a hidden anonymous type. The compiler does some real magic behind the scenes. If you really want to see how it looks, use ildasm, but I warn you it is very hard to read even if you are experienced with MSIL. It turns out that the query syntax not only looks better and more familiar but also can express logic that other constructs have trouble with.
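   For the curious, here is a hand-written sketch of roughly what the second query turns into as method calls. It is an approximation, not the compiler's exact output: the intermediate anonymous types stand in for the hidden ones the compiler generates, the 'join ... into' becomes GroupJoin followed by SelectMany, and a fixed start date with a precomputed threshold (both assumptions of this sketch) replaces DateTime.Now so that the output is repeatable. MethodSyntaxExample and Run are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

class Mlass
{
    public int MID { get; set; }
    public string Malue { get; set; }
}

class MexampleMlass
{
    public int MID { get; set; }
    public DateTime Mate { get; set; }
}

static class MethodSyntaxExample
{
    public static List<string> Run(Mlass[] marrayOfMlass, MexampleMlass[] mexampleMarray, DateTime threshold)
    {
        var mesult2 = marrayOfMlass
            // 'join ... into mm' becomes GroupJoin: one group of matches per mlass.
            .GroupJoin(mexampleMarray,
                       mlass => mlass.MID,
                       mexample => mexample.MID,
                       (mlass, mm) => new { mlass, mm })
            // 'from mexample in mm' becomes SelectMany, which flattens the groups.
            .SelectMany(t => t.mm, (t, mexample) => new { t.mlass, mexample })
            // Both range variables travel together inside the anonymous object t.
            .Where(t => t.mlass.Malue != "a" && t.mexample.Mate > threshold)
            .OrderByDescending(t => t.mlass.Malue)
            .Select(t => new { t.mlass.Malue, t.mexample.Mate });

        // Format each row as text so the result can leave the method.
        return mesult2
            .Select(r => r.Malue + " " + r.Mate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
            .ToList();
    }

    static void Main()
    {
        // Fixed start date (an assumption of this sketch) instead of DateTime.Now.
        DateTime start = new DateTime(2008, 3, 20, 22, 35, 0);

        Mlass[] marrayOfMlass = {
            new Mlass { MID = 1, Malue = "a" },
            new Mlass { MID = 2, Malue = "b" },
            new Mlass { MID = 3, Malue = "c" },
            new Mlass { MID = 4, Malue = "d" },
            new Mlass { MID = 5, Malue = "e" } };

        MexampleMlass[] mexampleMarray = {
            new MexampleMlass { MID = 1, Mate = start },
            new MexampleMlass { MID = 1, Mate = start.AddDays(1) },
            new MexampleMlass { MID = 2, Mate = start.AddDays(2) },
            new MexampleMlass { MID = 2, Mate = start.AddDays(3) },
            new MexampleMlass { MID = 3, Mate = start.AddDays(4) } };

        foreach (string line in Run(marrayOfMlass, mexampleMarray, start.AddDays(1)))
            Console.WriteLine(line);
        // prints:
        // c 2008-03-24
        // b 2008-03-22
        // b 2008-03-23
    }
}
```

   Five chained calls and two throwaway anonymous types for a query that reads almost like a sentence in the query syntax.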

   I could go on giving examples of the different query operators like 'group by', 'take', 'skip', etc., but there are a lot of them and there are plenty of examples that you can google. Here is one of the most popular resources – 101 LINQ Samples.
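   Just to give a taste, here is a minimal sketch of grouping and paging. Note that in C# only grouping has query keywords ('group ... by'); Skip and Take are available as methods only. MoreOperatorsExample, CountPerId and Page are hypothetical names used for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MoreOperatorsExample
{
    // Counts how many times each MID occurs, using the 'group ... by' keywords.
    public static Dictionary<int, int> CountPerId(IEnumerable<int> mids)
    {
        var counts = from mid in mids
                     group mid by mid into g
                     select new { MID = g.Key, Count = g.Count() };
        return counts.ToDictionary(c => c.MID, c => c.Count);
    }

    // Skips the first 'skip' items and then returns at most 'take' items.
    public static IEnumerable<string> Page(IEnumerable<string> source, int skip, int take)
    {
        return source.Skip(skip).Take(take);
    }

    static void Main()
    {
        int[] mids = { 1, 1, 2, 2, 3 };
        foreach (var pair in CountPerId(mids))
            Console.WriteLine("MID {0} appears {1} time(s)", pair.Key, pair.Value);

        string[] malues = { "a", "b", "c", "d", "e" };
        Console.WriteLine(string.Join(", ", Page(malues, 1, 2).ToArray())); // prints b, c
    }
}
```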



   The only two things needed for LINQ that we did not use were partial methods and expression trees. We will see how they are used in the next and last part of the series, when we look at how LINQ queries databases (and other providers). Stay tuned.
Tags:   english programming 
Posted by:   Stilgar
05:53 21.03.2008

Comments:


Posted by   thk   on   02:40 23.03.2008

Go Stilgar, Go!

Posted by   Linq RULZ (Unregistered)   on   20:16 23.03.2008

(bow)
Another masterpiece :):):)

After you finish with LINQ ... will you post a tutorial on ASP.NET base features?

Posted by   Stilgar   on   20:41 23.03.2008

No :)

Posted by   dotNET junkie (Unregistered)   on   19:39 24.03.2008

Most impressive! I just stand... dumbfounded at the sheer brilliance of this article - and the whole series. It's great, simply great! This blog is a must-read for anyone.
