The Sietch - this is not a blog.
This is a no-blog.

Pseudorandom Quote:

The report of my death was an exaggeration.
Mark Twain

Log In
User Name:	*
Password:	*
Remember me next time.

Register Forgotten password?

Main page What is this place? Articles in English Статии на български Browse by tag:

eventsgamesgiantsliteraturemoviesmusicprogrammingstupidtechwebsite

Find selected tags:

The Sietch Latest Comments Latest comments:

AI Models Doing CRUD Programming
19:41 21.04.2025
by mitq (Unregistered) AI Models Doing CRUD Programming
19:55 28.03.2025
by Guest (Unregistered) John McCarthy Has Entered the Matrix
08:34 01.03.2025
by Guest (Unregistered) Generating Swagger for Validation Attributes on Positional Records in ASP.NET
22:05 17.07.2024
by AlcoholicWeeb (Unregistered) Discrediting Questions about the Next Number in a Series with the Lagrange Polynomial
23:22 27.06.2023
by Guest (Unregistered)

Feeds: All Sietch.net Articles Articles in English Статии на български

Works on my machine!

LINQ to Objects Performance

To fill in the vacuum of technical articles on this no-blog I decided to put in this article about small but interesting problem with LINQ to Objects performance that a guy posted on a developers' forum a while ago. Fear not for I am working on more serious technical article but I am doing it slowly like a glacier moving.

The task in question was a simple implementation of arithmetic coding using LINQ to Objects. Basically arithmetic encoding writes a number telling how many times a symbol repeats and then the symbol that repeats. For example 111 is encoded as 31 (3 times 1) and 21 is encoded as 1211 (1 time 2 and 1 time 1). The original version was this:

public static string Encode(string input)
{
StringBuilder result = new StringBuilder();
var piece = input.AsEnumerable();
while (piece.Any())
{
var u0 = piece.First();
var first = piece.TakeWhile(c => c == u0);
var count = first.Count();
result.Append(count.ToString()).Append(u0);
piece = piece.Skip(count);
}
return result.ToString();
}

The algorithm takes the first character of the string ( piece.First() ) queries for repeating characters ( piece.TakeWhile(c => c == u0) ), counts the number of repeating characters ( first.Count() ), encodes them in the result ( result.Append(count.ToString()).Append(u0) ) then reassigns what is left to the same variable ( piece = piece.Skip(count); ) and continues looping until the input is depleted ( while (piece.Any()) ).

If you test it with a short string it works fine:

string input = "1234567890abcdefghijklmnopqrstuvwxyz";
string result = Encode(input);
Console.WriteLine(result);

However if you pass a longer input it becomes really slow:

for (int i = 0; i < 20; i++)
{
input += "1234567890abcdefghijklmnopqrstuvwxyz";
}

result = Encode(input);
Console.WriteLine(result);

The reason for this behavior is that LINQ extension methods defer the execution. When you query an enumerable you do not produce a new collection. You produce an enumerable class that contains the original enumerable and contains information on how to produce a new enumerable. When invoked the new enumerable is produced using this information. Note that like all enumerable objects every enumerator that is produced is independent of the previous one. The fact that you enumerated the collection one does not mean that the next time you enumerate the same enumerator would be used. In fact a new one would be produced.

In this case the Skip() method produces its result in that way. When we assign it to the piece variable we get an enumerable that can be called when needed. However the next time we call the Skip() method we produce an enumerable that contains the previous enumerable. In order to execute the result it has to invoke the original nested enumerable which in turn has to invoke its own nested enumerable and so on until they recursively reach the original string. Creating and destroying the amount of enumerators needed for a longer string kills the performance.

The good news is that we can easily solve this problem. We just need to add ToArray() after the Skip() method:

piece = piece.Skip(count).ToArray();

This method will create an array from the enumerable. This way we avoid all the nesting because the array is produced on the spot just once. Suddenly the performance is back to normal and this method works just fine. Of course if you need every bit of performance there are better ways to solve this task in C#. This method still creates a lot of objects and arrays and puts some pressure on the garbage collector. However the performance here suffers the most by input that is not suitable for arithmetic coding in the first place (i.e. one that does not consist of long sequences of repeating symbols).

I hope that this will help you avoid performance problems with LINQ to Objects. I have no doubt that everyone is using it. I just cannot imagine how people used to write a loop to filter a collection in the past.

Tags: english programming

Posted by: Stilgar

18:03 24.04.2010

Comments:

No comments yet.

Post as:

Post a comment:
*