The Sietch - this is not a blog.
This is a no-blog.

Pseudorandom Quote:

I, for one, bet on science as helping us. I have yet to see how it fundamentally endangers us, even with the H-bomb lurking about. Science has given us more lives than it has taken; we must remember that.
Philip K. Dick

Five Little Tricks to Make Your Life with Entity Framework Easier

The history of data access technologies in .NET is not a pretty one. Microsoft told us to do DataSets, then SqlDataSource, then LINQ to SQL, then Entity Framework in one way, then Entity Framework in another way. In the meantime many people were using raw ADO.NET and wrote their SQL by hand and other people were using NHibernate. It seems that things have finally settled down. DataSets, DataSources and LINQ to SQL are pretty much dead and it seems NHibernate is about to join them. The people who love control over their SQL are still a minority but they are doing just fine and have added some tools like Dapper to their belts. For the past four or five years most people have been doing (at least for new projects) what Microsoft calls Entity Framework Code First which is a name that in typical Microsoft fashion is very confusing and inaccurate. If you are in that camp you might find the knowledge I am about to share useful. It has come with a lot of battle scars.

First of all it should be noted that this list is for Entity Framework 6.2. I have not checked how much of this works with Entity Framework Core but I assume most of it will even if some of the APIs may change. If you happen to know which of the tricks in this list work in EF Core and which do not please leave a comment. I may or may not choose to update this article with EF Core code when it goes final but the general principles should still apply and I believe they apply to most ORMs.

I have ordered the list starting from the least expensive to implement and least dependent on opinion to the most controversial so the first items should be a no-brainer for any project while the end of the list may be more problematic and people may disagree depending on their personal experience. You will also notice that most points are a reflection of the idea that you should not trust the ORM to do the right thing. In my experience this is the correct way to deal with ORMs.

1. Dump your queries

Add the following lines in your DataContext constructor

#if DEBUG
this.Database.Log = s => System.Diagnostics.Debug.WriteLine(s);
#endif

This will simply log every query Entity Framework executes to the Output window of Visual Studio when running in debug mode. The current way to do this in EF Core is more elaborate and involves injecting a logger via the Dependency Injection mechanisms but it is simple enough to make it worth it even if you need to look at the actual queries just once. I cannot think of any reason not to use this. In fact I advise beginners with EF to look at each and every query EF generates and non-beginners should look at more complex queries too to avoid surprises. Just last week I was looking at a performance problem that was a result of a bad EF query. I had seen the query but decided to leave it there because my SQL Server showed the query plans of the generated and hand-written query to be equivalent but it turns out that with the amount of data in production the query plans changed and the EF one was inefficient. Had I not looked at that query in advance I would not be able to fix the problem as fast as I did. The output window log can also show you when you are doing unexpected amount of queries or when a LINQ query is executed on the client rather than in the database.

2. Disable lazy loading

I used to be a big fan of lazy loading but not anymore. Simply disable lazy loading especially if you have less experienced programmers on the team. Add this line of code in your DataContext constructor

this.Configuration.LazyLoadingEnabled = false;

Load your related entities by adding calls to Include where needed. By doing this you will avoid unintentional problems with N+1 queries that happen behind your back. You may not even know you are doing this in your project especially if you are not following advice 1. Remember to use the statically typed version of Include that accepts a lambda rather than the one that accepts a string.

3. Use raw SQL when you must

Sometimes the query you want is too complex and EF cannot generate a reasonable query. Sometimes you want to use a feature of SQL Server that is not exposed in EF. Sometimes you simply wasted too much time trying to make EF do what you want. Well in these cases go write a stored procedure or raw SQL query. It is not that big of a problem. EF has good support for producing entities from a raw query result and a stored procedure can be mapped to a method. These features are there for a reason. Use them! Last time I used a stored procedure with EF was to do a Full Text Search in SQL Server. I hear people are writing magic query interceptors and what not just to avoid writing a couple of lines of SQL which is absurd because it is much harder than simply going one level down.

4. Do not use repositories and unit of work

Various courses, articles and even the official ASP.NET website recommend implementing a repository and unit of work. There are multiple articles out there explaining why this is an anti-pattern with Entity Framework and this could be a separate post but I will give a short version.

Repositories are supposed to abstract your data source. They serve as a data access layer. This means that you should be able to change your data access strategy from raw sql, to an ORM to a REST web service without changes to the client code. If the repository is implemented correctly it does indeed serve this purpose. The problem is that most tutorials, courses and articles recommend practices that leak the underlying implementation in the form of IQueryable. Yes, IQueryable is an interface but implementing it is practically impossible without the help of an ORM. At best you can switch from one ORM to another but for this you don't need a repository - a simple wrapper would be enough. A proper repository would not return anything but IEnumerable/List and will not accept IQueryable related argument (this is where the example on the ASP.NET website fails). So why do I advise not using a repository instead of implementing it correctly? The answer is simply – YAGNI. Implementing a proper repository is a lot of work and it makes your Service/Business Layer harder to implement due to the lack of convenient IQueryable. Unless you have specific requirements that point to changing the data source at some point you should not implement a repository. This specific abstraction costs a lot and you want to evaluate the risk. What if you need the abstraction later? Well, you man up and refactor. People have lived through serious refactoring before.

And what about the unit of work? Once you kicked out the repository you have a premade unit of work in the form of the DataContext (and a DbSet happens to be a repository). You do not need another. Also at least in web applications this unit of work has a logical lifetime - the web request. No need to manage the lifetime of the unit of work everywhere. Just create a data context when the request starts and dispose it at the end. All dependency injection frameworks do this out of the box. They call it Request Scope.

Get rid of your repositories and units of work and watch your code base become dramatically simpler with practically no downsides! You will thank me later.

5. Do not let Entity Framework create your database

In my experience EF sucks at generating a database. You miss a navigation property here and it does not generate a foreign key. You do not add a column name there and it generates it as User1. Yes you can make EF generate whatever you want but you have to check that it does. I find it less painful to simply create my database and then write a model that matches the database. I do not generate my model from database either because I have found that it suffers from the same issues but in reverse. In addition if you create your database and model manually you have the option to drop properties you do not need. For example if you have a Post entity that has a list of Comments the Comment entity may not need a navigation property Post so you can omit it and only include a PostID property. After all you do not query Comments by Post title so why add a Post property? Omitting redundant navigation properties makes the entities simpler and clearer. Of course if you happen to need the property at some point you just add it without changes to existing code or database.

The biggest downside of this approach is that you cannot use migrations. However you can add migrations later in the development lifecycle. Usually initial development involves a great number of tables and then the database is modified a couple of tables at a time. In this case you can create the initial database by hand, then enable migrations and make sure EF generates precisely the intended tables and keys from this point on via the migrations system. Since a few tables will be modified it is not so painful to keep track.

Here comes the obvious question. If I do not trust my ORM (and I do not) is it still worth it to use an ORM? Is it not faster to write plain SQL from the start instead of writing LINQ and check the SQL every time? In my experience it is. The type safety, the reduction in boilerplate code, the ability to avoid minor errors (like misspelling column names) makes me more productive with an ORM. This may not be true for everybody and I am not about to argue that you should use an ORM if you do not feel these benefits but if you do feel like ORM makes you more productive maybe these tricks can increase your productivity even more. And remember - an ORM is not an excuse to not know (or forget) SQL.

Tags: english programming

Posted by: Stilgar

10:22 08.02.2016

Comments:

First Previous 1 Next Last

Posted by Jothay (Unregistered) on 07:05 09.02.2016

#4 I agree that if you have a specific solution with a defined set of use cases that do not involve a need to ever change the db, removing the repository pattern will save you some time and effort. However, in a larger system where you may find that your original setup isn't going to scale with a new client's requirements, having that little bit of extra work/interface/abstraction with a single abstract Repository<T> class can make all the difference. Refactoring becomes a few hours of spot checking and running a few unit tests instead of days of work.

#5 Sure, if you are building a 50 plus table db new from scratch and haven't really landed out what you actually need, making the db first and just making a model.edmx Ferron it might save you some time in the short term but if you paid attention to what those generate, you can build it with code first the right way with less long term refactoring I'd say. There's also a key in here to mentioning an aggregate model pattern, where you don't access secondary objects except through the primary. Your mention of the post being the primary and the comments being secondary is the perfect example.

Posted by Stilgar on 20:46 09.02.2016

I strongly disagree that creating a proper repository is "a little bit of extra work". In my experience it is a lot of work. The very fact that all the tutorials out there take the IQueryable shortcut is proof that implementing a proper repository is a lot of work. It also results in quite annoying cases where the service layer has to work with multiple repositories and sometimes query between multiple repositories.

Posted by Guest (Unregistered) on 20:15 16.02.2016

#4 is controversial, but you won me over. I recently completed a project with repository and service layers, and I still cannot understand why we needed the repository layer. Sometimes I find people calling .Queryable in our service layer, so what is the point?

#5 kind of depends on your tolerance for the pain of writing raw SQL as opposed to simple POCO classes. In our project we got off the ground much faster doing the latter, and then once we had most of our tables, we would add columns and other additions with raw SQL.

I'd like to see #6 = Aim for zero stored procedures.

Posted by Guest (Unregistered) on 09:52 26.02.2016

I agree mostly with everything, except the lazy loading. If someone on the team does not understand the way the framework works you should provide them with some training instead of disabling this useful (IMO) feature.

Posted by Guest (Unregistered) on 09:54 26.02.2016

Btw "I'd like to see #6 = Aim for zero stored procedures." why would you aim for that?
Stored procedures have the benefit of being an SQL script which you do can change without recompiling your whole project. Also when EF generates your query, the whole query string is sent through the wire which makes for a heavier networking communication.

Posted by Stilgar on 13:58 29.02.2016

In my experience the cases where lazy loading is useful are very rare usually related to caching and the lazy loading in EF specifically does not play well with caching because it depends on the DataContext still being alive. If you are not caching why not load all the needed data in one database request?

Posted by Guest (Unregistered) on 09:53 01.03.2016

Well why eagerly load a bunch related entity data that I may not need?

Posted by Stilgar on 15:54 03.03.2016

Please give an example of a case where you don't know in advance what data you are going to need when you make a query. I think these examples do exist but they are rare. So if they are rare I'd rather use explicit lazy loading via the Load method ( https://msdn.microsoft.com/en-us/data/jj574232.aspx ) so that I can be sure I won't get unexpected number of queries.

Posted by Guest (Unregistered) on 18:04 14.03.2016

True story. I agree with every point here.

Posted by Carlos A. Morales (Unregistered) on 19:49 14.08.2016

Gracias buen articulo