As one gets more experience as a developer the moments when new information can totally change understanding of basic things get rarer. Last month I was in an argument with three coworkers of mine and I was stomped so hard that I was forced to rethink and change the way I do something very basic namely the way I model many to many relationships.
In the past I just went for what I thought was the most natural approach. For example if I needed to model the relationship between users and groups where each User can be in many groups and each Group can have multiple users I would just create a cross table in the database where each row consists of UserID and GroupID. Both IDs together will be a primary key and they are both a foreign key to the corresponding tables. In my C# model I would put a collection of Groups in the User object and a collection of Users in the Group object. In the UI I would only provide a way to delete the relation and a way to add the relation but no edit. It is simple and it works. I can confirm it works because I have built multiple projects using this approach. However it turns out it is not as good as it sounds.
The alternative is to give the cross table its own ID, create a C# object representing the relationship and provide an option to edit the relationship in the UI. Why is this approach better?
Now I must admit I am a total noob when it comes to database performance. When people explain something I usually think I understand it and I agree but the problem is that if someone comes after that and explains the opposite I would probably agree again. When I was told the approach with the ID required less memory I agreed at once but did not agree that this was enough of a reason to add IDs to the cross tables. Turns out that because the primary key is an index you are actually better off having a separate ID because it is half the size and you will actually save memory due to smaller indexes. If you use the first approach you will have the benefit that the "left" column is a part of the index and it will result in faster search when this column is involved. However you can add this index and get that performance back. While at this point you are not saving memory you may gain performance. If the index is on two columns it gets bigger which means that it can spread on more pages. Algorithmically the search has the same complexity but because you will search half the pages you will get performance advantage as the table gets bigger. Of course I quickly dismissed these arguments as premature optimization.
Next point I was arguing was that the interface becomes more complex when it had an edit option. The user would be confused what the difference was between delete and add on one side and edit on the other. We should simplify the UI by having less buttons. Again my arguments were destroyed. First of all line of business apps usually have consistent interface where each item has add, delete and edit buttons. By removing the edit button I was making the interface less consistent and therefore more complex. In addition when the normal user adds a relationship and makes a mistake he does not think "I want to remove the incorrect relationship", he thinks "I want to fix my error". The user wants to edit. The edit button does make sense.
So I lost the UI argument and my last resort was the object model. In an OOP model many to many relationships are represented by having a collection of references to the corresponding type in both types. This is still true and if I am programming some algorithm or some framework I would probably use this representation by default. However in this case we are talking about persistence and UI. For our application the edit button was in the requirements anyway so we had to implement it. I was asked what I gain from going with the direct relationship approach. Getting the Groups a User belongs to would be easier. I guess the code for creating a relationship would be slightly simpler if I already had one side of the relationship (i.e. if you have the Group object and want to add a User). However in practice our SPA app was sending two IDs via AJAX to create a relationship. Creating a new relationship object in the alternative approach would not be much harder since I needed to get the User and Group objects anyway. If instead of objects the relationship object only used the IDs it would be even simpler. On the other hand what did I lose? First of all we had to create different code for editing relationships than what we had for everything else. This one had two IDs. Deleting a relationship also required two IDs all this code was quite different from all the other CRUD code that just used a single ID to do those operations. And what would happen if the requirements changed and a date on which the relationship was created was required? I would fall back to the approach with the ID and the separate object anyway. So basically the approach with the separate relationship object would be easier to implement and easier to change in the future. I lost on all accounts :(
One thing I was not convinced to do was rewrite the code that was tested and working in order to fit the new ideas I was just converted to. "Don't change things that work" is still something I strongly believe in.
The moral of the story is that what seems simpler at first is not always the better option.
This information comes to you thanks to my colleague Andrey Nikolov