The Principle of Least Astonishment
There is a good “rule of thumb” in software engineering called “The Principle of Least Astonishment”. In a nutshell, the principle states that the result of performing some operation should be obvious, consistent, and predictable, based upon the name of the operation and other clues(1). This seems like a fairly obvious principle. Be warned, however, that it’s easier said than done, and it is easy to break this rule in your own code if the design under the covers is somewhat lackluster. This post illustrates something that does not follow The Principle of Least Astonishment.
A Little Background
.NET 3.5 introduces some excellent new technologies. One of these is Extension Methods(2): static methods that can be invoked using instance method syntax. In essence, extension methods allow you to add methods to existing types and constructed types without having to create a new class that inherits from the class you want to extend (and without marking the original class as partial). The IEnumerable<T>(3) interface in .NET 3.5 has been extended with almost 4 dozen extension methods.
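To make the idea concrete, here is a minimal sketch of writing an extension method of your own. The EnumerableExtensions class and its CountEven method are hypothetical names made up for this illustration; they are not part of the framework.

```csharp
using System;
using System.Collections.Generic;

// Extension methods must be declared in a static, non-generic, non-nested class.
public static class EnumerableExtensions
{
    // The "this" modifier on the first parameter makes CountEven an extension method.
    // CountEven is a hypothetical helper used only for this illustration.
    public static int CountEven(this IEnumerable<int> source)
    {
        int count = 0;
        foreach (int value in source)
        {
            if (value % 2 == 0) count++;
        }
        return count;
    }
}

public static class Program
{
    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6 };

        // Instance-style call; the compiler turns it into the static call below.
        Console.WriteLine(numbers.CountEven());                      // 3
        Console.WriteLine(EnumerableExtensions.CountEven(numbers));  // 3
    }
}
```

Both calls do exactly the same thing; the instance syntax is just a compiler convenience, which is why an int[] can appear to gain a method it never declared.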
A few of the extension methods on IEnumerable<T> allow for easy comparison between 2 IEnumerable<T>’s (arrays, List<T>’s, etc.). With a single line of code, I can create a new IEnumerable<T> that contains the elements common to 2 other IEnumerable<T>’s by using the Intersect<T>(4) extension method. For example:
int[] nArray1 = {1, 2, 3, 4, 5, 6};
int[] nArray2 = {3, 6, 7, 8, 9};
int[] commonIntsArray = nArray1.Intersect(nArray2).ToArray();
In the above example, the Intersect method will return the values that are common between nArray1 and nArray2. In this case, [3, 6]. Slick rick! 1 line of code, no looping, and you have the common set. But what happens if you want to compare two IEnumerable<T>’s of some custom object, such as 2 IEnumerable<Order>? You can’t very well say that Order o1 equals Order o2… what are you comparing on? Thankfully, .NET 3.5 provides us an interface which we can implement on a class.
The IEqualityComparer<T> Interface
The IEqualityComparer<T> interface allows us to tell .NET how 2 objects should be compared. We can create equality comparer classes for any class we want (including .NET types). We can then use this equality comparer by passing it into the overloaded Intersect<T>. For example:
public class OrderComparer : IEqualityComparer<Order> // defines Equals and GetHashCode
{
    // implements the IEqualityComparer<T>.Equals(T x, T y) : bool
    public bool Equals(Order x, Order y)
    {
        return x.OrderID == y.OrderID;
    }

    // implements the IEqualityComparer<T>.GetHashCode(T obj) : int
    public int GetHashCode(Order obj)
    {
        return obj.OrderID.GetHashCode();
    }
}

public IEnumerable<Order> GetCommonOrders(Order[] orderSet1, Order[] orderSet2)
{
    return orderSet1.Intersect(orderSet2, new OrderComparer());
}
In the above example, GetCommonOrders will return the common Orders between the 2 order sets using a new OrderComparer. .NET will use the Equals() method of our OrderComparer to determine if 2 of our objects (both of type Order) are equal (in our case, 2 Orders are equal if they have the same OrderID). If we called Intersect without passing an IEqualityComparer<Order>, then .NET would use the default equality comparer, which for a class like Order falls back to the virtual Object.Equals() method, i.e. reference equality unless Order overrides it. Not what we want.
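Here is a small sketch of that difference between the default comparer and OrderComparer. The Order class below is a hypothetical stand-in with nothing but an OrderID, since the real class isn’t shown in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Order; the real class is not shown in this post.
public class Order
{
    public int OrderID { get; set; }
}

public class OrderComparer : IEqualityComparer<Order>
{
    public bool Equals(Order x, Order y) { return x.OrderID == y.OrderID; }
    public int GetHashCode(Order obj) { return obj.OrderID.GetHashCode(); }
}

public static class Program
{
    public static void Main()
    {
        Order[] set1 = { new Order { OrderID = 1 }, new Order { OrderID = 2 } };
        Order[] set2 = { new Order { OrderID = 2 }, new Order { OrderID = 3 } };

        // Default comparer: Order overrides nothing, so equality is by reference,
        // and no instance appears in both arrays.
        Console.WriteLine(set1.Intersect(set2).Count());                       // 0

        // Custom comparer: equality is by OrderID, so order 2 is common.
        Console.WriteLine(set1.Intersect(set2, new OrderComparer()).Count());  // 1
    }
}
```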
I thought this post was about something astonishing…?
You’re right – it is. Here’s the thing. Over the last 2 work days (about a total of 6.5 hours) I’ve been trying to get the Intersect line in the following code to work.
public class MyWcfService : IMyWcfService
{
    public Customer UpdateCustomer(Customer updated)
    {
        // some miscellaneous code
        IEqualityComparer<Order> comparer = new OrderComparer();
        // this is the line in question
        IEnumerable<Order> commonOrders = updated.Orders.Intersect(original.Orders, comparer).ToList();
        // save user stuff
    }
}

public class OrderComparer : IEqualityComparer<Order>
{
    // implements the IEqualityComparer<T>.Equals(T x, T y) : bool
    public bool Equals(Order x, Order y)
    {
        return x.OrderID == y.OrderID;
    }

    // implements the IEqualityComparer<T>.GetHashCode(T obj) : int
    public int GetHashCode(Order obj)
    {
        return obj.GetHashCode();
    }
}
I had a Unit Test that tests the above UpdateCustomer method, and I found that the Intersect line in question, for some unknown reason, never, ever returned a common result set – despite my unit test data having orders in common. The result was always an empty IEnumerable. What gives? Can you find what’s going on? Can you determine why my OrderComparer was not working? Think about it for a bit (I thought about it for 6.5 hours, so pay me some dues and take 60 seconds to see if you can figure it out).
The reason why the above line was not working had nothing to do with the call to ToList() (I do that so that the Intersect executes – the extension method itself is deferred and only executes when its result is enumerated, i.e. when GetEnumerator is called and the sequence is iterated). It also has nothing to do with the OrderComparer.Equals method. I’ll let you think about it a little more.
Give up? The call to Intersect was failing because of OrderComparer’s implementation of GetHashCode. I found this nice little nugget in the IEqualityComparer<T>.GetHashCode() documentation:
Implementations are required to ensure that if the Equals method returns true for two objects x and y, then the value returned by the GetHashCode method for x must equal the value returned for y.
Seriously? *cries* I changed my GetHashCode method to hash the OrderID instead of the Order object itself, and everything worked.
public int GetHashCode(Order obj)
{
    return obj.OrderID.GetHashCode();
}
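If you want to see the failure in isolation, here is a minimal sketch that puts the two versions side by side. The Order class and the names BrokenOrderComparer and FixedOrderComparer are hypothetical, made up for this illustration. It assumes, as the behavior suggests, that Intersect looks elements up by hash code first and only asks Equals about candidates whose hash codes match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal Order; the real class is not shown in this post.
public class Order
{
    public int OrderID { get; set; }
}

// Equals compares OrderID, but the hash comes from Object.GetHashCode(),
// which is tied to the instance rather than to OrderID.
public class BrokenOrderComparer : IEqualityComparer<Order>
{
    public bool Equals(Order x, Order y) { return x.OrderID == y.OrderID; }
    public int GetHashCode(Order obj) { return obj.GetHashCode(); }
}

// Equals and GetHashCode both derive from OrderID, so they always agree.
public class FixedOrderComparer : IEqualityComparer<Order>
{
    public bool Equals(Order x, Order y) { return x.OrderID == y.OrderID; }
    public int GetHashCode(Order obj) { return obj.OrderID.GetHashCode(); }
}

public static class Program
{
    public static void Main()
    {
        Order a = new Order { OrderID = 42 };
        Order b = new Order { OrderID = 42 };   // same ID, different instance
        var broken = new BrokenOrderComparer();
        var fixedComparer = new FixedOrderComparer();

        // Both comparers say the two orders are equal...
        Console.WriteLine(broken.Equals(a, b));                                           // True
        // ...but only the fixed one honors "equal objects have equal hash codes".
        Console.WriteLine(fixedComparer.GetHashCode(a) == fixedComparer.GetHashCode(b));  // True
        // For the broken comparer, two distinct instances almost never share a hash code.
        Console.WriteLine(broken.GetHashCode(a) == broken.GetHashCode(b));

        Order[] updated = { a };
        Order[] original = { b };
        // With mismatched hash codes the lookup finds no candidate, so in practice this is empty.
        Console.WriteLine(updated.Intersect(original, broken).Count());
        Console.WriteLine(updated.Intersect(original, fixedComparer).Count());            // 1
    }
}
```

The two lines without an expected value depend on the runtime’s default object hash codes, so they are not strictly guaranteed, but they are what produced the empty result described above.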
Oye
As far as I’m concerned, Microsoft breaks The Principle of Least Astonishment here (the same goes for overriding Object.Equals: you get a compilation warning that you haven’t overridden GetHashCode as well). When I have a class that implements IEqualityComparer<T>.Equals(T, T), I fully expect whatever calls that method to, you know, adhere to the result of Equals. However, that is not the case with Intersect<T> (and other comparative IEnumerable<T> extension methods). Its true result hinges on a call to GetHashCode, which frankly confuses me: a) I don’t understand why a call to GetHashCode is required to compare 2 values, and b) it’s completely unintuitive. Despite my having read the help file about a gazillion times on IEnumerable<T>.Intersect<T>, and despite reading a gazillion pages online about Intersect<T>, and despite stepping through over and over and over again in debug mode, I couldn’t figure out what was possibly wrong. 1 line of code. 6.5 hours. 1 unhappy coder.
So be warned, fellow coders! The Principle of Least Astonishment is critical to the sanity of others using your code (especially if you’re writing low-level code/classes that are used by other developers). Don’t surprise me! Don’t have subtle dependencies between methods where the only place they’re described is the documentation! When I have to implement an Equals method, I expect that Equals will be used to determine the equality between 2 objects, and not be coupled with the result of some other method I have to implement! Please, won’t somebody please think about the children!
Love,
-
ryan.
on June 22, 2008 on 1:52 pm
This bit me just now. Thanks for explaining as I did not have much time to read the documentation. I am surprised at the result too!
on October 4, 2008 on 7:39 pm
Thanks – Saved me from the same fate.
on October 29, 2008 on 1:26 am
Me three (four?)!
Maybe there is the expectation that getting and comparing hash codes would potentially be faster (in general), whereas an Equals is going to do a deeper comparison?
Nevertheless, I was also surprised.
on October 29, 2008 on 7:32 am
@Tony – I thought about that as well, but the GetHashCode, if I recall correctly (and I very well may not be at this point) always gets called after Equals… maybe it’s an application-lifetime caching thing? I can’t remember what was going on during my tests… either way, it is lame. There’s a few things that MS continues to do which, for me, don’t make a whole lot of sense. Their whole “yield” thing in LINQ expressions just doesn’t make a whole lot of sense to me… but I digress.
on January 9, 2009 on 7:51 am
Thanks for reading the full documentation for me. Saved me from myself, and many more hours of frustration. I read, but didn’t READ, the documentation. I know Object.GetHashCode should be overridden in classes that will be used as Hashtable keys, but what’s that got to do with Intersect? Well, apparently it stores the objects in some hash structure to make all those Equals calls perform better. An implementation detail that should be irrelevant. My takeaway is that anywhere IEqualityComparer is used, expect a hash structure to be used somehow.
on January 20, 2009 on 9:49 pm
Pretty much landed in the same issue while trying to make the HashSet set operations work. Realized that I needed to implement a custom comparer for comparing the objects contained in the set, but stumbled in 2 places. First, while implementing the IEqualityComparer, GetHashCode() had to be implemented, which puzzled me, and second, while debugging, the code walked through GetHashCode() even before Equals() was invoked. Originally, like yourself, I returned obj.GetHashCode(), but the final output was not as expected. I tried reading MSDN to crack this nut, but to no avail. I then landed on your blog after googling and realized I was not alone in this maze. Took your advice and boom, my result set was as expected. Thanks a ton, appreciate your hard work.
on May 27, 2009 on 12:52 pm
Wow I was so lucky, I was having this issue with the Except method and I thought I had everything correct. But I stumbled across this in my first hour of digging. Thanks so much for such a complete explanation of what was happening.