DataSets Are Not Evil

OK, that's it: everybody stop talking smack about the DataSet. As self-elected chairman of the nonexistent "DataSet Preservation Fund", I feel urged to respond to years of incorrect positioning, biased thinking and over-simplified black-and-white views. I won't allow the "enterprise community" to make the DataSet the Clippy of .NET, so to speak.

All joking aside, I've long been putting off writing about this topic because I wanted to make sure my opinion was balanced enough. But David Boschmans pointed me at a recent MSDN article on Custom Entity Classes by Karl Seguin, which strikes me as pretty typical of the genre, so here's my two cents on the matter. I'll use the article as a guide for injecting my own opinions, so it helps to give it a quick scan first to see what it's all about. By the way, it is unclear to me why this article appears in the ASP.NET Developer Center and not in an Enterprise Development Center or the like (think PAG), since this kind of discussion should target a broader audience.

Introduction

First and foremost, as the disclaimer in the introduction states: the article only talks about untyped DataSets, not about strongly-typed DataSets. This is a very unfortunate limitation by the author in my opinion, since Typed DataSets do indeed solve most problems mentioned in the article, as well as provide extra benefits, and this type of simplification will only give more traction to the anti-DataSet camp.

Alleged Problems

  • Lack Of Abstraction.
    The statement is made (a few times throughout the article actually) that a DataSet makes it impossible to decouple your code from the database structure. While it is true that you still deal with logical tables, columns and relationships, a Typed DataSet makes this much less visible because you can navigate all of these through strongly-typed properties and methods. The argument that changing a column name in the database schema forces you to adapt your client code doesn't hold: you can (and, for this reason, always should) use "AS" in your SQL query to abstract away the actual column name, so a rename won't "trickle down" to the calling layers. If the UserId column were renamed to Id, for example, the SQL statement would simply become:
    SELECT Id AS UserId, FirstName AS FirstName, LastName AS LastName FROM Users
  • Weakly-typed.
    The obvious statement is made that regular DataSets are untyped, and the issues mentioned by the author are of course 100% correct. Hence the invention of Typed DataSets, which eliminate all the raised concerns. This (along with the next paragraph) also renders the "Why Are They Beneficial" section superfluous.
  • Not object-oriented.
    Sure enough, if you create a Typed DataSet, you get a class that inherits from System.Data.DataSet, which means you can't make it derive from another class anymore. But does this make it "not object-oriented"? In the Java world this type of problem is often described as not being a "Plain Old Java Object" (POJO), so in this case I'll refer to it as not being a "Plain Old .NET Object" (PONO). The problem seems big enough for Indigo to address it by not requiring your components to inherit from any base class (unlike Enterprise Services COM+ components, which need to derive from ServicedComponent). If this is not a serious issue for you (it's never been for me), nothing prohibits you from just subclassing the generated DataSet class to add functionality to it without resorting to external utility methods. In the upcoming 2.0 version of .NET, the partial classes feature will even make it possible to add functionality to the class without subclassing it.
    The Scott Hanselman quote, taken from his rant against DataSets on service boundaries, makes a good point though: "DataSets are bowls, not fruit". I'm actually cool with Typed DataSets being "bowls with a picture of fruit on them", they're data containers (data transfer objects) after all. Custom entities aren't fruit either, they're another binary representation of the same concept, but they still don't smell like fruit.
  • NULLs.
    A point is made that "dealing with NULLs in DataSets isn't the easiest thing, because every time you pull a value you need to check if it's NULL". While I certainly agree that NULLs can be a pain, you could partially deal with this through annotations on your Typed DataSet, which allow you to define the value returned if the field actually contains a NULL value. Besides, a field having a functional value or not (i.e. being NULL or not) is something that affects all layers, and how you handle this in code will not be very different when using DataSets or custom classes. This is certainly true for value types (while we wait for nullable types anyway, gimme int? now!).
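To make the NULL point a bit more concrete, here is a minimal sketch using an untyped DataTable next to a custom class (it assumes C# 2.0 or later for int?). The names NullHandlingSketch, User, AgeFromRow and Describe are made up for illustration and don't come from the article. Either way, somebody has to decide what "no value" means before the value gets used.

```csharp
using System;
using System.Data;

public static class NullHandlingSketch
{
    // Hypothetical custom entity: callers still have to check the nullable property.
    public sealed class User
    {
        public int UserId { get; set; }
        public int? Age { get; set; }
    }

    // Builds a small in-memory table with one known and one unknown age.
    public static DataTable BuildUsers()
    {
        var users = new DataTable("Users");
        users.Columns.Add("UserId", typeof(int));
        users.Columns.Add("Age", typeof(int)); // AllowDBNull is true by default
        users.Rows.Add(1, 30);
        users.Rows.Add(2, DBNull.Value);
        return users;
    }

    // DataSet side: check for DBNull before unboxing the value.
    public static int? AgeFromRow(DataRow row)
    {
        return row.IsNull("Age") ? (int?)null : (int)row["Age"];
    }

    // Shared consumer: the same check is needed wherever the value came from.
    public static string Describe(int? age)
    {
        return age.HasValue ? age.Value + " years old" : "age unknown";
    }
}
```

Calling Describe(AgeFromRow(row)) and Describe(user.Age) ends up in the same HasValue check, which is the point: the NULL decision doesn't go away with custom classes.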

Other points that make me go "hmmm"...

  • "Never return a class from the System.Data or child namespace from the DAL". If you are convinced that System.Data is the namespace for all server-side database stuff, then you're right. But I don't think that's the case: it is the provider child namespaces (e.g. System.Data.SqlClient) that are server-side. I would allow System.Data.DataSet to pass through customs without a problem here.
  • On performance: "whatever processing time you are able to save probably doesn't amount to much compared to the difference in maintainability". This is really comparing apples and pears, as runtime and designtime have very different characteristics and requirements in a project. It certainly is true that a "blanket statement" is pretty useless here. Still, although the performance hit and the XML overhead of a DataSet on the wire might be larger in absolute figures, you should be communicating with the service backend through chunky calls anyway, and most of the time the main performance bottleneck will lie beyond the service facade. In the end, the best performance tweaks can only be found in one way: by measuring and interpreting live data.
  • "You can certainly assign nothing/null to [value types], but this will assign a default value. If you just declare an integer, or assign nothing/null to it, the variable will actually hold the value 0." Excuse me? Have you actually tried to run the "int i = null" statement through a C# compiler? This largely diminishes my trust in the author's, eh, authority on the subject of object oriented programming.
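For what it's worth, here is a tiny C# sketch of that last point. The class and method names are hypothetical, and the int? part assumes C# 2.0 or later. The offending assignment is left as a comment because the C# compiler rejects it.

```csharp
using System;

public static class ValueTypeNullSketch
{
    // int i = null;   // does not compile in C#: null cannot be converted to the value type int

    // A value type that is merely declared (as a field) or defaulted holds 0, not null.
    public static int DefaultInt()
    {
        return default(int);
    }

    // Only the nullable wrapper, int? (Nullable<int>), can actually represent "no value".
    public static bool NullableIntCanBeNull()
    {
        int? maybe = null;
        return !maybe.HasValue;
    }
}
```

So "holds the value 0" is true for a default-initialized int, but you never get there by assigning null to it in C#.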

Free Code

So, what do you get with Typed DataSets? Free code mostly.

  • You can use the boilerplate DataTable.Select method for searching through a table (which is illustrated as "custom behaviour" for the custom entity classes - i.e. code you need to type or generate).
  • The remark that this Select call isn't strongly typed is again correct, but if you define a key on the UserId field in your Typed DataSet for example, you will also get a free typed method called FindByUserId which will select rows based on that key.
  • In ADO.NET 2.0, you'll get the data column names as strongly typed properties so you can use these in your code. The databinding example in the article actually isn't strongly typed itself: the call to DataBinder.Eval(Container.DataItem, "UserName") contains the column name as a plain string, so it would benefit from these properties here as well.
  • If you model relationships between tables, you get free methods and properties to navigate between parent and child tables. You can even use databinding expressions to navigate through these relationships.
  • As stated by the author, "the DataView's built-in support for sorting and filtering, although requiring knowledge of both SQL and the underlying data structure, is a convenience that is somewhat lost with custom collections". So this is basically free code again. As an extra remark: of course you need to know the data structure. How would you build an application if you knew nothing at all about your data structures? The main point, again, is that you're not supposed to know the database schema here, just the logical data structure. And regarding sorting: if a sort expression like "UserName DESC" counts as "knowledge of SQL" and is considered too hard to grasp, then you can't really expect much from a developer anyway. There needs to be some way of pouring that sort expression into a language construct. Indeed, it would be better to have strongly typed methods that sort by a column name and a sort flag enumeration (Ascending/Descending), but it's really not that bad, is it?
  • Data binding is far better supported with DataSets than with any other mechanism in the .NET Framework. Developer productivity is indeed a project feature.
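A rough sketch of some of that free code, using an untyped DataTable so it runs without any generated classes. The FreeCodeSketch class, its method names and the sample rows are made up for illustration. On a Typed DataSet with a key on UserId, the Rows.Find call below is what the generated FindByUserId method would save you from writing.

```csharp
using System;
using System.Data;

public static class FreeCodeSketch
{
    // Builds a small in-memory Users table with a primary key on UserId.
    public static DataTable BuildUsers()
    {
        var users = new DataTable("Users");
        DataColumn id = users.Columns.Add("UserId", typeof(int));
        users.Columns.Add("UserName", typeof(string));
        users.PrimaryKey = new[] { id }; // the key is what makes Rows.Find possible
        users.Rows.Add(1, "alice");
        users.Rows.Add(2, "carol");
        users.Rows.Add(3, "bob");
        return users;
    }

    // Boilerplate search: a string filter expression, not strongly typed.
    public static DataRow[] SearchByName(DataTable users, string name)
    {
        // Single quotes are doubled so they cannot break the filter expression.
        return users.Select("UserName = '" + name.Replace("'", "''") + "'");
    }

    // Key lookup: returns null when no row has that key.
    public static DataRow FindById(DataTable users, int userId)
    {
        return users.Rows.Find(userId);
    }

    // Sorting through a DataView with a plain sort expression.
    public static string FirstUserNameDescending(DataTable users)
    {
        var view = new DataView(users) { Sort = "UserName DESC" };
        return (string)view[0]["UserName"];
    }
}
```

None of this needed a single line of hand-written collection code, which is what I mean by "free".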

DataSet Issues

So it can't all be good, right? There are indeed some issues with the DataSet and I can only hope they are fixed in upcoming versions of the framework. But the main point I'm trying to convey here is that these don't suffice to radically kill DataSets or tell everyone they're all Bad and Evil. Some real problems:

  • They behave badly on the wire when passed over a Web Service or otherwise serialized as XML. This is indeed a problem and a very good reason to avoid them if (and only if) you need to interoperate with platforms other than the .NET runtime. Don't forget that a lot of the time you have both ends under control, so this is not always an issue.
  • They have a bloated wire format, since the XML representation can become very large very fast. If this is really a problem in your case and causes a measurable performance hit, you could opt for gzip compression over the HTTP transport protocol, or rethink the way you are using the services. The more or less "fixed" overhead will also become smaller in relative figures if the communication with the backend service is chunky instead of chatty, so that's certainly an important point of attention. And you can consider using the GetChanges method to only include a diffgram of the changes since the DataSet was initially populated - but beware that this is incredibly platform-dependent.
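As a rough illustration of the last two suggestions, the sketch below serializes a DataSet as a diffgram, does the same for only its changes, and gzips the bytes. The WireSizeSketch class, its method names and the sample data are hypothetical. It compresses an in-memory buffer rather than configuring HTTP compression, so treat it as a way to measure your own data, not as a recipe.

```csharp
using System;
using System.Data;
using System.IO;
using System.IO.Compression;

public static class WireSizeSketch
{
    // Builds a DataSet with rowCount users and marks all rows as unchanged.
    public static DataSet BuildUsers(int rowCount)
    {
        var ds = new DataSet("UsersData");
        DataTable users = ds.Tables.Add("Users");
        users.Columns.Add("UserId", typeof(int));
        users.Columns.Add("UserName", typeof(string));
        for (int i = 1; i <= rowCount; i++)
        {
            users.Rows.Add(i, "user" + i);
        }
        ds.AcceptChanges();
        return ds;
    }

    // Returns a DataSet holding only added, modified or deleted rows (null if nothing changed).
    public static DataSet ChangesOnly(DataSet ds)
    {
        return ds.GetChanges();
    }

    // Serializes the DataSet to XML in diffgram format.
    public static byte[] ToDiffGramBytes(DataSet ds)
    {
        using (var buffer = new MemoryStream())
        {
            ds.WriteXml(buffer, XmlWriteMode.DiffGram);
            return buffer.ToArray();
        }
    }

    // Compresses a byte array with gzip.
    public static byte[] Gzip(byte[] raw)
    {
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            return buffer.ToArray(); // still valid after the gzip stream has been closed
        }
    }
}
```

Change one row, then compare ToDiffGramBytes(ds).Length with ToDiffGramBytes(ChangesOnly(ds)).Length and with Gzip(...).Length to see what each option buys you for your own payloads.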

Good Points

Fortunately, the author makes some other very valid points as well.

  • "I realize that code generation sounds like something of a dream. But when properly used and understood, it truly is a powerful arsenal in your bag of tools - even if you aren't doing custom entities". This is very, very true.
  • "Making the switch to custom entities and collections shouldn't be a decision you make lightly". This is even more true. It has a major impact on the way your applications and services will be developed.
  • "Jimmy Nilsson has an overview of some of these alternatives in his 5 part series Choosing Data Containers for .NET (part 1, 2, 3, 4, and 5)". These articles are quite good, indeed. Recommended reading!

Conclusion

I think part of the real discussion here is: do we go for a full Object Oriented domain model (most likely with some kind of Object/Relational mapping tool that handles lazy loading, automatic persistence, etc.), or do we build a (Typed) DataSet model where you create different ad-hoc DataSets for each separate use case (which means there is no unified "one size fits all" model)? I personally think the full OO model is very hard to maintain in a large corporate domain with many projects and dependencies. Combine this with Service Orientation, where it's much harder to "deploy" your domain model to the outside world (since you can only share contract and certainly not type), and I tend to think the latter is more flexible. Either that, or you'll have to re-map the service boundary messages to your domain model internally, which causes even more work (although this might be a good approach if it fits your existing internal OO model).

To conclude, I believe in one thing: choose the right tool for the job. Given two viable options, choose the one that solves the problem best under the constraints at hand. As Anders Hejlsberg often puts it when comparing performance between C++ and C#: given infinite time and resources, you'll do better in C++. Given infinite time and budget, I might agree that custom entity classes are the best possible solution. Unfortunately, that's not very realistic in real-world projects. Developer productivity and shipping should be considered two very important "features" of your project, and if DataSets help you accomplish these, then I would certainly advise using (or at least considering) them.

All content © 2014, Jelle Druyts