DataSets Are Not Evil

Ok that's it, everybody stop talking smack about the DataSet. As self-elected chairman of the nonexistent "DataSet Preservation Fund", I feel urged to respond to years of incorrect positioning, biased thinking and over-simplified black-and-white views. I won't allow the "enterprise community" to make the DataSet the Clippy of .NET, so to speak.

All joking aside, I've long been putting off writing about this topic since I wanted to make sure my opinion was balanced enough. But David Boschmans pointed me at a recent MSDN article on Custom Entity Classes by Karl Seguin which strikes me as pretty typical, so here's my two cents on the matter. I'll take the article as my guide to inject some of my opinions, so it should be helpful to have a quick scan of the article to see what it's all about. By the way, why this article appears in the ASP.NET Developer Center and not in an Enterprise Development Center or the like (think PAG) is unclear to me, since this kind of discussion should target a broader audience.

Introduction

First and foremost, as the disclaimer in the introduction states: the article only talks about untyped DataSets, not about strongly-typed DataSets. This is a very unfortunate limitation by the author in my opinion, since Typed DataSets do indeed solve most problems mentioned in the article, as well as provide extra benefits, and this type of simplification will only give more traction to the anti-DataSet camp.

Alleged Problems

  • Lack Of Abstraction.
    The statement is made (a few times throughout the article actually) that a DataSet makes it impossible to decouple your code from the database structure. While it is true that you still deal with logical tables, columns and relationships, a Typed DataSet will make this much less visible because you can navigate all of these by just using strongly-typed properties and methods. The argument that changing a column name in the database schema would force you to adapt your client code doesn't hold up: you can (and, for this reason, always should) use "AS" in your SQL query to abstract away the actual column name, so it won't "trickle down" to the calling layers. So if the UserId column were renamed to Id, for example, the SQL statement would simply become:
    SELECT Id AS UserId, FirstName AS FirstName, LastName AS LastName FROM Users
  • Weakly-typed.
    The obvious statement is made that regular DataSets are untyped, and the issues mentioned by the author are of course 100% correct. Hence the invention of Typed DataSets, which eliminate all the raised concerns. This (along with the next paragraph) also renders the "Why Are They Beneficial" section superfluous.
  • Not object-oriented.
    Sure enough, if you create a Typed DataSet, you get a class that inherits from System.Data.DataSet, which means you can't make it derive from another class anymore. But does this make it "not object-oriented"? This type of problem is often referred to in the Java world as not being a "Plain Old Java Object" (POJO), so in this case I'll refer to it as a "Plain Old .NET Object" (PONO). The problem seems big enough to have Indigo address this by not requiring your components to inherit from any base class (as is the case with Enterprise Services COM+ components, which need to derive from ServicedComponent). If this is not a serious issue for you (it's never been for me), nothing prohibits you from just subclassing that generated DataSet class to add functionality to it without resorting to external utility methods. In the upcoming 2.0 version of .NET, it will even be possible to add functionality to the class without the need for subclassing it by using the partial classes feature.
    The Scott Hanselman quote, taken from his rant against DataSets on service boundaries, makes a good point though: "DataSets are bowls, not fruit". I'm actually cool with Typed DataSets being "bowls with a picture of fruit on them", they're data containers (data transfer objects) after all. Custom entities aren't fruit either, they're another binary representation of the same concept, but they still don't smell like fruit.
  • NULLs.
    A point is made that "dealing with NULLs in DataSets isn't the easiest thing, because every time you pull a value you need to check if it's NULL". While I certainly agree that NULLs can be a pain, you could partially deal with this through annotations on your Typed DataSet, which allow you to define the value returned if the field actually contains a NULL value. Besides, a field having a functional value or not (i.e. being NULL or not) is something that affects all layers, and how you handle this in code will not be very different when using DataSets or custom classes. This is certainly true for value types (while we wait for nullable types anyway, gimme int? now!).
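To make the typed-access and NULL points above a bit more concrete, here is a minimal hand-written sketch of the kind of wrapper a Typed DataSet gives you. It is not actual generator output: the class name UserRow, the columns UserId and Age, and the fallback value -1 are all assumptions chosen for illustration.

```csharp
using System;
using System.Data;

// Hand-written stand-in for a generated typed row; "UserRow", "UserId" and
// "Age" are illustrative names, not taken from the article.
public sealed class UserRow
{
    private readonly DataRow row;

    public UserRow(DataRow row) { this.row = row; }

    // Callers see a typed property instead of a string column name.
    public int UserId
    {
        get { return (int)row["UserId"]; }
    }

    // Mimics a "value to return when NULL" setting: DBNull becomes -1.
    public int Age
    {
        get { return row.IsNull("Age") ? -1 : (int)row["Age"]; }
    }
}

public static class TypedAccessSketch
{
    // Builds a tiny in-memory table so the sketch needs no database.
    public static DataTable BuildUsers()
    {
        DataTable users = new DataTable("Users");
        users.Columns.Add("UserId", typeof(int));
        users.Columns.Add("Age", typeof(int));
        users.Rows.Add(1, 42);
        users.Rows.Add(2, DBNull.Value);
        return users;
    }

    public static void Main()
    {
        foreach (DataRow raw in BuildUsers().Rows)
        {
            UserRow user = new UserRow(raw);
            Console.WriteLine(user.UserId + ": " + user.Age);
        }
    }
}
```

The calling code only ever touches user.UserId and user.Age, so neither the physical column name nor the DBNull check leaks out of the wrapper.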

Other points that make me go "hmmm"...

  • "Never return a class from the System.Data or child namespace from the DAL". If you are convinced that System.Data is the namespace for all server-side database stuff, then you're right. But I don't think that's the case, the provider child namespaces (e.g. System.Data.SqlClient) are server-side. I would allow System.Data.DataSet to pass through customs without a problem here.
  • On performance: "whatever processing time you are able to save probably doesn't amount to much compared to the difference in maintainability". This is really comparing apples and pears, as runtime and designtime have very different characteristics and requirements in a project. It certainly is true that a "blanket statement" is pretty useless here, but I would like to say that although the performance hit and XML overhead on the wire of a DataSet might be larger in absolute figures, you should be communicating with the service backend through chunky calls anyway, and the main performance bottleneck will lie beyond the service facade most of the time. But the best performance tweaks can only be done through one optimal path: measuring and interpreting live data.
  • "You can certainly assign nothing/null to [value types], but this will assign a default value. If you just declare an integer, or assign nothing/null to it, the variable will actually hold the value 0." Excuse me? Have you actually tried to run the "int i = null" statement through a C# compiler? This largely diminishes my trust in the author's, eh, authority on the subject of object oriented programming.
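For readers who want to check the value-type behaviour themselves, a small sketch follows. The class and method names are made up for the example; the commented-out line is the statement quoted above, which the C# compiler rejects.

```csharp
using System;

public static class ValueTypeNullSketch
{
    public static int DefaultInt()
    {
        // default(int) is the value an unassigned int field holds: 0.
        return default(int);
    }

    public static bool NullableHasValue()
    {
        // Nullable<int> (written int? from C# 2.0 on) can represent "no value".
        int? maybe = null;
        return maybe.HasValue;
    }

    public static void Main()
    {
        // int broken = null;   // does not compile in C# (error CS0037)
        Console.WriteLine(DefaultInt());
        Console.WriteLine(NullableHasValue());
    }
}
```

VB.NET is the odd one out here: Dim i As Integer = Nothing compiles and leaves i at the default value 0, which is probably where the quoted claim comes from.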

Free Code

So, what do you get with Typed DataSets? Free code mostly.

  • You can use the boilerplate DataTable.Select method for searching through a table (which is illustrated as "custom behaviour" for the custom entity classes - i.e. code you need to type or generate).
  • The remark that this Select call isn't strongly typed is again correct, but if you define a key on the UserId field in your Typed DataSet for example, you will also get a free typed method called FindByUserId which will select rows based on that key.
  • In ADO.NET 2.0, you'll get the data column names as strongly typed properties so you can use these in your code. The databinding example in the article actually isn't strongly typed itself: the call to DataBinder.Eval(Container.DataItem, "UserName") contains the column name as a plain string, so it would benefit from these properties here as well.
  • If you model relationships between tables, you get free methods and properties to navigate between parent and child tables. You can even use databinding expressions to navigate through these relationships.
  • As stated by the author, "the DataView's built-in support for sorting and filtering, although requiring knowledge of both SQL and the underlying data structure, is a convenience that is somewhat lost with custom collections". So this is basically free code again, and as an extra remark: of course you need to know the data structure, how would you build an application if you knew nothing at all about your data structures? The main point, again, is that you're not supposed to know the database schema here, just the logical data structure. And regarding sorting: if you consider "UserName DESC" (which you can use as a sorting expression) to be knowledge of SQL and think that's too hard to grasp, then you can't really expect much from a developer anyway. There needs to be some way of pouring that sort expression into a language construct. Indeed, it would be better to add strongly typed methods to sort with a sort flag enumeration (Ascending/Descending) and a column name, but it's really not that bad, is it?
  • Data binding is far better supported with DataSets than with any other mechanism in the .NET Framework. Developer productivity is indeed a project feature.
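The "free code" listed above can be tried out without a database. The sketch below uses a plain (untyped) in-memory DataSet, so the calls are the underlying framework methods that a Typed DataSet would wrap in typed members such as FindByUserId. The table, column and relation names are assumptions made for the example.

```csharp
using System;
using System.Data;

public static class FreeCodeSketch
{
    // Builds a small in-memory DataSet; all table and column names are illustrative.
    public static DataSet Build()
    {
        DataSet ds = new DataSet("Sample");

        DataTable users = ds.Tables.Add("Users");
        users.Columns.Add("UserId", typeof(int));
        users.Columns.Add("UserName", typeof(string));
        users.PrimaryKey = new DataColumn[] { users.Columns["UserId"] };
        users.Rows.Add(1, "alice");
        users.Rows.Add(2, "bob");

        DataTable roles = ds.Tables.Add("Roles");
        roles.Columns.Add("UserId", typeof(int));
        roles.Columns.Add("RoleName", typeof(string));
        roles.Rows.Add(1, "admin");
        roles.Rows.Add(1, "editor");

        ds.Relations.Add("UserRoles", users.Columns["UserId"], roles.Columns["UserId"]);
        return ds;
    }

    public static void Main()
    {
        DataSet ds = Build();
        DataTable users = ds.Tables["Users"];

        // Filter expression search (string-based, not strongly typed).
        DataRow[] matches = users.Select("UserName = 'bob'");

        // Key lookup; a typed DataSet would wrap this in FindByUserId.
        DataRow alice = users.Rows.Find(1);

        // Sorting through a DataView with a sort expression.
        DataView sorted = new DataView(users);
        sorted.Sort = "UserName DESC";

        // Parent-to-child navigation through the relation.
        DataRow[] aliceRoles = alice.GetChildRows("UserRoles");

        Console.WriteLine(matches.Length);
        Console.WriteLine(sorted[0]["UserName"]);
        Console.WriteLine(aliceRoles.Length);
    }
}
```

None of the searching, sorting or navigation logic had to be written by hand; with custom entity classes each of these would be code you type or generate yourself.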

DataSet Issues

So it can't all be good, right? There are indeed some issues with the DataSet and I can only hope they are fixed in upcoming versions of the framework. But the main point I'm trying to convey here is that these don't suffice to radically kill DataSets or tell everyone they're all Bad and Evil. Some real problems:

  • They behave badly on the wire when passed over a Web Service or otherwise serialized as XML. This is indeed a problem and a very good reason to avoid them, if (and only if) you need to interoperate with platforms other than the .NET runtime. Don't forget, a lot of times you have both ends under control so this is actually not always an issue.
  • They have a bloated wire format, since the XML representation can become very large very fast. If this is really a problem in your case and causes a measurable performance hit, you could opt for gzip compression over the HTTP transport protocol, or rethink the way you are using the services. The more or less "fixed" overhead will also become smaller in relative figures if the communication with the backend service is chunky instead of chatty, so that's certainly an important point of attention. And you can consider using the GetChanges method to only include a diffgram of the changes since the DataSet was initially populated - but beware that this is incredibly platform-dependent though.
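As a rough illustration of the GetChanges suggestion, the sketch below loads a DataSet in memory, changes one row and compares the serialized size of the full DataSet with that of the changes-only copy. The helper names and the row count are assumptions for the example, and the exact sizes depend on your data, so measure your own payloads rather than trusting these numbers.

```csharp
using System;
using System.Data;
using System.IO;

public static class GetChangesSketch
{
    // Builds a DataSet with the given number of rows; names are illustrative.
    public static DataSet BuildLoaded(int rowCount)
    {
        DataSet ds = new DataSet("Sample");
        DataTable users = ds.Tables.Add("Users");
        users.Columns.Add("UserId", typeof(int));
        users.Columns.Add("UserName", typeof(string));
        for (int i = 1; i <= rowCount; i++)
        {
            users.Rows.Add(i, "user" + i);
        }
        // Mark everything as unchanged, as if freshly loaded from the backend.
        ds.AcceptChanges();
        return ds;
    }

    // Serializes the DataSet to a diffgram string.
    public static string ToDiffGram(DataSet ds)
    {
        StringWriter writer = new StringWriter();
        ds.WriteXml(writer, XmlWriteMode.DiffGram);
        return writer.ToString();
    }

    public static void Main()
    {
        DataSet full = BuildLoaded(100);
        full.Tables["Users"].Rows[0]["UserName"] = "renamed";

        // Only rows modified since AcceptChanges end up in the copy.
        DataSet changes = full.GetChanges();

        Console.WriteLine(changes.Tables["Users"].Rows.Count);
        Console.WriteLine(ToDiffGram(full).Length);
        Console.WriteLine(ToDiffGram(changes).Length);
    }
}
```

Note that GetChanges returns null when nothing was modified, so real calling code should check for that before sending anything over the wire.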

Good Points

Fortunately, the author makes some other very valid points as well.

  • "I realize that code generation sounds like something of a dream. But when properly used and understood, it truly is a powerful arsenal in your bag of tools - even if you aren't doing custom entities". This is very, very true.
  • "Making the switch to custom entities and collections shouldn't be a decision you make lightly". This is even more true. It has a major impact on the way your applications and services will be developed.
  • "Jimmy Nilsson has an overview of some of these alternatives in his 5 part series Choosing Data Containers for .NET (part 1, 2, 3, 4, and 5)". These articles are quite good, indeed. Recommended reading!

Conclusion

I think part of the real discussion here is: do we go for a full Object Oriented domain model (most likely with some kind of Object/Relational mapping tool that handles lazy loading, automatic persistence, etc.) or do we build a (Typed) DataSet model where you create different ad-hoc DataSets according to each separate use case (which means there is no unified "one size fits all" model)? I personally think the full OO model is very hard to maintain in a large corporate domain with many projects and dependencies. Combine this with Service Orientation where it's much harder to "deploy" your domain model to the outside world (since you can only share contract and certainly not type) and I tend to think the latter is more flexible. Either that or you'll have to re-map the service boundary messages to your domain model internally, which causes even more work (although this might be a good approach if that fits your existing internal OO model).

To conclude, I believe in one thing: choose the right tool for the job. Given two viable options, choose the option that would solve the problem best under the constraints at hand. As Anders Hejlsberg often puts it when comparing performance between C++ and C#: given infinite time and resources, you'll do better in C++. Given infinite time and budget, I might agree that custom entity classes are the best possible solution. Unfortunately, that's not very realistic in real-world projects. Developer productivity and shipping should be considered two very important "features" of your project, and if DataSets help you accomplish these, then I would certainly advise using (or at least considering) them.

Thursday, March 24, 2005 12:39:22 PM (Romance Standard Time, UTC+01:00)
Nice post!

When I'm using DataSets, I always feel a little bit dirty afterwards... :-)
Thursday, March 24, 2005 4:30:09 PM (Romance Standard Time, UTC+01:00)
There've been a handful of people who've vocally disagreed with my article, but you've probably been the least venomous, so I'd like to respond.

First though, I'd like to say that dim i as integer = nothing does compile (even with option strict on) and works as described, which might be why I said what I did about assigning nulls to value types. I won't claim that I simply checked it in VB.Net and didn't in C#...chances are I didn't check it in either language so shame on me...it won't be the last time a compiler behaves a little differently than I expect it to...

Anyways, on to the serious point. Obviously the topic was going to be controversial, but I want to be clear on what my intentions were. They were, 100%, to introduce custom entities. I think I did a good job of that....I think what's thrown some people off is the first “anti-dataset” part of it. The goal here was only to point out shortcomings in order to highlight why you'd ever want to use another solution. Some people kept pointing out that I wasn't making a fair "debate" about the two methods, and that irks me because I never meant to debate the two methods. I'll be the first to admit that I pointed out very high level concepts just to get people thinking about them.

It seems a lot of people would have been happier had I written a 2nd article titled "On the way to Mastering ASP.Net – Introducing Typed-DataSets" and had the exact same 1st part about untyped, and then talked about typed. In this 2nd article I wouldn't talk, whatsoever, about custom entities. Then it would be clear that I wasn't trying to compare the two methods, but simply introduce them and how they solve certain (incredibly crucial) problems. Maybe that's what I'll do...dunno...

Anyways, I hope I haven't misunderstood your complaint. If you took out the first part about untyped-datasets and simply read the rest with respect to a potential design implementation, I think you'd find the article far less controversial...you might still find it useless since you don't agree with the specific implementation, but hopefully you'd be happy to see the information available out there for others to make the decision based on their own needs....

I enjoyed your post though ?

Cheers,
Karl
Thursday, March 24, 2005 4:32:58 PM (Romance Standard Time, UTC+01:00)
uhhmm...you don't have validation when the captcha fails...had to do that twice....and I don't know why "i enjoyed your post though" ended with a question mark...that was supposed to be a smiley :) (changes the meaning of that sentence...)
Friday, March 25, 2005 12:40:59 AM (Romance Standard Time, UTC+01:00)
Hi Karl,

Thanks for replying, it's always great to hear back from people you "comment" on :-) I'm sorry if I still sounded venomous, I'm not trying to fight you, I'm certainly not trying to fight custom entity classes, I'm just trying to fight the anti-dataset demagoguery that's been going on lately on blogs, conferences, workshops and articles. And yours just happened to be the trigger that provided a good set of points that I wanted to share my opinion on...

I think the problem with the article is that you try to strengthen your points by directly opposing the custom entity classes to the weak points of the (untyped) dataset. It's not just in the introduction but it's all over the article, so that's what bothered me, and apparently some other people along with me. Your idea of using the typed dataset against the same issues would be a very good move I think, and, bundled with the first article, give a much more balanced view on the topic.

So don't get me wrong, I have absolutely nothing against custom entity classes from a technology/implementation/conceptual point of view, it's just that I think they're an equally viable solution to the crucial problems you address in your article.

Regarding the compiler issue though, I think it's very logical that you cannot assign null to a value type. The fact that the VB.NET compiler allows that and turns it into something else (the default value for that value type) is the typical kind of compiler trickery that I'm not very fond of.

And I've noticed the tricky thing with the captcha control as well: I suspect it only works within some kind of timespan, and if you type "too much" you'll just get another captcha. The smiley turning into a question mark is new though... Clemens? Omar? Scott? Anyone got a fix ready ;-)

Anyway, thanks again for replying, no hard feelings? :-)

Jelle
Friday, March 25, 2005 1:46:39 PM (Romance Standard Time, UTC+01:00)
Absolutely no hard feelings...had I not found your post constructive, I wouldn't have replied, so thanks. I'm an amateur writer, and while I've written other stuff, this is the first one for MSDN...I think when I saw the first person give me a vote of 1 I thought to myself I was going to hunt them down and....hahah..seriously, it's a learning/growing experience.
Friday, March 25, 2005 4:02:03 PM (Romance Standard Time, UTC+01:00)
Interesting post! ;-)

Grtz,
Wimske
Tuesday, June 21, 2005 6:27:23 PM (Romance Standard Time, UTC+01:00)
Linked and Subscribed! I like the way you think...
Wednesday, June 22, 2005 2:27:18 AM (Romance Standard Time, UTC+01:00)
I recently had quite a heated and good discussion about datasets vs. business objects on my blog.

http://codebetter.com/blogs/sahil.malik/archive/2005/06/07/64172.aspx

Wednesday, July 20, 2005 4:32:52 PM (Romance Standard Time, UTC+01:00)
Good article.
Your site sucks for printing from IE.
Maybe try using a PRINT CSS StyleSheet. There are tons of developers that still like to read hardcopy instead of a screen.
Sunday, July 24, 2005 5:34:17 PM (Romance Standard Time, UTC+01:00)
TA: thanks for your feedback on the print stylesheet, I'll be sure to take a look at it so that it rolls out nicer on hardcopy!
Thursday, December 22, 2005 12:49:56 PM (Romance Standard Time, UTC+01:00)
Hi,

Good article over all but...
You have said that typed datasets make it object-oriented which is not really true. Typed dataset make the dataset (strongly) typed and nothing more. Dataset (obviously) promotes relational structures. If you have master-child relationships then dataset will give you two tables linked with the relation. Custom entity objects will give you list of master objects (each containing corresponding child objects). Similar thing with many-to-many relationships.
Advantage with custom objects is that their consumers get completely decoupled with relational world.
No, I am not saying datasets are evil. They do save lot of time (for most of business applications) but there is the flip side other than "performance over wire" that one must acknowledge.
Ease of coding (or saving time) and "do in 30 minutes" kind of samples make developers use datasets (and only datasets) w/o realizing there can be other alternatives. You may be correct in saying that the dataset gets bashed everywhere, however in my experience, most of the developers that I know use nothing but datasets. Many of them don't even bother with typed datasets - "why work more when simple dataset can do it".
Bottomline (as you have put it) should always be "know all alternatives and choose one that applies to ur case".
Thursday, January 26, 2006 11:08:41 PM (Romance Standard Time, UTC+01:00)
so, I love coming into a conversation almost a year after the fact, but i've been a huge proponent of strong typed datasets for years now, and I think there's one key ingredient that you've left out of your position, in terms of abstraction. I rarely see clear messaging on this, but to be completely clear, the schema of your strong typed dataset can (and in my experience, should) have absolutely *nothing* to do with your actual database schema! I design my dataset schema as the optimal representation of my entities, from the application's perspective; I then use the table/column mapping features of the DataAdapter/Command pattern to map my optimal entity representation to resultsets and parameters from stored procedures. the actual, underlying physical schema is completely divorced from the schema that my app developers work with. I achieve 100% data abstraction, purely by using strong typed datasets; my middle tier and front end developers never know what the underlying schema looks like. this is particularly valuable in circumstances where the schema is not directly under your control; i.e., working with 3rd party apps, or working with particularly stringent database groups under strict privacy/compliance guidelines.
Friday, June 30, 2006 9:29:35 AM (Romance Standard Time, UTC+01:00)
I lOVE STYPED Datasets but how the fXXX do you handle putting nulls back into the dataset

when I have an integer data column it converts nothing to 0.
WTF???

With dr
    .iCol1 = CInt(dropdown.SelectedValue)
End With

if dropdown.SelectedValue = ""
it changes iCol1 to 0.

Tuesday, July 25, 2006 2:36:03 AM (Romance Standard Time, UTC+01:00)
This is a great debate. I really enjoy reading about these techniques. In the end, each architecture has its own pros and cons. A good architect will know how to make a decision on it.

I will explain how I make the decision to use Datasets vs. Custom Entities

When to use datasets:
use when it is a database driven application. Period =]

When to use custom entities:
use when it is NOT a database driven application. For example, user controls like Infragistics, network applications like xCeed FTP, any software that will be resold or reused by others external to your organization.

p.s. Somebody who is a guru of sql is more valuable than somebody who is a guru of c#.

p.s.s. nobody will ever convince me that custom entities are faster than well written optimized sql queries!
Friday, December 15, 2006 10:48:43 AM (Romance Standard Time, UTC+01:00)
good
Monday, January 01, 2007 10:07:30 PM (Romance Standard Time, UTC+01:00)
I recently wrote an article on the advantages of business objects over datasets. Please visit and let me know what you think.

http://www.kellermansoftware.com/t-articlebusinessobjects.aspx
All content © 2008, Jelle Druyts