Wednesday, March 12, 2008

Implementing N-Tier Change Tracking with Linq to SQL


When designing our application data model, we think some of our entities as an in-memory cache of small pieces of data living in a database (MSSQL, Oracle, Xml files, etc.). This data will be jumping between both worlds.

To keep this "cache" synchronized, we need mechanisms to read data from storage, and update changes back, to perform that we create CRUD operations.

The "U" in CRUD is the subject of this entry, update changes back.

Once we read an entity from the db we must track changes on it, in order to reproduce them in the storage.

To facilitate this, ADO.Net brought features as the DataRow.RowState property. Based on that we wrote code, that typically uses the RowState to perform the corresponding creations, updates, or deletes in the db.

What happens when we have Linq to SQL custom classes?


Linq to SQL DataContext Object Tracking

When entities are obtained from a DataContext object, this DataContext subscribes to change notification events on every entity, and tracks changes automatically, generating an internal change set. When we want to persist changes, we call the DataContext SubmitChanges() method and all the INSERT, UPDATE and DELETE commands are sent to the SQL Server.

This mechanism works great in a connected scenario, but in an N-tier architecture, modifications of your entities can be done in a machine far far away from the DataContext that created them, they can't (or shouldn't! :)) notify this changes thru the wire.

As we explained in previous posts, we need detached entities.

How do Linq to SQL supports this scenario? It allows you to detach and re-attach entities, but now change tracking is your job.

Detaching entities can be achieved in two ways:

  • serializing-deserializing entities, deserialized entities are detached by nature.
  • setting the DataContext ObjectTrackingEnabled property to false (after that all entities obtained are not tracked).

Notice that the first option would force serialization and deserialization even in a connected Winforms app,  so we'll always use the second.

Once your entity comes back modified to the data access layer you can re-attach it to a new DataContext using the Attach() method in every Linq Table, but you must tell if this entity has been modified/created/deleted (or not).

"Those who don’t know their history are doomed to repeat it."

That means we need to track this information on the client, and send it back to the server. We need to build our own... disconnected change tracking!


Disconnected Change Tracking

Of course we are not the first in getting here!, using Linq to SQL in an N-tier scenario is something many people is working on. Actually the same issue is been discussed in the Entity Framework world, because there seems to be no official solution there either (yet).

So I googled a bit to see what others are doing. And most of the solutions can be grouped in this two categories:

Portable Client-DataContext

At an early stage of Linq to SQL Microsoft were planning to ship this with the first release, as explained by Mat Warren, he described it like this:

"The mechanism that does the change tracking on the client is similar to a mini-connectionless DataContext.  It is a type that packages up all the objects, lists of objects and graphs that you want to send to the client.  It serializes itself and everything you've given it automatically.  (It implements IXmlSerializable.)  On the client, the same class also manages change-tracking for all the objects that were serialized with it.  When serialized again (on the trip back) it serializes both the objects and their change information that it logged.  Back on the middle tier, you just make one call to re-attach the whole package to a new DataContext instance and then call SubmitChanges."

Finally, this didn't get to the current release, and many people came up with their own implementation of it, deeper ruminations on this approach can be found in this OakLeaf article.

The best thing here is "entity pureness", entities completely ignore persistence, they can be pure POCOs, without a base class. But be aware that this pureness is not so absolute, entities must implement interfaces an events for property change notification, we just don't note this because the implementation is in auto-generated code.

Also, this means that an entity never knows his own dirty state or original values, to know it you must reference this portable context. This makes rollback changes a complicated task (actually there's no support for rollback to original values in the Linq to SQL DataContext).

This also requires some "packing" and "unpacking" code in the client.


Entities with state flags

The other approach implies including in every entity a state field, and eventually original values. An attractive point here is that this is just what our old well known ADO.Net DataSet has been doing all this time (se DataRow.RowState property). It's a pattern we've seeing for years in change tracking.

We can achieve this using an entity base class (this can be set in a Dbml file, but unfortunately this attribute is not visible in the O/R Designer).

To avoid sticking ourselves to an specific implementation we can use an "entity with state" interface.


The Tercer Planeta's choice

Matthew Hunter is working on the same problem, and asked in a previous post which way are we taking, that's why I decided to write this entry.

When standing against this fork in the road so crucial in our lives, first of all we breath deeply, and thought why we need this, and decide to list the kind of changes we want to track. And we found at that all we need to track is:

  • Dirty state in every entity
  • Added/Removed/Existent states only in entity collections

Why we don't want Added/Removed/Existent states in single entities? Because that info should be in the business logic, in other words the business logic knows if I'm creating, deleting or updating. We don't want a generic "ApplyChanges" method internally doing an insert, update or delete at will.

This means, if I'm in the ProducEditForm and click on the "save" button, I expect the Form (and not the entity) to tell me which action to perform (create, update or delete).

Based on that we came up with a separation of concerns here.

We chose the path of entities with state, in this path of green hills and crystal clear waters, every entity knows his dirty state thru a boolean flag in their base class. But adds/deletes of children are tracked by a custom collection class (replacing EntitySet).

Optionally, entities can keep a copy of their original values (for change rollback and concurrency checking when there's no timestamp), to allow this we may force ICloneable implementation.

Tracking must be activated explicitly. This doesn't only allow optional "read-only" mode, but also to distinguish new and pre-existent (on the db) entities.

ULinqGen compatibility

To facilitate change tracking the way I described above, a minor surgery is required on the Linq to SQL auto-generated entities... thanks god we are building a code generator custom tool for Dbml!

We are adding some improvements to our ULinqGen tool.

Our code generator must support change tracking but without binding it to our (or any) custom change tracking implementation.

What should we add to auto-generated code?

  • Entity base class, Linq to SQL already has this feature, so no extra work would be necessary. the only drawback is that the EntityBaseClass attribute is global to all the entities in a Dbml, and is not visible from the O/R Designer (you can write it on the Dbml file with notepad). We'll probably add capability to specify per-entity base class.
  • Property change notification, but as we don't have a listening context, we only need a generic "I'm getting dirty!" instance method call, we are requiring an IEntityState interface with this method. The implementation is let to the base class (if any), it could be an extension method of IEntityState!.
  • We want to support optional "original values tracking", to allow this ULinqGen could easily generate an ICloneable implementation based on the entity metadata.

This will keep our code generator tool pretty much "naive" about the specific change tracking system. And read-only mode is still the default.

I'm not showing any code because we're currently discussing much of this, so you'll have to wait for the following ULinqGen releases! ;)


Matthew Hunter said...

Hi Benjamin,

I think you're on a very good path here!

I like your idea of using a custom tool to generate instead of the standard code generator that's shipped. I've actually downloaded and checked out the source code, nice work! This also avoids having to use reflection (which I resorted/degraded too for a quick win!), not to mention complete control over how it works and not having to work within the existing limitations.

The reason why I went down the path I have, is purely because I didn't have that much time to do it. I really wanted to use LINQ but it had that disconnected limitation. And hence, my solution is a quickly thought through answer to a pain in the butt problem.

I'm glad that you're taking a more "architected" approach, one I wish I could given the time.

In response to your post specifically:

I've also gone down the path of having the entity know whether it's dirty or not, although I'm not keeping the state of each object (at least not yet) like you - instead I'm defining whether it's a new, modify or delete.

However keeping the state has advantages as you can roll back the value of objects quite easily.

You may hit a similar problem to me, i.e. which is where do you stick deleted objects when they are removed. I may have to implement a change tracking object (perhaps in the root?) that attaches itself to all objects events so that I can record deletion event - however you can probably do it in a very smart way by keeping it in the customer EntitySet<> that you mentioned - although you may need to do a little bit of faking with one-to-one relationships by keeping a hidden EntitySet<> behind them.

If your interested, one idea I really like that you might want to consider is implementing IEnumerable on the objects OR something similar. This made the solution very flexible. It allows a number of things:
1. Being able to traverse all the relationships
2. Being able to search for an object without knowing the path to it
3. Being able to pull all objects into one collection, whith the ability to filter.
4. Being able to expose one method that allows the developer to sync all objects with the Data Context (i.e. no attach statements required).

Check out my blog if you want to see how it can be done.

Anyway, I am really looking forward to your next revision and I'm enjoying sharing ideas with you.

Anything I can do to help, just ask!



Benjamin Eidelman said...

Thanks for the feedback Matthew!

I noticed that you are handling "new" and "delete" states togheter with "modified/unchanged", this is more similar to DataRow.RowState, and that looks great too, in Tercer Planeta we had big polemics about that :), finally the "separation of concerns" idea won the fight. At last, maybe both options aren't very far from each other.

The way to solve the "deleted objects problem", we solved just as you suggest, having a "custom" entity set, that mantains an inner list of removed items.

I'm gonna check better your articles about this, but I found interesting the IEnumerable idea, I think, maybe that can be transformed later into a mixin feature on the entity base class (that we haven't coded yet), or implemented as extension methods, I'll check it out

Thanks a lot!,

Joe Albahari said...

Hi there!

I notice ULINQGen is distributed under a GPL license (rather than a LGPL license or permissive license). Is this a mistake? It certainly limits its usefuless!

Benjamin Eidelman said...

Joe, It's indeed a mistake, thanks for your observation!, I'll correct the license in the CodePlex Project site now, and update the license in the source code soon.