Wednesday, February 13, 2008

First thoughts on Designing a LINQ-enabled Application Framework


Over the past few weeks (and the coming ones :), we've been intensively stress-testing the different LINQ-to-SQL features. Lots of prototypes and architecture sketches were made, trying to reach some conclusions about one question: what role (if any at all) do we want to give LINQ in the architecture of our applications?

First of all, we need to answer a basic question: "Do we want to add LINQ to our model?"

As Jose wrote in the previous article, there's no doubt we love LINQ as a set of language extensions, combined with a set of providers (LINQ-to-*), allowing us to write elegant, strongly typed queries over heterogeneous collections without knowing all their specific APIs.

Considering this, it would be great to have a LINQ "Queryable" data access layer. With that idea, we started to analyze LINQ-to-SQL integration in enterprise applications of different scales.


Two-tier (logically separated) WinForms/WPF Application

This is our simple case: a presentation layer designed to be always physically connected to a business layer that retrieves business entities from a local data source.

Even though there's a logical separation between layers, they share a common Application Domain. This is the case of a simple desktop application directly accessing a local (or remote) database.

Unit of Work

When we retrieve entities from our database we use a DataContext, which follows the Unit-of-Work pattern.

It stays alive during a single business operation, handling the SQL connection and tracking changes on all the entities associated with it.

Every time we insert, modify or delete entities through a DataContext, it updates an in-memory ChangeSet with copies of the original and modified values of these entities.

Finally, when we have finished working with them, we tell the DataContext to submit these changes, and all the necessary commands are sent to the database. Then it's ready to be disposed.


This is absolutely great in this connected environment. We query our DataContext, bind the IQueryable result to a BindingSource and a data-bound grid, edit, insert, or delete records, and when we are ready, all we need to do is call MyDataContext.SubmitChanges();

And this won't only save every change we made to the entities; it will also handle foreign keys, concurrency checks and transactions.
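As a rough illustration of this connected unit of work, here is a minimal sketch. The ProductRow class, the "Products" table, the ConnectedUnitOfWork helper and the connection string are all made up for the example; only DataContext, GetTable<T>() and SubmitChanges() come from LINQ-to-SQL itself.

```csharp
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

// Hypothetical entity mapped to a hypothetical "Products" table.
[Table(Name = "Products")]
public class ProductRow
{
    [Column(IsPrimaryKey = true)] public int Id { get; set; }
    [Column] public string Name { get; set; }
    [Column] public decimal Price { get; set; }
}

public static class ConnectedUnitOfWork
{
    // Only builds the query; nothing is sent to the database
    // until the result is enumerated (deferred execution).
    public static IQueryable<ProductRow> CheaperThan(DataContext db, decimal limit)
    {
        return db.GetTable<ProductRow>().Where(p => p.Price < limit);
    }

    // One business operation = one DataContext: query, edit, submit, dispose.
    public static void RaisePrices(string connectionString, decimal limit, decimal factor)
    {
        using (var db = new DataContext(connectionString))
        {
            foreach (var product in CheaperThan(db, limit))
            {
                // The entity is tracked: the DataContext kept its original
                // values and will detect this change when submitting.
                product.Price *= factor;
            }

            // Sends the needed UPDATE commands inside a transaction.
            db.SubmitChanges();
        }
    }
}
```

In a WinForms app the same DataContext would typically live as long as the form that binds to the query result, instead of being wrapped in a using block.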

This also means that the entities belong to their DataContext during their whole lifetime. This wiring enables features such as deferred loading (of properties, associated entities, child collections) and db-generated fields.

For a lot of reasons, this seems to be the main scenario for which the current LINQ-to-SQL implementation was designed.


N-tier (physically separated) Application

Let's try to scale the previous approach to an N-tier application. In this case our Business Layer is exposed through a Service Layer, consumed (via WCF) by a physically remote presentation layer (WinForms/WPF client, ASP.NET website, etc.).

How does LINQ-to-SQL support this scenario?

Initially, we could say that LINQ-to-SQL will remain behind the Business Layer, and won't cross the WCF barrier.

Off-topic: there are a few adventurous developers (here is a project in CodePlex) implementing serialization of Expression Trees. This allows querying a remote collection exposed through a WCF service: the query (represented as an Expression Tree) is serialized, deserialized on the server, and the results are returned to the client.


But there's something we surely want to move across these layers: entities.


The auto-generated LINQ-to-SQL entities can be decorated (by selecting unidirectional serialization in the O/R Designer) with [DataContract] and [DataMember] attributes, allowing them to travel as parameters or results of a WCF service operation.

As expected, this breaks the connected state of these entities, losing all the cool features we had in the previous scenario (change tracking, deferred loading, etc.)
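To make the "detached by serialization" idea concrete, here is a small sketch. CustomerDto and WireRoundTrip are hypothetical names, and the class is hand-written to resemble what the designer generates rather than copied from real generated code. The round trip through DataContractSerializer stands in for what happens when an entity crosses a WCF boundary.

```csharp
using System.IO;
using System.Runtime.Serialization;

// Hypothetical entity, decorated the way unidirectional serialization does it.
[DataContract]
public class CustomerDto
{
    [DataMember(Order = 1)] public int Id { get; set; }
    [DataMember(Order = 2)] public string Name { get; set; }
}

public static class WireRoundTrip
{
    // Serializes and deserializes an entity, roughly as a WCF call would.
    // The result is a brand-new instance: it carries the data, but no link
    // to the DataContext (or anything else) the original was wired to.
    public static T Clone<T>(T entity)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, entity);
            stream.Position = 0;
            return (T)serializer.ReadObject(stream);
        }
    }
}
```

Only members marked with [DataMember] survive the trip, which is why the copy arrives without change tracking or deferred loading.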

That isn't actually very bad news, because having those features would encourage data-centric practices, as opposed to the SOA model that WCF is based on.

If we look at Fowler's description of the Lazy Load pattern, "An object that doesn't contain all of the data you need but knows how to get it.", we can note that the last words ("knows how to get it") are in deep contradiction with the Persistence Ignorance principle that we are trying to follow.

One of the reasons for this is that LINQ, and all the new language extensions in C# 3.0 and VB9, ease the handling of POCO entities, a principle that LINQ-to-SQL and the new Entity Framework seem to take advantage of.

In this scenario, having our entities detached from the DataContext is something we want. And by design, entities get detached when serialized.

When these entities (or collections of entities) come back modified to the Business Layer, they are detached; we just need to attach them to a new DataContext and submit their changes.

As change tracking has been broken, when we re-attach an entity to a DataContext we need to tell it how this entity must be updated, specifying:

  • Original and current copies of the entities

or simply:

  • Only the current copies, treating all of them as modified
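The two options above map onto two overloads of Table<T>.Attach. The sketch below uses a hypothetical CustomerRecord entity and a hypothetical Reattach helper. The Version column is also an assumption: LINQ-to-SQL only accepts the "as modified" overload when the entity declares a version member or has no update checks.

```csharp
using System.Data.Linq;
using System.Data.Linq.Mapping;

// Hypothetical entity; the row-version column is what allows AsModified below.
[Table(Name = "Customers")]
public class CustomerRecord
{
    [Column(IsPrimaryKey = true)] public int Id { get; set; }
    [Column] public string Name { get; set; }
    [Column(IsVersion = true, IsDbGenerated = true)] public Binary Version { get; set; }
}

public static class Reattach
{
    // Option 1: original and current copies.
    // The DataContext compares both and updates only what differs.
    public static void WithOriginal(DataContext db, CustomerRecord current, CustomerRecord original)
    {
        db.GetTable<CustomerRecord>().Attach(current, original);
    }

    // Option 2: only the current copy, flagged as modified.
    public static void AsModified(DataContext db, CustomerRecord current)
    {
        db.GetTable<CustomerRecord>().Attach(current, true);
    }
}
```

In both cases the caller still has to call SubmitChanges() on the same DataContext to send the update to the database.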


In a few words, the ability to re-attach entities adds basic N-tier support to LINQ-to-SQL, cutting off all the magic features (change tracking, deferred loading, etc.) that the connected state gave us.


One size fits all Solution

The previous scenarios seem to be well handled by the current LINQ-to-SQL implementation. But an immediate conclusion we reached while studying them is that they imply different logic behind the business layer.

The connected nature of the first type of application certainly doesn't scale to the second: keeping a DataContext alive throughout the lifetime of an entity is unacceptable in an enterprise application model.

Besides that, it would be a bad choice in an ASP.NET website to keep the DataContext (with its ChangeSet) alive in memory between postbacks.

We want the two tiers in the Two-tier application to be not only logically separated, but "physically separable". That would improve scalability (allowing reuse of the business layer in an N-tier application), and force a better split of responsibilities between business and presentation layers.

Disclaimer: forcing a "one size fits all", "N-tier ready" solution implies some over-engineering for people building simple desktop RAD applications (as in the first scenario), but our main concern is Enterprise Solutions.
For this simpler, always-connected two-tier desktop app, a possible piece of advice could be: use LINQ-to-SQL "as it is".


All this led us to the significant choice of allowing only detached entities outside the business layer.

That implies destroying the DataContext after the entities are retrieved, and re-attaching them to a new DataContext at the moment of submitting changes. Many people got there, and found themselves struggling with the "Attach only when detached" nightmare. Rick Strahl is one of them (or should I say, us).

Attach only when detached

As explained in Dinesh Kulkarni's blog, detaching and attaching entities was designed for N-tier scenarios only; that's why attaching is allowed only for entities that have previously been serialized and deserialized.

That works great in N-tier, but serializing and deserializing makes no sense within a common application domain.

A workaround that many people have found for this is to roughly cut all the wires between entities and their DataContext. That can be accomplished by resetting some event handlers, and replacing the deferred-loading-aware types (EntitySet and EntityRef) with a simpler array or List<T>.

Fortunately for us, there's a feature in LINQ-to-SQL that solves these issues (in a more elegant way)!

LINQ-to-SQL POCO support

Even though the O/R Designer and SqlMetal provide automatic generation of wrapper classes over our data entities, it's perfectly valid to use our own POCOs decorated with the appropriate attributes.

The POCO movement (nicely explained here) is based on the Persistence Ignorance principle, which ensures responsibility delegation; in other words, we don't want our entities to know anything about how they are persisted. The default auto-generated entities pretty much respect this principle: they don't know anything about persistence (part of this info is in attribute decorations or mapping files).

But they do participate in their persistence mechanism! They are closely associated with a DataContext, not only notifying changes, but also loading deferred values or associated entities from it.

This behavior is mainly achieved through change-notification events (declared in the INotifyPropertyChanging and INotifyPropertyChanged interfaces), and the new types EntityRef and EntitySet.

These two types are used in auto-generated entities to load (lazily or not) properties created from foreign keys: EntityRef is used for a single reference (as in Product.Manufacturer.Name), and EntitySet for child collections (as in Manufacturer.Products[2].Price).

They not only contain associated entities; they also hold the logic for deferred loading and for notifying the DataContext about modifications in references and child collections, which enables the change tracking feature.

As we read in LINQ-to-SQL blogs, it's possible to replace these types with simpler, disconnected versions: EntityRef can be replaced by a direct reference, and EntitySet by any ICollection<T>.
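A hand-written pair of POCOs along those lines could look like the sketch below. The Manufacturer and Product names follow the examples above, but the table names, key columns and mapping details are assumptions made for illustration, not the output of any tool. The mapping attributes are the regular ones from System.Data.Linq.Mapping; the association properties are just a plain reference and a plain List<T>.

```csharp
using System.Collections.Generic;
using System.Data.Linq.Mapping;

[Table(Name = "Manufacturers")]
public class Manufacturer
{
    public Manufacturer() { Products = new List<Product>(); }

    [Column(IsPrimaryKey = true)] public int Id { get; set; }
    [Column] public string Name { get; set; }

    // A plain List<T> where the generated code would use EntitySet<Product>:
    // no deferred loading and no change notification.
    [Association(ThisKey = "Id", OtherKey = "ManufacturerId")]
    public List<Product> Products { get; set; }
}

[Table(Name = "Products")]
public class Product
{
    [Column(IsPrimaryKey = true)] public int Id { get; set; }
    [Column] public int ManufacturerId { get; set; }
    [Column] public decimal Price { get; set; }

    // A direct reference where the generated code would hide an
    // EntityRef<Manufacturer> behind the property.
    [Association(ThisKey = "ManufacturerId", OtherKey = "Id", IsForeignKey = true)]
    public Manufacturer Manufacturer { get; set; }
}
```

These classes contain no persistence logic at all, so they can be built, tested and serialized without a DataContext around; the price is that keeping both ends of a relationship in sync is now up to our own code.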


Putting the pieces together

With these ideas in mind, we started building prototypes, messing with O/R Designer and SqlMetal auto-generated code.

Putting together the pieces we want, replacing or discarding others.

These days we're starting to see the light at the end of the tunnel, with the custom tools and code we started to write.

More on this in the following posts...


Jon Kruger said...

Great post. You commented on my blog so I thought I'd come see what you had to say. It looks like we're taking a pretty similar approach to things.

One thing that you're doing that I haven't thought of is using POCOs with attributes instead of using the designer-generated classes. We're doing something similar to this (we create properties in our half of the entity partial classes that wrap some LINQ properties, like some of the EntitySet properties), but we haven't taken it as far as creating our own classes.

The DBML designer gives you a great relational model of your objects. I wish that they would make it easy for you to customize the code generator so that you could tweak how the code was generated. Then you could get the benefits of the code generator and still customize things to your liking.

Anyways... looking forward to the next post!


Anonymous said...

IMHO your solution of exposing LINQ results through WCF misses the main point of SOA. SOA was never about CRUD operations. SOA is all about messaging, and UpdateCustomer is not a valid message. MakeCustomerActive or UpdateCustomersAddress are valid messages. In other words, you do not need to do that attach/detach stuff. You should write a method UpdateCustomersAddress which will load the Customer from the LINQ DataContext, do some logic, set the address field, and then submit it back to the database. That's all.

Benjamin Eidelman said...

What you're pointing out is a very fair question and a common discussion (SOA vs. data-centric).

In my opinion, basing your apps on data-centric operations is generally a bad decision, but very often trivial data-centric operations need to be exposed on web services, and there are many overheads (in performance, in designing interfaces, in implementing these operations, in building unit tests, etc.) in updating a Customer the way you describe when you are building a simple Customer management website.

I found a nice post from Pablo Castro (tech lead in the ADO.NET team) that talks about this: