Friday, April 25, 2008

Object Materialization

From the moment we included Linq to SQL in the Data Access Layer of our prototype applications, as a nice side-effect we started using POCOs for most entities.

Linq to SQL (like any ORM) enables the use of POCOs. POCO entities have many appeals, most of them based on Persistence Ignorance. We delegate the persistence responsibility to our Data Access Layer, where we have code to materialize (read from a data source) and persist (save to a data source) entities (as in-memory objects).

Object Materialization, when working with ADO.Net, means projecting Object Collections from DataReaders, populating the properties of a custom object with the fields of a data record.

Linq to SQL performs this task internally, but it's limited to Linq queries over MS SQL databases. Now that we have POCO entities, it would be nice to obtain them from different data sources, like *-SQL, ODBC, Excel spreadsheets, CSV, or anything implementing an IDataReader interface.

As Jose Marcenaro anticipated here, and here, we want to design a Data Access Layer where Linq to SQL and Enterprise Library DAAB can live in harmony, which means transparently sharing the same entity model.

Our primary objective is a custom stored procedure invocation based on Enterprise Library, projecting the same class of entities living in Linq to SQL Dbml files. Enterprise Library (or pure ADO.Net) queries deliver DataReaders (or DataSets), so the piece that's missing here is Object Materialization.

Like an alchemist seeking a process to turn lead into gold, I began my quest for a Linq-compatible ADO.Net Object Materializing mechanism.

I'm going to show 3 different attempts, and at the end of this post you can download a simple benchmarking app with all the different mechanisms I tried.

First Attempt, FieldInfo.SetValue()

First of all, I noticed that an object materializer is a necessary part of any Linq Provider, and found an "official sample" on Matt Warren's blog, in a series of posts about "Building an IQueryable Provider".

http://blogs.msdn.com/mattwar/archive/2007/07/31/linq-building-an-iqueryable-provider-part-ii.aspx (below the title "The Object Reader")

He shows a simple Object Reader (aka Materializer), described as:

"The job of the object reader is to turn the results of a SQL query into objects. I’m going to build a simple class that takes a DbDataReader and a type ‘T’ and I’ll make it implement IEnumerable<T>. There are no bells and whistles in this implementation. It will only work for writing into class fields via reflection. The names of the fields must match the names of the columns in the reader and the types must match whatever the DataReader thinks is the correct type."

Basically, what this Object Reader does is:

  • Use Reflection over the target object type, to obtain the collection of FieldInfos.
  • Map the names in the FieldInfos with the DataReader field names.
  • While iterating the DataReader, use the FieldInfo.SetValue() method to populate the new target object type instances.
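
The three steps above can be sketched roughly as follows. This is not Matt Warren's code, just a minimal illustration written for this post; the names ReflectionReader and PetRow are made up, and it only handles public fields whose names match the column names.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

// Hypothetical helper: materializes T instances by matching public
// field names against the column names of the reader.
public static class ReflectionReader
{
    public static IEnumerable<T> Read<T>(IDataReader reader) where T : new()
    {
        // Steps 1 and 2: reflect over T once and map column ordinals to fields.
        var map = new Dictionary<int, FieldInfo>();
        for (int i = 0; i < reader.FieldCount; i++)
        {
            FieldInfo fi = typeof(T).GetField(reader.GetName(i));
            if (fi != null) map[i] = fi;
        }

        // Step 3: one FieldInfo.SetValue call per field per row (the slow part).
        while (reader.Read())
        {
            object instance = new T();
            foreach (KeyValuePair<int, FieldInfo> pair in map)
            {
                object value = reader.GetValue(pair.Key);
                pair.Value.SetValue(instance, value == DBNull.Value ? null : value);
            }
            yield return (T)instance;
        }
    }
}

// Hypothetical entity used only for this illustration.
public class PetRow
{
    public int Id;
    public string Name;
}
```

Any IDataReader works as input; for a quick experiment, DataTable.CreateDataReader() gives you one without touching a database.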

When working with Reflection, performance is the first thing we worry about. As he advises, this is not a real-world implementation: using Reflection to set the field values turned out to be very expensive.

Just to make it more Linq-compatible, I modified this object reader to look at properties instead of fields, setting the private field when it's specified in a ColumnAttribute, like this:

private int _OrderID; 

[Column(Storage="_OrderID", AutoSync=AutoSync.OnInsert, DbType="Int NOT NULL IDENTITY", IsPrimaryKey=true, IsDbGenerated=true)]
public int OrderID
{
...
}


This is what Linq to SQL does.


The performance remained almost unchanged, because the cost of the initial lookup of Property/FieldInfo is negligible compared to the (NumberOfFields * NumberOfRecords) SetValue invocations.


This option becomes extremely slow when reading more than 500 rows. It's intended for didactic purposes only.


Dynamic Translator Lambda Expression


My second attempt was the most fun and educational prototype I've written with Linq. As the previous attempt showed, using Reflection to populate fields must be avoided. The simplest and most performant way to do this job is:

    IEnumerable<Pet> ReadPets(IDataReader dataReader) {
        while (dataReader.Read()) {
            Pet pet = Translate(dataReader);
            yield return pet;
        }
    }

    Pet Translate(IDataRecord dr) {
        return new Pet {
            Id = (int)dr.GetValue(0),
            Name = (string)dr.GetValue(1),
            Birthdate = (DateTime)dr.GetValue(2)
        };
    }



But life isn't that easy. I don't know field names and positions until runtime, and furthermore I want generic code, independent of the entity type (e.g. Pet).


Note that the Translate function above contains only one Object Initializer (new in C# 3.0), so it could be written as a Lambda Expression (new in C# 3.0 too):

    Func<IDataRecord, Pet> Translate = dr => new Pet {
        Id = (int)dr.GetValue(0),
        Name = (string)dr.GetValue(1),
        Birthdate = (DateTime)dr.GetValue(2)
    };



Again, this code can't be hardcoded. How can we create this Lambda Expression dynamically at runtime? With Expression Trees (yes, new in C# 3.0 too!).

In C# 3.0 we can programmatically build Expression Trees and compile them later into function delegates. We're going to use the Reflection info to build the above Func<IDataRecord,*>. Once it's compiled, it is (almost) as fast as directly getting values from the DataReader as shown above.
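
As a warm-up, here is a minimal sketch of the idea for a single column. It builds and compiles the equivalent of rdr => (int)rdr.GetValue(ordinal). The ExpressionTreeDemo class and its method name are made up for illustration only.

```csharp
using System;
using System.Data;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical demo class, not part of the benchmarking app.
public static class ExpressionTreeDemo
{
    // Builds and compiles the equivalent of: rdr => (int)rdr.GetValue(ordinal)
    public static Func<IDataRecord, int> BuildInt32Getter(int ordinal)
    {
        // the "rdr" parameter of the lambda
        ParameterExpression rdr = Expression.Parameter(typeof(IDataRecord), "rdr");
        MethodInfo getValue = typeof(IDataRecord).GetMethod("GetValue");

        // rdr.GetValue(ordinal) returns object, so the result is converted (unboxed) to int
        Expression body = Expression.Convert(
            Expression.Call(rdr, getValue, Expression.Constant(ordinal)),
            typeof(int));

        // wrap the body in a lambda and compile it into a delegate
        return Expression.Lambda<Func<IDataRecord, int>>(body, rdr).Compile();
    }
}
```

The Reflection cost is paid once, when the tree is built and compiled; after that, each call to the returned delegate is an ordinary method call.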


The full translator builder below looks a little scary because it uses (and abuses) Linq to Objects over the PropertyInfo collection to build the Expression Tree: it's like "Linq to Linq". I found big Lambda Expressions a little difficult to indent (don't worry, you can download the source files at the end :))




    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.Data;
    using System.Reflection;
    using System.Linq.Expressions;

    namespace ObjectMaterializer
    {
        public static class TranslatorBuilder
        {
            /// <summary>
            /// Dynamically creates a Translator Lambda Expression to project T instances from the IDataRecord
            /// </summary>
            /// <typeparam name="T">The projected type</typeparam>
            /// <param name="record">A source data record</param>
            /// <returns>An expression tree that can be compiled into a translator delegate</returns>
            public static Expression<Func<IDataRecord, T>> CreateTranslator<T>(IDataRecord record)
            {
                // get properties info from the output type
                Dictionary<string, PropertyInfo> propInfos = typeof(T).GetProperties().ToDictionary(pi => pi.Name);

                // map each field ordinal in the DataRecord to the property with the same name
                var fieldMapping = new Dictionary<int, PropertyInfo>();
                for (int i = 0; i < record.FieldCount; i++)
                {
                    string name = record.GetName(i);
                    if (propInfos.ContainsKey(name))
                        fieldMapping[i] = propInfos[name];
                }

                // prepare method info to invoke GetValue and IsDBNull on the IDataRecord
                MethodInfo rdrGetValue = typeof(IDataRecord).GetMethod("GetValue");
                MethodInfo rdrIsDBNull = typeof(IDataRecord).GetMethod("IsDBNull");
                // prepare reference to the IDataRecord rdr parameter
                ParameterExpression rdrRef = Expression.Parameter(typeof(IDataRecord), "rdr");

                /* builds the translator Lambda Expression
                 *
                 * assign each property to its matching field, e.g.:
                 * new T {
                 *     PropertyName1 = (ProperCast1)rdr.GetValue(ordinal1),
                 *     PropertyName2 = rdr.IsDBNull(ordinal2) ? (ProperCast2)null : (ProperCast2)rdr.GetValue(ordinal2),
                 *     ...
                 * }
                 *
                 * Note that null values on non-nullable properties will throw an Exception on assignment
                 */
                Expression<Func<IDataRecord, T>> proj = Expression.Lambda<Func<IDataRecord, T>>(
                    Expression.MemberInit(Expression.New(typeof(T)),
                        fieldMapping.Select(fm =>
                            Expression.Bind(fm.Value,
                                (!fm.Value.PropertyType.IsValueType
                                    || (fm.Value.PropertyType.IsGenericType
                                        && fm.Value.PropertyType.GetGenericTypeDefinition() == typeof(Nullable<>)))
                                    ? // accepts nulls, test IsDBNull
                                      (Expression)Expression.Condition(
                                          Expression.Call(rdrRef, rdrIsDBNull, Expression.Constant(fm.Key)),
                                          // value is System.DBNull, assign null
                                          Expression.Convert(Expression.Constant(null), fm.Value.PropertyType),
                                          // value is not null, assign it
                                          Expression.Convert(
                                              (fm.Value.PropertyType == typeof(System.Data.Linq.Binary))
                                                  ? // convert byte[] to System.Data.Linq.Binary
                                                    (Expression)Expression.New(
                                                        typeof(System.Data.Linq.Binary).GetConstructor(new Type[] { typeof(byte[]) }),
                                                        Expression.Convert(Expression.Call(rdrRef, rdrGetValue, Expression.Constant(fm.Key)), typeof(byte[])))
                                                  : (Expression)Expression.Call(rdrRef, rdrGetValue, Expression.Constant(fm.Key)),
                                              fm.Value.PropertyType))
                                    : // doesn't accept nulls, direct assign
                                      (Expression)Expression.Convert(Expression.Call(rdrRef, rdrGetValue, Expression.Constant(fm.Key)), fm.Value.PropertyType)
                            ) as MemberBinding)),
                    rdrRef);

                return proj;
            }
        }
    }


Note that timestamps, returned as byte[] by ADO.Net, are transformed into System.Data.Linq.Binary by Linq to SQL, so I added support for that.


Now we can use this TranslatorBuilder like this:




    // generic method
    public IEnumerable<T> ReadAllObjects<T>(IDataReader reader)
    {
        Expression<Func<IDataRecord, T>> translatorExpression = TranslatorBuilder.CreateTranslator<T>(reader);
        Func<IDataRecord, T> translator = translatorExpression.Compile();

        while (reader.Read())
        {
            T instance = translator(reader);
            yield return instance;
        }
    }

    public IEnumerable<Pet> ReadAllPets(IDataReader reader)
    {
        return ReadAllObjects<Pet>(reader);
    }


This is pretty elegant and performs great... but it is not enough for us.


The Linq Object Materializer normally sets the private fields to avoid invoking the public property setters. This is not only to avoid a performance overhead, but because public property setters are often used for change tracking. With a Lambda Expression (or any C# expression) we can't access private fields.


Here I almost surrendered. How can I set private fields without using Reflection?


If the Microsoft guys can, we can! My boss (Jose Marcenaro) told me about an advanced and mysterious .Net feature, introduced with the .Net 2.0 Framework: LCG, Lightweight Code Generation.


Googling around I found that LCG is what the Linq to SQL team used to build their Object Materializer, used on the DataContext.Translate() function.


Wait! Why not just use the DataContext.Translate() function? Because:

  • It works only for Microsoft SQL Server databases
  • It requires an open db connection as a parameter
  • It requires .Net 3.5 (LCG is available in .Net 2.0). Of course, an Object Materializer on .Net 2.0, where you don't have Linq to Objects, may not sound so interesting.

Lightweight Code Generation


Since .Net 2.0, the System.Reflection.Emit namespace contains a couple of classes that let us programmatically generate dynamic methods from MSIL (Microsoft Intermediate Language) instructions. It's like adding a little piece of pre-compiled code at runtime.

Using this, we can build fast methods at runtime to set or get a field or property (even private ones). Here you may think "IL instructions??? I don't want to learn a low-level programming language for this!!". Relax, you only need 5 MSIL instructions.

Here's a helper class that generates a field setter:




    using System;
    using System.Collections.Generic;
    using System.Text;
    using System.Reflection;
    using System.Reflection.Emit;

    namespace ObjectMaterializer
    {
        public static class AccessorBuilder
        {
            public delegate void MemberSet<T>(T obj, object value);

            public static MemberSet<T> CreateFieldSet<T>(FieldInfo fi)
            {
                Type type = typeof(T);

                // a void method (null return type) owned by T, so it can reach T's private fields
                DynamicMethod dm = new DynamicMethod("Set" + fi.Name, null, new Type[] { type, typeof(object) }, type);

                ILGenerator il = dm.GetILGenerator();

                // load the target object instance (argument 0) onto the stack
                il.Emit(OpCodes.Ldarg_0);

                // load the new value (argument 1) onto the stack
                il.Emit(OpCodes.Ldarg_1);

                if (fi.FieldType.IsValueType)
                    // if the field contains a value type, we need to unbox the value
                    il.Emit(OpCodes.Unbox_Any, fi.FieldType);
                else
                    // if the field contains a reference type, we need to cast the value
                    il.Emit(OpCodes.Castclass, fi.FieldType);

                // store the value into the field fi of the target object
                il.Emit(OpCodes.Stfld, fi);

                // return (the method is void, the stack is empty at this point)
                il.Emit(OpCodes.Ret);

                return (MemberSet<T>)dm.CreateDelegate(typeof(MemberSet<T>));
            }
        }
    }

Lightweight Code Generation and MSIL are advanced subjects; you can find a lot of samples googling around.


Using these field setters, I improved my first attempt, replacing FieldInfo.SetValue() with these dynamically generated methods, which once compiled into delegates perform as fast as conventional methods.


Later, I added a Field Setters Cache, to avoid building these dynamic methods again on every query.
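
A cache like this could look roughly like the sketch below. It is not the code from the benchmarking app: the FieldSetterCache and CachedPet names are made up, the setter is untyped (Action<object, object>) so that one dictionary can hold setters for any entity, and it assumes the entities are classes (not structs).

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical cache: builds each LCG field setter once and reuses it afterwards.
public static class FieldSetterCache
{
    private static readonly Dictionary<FieldInfo, Action<object, object>> cache =
        new Dictionary<FieldInfo, Action<object, object>>();
    private static readonly object sync = new object();

    public static Action<object, object> GetSetter(FieldInfo fi)
    {
        lock (sync)
        {
            Action<object, object> setter;
            if (!cache.TryGetValue(fi, out setter))
            {
                setter = Build(fi);
                cache[fi] = setter;
            }
            return setter;
        }
    }

    // Assumption: fi.DeclaringType is a class, so castclass on the target is valid.
    private static Action<object, object> Build(FieldInfo fi)
    {
        var dm = new DynamicMethod("Set" + fi.Name, null,
            new Type[] { typeof(object), typeof(object) }, fi.DeclaringType, true);
        ILGenerator il = dm.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                         // target object
        il.Emit(OpCodes.Castclass, fi.DeclaringType);     // cast it to the entity type
        il.Emit(OpCodes.Ldarg_1);                         // new value
        if (fi.FieldType.IsValueType)
            il.Emit(OpCodes.Unbox_Any, fi.FieldType);     // unbox value types
        else
            il.Emit(OpCodes.Castclass, fi.FieldType);     // cast reference types
        il.Emit(OpCodes.Stfld, fi);                       // store into the field
        il.Emit(OpCodes.Ret);
        return (Action<object, object>)dm.CreateDelegate(typeof(Action<object, object>));
    }
}

// Hypothetical entity with a private backing field, used only for this illustration.
public class CachedPet
{
    private string _name = "";
    public string Name { get { return _name; } }
}
```

The lock keeps the sketch safe when several queries run at the same time; whether that matters depends on how the readers are used.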


Some benchmarking shows that this approach is (almost) as fast as the DataContext.Translate() function. There's still a small performance difference. Why? If anyone can tell me, I'll be glad to update this post! :)

Anyway, our primary objective (in bold at the very beginning of this post) is achieved! (It wasn't to beat the Linq to SQL Object Materializer performance.)


The code


I put all these mechanisms in a simple benchmarking app that you can download here


  • First Attempt (using FieldInfo.SetValue): SimpleObjectReader.cs
  • Dynamic Translator Lambda Expression: TranslatorObjectReader.cs
  • LCG setters: LinqObjectReader.cs
  • re-using the DataContext.Translate() function: LTSObjectReader.cs

To run this you will need Visual Studio 2008, the .Net 3.5 Framework and a Northwind db, whose connection string you can set in the app.config.

Wednesday, March 12, 2008

Implementing N-Tier Change Tracking with Linq to SQL

 

When designing our application data model, we think of some of our entities as an in-memory cache of small pieces of data living in a database (MSSQL, Oracle, Xml files, etc.). This data will be jumping between both worlds.

To keep this "cache" synchronized, we need mechanisms to read data from storage and write changes back. To do that we create CRUD operations.

The "U" in CRUD is the subject of this entry, update changes back.

Once we read an entity from the db we must track changes on it, in order to reproduce them in the storage.

To facilitate this, ADO.Net brought features such as the DataRow.RowState property. Based on that, we wrote code that typically uses the RowState to perform the corresponding creations, updates, or deletes in the db.

What happens when we have Linq to SQL custom classes?

 

Linq to SQL DataContext Object Tracking

When entities are obtained from a DataContext object, this DataContext subscribes to change notification events on every entity, and tracks changes automatically, generating an internal change set. When we want to persist changes, we call the DataContext SubmitChanges() method and all the INSERT, UPDATE and DELETE commands are sent to the SQL Server.

This mechanism works great in a connected scenario, but in an N-tier architecture, modifications of your entities can be done on a machine far, far away from the DataContext that created them, and they can't (or shouldn't! :)) notify these changes over the wire.

As we explained in previous posts, we need detached entities.

How does Linq to SQL support this scenario? It allows you to detach and re-attach entities, but now change tracking is your job.

Detaching entities can be achieved in two ways:

  • serializing-deserializing entities, deserialized entities are detached by nature.
  • setting the DataContext ObjectTrackingEnabled property to false (after that all entities obtained are not tracked).

Notice that the first option would force serialization and deserialization even in a connected Winforms app,  so we'll always use the second.

Once your entity comes back modified to the data access layer you can re-attach it to a new DataContext using the Attach() method in every Linq Table, but you must tell if this entity has been modified/created/deleted (or not).

"Those who don’t know their history are doomed to repeat it."

That means we need to track this information on the client, and send it back to the server. We need to build our own... disconnected change tracking!

 

Disconnected Change Tracking

Of course we are not the first to get here! Using Linq to SQL in an N-tier scenario is something many people are working on. Actually, the same issue is being discussed in the Entity Framework world, because there seems to be no official solution there either (yet).

So I googled a bit to see what others are doing. Most of the solutions can be grouped in these two categories:

Portable Client-DataContext

At an early stage of Linq to SQL, Microsoft was planning to ship this with the first release, as explained by Matt Warren, who described it like this:

"The mechanism that does the change tracking on the client is similar to a mini-connectionless DataContext.  It is a type that packages up all the objects, lists of objects and graphs that you want to send to the client.  It serializes itself and everything you've given it automatically.  (It implements IXmlSerializable.)  On the client, the same class also manages change-tracking for all the objects that were serialized with it.  When serialized again (on the trip back) it serializes both the objects and their change information that it logged.  Back on the middle tier, you just make one call to re-attach the whole package to a new DataContext instance and then call SubmitChanges."

In the end, this didn't make it into the current release, and many people came up with their own implementation of it. Deeper ruminations on this approach can be found in this OakLeaf article.

The best thing here is "entity pureness": entities completely ignore persistence, and they can be pure POCOs, without a base class. But be aware that this pureness is not so absolute. Entities must implement interfaces and events for property change notification; we just don't notice this because the implementation is in auto-generated code.

Also, this means that an entity never knows its own dirty state or original values; to know them you must reference this portable context. This makes rolling back changes a complicated task (actually there's no support for rollback to original values in the Linq to SQL DataContext).

This also requires some "packing" and "unpacking" code in the client.

 

Entities with state flags

The other approach implies including a state field in every entity, and eventually the original values. An attractive point here is that this is just what our old, well-known ADO.Net DataSet has been doing all this time (see the DataRow.RowState property). It's a pattern we've been seeing for years in change tracking.

We can achieve this using an entity base class (this can be set in a Dbml file, but unfortunately this attribute is not visible in the O/R Designer).

To avoid tying ourselves to a specific implementation we can use an "entity with state" interface.

 

The Tercer Planeta's choice

Matthew Hunter is working on the same problem, and asked in a previous post which way we are taking. That's why I decided to write this entry.

Standing at this fork in the road, so crucial in our lives, we first of all breathed deeply, thought about why we need this, and decided to list the kinds of changes we want to track. We found out that all we need to track is:

  • Dirty state in every entity
  • Added/Removed/Existent states only in entity collections

Why don't we want Added/Removed/Existent states in single entities? Because that info should be in the business logic. In other words, the business logic knows if I'm creating, deleting or updating. We don't want a generic "ApplyChanges" method internally doing an insert, update or delete at will.

This means, if I'm in the ProductEditForm and click on the "save" button, I expect the Form (and not the entity) to tell me which action to perform (create, update or delete).

Based on that we came up with a separation of concerns here.

We chose the path of entities with state. On this path of green hills and crystal clear waters, every entity knows its dirty state through a boolean flag in its base class, but adds/deletes of children are tracked by a custom collection class (replacing EntitySet).

Optionally, entities can keep a copy of their original values (for change rollback and concurrency checking when there's no timestamp), to allow this we may force ICloneable implementation.

Tracking must be activated explicitly. This not only allows an optional "read-only" mode, but also lets us distinguish new entities from pre-existing ones (already in the db).
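
To make the idea a bit more concrete, here is a rough sketch of that separation of concerns. It is not our actual implementation (that is still under discussion); every name in it (TrackedEntity, TrackedCollection, DemoProduct) is hypothetical, and the collection only handles Add/Insert/Remove, ignoring Clear and item replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Hypothetical base class: every entity knows its own dirty state.
public abstract class TrackedEntity
{
    public bool IsTracking { get; private set; }
    public bool IsDirty { get; private set; }

    // Tracking is off by default ("read-only" mode) and must be started explicitly.
    public void StartTracking() { IsTracking = true; IsDirty = false; }

    // Property setters call this after changing a value.
    protected void MarkDirty() { if (IsTracking) IsDirty = true; }
}

// Hypothetical child collection: adds and removes are tracked here, not in the entities.
public class TrackedCollection<T> : Collection<T>
{
    private readonly List<T> added = new List<T>();
    private readonly List<T> removed = new List<T>();

    public bool IsTracking { get; private set; }
    public IList<T> Added { get { return added.AsReadOnly(); } }
    public IList<T> Removed { get { return removed.AsReadOnly(); } }

    public void StartTracking() { IsTracking = true; added.Clear(); removed.Clear(); }

    protected override void InsertItem(int index, T item)
    {
        base.InsertItem(index, item);
        if (IsTracking) added.Add(item);
    }

    protected override void RemoveItem(int index)
    {
        T item = this[index];
        base.RemoveItem(index);
        if (!IsTracking) return;
        // removing an item that was added while tracking just cancels the add
        if (!added.Remove(item)) removed.Add(item);
    }
}

// Hypothetical entity used only for this illustration.
public class DemoProduct : TrackedEntity
{
    private string name;
    public string Name
    {
        get { return name; }
        set { name = value; MarkDirty(); }
    }
}
```

Items loaded before StartTracking() are treated as pre-existing, so only later adds and removes show up in Added and Removed.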

ULinqGen compatibility

To facilitate change tracking the way I described above, minor surgery is required on the Linq to SQL auto-generated entities... thank god we are building a code generator custom tool for Dbml!

We are adding some improvements to our ULinqGen tool.

Our code generator must support change tracking but without binding it to our (or any) custom change tracking implementation.

What should we add to auto-generated code?

  • Entity base class. Linq to SQL already has this feature, so no extra work would be necessary. The only drawback is that the EntityBaseClass attribute is global to all the entities in a Dbml, and is not visible from the O/R Designer (you can write it in the Dbml file with notepad). We'll probably add the capability to specify a per-entity base class.
  • Property change notification. As we don't have a listening context, we only need a generic "I'm getting dirty!" instance method call, so we are requiring an IEntityState interface with this method. The implementation is left to the base class (if any); it could even be an extension method of IEntityState!
  • We want to support optional "original values tracking", to allow this ULinqGen could easily generate an ICloneable implementation based on the entity metadata.

This will keep our code generator tool pretty much "naive" about the specific change tracking system. And read-only mode is still the default.
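
Just to give a feeling of it, the generated code could end up looking roughly like the sketch below. Take it with a grain of salt: the design is still open, and apart from the IEntityState interface name everything here (MarkDirty, IsDirty, GeneratedCustomer) is a made-up assumption, not actual ULinqGen output.

```csharp
using System;

// Hypothetical contract: the only thing generated code knows about change tracking.
public interface IEntityState
{
    void MarkDirty();
}

// Hypothetical generated entity. In a real setup MarkDirty would more likely
// live in a base class; it is inlined here to keep the sketch self-contained.
public partial class GeneratedCustomer : IEntityState, ICloneable
{
    private string _Name;

    public bool IsDirty { get; private set; }

    public string Name
    {
        get { return _Name; }
        set
        {
            if (_Name != value)
            {
                _Name = value;
                MarkDirty(); // the generic "I'm getting dirty!" call
            }
        }
    }

    public void MarkDirty() { IsDirty = true; }

    // Original values tracking: a shallow copy is enough as long as
    // the mapped columns are value types or immutable types like string.
    public object Clone() { return MemberwiseClone(); }
}
```

A client would call Clone() before editing and keep the copy around for rollback or concurrency checks.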

I'm not showing any code because we're currently discussing much of this, so you'll have to wait for the following ULinqGen releases! ;)

Tuesday, March 11, 2008

Fitting LINQ to SQL in an Application Framework

I've gathered some of the ideas we've been discussing on the LINQ to SQL technologies and how to fit them in our own layered application framework, and wrote an article for the Level Extreme .NET online magazine.

The topics covered are:

  • What's good in the current LTS implementation.
  • Why it's not ready to fit - as it is - in a layered application framework.
  • Aspects of LTS we need to override.
  • How to tear apart the LTS package, use what we need and provide the rest.

The article concludes with the need of a custom LINQ to SQL generator ( see our CodePlex ULinqGen Project )  and of a custom way to provide disconnected change tracking on the entities.

Read the full article here.

Saturday, March 1, 2008

How to use the Unplugged LINQ to SQL Generator

As Chris Rock pointed out, the first release of our CodePlex custom tool for LINQ to SQL code generation lacks any usage documentation. We'll fix this in the next few days; meanwhile, these are the basic instructions:

Install the Custom Tool

After building (VS 2008) the ULinqGen project, you may either register the assembly with regasm, or build and run the setup project. Then you should close the IDE (all VS2008 instances) and reopen it to make the new custom tool available.

Code a generic Data Context class in your project

It may be as simple as this:
    using System.Data.Linq;
    using System.Data.Linq.Mapping;

    public class MyDataContext : DataContext
    {
        // attribute-based mapping, shared by all instances
        private static MappingSource mappingSource = new AttributeMappingSource();

        public MyDataContext(string connection) :
            base(connection, mappingSource)
        {
        }
    }

Associate each DBML to the custom tool

On the project explorer, for each model in which you want to use this tool, set the "Custom Tool Name" property to "ULinqToSQLGenerator"

Get tables from your generic data context and you are ready to LINQ

Assuming your DBML has an Invoice entity with InvoiceItem children and a Customer foreign relation, it may look like this:

    MyDataContext dc = new MyDataContext(myConnectionString);
    Table<Invoice> invoices = dc.GetTable<Invoice>();

    var results = from i in invoices
                  where i.IsApproved
                     && i.Customer.FirstName.StartsWith("John")
                     && i.InvoiceItems.Count > 2
                  orderby i.ApprovalDate
                  select i;
    return results.ToArray();

From here it's up to you!

Friday, February 29, 2008

Does LINQ to SQL replace the whole DAL?

Probably the balanced answer should be "it depends on the application you are building", but I usually go for a straight "no way" - there's always time for subtleties.

At least in any SQL Server based application that is not trivial (in terms of data model and user concurrency), there's little doubt about the key role that stored procedures should still play. Their functionality may now be complemented with LINQ to SQL direct operations in a mix whose exact proportion depends on the application context.

This may range from air-tight corporate DB environments (no direct access allowed)  to more pragmatic scenarios in which a big chunk of the simple data operations are easily implemented as LINQ queries. In almost any case, some stored procedures are still required as the best way of solving complex queries and handling complex or critical updates.

Having established this, there's still a question about the necessity of a "duplicate" data access layer, considering that LINQ to SQL does allow stored procedure calls. In my opinion, several reasons make a separate, full-featured DAL an important component of your application:

  • Full control over the SP invocation: your generic data access code may be customized with the desired exception handling, parameter examination and completion, and any fine-grain control your application requires.
  • Ability to return or fill datasets: in several situations (e.g. reporting) datasets are still a simple and flexible way to carry a list of tabular data thru your application tiers.
  • If you prefer to return custom objects (or you are a TDD / mockable objects fan), you may extend the DAL to materialize a POCO instance or collection from a Data Reader obtained thru a SP call.

The last assertion - returning a POCO instance or collection in your custom DAL, instead of just calling the SP thru LINQ to SQL is motivated by the need to establish a clear rule of use:

  • Use the LTS data access for direct LINQ queries and updates
  • Use the custom DAL for all SP calls.

How do we implement the materialization of custom objects from a DataReader? You may write your own code - examining attributes or thru reflection - or you may just delegate this task to the Translate method of a LTS Data Context, inside your own DAL. Assuming the necessary LTS attributes are present in the Customer class, this code works:

    DbDataReader dr = cmd.ExecuteReader();
    IEnumerable<Customer> customers = myDataContext.Translate<Customer>(dr);

NOTES:


  • For a reason unknown to me, the provided DC should hold a valid database connection, even when data is pulled from the already open data reader.


  • And if you are thinking of this as a way to work around LINQ to SQL's ties to SQL Server (after all, a DbDataReader may be obtained from many data sources), forget it: the Translate method will throw an exception when the reader is not a SqlDataReader. Nice try!



Side topic: for an interesting discussion on the typed dataset vs LINQ to SQL objects, you may read this entry in Aaron's Technology Musings blog.

Wednesday, February 27, 2008

Unplugged LINQ to SQL Generator

As several smart guys already pointed out,  the LINQ to SQL (LTS) package as it is doesn't always fit well in an N-Tier architecture. On this topic you may read Rick Strahl 's early post on "LINQ to SQL and attaching Entities" and Nick Kellet's review on Planet Moss: LINQ To SQL ≠ N-Tier Architecture?

The RTM version offers a way of detaching thru serialization and re-attaching thru the Attach API for the frequent scenario of Web Services / WCF distributed apps. It even has some capability of serving "non tracked" objects thru the ObjectTrackingEnabled data context property.

But in my opinion this is not good enough for building an "all-terrain" business layer, serving disconnected DTOs which may be either light-weight POCOs or more complex entities with stand-alone change tracking, in the same way the good old dataset allows. A discussion on this topic can be found in Benjamin Eidelman's previous entry.

That said, it's also palpable that many of the tools in the LTS bag are just too good to be ignored. So maybe the key for a successful adoption of this technology is to have a clear understanding of its pieces and combine them in the way that best serves your particular scenario.

With that in mind, we started by manually coding entities to replace the ones generated by the LTS custom tool (MSLinqToSqlGenerator), and we came across a couple of interesting conclusions:

  • The Data Context class does not need to have strongly typed table properties for making the queries: a generic data context may be used, and invoking GetTable<yourEntity>() provides the virtual collections on which LINQ queries are based. This allows us to break the awkward coupling of DC and entities in a single (big) DBML file, and take a more flexible approach.

  • Almost any class decorated with the right LTS attributes is able to be used as a query entity, even pure POCOs (Plain Old CLR objects). The default classes, closely tied up to the DC by notify events and EntitySet collections, are not mandatory for using the power of the LINQ provider in LTS.

At that point we decided we wanted to develop our own custom tool for translating the DBML built by the O/R designer into our own entities code. As we intended our classes to be detached from the DC by design, the name "Unplugged LINQ to SQL Generator" was a unanimous decision.

So we are walking that path. The first release of our custom tool, a proof of concept with limited capabilities, was published today as a Code Plex project: http://www.codeplex.com/ULinqGen .

Stay tuned! More to come soon...

Wednesday, February 13, 2008

First thoughts on Designing a LINQ-enabled Application Framework

 

In the previous weeks (and the following ones :), we've been intensively stressing the different LINQ-to-SQL features. Lots of prototypes and architecture sketches were made, trying to reach some conclusions about: What role (if any at all) do we want to give to LINQ in the architecture of our applications?

First of all, we need to answer a basic question "Do we want to add LINQ to our model?"

As Jose wrote in the previous article, there's no doubt we love LINQ as a set of language extensions, combined with a set of providers (LINQ-to-*), allowing us to write elegant strongly-typed queries over heterogeneous collections without knowing all their specific APIs.

Considering this, it would be great to have a LINQ "Queryable" data access layer. With that idea,  we started to analyze LINQ-to-SQL integration in Enterprise applications of different scale.

 

Two-tier (logically separated) WinForms/WPF Application

This is our simple case: a presentation layer designed to be always physically connected to a business layer retrieving business entities from a local Data Source.

Even though there's a logical separation between layers, they share a common Application Domain. It's the case of a simple desktop application directly accessing a local (or remote) database.

Unit of Work

When we retrieve entities from our database we will use a DataContext, which follows the Unit-of-Work pattern.

It stays alive during a single business operation, handling the SQL Connection, and tracking changes on all the entities associated to it.

Every time we insert, modify or delete entities from a DataContext, it updates an in-memory ChangeSet, with copies of the original and modified values of these entities.

Finally, when we have finished working with them, we tell the DataContext to submit these changes, and all the necessary commands are sent to the database. Then it's ready to be disposed.

This is absolutely great in this connected environment. We query our DataContext, bind the IQueryable result to a BindingSource or a data-bound grid, edit, insert, or delete records, and when we are ready, all we need to do is call MyDataContext.SubmitChanges();

And this won't only persist every change we made to the entities; it will also handle foreign keys, concurrency checks and transactions.

This also means that the entities belong to their DataContext during their whole lifetime. This wiring enables features such as deferred loading (of properties, associated entities, child collections) and db-generated fields.
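The unit-of-work cycle described above can be sketched roughly as follows. This is only an illustration under assumptions: the Product class, the "Products" table and the connection string are hypothetical and not part of our prototypes, and the code needs a reachable SQL Server database with a matching table to actually run.

```csharp
using System;
using System.Data.Linq;
using System.Data.Linq.Mapping;
using System.Linq;

// Hypothetical entity mapped with attributes (an assumption for this sketch).
[Table(Name = "Products")]
public class Product
{
    [Column(IsPrimaryKey = true, IsDbGenerated = true)]
    public int ProductId { get; set; }

    [Column]
    public string Name { get; set; }

    [Column]
    public decimal Price { get; set; }
}

public static class UnitOfWorkSketch
{
    public static void RaiseCheapPrices(string connectionString)
    {
        // One DataContext per business operation; disposed when the operation ends.
        using (var context = new DataContext(connectionString))
        {
            Table<Product> products = context.GetTable<Product>();

            // Entities materialized by the query are tracked by the context.
            var cheap = products.Where(p => p.Price < 10m).ToList();
            foreach (var product in cheap)
            {
                product.Price += 1m;
            }

            // Inserts are only queued in the in-memory ChangeSet at this point.
            products.InsertOnSubmit(new Product { Name = "New product", Price = 5m });

            // Inspect what the context is about to send to the database.
            ChangeSet pending = context.GetChangeSet();
            Console.WriteLine("{0} inserts, {1} updates, {2} deletes",
                pending.Inserts.Count, pending.Updates.Count, pending.Deletes.Count);

            // All pending commands are sent to the database here.
            context.SubmitChanges();
        }
    }
}
```

Nothing touches the database between the query and SubmitChanges(); the context just accumulates the pending work, which is what makes it a unit of work.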

For a lot of reasons, this seems to be the main scenario for which the current LINQ-to-SQL implementation has been designed.

 

N-tier (physically separated) Application

Let's try to scale the previous approach to an N-tier application. In this case our Business Layer is exposed through a Service Layer (WCF), consumed by a physically remote presentation layer (WinForms/WPF client, ASP.NET website, etc.).

How does LINQ-to-SQL support this scenario?

Initially, we could say that LINQ-to-SQL will remain behind the Business Layer, and won't cross the WCF boundary.

Off-topic: There are a few adventurous developers (here is a project on CodePlex) implementing serialization of Expression Trees. This allows querying a remote collection exposed through a WCF service: the query (represented as an Expression Tree) is serialized on the client, deserialized on the server, and the results are returned to the client.

 

But there's something we surely want to move across these layers: entities.

 

The auto-generated LINQ-to-SQL entities can be decorated (by selecting unidirectional serialization in the O/R Designer) with [DataContract] and [DataMember] attributes, allowing them to travel as parameters or results of a WCF Service Operation.
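To make the effect of serialization concrete, here is a minimal sketch of an entity crossing a service boundary. The Customer class is a hand-written, hypothetical stand-in for a designer-generated entity, and the round trip goes through a MemoryStream instead of a real WCF channel.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical stand-in for an entity generated with unidirectional serialization.
[DataContract]
public class Customer
{
    [DataMember(Order = 1)]
    public int CustomerId { get; set; }

    [DataMember(Order = 2)]
    public string Name { get; set; }
}

public static class SerializationSketch
{
    // Simulates what happens when the entity travels through a service operation.
    public static Customer RoundTrip(Customer source)
    {
        var serializer = new DataContractSerializer(typeof(Customer));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, source);
            stream.Position = 0;
            return (Customer)serializer.ReadObject(stream);
        }
    }
}
```

The object returned by RoundTrip is a new instance carrying the same data; it has no link to whatever DataContext produced the original.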

As expected, this breaks the connected state of these entities, losing all the cool features we had in the previous scenario (change tracking, deferred loading, etc.)

That isn't actually bad news, because having those features would encourage data-centric practices, as opposed to the SOA model that WCF is based on.

If we look at Fowler's Lazy Load pattern description, "An object that doesn't contain all of the data you need but knows how to get it.", we can note that the last words ("knows how to get it") are in deep contradiction with the Persistence Ignorance principle that we are trying to follow.

One of the reasons for this is that LINQ, and all the new language extensions in C# 3.0 and VB9, ease the handling of POCO entities, a principle that LINQ-to-SQL and the new Entity Framework seem to take advantage of.

In this scenario, having our entities detached from the DataContext is something we want. And by design, entities get detached when serialized.

When these entities (or collections of entities) return modified to the Business Layer, they are detached; we just need to attach them to a new DataContext and submit their changes.

As change tracking has been broken, when we re-attach an entity to a DataContext we need to tell it how this entity must be updated, specifying:

  • Original and current copies of the entities

or simply:

  • Only the current copies, all treated as modified
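The two options above map to two overloads of Table<T>.Attach. The sketch below is an illustration under assumptions: the Customer class, its "Customers" table and the Version column are hypothetical, and a matching SQL Server table is needed to run it. Note that attaching as modified without an original copy only works when the entity declares a version member (as here) or has no update checks.

```csharp
using System.Data.Linq;
using System.Data.Linq.Mapping;

// Hypothetical entity; the timestamp/rowversion column supports option 2.
[Table(Name = "Customers")]
public class Customer
{
    [Column(IsPrimaryKey = true)]
    public int CustomerId { get; set; }

    [Column]
    public string Name { get; set; }

    [Column(IsVersion = true, IsDbGenerated = true)]
    public Binary Version { get; set; }
}

public static class ReattachSketch
{
    // Option 1: original and current copies. The context compares both
    // to work out which columns changed.
    public static void Update(string connectionString, Customer current, Customer original)
    {
        using (var context = new DataContext(connectionString))
        {
            context.GetTable<Customer>().Attach(current, original);
            context.SubmitChanges();
        }
    }

    // Option 2: only the current copy, treated as entirely modified.
    public static void UpdateAsModified(string connectionString, Customer current)
    {
        using (var context = new DataContext(connectionString))
        {
            context.GetTable<Customer>().Attach(current, true);
            context.SubmitChanges();
        }
    }
}
```

In both cases the DataContext is new and short-lived: it exists only to turn the detached entity back into UPDATE commands.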

 

In a few words, the ability to re-attach entities adds basic N-tier support to LINQ-to-SQL, while cutting off all the magic features (change tracking, deferred loading, etc.) that the connected state gave us.

 

One size fits all Solution

The previous scenarios seem to be well handled by the current LINQ implementation. But an immediate conclusion we reached while studying them is that they imply different logic behind the business layer.

The connected nature of the first type of application certainly doesn't scale to the second: keeping a DataContext alive through the whole lifetime of an entity is unacceptable in an enterprise application model.

Besides that, it would be a bad choice in an ASP.NET website to keep the DataContext (with its ChangeSet) alive in memory between postbacks.

We want the 2 tiers in the Two-tier application to be not only logically separated, but "physically separable". That would improve scalability (allowing reuse of the business layer in an N-tier application), and force a better delegation of responsibilities between business and presentation layers.

Disclaimer: Forcing a "one size fits all", "N-tier ready" solution implies some over-engineering for people building simple desktop RAD applications (like in the first scenario), but our main focus is on Enterprise Solutions.
For this simpler, always-connected Two-tier desktop app, a possible piece of advice could be: use LINQ-to-SQL "as it is".

 

All this led us to the significant choice of allowing only detached entities outside the business layer.

That implies destroying the DataContext after the entities are retrieved, and re-attaching them to a new DataContext at the moment of submitting changes. Many people got there, and found themselves struggling with the "Attach only when detached" nightmare. Rick Strahl is one of them (or should I say, us).

Attach only when detached

As explained in Dinesh Kulkarni's blog, detaching and attaching entities was designed for N-tier scenarios only; that's why attaching is allowed only for entities that have been previously serialized and deserialized.

That works great in N-tier, but serializing and deserializing makes no sense within a common application domain.

A workaround that many people have found for this is to roughly cut all the wires between entities and their DataContext. That can be accomplished by resetting some event handlers, and replacing the deferred-loading-aware types (EntitySet and EntityRef) with a simpler array or List<T>.

Fortunately for us, there's a feature in LINQ-to-SQL that solves these issues in a more elegant way!

LINQ-to-SQL POCO support

Even though the O/R Designer and SqlMetal provide automatic generation of wrapper classes over our data entities, it's perfectly valid to use our own POCOs decorated with the appropriate attributes.

The POCO movement (nicely explained here) is based on the Persistence Ignorance principle, which ensures Responsibility Delegation; in other words, we don't want our entities to know anything about how they are persisted. The default auto-generated entities pretty much respect this principle: they don't know anything about persistence (part of this info lives in attribute decoration or mapping files).

But they do participate in their persistence mechanism, by being closely associated with a DataContext: not only notifying changes, but also loading deferred values or associated entities from it.

This behavior is mainly achieved through change-notification events (declared in the INotifyPropertyChanging and INotifyPropertyChanged interfaces), and the new types EntityRef and EntitySet.

These two types are used in auto-generated entities to load (lazily or not) properties created from foreign keys: EntityRef is used for single references (as in Product.Manufacturer.Name), and EntitySet for child collections (as in Manufacturer.Products[2].Price).

They not only contain associated entities; they also hold the logic for deferred loading and for notifying the DataContext about modifications in references and child collections, enabling the change tracking feature.

As we read in LINQ-to-SQL blogs, it's possible to replace these types with simpler, disconnected versions: EntityRef can be replaced by a direct reference, and EntitySet by any ICollection<T>.
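As a rough sketch of what such disconnected entities could look like, here is the Product/Manufacturer pair used in the examples above, written as plain classes. The table names, key names and mapping details are assumptions made for illustration; they are not taken from our prototypes, and the mapping has not been verified against a real database here.

```csharp
using System.Collections.Generic;
using System.Data.Linq.Mapping;

[Table(Name = "Manufacturers")]
public class Manufacturer
{
    [Column(IsPrimaryKey = true)]
    public int ManufacturerId { get; set; }

    [Column]
    public string Name { get; set; }

    // A plain List<T> where generated code would declare an EntitySet<Product>.
    [Association(ThisKey = "ManufacturerId", OtherKey = "ManufacturerId")]
    public List<Product> Products { get; set; }
}

[Table(Name = "Products")]
public class Product
{
    [Column(IsPrimaryKey = true)]
    public int ProductId { get; set; }

    [Column]
    public int ManufacturerId { get; set; }

    [Column]
    public string Name { get; set; }

    [Column]
    public decimal Price { get; set; }

    // A direct reference where generated code would wrap an EntityRef<Manufacturer>.
    [Association(ThisKey = "ManufacturerId", OtherKey = "ManufacturerId", IsForeignKey = true)]
    public Manufacturer Manufacturer { get; set; }
}
```

Apart from the mapping attributes, these classes carry no reference to a DataContext and no deferred-loading logic, so expressions like Product.Manufacturer.Name only work if the related data was loaded up front.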

 

Putting the pieces together

With these ideas in mind, we started building prototypes, messing with O/R Designer and SqlMetal auto-generated code.

Putting together the pieces we want, replacing/discarding others.

These days we're starting to see the light at the end of the tunnel, with the custom tools and code we have started to write.

More on this in the following posts...