Monday, March 03, 2008

Ever since I was using domain objects on the wire and using WCF httpbindings to serialize them, I've had one serious issue with the way cyclic / circular references are serialized. In this post I present a solution that will create a more beautiful XML representation of your object graph than ever before!
As you know, I'm a big proponent of easy Domain Driven Design and flexibility. So, the solution I present you will be supereasy to implement (just use an attribute).

The issue at hand

Let's create two objects that have references to each other:

    [DataContract(Namespace = "myNamespace", Name = "Person")]
    public class Person
    {
        [DataMember]
        public int PersonID { get; set; }
        [DataMember]
        public string FirstName { get; set; }
        [DataMember]
        public string LastName { get; set; }
        [DataMember]
        public int Something;
        [DataMember]
        public List<int> Numbers { get; set; }
        [DataMember]
        public List<int> FieldedNumbers;
        [DataMember]
        public List<Order> Orders { get; set; }
        public Person()
        {
            Orders = new List<Order>();
            Numbers = new List<int>();
            FieldedNumbers = new List<int>();
        }
    }
    [DataContract(Namespace = "myNamespace", Name = "Order")]
    public class Order
    {
        [DataMember]
        public Person Customer { get; set; }
        [DataMember]
        public int Amount { get; set; }
        [DataMember]
        public int ProductID { get; set; }
        [DataMember]
        public int OrderID { get; set; }
        }

The diagram thus looks like this:

image

When you use a regular DataContractSerializer to serialize this, you will get a stack overflow, since there is a circular reference between person and order.

  1         [TestMethod]
  2         public void TestWithoutSurrogateSubstitution()
  3         {
  4             Person p = new Person { PersonID = 23, FirstName = "Ruurd", LastName = "Boeke", Something = 666 };
  5             p.Numbers.Add(31);
  6             p.Numbers.Add(29);
  7 
  8             p.FieldedNumbers.Add(12);
  9             p.FieldedNumbers.Add(24);
 10 
 11             p.Orders.Add(new Order { Amount = 12, Customer = p, OrderID = 12, ProductID = 19 });
 12             p.Orders.Add(new Order { Amount = 12, Customer = p, OrderID = 13, ProductID = 255 });
 13 
 14             DataContractSerializer s = new DataContractSerializer(p.GetType(), null, int.MaxValue, false, true, null);
 15 
 16             string outMessage = GetWellFormedToContract(p, s);
17         }

So, on line 14, you will see that I have to instantiate my serializer with the 'preserveObjectReferences' boolean set to true. The outputmessage of line 16 is shown here:

<Person xmlns:i="http://www.w3.org/2001/XMLSchema-instance" z:Id="1" xmlns:z="http://schemas.microsoft.com/2003/10/Serialization/" xmlns="myNamespace">
    <FieldedNumbers xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays" z:Id="2" z:Size="2">
        <d2p1:int>12</d2p1:int>
        <d2p1:int>24</d2p1:int>
    </FieldedNumbers>
    <FirstName z:Id="3">Ruurd</FirstName>
    <LastName z:Id="4">Boeke</LastName>
    <Numbers xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays" z:Id="5" z:Size="2">
        <d2p1:int>31</d2p1:int>
        <d2p1:int>29</d2p1:int>
    </Numbers>
    <Orders z:Id="6" z:Size="2">
        <Order z:Id="7">
            <Amount>12</Amount>
            <Customer z:Ref="1" i:nil="true" />
            <OrderID>12</OrderID>
            <ProductID>19</ProductID>
        </Order>
        <Order z:Id="8">
            <Amount>12</Amount>
            <Customer z:Ref="1" i:nil="true" />
            <OrderID>13</OrderID>
            <ProductID>255</ProductID>
        </Order>
    </Orders>
    <PersonID>23</PersonID>
    <Something>666</Something>
</Person>

Now, I have had long conversations with the PM of WCF, in the days it was still called Indigo, about the way this is serialized. I think it was harder to actually set the boolean back then, because the xml created here is not platform independent. Look at all the z:ref and z:ID attributes. I hate them!!

This serialization method is only understood by WCF, so you need both WCF on the client and on the server.
That's not really an issue for most, and there does not exist a standard for serializing object graphs. So there just is not a way to solve this issue. However, there is a way to be more friendly and get rid of the ugly z:ID and z:Ref attributes!!

Interested?? Read on.

My Solution: create surrogates on the fly

I use PostSharp to create a surrogate type during the compilation phase. Then I use a DataContractSurrogate to substitute these during serialization and deserialization.

All you need to do is attach that surrogate on your service (I have created an attribute for that) and use an attribute on your domainclasses to inform PostSharp to do it's magic:

    [CreateSerializeSurrogate]
    [DataContract(Namespace = "myNamespace", Name = "Person")]
    public class Person
    {
        [DataMember]
        public int PersonID { get; set; }
}

So, the new XML that is being sent on the wire, looks like this:

  1 <Person xmlns:i="http://www.w3.org/2001/XMLSchema-instance" xmlns="myNamespace">
  2     <FieldedNumbers xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
  3         <d2p1:int>12</d2p1:int>
  4         <d2p1:int>24</d2p1:int>
  5     </FieldedNumbers>
  6     <FirstName>Ruurd</FirstName>
  7     <LastName>Boeke</LastName>
  8     <Numbers xmlns:d2p1="http://schemas.microsoft.com/2003/10/Serialization/Arrays">
  9         <d2p1:int>31</d2p1:int>
 10         <d2p1:int>29</d2p1:int>
 11     </Numbers>
 12     <Orders>
 13         <Order>
 14             <Amount>12</Amount>
 15             <Customer>
 16                 <SerializationID>0</SerializationID>
 17             </Customer>
 18             <OrderID>12</OrderID>
 19             <ProductID>19</ProductID>
 20             <SerializationID>1</SerializationID>
 21         </Order>
 22         <Order>
 23             <Amount>12</Amount>
 24             <Customer>
 25                 <SerializationID>0</SerializationID>
 26             </Customer>
 27             <OrderID>13</OrderID>
 28             <ProductID>255</ProductID>
 29             <SerializationID>2</SerializationID>
 30         </Order>
 31     </Orders>
 32     <PersonID>23</PersonID>
 33     <SerializationID>0</SerializationID>
 34     <Something>666</Something>

35 </Person>

That's much better!

Look at line 33, where our person is assigned an ID. Now, checkout lines 16 and 25: Our Order object is pointing to our already serialized Person class.

How does it work
The CreateSerializeSurrogate attribute

If you have read my posts about PostSharp, you know that it is a framework that is able to alter the assemblies you create after they have been compiled. The CreateSerializeSurrogate instructs PostSharp to create a nested type inside your domainobject, called xxxSurrogate. That surrogate class has a constructor taking in the original type. It copies all values to it's own values. However, it is also passed a List<object>. It checks whether the original object was already serialized. If it was, no values will be copied, except the serializationID. So, at line 15 for instance, you see a complete person instance serialized, but just without it's values.

PostSharp also introduces a Deserialize(List<object>) method, that will do the reverse.

Let's take a look at the result using reflector:

  1 [DataContract(Namespace="myNamespace", Name="Person")]
  2 public class Person
  3 {
  4     // Fields
  5     [CompilerGenerated]
  6     private string <FirstName>k__BackingField;
  7     [CompilerGenerated]
  8     private string <LastName>k__BackingField;
  9     [CompilerGenerated]
 10     private List<int> <Numbers>k__BackingField;
 11     [CompilerGenerated]
 12     private List<Order> <Orders>k__BackingField;
 13     [CompilerGenerated]
 14     private int <PersonID>k__BackingField;
 15     [DataMember]
 16     public List<int> FieldedNumbers;
 17     [DataMember]
 18     public int Something;
 19 
 20     // Methods
 21     static Person();
 22     public Person();
 23     public void CopyDataFromSurrogate(PersonSurrogate surrogate);
 24 
 25     // Properties
 26     [DataMember]
 27     public string FirstName { [CompilerGenerated] get; [CompilerGenerated] set; }
 28     [DataMember]
 29     public string LastName { [CompilerGenerated] get; [CompilerGenerated] set; }
 30     [DataMember]
 31     public List<int> Numbers { [CompilerGenerated] get; [CompilerGenerated] set; }
 32     [DataMember]
 33     public List<Order> Orders { [CompilerGenerated] get; [CompilerGenerated] set; }
 34     [DataMember]
 35     public int PersonID { [CompilerGenerated] get; [CompilerGenerated] set; }
 36 
 37     // Nested Types
 38     [DataContract(Name="Person", Namespace="myNamespace")]
 39     public class PersonSurrogate
 40     {
 41         // Fields
 42         [DataMember(EmitDefaultValue=false)]
 43         public List<int> FieldedNumbers;
 44         [DataMember(EmitDefaultValue=false)]
 45         public string FirstName;
 46         [DataMember(EmitDefaultValue=false)]
 47         public string LastName;
 48         [DataMember(EmitDefaultValue=false)]
 49         public List<int> Numbers;
 50         [DataMember(EmitDefaultValue=false)]
 51         public List<Order> Orders;
 52         [DataMember(EmitDefaultValue=false)]
 53         public int PersonID;
 54         [DataMember]
 55         public int SerializationID;
 56         [DataMember(EmitDefaultValue=false)]
 57         public int Something;
 58 
 59         // Methods
 60         public PersonSurrogate();
 61         public PersonSurrogate(Person source, List<object> graphList);
 62     }
 63 }
 64 
 65  
66

I have chosen not to expand methods. See how there is a complete PersonSurrogate class, that you will not find in your sourcecode. Also, a CopyDataFromSurrogate exists! The constructor on line 61 is shown here:

public PersonSurrogate(Person source, List<object> graphList)
{
    if (!graphList.Contains(source))
    {
        graphList.Add(source);
        this.PersonID = source.PersonID;
        this.FirstName = source.FirstName;
        this.LastName = source.LastName;
        this.Numbers = source.Numbers;
        this.Orders = source.Orders;
        this.Something = source.Something;
        this.FieldedNumbers = source.FieldedNumbers;
    }
    int ListID = graphList.IndexOf(source);
    this.SerializationID = ListID;
}

Well, there you go.

Doing this in PostSharp was not the easiest thing, but also not the hardest. I did it, so you don't have to. Let's look at a small piece of the weaver:

        private void CopyFieldsToSurrogate(InstructionWriter writer)
        {
            foreach (FieldDefDeclaration field in source.Fields)
            {
                if (field.CustomAttributes.FirstOrDefault(attr => attr.ConstructRuntimeObject() is DataMemberAttribute) != null)
                {
                    // has datamember
                    FieldInfo finfo = field.GetReflectionWrapper(null, null);
                    FieldDefDeclaration f = new FieldDefDeclaration();
                    surrogate.Fields.Add(f);
                    f.Name = field.Name;
                    f.FieldType = field.FieldType;
                    f.Attributes = System.Reflection.FieldAttributes.Public;
                    // copies the datamember attribute in the sourcecollection
                    CopyDataMemberAttribute(field.CustomAttributes, f.CustomAttributes);
       &nb