Tuesday, 04 March 2008

Ever since I first got into Workflow Foundation, I've taken a fancy to state machines. Once you wrap your head around them, they are a natural fit for most business processes.
The main problem everybody seems to be having with workflow though, is the versioning story. There is none!
That might be a bit harsh, you can certainly version your workflows, but to tell you the truth, you will be in a world of hurt.

The sample solution can be downloaded at the end of the post. It contains two workflows and a console application that you can play with.

Why is this updating so tough?
The workflow template is serialized to the persistence store. Any change in the workflow (adding or removing an activity) will make it impossible to deserialize the workflow again. It's serialized as a blob, so there is no easy transformation. I've written extensively about problems surrounding updating workflows here.

Your options pretty much consist of running side by side (which gives you a world of even more hurt, because now you have your data exchange services to version as well, and the activity library you have built) or using dynamic changes to alter the structure.
The latter is your best bet, but it is so much work that it takes away from the flexibility and speed of development that workflow brings to the table.

In my previous post I concluded that you would be best off just destroying your old workflow and creating a new one. I stand by that! Today I was finally able to revisit the problem, and I hacked together a solution that might be interesting to people.

This solution has the following restriction:

It will only work for state machines that are waiting inside a state for an eventdriven activity, not ones that are waiting inside an eventdriven activity. In other words: it is only able to update workflows that have entered a state and started waiting, not ones that have executed a few activities and are now waiting on some other input within a sequence.

Luckily for me, that is no problem at all, and it should not be a problem for you either. State machines should be modeled such that waiting happens on entering a state, never inside a sequence. You can model waits inside a sequence, but I would suggest you keep those delays short (minutes, as opposed to days/months/years).

My goal here is to be able to do a relatively easy update, where I have control over how I update (what to do with state etc.) and get my delays initialized to the correct timeouts again. So, in workflow1 I had a delay of 11 months, with 8 months left. When I start workflow2 and update, I need to have 8 months left again, and not 11.

Getting the delays right is the hard part.
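
The arithmetic behind that goal is small enough to try outside WF. The following is a minimal sketch, not code from the sample: `DelayMath` and `Remaining` are hypothetical names, and clamping at zero is an assumption about how an already expired timer should be treated.

```csharp
using System;

public static class DelayMath
{
    // Hypothetical helper, not part of WF: time left until an absolute UTC expiry.
    // Clamped at zero (an assumption) so an expired timer never yields a negative duration.
    public static TimeSpan Remaining(DateTime expiresAtUtc, DateTime nowUtc)
    {
        TimeSpan left = expiresAtUtc - nowUtc;
        return left < TimeSpan.Zero ? TimeSpan.Zero : left;
    }

    public static void Main()
    {
        DateTime started = new DateTime(2008, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        DateTime expires = started.AddMonths(11);   // original delay: 11 months
        DateTime now = started.AddMonths(3);        // the update happens 3 months in

        // The new delay must cover only what is left, not the full 11 months.
        Console.WriteLine(Remaining(expires, now) == expires - now);
        // One day past the expiry the remaining time is clamped to zero.
        Console.WriteLine(Remaining(expires, expires.AddDays(1)) == TimeSpan.Zero);
    }
}
```

Both values are kept in UTC on purpose: mixing a UTC expiry with a local "now" would shift the result by the time zone offset.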

I use some nice reflection to get to the actual type of a workflow instance. I described how to do that here. However, I was being silly. It's much easier:

            Workflow1 oldWF = workflowRuntime.GetRootActivity(instance) as Workflow1;

Made possible by these extensions:

    // requires System.Reflection, System.Workflow.Runtime and System.Workflow.ComponentModel
    public static class WFExtensions
    {
        public static object GetExecutor(this WorkflowRuntime workflowRuntime, WorkflowInstance instance)
        {
            // call the runtime's non-public Load method to get the executor for this instance
            return workflowRuntime.GetType().InvokeMember(
                "Load", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.InvokeMethod, null, workflowRuntime,
                new object[] { instance.InstanceId, null, instance });
        }
        public static object GetRootActivity(this WorkflowRuntime workflowRuntime, WorkflowInstance instance)
        {
            object executor = workflowRuntime.GetExecutor(instance);
            // read the executor's non-public rootActivity field
            return executor.GetType().GetField("rootActivity",
                    BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField).GetValue(executor) as CompositeActivity;
        }
    }
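
Because these extensions depend on non-public member names, a rename in another framework version would surface as a bare NullReferenceException. A minimal sketch of a more defensive lookup, assuming nothing about WF itself: `FakeExecutor` is a stand-in class and `ReadPrivateField` is a hypothetical helper, neither is part of the sample.

```csharp
using System;
using System.Reflection;

// Stand-in for an internal framework type; NOT the real WF executor.
public class FakeExecutor
{
    private object rootActivity = "Workflow1";
}

public static class PrivateFieldReader
{
    // Reads a non-public instance field by name and fails with a descriptive
    // exception when the field does not exist on the target's type.
    public static object ReadPrivateField(object target, string fieldName)
    {
        if (target == null) throw new ArgumentNullException("target");

        FieldInfo field = target.GetType().GetField(
            fieldName, BindingFlags.NonPublic | BindingFlags.Instance);
        if (field == null)
        {
            throw new MissingFieldException(target.GetType().FullName, fieldName);
        }
        return field.GetValue(target);
    }

    public static void Main()
    {
        object value = ReadPrivateField(new FakeExecutor(), "rootActivity");
        Console.WriteLine(value); // prints "Workflow1"
    }
}
```

The same check could be placed in front of the `GetValue` call in `GetRootActivity`, so that a missing field is reported by name.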

So, here goes:

  1. Get to your old workflow instance. In my sample I use types Workflow1 and Workflow2.
                WorkflowInstance instance = runtime.GetWorkflow(g);
                WorkflowRuntime workflowRuntime = runtime;
                Workflow1 oldWF = workflowRuntime.GetRootActivity(instance) as Workflow1;
                if (oldWF == null)
                {
                    return;   // assumption: bail out when the instance is not a Workflow1
                }
                object executor = workflowRuntime.GetExecutor(instance);
                instance.Suspend("asdf");   // do not unload here, otherwise the database record would be unlocked

    I suspend the workflow so it does not get in the way, but I cannot unload it (that would unlock the database record) or, worse, terminate it. That would kill the record in the database.

  2. Create a new workflow, of your desired type, and copy the workflowInstanceID to it:
                // get a handle to the instanceid property
                DependencyProperty instanceidDP = (DependencyProperty)executor.GetType().GetField("WorkflowInstanceIdProperty",
                    BindingFlags.NonPublic | BindingFlags.Static | BindingFlags.Instance).GetValue(executor);
                // create new wf2, not starting it yet
                WorkflowInstance newWFInstance = workflowRuntime.CreateWorkflow(typeof(Workflow2));
                Workflow2 newWF = workflowRuntime.GetRootActivity(newWFInstance) as Workflow2;
                // copy the guid
                newWF.SetValue(instanceidDP, instance.InstanceId);
  3. Build up a list of activities that are on timers and remember their name and when they expire:
                Dictionary<string, DateTime> activitiesExpireList = new Dictionary<string, DateTime>();
                TimerEventSubscriptionCollection subscriptions = ((TimerEventSubscriptionCollection)
                foreach (TimerEventSubscription subscription in subscriptions)
                {
                    // find out what activity was subscribed
                    var x = from queueInfo in instance.GetWorkflowQueueData()
                            where subscription.QueueName.GetType().Equals(queueInfo.QueueName.GetType())
                            where subscription.QueueName.CompareTo(queueInfo.QueueName) == 0
                            select new { ExpiresAt = subscription.ExpiresAt, Activities = queueInfo.SubscribedActivityNames };
                    foreach (var combination in x)
                    {
                        foreach (string activityname in combination.Activities)
                        {
                            activitiesExpireList.Add(activityname, combination.ExpiresAt);
                        }
                    }
                }

    The weird part is that the queue names are mostly GUIDs (for delays at least), so the query compares the types of the queue names before comparing the names themselves.

  4. Call a method on your new type. See how cool it is we can actually communicate this way with it, instead of having to go through communication services!!
                // allow new workflow to read information from old workflow to init itself.
                newWF.Update(oldWF, instance, activitiesExpireList);
  5. Copy the new workflow to the rootactivity of our executor. Ouch.. yeah.. don't worry.
                // copy the new rootactivity to the executor
                executor.GetType().GetField("rootActivity",
                    BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.GetField).SetValue(executor, newWF);
  6. Last bits:
                // start it up
                newWFInstance.Start();
                newWFInstance.Unload(); // overwrites current record in persistence store
                instance.Abort();   // kills off our original
                newWFInstance = runtime.GetWorkflow(g);
                StateMachineWorkflowInstance statemachine = new StateMachineWorkflowInstance(runtime, g);
                // still need to unload or unload the runtime to get all timers correctly!
                Console.WriteLine("updated" + newWFInstance.InstanceId);

    You can see me starting and unloading, then killing our old instance. Finally I am trying to be smart by using the StateMachineWorkflowInstance to do a transition to a new state on the new workflow. The new state can be determined by the new workflow (which has knowledge of these things) but is usually the same as in your old workflow. (This was built so that you could rename a state.)

  7. That's it. In the Workflow2 class, I have an update method, which will set a boolean to true. The initialization activity will look for it in an if/else and not do anything if it is set to true. All the delays in the new workflow have an initTimeout method like so:
            private void initTimeout(object sender, EventArgs e)
            {
                DelayActivity delay = (DelayActivity)sender;
                if (activitiesExpireList.ContainsKey(delay.Name))
                {
                    delay.TimeoutDuration = activitiesExpireList[delay.Name].Subtract(DateTime.Now.ToUniversalTime());
                }
            }
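
The matching done in step 3 can be tried without a workflow runtime. The sketch below is an illustration under stated assumptions, not the sample's code: `TimerSub` and `QueueInfo` are stand-ins that model only the members used above, and `ExpiryMap.Build` is a hypothetical name. It deviates from the listing in one place: it uses the dictionary indexer instead of `Add`, so an activity name that shows up twice overwrites the earlier entry rather than throwing.

```csharp
using System;
using System.Collections.Generic;

// Stand-ins for TimerEventSubscription and WorkflowQueueInfo;
// only the members used in step 3 are modelled.
public class TimerSub
{
    public IComparable QueueName;
    public DateTime ExpiresAt;
}

public class QueueInfo
{
    public IComparable QueueName;
    public List<string> SubscribedActivityNames = new List<string>();
}

public static class ExpiryMap
{
    public static Dictionary<string, DateTime> Build(
        IEnumerable<TimerSub> subscriptions, IEnumerable<QueueInfo> queues)
    {
        var result = new Dictionary<string, DateTime>();
        foreach (TimerSub sub in subscriptions)
        {
            foreach (QueueInfo queue in queues)
            {
                // Compare types first: Guid.CompareTo throws when handed a string.
                if (sub.QueueName.GetType() != queue.QueueName.GetType()) continue;
                if (sub.QueueName.CompareTo(queue.QueueName) != 0) continue;

                foreach (string name in queue.SubscribedActivityNames)
                {
                    // Indexer instead of Add: a repeated name overwrites rather than throws.
                    result[name] = sub.ExpiresAt;
                }
            }
        }
        return result;
    }

    public static void Main()
    {
        Guid timerQueue = Guid.NewGuid();
        DateTime expires = new DateTime(2008, 11, 4, 0, 0, 0, DateTimeKind.Utc);

        var subs = new[] { new TimerSub { QueueName = timerQueue, ExpiresAt = expires } };
        var queues = new[]
        {
            new QueueInfo { QueueName = timerQueue, SubscribedActivityNames = { "delayActivity1" } },
            new QueueInfo { QueueName = "SomeEventQueue", SubscribedActivityNames = { "handleEvent1" } }
        };

        Dictionary<string, DateTime> map = Build(subs, queues);
        Console.WriteLine(map.Count);                          // prints 1
        Console.WriteLine(map.ContainsKey("delayActivity1"));  // prints True
    }
}
```

Only the activity subscribed to the timer queue ends up in the map; the string-named event queue is skipped by the type check before `CompareTo` is ever called.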

I have uploaded the complete sample here.

When you run it, you can press 'c' to create a new workflow of type Workflow1. Then you can press 'u' and paste in the guid of the workflow just created. It will update the workflow. Pressing 'b' will break and unload the workflow.
Your created workflow has a state in which the delay is 40 seconds. Workflow2 has the same state, but with a delay of only 10 seconds.

As a test you can see that after updating, you will have a workflow2 running (there is another activity present that will print out debug information). The delay was set correctly.

Obviously, you might want to deal with the delays your own way. Because you have all the information in your workflow codebehind, you can think of your own rules on how the delay timeouts should be set.

Realize that touching the internals of WF like this is not what Microsoft envisioned and should be done with care.

Have fun, and let me know what you think.

Thursday, 08 May 2008 10:13:58 (Romance Standard Time, UTC+01:00)
I've not tried it yet, but assuming it works, thank you!
Friday, 09 May 2008 16:35:27 (Romance Standard Time, UTC+01:00)
I've had several mails from people actually using the technique in live systems, so it will work.
However, the tracking system might need to be taken care of.
Tuesday, 24 June 2008 14:43:15 (Romance Standard Time, UTC+01:00)
I tried implementing the above solution, but I am using the .NET 3.0 framework and the solution works in .NET 3.5.
Can we do the same thing in .NET 3.0? I tried implementing it in .NET 3.0, but I am not able to set the state of the new workflow to the old state. However, I am able to replace the old workflow with the new one.
I guess it happens because of the framework version. Can you please tell me how exactly I can implement it in .NET 3.0?

Atul Bachhav
Thursday, 24 July 2008 11:55:54 (Romance Standard Time, UTC+01:00)
Hi Atul,

Sorry, I don't have a 3.0-only machine here. It shouldn't really be a problem though! I'm quite sure you'll get it to work.

Good luck,
Wednesday, 17 December 2008 23:30:31 (Romance Standard Time, UTC+01:00)
Very interesting solution Ruurd. We came across the EXACT problem when putting together our own set of long running state machine instances. We will be going through a NUMBER of changes to our workflows, so we needed a way to apply those changes but didn't want to go through the hurdles that you describe above (maintaining multiple assembly versions or going through the WF Updating objects - both are BAAAAD).

I was never able to get an instance of the workflow once the version number had changed or the XOML behind it had changed. I assume that your reflection tricks and the code in step one rectify that hurdle. If so, I'll have to give that a try.

I eventually came to an unfortunate solution that leaves the old workflow orphaned only to use the new one (until it changes again too). However, the transition to the new workflow is very easy. The only problem was history. I have a customer who needs very good historical information regarding state change history. I had to come up with a solution that keeps track of my instance Ids in the tracking database and gives the user an ongoing history for the state changes regardless of which workflow instance is running at the given time.

How does your solution work with keeping track of state changes in the tracking database?

I'll give your solution a try soon.

Thanks for the post!
Friday, 19 December 2008 18:34:31 (Romance Standard Time, UTC+01:00)
Hi Paul,

the situation is very unfortunate and I have come to the conclusion that you need to design workflows with this upgrade plan in mind.
All in all I have not done anything to the tracking service, so you will be in bad shape there.

It might be possible to just rename the guid of the new workflow to the new one in the database though.

Tricks like that should not be possible, and I'm hoping WF 4 will mitigate the situation.

Thursday, 14 October 2010 15:01:18 (Romance Standard Time, UTC+01:00)
