Monday, September 03, 2007

This is turning into a hassle. I must confess that I feel that Microsoft does not have a good story on this one!

When thinking about versioning within the realm of workflow, there are a few things you have to know:

  • You will need to use strong signing for your processes, the activities, the External Data Exchange services and the items you put on the queue (we use these to correlate commands to queues, bypassing the weirdness of correlation in WF)
  • What is persisted to the datastore is a blob. That blob is created using serialization surrogates and uses the normal binary serialization format. However, because of the surrogates, it is difficult (although not impossible) to touch your workflow instance directly instead of going through the runtime. The surrogates are there for a reason: serializing a workflow instance is not straightforward, because all the activity contexts have to be serialized as well, as do the dependency properties, etc.
  • The blob does not only persist your fields, but persists the complete structure of your running instance, called a template. So all the activities (initialized or not) are in that template.
  • Timers and their delays are persisted in a separate list by the surrogate. So, if your workflow instance is in a delay with 9 days left, this information is written in a timerCollectionList, with a guid pointing to the delay (remember, that delay is instantiated in a particular activityContext). It is not simple to correlate these. They are the main problem when you wish to just update your process.

Microsoft does not offer a smart way to upgrade a workflow instance from version 1.0 to 2.0. When you have version 1.0 instances in your database and make one little change to your process, rehydration will fail with an index-out-of-bounds exception: remember that the persisted blob holds the full template of the instance. So when you change your process and add or remove an activity anywhere, the rehydration process tries to map the persisted template to a type in your assembly and fails because the activity tree is different.

Therefore, you can do two things:

  1. Run both assemblies side by side
  2. Use workflow changes to change your current version to a 2.0 compatible version.

Let's start by discussing option 2. Say you have created version 2.0 of your process and try to rehydrate a 1.0 instance. Since you strong-named your assemblies, the runtime will throw an exception because it cannot find your old assembly. You can place an assembly redirect in your config, telling the CLR to try to use the version 2.0 assemblies to instantiate your 1.0 blob. This will then fail because of the changed structure of your template.
The solution at hand is to use workflow changes to get in there, and change the structure of that 1.0 template to match that of a 2.0 version. You can of course only do this by loading in the old assembly side-by-side, but now you only have to do that once during an update-batch. After that, your normal application is able to use your 2.0 assembly to instantiate your 1.0 (but now structurally modified) instance.

The problem here is that you have to build big workflow change scripts. I have not yet seen anyone automate that (do a diff on the templates and generate the workflow changes). If that were available and rock-solid, this might be a good strategy to take. Until then, it's way too much work. (Let me know if this turns out to be super-simple!)

Option 1 is bad as well. Sure, loading your old assemblies is possible. But what Microsoft forgot is that I want to change my External Data Exchange service as well (if only in version number), along with the objects that I put on my queue. Since your old 1.0 process is expecting a 1.0 service to talk to, or 1.0 version commands, it will not be able to communicate!! This can be mitigated by adding the 1.0 external service to the runtime when loading the 1.0 assemblies, and maybe by only using BCL types on the queues, but it's really a shame to have to do that. Certainly when you have processes that last 5 years, and you have 20 versions to keep up with...

My advice is to really try to understand the way your application will use workflow. For us, I was able to make these assumptions:

  • There are a few states that need to be monitored by a delay activity of, say, 20 days. When our process has not moved out of that state after those 20 days, something needs to happen.
  • Most states do not need that. Therefore, I can actually bring the process to a completed state. That is not what I prefer; however, since the state of a process can be derived from my domain objects, I can always construct the process at will and bring it into the correct state (with the new version!!). By completing the processes whenever possible, I will have far fewer processes to deal with in the datastore.
  • Most importantly: I found out that after processing an external event, the process will always return to a state. It will never start a delay of more than a few minutes within a sequence. So I can guarantee that my workflow, when persisted, is not waiting inside a while-loop or whatever. All long delays are the first child within an eventdriven activity.

The last point reduces our problem big time, because it would be nearly impossible to build an update for a workflow instance that is waiting in the middle of a sequence. Basically, that is what we will be doing in our project: Build an update batch, that will load version 1.0 instances, kill them, create 2.0 version instances and write back to the database with the same guid.

The steps to build an update batch are:

  1. use workflow tracking or something similar to write the version of your process to your datastore. We have an Oracle persistence layer. When we built it, we added a new column 'type' to the database and write the full name of the type in there (which includes the version number).
  2. load your old assembly so you can instantiate the blob
  3. instantiate the blob using reflection to directly get access to your instance
  4. do an export of your fields and other stuff you need. You know your processes intimately, so this should not be a problem
  5. delete the row from the datastore (remember to start a transaction!)
  6. create a new instance of the 2.0 type, using the runtime and the guid of the old instance
  7. call an import event or whatever, that the process will use to bring itself to the correct state
  8. persist
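The steps above can be sketched in a language-neutral way. The following is only a minimal model in Python, not WF code: the store is a plain dictionary, the "blob" is already a dictionary instead of a serialized workflow, and every name in it (`version_of`, `export_state`, `create_v2_instance`, `upgrade_store`, the `MyCompany...` type names) is a hypothetical stand-in. It only shows the shape of the batch: select rows by the version in the 'type' column, export, delete, recreate under the same guid, and commit everything at once.

```python
import re

# Hypothetical full type names, as they might be written to the 'type' column.
V1_TYPE = "MyCompany.OrderProcess, MyCompany.Workflows, Version=1.0.0.0"
V2_TYPE = "MyCompany.OrderProcess, MyCompany.Workflows, Version=2.0.0.0"


def version_of(type_name):
    """Extract the Version= part from a full type name, or None if absent."""
    match = re.search(r"Version=(\d+(?:\.\d+){1,3})", type_name)
    return match.group(1) if match else None


def export_state(old_instance):
    """Step 4: copy out only the data the new version needs."""
    return {"state": old_instance["state"], "fields": dict(old_instance["fields"])}


def create_v2_instance(instance_id, exported):
    """Steps 6 and 7: new instance with the old guid, brought to the old state."""
    return {"id": instance_id, "state": exported["state"], "fields": exported["fields"]}


def upgrade_store(store):
    """Upgrade all 1.0 rows in `store` (instance id -> {"type", "blob"}).

    Changes are staged on a copy and swapped in at the end, which stands in
    for the database transaction. Returns the ids that were upgraded.
    """
    staged = dict(store)
    upgraded = []
    for instance_id, row in store.items():
        if version_of(row["type"]) != "1.0.0.0":
            continue  # already on a newer version
        exported = export_state(row["blob"])  # steps 2-4 (blob is pre-decoded here)
        del staged[instance_id]               # step 5
        staged[instance_id] = {               # steps 6-8
            "type": V2_TYPE,
            "blob": create_v2_instance(instance_id, exported),
        }
        upgraded.append(instance_id)
    store.clear()
    store.update(staged)  # "commit"
    return upgraded
```

In the real batch, steps 2 and 3 (loading the old assembly and getting at the instance through reflection) are the hard part; the sketch deliberately skips them.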

The hard part is the delays. Basically, you can find the list of timers using reflection. However, it is cumbersome to correlate the guids to the correct delay activities. My solution would be the following: during state changes within your process, keep a dictionary of the state name and the moment (DateTime) you transitioned to it. When importing, use this list and the delay.timeoutdurationEvent to set up your timer: normally, the expiry would be DateTime.Now.Add(timeoutLength). This time, you add the timeout to the original DateTime instead, so your delay activity will not have been 'reset'.
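The arithmetic behind that idea is small enough to show. This is a minimal sketch in Python rather than WF code; `state_entered` and `remaining_delay` are hypothetical names, and the dates are made up to match the "9 days left" example from earlier in the post.

```python
from datetime import datetime, timedelta

# Hypothetical bookkeeping: state name -> moment the process entered it.
state_entered = {"WaitingForPayment": datetime(2007, 8, 23, 9, 0)}


def remaining_delay(entered_at, timeout, now):
    """Time left on a delay that started at `entered_at`, never negative.

    A fresh delay would expire at now + timeout; the migrated one must
    expire at entered_at + timeout so that the clock is not reset.
    """
    deadline = entered_at + timeout
    return max(deadline - now, timedelta(0))


# A 20-day delay that has already been running for 11 days.
left = remaining_delay(
    state_entered["WaitingForPayment"],
    timedelta(days=20),
    now=datetime(2007, 9, 3, 9, 0),
)
```

If the deadline has already passed during the migration, the function returns a zero delay, so the new instance would fire its timeout straight away instead of waiting another full period.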

It's not pretty, and it will be necessary to put constraints on your processes. But it might work just fine for you! Let me know..

Monday, September 03, 2007 6:59:38 PM (Romance Standard Time, UTC+01:00)
 Friday, August 31, 2007

As we are scrambling to get our application ready for its end-users, we are starting to notice some troublesome behavior by PresentationHost.exe: it tends to go up to 99% CPU utilization and refuses to go down again.
Obviously we have noticed PresentationHost's CPU usage going up very fast during our debugging sessions, but what goes up always comes down again. However, now I'm receiving lots of reports from our end-users that claim the 'application is very slow'. This turns out to be that rogue process, which refuses to go down!!

When we were able to attach windbg to it, we did not find a clear cause. It wasn't executing any of our code, to be sure. But it's definitely busy (no, there aren't any running animations!).

We've also seen multiple instances of the process, where we'd only expect one.

This is turning into a real headache. When we figure it out, I'll be sure to post!

Friday, August 31, 2007 8:24:14 PM (Romance Standard Time, UTC+01:00)
 Tuesday, August 28, 2007

The new Entity Framework CTP was released. The official statement is here.
It includes the designer, so I'm very curious how this fares against nHibernate! Congratz to the team!

The Devguy does a summary of the features. Read this hilarious piece:

  • In previous CTPs the underlying provider connection was opened when the ObjectContext was constructed and held open for the life of the context—this would create issues, for instance, with databases where licensing was based on the number of concurrently open connections since the connection might be held open for an extended period of time—even when the connection is not being used.  The new model keeps the connection closed as much as possible while still appropriately dealing with transactions and avoiding promotion of those transactions from local to DTC where possible.

Really?

I mean, so, they now no longer keep the connection open all the time and instead have opted to close it when it's not in use..... because of licensing issues??

ROFL. I would think there are better reasons not to keep a connection open for too long, but that's just me.

;-)

Tuesday, August 28, 2007 1:23:20 PM (Romance Standard Time, UTC+01:00)
 Thursday, August 02, 2007

Just ran across a comment from Daniel Puzey here, with the following excellent advice:

Often, by default, you'll get an error reported at Line 1 of the xaml, which is an obvious lie. You can catch the original exception, though:

- Open the "Exceptions" window (Debug/Exceptions) in Visual Studio.
- Click "add"
- Add "System.Windows.Markup.XamlParseException"
- Check the box to break on throw for this exception.
- Hit F5!

You'll find that the XamlParseException you catch is much more descriptive, and will give the correct position in the xaml file.

Thursday, August 02, 2007 6:44:01 PM (Romance Standard Time, UTC+01:00)
 Thursday, July 26, 2007

When we had just started using WPF, we didn't like the way converters had to be created for every little thing you wanted. We looked into using JScript for this after a forum post by Jonathan. At that point, we decided against it. However, he went ahead and created a fantastic package which hasn't had enough exposure imho. So take a look here.

Basically, it will let you create inline code for simple converters. I wish I could use that now!

Thursday, July 26, 2007 9:38:02 PM (Romance Standard Time, UTC+01:00)
 Wednesday, July 25, 2007

My team is scrambling to get a stable version of our application to our final test team! It's a very exciting time and things are going relatively smoothly. This is the time that small 'easy' things seem to work against you, as did a simple registry write:
Our application is an xbap distributed over the corporate intranet, which gets full trust by means of a certificate. Its users are all local administrators (say whaaattt??). I needed to write a key to the registry and this worked fine in Visual Studio debug mode. However, once hosted, the application crashed due to insufficient access to the registry. I checked the permissions on the key, and it had full control for administrators.

Giving the user group full control on the key did the trick. But why? I would have thought I had broken free of the sandbox by means of the certificate.

Chango V. from the WPF team explains: 'PresentationHost.exe runs with a restricted process token. In particular, it gives up the SIDs for the Administrators and Power Users groups.'
That certainly explains it. Quite unexpected though. I probably haven't looked hard enough, but I couldn't find that in the SDK.

Also, he mentioned: 'Yes, this is a design flaw in our hosting process. You get "full trust" from CLR point of view, but not in terms of NT security. We'll try to address this issue in a future release.'

So there you have it. Keep that in mind when you develop full trust xbaps!

Wednesday, July 25, 2007 8:06:27 PM (Romance Standard Time, UTC+01:00)
 Wednesday, July 04, 2007

I've been very interested in the progress of the Entity Framework. Recently, the June CTP was announced.

It boasts some new features I had been waiting for, like the ability to detach an object from a context. This is necessary when you want to work in a disconnected manner. It would be fantastic if we were able to retrieve a graph (spanning is now supported), send it to a client, have the client change it, and then reattach it to a new object context on the server. This would mean having to do change tracking on the client yourself, which does leave room for more flexibility in your serialization format, compared to using strange change-tracking IEnumerable implementations that are out of your control.
This is the path Microsoft is taking. There is no real persistence ignorance yet, but they might be heading for a comfortable compromise.

Which brings me to nHibernate. The project I'm heading is a pretty big client-server application, which we are migrating towards a more flexible n-tier architecture. We are rebuilding the client side with WPF (for other reasons) and are implementing WF in the back-end to manage our processes. We have built our own naive OR-mapping layer on top of the datasets that were already in place. This has allowed me to delay having to choose a real OR-mapper.
I have had some great experience with nHibernate, having used it since the 0.7 beta. Lately I have not had the opportunity to use it, but it seems the 1.2 release has added some major missing pieces of functionality like SPROC support and generics. I have some time left before I will have to choose between entity framework and nHibernate but I can already see it's going to be tough:

Entity Framework's generated code is ugly, and although the team has listened to the persistence ignorance argument, it seems too little, too late. Their V1.0 implementation might be an attempt to do it correctly, but if they had just listened earlier, their approach would have been much cleaner.
nHibernate is based on a proven concept and is very clean (although the code-base wasn't clean when I stepped through it ;-) ). It has a great community uptake. It, however, still lacks good modeling tools and it only has one dedicated programmer.

In the end, if Entity Framework turns out to be workable, it might be the better choice. It will get a big community (although a large part of that community will consist of programmers that don't even know of the alternatives), it has a big team of smart people working on it (although they needed quite a few tries to get it right) and it will have great visual tools (love it). Most importantly, when I hire new people for the project, they are more likely to know EF than nHibernate. Is that a good reason though????

Wednesday, July 04, 2007 4:32:55 PM (Romance Standard Time, UTC+01:00)
 Wednesday, May 09, 2007

One very hot issue in workflow foundation is the problems you get when you want to handle some external event in your workflow multiple times, having the correct activity invoked based on some arbitrary piece of data. This can be very helpful if you want to use just one event, for instance: 'procesCommand', and issue different commands to your workflow. So, based on the eventargs of the procesCommandEvent, a different activity will be executed. Having such a system, very rapid development is possible and I like the idea of issuing my workflow 'commands'.

With the standard 'handle external event activity' [HEEA], this is entirely possible. However, its usage is seriously hampered by the need to set up correlation beforehand. Let's say you have a state machine, and in some state we will have a few eventdriven activities. The first child activity of these eventdriven activities is always a HEEA. Since we just want to raise one event, each HEEA is configured with the same event on your external data service. To let the system know which HEEA should react when you raise your 'procesCommand' event, you have to set up a correlation token.
You will thus set up one correlation token for each HEEA that should react in your state. This can be done during the state initialization. So the token is defined somewhere other than where the HEEA itself is configured. The whole process is very cumbersome with more than a few HEEAs to configure, especially because when you configure a HEEA to use a correlation token, Visual Studio will present you with a dropdown list of all tokens it knows about, including the ones from other states.

I do not like this mechanism. It's very error-prone, counterintuitive and basically a load of crap.

A great solution would be to build your own version of HEEA, which will just be configured with a string that identifies the command it will react to. Seems easy enough. You will have to implement IEventActivity and possibly IActivityEventListener<QueueEventArgs> and you're done! There is a great example in the SDK that does this for the filesystem. However, while doing this I found out that when the queue name is not unique, only the first IEventActivity will get the Subscribe call. This means that setting up multiple event activities with the same queue name (for instance 'procesCommandQueue') is very hard.

Enter the correlation service: the WF team created an elaborate service that works by registering 'followers' (the HEEA activities that will not get the Subscribe call) and delegating to the first HEEA (the one that did get the Subscribe call) the responsibility of notifying its followers when a message is picked up from the queue.
(In case you haven't read Essential Windows Workflow Foundation by Dharma Shukla and Bob Schmidt: you should, it has an essential introduction to queues, which are the foundation of workflow, which, coincidentally, is the title of their book ;-) ).

That correlationservice is not something to be proud of, and also not something that one would want to build themselves. But without it, only your first procesCommand activity will be able to react to your event, and it might not be the one that should react!

This explains the cowardly piece in the title of this post: by making each queue name unique, all your problems will go away. Therefore, when you set up your queue in the Initialize of your activity, name it "procesCommandQueue" + this.CommandToReactTo. That will give it a unique name.
Then, instead of raising an event, just put your eventargs on the queue of your workflow instance like this: instance.EnqueueItem("procesCommandQueue" + CommandYouWantToIssue, yourEventargs, null, null).

Since all your procesCommandHandlers are subscribed to different queues, the correct one will pick up the eventargs and execute, and all is fine. Thus, the cowardly, but perfectly acceptable way to go about this problem is to tackle it from the outside, instead of the inside.
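To make the difference concrete, here is a toy model of the routing in Python. It is not the WF queuing service: `WorkflowQueues`, `queue_name_for` and the two handlers are hypothetical, and the only behaviour it copies from the description above is that a second subscriber on an already-used queue name never receives anything.

```python
QUEUE_PREFIX = "procesCommandQueue"


class WorkflowQueues:
    """Toy stand-in for a workflow instance's named queues."""

    def __init__(self):
        self._subscribers = {}  # queue name -> handler

    def subscribe(self, queue_name, handler):
        # Only the first subscriber for a given name is registered.
        self._subscribers.setdefault(queue_name, handler)

    def enqueue(self, queue_name, item):
        handler = self._subscribers.get(queue_name)
        if handler is None:
            raise KeyError("no subscriber for queue " + repr(queue_name))
        return handler(item)


def queue_name_for(command):
    """Unique queue name per command: the shared prefix plus the command."""
    return QUEUE_PREFIX + command


def handle_approve(item):
    return "approve:" + item


def handle_reject(item):
    return "reject:" + item


# Shared queue name: the reject handler can never be reached.
shared = WorkflowQueues()
shared.subscribe(QUEUE_PREFIX, handle_approve)
shared.subscribe(QUEUE_PREFIX, handle_reject)
shared_result = shared.enqueue(QUEUE_PREFIX, "order 7")

# One queue per command: the caller picks the handler through the name.
unique = WorkflowQueues()
unique.subscribe(queue_name_for("Approve"), handle_approve)
unique.subscribe(queue_name_for("Reject"), handle_reject)
unique_result = unique.enqueue(queue_name_for("Reject"), "order 7")
```

The price of this approach is that the caller has to know the naming convention, which is why both sides should build the name through one shared helper like `queue_name_for`.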

Wednesday, May 09, 2007 9:08:00 PM (Romance Standard Time, UTC+01:00)