
Earlier this week I ran into a weird issue with workflows in my SharePoint 2013 (SP2013) Beta 2 environment. I'm sharing the details of the issue and the resolution in this post in case someone else runs into the same thing. If you were in my webinar last week, you saw the issue.

The Issue...

I created a workflow in SharePoint Designer 2013 (SPD2013) and was trying to publish it to SharePoint 2013 (SP2013). Behind the curtain, SPD2013 sends the workflow to SP2013. After saving it, SP2013 publishes it to Windows Azure Workflow (WAW). However, when I published it, SPD2013 would hang while subscribing to the workflow. Eventually it would proceed, but when I then tried to execute the workflow on a list item, I'd get a JavaScript alert stating "Sorry, Something went wrong. To try again, reload the page & then start the workflow." This kept happening, so the next step was to roll up my sleeves and debug deeper...

Digging Deeper...

The next step was to watch the communication between SP2013 & WAW, so I set up Fiddler to debug the exchange (I outlined the steps for setting up Fiddler to debug WAW & SP2013 in this post). I noticed that while SPD2013 was hung on the "subscribing" step, SP2013 was actually submitting the workflow to WAW and waiting for a response. Eventually, however, the request would time out or be aborted.

The following image of a trace from Fiddler shows this: session #16 eventually failed (notice there is no HTTP response code). SP2013 is installed on server W15SP, I'm publishing to a team site at http://intranet.contoso.com and WAW is listening on port 12291. In the right-hand part of the image you can even see the markup of the workflow I created:

(image: Fiddler trace of the publish request, session #16)
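Before reaching for a full Fiddler trace, a quick first check is whether anything is accepting TCP connections on the workflow port at all. The following is only a minimal sketch: the function name `can_connect` is made up for illustration, and pairing host W15SP with port 12291 is an assumption based on my environment above. It only tests reachability, so it won't tell you why a request that does connect later hangs or gets aborted.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    # Host and port taken from the environment described in this post;
    # adjust both for your own farm.
    print(can_connect("W15SP", 12291))
```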

When that happened, SPD2013 acted as if the workflow was successfully published. When I tried to run the workflow, WAW would respond with "The scope [...] has no workflows under it" (shown in the following image from a Fiddler trace when I tried to start an instance of the workflow):

(image: Fiddler trace of the failed attempt to start the workflow)

This made sense as SP2013 had a record of the workflow, but WAW didn't... SP2013 was telling WAW to start a workflow on an item that WAW didn't have a record of. Strange... so I dug deeper into the Event Log and found a ton of the following errors:

(image: errors from the Event Log)

Ah... so now it looks like a problem with ServiceBus! That makes sense, as WAW relies on ServiceBus. After some troubleshooting with some of the engineers, at first it appeared (from the Event Log & ULS logs) that there was a certificate issue with ServiceBus, but in fact all the certs looked good. We tried to connect to ServiceBus using the ServiceBus Explorer tool, but it couldn't connect to the default workflow instance either.

The Fix...

Everything looked right, but for some reason stuff was all haywire. The fix was surprisingly easy: flush the DNS cache (c:> IPCONFIG /FLUSHDNS)... something had the wrong pointers. I'm not sure how that happened, but once the cache was purged, I recycled all the Workflow & Service Bus services and then tried to publish the workflow again... and it worked! I confirmed this by successfully connecting to the workflow namespace with the ServiceBus Explorer.
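If you want to script those two steps, here is a rough sketch in Python. It is not what I actually ran (I did it by hand), and everything beyond `ipconfig /flushdns` is an assumption: the function names are made up, and the service names are placeholders you need to replace with the real Workflow and Service Bus service names from services.msc on your server. It uses the standard `net stop` / `net start` commands and has to run from an elevated prompt on Windows.

```python
import subprocess


def flush_dns_command():
    """Command that purges the Windows DNS resolver cache."""
    return ["ipconfig", "/flushdns"]


def restart_commands(service_names):
    """Build commands that stop each service, then start them in reverse order."""
    commands = [["net", "stop", name] for name in service_names]
    commands += [["net", "start", name] for name in reversed(service_names)]
    return commands


def apply_fix(service_names, run=subprocess.run):
    """Flush the DNS cache first, then recycle the given services."""
    for command in [flush_dns_command()] + restart_commands(service_names):
        # check=True raises CalledProcessError if a command exits with an error.
        run(command, check=True)


if __name__ == "__main__":
    # Placeholder names only: look up the real service names on your server.
    apply_fix(["<workflow service name>", "<service bus service name>"])
```

Passing `run` as a parameter is just a convenience so you can dry-run the sequence (for example with `run=lambda cmd, check: print(cmd)`) before letting it touch real services.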

Hope my pain helps someone else along the way!
