As you might have noticed, I have somewhat fallen in love with Office Web Apps 2013, or WAC as we say now that we’ve gotten this close to each other. It’s an amazingly well written server product, with the nice side benefit that it is also very usable for the end-users. Even though WAC and I have been hanging around for a while and by now know each other pretty well, WAC has constantly been reporting that it is Unhealthy. And from what I’ve seen, heard and experienced in the field, I am not alone…
Office Web Apps 2013 Health Reporting
Office Web Apps 2013 has a really well written reporting mechanism that constantly monitors and reports issues with your WAC farm and its machines. You can at any time see the Health status of your machines by running the following Windows PowerShell command:
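A minimal sketch of such a check, assuming it is run in an elevated PowerShell session on one of the WAC machines, where the Office Web Apps cmdlets are available. It relies on the Machines property of the farm object returned by Get-OfficeWebAppsFarm:

```powershell
# List every machine in the WAC farm together with its reported health status.
# Run this on a WAC machine; the cmdlet is not available on SharePoint servers.
(Get-OfficeWebAppsFarm).Machines | Format-List
```

Each entry should show the machine name, its roles and a health status of either Healthy or Unhealthy.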
If you have installed and configured the farm according to the interwebs and/or official sources it is most likely that all your servers are reporting Unhealthy even though your WAC farm is running fine, from the end-user perspective.
The health reporting mechanism in Office Web Apps consists of a number of watchdog processes (FarmStateManagerWatchdog.exe, ImagingWatchdog.exe etc). These processes regularly check for issues with the different WAC services. For instance, they make sure all the servers and service endpoints are responding, and they check that the proofing tools work by actually testing the service with a correctly spelled word and an incorrectly spelled word, and so on. If any of these watchdog processes reports an error, the machine is marked as Unhealthy. (Note: it checks all the reports from the last 10 minutes, so there is a slight delay in this status.)
You can see the Health reports in the log files (SharePoint ULS Trace Log style) in your log directory on the WAC machines, but you might find it easier (and faster) to look in the Event Viewer of the machine. You will find the logs under Applications and Services Logs > Microsoft Office Web Apps.
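If you prefer PowerShell over the Event Viewer, something like the following sketch should work. The log name "Microsoft Office Web Apps" is an assumption based on the Event Viewer path above, so the first line lists the matching logs so that you can confirm the exact name on your machine:

```powershell
# Find the exact name of the Office Web Apps event log on this machine
Get-WinEvent -ListLog *Office* | Select-Object LogName, RecordCount

# Assumed log name; adjust it to whatever the command above reports.
# Level 2 means Error. Show the 20 most recent error entries.
Get-WinEvent -FilterHashtable @{ LogName = 'Microsoft Office Web Apps'; Level = 2 } -MaxEvents 20 |
    Format-List TimeCreated, Id, Message
```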
In this log you will clearly see all the errors and be able to solve them, well at least most of them can be fixed by reading (and understanding) the log entry.
Two common reasons for Unhealthy machines
As I said, most WAC farms I’ve seen are reporting all machines as Unhealthy and I’ve found two major reasons for this. One is related to certificates and the second is related to Windows Server 2012/IIS8 and WCF.
Correct WAC Farm certificate
If you have been following the SharePoint 2013 space recently, you should be well aware that you should protect your server communication with SSL. This is especially true for Office Web Apps 2013, which sends the AuthN token in clear text over the wire. So you create a certificate for your WAC farm load balanced DNS address, wac.contoso.com for instance, install it on the WAC machines, set up the farm and everything looks fine. But all the machines are reporting that they are Unhealthy. If you take a closer look at the WAC logs you will see exceptions like this: "The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel."
It’s not that strange actually. In order to test the endpoints of the different machines, the watchdog processes cannot call the load balanced DNS entry (wac.contoso.com) but instead have to call the individual machines, for instance wacserver1.contoso.com, and the certificate is not valid for that DNS entry.
In order to resolve this issue for the farm you have to create a new SAN certificate (or optionally a wildcard) containing the names of all the machines and the load balanced DNS entry.
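To see which DNS names an installed certificate actually covers, a sketch like the one below can help. It assumes the certificate sits in the local machine's Personal store and has the friendly name wac.contoso.com (both are illustrative assumptions), and it uses the DnsNameList property that the PowerShell 3.0+ certificate provider adds:

```powershell
# Assumed friendly name of the WAC farm certificate; change to match yours
$friendlyName = 'wac.contoso.com'

# Print every DNS name (subject and SAN entries) the certificate is valid for
Get-ChildItem Cert:\LocalMachine\My |
    Where-Object { $_.FriendlyName -eq $friendlyName } |
    ForEach-Object {
        "Certificate: $($_.Subject) (expires $($_.NotAfter))"
        $_.DnsNameList | ForEach-Object { "  DNS name: $_" }
    }
```

The list should contain the load balanced name as well as every individual machine name, or a wildcard that matches them.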
If you’re creating a new farm just proceed as normal with a certificate like this. If you have an already existing farm and need to update the certificate you just install the certificate on all the machines (before doing anything else) and then use the following PowerShell command to configure the farm to use the new certificate, on one of the WAC machines. Note that you need to restart all the servers in the farm once you have changed the certificate.
Set-OfficeWebAppsFarm -CertificateName "wac.contoso.com"
Once the machines are restarted you can go back to the logs and see if they still report any issues. If everything is fine, your machines should soon (it takes a couple of minutes) start reporting that they are Healthy.
After this you also need to update the WOPI Proof Key of your SharePoint farm(s), to avoid security errors when using Office Web Apps in SharePoint. This can be done either by removing all WOPI bindings and re-adding them or by running the following command:
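A minimal sketch, assuming the SharePoint cmdlet Update-SPWOPIProofKey and the example farm address wac.contoso.com used earlier. Note that this one runs on a SharePoint server, in the SharePoint 2013 Management Shell, not on the WAC machines:

```powershell
# Run on a SharePoint 2013 server, not on a WAC machine.
# Refreshes the public proof key SharePoint has stored for the WAC farm.
Update-SPWOPIProofKey -ServerName "wac.contoso.com"
```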
WCF .svc endpoints not working on IIS8
The second thing that I’ve seen is that (once you fix the certificate per above) the logs contain error messages that say it cannot access the Participant.svc file: "BroadcastServicesWatchdog_Wfe reported status for BroadcastServices_Host in category '3'. Reported status: Contacting Participant.svc failed with an exception: The remote server returned an error: (404) Not Found." There are other similar error messages. This is a bit strange: if you look in IIS for that specific file it is clearly there, but if you try to browse to it you get a 404. Here’s a good way to test if you’re suffering from this issue without reading any logs:
Tip: this command and/or URL is very useful in your monitoring software or load balancer to check if the WAC farm/machine is up and running.
If you receive an HTTP status equal to 200 then you’re golden and can quit reading this section, but if you get an exception, and specifically a 404, you’re suffering from this exact issue.
When you install/configure IIS8 on Windows Server 2012 using instructions from the internet or TechNet and try to do a “minimal” installation with as few Server features installed as possible, you are most likely not installing the HTTP Activation for .NET Framework 4.5 WCF Services.
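A sketch of such a test, run against one individual WAC machine. The host name wacserver1.contoso.com is the example name used earlier, and the BroadcastPing path under Participant.svc is my assumption of the anonymous ping endpoint, so verify it against your own farm before relying on it:

```powershell
# Assumed ping endpoint on a single WAC machine (not the load balanced name)
$url = "https://wacserver1.contoso.com/m/met/participant.svc/jsonAnonymous/BroadcastPing"

try {
    # Invoke-WebRequest throws on HTTP error codes such as 404
    $response = Invoke-WebRequest -Uri $url -UseBasicParsing
    "HTTP status: $($response.StatusCode)"
}
catch {
    "Request failed: $($_.Exception.Message)"
}
```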
In order for IIS to understand .svc files, and not use the Static file handler, you need to install this Feature on your WAC machines. It can be done using the Add Roles and Features Wizard or using the following PowerShell command. Note that it will at the same time install two other dependent features.
If you’re installing a WAC machine from scratch your PowerShell command should look as follows (on Windows Server 2012):
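A minimal sketch for an existing WAC machine on Windows Server 2012, using the feature name NET-WCF-HTTP-Activation45 that also appears in the full feature list in this post:

```powershell
Import-Module ServerManager

# Install HTTP Activation for .NET Framework 4.5 WCF Services
# (dependent features are pulled in automatically)
Add-WindowsFeature NET-WCF-HTTP-Activation45

# Confirm that the feature is now installed
Get-WindowsFeature NET-WCF-HTTP-Activation45 | Format-List Name, Installed
```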
Import-Module ServerManager
# Required Features
Add-WindowsFeature NET-Framework-45-Core,NET-Framework-45-ASPNET,`
    Web-Mgmt-Console,Web-Common-Http,Web-Default-Doc,Web-Static-Content,`
    Web-Filtering,Web-Windows-Auth,Web-Net-Ext45,Web-Asp-Net45,Web-ISAPI-Ext,`
    Web-ISAPI-Filter,Web-Includes,InkAndHandwritingServices,NET-WCF-HTTP-Activation45
# Recommended Features
Add-WindowsFeature Web-Stat-Compression,Web-Dyn-Compression
Once the feature is installed you should see that your machines are reported as Healthy; there is no need for any server restart. Remember it can take up to 10 minutes. You can sometimes speed up the health reporting a bit by restarting the WAC Service:
If they are not reported as Healthy then you have another issue. Check the logs again, fix it, report to me what you did and why and I’ll update this post.
Another note: you might not see the same status on all WAC machines when you inspect the health status for each machine. WAC machines are not constantly talking to each other, so it may take a while before they synchronize. You can always log in to each and every WAC machine and run the following cmdlet to get the actual value of that specific machine:
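A minimal sketch, assuming the Office Web Apps Windows service is registered under the name WACSM (check with Get-Service if yours differs):

```powershell
# Show the Office Web Apps service first, to confirm the assumed name
Get-Service WACSM

# Restart it; the watchdogs start reporting again once the service is back
Restart-Service WACSM
```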
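A minimal sketch, assuming Get-OfficeWebAppsMachine, which reports on the local machine only and therefore has to be run on each WAC machine in turn:

```powershell
# Reports the roles and health status of the machine it is run on
Get-OfficeWebAppsMachine
```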
Repairing an Office Web Apps Farm
There is a PowerShell cmdlet in Office Web Apps 2013 that is called Repair-OfficeWebAppsFarm. This command sounds way better than it actually is. It will not repair anything; it will remove all Unhealthy servers, and nothing more. So unless you have fixed the issues mentioned above, you will have no WAC machines after running that cmdlet. Just a tip…
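Before even thinking about running it, it can be worth counting how many machines would be thrown out. The sketch below assumes the machine objects expose a HealthStatus property (as the health listing suggests) and deliberately does not call Repair-OfficeWebAppsFarm itself:

```powershell
# Collect all machines in the farm and pick out those not reporting Healthy
$machines  = @((Get-OfficeWebAppsFarm).Machines)
$unhealthy = @($machines | Where-Object { "$($_.HealthStatus)" -ne 'Healthy' })

"{0} of {1} machines are not Healthy" -f $unhealthy.Count, $machines.Count

if ($machines.Count -gt 0 -and $unhealthy.Count -eq $machines.Count) {
    Write-Warning "Every machine is Unhealthy; a repair would leave the farm empty."
}
```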
Having a healthy Office Web Apps Server 2013 farm is important. You should constantly monitor the Health status of each and every Office Web Apps Server machine and, if one is found Unhealthy, fix it. Most likely you don’t have a valid certificate for your Office Web Apps Server 2013, one that contains the load balanced name and all the individual server names, and thus you will always have an Unhealthy farm. Get a new certificate for it, so that you can properly monitor the farm.