Office Web Apps Server 2013 - machines are always reported as Unhealthy

Tags: Office Web Apps

As you might have noticed I have somewhat fallen in love with Office Web Apps 2013, or WAC as we say now that we’ve gotten this close to each other. It’s an amazingly well written server product with the good side benefit that it is also very usable for the end-users. Even though WAC and I have been hanging around for a while and by now know each other pretty well, WAC has constantly been reporting that it is Unhealthy. And from what I’ve seen, heard and experienced in the field I am not alone…

Office Web Apps 2013 Health Reporting

Office Web Apps 2013 has a really well written reporting mechanism that constantly monitors and reports issues with your WAC farm and its machines. You can at any time see the Health status of your machines by running the following Windows PowerShell command:

(Get-OfficeWebAppsFarm).Machines
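On a larger farm it can help to filter that output down to the machines that need attention. The following is only a minimal sketch: it assumes the machine objects expose MachineName and HealthStatus properties, which you can confirm on your own farm with Get-Member.

```powershell
# Sketch: list machines that are not reporting Healthy.
# Assumes each machine object has MachineName and HealthStatus properties.
(Get-OfficeWebAppsFarm).Machines |
    Where-Object { "$($_.HealthStatus)" -ne 'Healthy' } |
    Select-Object MachineName, HealthStatus
```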

If you have installed and configured the farm according to the interwebs and/or official sources it is most likely that all your servers are reporting Unhealthy even though your WAC farm is running fine, from the end-user perspective.

Unhealthy WAC machines

The health reporting mechanism in Office Web Apps consists of a number of Watchdog processes (FarmStateManagerWatchdog.exe, ImagingWatchdog.exe etc). These processes regularly check for issues with the different WAC services. For instance they make sure all the servers and service endpoints are responding, and they check that the proofing tools work by actually testing the service with a correctly spelled word and an incorrectly spelled word, and so on. If any of these watchdog processes reports an error the machine is marked as Unhealthy. (Note: It will check all the reports from the last 10 minutes, so there is a slight delay in this status).

You can see the Health reports in the log files (SharePoint ULS Trace Log style) in your log directory on the WAC machines, but you might find it easier (and faster) to look in the Event Viewer of the machine. You will find the logs under Applications and Services Logs > Microsoft Office Web Apps.

WAC Event Log entries

In this log you will clearly see all the errors and be able to solve them. Well, at least most of them can be fixed by reading (and understanding) the log entry.
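If you prefer the console over the Event Viewer, the same entries can be pulled with Get-WinEvent. Treat this as a sketch: the log name is an assumption taken from the Event Viewer node name above, so verify it with Get-WinEvent -ListLog if the command reports that the log does not exist.

```powershell
# Sketch: show the 20 most recent errors and warnings from the WAC event log.
# The log name is an assumption based on the Event Viewer node name.
Get-WinEvent -FilterHashtable @{
    LogName = 'Microsoft Office Web Apps'
    Level   = 2, 3   # 2 = Error, 3 = Warning
} -MaxEvents 20 |
    Select-Object TimeCreated, Id, LevelDisplayName, Message |
    Format-List
```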

Two common reasons for Unhealthy machines

As I said, most WAC farms I’ve seen are reporting all machines as Unhealthy and I’ve found two major reasons for this. One is related to certificates and the second is related to Windows Server 2012/IIS8 and WCF.

Correct WAC Farm certificate

If you have been following the SharePoint 2013 space recently you should be well aware that you should protect your server communication with SSL. This is especially true for Office Web Apps 2013, which sends the AuthN token in clear text over the wire. So you create a certificate for your WAC farm load balanced DNS address, wac.contoso.com for instance, install it on the WAC machines, set up the farm and everything looks fine. But all the machines are reporting that they are Unhealthy. If you take a closer look at the WAC logs you will see exceptions like this: The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.

SSL errors

It’s not that strange actually. In order to test the endpoints of the different machines the watchdog processes cannot call the load balanced DNS entry (wac.contoso.com) but instead have to call the individual machines, for instance wacserver1.contoso.com, and the certificate is not valid for that DNS entry.

In order to resolve this issue for the farm you have to create a new SAN certificate (or optionally a wildcard) containing the names of all the machines and the load balanced DNS entry.

New SAN certificate
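Before pointing the farm at a certificate it can be worth confirming which names it actually contains. The sketch below reads the Subject Alternative Name extension (OID 2.5.29.17) from the local machine store; it assumes the certificate's friendly name is wac.contoso.com, the example name used in this post.

```powershell
# Sketch: print the Subject Alternative Names of a certificate in the machine store.
# 'wac.contoso.com' is the example friendly name used in this post.
Get-ChildItem Cert:\LocalMachine\My |
    Where-Object { $_.FriendlyName -eq 'wac.contoso.com' } |
    ForEach-Object {
        $cert = $_
        $san  = $cert.Extensions | Where-Object { $_.Oid.Value -eq '2.5.29.17' }
        "Subject: $($cert.Subject)"
        if ($san) { $san.Format($true) } else { 'No SAN extension found' }
    }
```

Every machine name and the load balanced name should show up in the list; a wildcard entry covers them only if they are all directly under the wildcard's domain.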

If you’re creating a new farm just proceed as normal with a certificate like this. If you have an already existing farm and need to update the certificate you just install the certificate on all the machines (before doing anything else) and then use the following PowerShell command to configure the farm to use the new certificate, on one of the WAC machines. Note that you need to restart all the servers in the farm once you have changed the certificate.

Set-OfficeWebAppsFarm -CertificateName "wac.contoso.com"

Once the machines are restarted you can go back to the logs and see if they still report any issues. If everything is fine your machines should soon (it takes a couple of minutes) start reporting that they are Healthy.

After this you also need to update the WOPI Proof Key of your SharePoint farm(s), to avoid security errors when using Office Web Apps in SharePoint. This can be done either by removing all WOPI bindings and re-adding them or by running the following command:

Update-SPWOPIProofKey

WCF .svc endpoints not working on IIS8

The second thing that I’ve seen is that (once you fix the certificate per above) the logs contain error messages that say it cannot access the Participant.svc file: BroadcastServicesWatchdog_Wfe reported status for BroadcastServices_Host in category '3'. Reported status: Contacting Participant.svc failed with an exception: The remote server returned an error: (404) Not Found. There are other similar error messages. This is a bit strange: if you look in IIS for this specific file it is clearly there, but if you try to browse to it you get a 404. Here’s a good way to test if you’re suffering from this issue without reading any logs:

Invoke-WebRequest https://wacserver1.contoso.com/m/met/participant.svc/jsonAnonymous/BroadcastPing
Tip: this command and/or URL is a very good check to use in your monitoring software or load balancer to see if the WAC farm/machine is up and running.

If you receive an HTTP status equal to 200 then you’re golden and can quit reading this section, but if you get an exception and specifically a 404 you’re suffering from this exact issue.
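Because Invoke-WebRequest throws on a 404 instead of returning it, a small wrapper makes the check easier to script. This is a sketch only: Test-WacPing is a hypothetical helper name, and the URL is the one shown above.

```powershell
# Sketch: return the HTTP status code of the BroadcastPing endpoint.
# Test-WacPing is a hypothetical helper name; the URL is the one from this post.
function Test-WacPing {
    param([Parameter(Mandatory = $true)][string]$MachineName)

    $url = "https://$MachineName/m/met/participant.svc/jsonAnonymous/BroadcastPing"
    try {
        $response = Invoke-WebRequest -Uri $url -UseBasicParsing
        return [int]$response.StatusCode
    }
    catch {
        # HTTP error answers surface as exceptions that carry the response;
        # on connection or trust failures there is no response to inspect.
        $errorResponse = $_.Exception.Response
        if ($errorResponse) { return [int]$errorResponse.StatusCode }
        throw
    }
}

Test-WacPing -MachineName 'wacserver1.contoso.com'
```

A certificate trust failure is rethrown rather than turned into a number, so the two issues described in this post stay distinguishable.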

When you install/configure IIS8 on Windows Server 2012 using instructions from the internet or TechNet and try to do a “minimal” installation with as few Server features installed as possible, you are most likely not installing the HTTP Activation for .NET Framework 4.5 WCF Services.

Forgot the .NET WCF HTTP Activation?

In order for IIS to understand .svc files, and not use the Static file handler, you need to install this Feature on your WAC machines. It can be done using the Add Roles and Features Wizard (as shown above) or using the following PowerShell command. Note that it will at the same time install two other dependent features.

Add-WindowsFeature NET-WCF-HTTP-Activation45
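To see whether the feature is already present, before or after running the command above, you can query it. A minimal sketch using the ServerManager module on Windows Server 2012:

```powershell
# Sketch: check whether WCF HTTP Activation for .NET 4.5 is installed.
Import-Module ServerManager
Get-WindowsFeature NET-WCF-HTTP-Activation45 |
    Select-Object Name, DisplayName, Installed
```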

If you’re installing a WAC machine from scratch your PowerShell command should look as follows (on Windows Server 2012):

Import-Module ServerManager

# Required Features
Add-WindowsFeature NET-Framework-45-Core,NET-Framework-45-ASPNET,`
    Web-Mgmt-Console,Web-Common-Http,Web-Default-Doc,Web-Static-Content,`
    Web-Filtering,Web-Windows-Auth,Web-Net-Ext45,Web-Asp-Net45,Web-ISAPI-Ext,`
    Web-ISAPI-Filter,Web-Includes,InkAndHandwritingServices, NET-WCF-HTTP-Activation45

# Recommended Features
Add-WindowsFeature Web-Stat-Compression,Web-Dyn-Compression

Once the feature is installed you should see that your machines are reported as Healthy; there is no need for any server restart. Remember it can take up to 10 minutes. You can sometimes speed up the health reporting a bit by restarting the WAC Service:

Restart-Service WACSM

If they are not reported as Healthy then you have another issue. Check the logs again, fix it, report to me what you did and why and I’ll update this post.

Healthy WAC machines

Another note: you might not see the same status on all WAC machines when you inspect the health status for each machine. WAC machines are not constantly talking to each other, so it may take a while before they synchronize. You can always log in to each and every WAC machine and run the following cmdlet to get the actual value of that specific machine:

Get-OfficeWebAppsMachine
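Since the status can take several minutes to change, a small polling loop saves you from re-running the cmdlet by hand. This is a sketch under the assumption that Get-OfficeWebAppsMachine returns an object with a HealthStatus property; the 30 second interval and 15 minute limit are arbitrary choices.

```powershell
# Sketch: poll the local machine's health every 30 seconds, for at most 15 minutes.
# Assumes Get-OfficeWebAppsMachine returns an object with a HealthStatus property.
$deadline = (Get-Date).AddMinutes(15)
do {
    $status = "$((Get-OfficeWebAppsMachine).HealthStatus)"
    Write-Host ("{0:T}  {1}" -f (Get-Date), $status)
    if ($status -eq 'Healthy') { break }
    Start-Sleep -Seconds 30
} while ((Get-Date) -lt $deadline)
```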

Repairing an Office Web Apps Farm

There is a PowerShell cmdlet in Office Web Apps 2013 that is called Repair-OfficeWebAppsFarm. This command sounds way better than it actually is. It will not repair anything; it will remove all Unhealthy servers, and nothing more. So unless you have fixed the issues mentioned above, you will have no WAC machines after running that cmdlet. Just a tip…
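Given that behaviour, a quick look at how many machines are currently not Healthy is a sensible precaution before going anywhere near the repair cmdlet. A minimal sketch, again assuming the machine objects expose a HealthStatus property:

```powershell
# Sketch: count the machines that are not Healthy before considering a repair.
# Assumes each machine object has a HealthStatus property.
$machines  = @((Get-OfficeWebAppsFarm).Machines)
$unhealthy = @($machines | Where-Object { "$($_.HealthStatus)" -ne 'Healthy' })
"{0} of {1} machines are not Healthy" -f $unhealthy.Count, $machines.Count
if ($unhealthy.Count -eq $machines.Count) {
    Write-Warning 'Every machine is Unhealthy; a repair would leave the farm empty.'
}
```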

Summary

Having a Healthy Office Web Apps Server 2013 farm is important. You should constantly monitor the Health status of each and every Office Web Apps Server machine, and if found Unhealthy fix it. Most likely you don’t have a valid certificate for your Office Web Apps Server 2013, one that contains the load balanced name and all the individual server names, and thus you will always have an Unhealthy farm. Get a new certificate for it, so that you can properly monitor the farm.

25 Comments

  • James said

    Great post! I ran in to the same WCF issue after two different installs and this post would have saved me a lot of troubleshooting time.

  • Gnett said

    Excellent post. This resolved my issue - I had more or less ignored it since the OWA worked without issue (except for one strange error I received once - I will post at the end). In my case I installed HTTP activation for .NET 4.5 as in your post and restarted WACSM. The machine reported as healthy after a while. I followed the technet article to install the IIS components via PowerShell, so I guess this was missing?
    http://technet.microsoft.com/en-us/library/jj219455.aspx

    Add-WindowsFeature Web-Server,Web-WebServer,Web-Common-Http,Web-Static-Content,Web-App-Dev,Web-Asp-Net,Web-Net-Ext,Web-ISAPI-Ext,Web-ISAPI-Filter,Web-Includes,Web-Security,Web-Windows-Auth,Web-Filtering,Web-Stat-Compression,Web-Dyn-Compression,Web-Mgmt-Console,Ink-Handwriting,IH-Ink-Support

    Oh, the strange error message I received was when attempting to open an Excel workbook - I wish I could post the image :-)
    "Sorry, we're having a problem showing this workbook".
    Details - "We don't know exactly what happened, we just know that something went wrong".

    Going to get a T-Shirt printed with this message.

  • Wictor said

    @Gnett, glad you got this working. Could you send me a copy of the WAC ULS logs and Event Viewer logs to wictor at wictorwilen dot se and I'll see if I can spot any Excel issues.

  • Gnett said

    I can if you still want, but I forgot to mention that the issue happened exactly once. When I immediately tried again, I was able to access the workbook without incident.

  • Pat Richard said

    I've run up against this issue, and looks like it's the cert issue. So, the question is - what about FQDNs that are not publicly routable names, like wacserver.contoso.local?

  • Trevor said

    This is a great article, but when Pat Richard asked about FQDNs like wacserver.contoso.local, you responded to just have the CA add the names. That is no longer possible. CAs will not issue local names. I can't remember the exact year (2014, 2015) that it takes effect, but when you are like me and buy certs for 3, 4, or 5 years or more it is already impossible.

    What this means is Microsoft has yet another server where they break their own logic (i.e. Lync 2013 is designed around the concept of Internal and External FQDNs and Lync uses Internal FQDN names on the External interface). Thus, Lync cannot be used with a public certificate for its front end server and requires a reverse proxy to work (good practice, but bad design).

    Okay, I guess the solution is to use reverse proxy server with public cert and use a local certificate (can use any names I want) for the WAC server.

  • Michael said

    Active Directory domains should never, ever, ever use an "invalid" domain like .local

    It ends up causing problems with a variety of things, from Exchange, Lync, SharePoint, Office 365, Mac OSX computers, etc. etc.

    In my opinion the best practice is for your Active Directory domain to match your public domain. Alternatives would be a subdomain of your public domain, or an alternative public domain that you don't use for your primary website but you still own.

  • Hilton Giesenow said

    Hey Wictor,

    Thanks for this and all your other great posts! I'm running into the cert issue with a client, but the machines are on a separate internal domain to their actual websites, which have a public CA wildcard cert. Any suggestions for how to get around this? Is it possible to have the watchdog query on http instead of https?

    Thanks,

    Hilton

  • David said

    Wictor....Thanks for these informative articles.

    I have run into this issue of the server being reported as 'unhealthy'. My test server is not currently using HTTPS...only HTTP....so I don't believe it to be a certificate problem.

    I configured my server using the TechNet article for install.....so based on your findings, I have also installed the NET-WCF HTTP Activation feature. However, my errors still remain.

    The events in the WAC logs all seem to be complaining of 'Spelling attempt exceptions'....

    Any ideas?

  • David said

    I found the answer to my problem.....apparently you can only install the WAC server to the system drive. I discovered this information after opening a case with Microsoft.

    If you install to any other drive, the installation will run without error, however you will receive many WatchDog errors in the event log and Word/PowerPoint docs will fail to render.

  • Pat Richard said

    Michael - Exchange, Lync, SharePoint, and Office 365 connectivity all work perfectly fine in AD domains with non-routable names like .local. We've deployed them in dozens, if not hundreds of environments.

    However, coming changes in public CAs, where non-routable names will no longer be allowed, does require some planning.

  • rookieSP said

    stupid question... this past month I am a self taught sharepoint rookie. I loaded up my first sharepoint site and launched a public site for testing.

    but I am lost when you mentioned the following below. I have a wild card ssl from my domain provider setup in IIS but lost on where to set up a DNS set for all servers on my network. can I also export/import my webapp server cert to other computers as well? how do I add this dns entry to other servers?

    "In order to resolve this issue for the farm you have to create a new SAN certificate (or optionally a wildcard) containing the name of all the machines and the load balanced DNS entry."

  • TheQuestionMan said

    regarding SAN and SSL, can a wildcard cert provided by web hosting site be enough. or must it be a UCC domain with SAN specified on the SSL?

  • Jay said

    Hi Wictor,
    Great post, I wasn't getting the trust issues but was getting the 404 for the participant.svc - I have installed the windows feature so hopefully that will address that issue.
    But I have another error and I'm not sure if it is related to anything else.
    Events 1064 and 2064: ImagingWatchdog reported status for Imaging in category 'PositiveDirect'. Reported status: Image validation returned an unexpected result on a valid image

    Any ideas?

  • Toby said

    Hi Wictor,

    Great post - Thanks. However, I'm currently not sharing your love for the WAC server!

    Despite setting things up as per TechNet and your post I'm still seeing event log errors 1009 and 1150. The problem appears to be that the WatchDog is trying to connect on port 80 but the farm is configured with AllowHTTP: False so IIS only has a binding on 443. The event log text clearly states "No connection could be made because the target machine actively refused it 1.2.3.4:80". I guess I could enable HTTP to see what happens but wonder if you have any ideas first. For info this is a single server farm deployed for Lync Meeting Powerpoint presentations which is working fine. The certificate currently has the internal FQDN as subject name with the internal FQDN, internal server name and external FQDN as SANs.

    Thanks,

  • Neelam Rajesh Kanna said

    Hi Wictor,

    Great post. I have a 2 server office apps farm load balanced.

    OfficeApps.neelam.com.sg (Load Balancer URL)
    svrofficeapps01.com.sg (Server 01)
    svrofficeapps02.com.sg (Server 02)

    Everything works fine when both servers are up, But when I bring down the master server (svrOfficeapps01.com.sg) I get page cannot be displayed.

    Did I miss anything walter ?

  • Jonathan said

    Good article but I'm still left with the problem of the SSl/TLS trust relationship. Your solution says to include the names of the machines in the SAN of the certificate. But with public certificates, such as from GoDaddy, you can no longer include private hostnames or domain names, only public names.

    And in the Event Log on my WAC server, it specifically references the private FQDN hostname of the WAC server in the Web Apps errors.

    So this would seem to be an impasse to me. You can't include private-only names in public certificates, but you can't use the certificate without the private name in it.

  • Sachin said

    I have single OWA server with certificate installed with server name. I get the exact same error you described as the second reason. But while running Invoke-WebRequest https://wacserver1.contoso.com/m/met/participant.svc/jsonAnonymous/BroadcastPing
    i get 200 OK not 404. I still added the http feature and restarted the server but my server health status is still unhealthy.

About Wictor...

Wictor Wilén is the Nordic Digital Workplace Lead working at Avanade. Wictor has achieved the Microsoft Certified Architect (MCA) - SharePoint 2010, Microsoft Certified Solutions Master (MCSM) - SharePoint  and Microsoft Certified Master (MCM) - SharePoint 2010 certifications. He has also been awarded Microsoft Most Valuable Professional (MVP) for seven consecutive years.
