How to patch the Distributed Cache in SharePoint 2013

Tags: SharePoint 2013, AppFabric

Introduction

In SharePoint 2013 the Distributed Cache plays a very important role; it is a key component for performance and caching. An incorrectly configured or managed Distributed Cache will cause issues with your farm. I’ve even seen blogs recommending turning it off, most likely because the authors didn’t manage the cache properly and ended up in a situation where it caused even worse performance problems. One of the good things about the Distributed Cache is that it is not a SharePoint service; it is a standalone service called AppFabric 1.1 for Windows Server. Initially the guidance from Microsoft was that the Distributed Cache (DC) would be patched together with SharePoint and that we should keep our hands off it. That has changed over time, and we can now take advantage of the fixes and improvements that the AppFabric team makes to the service. So it is our responsibility to patch the Distributed Cache. But how do we do it?

[Update 2014-07-16] Here's a link to the official statement that AppFabric/DC is independently updated from SharePoint: KB2843251

Proper patch instruction

First of all, there are currently five released Cumulative Updates (CU) for AppFabric 1.1: CU1 to CU5. An AppFabric Cumulative Update works the same way as a CU for SharePoint: it contains all previous CUs, so if you install CU5 you also get CU1 to CU4. The important consequence is that you, as an administrator, should read not just the latest CU Knowledge Base article but also the previous ones (you will see one reason in a minute).

Let’s assume that we have a SharePoint farm with more than one server running the Distributed Cache service instance. To patch these servers we need to download the AppFabric CU that we would like to use. I’d recommend using the latest (CU5) right now; I’ve not yet seen any issues with it, only positive effects. If you are using Windows Server 2012 you should definitely go with at least CU4.

Here’s a link to the different CUs and their respective KB articles.

In this case let’s apply CU5.

The first and really important thing to do before patching is to properly shut down the Distributed Cache service instance. The reason is that some items in the Distributed Cache live only in the cache and are not backed by persistent storage, such as some items in the SharePoint Newsfeed. If you do not properly shut down the service instance you will lose that data. Fortunately we have a great PowerShell cmdlet that does this job for us. You need to patch one machine at a time, according to these steps:

  1. Shut down the service instance on one machine
  2. Patch AppFabric 1.1
  3. Post-patch operations
  4. Start the service instance
  5. Restart from 1 on the next machine

Do not patch servers in parallel unless you have a large number of servers and can handle the extra load and the loss of redundancy!

Step (1) is done in PowerShell using one of the built-in SharePoint cmdlets:

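# Load the SharePoint cmdlets (asnp is the alias for Add-PSSnapin)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
# Gracefully stop the Distributed Cache service instance on this server
Stop-SPDistributedCacheServiceInstance -Graceful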

This command will gracefully shut down the service instance on the local machine. A graceful shutdown means that all the cache items will be distributed to the other service instances in the cache cluster. Make sure that you have enough overhead to do this. This is yet another example of why my “3 is the new 2” rule is important: when you patch one server, you don’t want just one extra machine! Once the service instance is stopped, I normally wait a couple of extra minutes to make sure that all the cached items have properly propagated to the other servers.
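While waiting, it can help to look at the cache cluster from AppFabric's point of view. The following is only a minimal sketch, assuming it is run in an elevated SharePoint Management Shell on a server that is a cache host; it is not part of the procedure above. It uses the AppFabric cmdlets Use-CacheCluster and Get-CacheHost to list the hosts, so you can see that the remaining hosts are still up before you start patching.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell session.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Connect this session to the farm's cache cluster
# (uses the connection information stored on the local cache host).
Use-CacheCluster

# List every cache host with its service status. After the graceful stop,
# the local host is expected to no longer be up, while the others still are.
Get-CacheHost
```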

Then it is time to apply the actual AppFabric patch, step (2). Run the patch executable and follow the instructions. It’s basically a next, next, finish procedure.

Step (3). By the time the patch is applied you should have read through the KB articles, and if you are applying CU3 or later you will have seen that you need to modify the Distributed Cache configuration file in order to improve performance. The CU3 KB article describes a new feature in the AppFabric service that takes advantage of the non-blocking background garbage collection in .NET 4.5, which is important for machines with large amounts of RAM, and that is exactly the description of a SharePoint server. So modify the DistributedCacheService.exe.config file as below to enable background garbage collection:

<configuration>
  ...
  <appSettings>
    <add key="backgroundGC" value="true"/>
  </appSettings>
</configuration>
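If you prefer to script this change rather than edit the file by hand, something along these lines could work. Treat it as a hedged sketch: the install path is an assumption (the default AppFabric 1.1 location) and must be checked on your servers, the session typically needs to be elevated, and the service instance should still be stopped when the file is changed.

```powershell
# ASSUMPTION: default AppFabric 1.1 install location; adjust if yours differs.
$configPath = Join-Path $env:ProgramFiles 'AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config'

# Keep a copy of the original file before changing anything.
Copy-Item -Path $configPath -Destination "$configPath.bak" -Force

$config = New-Object System.Xml.XmlDocument
$config.Load($configPath)

# Reuse the <appSettings> element if it exists, otherwise create it.
$appSettings = $config.SelectSingleNode('/configuration/appSettings')
if ($null -eq $appSettings) {
    $appSettings = $config.CreateElement('appSettings')
    [void]$config.DocumentElement.AppendChild($appSettings)
}

# Add the backgroundGC key if it is missing, then set its value to true.
$setting = $appSettings.SelectSingleNode("add[@key='backgroundGC']")
if ($null -eq $setting) {
    $setting = $config.CreateElement('add')
    $setting.SetAttribute('key', 'backgroundGC')
    [void]$appSettings.AppendChild($setting)
}
$setting.SetAttribute('value', 'true')

$config.Save($configPath)
```

The script is written so that running it twice does not add a duplicate key, and the .bak copy gives you a way back if the service refuses to start afterwards.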

The final thing we need to do on the machine is to start the service instance again, step (4). The AppFabric Windows Service is disabled when it is shut down, and you should NOT try to start it manually. You must use the following PowerShell (or the corresponding action in Central Administration if you’re a n00b).

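# Find the Distributed Cache service instance on the local server
$instance = Get-SPServiceInstance | Where-Object { $_.TypeName -eq "Distributed Cache" -and $_.Server.Name -eq $env:COMPUTERNAME }
# Provision it; this also starts the AppFabric Windows Service
$instance.Provision()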

This PowerShell snippet gets the Distributed Cache service instance on the machine where it is executed, provisions it, and starts the AppFabric Windows Service.

Once this is done and you are ready to move on to the next machine, step (5), give it a couple of extra minutes so that the newly patched (and empty) cache service instance has time to catch up.
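Before moving on, you may want to confirm that both SharePoint and AppFabric agree that the host is back. This is a small sketch under the same assumptions as before (elevated SharePoint Management Shell on the patched cache host); it only reads status and changes nothing.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell session.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# SharePoint's view: the local Distributed Cache service instance and its status.
Get-SPServiceInstance |
    Where-Object { $_.TypeName -eq "Distributed Cache" -and $_.Server.Name -eq $env:COMPUTERNAME } |
    Select-Object TypeName, Status, Server

# AppFabric's view: the cache hosts and the overall health of the cluster.
Use-CacheCluster
Get-CacheHost
Get-CacheClusterHealth
```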

Not patching properly…

A very common mistake is to apply the patch by just running the patch executable. Two things happen when you do this. First, the service is shut down, but not gracefully, and you will lose data. Second, the service instance will not start properly, or at all. The patch contains a script that waits for the service to come online, but since this is not a normal AppFabric cache (it is controlled by SharePoint), the script will wait forever. If this happens, all you have to do is kill the script window and start the service properly as shown above (warning: this is not a recommendation and there might be side effects).

Summary

I hope this short post cleared up some confusion about how to patch the Distributed Cache service in SharePoint 2013 and gave you an example of how to do this in a production environment without losing any data. Of course, you should always test the procedure in a test/staging environment first. Cache on!

10 Comments

  • Steven Summone said:

    Hi Wictor, thanks for sharing this info. I have installed the CU5 update and I have tried to add the new appSettings, but I get an access denied message when trying to edit the file. I checked and the AppFabric Caching Service is in a stopped state. Any suggestions?

  • Marilag said:

    Hi Wictor, thanks for a very informative article as always. I have a related issue with a high-trust provider-hosted app in prod where the user reported intermittently not being able to see data from SP. Trace logs show random access denied errors when executing CSOM; sometimes it works, sometimes it doesn’t. In ULS I found a lot of timeout issues with the Distributed Logon Token Cache. The customer upgraded to CU5 and, with help from MS support, adjusted the Distributed Cache and STS settings, but the issue persisted. I also noticed that the random access denied only happens to users that have site permission via an AD group, not to users that are directly added to an SP group. Do you think this could all be related, or am I shooting in different directions? Any ideas are much appreciated.

    • Philippe said:

      Hi Wictor, while deploying a new server we directly installed AppFabric with CU5, but the SharePoint setup is requesting CU1 as a prerequisite. As a workaround we reinstalled CU1 and then CU5, but that is not logical, as it is supposed to be cumulative. Would you have another idea?

  • Ivan Yankulov said:

    Hi Wictor,

    Thanks for the great article!

    I had an issue when doing the update as described. The service did not want to start, and I had crash events in the log.
    Isn’t it better at Step (2) to stop the service and remove the host from the cluster like this:

    Stop-SPDistributedCacheServiceInstance -Graceful
    Remove-SPDistributedCacheServiceInstance

    This will stop the service and remove the instance completely from the CA UI.

    Then for Step(4), just do:

    Add-SPDistributedCacheServiceInstance

    This will add the service instance, start it properly and join it to the cache cluster.

    This worked out perfectly for me.

  • Rich said:

    Hi Wictor,

    Thanks for the write up. I too am wondering about using remove-SPDistributedCacheServiceInstance when patching a dedicated cache server. Also, would this be done on a box that was not dedicated but was still running this service?

    Thank you!

  • Tony Di Leonardo said:

    Hi Victor,

    Excellent presentation, we recently had issues on our farm with distributed cache. While working with Microsoft Engineers we put together this information on distributed cache. http://sharepoint-community.net/profiles/blogs/distributed-cache-repairing-it-with-powershell

    Thanks again!

About Wictor...

Wictor Wilén is the Nordic Digital Workplace Lead working at Avanade. Wictor has achieved the Microsoft Certified Architect (MCA) - SharePoint 2010, Microsoft Certified Solutions Master (MCSM) - SharePoint  and Microsoft Certified Master (MCM) - SharePoint 2010 certifications. He has also been awarded Microsoft Most Valuable Professional (MVP) for seven consecutive years.