An intense Twitter conversation initiated by Fabian about how Managed Metadata is updated in SharePoint 2010 gave me the idea to note down a few interesting bits about the Taxonomy Fields and how they work within a Site Collection. I hope/guess that Fabian will write a good post (as usual) about his findings as well.
The possibility to tag documents in SharePoint is one of my favorite features and one of the reasons that I think you should move to SharePoint 2010 as soon as possible. As with every new function added to the huge SharePoint spectrum, I have an urge to dive deep into these new additions to really know how to use them fully. I've spent some time with the Managed Metadata Service Application and the taxonomy fields used with it. I can't say it has been a smooth ride all the way - but digging into the actual bits and understanding how it all works made it a whole lot easier. And you know what - why keep everything a secret!
Taxonomy Fields are Lookup columns!
Yes, you heard it right! It is a smart and clever implementation from the SharePoint Team (some say the opposite though). In order to get good performance from the taxonomy and managed metadata fields, all used keywords and terms within a Site Collection are stored in a hidden list (in the top-level root site).
If you've read my previous posts about how to create Taxonomy Site Columns you've probably seen that we use two different fields; one of the type TaxonomyFieldType and one hidden using the type Note. And when defining a field of the type TaxonomyFieldType we need to specify a reference to a list called TaxonomyHiddenList.
This hidden list contains all used keywords and terms for the Site Collection, and SharePoint uses this list for fast retrieval of the labels of the keywords and terms. You can find the list by browsing to /Lists/TaxonomyHiddenList, by using SharePoint Designer to get the list id from the Site Properties (see image to the right), or by using SharePoint Manager 2010.
If we take a look at the columns in that list we will see that it contains a number of interesting things. The first interesting columns are the IdFor* (1-3). IdForTermStore (1) is the Guid of the term store used to store this term and IdForTerm (2) is the Guid for the term. Keywords don't belong to a term set, so the IdForTermSet (3) is an empty Guid, while managed metadata terms have a Guid corresponding to the term set.
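To make the IdFor* columns a bit more concrete, here is a minimal Python sketch of how you could tell keywords and managed metadata terms apart once you have exported the hidden list items. Only the column names come from the list itself; the dictionaries, the Guids and the function name `classify_hidden_list_item` are made up for illustration.

```python
import uuid

# An empty Guid (all zeros) is what IdForTermSet holds for Enterprise Keywords.
EMPTY_GUID = uuid.UUID(int=0)

def classify_hidden_list_item(item):
    """Return 'keyword' or 'managed term' based on the IdForTermSet column.

    `item` is a hypothetical dict holding the IdFor* columns as strings.
    """
    term_set_id = uuid.UUID(item["IdForTermSet"])
    return "keyword" if term_set_id == EMPTY_GUID else "managed term"

# Made-up sample rows; real Guids will differ.
keyword_item = {
    "IdForTermStore": "11111111-2222-3333-4444-555555555555",
    "IdForTerm": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "IdForTermSet": "00000000-0000-0000-0000-000000000000",
}
term_item = dict(keyword_item, IdForTermSet="99999999-8888-7777-6666-555555555555")

print(classify_hidden_list_item(keyword_item))  # keyword
print(classify_hidden_list_item(term_item))     # managed term
```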
Also worth noticing here is that the hidden list also contains the localized labels. I have the French language pack installed, so I can see both the English (4) term and path as well as the French one (5).
SharePoint uses this list as a lookup column so that it does not have to query the Managed Metadata Service all the time, but instead just looks it up in the local Site Collection.
You need proof?
Ok, let's make a really simple scenario. I have one document with Enterprise Keywords and a Managed Metadata column like this:
Let's change the Term1033 column in the hidden taxonomy list to all uppercase letters and save the list item.
When the list is reloaded you will immediately see that the column value has changed to the value we updated in the hidden list:
What happens if I delete one of the items in the hidden list?
As you probably guessed, it was removed from the file. Let's look at the document from another point of view - the edit properties view!
What! As you can see: the term for which I changed the label is back to its normal state, but the deleted one is still missing (permanently). What's really happening here is that when in edit mode the taxonomy fields query the Managed Metadata service directly - they do not use the local hidden list.
So, how do I get it back to where it was? The short answer is you can't. But by default a timer job is executed every hour. It is called Taxonomy Update Scheduler, and its job is to push down the term store changes to the hidden lists (very much like the sync between the site collection user list and the UPA). Unfortunately it only pushes down items that have changed in the term store, so no luck here. Instead you actually need to go and change the term in the Term Store Management tool before running the timer job.
Warning: Under normal conditions you should never ever fiddle with the items in this hidden list. I'm just doing this to show you some stuff some of you have never seen or even thought about.
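If you want to play with the behavior described above without touching a real farm, here is a toy model in Python. It is only a sketch of what I observed, not how SharePoint is implemented: plain dictionaries stand in for the term store, the hidden list and the document, and all names (`view_labels`, `edit_labels`, `run_update_scheduler`) are hypothetical.

```python
# Toy model only: plain dicts stand in for the real SharePoint objects.
term_store = {"t1": "Alpha", "t2": "Beta"}           # term id -> label (Managed Metadata service)
hidden_list = {1: {"term": "t1", "label": "Alpha"},  # lookup id -> locally cached term
               2: {"term": "t2", "label": "Beta"}}
document_tags = [1, 2]                               # lookup ids stored on the document

def view_labels():
    """Display mode: labels come from the hidden list only."""
    return [hidden_list[i]["label"] for i in document_tags if i in hidden_list]

def edit_labels():
    """Edit mode: labels are fetched from the term store instead."""
    return [term_store[hidden_list[i]["term"]] for i in document_tags if i in hidden_list]

def run_update_scheduler(changed_terms):
    """Push labels of terms changed in the term store to existing hidden list items."""
    for item in hidden_list.values():
        if item["term"] in changed_terms:
            item["label"] = term_store[item["term"]]

# Fiddle with the hidden list: uppercase one label, delete the other item.
hidden_list[1]["label"] = "ALPHA"
del hidden_list[2]
after_tamper_view = view_labels()   # ['ALPHA']
after_tamper_edit = edit_labels()   # ['Alpha']

# The timer job runs, but nothing changed in the term store, so nothing is pushed.
run_update_scheduler(changed_terms=set())
after_idle_sync = view_labels()     # ['ALPHA']

# Change the term in the term store, then run the job again.
term_store["t1"] = "Alpha (renamed)"
run_update_scheduler(changed_terms={"t1"})
after_real_change = view_labels()   # ['Alpha (renamed)']
```

Note that the deleted item never comes back in this model, which matches what I saw: the sync only touches items that still exist in the hidden list.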
What about the Note field then?
Let's take a look at what is stored in these two taxonomy fields. The TaxonomyField, which is the lookup, looks quite similar to a lookup column. It has the lookup id and the value:
The Note field (the hidden field), on the other hand, contains just the term identifier and is the actual field used to store the connection to the term store. If you copy, move or upload a document to another Site Collection, SharePoint uses it to update the TaxonomyField with the correct lookup values.
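For those who want to pick such a value apart outside SharePoint, here is a small Python sketch. It assumes the raw TaxonomyField value follows the usual lookup form `id;#value`, with the value part being `Label|TermGuid`; verify that against your own data before relying on it. The sample value, the `TaxonomyValue` type and `parse_taxonomy_value` are made up for illustration.

```python
import uuid
from typing import NamedTuple

class TaxonomyValue(NamedTuple):
    lookup_id: int      # id of the item in the hidden list
    label: str          # label as cached in the hidden list
    term_id: uuid.UUID  # Guid of the term in the term store

def parse_taxonomy_value(raw: str) -> TaxonomyValue:
    """Split an assumed 'id;#Label|TermGuid' string into its parts."""
    lookup_part, _, value_part = raw.partition(";#")
    # Split on the last '|' so a label containing '|' does not break the Guid.
    label, _, guid_part = value_part.rpartition("|")
    return TaxonomyValue(int(lookup_part), label, uuid.UUID(guid_part))

# Made-up sample value; the Guid is not from a real term store.
sample = "3;#Sweden|7c8d3d1e-5b0a-4a7e-9d53-2f1c1b6a9e10"
parsed = parse_taxonomy_value(sample)
print(parsed.lookup_id, parsed.label, parsed.term_id)
```

A malformed string (no `;#`, no `|`, or a broken Guid) raises a `ValueError`, which is probably what you want when checking exported data.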
I've seen the TaxCatchAll field, what's that!?
If you use managed metadata there is a hidden column on all your list items or documents called TaxCatchAll. This field contains the IDs, in the hidden lookup list, of all terms and keywords used by the list item, and is used by SharePoint when adding and updating items.
Who manages the hidden list?
Good question, it isn't you! The hidden list is automagically managed by two internal event receivers (found in the Microsoft.SharePoint.Taxonomy assembly). These event receivers are responsible for adding items to the hidden list and cleaning up old and unused ones. There's also a feature called TaxonomyFieldAdded, stapled on the site definitions, which is responsible for creating the hidden list as well as adding the item receivers.
This was it - a quick introduction to how cleverly the SharePoint feature set is used to make the tagging functionality available in SharePoint. And essentially it is just plain ol' lookup columns with some event receivers that do all the magic for us!