For a couple of weeks (ahem, months) I’ve been struggling with a strange Search Service Application (SSA) issue. Some time back I was checking some Crawled Properties while building a tool to help copy settings between SSAs (more on this tool in another post). That’s when I noticed that there were tons of Crawled Properties with just garbled binary data(!) as the property name.
I searched like crazy for a while to find out where these came from, but there was nothing related to them in logs of any kind. I could not locate any documents related to the Crawled Properties. I could not delete them either: somehow they are connected to some content (at least that’s what the system says), yet there were no document samples. I created a new SSA and crawled the same corpus, and (almost) the same corrupted junk crawled properties appeared.
A couple of days back I finally found out where they came from. With help from Microsoft support I ran some queries on the SSA databases and found another crawled property that had been inserted at the same time as these junk properties. This property had document samples! Those samples were e-mail .msg files, and specifically they were e-mails with encrypted content! I copied these files onto a brand new farm without any content and was able to reproduce the corrupted properties.
How to reproduce the corrupted Crawled Properties
To verify that this had to do with encrypted e-mail messages in general, and not just the ones I found, I encrypted an e-mail, sent it to myself and exported it as an .msg file. I took this .msg file and added it to a document library in a hot new VM (yeah, I only have new ones since I had to rebuild all my VMs last weekend due to a corruption issue on one of my base images). Then I fired off a full crawl, with full logging enabled, and watched the Crawled Properties of the SSA. And as expected they showed up after just a minute.
So, be careful about having encrypted e-mail messages in your farm! Or prevent the issue…
How to prevent the issue
So, how do I get rid of these corrupted properties? Unfortunately there is no good (supported) way, at the time of this writing, except deleting your SSA and creating a new one. Before crawling the data again, either remove the files or create a Crawl Rule.
Whether you already have these corrupted properties or just want to prevent new ones, you can create a Crawl Rule that excludes .msg files. That will help the situation - but you will not be able to search any .msg files (if they are encrypted you cannot search them anyway!).
These corrupted properties do not do any harm. You cannot use them, and you don’t notice anything else on the SSA except that they are there and annoy you. The only thing I can think of is that if you have lots of them, you can run into trouble. There is a limit of 500,000 crawled properties per SSA! That sounds like a lot, but for two .msg files I saw about 1,600 corrupted properties!
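To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes every encrypted .msg file adds the same number of new junk properties as the two I tested (about 800 each) and that none of them overlap between files. I have not verified that assumption, so treat the result as a worst-case estimate. The function name and the `existing_props` parameter are just made up for this illustration.

```python
# Figures taken from the post; the linear-growth model is an assumption.
CRAWLED_PROPERTY_LIMIT = 500_000   # crawled properties allowed per SSA
OBSERVED_JUNK_PROPERTIES = 1_600   # approximate junk properties observed
OBSERVED_MSG_FILES = 2             # number of encrypted .msg files crawled


def files_until_limit(limit, junk_props, files, existing_props=0):
    """Estimate how many encrypted .msg files fit before the limit is hit.

    Assumes each file contributes junk_props / files new properties and
    that properties are never shared between files (worst case).
    """
    remaining = limit - existing_props
    # Integer arithmetic: remaining / (junk_props / files), rounded down.
    return (remaining * files) // junk_props


if __name__ == "__main__":
    estimate = files_until_limit(
        CRAWLED_PROPERTY_LIMIT, OBSERVED_JUNK_PROPERTIES, OBSERVED_MSG_FILES
    )
    print(f"Roughly {estimate} encrypted .msg files would exhaust the limit")
```

Under that assumption, a few hundred encrypted e-mails in a crawled library could be enough to eat the whole crawled property budget of an SSA, and fewer still if the SSA already holds a large number of legitimate crawled properties (pass those in as `existing_props`).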
I hope this helps someone and I hope there will be a fix for this in the future - if that happens I’ll update the post.