Here’s one of those real-life stories that caused headaches for quite some time but was, in the end, very easy to resolve. I’ll write it down in the hope that the search engines pick it up and help some other poor soul out there.

Background

We have a solution that uses publishing pages to manage news articles and information pages in SharePoint Online. These articles and pages have a custom page layout with a custom content type, so they look decent and have proper metadata. They are all deployed using the PnP PowerShell cmdlets.

Creating pages worked flawlessly and they rendered nicely. The page layouts use jQuery and some Office Fabric components (including those nasty jQuery scripts for them).

So far so good!

The Problem

We wanted to roll up these news articles through a custom Web Part based on search criteria, and also show some of the metadata, such as news category, rollup image and more. But the pages were not indexed. My first thought was that indexing in SharePoint Online usually takes anything from a couple of minutes to a couple of days, so just wait. Then summer vacation came - I’m not complaining - and when I got back we had an index. OK, good. But then I started adding some custom managed properties to the rollup feature and waited for the crawler to pick up the changes. I clicked re-index on the libraries, I clicked re-index on the site, added new pages and articles, I might even have sacrificed the cutest little kitten you can imagine - but nothing. The pages and articles were just not picked up by the crawler. So I started to smell something fishy.

Is there something wrong with my SPO tenant? No, same issue across multiple tenants!
Can it be the page layout? No, it renders perfectly in all browsers!
Can it be the PnP PowerShell cmdlets? No, the same issue if I uploaded page layouts manually!

Old man yells at cloud

Troubleshooting

First of all I added the CrawlTime managed property to see when stuff was actually crawled and to verify that there was nothing wrong with the crawler. I could see that this managed property was populated on other content, but not on the pages. So there was obviously something wrong with our pages and not with SPO.

Time to put my troubleshooting gloves on and bring out my checklist. Since I can’t check trace logs (ULS) in SharePoint Online, all I could do was inspect the crawl logs. And now I could see in plain text that my news articles could not be crawled:

"6074070",
"https://contoso.sharepoint.com/news/Pages/xyz.aspx",
"755",
"Item Error",
"The SharePoint item being crawled returned an error when attempting to download the item.
  ( SearchID = F2A4A5E4-AAAA-AAAA-B034-10C593EF6CCE )",
"8/4/2016 7:07 AM",
"https://contoso.sharepoint.com/",
"Intranet",

So, the crawler can obviously not download my page! I also noticed the following cryptic error message on the site and library levels:

"755",
"Container Error",
"The SharePoint item being crawled returned an error when attempting to download the item. 
  ( Unknown Error The surrogate pair (0xD8DA, 0x272) is invalid. 
  A high surrogate character (0xD800 - 0xDBFF) must always be paired with a low surrogate 
  character (0xDC00 - 0xDFFF).; SearchID = 00D32C87-7CD9-4350-AFF4-BBF38B8EB712 )",

Hmm, some Unicode errors?
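
To make sense of what that message is complaining about, here is a minimal Python sketch of the UTF-16 rule it quotes. It only illustrates the range check in the error text; it says nothing about why the crawler hit it. The function names are hypothetical and not part of any SharePoint API.

```python
def is_valid_surrogate_pair(high: int, low: int) -> bool:
    """Return True if (high, low) form a valid UTF-16 surrogate pair."""
    # A high surrogate (0xD800 - 0xDBFF) must be followed by
    # a low surrogate (0xDC00 - 0xDFFF).
    return 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a valid surrogate pair into a single Unicode code point."""
    if not is_valid_surrogate_pair(high, low):
        raise ValueError(
            "The surrogate pair (0x{:X}, 0x{:X}) is invalid.".format(high, low)
        )
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# The pair from the crawl log: 0xD8DA is a high surrogate,
# but 0x272 is far outside the low surrogate range.
print(is_valid_surrogate_pair(0xD8DA, 0x272))
```

So the crawler apparently read a high surrogate followed by something that was not a low surrogate, which suggests it was looking at text that was not what it expected.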

I was out of options and I could not do an IISRESET (Rule #3) - so I called a friend (Rule #4), Doc Hodgkinson. He was almost as clueless as I was but asked me to test it on a normal SharePoint 2016 on-premises installation (born in the cloud, you know) to see if I could replicate the issue and then also have the option to check more detailed crawl and trace logs.

And so I did. I ran the installer on a clean SharePoint 2016 install, started creating a page using our custom page layout and BOOM! The browser just went white!!! Nothing! It rendered fine in SharePoint Online (cloud-born innovation!)

The solution

Quickly I opened up the trace logs and reloaded the page and found this in the logs:

SharePoint Foundation	General	ahi1s	Medium	
Cannot find requested file: 'C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\Template\layouts/https://code.jquery.com/jquery-2.1.4.min.js'
SharePoint Foundation	Upgrade	aq775	Unexpected	
CannotMakeBrowserCacheSafeLayoutsUrl ArgumentException: 15/0/1033/https://code.jquery.com/jquery-2.1.4.min.js


Aha! The Page Layout contains a ScriptLink control that points to that CDN location.

<SharePoint:ScriptLink 
  id="jquery" 
  runat="server" 
  Name="https://code.jquery.com/jquery-2.1.4.min.js" 
  Language="text/javascript"></SharePoint:ScriptLink>

I quickly changed it to a normal script tag and the page rendered fine in the SharePoint 2016 on-premises instance, and I uploaded it to SharePoint Online where it also rendered fine. And…after just a few minutes all my articles were indexed and queryable and I had my Managed Properties. And all was good!
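
If you have more than one page layout to check, the same change can be scripted. The following is a rough Python sketch, not something from the original fix: it finds ScriptLink controls whose Name is an absolute URL and swaps them for a plain script tag. The function and pattern names are made up for illustration, and it is regex-based rather than a real ASPX parser, so it assumes double-quoted attributes and no `>` inside attribute values.

```python
import re

# Hypothetical pattern: a SharePoint:ScriptLink element, either
# self-closing or with an explicit closing tag.
SCRIPTLINK_RE = re.compile(
    r'<SharePoint:ScriptLink\b(?P<attrs>[^>]*?)'
    r'(?:/>|>\s*</SharePoint:ScriptLink>)',
    re.IGNORECASE,
)

# A Name attribute holding an absolute (or protocol-relative) URL.
NAME_RE = re.compile(
    r'\bName\s*=\s*"(?P<url>(?:https?:)?//[^"]+)"',
    re.IGNORECASE,
)


def replace_absolute_scriptlinks(markup: str) -> str:
    """Swap ScriptLink controls that point at full URLs for script tags."""

    def swap(match):
        name = NAME_RE.search(match.group("attrs"))
        if name is None:
            # Relative names (files in the layouts folder) are left alone.
            return match.group(0)
        return '<script type="text/javascript" src="{}"></script>'.format(
            name.group("url")
        )

    return SCRIPTLINK_RE.sub(swap, markup)


layout = '''<SharePoint:ScriptLink
  id="jquery"
  runat="server"
  Name="https://code.jquery.com/jquery-2.1.4.min.js"
  Language="text/javascript"></SharePoint:ScriptLink>'''

print(replace_absolute_scriptlinks(layout))
```

For the control shown above this prints a single script tag with the CDN URL as its src. ScriptLink controls with a relative Name pass through unchanged, since those are the ones the control resolves against the layouts folder, as the trace log shows.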

Summary

I find this bug very interesting. SharePoint Online obviously accepts full URLs in the ScriptLink Name property when the page is rendered through a browser, but when the crawler sees it something weird happens (I also wonder why it gets Unicode errors on the site and library as well). Also, the code for ScriptLink appears to be different in SharePoint Online and SharePoint on-premises (I have NOT tested all CUs though). And it’s really weird that the pages actually were indexed at some point during the summer.

All in all, actually using an on-premises installation for troubleshooting my SharePoint Online issues was a great idea (big thanks as always Neil) and I’ll keep that option as a part of my normal routine from now on.