HOWTO Create a Content Index for a Host Header Site Collection

We have an environment with several web applications, each of which has a number of host header site collections attached to it. This reduces the resources required by your server (you have fewer physical IIS web applications) while still allowing you to have hundreds or thousands of site collections, each with its own URL.

We wanted to be able to search these site collections, but host header site collections cannot be crawled on their own. Your content indexer has to access the site collections via the web application itself. As an example, let's assume you have a web application that resolves to “my.sharepoint.local” and you create a host header site collection “site.sharepoint.local” using the following PowerShell script:

# Load the SharePoint cmdlets (already loaded in the SharePoint 2010 Management Shell)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$formsContentDbName = "WSS_Content_SiteLocal"
$webName = "my.sharepoint.local"
$url = "http://site.sharepoint.local"

# Get the web application
$webApp = Get-SPWebApplication -Identity $webName

# Create the content database
New-SPContentDatabase -Name $formsContentDbName -WebApplication $webApp

# Create the host header site collection in that content database
$contentDb = Get-SPContentDatabase -Identity $formsContentDbName
New-SPSite -Url $url -OwnerAlias "myalias" -OwnerEmail "my.email@somewhere.com" -ContentDatabase $contentDb -HostHeaderWebApplication $webApp -Template "STS#0"

You will now be able to access the host-header site collection via http://site.sharepoint.local.
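
If you want to confirm from PowerShell that the new site collection really was created as a host header site collection, something like the following sketch should do it. It assumes the URL from the example above and relies on the HostHeaderIsSiteName property of the SPSite object; the extra columns are just there for illustration.

```powershell
# Load the SharePoint cmdlets (already loaded in the SharePoint 2010 Management Shell)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# URL of the host header site collection from the example above
$site = Get-SPSite -Identity "http://site.sharepoint.local"

# HostHeaderIsSiteName is True for host header site collections
$site | Select-Object Url, HostHeaderIsSiteName,
    @{Name = "WebApplication";  Expression = { $_.WebApplication.DisplayName }},
    @{Name = "ContentDatabase"; Expression = { $_.ContentDatabase.Name }}

# Release the SPSite object once we are done with it
$site.Dispose()
```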

If, however, you create a content source using the URL “http://site.sharepoint.local” and run a full crawl, you’ll get the following warning in your crawl log if you don’t have a site collection attached to http://my.sharepoint.local:

This URL is part of a host header SharePoint deployment and the search application is not configured to crawl individual host header sites. This will be crawled as a part of the host header Web application if configured as a start address

As the warning states, the trick here is to create a site collection for the web application, in this case my.sharepoint.local. Then, in the content source, add only the URL http://my.sharepoint.local. Your content source will then happily index the content in your host header site collection, as well as the base site collection.
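
The two steps described above could be scripted roughly as follows. This is only a sketch under a few assumptions: the web application URL is http://my.sharepoint.local, the farm has a single Search Service Application, and the owner alias, template and content source name are placeholders you would replace with your own.

```powershell
# Load the SharePoint cmdlets (already loaded in the SharePoint 2010 Management Shell)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Assumed URL of the web application itself
$webAppUrl = "http://my.sharepoint.local"

# Create a root site collection on the web application if it does not have one yet.
# "myalias" and "STS#0" are placeholders carried over from the earlier script.
if (-not (Get-SPSite -Identity $webAppUrl -ErrorAction SilentlyContinue)) {
    New-SPSite -Url $webAppUrl -OwnerAlias "myalias" -Template "STS#0"
}

# Assumption: there is only one Search Service Application in the farm
$ssa = Get-SPEnterpriseSearchServiceApplication | Select-Object -First 1

# Content source whose only start address is the web application URL.
# The name is hypothetical; pick whatever suits your naming scheme.
$cs = New-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa `
    -Type SharePoint -Name "my.sharepoint.local web application" `
    -StartAddresses $webAppUrl

# Start a full crawl of the web application, host header site collections included
$cs.StartFullCrawl()
```

The same content source can of course be created through Central Administration; the script is just handy if you have several web applications to set up.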

This, however, has some implications. Let's assume you have a large site collection (say 100GB) and a smaller site collection (say 10MB). They are both attached to the same web application. You cannot do a full reindex of the smaller site collection without also reindexing the larger site collection.
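
To get a feel for how much content a full crawl of the web application will drag in, you can list its site collections by size. The sketch below assumes the web application name from the example above and reads the storage figure from each site collection's Usage property; it is an illustration rather than a definitive report.

```powershell
# Load the SharePoint cmdlets (already loaded in the SharePoint 2010 Management Shell)
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Let PowerShell track and dispose the SPSite objects we enumerate
Start-SPAssignment -Global

# Assumed web application name from the example above
$webApp = Get-SPWebApplication -Identity "my.sharepoint.local"

# List every site collection in the web application, largest first
Get-SPSite -WebApplication $webApp -Limit All |
    Select-Object Url, @{Name = "SizeMB"; Expression = { [math]::Round($_.Usage.Storage / 1MB, 1) }} |
    Sort-Object SizeMB -Descending

Stop-SPAssignment -Global
```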

I’ve raised a support incident with Microsoft to try and find a resolution. Stay tuned…


2 Responses to “HOWTO Create a Content Index for a Host Header Site Collection”

  1. Arunachalam Says:

    You can go to the SharePoint 2010 crawl logs and mark the item to be re-indexed in the next crawl. This should automatically reindex the item in the next incremental crawl for you. There is no programmatic way of doing this, but it is still possible through the crawl log.

    • gavinmckay Says:

      That’s true, you could do that for individual items, but I would hate to have millions of items to mark individually for reindexing! 🙂
