Posts Tagged ‘Stepwise’

Backing up SharePoint 2010 and Remote Blob Storage with Data Protection Manager 2012

June 10, 2012

I’ve recently finished upgrading one of my SMB clients from Data Protection Manager (DPM) 2010 to DPM 2012. The upgrade went well, added a bunch of useful functionality, and gave me a chance to modify their long-term storage solution to use external USB drives with the Firestreamer virtual tape library. All going well so far!

One of the other components we installed was Stepwise for Remote Blob Storage, and this also meant slightly modifying the protection groups to make sure their backups were consistent. I’ve blogged about backup with Stepwise previously, but the fundamental principle is to back up your database first, then your filesystem(s) second. This ensures that every file your SharePoint database references exists in the backup. However, with incremental backups in DPM you can relax this somewhat – see below for more information.

I created a Protection Group that consisted of my SharePoint 2010 content databases and a separate Protection Group for my files and my network file shares. The protection groups were set up to complete a full express backup every night at 10pm, with a sync frequency of 15 minutes. DPM uses block-based backup technology when it performs full express backups, which can save the systems an enormous amount of time and effort. It does this by only backing up blocks on the disk that have changed, rather than iterating through the filesystem searching for individually changed files.

Data Protection Groups for SharePoint 2010 and Remote Blob Storage

Protecting SharePoint 2010 and Stepwise Remote Blob Storage with DPM 2012

The issue with DPM is that you can’t force protection groups to run in a particular order and guarantee that it will occur. For example, even though the sync frequency is every 15 minutes, there is no guarantee that my file-based protection group and my SQL Server protection group will run at *exactly* the same time. For a “perfect” backup scenario it would be better to have the backups synchronised in their proper order, but here is where I think it doesn’t really matter too much.

Scenario 1 – you need to restore your content database

Simple enough – use DPM to restore the content database. Your Remote Blob Storage files haven’t changed, so you don’t need to restore them. IMPORTANT: all Remote Blob Storage technologies are effectively Write-Once Read-Many (WORM) devices – they *never* overwrite existing files and always create new ones. Even if you save over the same document in a SharePoint library that does not have version control turned on, Stepwise would create a completely new file as per the Remote Blob Storage interface requirements.
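To illustrate why the write-once behaviour makes a database-only restore safe, here is a minimal sketch. It is a toy model, not how Stepwise is implemented: the `WriteOnceBlobStore` class and the `content_db` dictionary are hypothetical stand-ins for the blob store and the content database.

```python
import itertools


class WriteOnceBlobStore:
    """Toy model of a write-once store: every save creates a new blob."""

    def __init__(self):
        self._blobs = {}
        self._ids = itertools.count(1)

    def save(self, content):
        blob_id = next(self._ids)  # a fresh id each time, existing blobs are never overwritten
        self._blobs[blob_id] = content
        return blob_id

    def read(self, blob_id):
        return self._blobs[blob_id]


store = WriteOnceBlobStore()
content_db = {}  # hypothetical stand-in: document name -> blob id

content_db["report.docx"] = store.save("draft 1")
db_backup = dict(content_db)  # the database backup is taken here

# The document is saved over after the backup, which creates a second blob.
content_db["report.docx"] = store.save("draft 2")

# Restoring the older database backup still resolves its blob reference,
# because the first blob was never modified or replaced.
restored_db = db_backup
restored_text = store.read(restored_db["report.docx"])
print(restored_text)  # prints "draft 1"
```

The restored database points at the older blob, and the newer blob simply sits unreferenced on the filesystem.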

Scenario 2 – you need to restore your filesystem

This should usually only occur after a hardware failure or corruption. If it does, you would restore your filesystem from the DPM protection group to the same, or another, location.

Scenario 3 – you need to restore both SharePoint and your filesystem

Complete site failure? Or hopefully you are testing in your pre-production environment! Regardless, you would restore both protection groups to get all your data back again.

What about restoring only one file? Or one site?

Here is where things get fun. Because Stepwise has a copy of all files and versions of files, you typically don’t need to restore your filesystem at all. As long as the SharePoint content database still has a reference to your original file, you only need to worry about the SharePoint content database. It may be as simple as restoring your content database under a different database name and then using the SharePoint Central Admin site to export your data and re-import it to a location of your choosing. Because the backup of your database will have a reference to the Stepwise document id, the file can be retrieved with the data as part of the export process.

For a more in-depth discussion of Remote Blob Storage, SharePoint 2010 and Stepwise, see my previous post series.


Remote Blob Storage for SharePoint 2010 with Stepwise – Part 3 Disaster Recovery

May 25, 2012

This is the third post in this series on Stepwise, Backup and Restore, and Disaster Recovery.

Using Stepwise Remote Blob Storage we have learned that the documents are “externalised” from the SharePoint SQL Server content database and stored on alternate filesystems. One of the interesting side-effects of this externalisation process is the fact that documents are created by Stepwise whenever a file is added to SharePoint, but if the same document is updated in SharePoint a completely new file is created in Stepwise. This happens regardless of whether you have version control turned on in your document library or not – every write to SharePoint results in a new file being created in Stepwise.

The interesting part of this is what happens in a disaster recovery scenario when you lose your SQL Server content database(s). Because Stepwise has a copy of every single document written to SharePoint, when restoring your data you only need to restore the SQL Server databases. The filesystem that Stepwise is using already has your previous versions of documents on it (up to a point – see Garbage Collection below). This can drastically reduce your restore time.

Furthermore, Stepwise actively collects SharePoint information and metadata as it processes documents. This information is maintained with the document and is accessible to Stepwise administrators. So here’s another nice side-effect of the process – you may have lost 4 hours of SharePoint content database, but Stepwise can determine what files were added in the last 4 hours and provide the metadata and documents for everything that occurred during that 4 hour window. Result – no lost documents!

But what happens if you lose your filesystem, and 4 hours goes up in smoke on that? Obviously that is not a good thing! That is definitely the time to get your restore process started, but again Stepwise can assist in the task. Stepwise still maintains metadata and file information in its own database, and it can report back all the documents that were added or changed in the 4 hour window. So while it isn’t all good news, you can at least let clients know exactly what they have lost rather than leave them in the wilderness. And some more good news? Stepwise caches recently accessed documents locally on your web front-ends – including documents that have been uploaded to SharePoint. So we can interrogate the Stepwise cache and pull documents out of there as well.
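The reporting idea boils down to filtering document metadata by write time. The sketch below is only an illustration under assumed data: the `records` list and the `documents_in_window` helper are hypothetical and do not reflect the actual Stepwise database schema or API.

```python
from datetime import datetime, timedelta


def documents_in_window(records, window_start, window_end):
    """Return the names of documents written within [window_start, window_end]."""
    return [
        name
        for name, written_at in records
        if window_start <= written_at <= window_end
    ]


failure_time = datetime(2012, 5, 25, 14, 0)
last_backup = failure_time - timedelta(hours=4)  # the 4 hour window

# Hypothetical metadata records: (document name, time written)
records = [
    ("contract.docx", datetime(2012, 5, 25, 9, 30)),
    ("invoice.xlsx", datetime(2012, 5, 25, 11, 15)),
    ("minutes.docx", datetime(2012, 5, 25, 13, 45)),
]

at_risk = documents_in_window(records, last_backup, failure_time)
print(at_risk)  # prints ['invoice.xlsx', 'minutes.docx']
```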

It’s not perfect – but anything that can help you recover your data after a disaster is a good thing!

What is Garbage Collection?

Remote Blob Storage uses the term Garbage Collection to describe the clean-up process of deleting documents that are no longer being referenced. As an example, consider a document that has been deleted from a document library in SharePoint. It first hits the user recycle bin, then the site collection recycle bin, and finally the “deleted from end user recycle bin” stage. After it leaves this area, SharePoint no longer maintains any connection to the document.

It is at this stage that the Garbage Collection process in Stepwise kicks in. Stepwise uses the Remote Blob Storage API to identify any documents that SharePoint no longer references. It then compares each document’s date against a configurable number of days (7 by default) and, if the date is older than this, physically deletes the document from the back-end storage.

In some situations, such as WORM-based configurations of Stepwise and/or for compliance reasons, Garbage Collection can be disabled completely. This ensures that all documents are maintained by the system indefinitely.
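As a rough sketch of that logic (not the actual Stepwise or Remote Blob Storage implementation), the selection step might look like this. The `collect_garbage` function, its parameters, and the idea of keying each blob to the date it stopped being referenced are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def collect_garbage(blobs, referenced_ids, now, retention_days=7, enabled=True):
    """Return ids of blobs to delete: unreferenced and older than the retention period.

    blobs maps a blob id to a date (assumed here to be when it was last referenced).
    """
    if not enabled:  # e.g. WORM or compliance configurations keep everything
        return []
    cutoff = now - timedelta(days=retention_days)
    return [
        blob_id
        for blob_id, dated in blobs.items()
        if blob_id not in referenced_ids and dated < cutoff
    ]


now = datetime(2012, 5, 25)
blobs = {
    "a": datetime(2012, 5, 1),   # unreferenced and old: eligible for deletion
    "b": datetime(2012, 5, 22),  # unreferenced but inside the 7 day window
    "c": datetime(2012, 5, 1),   # old but still referenced by SharePoint
}
referenced = {"c"}

to_delete = collect_garbage(blobs, referenced, now)
print(to_delete)  # prints ['a']
```

Passing `enabled=False` returns an empty list, which models the case where Garbage Collection is switched off.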

Remote Blob Storage for SharePoint 2010 with Stepwise – Part 2 Backup and Restore

May 25, 2012

You can’t have a storage conversation about SharePoint 2010, Remote Blob Storage, and Stepwise without quickly getting into Backup and Restore options.

Remote Blob Storage with Stepwise externalises documents from a SharePoint content database to external storage systems – see Part 1 of this blog series for additional information. This presents two separate sets of data that need to be backed up: the SQL Server content database(s) and the storage devices themselves (such as a file share). Backup and restore operations need to synchronise their schedules to ensure they are capturing all data.

How Much Data Can I Afford To Lose?

This is the classic backup/restore question. It is entirely possible with today’s technology to get very close to perfect data integrity – given enough budget! Enterprise storage systems such as EMC Data Domain offer extremely high performing data backup solutions. It really depends on how much you want to spend to achieve your goals.

I recently spoke with a courier company about what the impact would be if they lost their system for a day. How much would it cost them? They keep electronic run-sheets of their jobs, obtain signatures for completed work, and have GPS systems on hand-held devices. They could recreate their entire day – it would take time, but it could be done. The cost to them of being offline for a day would be small.

Switch over to a legal services company that charges in 10-minute increments. They have multiple sites world-wide, across time-zones, and are heavily reliant on their IT systems for both case management and time management. Being offline for a day would cost this company tens of thousands of dollars.

Often it is situations like these that should govern your backup and restore designs. They must be matched to the business requirements and of course to what is affordable!

Backing Up SharePoint when Using Stepwise

When documents are externalised (stored outside the SQL content database) via Stepwise they are stored on a filesystem, and the storage location is maintained within the Stepwise administration database. The SharePoint database is also updated to store metadata about the document as well as tracking information about the document’s usage status (i.e. is it still active, is it in the recycle bin, has it been deleted entirely).

In order to ensure you maintain all the components of a SharePoint + Stepwise installation, you can use this procedure:

1. Snapshot the SQL Server content database(s) and the Stepwise administration database

2. Back up the databases

3. Remove the snapshots

4. Back up the file share(s)

Let’s examine this in more detail.

Snapshotting the Databases

Snapshotting the databases is a technique available in most commercial backup software solutions. Snapshotting creates a read-only, static view of a SQL Server database and ensures the data is not being updated while the backup is being taken. SharePoint and Stepwise can still access the primary, writable databases, but the snapshot ensures the data being backed up does not change during the backup process.

Back Up the Databases

The databases contain not only the metadata for the documents in SharePoint, but they also contain the physical paths to the documents on the file system(s) configured for use by Stepwise. By backing up the databases first, you ensure that the links to the documents exist and are valid at the time the backup is taken.

Remove the Snapshots

After the backup has been completed successfully, the snapshots are no longer required. This step is usually done automatically by the backup software or SQL Server backup processes and does not need to be manually completed.

Back Up the Filesystem(s)

The filesystems contain the physical files that have been stored by Stepwise on behalf of SharePoint. These need to be backed up as part of your backup/restore solution to ensure you get both your SharePoint data and the externalised files.
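The ordering of the four steps can be sketched as a small driver function. This is only an outline under assumptions: `run_backup` and the four callables passed to it are hypothetical placeholders for whatever your backup software actually does, and the lambdas here just record the order in which they were called.

```python
def run_backup(snapshot, backup_databases, remove_snapshot, backup_file_shares):
    """Run the four steps in order; the snapshot is removed even if the database backup fails."""
    snapshot()
    try:
        backup_databases()
    finally:
        remove_snapshot()
    # Files are backed up last, so every file the database backup references is captured.
    backup_file_shares()


log = []
run_backup(
    snapshot=lambda: log.append("snapshot databases"),
    backup_databases=lambda: log.append("back up databases"),
    remove_snapshot=lambda: log.append("remove snapshots"),
    backup_file_shares=lambda: log.append("back up file shares"),
)
print(log)
```

If the database backup raises an error, the snapshot is still cleaned up and the file share backup is skipped, so you never end up with a file backup that has no matching database backup from the same run.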

Restoring Stepwise

The restore steps are similar, but depend on what data you have lost. You can read more on this in the final piece of the puzzle: disaster recovery scenarios in part 3 of this series.

Remote Blob Storage for SharePoint 2010 with Stepwise – Part 1

May 24, 2012

This post has spent a long time in draft, but as we have released the second version of our product and things are progressing nicely, now is a good time to post this.

For several years my company (Invizion) have been working on a Remote Blob Storage product for SharePoint 2010 called Stepwise. Remote Blob Storage is a technology and API provided by Microsoft that allows you to move documents out of your SQL Server database and store them on file-system based storage (think network drive or cloud storage) by a process called “externalisation”. There are numerous advantages, but the biggest ones are:

  • Your databases are smaller. Sometimes hugely smaller – up to 95% is not uncommon for SharePoint 2010 content databases in particular
  • The documents don’t travel via SQL Server at all, they are stored directly on a file-system. So SQL doesn’t get slammed with I/O, your temp databases aren’t hammered, and transaction size is smaller (but not shorter – see below)
  • You can utilise existing storage systems, like high-capacity drives which are much cheaper. SQL Server uses (or should use!) high-performance drives, which are expensive. This means organisations spend less on physical storage for their SQL Server environment
  • Backup and restore tasks are in most cases substantially quicker. Ask your backup engineer – would they prefer backing up a 1TB SQL Server database, or a 100GB SQL Server database + 900GB of documents? There are also huge improvements in Data Deduplication with backup software that works more efficiently on file-based data rather than SQL Server databases

There are a lot of other benefits, but these are the main ones.

There are several competitors in the Remote Blob Storage provider-space, but Stepwise has some pretty unique features which I think are worth detailing here.

  1. Stepwise isn’t reliant on SharePoint. If SharePoint is down, can you still access and, more importantly, manage your documents? Stepwise can.
  2. Stepwise uses Microsoft Management Console to control all configuration of the system. No Central Admin features to deploy, no separate website, no timer jobs to run, no impact on SharePoint (beyond reading and writing the documents of course!)
  3. Stepwise manages documents, rather than just storing them. Want to add more storage? Covered. Need to move documents to a new location with no down-time? Stepwise can do that. Want to calculate the cost of cloud storage? Stepwise has inbuilt functionality to show you how much the cloud is going to hit your budget.
  4. Stepwise can integrate with your non-SharePoint applications. Stepwise is a fully-featured Content Addressable Storage (CAS) system based on Microsoft’s Remote Blob Storage technology. That means your custom applications can benefit in the same fully-supported way.

What about the 200GB Content Limit?

This is my favorite topic at the moment. The SharePoint 2010 Boundaries and Limits published by Microsoft has a section on supported content database sizes and what you need to support an infrastructure based on your planned usage of SharePoint. I have had various long and very useful discussions with Microsoft SharePoint engineers both in Australia and the US about what this actually means.

First up – the term “content database” isn’t just about the size of your SQL Server database. It is the sum total of all content that resides in your site collection(s) i.e. if it passes through a SharePoint Web Front-End, it is counted in the size of your “content database”. The reason is that Microsoft calculates scalability from customer feedback and experience, and from the amount of data that is processed through SharePoint components.

Some of the clients I have spoken to were concerned about this limit, but most organisations with 500+ users are probably going to have the infrastructure to support >200GB “content databases”. As an example, requirements are as follows:

  • Disk sub-system performance of 0.25 IOPS (Input/output Operations Per Second) per GB, with 2 IOPS per GB recommended for optimal performance. Decent local disk in a RAID configuration should meet this easily in most cases, and the majority of SAN configurations will also meet this criterion.
  • Have a good backup/restore strategy. Common-sense, often-overlooked, but achievable.
  • SharePoint 2010 Administrators. You need them – get some good ones.
  • Customisation complexity. Needs to be assessed on a case-by-case basis by each organisation.
  • Site collection refactoring. More on this later.
  • Backup and restore. See below.

Site Collection Management

Let’s look at what a site collection should be used for. From Sites and site collections overview:

“The sites in a site collection have shared administration settings, common navigation, and other common features and elements. Each site collection contains a top-level site and (usually) one or more sites below it in a hierarchical structure.”

For me this is critical for how you design your site collection, and I believe one of the factors that gets overlooked the most. It is fairly common practice to create new content databases and/or site collections to manage the size and growth of a SharePoint environment. But this presents several problems:

  • Master pages and UI customisations need to be copied over/modified
  • Administration and permissions need to be copied over
  • Navigation between site collections must be manually addressed
  • Site authors may need to redo their work
  • Search needs to be reassessed as well to match the content in each site collection
  • Workflows may need to be redesigned to cope with routing approvals and documents between site collections

These tasks should not be undertaken lightly!

My advice to my clients is to make sure they assess all the impacts of creating additional site collections if they are just doing so to avoid the 200GB boundary for SharePoint 2010. The considerations above are a good starting point to help assess whether supporting 200GB+ content collections is better than sticking to a <200GB size.

Continue on to the next post in this series – Backup and Restore options with Stepwise