Migrate SharePoint Blog to WordPress

As promised here, this is a follow-up post with the tool I developed for the SharePoint to WordPress migration.

First, a screenshot:

Migrate SharePoint to WordPress Screenshot

What does a migration have to cover? Copying the posts is not enough, so I came up with these features:

Features

  • Copy posts
  • Copy comments
  • Copy resources like images and downloads
  • Create needed tags and categories
  • Modify links to local resources
  • Deal with https, if links are absolute on the source blog and mixed with http
  • Use web services to connect to source and destination
  • URL rewriting (covered by a WordPress Plugin)
  • Delete all content from the destination blog (for migration testing)
  • Replace strings (with Regex)
  • A nice (WPF) GUI

Description

Originally I built a plain console application. Then I thought that a console application might scare some users, and after some time I wanted to do some WPF again. So I created a WPF application to wrap all the functionality in a GUI. This makes it easier to use for the folks out there who do not like black console windows 😉 Since I am using web services to connect to both blogging platforms, the tool can be run on any client computer. No access to a server session is required.

To start, you obviously need the URLs of the source and destination blogs, as well as credentials for the destination blog. Since most blogs can be read anonymously, you’ll probably not need to fill in the source credentials. The migration starts when you hit the “Migrate Content” button. That should be it for using the tool. It remembers the last entries for the URLs and login names in case you need to perform multiple runs, which was the case for me. The passwords have to be re-entered for security reasons.

It shows the progress of all steps in a progress bar and as text at the bottom of the application, and tells you when it’s finished. Existing categories are mapped to new categories and used as tags, too. I’ve tested the tool with three blogs, one being my own with CKS:EBE installed. There really isn’t much more to configure to get your blog migrated to WordPress with this tool.

Some data needs to be modified before the blog can go live at the new destination. For URLs this is necessary to generate valid links within the destination. Fortunately there is a plugin available to do some fancy rewriting. Since WordPress shows its own smilies, I wanted to get rid of the strings within the posts that reference smilies as images and replace them with, well, smilies. A text file named “replacestrings.txt” in the same directory as the tool takes lines with the strings to replace. In the sample below, each line consists of a regular expression and its replacement, separated by “;#”.

<img.[^>]*/wlEmoticon-smile_2.png"(>| >|/>| />|</img>)*;#:-)
<img.[^>]*/wlEmoticon-sadsmile_2.png"(>| >|/>| />|</img>)*;#:-)
<img.[^>]*/wlEmoticon-winkingsmile_2.png"(>| >|/>| />|</img>)*;#:-)
http://www.hezser.de/_layouts/images/download.gif;# 

The sample replaces all my old smiley images with plain strings before the posts are created on the destination blog. The images that were used as smilies in the source won’t be copied to the destination, because they are no longer referenced. Otherwise I would have ended up with many smiley images. I like smilies 😀
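
How such a file could be applied to a post body is sketched below. This is a minimal sketch and not the tool’s actual code: it assumes that every line has the form “pattern;#replacement” as in the sample above, and the names ReplaceStrings, Parse and Apply are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: loads "pattern;#replacement" lines and applies them to a post body.
public static class ReplaceStrings
{
    public static List<KeyValuePair<Regex, string>> Parse(IEnumerable<string> lines)
    {
        var rules = new List<KeyValuePair<Regex, string>>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;

            // Assumption: the LAST ";#" separates pattern and replacement,
            // so the pattern itself may contain ";#".
            int separator = line.LastIndexOf(";#", StringComparison.Ordinal);
            if (separator < 0) continue; // no separator, ignore the line

            string pattern = line.Substring(0, separator);
            string replacement = line.Substring(separator + 2);
            rules.Add(new KeyValuePair<Regex, string>(
                new Regex(pattern, RegexOptions.IgnoreCase), replacement));
        }
        return rules;
    }

    public static string Apply(string body, IEnumerable<KeyValuePair<Regex, string>> rules)
    {
        foreach (KeyValuePair<Regex, string> rule in rules)
        {
            // Note: "$" in the replacement is interpreted as a regex substitution.
            body = rule.Key.Replace(body, rule.Value);
        }
        return body;
    }
}
```

With the lines of the file read via File.ReadAllLines, Parse would be called once and Apply once per post body before the post is sent to the destination.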

You can stop reading here if you are a user who just wants to download the tool and migrate your blog. As a developer, you might be interested in how the tool works…

Technical stuff

The tool gives me a good opportunity to explain some of the programming tasks I had to solve for the migration.

SharePoint offers web services (_vti_bin/lists.asmx), and WordPress offers an XML-RPC interface (I used CookComputing.XmlRpc to connect). Those two are used to connect to the blogs. Since the SharePoint web services need display names to address the posts and comments lists, I first query the lists by list template.

Querying SharePoint for List Titles

Use the SharePoint lists web service to get all lists of a site and search for specific lists such as posts and comments. The lists are identified by their template (301 for posts, 302 for comments), not by their title. That way I do not have a localization issue.

_lists = new Lists
{
	Url = string.Format("{0}/_vti_bin/lists.asmx", BlogUrl),
	Credentials = CredentialCache.DefaultNetworkCredentials
};
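// _s is assumed to be an XNamespace field for the SharePoint SOAP namespace
// (http://schemas.microsoft.com/sharepoint/soap/), declared elsewhere in the class.
XDocument response = XDocument.Parse(_lists.GetListCollection().OuterXml);
IEnumerable<XElement> lists = response.Root.Descendants(XName.Get("List", _s.ToString()));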
foreach (XElement list in lists)
{
	XAttribute listTemplate = list.Attribute(XName.Get("ServerTemplate"));
	if (listTemplate != null && listTemplate.Value == "301")
	{
		// found Posts list
		PostListName = list.Attribute(XName.Get("Title")).Value;
		PostListServerRelativeUrl = list.Attribute(XName.Get("DefaultViewUrl")).Value.Replace("/AllPosts.aspx", string.Empty);
	}
	else if (listTemplate != null && listTemplate.Value == "302")
	{
		// found Comments list
		CommentListName = list.Attribute(XName.Get("Title")).Value;
	}
}

With the list names retrieved, I can query the lists for data. The web services use display names to identify lists.

Get SharePoint items with paging via web service

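// _z and _rs are assumed to be XNamespace fields for "#RowsetSchema" and
// "urn:schemas-microsoft-com:rowset", declared elsewhere in the class.
XDocument response = GetListItems(postsConfig);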
do
{
	XElement root = response.Root;
	foreach (XElement row in root.Descendants(XName.Get("row", _z.ToString())))
	{
		// parse data here
	}
	XElement node = root.Descendants(XName.Get("data", _rs.ToString())).First();
	XAttribute nextNode = node.Attribute("ListItemCollectionPositionNext");
	if (nextNode != null)
	{
		postsConfig.ListItemCollectionPosition = nextNode.Value;
		if (!string.IsNullOrEmpty(postsConfig.ListItemCollectionPosition))
		{
			postsConfig.PageSize = node.Attribute("ItemCount").Value;
			response = GetListItems(postsConfig);
		}
	}
	else
	{
		postsConfig.PageSize = null;
		postsConfig.ListItemCollectionPosition = null;
	}
} while (!string.IsNullOrEmpty(postsConfig.PageSize));

The following method actually queries the web service for list items. The properties of the class SharePointListConfig for the list title, ListItemCollectionPosition and PageSize are simple string properties. The view fields are specified explicitly to get only the data needed for the migration.

private XDocument GetListItems(SharePointListConfig config)
{
	var xmlDoc = new XmlDocument();

	XmlNode ndQuery = xmlDoc.CreateNode(XmlNodeType.Element, "Query", "");
	XmlNode ndViewFields = xmlDoc.CreateNode(XmlNodeType.Element, "ViewFields", "");
	XmlNode ndQueryOptions = xmlDoc.CreateNode(XmlNodeType.Element, "QueryOptions", "");

	if (!string.IsNullOrEmpty(config.ListItemCollectionPosition))
	{
		ndQueryOptions.InnerXml = string.Format("<IncludeMandatoryColumns>FALSE</IncludeMandatoryColumns><DateInUtc>TRUE</DateInUtc><Paging ListItemCollectionPositionNext=\"{0}\" />",
			config.ListItemCollectionPosition.Replace("&", "&amp;"));
	}
	else
	{
		ndQueryOptions.InnerXml = "<IncludeMandatoryColumns>FALSE</IncludeMandatoryColumns><DateInUtc>TRUE</DateInUtc>";
	}

	// get all comments and posts
	if (config.ListItemType == SharePointListConfig.ListType.Posts)
	{
			// PostCategory for SP Blog, BlogTitleForUrl and Categories for EBE Blogs
		ndViewFields.InnerXml = "<FieldRef Name='ID' /><FieldRef Name='Title'/><FieldRef Name='Body'/><FieldRef Name='PublishedDate'/><FieldRef Name='BlogTitleForUrl'/><FieldRef Name='Categories'/><FieldRef Name='PostCategory'/><FieldRef Name='Author'/>";
	}
	else
	{
		ndViewFields.InnerXml = "<FieldRef Name='ID' /><FieldRef Name='Title'/><FieldRef Name='Body'/><FieldRef Name='PostTitle'/><FieldRef Name='CommentUrl'/><FieldRef Name='EmailAddress'/><FieldRef Name='Author'/><FieldRef Name='Created'/>";
	}
	try
	{
		XmlNode ndListItems = _lists.GetListItems(config.GetListName(), null, ndQuery, ndViewFields, null, ndQueryOptions, null);
		XDocument response = XDocument.Parse(ndListItems.OuterXml);
		return response;
	}
	catch (System.Web.Services.Protocols.SoapException ex)
	{
		throw new Exception(ex.Message + Environment.NewLine + ex.Detail.InnerText, ex);
	}
}
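
The “parse data here” placeholder in the paging loop is where each returned row gets read. As a rough sketch (not the tool’s code): the lists web service returns one z:row element per item, with the field values as attributes prefixed with “ows_”, and lookup values in the form “1;#Name”. The names RowParser, ParseRows and LookupTexts are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for reading the rows of a GetListItems response.
public static class RowParser
{
    private static readonly XNamespace Z = "#RowsetSchema";

    // Returns one dictionary per row, keyed by field name without the "ows_" prefix.
    public static List<Dictionary<string, string>> ParseRows(XDocument response)
    {
        return response.Root.Descendants(Z + "row")
            .Select(row => row.Attributes()
                .Where(a => a.Name.LocalName.StartsWith("ows_", StringComparison.Ordinal))
                .ToDictionary(a => a.Name.LocalName.Substring(4), a => a.Value))
            .ToList();
    }

    // Extracts the text parts of a lookup value such as "1;#News;#2;#Tips".
    // Assumption: values alternate between ID and text, separated by ";#".
    public static List<string> LookupTexts(string raw)
    {
        var texts = new List<string>();
        if (string.IsNullOrEmpty(raw)) return texts;

        string[] parts = raw.Split(new[] { ";#" }, StringSplitOptions.None);
        for (int i = 1; i < parts.Length; i += 2)
        {
            texts.Add(parts[i]);
        }
        return texts;
    }
}
```

Fields such as PostCategory or Author would go through LookupTexts, while Title and Body can be used as they are.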

After all data has been read, local resources parsed and links replaced, we move on to the destination side.
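
To give an idea of what replacing links can look like, here is a minimal sketch under a few assumptions. It treats every href or src value that points to the source host as a local resource, regardless of whether the link uses http or https or is server-relative, and it leaves everything else untouched. LinkRewriter, Rewrite and the mapUrl callback are illustrative names, not the tool’s actual code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: finds links to the source blog and rewrites them.
public static class LinkRewriter
{
    private static readonly Regex UrlAttribute =
        new Regex("(?<attr>href|src)=\"(?<url>[^\"]+)\"", RegexOptions.IgnoreCase);

    // Rewrites links to sourceHost via mapUrl and collects the original URLs in 'found',
    // so that the referenced files can be downloaded and uploaded later.
    public static string Rewrite(string body, string sourceHost,
        Func<Uri, string> mapUrl, ICollection<Uri> found)
    {
        return UrlAttribute.Replace(body, match =>
        {
            string raw = match.Groups["url"].Value;

            // Server-relative links ("/images/a.png") belong to the source host.
            if (raw.StartsWith("/", StringComparison.Ordinal) &&
                !raw.StartsWith("//", StringComparison.Ordinal))
            {
                raw = "http://" + sourceHost + raw;
            }

            Uri uri;
            if (!Uri.TryCreate(raw, UriKind.Absolute, out uri)) return match.Value;

            // http and https are treated alike; only the host decides.
            bool isWeb = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;
            if (!isWeb || !string.Equals(uri.Host, sourceHost, StringComparison.OrdinalIgnoreCase))
            {
                return match.Value;
            }

            found.Add(uri);
            return match.Groups["attr"].Value + "=\"" + mapUrl(uri) + "\"";
        });
    }
}
```

The mapUrl callback decides where a resource ends up on the destination, for example by prefixing the original path with the upload folder of the new blog.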

WordPress specific details

As stated above, I used an existing library. There are plenty of samples out there if you look for them. I’ve implemented the following methods.

public interface IWordpressXmlRpc
{
	[XmlRpcMethod("metaWeblog.newMediaObject")]
	WordpressFile newImage(string blogid, string username, string password, WordPressFile theImage, bool overwrite);

	[XmlRpcMethod("wp.getMediaLibrary")]
	MediaItem[] getMediaLibrary(string blogid, string username, string password, MediaFilter filter);

	[XmlRpcMethod("wp.deletePage")]
	bool deletePage(string blogid, string username, string password, int page_id);

	[XmlRpcMethod("metaWeblog.getRecentPosts")]
	ExistingPostContent[] getRecentPosts(string blogID, string username, string password, int numberOfPosts);

	[XmlRpcMethod("metaWeblog.newPost")]
	string newPost(string blogid, string username, string password, NewPostContent content, bool publish);

	[XmlRpcMethod("metaWeblog.editPost")]
	bool editPost(string blogid, string username, string password, NewPostContent content, bool publish);

	[XmlRpcMethod("wp.deletePost")]
	bool deletePost(string blogid, string username, string password, int postid);

	[XmlRpcMethod("wp.newComment")]
	int newComment(string blogid, string username, string password, int post_id, Comment comment);

	[XmlRpcMethod("wp.getComments")]
	Comment[] getComments(string blogid, string username, string password, CommentFilter filter);

	[XmlRpcMethod("wp.editComment")]
	bool editComment(string blogid, string username, string password, int comment_id, Comment comment);

	[XmlRpcMethod("wp.deleteComment")]
	bool deleteComment(string blogid, string username, string password, int comment_id);

	[XmlRpcMethod("wp.newTerm")]
	string newTerm(string blogid, string username, string password, TaxonomyContent content);

	[XmlRpcMethod("wp.getTerms")]
	Term[] getTerms(string blogid, string username, string password, string taxonomy, TermFilter filter);

	[XmlRpcMethod("wp.deleteTerm")]
	bool deleteTerm(string blogid, string username, string password, string taxonomy, int term_id);
}

I would like to share some issues I ran into, so you don’t hit the same problems when programming against the WordPress XML-RPC interface.

Post deletion

Just call the wp.deletePost method? Almost. You’ll have to call it twice: the first call moves the post to the trash, and the second call deletes it permanently.
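
A small sketch of the double call could look like this. To keep it compilable without the XML-RPC library, it uses a cut-down, hypothetical interface IPostDeleter with the same deletePost signature as above, and an in-memory FakeBlog that only mimics the two-step behaviour for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cut-down version of the XML-RPC interface shown above.
public interface IPostDeleter
{
    bool deletePost(string blogid, string username, string password, int postid);
}

public static class PostCleanup
{
    // First call moves the post to the trash, second call removes it for good.
    // Assumption: a false return value means the call did not succeed.
    public static bool DeletePermanently(IPostDeleter proxy, string blogId,
        string user, string password, int postId)
    {
        bool trashed = proxy.deletePost(blogId, user, password, postId);
        if (!trashed) return false;
        return proxy.deletePost(blogId, user, password, postId);
    }
}

// In-memory stand-in for a blog, for illustration and testing only.
public class FakeBlog : IPostDeleter
{
    public readonly HashSet<int> Published = new HashSet<int>();
    public readonly HashSet<int> Trash = new HashSet<int>();

    public bool deletePost(string blogid, string username, string password, int postid)
    {
        if (Published.Remove(postid))
        {
            Trash.Add(postid); // first delete: move to trash
            return true;
        }
        return Trash.Remove(postid); // second delete: remove permanently
    }
}
```

Against a real blog, the proxy generated by the XML-RPC library would take the place of FakeBlog.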

Media deletion

There is no method to delete items from the media gallery 🙁 Fortunately, items within the gallery behave like pages. So if you implement and call the deprecated wp.deletePage method, you can achieve what you want (remember to delete twice).

Categories and Tags

Both can be managed with the interface for terms. The string passed for the parameter “taxonomy” decides which one is addressed: it can be “category” or “post_tag”.
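
Since the source categories end up as both categories and tags, the same check for missing terms can be run once per taxonomy. The following is only a sketch: TermPlanner and MissingTerms are made-up names, and comparing term names case-insensitively is my assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: works out which terms still have to be created.
public static class TermPlanner
{
    public const string CategoryTaxonomy = "category";
    public const string TagTaxonomy = "post_tag";

    // Returns the wanted names that do not exist yet, without duplicates,
    // in the order in which they first appear.
    public static List<string> MissingTerms(IEnumerable<string> wanted,
        IEnumerable<string> existing)
    {
        var seen = new HashSet<string>(existing, StringComparer.OrdinalIgnoreCase);
        var missing = new List<string>();
        foreach (string name in wanted)
        {
            if (string.IsNullOrWhiteSpace(name)) continue;

            string trimmed = name.Trim();
            if (seen.Add(trimmed)) missing.Add(trimmed); // Add returns false if already known
        }
        return missing;
    }
}
```

The existing names would come from wp.getTerms called with TermPlanner.CategoryTaxonomy and then with TermPlanner.TagTaxonomy, and each missing name would be created with wp.newTerm for the respective taxonomy.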

Other than that, the WordPress API is pretty straight-forward and easy to use.

Download

The download contains an executable, which is the tool itself, and a folder with the complete source code.

Migrate SharePoint To WordPress