Tonight, I completed a second conversion of someone else’s Simple PHP Blog to WordPress.
As regular readers know, I developed a C# Windows app last year to do this for myself. However, it is certainly not production-ready, so I have had to assist two separate websites manually with their conversions. Their common issue is that they were combining more than one Simple PHP Blog directory into a single WordPress instance.
The trickiest parts of doing this are the category list and the internal links.
With the category list, as in this recent conversion, the two source directories had the same categories between them, but in a slightly different hierarchy. That means manually merging the generated SQL scripts into one file, renumbering the category IDs in the second dataset, and removing the TRUNCATE statements that would otherwise wipe the first dataset.
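The renumbering step could be sketched roughly as follows. This is only an illustration in Python, not what my C# app does: the function name `merge_categories` and the plain ID-to-name dictionaries are hypothetical, categories are matched by name only, and the hierarchy (the part that actually differed) is ignored.

```python
def merge_categories(first, second):
    """Merge two {category_id: name} dictionaries (hypothetical structure).

    Returns (merged, remap), where remap maps each ID from the second
    dataset to the ID it should use in the combined script.
    """
    merged = dict(first)
    # Match categories by name, ignoring case; hierarchy is not considered.
    by_name = {name.lower(): cid for cid, name in first.items()}
    next_id = max(merged, default=0) + 1
    remap = {}
    for old_id, name in sorted(second.items()):
        key = name.lower()
        if key in by_name:
            # Same category already exists in the first dataset: reuse its ID.
            remap[old_id] = by_name[key]
        else:
            # New category: give it the next free ID.
            remap[old_id] = next_id
            merged[next_id] = name
            by_name[key] = next_id
            next_id += 1
    return merged, remap


merged, remap = merge_categories({1: "News", 2: "Travel"},
                                 {1: "Travel", 2: "Photos"})
print(remap)   # {1: 2, 2: 3}
print(merged)  # {1: 'News', 2: 'Travel', 3: 'Photos'}
```

The `remap` dictionary is what would drive the search and replace over the second dataset’s category IDs.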
With internal links, the problem is simple, but the resolution is awful: a post will link to earlier posts, but when generating the SQL script, I don’t know what the new IDs will be for those posts, and the blog owner may change the permalinks at some stage. In other words, I need to know a URL that doesn’t exist yet.
Currently, I find all occurrences of internal site links (manually, using a SELECT … LIKE query), generate the new WordPress blog by executing the generated SQL script, look up the new IDs through WordPress’s built-in search, manually update the script with them, and then re-run the modified script. I really need to find a better way, because this last blog had nearly 100 internal links, and my search and replace is slow.
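Finding the links is the part that looks easiest to automate. Here is a minimal sketch in Python, assuming the internal links look like index.php?entry=entryYYMMDD-HHMMSS; that URL pattern is my assumption and should be checked against the real post bodies. The function name `find_internal_links` and the dictionary of post bodies are hypothetical.

```python
import re

# Assumed link format: index.php?entry=entryYYMMDD-HHMMSS
ENTRY_LINK = re.compile(r"index\.php\?entry=(entry\d{6}-\d{6})")


def find_internal_links(posts):
    """Return (source_entry, target_entry) pairs for every internal link.

    posts is a hypothetical {entry_name: body_text} dictionary.
    """
    links = []
    for source, body in posts.items():
        for target in ENTRY_LINK.findall(body):
            links.append((source, target))
    return links


posts = {
    "entry090301-080000":
        'See <a href="http://example.com/index.php?entry=entry090215-123456">this</a>.',
    "entry090215-123456": "No links here.",
}
print(find_internal_links(posts))
# [('entry090301-080000', 'entry090215-123456')]
```

This only produces the list of links to fix; it does nothing about working out the new IDs.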
How do I fix this in code? Well, I have a pretty good idea, but I think it may be error-prone. Simple PHP Blog creates static files and names them after the timestamp. I use this timestamp in the course of generating the script, but I noticed tonight that sometimes the filename and the modified date don’t match up, especially if you’re not using GMT (UTC) date stamps. So if the timezone is screwy, the filename might be wrong.
If I can crack the timezone problem, I might be able to match the internal links to their target posts via their timestamps. While this isn’t perfect (multiple posts with the same timestamp, for example), it’s a start. Then I can guess what the URL will be.
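As a rough sketch of that idea in Python: parse the timestamp out of the linked filename, shift it by the blog’s UTC offset, and look for exactly one post with that date. Everything here is an assumption for illustration: the entryYYMMDD-HHMMSS filename format, a single fixed offset for the whole blog, the `posts_by_date` lookup, and the ?p=ID style of URL, which is WordPress’s default link form.

```python
from datetime import datetime, timedelta


def parse_entry_name(name):
    """Turn an assumed 'entryYYMMDD-HHMMSS' filename into a datetime."""
    return datetime.strptime(name, "entry%y%m%d-%H%M%S")


def match_link(link_name, posts_by_date, utc_offset_hours=0):
    """Return the post ID whose UTC date matches the linked filename.

    posts_by_date is a hypothetical {utc_datetime: [post_id, ...]} lookup.
    Returns None when nothing matches or when the match is ambiguous,
    so those links can still be fixed by hand.
    """
    local = parse_entry_name(link_name)
    utc = local - timedelta(hours=utc_offset_hours)
    candidates = posts_by_date.get(utc, [])
    return candidates[0] if len(candidates) == 1 else None


def guess_url(site, post_id):
    """Guess the new URL using the ?p=ID form (an assumption)."""
    return f"{site}/?p={post_id}"


posts_by_date = {datetime(2009, 2, 15, 2, 34, 56): [17]}
post_id = match_link("entry090215-123456", posts_by_date, utc_offset_hours=10)
print(guess_url("http://example.com", post_id))  # http://example.com/?p=17
```

Two posts sharing a timestamp come back as None rather than a guess, which at least makes the imperfect cases visible.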
Unfortunately, this means assigning static IDs in the SQL script, which I’m trying to avoid. Is there a better way?
I’ll look at this when I get a chance. Comments and suggestions are welcome.