Back to static with Jekyll

A return to simpler times

The last version of my site was far from complex: a few custom PHP files and a MySQL database. It served its purpose for a few years and worked well for the short posts I tended to write. Last year, during some kind of Dreamhost migration, something happened to the MySQL database that stored the posts. That was enough to get me thinking about moving the blog back to a statically generated site. By then I had been using Jekyll for a couple of years at slide to rock. I liked it because it was simple, let me manage posts as Markdown files, and made it easy to keep the site backed up. It also reminded me of the C# program I wrote years ago to generate my website while learning Mono.

I recently got around to redoing the site.

Migrating old blog posts

I used the YAML export feature of phpMyAdmin to download my old posts. I then set out to split the file into separate Markdown files to drop into the _posts directory. I started in Ruby but became frustrated when I couldn't remember any of the string parsing and file I/O methods offhand. Rather than look them up, I wrote the program in Objective-C and soon had a few hundred post files.
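For anyone who would rather stay in Ruby, here is a minimal sketch of that conversion step. It is not the Objective-C program described above. It assumes the export parses to an array of hashes, and the keys 'id', 'title', 'body' and 'date' are hypothetical column names, as are the helper names.

```ruby
require 'yaml'
require 'fileutils'

# Turn a title into a URL-friendly slug: "Visiting Boston" -> "visiting-boston".
def slugify(title)
  title.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Build the text of one post file: YAML front matter, then the body.
# The 'id', 'title' and 'body' keys are assumed column names.
def render_post(record)
  front_matter = {
    'layout'    => 'post',
    'title'     => record['title'],
    'permalink' => "/archives/#{record['id']}/#{slugify(record['title'])}"
  }
  # Hash#to_yaml already emits the opening "---" line.
  "#{front_matter.to_yaml}---\n\n#{record['body']}\n"
end

# Jekyll expects post files to be named YYYY-MM-DD-slug.md.
# The 'date' value is assumed to start with an ISO date, e.g. "2009-06-14 10:00:00".
def post_filename(record)
  "#{record['date'].to_s[0, 10]}-#{slugify(record['title'])}.md"
end

# Write one Markdown file per record into the target directory.
def write_posts(records, dir = '_posts')
  FileUtils.mkdir_p(dir)
  records.each do |record|
    File.write(File.join(dir, post_filename(record)), render_post(record))
  end
end
```

With an export saved as, say, posts.yml (a made-up name), the call would be something like write_posts(YAML.safe_load(File.read('posts.yml'))).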

Mapping old URLs

On my old site, I used Apache's rewrite rules for permalinks like:

http://romej.com/archives/717/visiting-boston

I wanted to maintain the same link structure for old posts on the migrated site and started by setting the YAML front matter of the generated posts to:

---
layout: post
title: "Visiting Boston"
permalink: /archives/717/visiting-boston
---

This was close to what I needed, but Jekyll generates, as it should, the actual HTML at:

_site/archives/717/visiting-boston/index.html

I forked Jekyll and added some settings to help with generating URLs without the trailing slash, essentially making the last path component a file and not a directory. Sticking with the example above, the --noslash option generates a file at the following path:

_site/archives/717/visiting-boston.html

I added noslash: true to the site's _config.yml instead of passing it on the command line every time. My plan was to rely on Nginx's rewrite rules to handle mapping the extensionless URLs to the actual .html files.

One issue I ran into was testing locally using Jekyll's built-in WEBrick server. My generated pages were referencing posts that looked like directories (links that would work fine when rewritten on the server). I added a --prewrite option for local testing that simply generates the full path to each HTML file in the permalink.
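To make the two options concrete, here is a small Ruby sketch of the path logic they imply. It is an illustration only, not the code from the fork; output_path and link_for are hypothetical names.

```ruby
# Where the generated HTML lands under _site for a given permalink.
# Without noslash the last path component becomes a directory holding
# index.html; with noslash it becomes an .html file.
def output_path(permalink, noslash: false)
  clean = permalink.sub(%r{/\z}, '') # drop a trailing slash, if any
  if noslash
    File.join('_site', "#{clean}.html")
  else
    File.join('_site', clean, 'index.html')
  end
end

# The URL written into generated pages. With prewrite the link points
# straight at the .html file, so it works without server-side rewriting.
def link_for(permalink, prewrite: false)
  clean = permalink.sub(%r{/\z}, '')
  prewrite ? "#{clean}.html" : clean
end

puts output_path('/archives/717/visiting-boston')
puts output_path('/archives/717/visiting-boston', noslash: true)
puts link_for('/archives/717/visiting-boston', prewrite: true)
```

The first two lines printed match the two _site paths shown earlier; the third is the kind of link the local WEBrick server can resolve on its own.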

You can see the commit with these changes at GitHub.

Rewriting URLs with Nginx

My Nginx configuration got a little funky because I had some paths that I wanted to behave as usual (e.g., /iphone resolves to /iphone/index.html). Here's most of my nginx.conf. If you see anything below that makes you go "dude, no", let me know.

server {
  listen 80;
  server_name www.romej.com;
  rewrite ^ http://romej.com$uri permanent;
}

server {
  listen 80;
  server_name romej.com;
  autoindex on;
  #rewrite_log on;
  root html/romej.com/;
  location / {
    index index.html;
    rewrite ^/rss/blog$ /rss/blog.xml break;
  }

  #
  # The following locations add .html to urls, while also
  # making sure direct requests to a-post.html or /a-post/
  # are redirected to /a-post
  # 
  # By specifying locations we can avoid doing if (-f) tests.
  #

  # new permalinks look like /2012/03/some-post
  # uggh, not sure how to match [\d]+ or [\d]{4}
  location /20 {
    rewrite ^(.+)/$ $1 permanent;
    rewrite ^(.+)\.html$ $1 permanent;
    rewrite ^(.+)$ $1.html break;
  }
  
  # old permalinks and /archives page
  location /archives {
    rewrite ^(.+)/$ $1 permanent;
    rewrite ^(.+)\.html$ $1 permanent;
    rewrite ^(.+)$ $1.html break;
  }
}

The first two rewrite lines in the location blocks make sure that any references to /some-post/ or /some-post.html are 301 redirected to /some-post, the canonical URL.
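The effect of the three rules in each location block can be modeled in a few lines of Ruby. This is only a sketch of the matching order, not how Nginx evaluates rewrites internally; resolve is a hypothetical helper, and the dot before html is treated as a literal dot.

```ruby
# Model of the three rewrite rules, tried in order.
# Returns [:redirect, url] for a 301, or [:serve, path] for the file
# that is served after the internal rewrite.
def resolve(uri)
  if (m = uri.match(%r{\A(.+)/\z}))         # rule 1: strip a trailing slash
    [:redirect, m[1]]
  elsif (m = uri.match(/\A(.+)\.html\z/))   # rule 2: strip an explicit .html
    [:redirect, m[1]]
  else                                      # rule 3: serve the .html file
    [:serve, "#{uri}.html"]
  end
end

%w[
  /archives/717/visiting-boston/
  /archives/717/visiting-boston.html
  /archives/717/visiting-boston
].each do |uri|
  action, target = resolve(uri)
  puts "#{uri} -> #{action} #{target}"
end
```

The first two requests come back as redirects to /archives/717/visiting-boston, and only the canonical URL ends up serving /archives/717/visiting-boston.html.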

One more thing: the archives

My old site had a simple archives page that listed every post, grouped by month, most recent first. I wrote a Jekyll plugin to group the posts by date so that I could use them in my archives layout file. To do this, I kept a list of dates, normalized to year and month, in a months array. Each month served as a key in the posts_by_month hash; the value was a list of posts written that month.

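  # Group posts by the month they were written.
  # Returns [months, posts_by_month].
  def group_by_month(posts)
    months = []          # month keys, in the order they are first seen
    posts_by_month = {}  # month key => posts written that month
    # Walk the posts in reverse so the most recent come first
    posts.reverse.each do |post|
      # Normalize the date to the start of its month
      key = Time.utc(post.date.year, post.date.month)
      if posts_by_month.key?(key)
        posts_by_month[key] << post
      else
        posts_by_month[key] = [post]
        months << key
      end
    end
    [months, posts_by_month]
  end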
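The same grouping can also be written with Ruby's Enumerable#group_by, since Ruby hashes keep their keys in insertion order. The sketch below is an alternative under that assumption, with a hypothetical Post struct standing in for Jekyll's post objects and the posts assumed to arrive oldest first.

```ruby
# Hypothetical stand-in for a Jekyll post; only title and date are needed here.
Post = Struct.new(:title, :date)

# Returns [months, posts_by_month], most recent month first.
def group_by_month(posts)
  posts_by_month = posts.reverse.group_by do |post|
    Time.utc(post.date.year, post.date.month)
  end
  # Hash keys come back in the order they were first inserted.
  [posts_by_month.keys, posts_by_month]
end

posts = [
  Post.new('Older post', Time.utc(2012, 2, 3)),
  Post.new('Visiting Boston', Time.utc(2012, 3, 1)),
  Post.new('Back to static', Time.utc(2012, 3, 20))
]

months, by_month = group_by_month(posts)
months.each do |month|
  puts month.strftime('%B %Y')
  by_month[month].each { |post| puts "  #{post.title}" }
end
```

This prints a March 2012 heading with its two posts (newest first), followed by February 2012, which is roughly the shape an archives layout would loop over.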

Finally, I used the generator output in my _layouts/archives.html layout template.

You can see the result on the archives page of this site.

Post and deploy

My site has about 600 pages and takes 4.2 seconds to generate. That's probably close to a best-case scenario, as I don't use any plugins for tags or related posts, etc.

I created a Rakefile in my site's root directory so that I could run some common tasks more easily. Here are some common Jekyll Rake tasks to base your own on. The rsync one, in particular, makes deploying your site easy. After the initial site copy, only changed files get updated every time you deploy.
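As a rough idea of what such a Rakefile can look like, here is a minimal sketch with a build task and an rsync deploy task. It is not my actual Rakefile: the server address and remote path are placeholders, rsync_command is a hypothetical helper, and the build command assumes a Jekyll release that has the build subcommand.

```ruby
require 'rake' # harmless inside a Rakefile; lets the file also load under plain ruby

# Placeholder values; replace with your own server and path.
site_dir = '_site/'
deploy_target = 'user@example.com:/var/www/example.com/'

# Build the rsync invocation as an argument list.
# -a preserves file attributes, -v is verbose, -z compresses in transit,
# --delete removes remote files that no longer exist locally.
def rsync_command(source, target)
  ['rsync', '-avz', '--delete', source, target]
end

desc 'Generate the site into _site'
task :build do
  sh 'jekyll', 'build' # older Jekyll releases generate with plain `jekyll`
end

desc 'Build, then copy changed files to the server'
task deploy: :build do
  sh(*rsync_command(site_dir, deploy_target))
end
```

Running rake deploy would then rebuild the site and let rsync transfer only the files that changed since the last deploy.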

Conclusion

It took me longer to do all of this than I expected, but it got addictively fun as it progressed. After enduring the Slicehost to Rackspace migration this year for another site, I wanted to make sure my personal site would be easy to migrate should something similar happen again. Another benefit is the natural redundancy you get from this setup: I have full copies of my site on Dropbox, on my local computer, and, of course, on the web server.