Ruby, Rails and Google Sitemaps

Setting up a Google sitemap is an easy way to help Google notice your site. A sitemap is just a simple XML file that lists every URL you want Google to know about. Sitemaps are especially useful if...

  • You have dynamic content.
  • Your site is new and Google is unaware of it.
  • You use a lot of AJAX or Flash.

You also get the added benefit of seeing where Googlebot looked last, where it encountered errors, and your site's top search keywords.

So it's helpful, but is it easy to set up? If you're using Ruby on Rails (or any other Ruby-based framework) it's cake!

Step 1: Create a script (RAILS_ROOT/script/sitemap)

This script will collect all relevant URLs and create a file at RAILS_ROOT/public/sitemap.xml that contains info about each URL. For example, let's pretend we have a site devoted to hippo pictures. Our script would look like this...

#!/usr/bin/env ruby

ENV['RAILS_ENV'] ||= "production"

Dir.chdir(File.expand_path(File.dirname(__FILE__) + "/..")) # Change current directory to RAILS_ROOT
require "./config/environment" # Start up Rails (explicit ./ because Ruby 1.9.2+ no longer searches the current directory)

# These two lines make life super easy... They let you call url_for and named routes outside of a controller or view
include ActionController::UrlWriter
default_url_options[:host] = 'www.hippos-are-awesome.com'

filename = "#{RAILS_ROOT}/public/sitemap.xml"

hippo_pics = HippoPic.find(:all) # Such a wonderful collection

File.open(filename, "w") do |file|
  xml = Builder::XmlMarkup.new(:target => file, :indent => 2)

  # Write the XML declaration, then one <url> entry per hippo picture
  xml.instruct!
  xml.urlset "xmlns" => "http://www.sitemaps.org/schemas/sitemap/0.9" do
    for hippo_pic in hippo_pics
      xml.url do
        xml.loc url_for(:controller => "hippos", :action => "show", :id => hippo_pic.id)
        xml.lastmod hippo_pic.updated_at.xmlschema
        xml.changefreq "weekly"
        xml.priority 0.5
      end
    end
  end
end

For more info about what lastmod, changefreq and priority mean in the sitemap, see Google's sitemap documentation. Basically, they tell Google when a URL last changed, how often it is likely to change, and how important it is relative to the other URLs on your site.
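The script above hard-codes "weekly" for every picture. As a rough sketch of how changefreq could instead follow each record's age, here is a small helper. The name changefreq_for and the day thresholds are assumptions made up for illustration; only the returned strings come from the sitemap protocol's allowed changefreq values.

```ruby
SECONDS_PER_DAY = 86_400.0

# Hypothetical helper: guess a changefreq value from how long ago a record
# was last updated. The thresholds are arbitrary; tune them for your site.
def changefreq_for(updated_at, now = Time.now)
  age_in_days = (now - updated_at) / SECONDS_PER_DAY
  if age_in_days < 1
    "daily"
  elsif age_in_days < 30
    "weekly"
  elsif age_in_days < 365
    "monthly"
  else
    "yearly"
  end
end

now = Time.utc(2008, 10, 20)
puts changefreq_for(Time.utc(2008, 10, 19, 12), now) # updated half a day ago
puts changefreq_for(Time.utc(2008, 10, 1), now)      # updated 19 days ago
puts changefreq_for(Time.utc(2007, 1, 1), now)       # updated over a year ago
```

Inside the loop you could then write xml.changefreq changefreq_for(hippo_pic.updated_at) in place of the fixed string.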

Step 2: Create a daily or weekly cronjob to run the sitemap script

Just switch to the user that runs your Ruby apps and add this to its crontab.

20 2 * * * PATH_TO_RAILS_APP/script/sitemap # Runs the sitemap script every day at 2:20 AM

Step 3: Let Google know about your sitemap

Head over to Google's webmaster tools and follow the instructions on how to point Google to your sitemap.

That's it. A few other things to consider:

  • Gzip your sitemap. Google can read gzipped sitemaps just fine and you save on bandwidth.
  • If you have more than 50,000 URLs, you need to split your sitemap into several files and list them in a sitemap index file.
  • Other search engines (like Yahoo) can take Google-style sitemaps too.
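The gzip and splitting advice can be combined in one pass. The following is a minimal sketch, not part of the original script: it assumes you already have a plain array of URL strings, and the names write_sitemaps, sitemapN.xml.gz and sitemap_index.xml are made up for illustration. It writes the XML by hand with Ruby's standard Zlib library rather than Builder so that it runs without Rails. The index file uses the sitemapindex element from the sitemap protocol, and that index is the file you would point Google at.

```ruby
require "zlib"
require "tmpdir"

MAX_URLS_PER_FILE = 50_000 # per-file URL limit from the sitemap protocol
SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Escape the characters XML reserves (URLs often contain "&").
def xml_escape(text)
  text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
      .gsub('"', "&quot;").gsub("'", "&apos;")
end

# Write one gzipped sitemap file containing the given URLs.
def write_sitemap(path, urls)
  Zlib::GzipWriter.open(path) do |gz|
    gz.puts '<?xml version="1.0" encoding="UTF-8"?>'
    gz.puts %(<urlset xmlns="#{SITEMAP_XMLNS}">)
    urls.each { |url| gz.puts "  <url><loc>#{xml_escape(url)}</loc></url>" }
    gz.puts "</urlset>"
  end
end

# Split urls into chunks, write each chunk to its own gzipped file, then
# write an index that lists every file. Returns the sitemap file names.
def write_sitemaps(dir, base_url, urls, per_file = MAX_URLS_PER_FILE)
  names = []
  urls.each_slice(per_file).with_index(1) do |chunk, number|
    name = "sitemap#{number}.xml.gz"
    write_sitemap(File.join(dir, name), chunk)
    names << name
  end

  File.open(File.join(dir, "sitemap_index.xml"), "w") do |file|
    file.puts '<?xml version="1.0" encoding="UTF-8"?>'
    file.puts %(<sitemapindex xmlns="#{SITEMAP_XMLNS}">)
    names.each do |name|
      loc = xml_escape("#{base_url}/#{name}")
      file.puts "  <sitemap><loc>#{loc}</loc></sitemap>"
    end
    file.puts "</sitemapindex>"
  end
  names
end

# Usage: 5 URLs at 2 per file produce 3 sitemap files plus the index.
Dir.mktmpdir do |dir|
  host = "http://www.hippos-are-awesome.com"
  urls = (1..5).map { |id| "#{host}/hippos/#{id}" }
  p write_sitemaps(dir, host, urls, 2)
end
```

In the real script you would pass "#{RAILS_ROOT}/public" as the directory and build the URL list with url_for, as in Step 1.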
