Building better short urls

To mix things up a bit I, Wesley Beary (aka geemus), a Software Engineer, will be taking the blog for a spin of a more technical nature.

At Plinky we needed short url support for our Twitter integration. I wasn’t quite satisfied with the ruby implementations I looked through, though.

Their basic approach was:
1. Create a random token for the url of some arbitrary length.
2. Do a db lookup to ensure this token is not already in use.
3. Record the record, subsequent lookups depend on this token being indexed.

I didn’t like the arbitrary token length and arbitrary mapping to records. You end up with longer tokens and eventually exhaust those available and get tied up in collisions. I wanted shorter links that grew as needed, even if it disallowed arbitrary tokens.

I decided to create links using the next shortest available token and use a natural mapping between links and tokens. Drawing inspiration from base conversion I came up with this:

module Base
	BASE_CHARACTERS = '0123456789BCDFGHJKLMNPQRSTVWXYZbcdfghjklmnpqrstvwxyz'

	def encode(int)
		str = ''
		while int > 0
			character = int % BASE_CHARACTERS.length
			int = int / BASE_CHARACTERS.length
			str = BASE_CHARACTERS[character..character] + str

	def decode(str)
		int = 0
		str.reverse.split('').each_with_index do |char, i|
			int += BASE_CHARACTERS.index(char) * (BASE_CHARACTERS.length ** i)

Easier to read copy can be found at gist.

To get a link’s token, you call Base.encode(link_id), and get a tokens link id you call Base.decode(token).

Links will be as short as they can be (we left out the vowels and special characters, for prettier urls and to prevent english words as tokens), map clearly, and use the available auto-incrementing id.

It took a couple near misses to get here, but I’ve been happy with it so far. Let me know if you have questions, if I’m overlooking anything or if you reimplement this in the wild.

This entry was posted in engineering, Plinky. Bookmark the permalink.

Leave a Reply