Given that existing tools already shorten URLs, scan for malware, track statistics, et cetera, why would one want to roll their own? Because it is fun, and pretty darned easy. There are essentially four things to do in order to make a URL shortener:
Step One
Acquire a short domain name, preferably one that demonstrates your clever branding skills, like goo.gl. Then point the domain at your favourite web server, like nginx or Apache.
Step Two
Set up a database to hold your URLs. The database paradigm is not important so long as you have a means of generating a unique integer index for each row/document. The minimum data you will need to store is a unique integer, the full long URL, and the relative short URL.
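As a rough illustration of that minimum, here is a sketch in Python using SQLite from the standard library. The table name `urls` and the column names are hypothetical placeholders, not something the steps require; any store that hands out unique integers would do.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical `urls` table holding the three minimum fields."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS urls (
            id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique integer index
            long_url  TEXT NOT NULL UNIQUE,               -- the full long URL
            short_url TEXT UNIQUE                         -- the relative short URL
        )
        """
    )
    conn.commit()
```

Inserting a row and reading the cursor's `lastrowid` yields the unique integer that Step Four converts into the short URL.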
Step Three
Write a front controller to handle redirects. This is a very simple script, so I wrote mine by hand (no frameworks/libraries) for optimal performance. The script needs to detect the requested URI, look it up in the table (it is the short URL field/attribute), retrieve the corresponding long URL, and forward the user to it. If the URL is not found, have the controller forward the user to the management console, or the "404" of your choice.
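A minimal sketch of such a front controller, written as a plain Python WSGI callable with no framework. The `URLS` dictionary stands in for the database lookup, and `FALLBACK_URL` is an assumed address for the management console; both are hypothetical.

```python
# Hypothetical stand-in for the database table: short path -> long URL.
URLS = {"/g8": "https://example.com/some/very/long/path"}

# Assumed address of the management console, used when nothing matches.
FALLBACK_URL = "https://example.com/manage"


def front_controller(environ, start_response):
    """Redirect a known short path to its long URL, else to the fallback."""
    requested = environ.get("PATH_INFO", "/")
    target = URLS.get(requested, FALLBACK_URL)
    start_response("302 Found", [("Location", target), ("Content-Length", "0")])
    return [b""]
```

Any WSGI server can host this; for local experiments the standard library's `wsgiref.simple_server.make_server("", 8000, front_controller)` should be enough. A 302 is used here so that browsers do not cache the redirect permanently; a 301 is the other common choice.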
Step Four
Write management software for your table/collection. This software should at a minimum take a long URL, determine if it is already in the table/collection, and insert it if it is not. In order to determine the short URL, simply take the unique integer index from the row/document, and convert it to base62. Done!
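The core of that management step might look something like the following Python sketch. It assumes a hypothetical SQLite table named `urls` with an auto-incrementing integer `id` and a `long_url` column; the function names are made up for illustration.

```python
import sqlite3

# Digits first, then lowercase, then uppercase (the ordering used in Appendix II).
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_encode(n: int) -> str:
    """Convert a non-negative integer to its base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def shorten(conn: sqlite3.Connection, long_url: str) -> str:
    """Return the short code for long_url, inserting a row only if it is new."""
    row = conn.execute(
        "SELECT id FROM urls WHERE long_url = ?", (long_url,)
    ).fetchone()
    if row is None:
        cur = conn.execute("INSERT INTO urls (long_url) VALUES (?)", (long_url,))
        conn.commit()
        row = (cur.lastrowid,)
    return base62_encode(row[0])
```

Submitting the same long URL twice returns the same short code, because the lookup runs before the insert.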
Appendix I: Things To Do
- Verify the person adding URLs is a human and/or
- Add an authentication mechanism to the management software
- Implement a caching layer on the front controller
- Make the management software user-friendly
Appendix II: Why Base62?
If the numbers 0,1,2,3,4,5,6,7,8,9 are base10, the numbers 0,1 are base2, and the numbers and letters 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f are base16, then base62 is a number system made up of the numbers and letters:
0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z
In this system the numbers would have the following values ("..." is an indicator for the reader to fill in the next several numbers in the series):
- 1==1
- 2==2
- 3==3
- ...
- 9==9
- 10==a
- 11==b
- ...
- 60==Y
- 61==Z
- 62==10
- 63==11
- ...
- 100==1C
- 101==1D
- ...
- 1000==g8
- 1001==g9
- ...
- 1000000==4c92
What this lets us do is take the unique integer index from each row/document, and make a unique alphanumeric combination that represents our short URL.
I recently made a short URL at Google that was “5GSFR”. If Google is using base62, that number would be 84,101,627. That is a lot of short URLs.
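That figure can be checked by running the conversion in reverse. The sketch below is in Python and assumes the same digit ordering as the list above (digits, then lowercase, then uppercase); whether Google actually uses that ordering is not known.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def base62_decode(code: str) -> int:
    """Convert a base62 string back to the integer it represents."""
    value = 0
    for ch in code:
        # str.index raises ValueError if ch is not a base62 digit.
        value = value * 62 + ALPHABET.index(ch)
    return value


print(base62_decode("5GSFR"))  # 84101627
print(base62_decode("4c92"))   # 1000000
```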
Why not use another base? After all, there are other non-alphanumeric characters that are URL safe. From what I understand, they are not necessarily SMS URL safe, and base62 is “good enough”.
Addendum
This post was originally shared to Google+. The author has deliberately opted to not use the Google+ embed feature.