IP address geolocation and distance calculation
During a discovery conversation with a client, an interesting question came up. The client hosted seminars all over the globe and listed them on a content page of their website. During feature brainstorming someone said, “Wouldn’t it be cool if we could show visitors the seminar closest to their location?” The client quickly began dismissing the idea because he was worried about interrupting users, either by making them enter a zip code or by triggering the browser’s “current location” prompt, which seems to set off invasion-of-privacy panic in some users.
I wanted to dig into the idea a little further, so as the brainstorming session continued I worked through the problem in my head. I knew that IP address geolocation could provide a rough location, at least that of the user’s ISP, but I didn’t yet know how accurate it would be. I told the client not to dismiss the idea completely and that I would build some tests to see whether we could actually accomplish it. After the session I got more information from the client about how seminar information was stored and updated, how far apart the seminars were on average, and a few other details that would help me come up with a method.
Here is the data I gathered:
- Seminar locations are very spread out (Denver, New York, Beijing, London, etc.)
- Seminar data (including its location) is entered into a CMS in plain text within a table
- Locations were typically entered as City, State or City, Country
- The location cell in each row of the table had a class for CSS styling
- Most of the traffic to this page was desktop, not mobile
Based on this data, I knew step one was to test how accurate IP address lookups were, so I created this A/B test. Each side queries a different IP address database through an API and returns a City, State. I am still hoping to gather more data on which method is more accurate, so while you are there please follow the instructions and vote.
Second, I needed to calculate a distance between where I thought you were and some other location. Since the table cell holding each seminar’s location had a class, I knew I could grab the location from every row of the table and use it in the calculation. My first approach was the Google Maps API: I built one version using the Directions API and one using the Distance Matrix API. The results were the same, since both use the same method of calculation, but because I could feed the Distance Matrix a whole list of destinations at once, it seemed like my best option.
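To illustrate the "grab the location from every row" step, here is a minimal sketch in Python using only the standard library. It is not the code used on the site: the class name `seminar-location` and the sample table are assumptions made up for the example, since the actual markup is not shown here.

```python
from html.parser import HTMLParser


class LocationCellParser(HTMLParser):
    """Collects the text of every <td> whose class list contains css_class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.locations = []
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "td" and self.css_class in classes:
            self._in_cell = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self.locations.append("".join(self._buffer).strip())
            self._in_cell = False


def extract_locations(html, css_class="seminar-location"):
    """Return the plain-text location of each matching table cell, in order."""
    parser = LocationCellParser(css_class)
    parser.feed(html)
    parser.close()
    return parser.locations


# Hypothetical markup standing in for the CMS-generated seminar table.
SAMPLE_TABLE = """
<table>
  <tr><td>Spring seminar</td><td class="seminar-location">Denver, CO</td></tr>
  <tr><td>Summer seminar</td><td class="seminar-location">London, United Kingdom</td></tr>
</table>
"""

locations = extract_locations(SAMPLE_TABLE)
print(locations)
```

The resulting list of "City, State" or "City, Country" strings is what would then be handed to a distance service as the set of destinations.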
While this wasn’t precise, it worked well enough to gauge whether Denver was closer to you than New York, and so on. With continued testing, however, I ran across a very serious problem with this method. The Google Maps Directions and Distance Matrix calculations are based on a mode of travel. Even though you can get “flying directions” through Google Maps (really it just wants to sell you a ticket through Google Flights), you cannot get flying directions through the API. If the two locations could not be connected by driving, walking, biking, or public transportation, no distance was returned. Since many of the seminars were overseas, this method wasn’t a very good solution. In my A/B test, method A (left side) still uses this distance calculation; try inputting a destination across the Atlantic Ocean.
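The failure shows up in the Distance Matrix response itself: each origin/destination pair comes back as an element with its own status, and a pair with no route reports `ZERO_RESULTS` instead of a distance. The sketch below reads an already-parsed JSON response and marks unreachable destinations as `None`. The sample response is hand-written for illustration (the distance value is not real API output), and `distances_in_km` is a hypothetical helper name.

```python
def distances_in_km(response):
    """Map each destination to its distance in km from the first origin.

    Destinations with no route (element status other than "OK",
    for example "ZERO_RESULTS") are mapped to None.
    """
    if response.get("status") != "OK":
        raise ValueError("request failed: %s" % response.get("status"))

    elements = response["rows"][0]["elements"]  # one row per origin
    result = {}
    for destination, element in zip(response["destination_addresses"], elements):
        if element.get("status") == "OK":
            # distance.value is expressed in meters
            result[destination] = element["distance"]["value"] / 1000.0
        else:
            result[destination] = None
    return result


# Hand-written sample in the shape of a Distance Matrix JSON response.
# The numbers are illustrative, not actual API output.
SAMPLE_RESPONSE = {
    "status": "OK",
    "origin_addresses": ["Denver, CO, USA"],
    "destination_addresses": ["New York, NY, USA", "London, UK"],
    "rows": [
        {
            "elements": [
                {"status": "OK", "distance": {"text": "2,860 km", "value": 2860000}},
                {"status": "ZERO_RESULTS"},
            ]
        }
    ],
}

distances = distances_in_km(SAMPLE_RESPONSE)
print(distances)
```

Any destination that maps to `None` here is one the travel-based calculation cannot rank, which is exactly the case for seminars on another continent.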
A little more research brought me to the Haversine formula, a trigonometric method of calculating the great-circle distance between two points on the surface of a sphere. While the math is a little complex, once broken down it gives me an “as the crow flies” distance between any two latitude and longitude coordinates. Now all I needed to do was turn addresses into latitude/longitude pairs using the Google Maps Geocoding Service API and apply the formula to the results. The result deviates slightly from the true distance (somewhere near 0.3%) because the earth is not a perfect sphere, but it is plenty accurate for this purpose. It is actually a better fit than the Distance Matrix method, since I am really concerned with straight-line distance, not driving distance on often indirect roads.
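As a rough sketch of the calculation, the function below implements the Haversine formula and uses it to pick the nearest seminar. The coordinates are approximate city-center values typed in by hand rather than geocoder output, the mean earth radius of 6371 km is an assumed constant, and the names `haversine_km` and `closest_seminar` are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the earth is treated as a perfect sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)

    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the value slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_seminar(user_latlon, seminars):
    """Return the name of the seminar with the smallest distance to the user."""
    return min(seminars, key=lambda name: haversine_km(*user_latlon, *seminars[name]))


# Approximate city-center coordinates (latitude, longitude), for illustration.
SEMINARS = {
    "Denver": (39.7392, -104.9903),
    "New York": (40.7128, -74.0060),
    "London": (51.5074, -0.1278),
    "Beijing": (39.9042, 116.4074),
}

user = (41.8781, -87.6298)  # roughly Chicago, standing in for an IP lookup result
print(closest_seminar(user, SEMINARS))
for name, latlon in SEMINARS.items():
    print(name, round(haversine_km(*user, *latlon)), "km")
```

Because the formula needs only two coordinate pairs, it returns a distance for every seminar, including the ones across an ocean.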
The A/B test now uses the Haversine formula for distance calculation on method B (the right side).