In MySQL 5.7.5, we introduced new functions for encoding and decoding Geohash data. Geohash is a system for encoding longitude and latitude coordinates in the WGS 84 coordinate system as a text string, and for decoding such a string back into coordinates. In this blog post we will take a brief look at a simple example to explain how geohash works.
Where on earth is “u5r2vty0”?
Imagine you get an email from your friend, telling you that there is free food at “u5r2vty0”. But where on earth is “u5r2vty0”?
The first step in converting “u5r2vty0” to latitude and longitude is to decode the text string into its binary representation. Geohash uses its own base32 alphabet (the digits 0-9 and the lowercase letters except a, i, l and o), where each character stands for five bits. You can find the full character mapping in several online resources. The decoded text string is as follows:
“u5r2vty0” = 11010 00101 10111 00010 11011 11001 11110 00000
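The character-to-binary step can be reproduced with a few lines of Python. This is only a minimal sketch for illustration, not MySQL's implementation; the names `GEOHASH_ALPHABET` and `geohash_to_bits` are made up for this example.

```python
# The geohash base32 alphabet: digits plus lowercase letters, without a, i, l, o.
# A character's position in this string is its 5-bit value.
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_to_bits(geohash):
    """Return the geohash as 5-bit groups, one group per character."""
    groups = []
    for char in geohash.lower():
        # index() raises ValueError for characters outside the alphabet.
        value = GEOHASH_ALPHABET.index(char)
        groups.append(format(value, "05b"))
    return " ".join(groups)


print(geohash_to_bits("u5r2vty0"))
# 11010 00101 10111 00010 11011 11001 11110 00000
```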
The binary digits are then used to repeatedly divide the earth in two, increasing the accuracy with each digit used. Without using any digits, all we know is that the location is somewhere in the range [-180, 180] longitude and [-90, 90] latitude, which is anywhere on the earth! This won’t help you much in finding the food. Note that longitude 0 is the IERS Reference Meridian, and latitude 0 is the equator.
The first digit will halve the longitude, where ‘0’ means discarding the right half and ‘1’ means discarding the left half. In this example, we discard the left half since the first digit is a ‘1’. Now we know that the food is located somewhere in the range [0, 180] longitude and [-90, 90] latitude, and thus you know that you should not be heading towards the Americas.
The next digit halves the latitude, using the same rules. Since the second digit is also a ‘1’, we discard the lower half, and now we are in the range [0, 180] longitude and [0, 90] latitude. We still have a few digits left before we know where the free food is, but as you can see, the accuracy increases dramatically with the first two digits.
The third digit halves the longitude again, and by now you probably get the idea of how decoding a geohash string works. Can you find out where the free food is?
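If you would rather let a computer do the halving, the steps above can be sketched in Python as follows. This is an illustrative sketch, not the code MySQL uses. The function name `decode_geohash` and its `max_bits` parameter (which stops after a given number of binary digits) are assumptions made for this example.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def decode_geohash(geohash, max_bits=None):
    """Narrow the longitude and latitude ranges one binary digit at a time.

    Returns ((lon_min, lon_max), (lat_min, lat_max)).
    max_bits is a hypothetical convenience parameter for stopping early.
    """
    lon = [-180.0, 180.0]
    lat = [-90.0, 90.0]
    bits = "".join(format(GEOHASH_ALPHABET.index(c), "05b") for c in geohash.lower())
    if max_bits is not None:
        bits = bits[:max_bits]
    for position, bit in enumerate(bits):
        # Digits alternate: the first halves longitude, the second latitude, ...
        rng = lon if position % 2 == 0 else lat
        mid = (rng[0] + rng[1]) / 2
        if bit == "1":
            rng[0] = mid  # discard the left (or lower) half
        else:
            rng[1] = mid  # discard the right (or upper) half
    return tuple(lon), tuple(lat)


# Show how the ranges shrink after 0, 1, 2 and 3 digits.
for n in range(4):
    print(n, decode_geohash("u5r2vty0", max_bits=n))

# Decode all 40 digits to get the final cell.
print(decode_geohash("u5r2vty0"))
```

The result is a pair of ranges rather than a single point, because a geohash always describes a cell; the center of the cell is a common choice when a single coordinate is needed.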
Properties and uses of Geohash
- Since a coordinate tuple is represented as a string, it is possible to index and search for geographic data in one-dimensional indexes (B-tree for example).
- As you might have figured out from the example, two geohash strings with the same prefix will be close to each other. This makes proximity searches easy, but remember that the converse does not always hold: two geohash strings with different prefixes can be located right next to each other. Imagine two locations on either side of the IERS Reference Meridian, only a few meters apart. The binary representation of one will begin with a ‘0’ and the other with a ‘1’, so the two strings have no prefix in common.
- The accuracy of the geohash string depends on the number of characters. The more characters you have, the more accurate the position. Thus you can remove characters from the end of the string to gradually reduce its size and accuracy as required.
- Geohash has been proposed for use in geotagging.
- Since nearby locations (often) share a prefix, it’s simple to spread geohashes across different partitions so that nearby locations end up on the same partition.
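The prefix properties in the list above can be tried out with a small encoder. Again, this is only a sketch under our own assumptions, not MySQL's implementation: `encode_geohash` is a hypothetical helper, and the coordinates are arbitrary sample values, including two points just west and just east of the IERS Reference Meridian.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode_geohash(lon, lat, length):
    """Encode a longitude/latitude pair as a geohash of `length` characters."""
    lon_rng = [-180.0, 180.0]
    lat_rng = [-90.0, 90.0]
    chars = []
    value = 0
    for position in range(length * 5):
        # Even positions refine longitude, odd positions refine latitude.
        rng, coord = (lon_rng, lon) if position % 2 == 0 else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bit = 1 if coord >= mid else 0
        if bit:
            rng[0] = mid
        else:
            rng[1] = mid
        value = (value << 1) | bit
        if position % 5 == 4:
            # Five bits collected: emit one base32 character.
            chars.append(GEOHASH_ALPHABET[value])
            value = 0
    return "".join(chars)


# Dropping characters from the end gives the same result as encoding
# with fewer characters: a larger cell that contains the smaller one.
full = encode_geohash(10.4454, 63.44475, 8)
print(full, full[:5], encode_geohash(10.4454, 63.44475, 5))

# Two points a couple of meters apart, on either side of longitude 0,
# do not even share their first character.
west = encode_geohash(-0.00001, 51.5, 8)
east = encode_geohash(0.00001, 51.5, 8)
print(west, east)
```

For a proximity search this means that a prefix match finds everything in the same cell, but locations just across a cell border need to be looked up separately.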