How I Made It - Francis' Geolocation API


Click here to go back to the demo

Project Highlights

How It Works

Sourcing the Data

The most important part of my API is the data it serves. The Regional Internet Registries (RIRs) actually publish IP Geolocation data every 24 hours in the form of a text file. I have a GitHub Actions pipeline which runs every 24 hours to retrieve the latest IP data from each RIR.

To improve the accuracy of the data it's important to supplement the data from the RIRs with data from other primary sources. I use several public Geofeeds for this purpose. Geofeeds are comma-separated value files containing IP ranges and location information. Geofeeds were first proposed by Google in 2013.

GitHub pipeline for IP Geolocation

Screenshot of my GitHub Action which re-generates my IP geolocation database every 24 hours

Writing My Own MMDB Library for Go

For the database I chose to use the MMDB file format, this format is widely used throughout the IP geolocation industry and chosing it represented an opportunity to learn a new binary file format.

Shortly after beginning this project I discovered there was no suitable MMDB library written in Go, the reference implementation for parsing MMDB files is written in Perl.

I therefore decided to write my own library for parsing and generating MMDB files, I used the MMDB file format reference as my guide. You can find the reference here https://maxmind.github.io/MaxMind-DB/

After a substantial amount of work I successfully finished my library. It's called lib-mmdb, you can find it on my GitHub.

My library is very easy to use. You can generate a new MMDB file in just a few lines of Go code.

db := mmdb.NewMMDB()
_, cidr, err := net.ParseCIDR("1.0.1.0/24")
if err != nil {
    panic("invalid CIDR")
}
db.Insert(cidr, field.String("CN"))
db.Finalise()
db.Bytes() // The output of db.Bytes() can be written to disk as a .mmdb file

The above sample code shows how to create an MMDB object, add an IP block and serialise it using my Go MMDB library.

An interesting property of MMDB files is you can store lots of different data types inside the database. In the previous example I associated a single IP block with a string but my MMDB library supports all MMDB data types, the data type that's most commonly used is map.

myMap := field.NewMap()
myMap.Put(field.String("country"), field.String("GB"))
myMap.Put(field.String("city"), field.String("London"))
_, cidr, err := net.ParseCIDR("2001:67c:2d20::/48")
if err != nil {
    panic("invalid CIDR")
}
db.Insert(cidr, myMap)

The above example shows how to add an IPv6 block to the database and associate a map with it.

How MMDB Files Work

I would summarise MMDB files by saying they are a binary format for representing trie data structures.

Tries are also known as prefix trees because they are very efficient at finding keys sharing a particular prefix, the search time only depends on the length of the prefix. Searching for a prefix in a trie is an O(N) operation where N is the length of the prefix. Each row in the Internet Routing Table is an IP prefix and it's for this reason Internet Routing Tables are normally implemented using tries.

An MMDB file at its core is just a serialised trie where each node is some geolocation data for a particular IP prefix.

Re-Creating the Database Every Day

The next important aspect of this project is called the IP Intelligence Pipeline. It takes all of the IP data and Geofeeds in their various forms and produces the MMDB database file. The RIRs all publish their data in slightly different formats which added some complexity to the IP Intelligence Pipeline. You can find the source code for the IP Intelligence Pipeline on my GitHub here https://github.com/FrancisMcN/ipi-pipeline

Each pipeline stage is shown in the diagram below.

IP Intelligence Pipeline stages to generate the MMDB file

The stages the IP Intelligence Pipeline goes through to generate an MMDB file every day.

Once the database file has been generated it's uploaded to DigitalOcean Spaces, an Amazon S3 compatible object store. Only the latest MMDB file is served from my API but I keep all of the previous versions to enable potential interesting future projects. One such project I'm considering is tracking historical IP address movements.

MMDB files in the object store

The MMDB files inside DigitalOcean's object storage service

Serving the MMDB Database Using My API

The API is served by a lightweight Go service. You can find the source code for the API service on my GitHub here https://github.com/FrancisMcN/ipi-api. When the API service launches it finds the latest MMDB file from object storage, downloads it and begins to answer requests. The service's startup logs are shown below.

[london-ipi-api] [2024-08-09 12:36:13] 2024/08/09 12:36:13 Finding DB files in Spaces ...
[london-ipi-api] [2024-08-09 12:36:13] 2024/08/09 12:36:13 Finding latest DB file ...
[london-ipi-api] [2024-08-09 12:36:13] 2024/08/09 12:36:13 Found latest DB file.
[london-ipi-api] [2024-08-09 12:36:13] 2024/08/09 12:36:13 Downloading latest DB file ...
[london-ipi-api] [2024-08-09 12:36:14] 2024/08/09 12:36:14 Downloaded latest MMDB file (ipi/db-2024-08-09-T12-33-58.mmdb) from Spaces.

API Service startup logs

You can access the API directly from here https://ip.francismcnamee.com (I recommend you access the API using the UI I created on the previous page though). To find location information about a particular IP address you can simply send a GET request to https://ip.francismcnamee.com/<IP Address>. An example request sent using curl is shown below.

$ curl https://ip.francismcnamee.com/8.8.4.4
{"ip":"8.8.4.4","country":"US"}

Example API call using curl

Building the UI

To make the API easier to use, I created a simple UI which can be found on the previous page. My API returns the ISO 3166-1 alpha-2 country code of each country, rather than the country's full name, mainly to reduce the size of the generated MMDB files.

Geolocation API's UI

The UI to access my geolocation API

In my demo I used the Intl.DisplayNames Javascript API to convert each country code into the country's actual name. Unfortunately, that API isn't compatible with all browsers, therefore I pre-calculated the country names ahead of time and hard-coded them to maximise browser compatibility. The code to map country codes to country names is shown below.

function generateCountries() {
    // Generate country names
    let getCountryNames = new Intl.DisplayNames(['en'], {type: 'region'});

    let output = {};
    for (let i = 0; i < 26; i++) {
        for (let j = 0; j < 26; j++) {
            
            let code = (String.fromCharCode(97 + i) + String.fromCharCode(97 + j)).toUpperCase();
            let countryName = getCountryNames.of(code);

            if (countryName.length > 2) {	
                output[code] = countryName;
            }
        }
    }
    console.log(JSON.stringify(output, null, 4))
}

Javascript code to map country codes to country names

The output of the previous function is a Javascript object which maps all country codes to country names. An excerpt of the output is shown below.

// Generated by running the generateCountries() method above. Pre-generated
// ahead of time to maximize compatibility across browsers.
let countries = {
    "AC": "Ascension Island",
    "AD": "Andorra",
    "AE": "United Arab Emirates",
    "AF": "Afghanistan",
    "AG": "Antigua & Barbuda",
    "AI": "Anguilla",
    "AL": "Albania",
    "AM": "Armenia",
    ...
}

I added a few more finishing touches to my demo, such as converting each country code into an Emoji flag for the respective country and adding the appropate 'the' prefix before country names such as the United States and the United Kingdom to make the output grammatically correct.

I hope you enjoy using my geolocation API as much as I enjoyed making it.