
I have 3 billion IPs stored in the following ip_location table.

CREATE TABLE ip_location (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        ip VARCHAR(15) NOT NULL,
        country VARCHAR(2),
        region_1 VARCHAR(255),
        region_2 VARCHAR(255),
        latitude DECIMAL(10, 6) NOT NULL,
        longitude DECIMAL(10, 6) NOT NULL,
        INDEX idx_ip (ip)
);

// example json data
{
        "ip": "1.2.3.4",
        "country": "US",
        "region_1": "California",
        "region_2": "San Francisco",
        "latitude": 37.769700,
        "longitude": -122.393300
}

The service we’re launching is popular with people all over the world.
I want to restrict some of our APIs to certain regions.
Looking up the user's IP in the database on every API call puts a burden on both the DB and the network,
so I want to preload the IP data into memory to reduce the load.

Could you please suggest an efficient way to store and access this kind of data?
I was planning to load all of the data into Redis, but I suspect that would require a lot of RAM. Am I wrong?

2 Answers


  1. You can store the IP addresses yourself, but a better idea may be to keep only recently used IP addresses in Redis (say, addresses you have received a request from within the last 30 minutes) and regularly evict addresses that have not sent a request for a while.

    You can query for the country of an IP address via

    https://api.country.is/<the IP address>
    

    Try it yourself with your IP.

    So the flow would be:

    1. If the IP address is in Redis, then infer the country from there and update its timestamp to now
    2. If the IP address is not in Redis, then query for it from api.country.is and store the resulting country as well as the timestamp in Redis
    3. Have a cron job that removes old references from Redis
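    The flow above can be sketched as follows. This is a minimal illustration assuming a 30-minute TTL: a plain dict stands in for Redis (in production you would use Redis GET/SETEX), and `lookup_country_remote` is a placeholder for the HTTP call to api.country.is.

```python
import time

CACHE_TTL = 30 * 60  # 30 minutes, as suggested above

# ip -> (country, last-used timestamp). A dict stands in for Redis here.
cache = {}

def lookup_country_remote(ip):
    # Placeholder for the HTTP call to https://api.country.is/<ip>.
    # A real implementation would parse the "country" field of the JSON response.
    return "US"

def get_country(ip, now=None):
    now = time.time() if now is None else now
    entry = cache.get(ip)
    if entry is not None:
        country, _ = entry
        cache[ip] = (country, now)       # step 1: hit, refresh timestamp
        return country
    country = lookup_country_remote(ip)  # step 2: miss, ask the API
    cache[ip] = (country, now)
    return country

def evict_stale(now=None):
    # Step 3: the "cron job" that drops entries unused for longer than the TTL.
    now = time.time() if now is None else now
    for ip in [ip for ip, (_, ts) in cache.items() if now - ts > CACHE_TTL]:
        del cache[ip]
```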
  2. You need to normalize this table better, and use better data types. This will keep the size of the data down.

    • IP addresses can be either v4 or v6. A v4 address fits in 32 bits; a v6 address fits in 128 bits, and the v6 address space includes all v4 addresses. So if you only need to support v4, store the data in binary(4); otherwise use binary(16).
    • Get rid of the id column, it’s not necessary as the ip column should be unique, and it’s small. Surrogate keys are normally only necessary when the natural key is wide or changes often.
    • Country codes are fixed at either two or three characters in the ISO spec.
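    On the binary(4)/binary(16) point: converting a textual address to those bytes is straightforward in application code. This sketch uses Python's standard ipaddress module; MySQL's INET6_ATON() does an equivalent conversion server-side.

```python
import ipaddress

def ip_to_binary(ip_str):
    # Pack the address into the bytes stored in the binary column:
    # 4 bytes for an IPv4 address (binary(4)), 16 bytes for IPv6 (binary(16)).
    return ipaddress.ip_address(ip_str).packed

print(ip_to_binary("1.2.3.4"))  # b'\x01\x02\x03\x04'
```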
    CREATE TABLE country (
      country CHAR(2) PRIMARY KEY
    );
    
    CREATE TABLE region_1 (
      id bigint PRIMARY KEY,
      country CHAR(2) NOT NULL REFERENCES country (country),
      region_1 VARCHAR(255) NOT NULL,
      UNIQUE (country, region_1)
    );
    
    CREATE TABLE region_2 (
      id bigint PRIMARY KEY,
      region_1 bigint REFERENCES region_1 (id),
      region_2 VARCHAR(255) NOT NULL,
      UNIQUE (region_1, region_2)
    );
    
    CREATE TABLE ip_location
    (
      ip binary(4) NOT NULL PRIMARY KEY,
      region_2 bigint NOT NULL REFERENCES region_2 (id),
      latitude DECIMAL(10, 6) NOT NULL,
      longitude DECIMAL(10, 6) NOT NULL
    );
    

    You can then use joins to get the data you want.

    For example:

    SELECT
      r1.country,
      r1.region_1,
      r2.region_2,
      il.latitude,
      il.longitude
    FROM ip_location il
    JOIN region_2 r2 ON r2.id = il.region_2
    JOIN region_1 r1 ON r1.id = r2.region_1
    WHERE il.ip = @someIp;
    

    The various primary key indexes should make these joins pretty efficient.

    It might be worth pre-loading the smaller tables into application memory so you don’t need to keep loading them from the database.
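    A sketch of that pre-loading idea, with made-up sample rows standing in for the results of SELECTing the region tables: keep the small tables as in-memory dicts keyed by id, so each IP lookup needs only the single ip_location query.

```python
# Hypothetical sample rows standing in for
# "SELECT id, country, region_1 FROM region_1" and
# "SELECT id, region_1, region_2 FROM region_2".
region_1 = {1: ("US", "California")}   # id -> (country, region_1 name)
region_2 = {10: (1, "San Francisco")}  # id -> (region_1 id, region_2 name)

def resolve_region(region_2_id):
    # Replaces the two JOINs with dict lookups, once the row from
    # ip_location (which stores the region_2 id) has been fetched.
    r1_id, r2_name = region_2[region_2_id]
    country, r1_name = region_1[r1_id]
    return country, r1_name, r2_name
```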
