I have 3 billion IPs stored in the following ip_location table.
CREATE TABLE ip_location (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
ip VARCHAR(15) NOT NULL,
country VARCHAR(2),
region_1 VARCHAR(255),
region_2 VARCHAR(255),
latitude DECIMAL(10, 6) NOT NULL,
longitude DECIMAL(10, 6) NOT NULL,
INDEX idx_ip (ip)
);
// example json data
{
"ip": "1.2.3.4",
"country": "US",
"region_1": "California",
"region_2": "San Francisco",
"latitude": 37.769700,
"longitude": -122.393300
}
The service we’re launching is popular with people all over the world.
I want to limit some of our APIs to certain regions.
To determine the user's location, the IP has to be checked against the database on every API call, which increases the load on the DB and the network.
Therefore I want to preload the IP data into memory to reduce the load.
Could you please suggest an efficient way to store and access this type of data?
I am going to load all of the data into Redis, but I think that would require a lot of RAM. Am I wrong?
2 Answers
You could store all of the IP addresses yourself, but a better idea seems to be keeping only recently used IP addresses in Redis: for example, addresses that have sent a request in the last 30 minutes, while regularly evicting addresses that have not sent a request for a while.
You can query for the country of an IP address via an external geolocation lookup service; try it yourself with your own IP.
So the flow would be:

1. A request comes in; look the caller's IP up in Redis first.
2. If the IP is cached, use the stored location to decide whether the region is allowed.
3. If it is not cached, look the IP up (in your ip_location table or via the external service), apply the region check, and cache the result in Redis with a TTL of around 30 minutes (e.g. SETEX), so repeat calls from the same address never hit the database again.
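On a cache miss, the fallback can be a single indexed lookup against the table you already have. A minimal sketch, reusing the ip_location columns from the question; the literal address and the key format in the comment are just placeholders:

-- Runs only when the address is not in Redis; idx_ip makes this a point lookup.
SELECT country, region_1, region_2, latitude, longitude
FROM ip_location
WHERE ip = '1.2.3.4'
LIMIT 1;
-- Cache the returned row for ~30 minutes, e.g. SETEX ip:1.2.3.4 1800 '<json>',
-- so subsequent requests from the same address are answered from Redis.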
You need to normalize this table better, and use better data types. This will keep the size of the data down.

Store the address itself as binary(4) if you only need IPv4, otherwise use binary(16). You can also drop the id column; it's not necessary, as the ip column should be unique, and it's small. Surrogate keys are normally only necessary when the natural key is wide or changes often. You can then use joins to get the data you want.
For example:
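A sketch of what the restructured tables and the lookup could look like. The country and region tables, their column names, and the sample query below are illustrative assumptions rather than part of the original answer; the rewritten ip_location keeps the binary address itself as its primary key:

-- Small lookup tables; their natural keys act as primary keys.
CREATE TABLE country (
    country_code CHAR(2) PRIMARY KEY,
    country_name VARCHAR(255) NOT NULL
);

CREATE TABLE region (
    region_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    country_code CHAR(2) NOT NULL,
    region_1 VARCHAR(255),
    region_2 VARCHAR(255),
    FOREIGN KEY (country_code) REFERENCES country (country_code)
);

-- One row per address: 4 bytes for IPv4 (widen to 16 bytes if IPv6 is needed),
-- no surrogate id, the address itself is the primary key.
CREATE TABLE ip_location (
    ip BINARY(4) PRIMARY KEY,
    region_id INT UNSIGNED NOT NULL,
    latitude DECIMAL(10, 6) NOT NULL,
    longitude DECIMAL(10, 6) NOT NULL,
    FOREIGN KEY (region_id) REFERENCES region (region_id)
);

-- Lookup by address; INET6_ATON converts the text form to its binary form.
SELECT c.country_code, r.region_1, r.region_2, l.latitude, l.longitude
FROM ip_location AS l
JOIN region AS r ON r.region_id = l.region_id
JOIN country AS c ON c.country_code = r.country_code
WHERE l.ip = INET6_ATON('1.2.3.4');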
The various primary key indexes should make these joins pretty efficient.
It might be worth pre-loading the smaller tables into application memory so you don't need to keep loading them off the database.
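Assuming the normalized layout sketched above, the country and region tables have only one row per country or region combination, so they stay tiny compared with the per-IP table and can be read once at startup:

-- Pulled once when the application starts and kept in memory;
-- only the large per-IP table is queried per request after that.
SELECT country_code, country_name FROM country;
SELECT region_id, country_code, region_1, region_2 FROM region;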