I compared Postgres and LMDB by inserting 1 million entries into each, with a mix of unique IDs and some array-type values. In Postgres I used a jsonb column to store the array; in LMDB the values were stored as attributes and multi-valued attributes.
I ran this test on a Debian VM with 6 GB of RAM.
Postgres was a LOT faster than LMDB, even when I searched on the array-type values by checking whether a value existed in the array, and even though the jsonb column was not indexed.
Based on what I have read, both of them use B-trees.
So shouldn't LMDB, which is memory-mapped, be faster than Postgres, at least in some cases?
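For reference, the kind of jsonb containment check described above might look like this with psycopg2 (a sketch only; the table and column names match the insert script below, and the value being looked up is an arbitrary example):

# Sketch: containment check against the jsonb "port" column of the
# "test" table used below; 443 is an arbitrary value to look for.
import json
import psycopg2

connection = psycopg2.connect("dbname='name' user='user' host='localhost' password='test'")
cursor = connection.cursor()
cursor.execute("SELECT count(*) FROM test WHERE port @> %s::jsonb",
               (json.dumps({"services": [443]}),))
print(cursor.fetchone()[0])
# Without an index this is a sequential scan; a GIN index on "port"
# would allow the @> containment operator to use an index.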
Here are the scripts I used to insert data into Postgres and MDB.
Postgres:
import psycopg2
import random
import string
import json
import time
connection = psycopg2.connect("dbname='name' user='user' host='localhost' password='test'")
cursor = connection.cursor()
count = 698595
ports = [80, 143, 3389, 22, 21, 8080, 443, 289, 636]
def get_random_ports():
    # Pick a random-sized subset of the ports list, without repeats.
    l_ports = list(ports)
    num_ports = random.randrange(len(ports))
    result = []
    for i in range(num_ports):
        l_i = random.randrange(len(l_ports))
        result.append(l_ports[l_i])
        l_ports.pop(l_i)
    return result
def get_random_name():
    string_length = random.randrange(5, 15)
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(string_length))
def show_progress(n):
    # Simple progress bar: redraw the same line until 100%.
    print("|" + "." * n + " " * (99 - n) + "|", end="\r" if n < 100 else "\n")
start_time = time.time()
postgres_insert_query = """ INSERT INTO test (port, name) VALUES (%s::jsonb, %s)"""
current_count = 0.0
while current_count < count:
    record_to_insert = (
        json.dumps({"services": get_random_services()}),
        get_random_name()
    )
    try:
        cursor.execute(postgres_insert_query, record_to_insert)
        connection.commit()
        current_count = current_count + 1
        show_progress(int((current_count / count) * 100))
    except Exception as err:
        print(err)
        connection.rollback()
connection.close()
print(str(time.time() - start_time))
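As a side note on methodology, the script above commits after every single row, which normally forces a WAL flush per insert; a batched variant (a sketch only, reusing the table, query and helpers defined above, with an arbitrary batch size) is the usual way to take that overhead out of the measurement:

# Sketch: same insert loop, but committing once per batch instead of per
# row. BATCH_SIZE is an arbitrary choice; postgres_insert_query,
# get_random_ports() and get_random_name() are the ones defined above.
BATCH_SIZE = 1000
inserted = 0
while inserted < count:
    cursor.execute(postgres_insert_query,
                   (json.dumps({"services": get_random_ports()}), get_random_name()))
    inserted += 1
    if inserted % BATCH_SIZE == 0:
        connection.commit()
connection.commit()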
MDB:
import uuid
import random
import string
import json
import time
import ldap
from ldap import modlist
connection = ldap.initialize('ldapi:///')
connection.simple_bind_s('cn=admin,dc=local', 'doc')
count = 1
ports = [80, 143, 3389, 22, 21, 8080, 443, 289, 636]
def get_random_ports():
    # Pick a random-sized subset of the ports list, without repeats.
    l_ports = list(ports)
    num_ports = random.randrange(len(ports))
    result = []
    for i in range(num_ports):
        l_i = random.randrange(len(l_ports))
        result.append(l_ports[l_i])
        l_ports.pop(l_i)
    return result
def get_random_name():
    string_length = random.randrange(5, 15)
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(string_length))
start_time = time.time()
session_id = bytes(str(uuid.uuid4()), 'utf-8')
while count > 0:
    try:
        name = bytes(get_random_name(), 'utf-8')
        entry = ldap.modlist.addModlist(
            {
                "name": name,
                "port": get_random_services(),
                "objectClass": bytes('tmp', 'utf-8')
            }
        )
        connection.add_s('name=' + name + ",dc=local", entry)
        count = count - 1
    except Exception as err:
        pass
print(str(time.time() - start_time))
After inserting 1M entries into each, I tried a basic search on name and on ports. I didn't need to run multiple searches, as OpenLDAP didn't return within 1 second. If there's any other information needed, please let me know.
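For the OpenLDAP side, those basic searches might look roughly like this with python-ldap (a sketch only; the base DN and the filter values are assumptions based on the insert script above):

# Sketch: equality searches against the directory populated above.
# The base DN and filter values are illustrative assumptions.
import ldap

con = ldap.initialize('ldapi:///')
con.simple_bind_s('cn=admin,dc=local', 'doc')
by_port = con.search_s('dc=local', ldap.SCOPE_SUBTREE, '(port=443)')
by_name = con.search_s('dc=local', ldap.SCOPE_SUBTREE, '(name=abcde)')
print(len(by_port), len(by_name))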
2 Answers
In testing I did several years ago, on a properly tuned/configured OpenLDAP instance with a 5-million-entry database, the performance for back-mdb was approximately 61,000 reads/second. It's certainly possible to get much higher performance than that; I was going for something that was particularly exhaustive.
https://mishikal.wordpress.com/2013/05/16/openldap-a-comparison-of-back-mdb-and-back-hdb-performance/
I know I’m answering a very old question here, but I’m answering for people who might come across this in the future.
So, the root of the issue here is that while you might be using LMDB as the backing store for your OpenLDAP instance, that doesn't mean you're getting the same performance as if you were using LMDB directly.
OpenLDAP isn't designed to store arbitrary data; it's designed to store organisational directory and authentication information.
To do a direct comparison you'd need to use something like this library: https://lmdb.readthedocs.io/en/release/
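A direct LMDB round-trip with that library looks roughly like this (a minimal sketch; the path, map size and key/value layout are arbitrary choices):

# Sketch: writing and reading a key/value pair with the lmdb package.
# Path, map_size and the key/value layout are arbitrary.
import lmdb

env = lmdb.open('/tmp/test-lmdb', map_size=1 << 30)

# One write transaction.
with env.begin(write=True) as txn:
    txn.put(b'name:abcde', b'{"services": [80, 443]}')

# One read-only transaction.
with env.begin() as txn:
    print(txn.get(b'name:abcde'))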
Also, the code you've posted doesn't work (you're calling get_random_services(), which doesn't exist, but I think you meant to call get_random_ports(), which is defined in your code), and it wouldn't be an accurate way to evaluate either case, as you're blindly silencing exceptions that could be telling you about timeouts or connection issues.