
I compared Postgres and LMDB by inserting 1 million entries into each, where each entry has a unique ID and some array-type values. In Postgres I used a jsonb column to store the array; in LMDB (through OpenLDAP's back-mdb) I stored them as attributes and multivalued attributes.
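(The schema isn't shown in the post; judging from the insert script below, the Postgres table was presumably created with something like the following, with no index on the jsonb column. The exact DDL here is an assumption:)

cursor.execute("CREATE TABLE test (port jsonb, name text)")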

I ran this test on a Debian VM with 6GB RAM.

Postgres was a LOT faster than LMDB, even when I searched the array-type values by checking whether a given value existed in the array. The jsonb column was not indexed, so the membership check could not use an index.
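The exact queries aren't shown; a membership test against that unindexed jsonb column would look something like this (a sketch assuming the test table and the {"services": [...]} shape used in the insert script below):

# Does any row's jsonb document contain 443 in its "services" array?
# Without a GIN index this is a sequential scan over all rows.
cursor.execute(
    "SELECT name FROM test WHERE port @> %s::jsonb",
    (json.dumps({"services": [443]}),)
)

For what it's worth, a GIN index (CREATE INDEX ON test USING GIN (port)) would let the @> containment operator use an index instead of scanning.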

Based on what I have read, both of them use B-trees.

So shouldn't LMDB, which is memory-mapped, be faster than Postgres, at least in some cases?

Here are the scripts I used to insert data into Postgres and MDB.

Postgres:

import psycopg2
import random
import string
import json
import time

connection = psycopg2.connect("dbname='name' user='user' host='localhost' password='test'")

cursor = connection.cursor()

count = 698595

ports = [80, 143, 3389, 22, 21, 8080, 443, 289, 636]

def get_random_ports():
    # Pick a random-size subset of the ports list without repeats.
    l_ports = list(ports)
    num_ports = random.randrange(len(ports))
    result = []
    for i in range(num_ports):
        l_i = random.randrange(len(l_ports))
        result.append(l_ports[l_i])
        l_ports.pop(l_i)
    return result

def get_random_name():
    # Random lowercase string between 5 and 14 characters.
    string_length = random.randrange(5, 15)
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(string_length))

def show_progress(n):
    # Redraw a simple 100-character progress bar on one line.
    print("|" + "." * n + " " * (99 - n) + "|", end="\r" if n < 100 else "\n")


start_time = time.time()
postgres_insert_query = """ INSERT INTO test (port, name) VALUES (%s::jsonb, %s)"""
current_count = 0.0
while current_count < count:

    record_to_insert = (
        json.dumps({"services": get_random_ports()}),
        get_random_name()
    )

    try:
        cursor.execute(postgres_insert_query, record_to_insert)
        connection.commit()
        current_count = current_count + 1
        show_progress(int((current_count / count) * 100))
    except Exception as err:
        print(err)
        connection.rollback()

connection.close()

print(str(time.time() - start_time))
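One thing worth flagging about the script above: it commits after every row, so the measured time is dominated by one commit (and fsync) per insert rather than by B-tree insertion itself. A sketch of the same inserts batched with psycopg2's execute_batch and a single commit, reusing the helpers and query string from the script above, for comparison:

from psycopg2.extras import execute_batch

# Build the records up front, send them in pages, commit once.
records = [
    (json.dumps({"services": get_random_ports()}), get_random_name())
    for _ in range(10000)
]
execute_batch(cursor, postgres_insert_query, records, page_size=200)
connection.commit()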

MDB:

import uuid
import random
import string
import json
import time
import ldap
from ldap import modlist

connection = ldap.initialize('ldapi:///')
connection.simple_bind_s('cn=admin,dc=local', 'doc')

count = 1000000

ports = [80, 143, 3389, 22, 21, 8080, 443, 289, 636]

def get_random_ports():
    # Pick a random-size subset of the ports list without repeats.
    l_ports = list(ports)
    num_ports = random.randrange(len(ports))
    result = []
    for i in range(num_ports):
        l_i = random.randrange(len(l_ports))
        result.append(l_ports[l_i])
        l_ports.pop(l_i)
    return result

def get_random_name():
    # Random lowercase string between 5 and 14 characters.
    string_length = random.randrange(5, 15)
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(string_length))

start_time = time.time()
session_id = bytes(str(uuid.uuid4()), 'utf-8')
while count > 0:

    try:

        name = get_random_name()

        # python-ldap expects attribute values as lists of bytes.
        entry = ldap.modlist.addModlist(
            {
                "name": [name.encode('utf-8')],
                "port": [str(p).encode('utf-8') for p in get_random_ports()],
                "objectClass": [b'tmp']
            }
        )

        connection.add_s('name=' + name + ',dc=local', entry)

        count = count - 1

    except Exception as err:
        # NOTE: every failure (timeout, connection error, schema
        # violation) is silently swallowed here.
        pass

print(str(time.time() - start_time))

After inserting 1M entries into each, I tried a basic search on name and on ports. I didn't need to run multiple searches, as OpenLDAP didn't return within 1 s. If there's some other information needed, please let me know.
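For reference, the searches aren't shown either; a basic equality search with python-ldap would look roughly like this (assuming the dc=local base and the attributes from the insert script):

# Find entries whose multivalued "port" attribute contains 443.
results = connection.search_s('dc=local', ldap.SCOPE_SUBTREE, '(port=443)', ['name'])

Note that slapd only uses its MDB indexes for attributes indexed in the server config (e.g. an "index port eq" line in slapd.conf, or the corresponding olcDbIndex value under cn=config); an equality filter on an unindexed attribute forces a scan of every entry, which alone could explain a >1 s search.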

2 Answers


  1. In testing I did several years ago, on a properly tuned/configured OpenLDAP instance with a 5 million entry database, the performance for back-mdb was approximately 61,000 reads/second. And it's certainly possible to get much higher performance than that; I was trying to go for something particularly exhaustive.

    https://mishikal.wordpress.com/2013/05/16/openldap-a-comparison-of-back-mdb-and-back-hdb-performance/

  2. I know I'm answering a very old question here, but this is for people who might come across it in the future.

    So, the root of the issue here is that while you might be using LMDB as the backing store for your OpenLDAP instance, that doesn't mean you're getting the same performance as if you were using LMDB directly.
    OpenLDAP isn't designed to store arbitrary data; it's designed to store organisational directory and authentication information.

    To do a direct comparison you'd need to use something like this library: https://lmdb.readthedocs.io/en/release/ (a minimal sketch follows at the end of this answer).

    Also, the code you've posted doesn't work as written (you're calling get_random_services(), which doesn't exist; I think you meant get_random_ports(), which is defined in your code). It also wouldn't be an accurate way to evaluate either case, as you're blindly silencing exceptions that could be telling you about timeouts or connection issues.
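    A minimal sketch of what benchmarking LMDB directly with that binding might look like (the path, map_size, and record shape here are arbitrary assumptions, not from the original test):

    import json
    import lmdb

    # Open (or create) an environment with a 2 GiB memory map.
    env = lmdb.open('/tmp/lmdb-bench', map_size=2 * 1024 ** 3)

    # One write transaction for the whole batch; LMDB writes are
    # serialized, so batching is how you'd normally bulk-load data.
    with env.begin(write=True) as txn:
        for i in range(1000000):
            key = str(i).encode('utf-8')
            value = json.dumps({"services": [80, 443]}).encode('utf-8')
            txn.put(key, value)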
