skip to Main Content

After this comment to one of my questions, I’m thinking if it is better using one database with X schemas or vice versa.

I’m developing a web application where, when people register, I create (actually) a database (no, it’s not a social network: everyone must have access to his own data and never see the data of the other user). That’s the way I used for the previous version of my application (that is still running on MySQL): through the Plesk API, for every registration, I do:

  1. Create a database user with limited privileges;
  2. Create a database that can be accessed just by the previous created user and the superuser (for maintenance)
  3. Populate the database

Now, I’ll need to do the same with PostgreSQL (the project is getting mature and MySQL don’t fulfil all the needs). I need to have all the databases/schemas backups independent: pg_dump works perfectly in both ways, and the same for the users that can be configured to access just one schema or one database.

So, assuming you are more experienced PostgreSQL users than me, what do you think is the best solution for my situation, and why? Will there be performance differences using $x database instead of $x schemas? And what solution will be better to maintain in the future (reliability)? All of my databases/schemas will always have the same structure!

For the backups issue (using pg_dump), is maybe better using one database and many schemas, dumping all the schemas at once: recovering will be quite simple loading the main dump in a development machine and then dump and restore just the schema needed: there is one additional step, but dumping all the schema seem faster than dumping them one by one.

UPDATE 2012

Well, the application structure and design changed so much during those last two years. I’m still using the "one db with many schemas" -approach, but still, I have one database for each version of my application:

Db myapp_01
    _ my_customer_foo_schema
    _ my_customer_bar_schema
Db myapp_02
    _ my_customer_foo_schema
    _ my_customer_bar_schema

For backups, I’m dumping each database regularly, and then moving the backups on the development server. I’m also using the PITR/WAL backup but, as I said before, it’s not likely I’ll have to restore all database at once. So it will probably be dismissed this year (in my situation is not the best approach).

The one-db-many-schema approach worked very well for me since now, even if the application structure is totally changed. I almost forgot: all of my databases/schemas will always have the same structure! Now, every schema has its own structure that change dynamically reacting to users data flow.

8

Answers


  1. Chosen as BEST ANSWER

    Definitely, I'll go for the one-db-many-schemas approach. This allows me to dump all the database, but restore just one very easily, in many ways:

    1. Dump the db (all the schema), load the dump in a new db, dump just the schema I need, and restore back in the main db.
    2. Dump the schema separately, one by one (but I think the machine will suffer more this way - and I'm expecting like 500 schemas!)

    Otherwise, googling around I've seen that there is no auto-procedure to duplicate a schema (using one as a template), but many suggest this way:

    1. Create a template-schema
    2. When need to duplicate, rename it with new name
    3. Dump it
    4. Rename it back
    5. Restore the dump
    6. The magic is done.

    I've written two rows in Python to do that; I hope they can help someone (in-2-seconds-written-code, donโ€™t use it in production):

    import os
    import sys
    import pg
    
    # Take the new schema name from the second cmd arguments (the first is the filename)
    newSchema = sys.argv[1]
    
    # Temperary folder for the dumps
    dumpFile = '/test/dumps/' + str(newSchema) + '.sql'
    
    # Settings
    db_name = 'db_name'
    db_user = 'db_user'
    db_pass = 'db_pass'
    schema_as_template = 'schema_name'
    
    # Connection
    pgConnect = pg.connect(dbname= db_name, host='localhost', user= db_user, passwd= db_pass)
    
    # Rename schema with the new name
    pgConnect.query("ALTER SCHEMA " + schema_as_template + " RENAME TO " + str(newSchema))
    
    # Dump it
    command = 'export PGPASSWORD="' + db_pass + '" && pg_dump -U ' + db_user + ' -n ' + str(newSchema) + ' ' + db_name + ' > ' + dumpFile
    os.system(command)
    
    # Rename back with its default name
    pgConnect.query("ALTER SCHEMA " + str(newSchema) + " RENAME TO " + schema_as_template)
    
    # Restore the previous dump to create the new schema
    restore = 'export PGPASSWORD="' + db_pass + '" && psql -U ' + db_user + ' -d ' + db_name + ' < ' + dumpFile
    os.system(restore)
    
    # Want to delete the dump file?
    os.remove(dumpFile)
    
    # Close connection
    pgConnect.close()
    

  2. I would say, go with multiple databases AND multiple schemas ๐Ÿ™‚

    Schemas in PostgreSQL are a lot like packages in Oracle, in case you are familiar with those. Databases are meant to differentiate between entire sets of data, while schemas are more like data entities.

    For instance, you could have one database for an entire application with the schemas “UserManagement”, “LongTermStorage” and so on. “UserManagement” would then contain the “User” table, as well as all stored procedures, triggers, sequences, etc. that are needed for the user management.

    Databases are entire programs, schemas are components.

    Login or Signup to reply.
  3. A number of schemas should be more lightweight than a number of databases, although I cannot find a reference which confirms this.

    But if you really want to keep things very separate (instead of refactoring the web application so that a “customer” column is added to your tables), you may still want to use separate databases: I assert that you can more easily make restores of a particular customer’s database this way — without disturbing the other customers.

    Login or Signup to reply.
  4. A PostgreSQL “schema” is roughly the same as a MySQL “database”. Having many databases on a PostgreSQL installation can get problematic; having many schemas will work with no trouble. So you definitely want to go with one database and multiple schemas within that database.

    Login or Signup to reply.
  5. It depends on how the availability and connectivity of your system is designed. What are the data that are stored in these Databases.If they are linked data, there they can be kept on single DB instance but if they are partially linked and can run partially if one system is down then it must be on different instances.

    Detailed explanation:-

    1) When you use one DB instance and in that you use multiple databases, then you are caught up with the issue that if your connection goes down(due to system crash or mysql server is down),all Databases as they are on same instance are also down, so all your applications are impacted.

    2) When you separate DB instance for each Database,then if any one Database system is down,your other applications doesn’t have impact.So other application can run only the application which depends on down DB is impacted.

    Also,in both the cases i think you must also use replication mechanism so that load balancing can be done on slave Databases.

    Login or Signup to reply.
  6. In a PostgreSQL context I recommend to use one db with multiple schemas, as you can (e.g.) UNION ALL across schemas, but not across databases. For that reason, a database is really completely insulated from another database while schemas are not insulated from other schemas within the same database.

    If you -for some reason- have to consolidate data across schemas in the future, it will be easy to do this over multiple schemas. With multiple databases you would need multiple db-connections and collect and merge the data from each database “manually” by application logic.

    The latter have advantages in some cases, but for the major part I think the one-database-multiple-schemas approach is more useful.

    Login or Signup to reply.
  7. I would recommend against accepted answer – multiple databases instead of multiple schemas for this set of reasons:

    1. If you are running microservices, you want to enforce the inability to join between your "schemas", so the data is not entangled and developers won’t end up joining other microservice’s schema and wonder why when other team makes a change their stuff no longer works.
    2. You can later migrate to a separate database machine if your load requires with ease.
    3. If you need to have a high-availability and/or replication set up, it’s better to have separate databases completely independent of each other. You cannot replicate one schema only compared to the whole database.
    Login or Signup to reply.
  8. Working with single Database with multiple Schemas is good way to
    practice in postgres database because:

    1. No any data is shared across databases in postgres.
    2. any given connection to the server can access only the data in the single database, the one specified in the connection request.

    With using multiple schemas:

    1. To allow many users to use one database without interfering with eachother.
    2. To organize database objects into logical groups to make them more manageable.
    3. Third party applications can be put into separate schemas so they cannot collide with the names of other objects.
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search