
A team is developing a stateless Docker container with the Cassandra database to be run under Kubernetes, with all data and metadata files shipped inside the container, so putting the database into a read-only mode would be ideal. The app that will connect to this database is an infrequently updated feature store.

How can read-only mode be approximated as closely as possible, specifically for Cassandra or perhaps even in general (if some of the steps are common to other databases)?



  1. Chosen as BEST ANSWER

    Even with a read-only user, Cassandra still attempts many writes, e.g. to commit log files and system keyspace tables. This happens under normal conditions too, such as the orderly stops and starts of pod replicas that a horizontal pod autoscaler can initiate under k8s, not just when a pod is evicted from a node in abnormal conditions.

    So first, these are the config file settings that can, in theory, prevent container crashes when a write to a write-protected location inside the container fails:

    # disk_failure_policy: stop
    disk_failure_policy: ignore
    # commit_failure_policy: stop
    commit_failure_policy: ignore

    You can also point the non-data folders that Cassandra writes to at a universally writable location. The complete list of these folders can be found in the config file, but most of them are better left where they are (after being changed to be owned by nobody and writable by anybody), unless you want to re-create the entire original folder structure in the new location. Bitnami Cassandra containers let you easily change (with a dedicated env variable) the location of only one such folder, the one with commit logs, so this is a safe bet (in my tests, Cassandra was unaffected even when the folder's contents were removed at each container restart):

    commitlog_directory: /tmp # or use CASSANDRA_COMMITLOG_DIR env var for bitnami/cassandra:latest container
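
    As a rough sketch of how this could look on the Kubernetes side, the pod below mounts an emptyDir volume as a scratch location and points the commit log at it through the CASSANDRA_COMMITLOG_DIR variable mentioned above. The pod name, the volume name and the /scratch path are hypothetical and chosen only for illustration. An emptyDir is world-writable by default and is discarded when the pod is deleted, which matches the "safe to lose" behaviour described for commit logs.

    ```yaml
    # Sketch only: pod name, volume name and /scratch path are assumptions.
    apiVersion: v1
    kind: Pod
    metadata:
      name: cassandra-readonly
    spec:
      containers:
        - name: cassandra
          image: bitnami/cassandra:latest
          env:
            # Bitnami-specific variable; equivalent to setting
            # commitlog_directory in cassandra.yaml
            - name: CASSANDRA_COMMITLOG_DIR
              value: /scratch
          volumeMounts:
            - name: scratch
              mountPath: /scratch
      volumes:
        # Ephemeral, writable by any user, removed together with the pod
        - name: scratch
          emptyDir: {}
    ```

    A dedicated mount path is used here rather than /tmp so that the volume does not hide any files that were placed under /tmp at build time.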

    Note: in the rootless bitnami/cassandra containers, the Cassandra config file is located at /opt/bitnami/cassandra/conf/cassandra.yaml. A customized version can be substituted, but simply copying it into the container at build time has no effect: it has to be mounted at deployment time as a ConfigMap.
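
    A minimal sketch of such a mount, assuming the file path given above: the ConfigMap name (cassandra-config), the volume name and the pod name are hypothetical, and the cassandra.yaml content is abbreviated to the two settings discussed earlier. In practice the complete customized file would go there.

    ```yaml
    # Sketch only: all resource names below are assumptions.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cassandra-config
    data:
      cassandra.yaml: |
        # Abbreviated: paste the complete customized cassandra.yaml here.
        disk_failure_policy: ignore
        commit_failure_policy: ignore
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: cassandra-readonly
    spec:
      containers:
        - name: cassandra
          image: bitnami/cassandra:latest
          volumeMounts:
            # subPath mounts the single file without hiding the rest
            # of the conf directory
            - name: cassandra-conf
              mountPath: /opt/bitnami/cassandra/conf/cassandra.yaml
              subPath: cassandra.yaml
      volumes:
        - name: cassandra-conf
          configMap:
            name: cassandra-config
    ```

    Files mounted from a ConfigMap are read-only inside the container, so it is worth checking that the image's startup scripts do not try to edit the config file in place.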

    The way to deal with the unwanted system keyspace writes is similar: point the data folder location to a folder writable by any user (such as /tmp), pre-populate it with data files at build time, and never map it outside of the container at run time. If you can't move the entire data folder and its contents to /tmp (or you suspect that this may not prevent all write attempts to the original location, despite the change to the data_file_directories config file key), keep the data in the original data folder, but change its ownership to nobody and make it writable by all users (at container build time).

    Committing all cached data fully to disk by either calling nodetool flush or performing an orderly database shutdown (by scaling down the Cassandra pod to 0 replicas) will minimize disk operations at subsequent container startup (but will not eliminate writes at startup/shutdown completely).

    For application stability I'd also recommend monitoring how much (if any) data gets written by Cassandra inside the container under production loads (even if they are restricted to read-only queries made by the client app).

  2. Enable both authentication and authorization in your Cassandra image with:

    authenticator: PasswordAuthenticator
    authorizer: CassandraAuthorizer

    Then provision a new role with only read (SELECT) permission on the keyspace/tables you want it to access, for example:

    CREATE ROLE readonlyrole WITH PASSWORD = 'Som3Pa$$word' AND LOGIN = false;
    GRANT SELECT ON KEYSPACE appks TO readonlyrole;

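    This role will not be able to log in and will only have read access to the app keyspace. Because it cannot log in itself, grant it to a login-enabled role that the application uses to connect. Cheers!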
