I am looking to build a scalable real-time chat app (I am just doing this for fun and out of interest so please don’t ask why!) and I know that I am going to be handling the realtime messaging part through redis but I am not sure of what database to use for the following information:
- User Relationships (friends)
- Cold Chat History – this would only be queried in limited amounts (maybe like 50 messages) ordered by timestamp and queried in reverse (just as your messages would load in imessage or whatsapp when scrolling to view older messages)
- Chat user relationship
I know for cold chat history an RDBMS or Cassandra is probably my best bet but handling friend relationships, as well as user-to-chat relationships in an RDBMS or cassandra, is ugly. I’m not sure if it’s necessary, worth it, or even "right" to have a graph database in my tech stack just for this relationship mapping.
I was thinking of MongoDB or some other document-based storage could be a solution but querying the data seems like it would be really taxing. My thoughts were to have a chat document that has a list of users and then I would have several other documents with a list of message id’s pointing to message documents. These documents would be mapped back to the chatID. I’m sure you can see though, the time and resources to query a set of messages would be quite high. Maybe I’m just underestimating the power of MongoDB as I haven’t really used it. I would also be more easily able to handle the Chat User Relationship using documents as well as friendships by just storing user-ids in a list within the document.
I understand there is no perfect tool for the job but I would like someone’s thoughts and inputs on how to design the data storage.
Thank you in advance!
2
Answers
It’s best to keep relationship in realtional database.
I use PostgreSQL for such purposes in my chat applications.
For chat history and other events Cassandra is a good choice (I also use Cassandra). However it depends on your database size (records quantity). If you don’t need to keep tens of thousands historical messages for thousands users then using Cassandra will be an overkill. In this case you can also use PostgreSQL or another relational database.
In PostgreSQL you can optimize an access to history tables using partitioning.
If transaction volume is not high then you can go with Postgresql otherwise Cassandra is a good choice for all your mentioned requirements.
In Cassandra you should have multiple tables in de-normalized for low latency and high availability.
Data modeling is an area where a lot of discussions are required, anyways I tried to answer this as simple as possible.