I am working on a social networking application with at least 0.2M users. In the application, a user can share content from third parties and can also upload their own media as a post. There are different types of privacy:
- user privacy
  - user can be public
  - user can be private
Any content shared or uploaded by the user goes into a box, and a box also has different types of privacy:
- public box (Everyone can see the content of this box if you are public)
- friend only box (Only your followers can see the content of this box)
- private (Only you can see the content of this box)
Now the problem is that I have a large data set. When a user changes their account privacy from public to private, or from private to public, I have to update all of their data to match the new privacy. A user can also change the privacy of a box.
In that case I need to update all of the user's shared posts in that box accordingly. Most of the time the update fails because of the framework and technologies that I am using.
Technologies that I am using:
- Lumen (PHP) microservice architecture
- MySQL
- Elasticsearch (For retrieval with joins)
- Redis & Memcached
- Postgres
When a user shares anything on the platform, the shared data is stored in the database and also inserted into Elasticsearch, so all data is retrieved from Elasticsearch with the PHP client.
Now I want to define an architecture like Instagram's, so that whenever a user changes their account privacy or a box's privacy, the visible content reflects both privacy settings.
I have read various articles but did not find anything close to this problem. Kindly suggest any helpful article or idea.
2 Answers
I agree with @KolovosKonstantinos: you should try to model your application data so that you avoid updating large data sets.
It might also be interesting for you to check out the concept of Embedded vs Referenced Documents. Here are a couple of nice posts on this topic:
I would suggest trying the following approach:
So your data might look like the following:
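As a minimal sketch of modelling by reference, the privacy flags could live only on the user and box records, with posts pointing at their box and visibility resolved at read time. Every class and field name here (`User`, `Box`, `Post`, `is_private`, `follower_ids`, `can_view`) is a hypothetical illustration, not something taken from the question. The rule that followers can still see a public box of a private account is also an assumption.

```python
from dataclasses import dataclass, field


# Hypothetical records; all field names are illustrative assumptions.
@dataclass
class User:
    id: int
    is_private: bool = False
    follower_ids: set = field(default_factory=set)


@dataclass
class Box:
    id: int
    owner_id: int
    privacy: str = "public"  # "public" | "friends" | "private"


@dataclass
class Post:
    id: int
    box_id: int  # reference only; no privacy value is copied onto the post


def can_view(viewer_id: int, owner: User, box: Box) -> bool:
    """Resolve visibility at read time from the user and box records."""
    if viewer_id == owner.id:
        return True
    if box.privacy == "private":
        return False
    is_follower = viewer_id in owner.follower_ids
    if box.privacy == "friends":
        return is_follower
    # Public box: open to everyone only while the account itself is public.
    # Assumption: followers of a private account can still see it.
    return (not owner.is_private) or is_follower
```

With this layout, switching an account from public to private is a single-field change (`owner.is_private = True`) and no post needs to be rewritten. The trade-off is that every read has to look up the owner and box, which is where caching those two small records could help.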
Handling a large dataset with complex privacy settings can indeed be a challenging task. It’s great that you’re considering an architecture similar to Instagram’s, as they have likely faced similar challenges at scale. Here are some suggestions and ideas that might help you:
Use Asynchronous Processing:
Batch Processing:
Event-Driven Architecture:
Denormalization:
Caching:
Elasticsearch Indexing Strategy:
Data Sharding:
Monitoring and Optimization:
Documentation and Testing:
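Several of the ideas above (asynchronous processing, batching, and events) can be combined. The following is only a sketch under stated assumptions: the in-memory `queue.Queue` stands in for a durable job queue, the `posts_by_box` and `search_index` dictionaries stand in for the database and the Elasticsearch index, and all function names are hypothetical.

```python
import queue
from itertools import islice

BATCH_SIZE = 2  # tiny for illustration; real batches would be far larger

# In-memory stand-ins for the database and the search index (assumptions).
posts_by_box = {10: [101, 102, 103, 104, 105]}
search_index = {pid: {"box_privacy": "public"} for pid in posts_by_box[10]}

events = queue.Queue()


def on_box_privacy_changed(box_id, new_privacy):
    """Request-handler side: record the change and return immediately."""
    events.put({"box_id": box_id, "privacy": new_privacy})


def batches(items, size):
    """Yield successive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk


def run_worker():
    """Background side: drain queued events and update the index in batches."""
    batch_count = 0
    while not events.empty():
        event = events.get()
        post_ids = posts_by_box.get(event["box_id"], [])
        for chunk in batches(post_ids, BATCH_SIZE):
            # In practice this inner loop would be one bulk request per chunk.
            for post_id in chunk:
                search_index[post_id]["box_privacy"] = event["privacy"]
            batch_count += 1
        events.task_done()
    return batch_count
```

The user-facing request only enqueues an event, so it stays fast regardless of how many posts the box holds. The worker can then be retried or rate-limited independently. Until the worker catches up, the index may briefly show the old privacy, so a read-time check against the source of truth is worth considering for privacy-critical views.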
I don’t have a specific article to point you to, but exploring articles on large-scale system design, microservices, and asynchronous processing could provide valuable insights. Additionally, Instagram’s engineering blog might have articles discussing some of the challenges they faced and how they addressed them.
Remember to always test your updates in a staging environment before applying them to your production environment, especially when dealing with critical privacy-related data.