I’m creating a small service where I poll around 100 accounts (in a Twitter-like service) frequently (every 5 seconds or so) to check for new messages, as the service doesn’t yet provide a streaming API (like Twitter actually does).
In my head, I have the architecture planned as creating a Ticker for every user that fires every 5 seconds. Once the tick fires, I make an API call to the service, check their messages, and run a SELECT against my Postgres database to get that user's details and the date of their most recent message; if there are messages newer than that, I UPDATE the entry and notify the user. Repeat ad nauseam.
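Roughly, the sketch I have in mind looks something like this (the API client, the table and column names, and the notify step are placeholders, not working code):

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // Postgres driver
)

// Placeholders for the real API client and notification logic.
func fetchLatestMessageTime(userID string) (time.Time, error) { return time.Now(), nil }
func notifyUser(userID string)                                {}

// pollUser runs in its own goroutine: every 5 seconds it asks the
// service for the user's latest message and compares it with the
// timestamp stored in Postgres.
func pollUser(db *sql.DB, userID string) {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()

	for range ticker.C {
		latest, err := fetchLatestMessageTime(userID) // API call to the service
		if err != nil {
			log.Printf("poll %s: %v", userID, err)
			continue
		}

		var lastSeen time.Time
		if err := db.QueryRow(
			`SELECT last_message_at FROM users WHERE id = $1`, userID,
		).Scan(&lastSeen); err != nil {
			log.Printf("select %s: %v", userID, err)
			continue
		}

		if latest.After(lastSeen) {
			if _, err := db.Exec(
				`UPDATE users SET last_message_at = $1 WHERE id = $2`,
				latest, userID,
			); err != nil {
				log.Printf("update %s: %v", userID, err)
				continue
			}
			notifyUser(userID)
		}
	}
}

func main() {
	db, err := sql.Open("postgres", "postgres://localhost/mydb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	for _, id := range []string{"alice", "bob"} { // ~100 users in practice
		go pollUser(db, id) // one ticker goroutine per user
	}
	select {} // block forever
}
```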
I’m not very experienced in backend things and architecture, so I want to make sure this isn’t an absolutely absurd setup. Is the amount of calls to the database sensible? Am I abusing goroutines?
3 Answers
Let me answer based on what you describe.

I understand the following: for each user, you create a tick every 5 seconds in one goroutine. Another goroutine consumes those ticks, performs the polling, and compares the date of the last message with the date you have recorded in your PostgreSQL database.

As for whether the amount of database calls is sensible: it depends. How many users do you have, and how many can your application support? In my experience the best way to answer this question is to measure the performance of your application. To give you some reassurance, I have seen a single PostgreSQL database handle hundreds of SELECTs per second, and I don't see a design mistake here, so benchmarking your application is the way to go.

As for abusing goroutines: do you mean executing too many of them? I think it is unlikely that you are abusing goroutines that way. If there is a particular reason you think this could be the case, posting the corresponding code snippet would make your question more precise.
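To make the benchmarking point concrete, a quick way to get a first number (assuming the same *sql.DB handle and the hypothetical users table from your sketch, plus the standard log and time packages) is to time the query itself from inside the polling loop:

```go
// timeSelect runs the per-user SELECT once and logs how long it took,
// which gives a first feel for the database load. The table and column
// names are taken from the question's sketch, not a real schema.
func timeSelect(db *sql.DB, userID string) {
	start := time.Now()
	var lastSeen time.Time
	err := db.QueryRow(
		`SELECT last_message_at FROM users WHERE id = $1`, userID,
	).Scan(&lastSeen)
	log.Printf("SELECT for %s took %v (err: %v)", userID, time.Since(start), err)
}
```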
You can always go deeper with optimisations. In your case you need client throughput, so you can apply a number of well-known optimisations: switching to a reactive model, adding a cache server, spreading the load across multiple read replicas, and so on.

You should test your solution at scale; if it fits your needs in terms of user throughput and server cost, then your solution is the right one.
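As one small illustration of the caching idea (a sketch only, with made-up names and assuming the sync and time packages; a shared cache server such as Redis would be needed once you run more than one instance): keep the last-seen timestamp per user in memory so Postgres is only touched when something actually changed.

```go
// lastSeenCache remembers the newest message time per user in memory.
// newer reports whether latest is more recent than what is cached and,
// if so, records it; only in that case do you need to hit the database.
type lastSeenCache struct {
	mu   sync.Mutex
	seen map[string]time.Time
}

func newLastSeenCache() *lastSeenCache {
	return &lastSeenCache{seen: make(map[string]time.Time)}
}

func (c *lastSeenCache) newer(userID string, latest time.Time) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if latest.After(c.seen[userID]) {
		c.seen[userID] = latest
		return true
	}
	return false
}
```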
Your proposed solution: 1 query every 5 seconds for every user. Having 100 users this is:

100 queries / 5 seconds = 20 queries / second

This is not considered a big load if the queries are fast.
But why do you need to do this for every user separately? If you need to pick up updates at a granularity of 5 seconds, you could just execute 1 query every 5 seconds which does not filter by user but checks for updates from all the users.
If the above query gives results, you can iterate over them and do the necessary work for each user that had updates in the last 5 seconds. This results in:

1 query / 5 seconds = 0.2 queries / second

Which is a hundred times fewer queries, still getting you all the updates at the same time granularity.
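A sketch of how that single query and the iteration might look (the db handle, the lastPoll timestamp of the previous tick, and the table and column names are all assumptions about your setup):

```go
// One query per tick instead of one per user: fetch every user whose
// newest message is more recent than the previous poll.
rows, err := db.Query(
	`SELECT id, last_message_at FROM users WHERE last_message_at > $1`,
	lastPoll,
)
if err != nil {
	log.Printf("poll query: %v", err)
	return
}
defer rows.Close()

for rows.Next() {
	var userID string
	var lastMsg time.Time
	if err := rows.Scan(&userID, &lastMsg); err != nil {
		log.Printf("scan: %v", err)
		continue
	}
	// do the necessary work for this user (update state, notify, ...)
}
```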
If the task to be performed for the updates takes long or depends on external systems (e.g. a call to another server), you may perform those tasks in separate goroutines. You may choose either to launch a new goroutine for each task, or to have a pool of worker goroutines that consume the tasks from a channel, and simply queue each task onto that channel.
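A minimal sketch of the worker-pool variant (the task type, worker count, and the work itself are made up for illustration):

```go
package main

import (
	"fmt"
	"sync"
)

// updateTask describes the work to do for one user that had new messages.
type updateTask struct {
	userID string
}

func main() {
	tasks := make(chan updateTask, 100)
	var wg sync.WaitGroup

	// A small fixed pool of workers consuming tasks from the channel.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range tasks {
				// Placeholder for the real work: call the other server,
				// send the notification, etc.
				fmt.Println("handling update for", t.userID)
			}
		}()
	}

	// The polling loop would queue tasks like this:
	tasks <- updateTask{userID: "alice"}
	tasks <- updateTask{userID: "bob"}

	close(tasks)
	wg.Wait()
}
```

A fixed pool also bounds how many updates run at once, which matters if each task calls an external server.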