As far as I know, there is no state-sharing mechanism in Flink at the moment, but I believe the effect can be achieved. Suppose we have a Flink job (with a single input source) and we want to know what happened at the end of it, in order to adjust the job's processing steps.
Options I have thought of:
- Sinking the state into a broadcast source, then consuming it to update the state of the processing functions (sketched below)
- Using external services to store and retrieve it:
  - sink the state to a database and use an async function to retrieve it amid the job flow
  - use a stateful function to update/read from external services amid the job flow
  - store the state in a Redis table and retrieve it amid the job flow
I think the first is the most suitable, as the others require extra setup and extend the complexity to other systems.
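For reference, here is a minimal sketch of that first option in Java, assuming the end-of-job state is re-ingested as a string stream (e.g. from a topic the final step sinks into); the state names and adjustment logic are made up:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastFeedbackSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Descriptor for the broadcast state holding the latest end-of-job result.
        MapStateDescriptor<String, String> controlDesc = new MapStateDescriptor<>(
                "control", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        DataStream<String> events = env.fromElements("a", "b", "c");      // stand-in for the real input source
        DataStream<String> control = env.fromElements("adjustment-hint"); // stand-in for the sinked state
        BroadcastStream<String> broadcast = control.broadcast(controlDesc);

        events.connect(broadcast)
              .process(new BroadcastProcessFunction<String, String, String>() {
                  @Override
                  public void processElement(String value, ReadOnlyContext ctx,
                                             Collector<String> out) throws Exception {
                      // Read the most recent hint (if any) and adjust processing accordingly.
                      String hint = ctx.getBroadcastState(controlDesc).get("latest");
                      out.collect(hint == null ? value : value + " [" + hint + "]");
                  }

                  @Override
                  public void processBroadcastElement(String value, Context ctx,
                                                      Collector<String> out) throws Exception {
                      // Every parallel instance receives the element and updates its copy.
                      ctx.getBroadcastState(controlDesc).put("latest", value);
                  }
              })
              .print();

        env.execute("broadcast-feedback-sketch");
    }
}
```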
What’s your opinion on those options?
Are there other ways?
Thanks
2 Answers
I used Kafka. Whenever the state changes, I send it as a side output to a Kafka sink, and the other tasks subscribed to the same topic are notified.
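A sketch of that pattern, assuming the newer KafkaSink connector (Flink 1.14+) and placeholder broker/topic names:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputToKafkaSketch {
    // Tag identifying the side output that carries state-change notifications.
    static final OutputTag<String> STATE_CHANGES = new OutputTag<String>("state-changes") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        SingleOutputStreamOperator<String> main = env
            .fromElements("a", "b", "c") // stand-in for the real input source
            .process(new ProcessFunction<String, String>() {
                @Override
                public void processElement(String value, Context ctx, Collector<String> out) {
                    out.collect(value);                            // normal processing continues
                    ctx.output(STATE_CHANGES, "changed:" + value); // notify via side output
                }
            });

        // Sink only the side output to Kafka; subscribers to the topic get notified.
        KafkaSink<String> sink = KafkaSink.<String>builder()
            .setBootstrapServers("localhost:9092") // placeholder broker address
            .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("state-changes")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
            .build();
        main.getSideOutput(STATE_CHANGES).sinkTo(sink);

        main.print();
        env.execute("side-output-to-kafka-sketch");
    }
}
```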
If you use Stateful Functions then it’s easy to send a message from the final processing step back to the upstream operator(s).
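For illustration, a minimal sketch with the embedded Stateful Functions SDK; the function types and the "job-1" address id are hypothetical:

```java
import org.apache.flink.statefun.sdk.Context;
import org.apache.flink.statefun.sdk.FunctionType;
import org.apache.flink.statefun.sdk.StatefulFunction;

// Runs as the final processing step; the upstream function type is made up.
public class FinalStepFn implements StatefulFunction {
    static final FunctionType UPSTREAM_FN = new FunctionType("example", "upstream-fn");

    @Override
    public void invoke(Context context, Object input) {
        // Once the final result is known, message the upstream function
        // instance so it can adjust how it handles subsequent events.
        context.send(UPSTREAM_FN, "job-1", input);
    }
}
```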
If you're OK with potentially losing this state when it's in-flight and your job restarts (i.e., it's a hint for adjusting job processing rather than a requirement), then you can use an IterativeStream to send it back upstream. That removes the need for Kafka or another external feedback system. See also: How does Flink treat checkpoints and state within IterativeStream?
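A minimal sketch of the IterativeStream approach; the threshold and timeout are arbitrary, and the filter stands in for whatever end-of-pipeline logic produces the hint:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.IterativeStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IterativeFeedbackSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<Long> input = env.fromSequence(0, 5);

        // Open an iteration; fed-back elements re-enter at the head of the loop.
        // The loop closes after 5s without feedback elements (arbitrary timeout).
        IterativeStream<Long> iteration = input.iterate(5000L);

        DataStream<Long> processed = iteration.map(v -> v + 1);

        // Elements below the threshold are sent back upstream, standing in
        // for hints produced at the end of the pipeline. Note that in-flight
        // feedback elements are not covered by checkpoints.
        iteration.closeWith(processed.filter(v -> v < 10));

        // Elements at or above the threshold leave the loop.
        processed.filter(v -> v >= 10).print();

        env.execute("iterative-feedback-sketch");
    }
}
```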