
For example, let's say I want to enrich a collection with more values from a lookup table, by joining on a key.
In the Spark runner, I would prefer to do a broadcast join for this operator, whereas in the Flink runner I would like to make RPC calls (say, to Redis) to load the values based on the key.

So is there a way to achieve this: the same logical semantics, but different execution depending on the runner?

2 Answers


  1. In general, the internal Beam implementation of a given transform already depends on the runner, but if you need runner-specific behavior in your own transforms, you can use PipelineOptions to get the runner name and decide which code path to take (a minimal sketch follows).

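    A minimal sketch of that dispatch, assuming the Java SDK. EnrichWithLookup and its two constructor arguments are hypothetical: you would supply your own broadcast-style join (e.g. via a side input) and your own RPC-lookup transform.

    ```java
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.transforms.PTransform;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    // Hypothetical composite transform whose expansion depends on the runner.
    public class EnrichWithLookup
        extends PTransform<PCollection<KV<String, String>>, PCollection<KV<String, String>>> {

      private final PTransform<PCollection<KV<String, String>>, PCollection<KV<String, String>>>
          broadcastJoin; // e.g. a side-input join, suitable for Spark
      private final PTransform<PCollection<KV<String, String>>, PCollection<KV<String, String>>>
          rpcLookup;     // e.g. a per-element Redis lookup, suitable for Flink

      public EnrichWithLookup(
          PTransform<PCollection<KV<String, String>>, PCollection<KV<String, String>>> broadcastJoin,
          PTransform<PCollection<KV<String, String>>, PCollection<KV<String, String>>> rpcLookup) {
        this.broadcastJoin = broadcastJoin;
        this.rpcLookup = rpcLookup;
      }

      @Override
      public PCollection<KV<String, String>> expand(PCollection<KV<String, String>> input) {
        PipelineOptions options = input.getPipeline().getOptions();
        // getRunner() returns the runner class, e.g. SparkRunner or FlinkRunner.
        String runner = options.getRunner().getSimpleName();
        if (runner.contains("Flink")) {
          return input.apply("RpcLookup", rpcLookup);
        }
        // Default path (including Spark): broadcast-style side-input join.
        return input.apply("BroadcastJoin", broadcastJoin);
      }
    }
    ```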
  2. This is deliberately not part of Beam. The purpose of Beam is to provide a portable programming model, so producing transforms that are not portable works against the goals of the project.

    From within a DoFn you can run arbitrary code, so integrating with the storage system of your choice is easy. But be careful when you do this, since getting correct exactly-once behavior requires a design that takes parallelism, retries, checkpointing, etc. into account (see the sketch after this answer).

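    As a concrete illustration of the DoFn approach, here is a minimal sketch of a per-element Redis lookup using the Jedis client. The host and port are placeholders, and the code assumes the lookup is idempotent, so a retried bundle simply re-reads the same keys.

    ```java
    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.values.KV;
    import redis.clients.jedis.Jedis;

    // Sketch: enrich (key, value) pairs with a value fetched from Redis.
    public class RedisLookupFn extends DoFn<KV<String, String>, KV<String, String>> {

      private final String host;
      private final int port;
      private transient Jedis jedis; // not serialized; created per DoFn instance

      public RedisLookupFn(String host, int port) {
        this.host = host;
        this.port = port;
      }

      @Setup
      public void setup() {
        // One connection per instance, reused across bundles to limit setup cost.
        jedis = new Jedis(host, port);
      }

      @ProcessElement
      public void processElement(ProcessContext c) {
        KV<String, String> element = c.element();
        String enrichment = jedis.get(element.getKey());
        if (enrichment != null) {
          c.output(KV.of(element.getKey(), element.getValue() + "|" + enrichment));
        }
      }

      @Teardown
      public void teardown() {
        if (jedis != null) {
          jedis.close();
        }
      }
    }
    ```

    Note that this sketch drops elements whose key is missing from Redis and does no batching or caching; a production version would likely batch lookups per bundle and handle connection failures explicitly.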