
May I confirm: when we have an Event Hub with multiple partitions, is it possible for more than one instance of an Azure Function to process events from the same partition?

For example (assuming we take in event batches):

  1. Event batch 1 arrives and triggers the Azure Function.
  2. Then, while the Azure Function is still processing batch 1, Event Batch 2 arrives and triggers another instance of the Azure Function to process Batch 2, even though the previous batch has not finished yet.

Is the scenario above possible? Or is it guaranteed that only one Azure Function instance runs per partition, meaning the next batch is only invoked once batch 1 has finished processing?

This thread says it is possible for a single partition to have parallel invocations.

I also saw this article, since I want to ensure sequential, in-order processing within one Event Hub partition. However, I am not convinced that order can be preserved if two Azure Function instances working on different batches can run against the same partition, since a race condition could scramble the order.

Hope you can clarify the concept here. Thank you!

2 Answers


  1. From Jeff Hollan’s article you linked:

    The processor host will automatically create leases on available partitions based on how many other hosts are provisioned. There is another important constraint that works in our benefit in this case: a single partition will only have a lease on one processor host at a time. That means multiple function instances can’t pull messages from the same partition.

    The other answer you linked only applies if you use single-message triggers; with those you can have concurrency within the same instance. If you implement batch triggers, you will have guaranteed order.

    Edit: You’ll have guaranteed order regardless, except for some edge cases; see Jesse’s answer for the details.

    // Batch trigger from Jeff Hollan's article; `db` is assumed to be a Redis
    // IDatabase (StackExchange.Redis) initialized outside the function.
    [FunctionName("EventHubTrigger")]
    public static async Task RunAsync(
        [EventHubTrigger("ordered", Connection = "EventHub")] EventData[] eventDataSet,
        TraceWriter log)
    {
        log.Info($"Triggered batch of size {eventDataSet.Length}");

        // Process the batch sequentially so order within the partition is preserved.
        foreach (var eventData in eventDataSet)
        {
            try
            {
                // Append each event to a Redis list keyed by its partition key.
                await db.ListRightPushAsync(
                    "events:" + eventData.Properties["partitionKey"],
                    (string)eventData.Properties["counter"]);
            }
            catch
            {
                // Handle the poison event here so one bad event does not halt the batch.
            }
        }
    }
    
  2. In short, yes, in one specific scenario, it is possible to see two instances process the same partition for a batch or two. The linked thread discussing parallel execution with single-dispatch is incorrect.

    Partitions and ownership

    Generally speaking, every partition can have only one active reader. However, it is possible to have two Function instances dispatching events for the same partition for a small period of overlap (1-2 batches) when instances are scaling and partition ownership rebalances.

    If the old owner is holding events in memory and dispatches them, it may not be aware that the partition has a new owner. The old owner becomes aware of the change the next time that it attempts to read or when the trigger’s load balancing loop ticks (30 seconds by default). At this point, the old owner is no longer permitted to read from the partition and will stop dispatching events after your code completes processing the current event/batch.

    Outside of that scenario, there will not be parallel processing for any partition. Whether using single-dispatch or multi-dispatch, each time your Function is invoked to perform processing, the trigger will wait for that invocation to complete (and checkpoint, if needed) before dispatching another event from that partition.
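
    To make that overlap window concrete, here is a minimal sketch (mine, not something the trigger provides) that tracks the highest sequence number handled per partition, so a batch dispatched by a stale owner, or replayed after a rewind, can be recognized. How you obtain the partition id and sequence number depends on your SDK and trigger version; the class and member names are illustrative.

    using System;
    using System.Collections.Concurrent;

    // Illustrative sketch: remembers the highest sequence number handled for each
    // partition. An event whose sequence number falls at or below the recorded value
    // has either already been processed or is being replayed.
    public static class PartitionProgressTracker
    {
        private static readonly ConcurrentDictionary<string, long> LastProcessed =
            new ConcurrentDictionary<string, long>();

        // True if this event was already seen for the partition.
        public static bool AlreadyProcessed(string partitionId, long sequenceNumber) =>
            LastProcessed.TryGetValue(partitionId, out var last) && sequenceNumber <= last;

        // Record progress after an event has been handled successfully.
        public static void MarkProcessed(string partitionId, long sequenceNumber) =>
            LastProcessed.AddOrUpdate(partitionId, sequenceNumber,
                (_, last) => Math.Max(last, sequenceNumber));
    }

    Note that an in-memory tracker like this only sees one host. The overlapping dispatch comes from a different instance, so real protection has to come from a shared store or from idempotent handling, as described below.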

    Ordering and duplication

    With respect to ordering, the general guarantee is that the trigger will dispatch each event in the order that it appears in the partition. However, you may see rewinds, where the trigger repositions to an earlier point in time and replays events. This happens when scaling causes ownership to rebalance, when a host machine crashes or migrates, or when an exception in the runtime causes the trigger to restart.

    That will cause the trigger to position its reader at the last checkpoint written and to start dispatching from there. The events will be read in order from that point forward. To your application, ordering will be disrupted, and it will see events that it already processed.

    In short, it's important that your application keep the Event Hubs "at least once" delivery guarantee in mind and ensure that processing is idempotent and can handle rewinds.
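
    As one possible way to act on that, here is a hedged sketch of idempotent handling keyed by partition id and sequence number. The names are illustrative; in production the set of processed keys would live in a shared store (for example, a database table with a unique constraint, or SETNX in Redis) rather than in process memory.

    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    // Illustrative sketch: the first attempt for a given key applies the event,
    // and any replay of the same key is silently ignored.
    public static class IdempotentHandler
    {
        private static readonly ConcurrentDictionary<string, bool> Processed =
            new ConcurrentDictionary<string, bool>();

        public static async Task HandleAsync(string partitionId, long sequenceNumber, string body)
        {
            var key = $"{partitionId}:{sequenceNumber}";

            // TryAdd succeeds only the first time the key is seen, so an event
            // replayed after a rewind falls through without a second side effect.
            if (!Processed.TryAdd(key, true))
            {
                return;
            }

            await ApplyAsync(body); // the real side effect for the event goes here
        }

        private static Task ApplyAsync(string body) => Task.CompletedTask;
    }

    If the producer can re-send an event, a business-level unique id in the payload is a stronger deduplication key than the broker-assigned sequence number.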
