skip to Main Content

When there are updates done to an underlying topic of a GlobalKTable, what is the logic for all instances of KStream apps to get the latest data? Below are my follow up questions:

  1. Would the GlobalKTable be locked at record level or table level when the updates are happening?
  2. According to this blog: Kafka GlobalKTable Latency Issue, can the latency go upto 0.5s?! If so, is there any alternative to reduce the latency?
  3. Since GlobalKTable uses RocksDB by default as the state store, are all features of RocksDB available to use?

I understand the GlobalKTable should not be used for use-cases that require frequent updates to the lookup data. Is there any other key-value store that we can use for use-cases that might require updates on table data – Redis for example?

I could not find much documentation about GlobalKTable and its internals. Is there any available documentations available?

2

Answers


  1. The Javadocs for KStream#join() are pretty clear that joins against a GlobalKTable only occur as records in the stream are processed. Therefore, to answer your question, there are no automatic updates that happen to the underlying KStreams: new messages would need to be processed in them in order for them to see the updates.

    “Table lookup join” means, that results are only computed if KStream
    records are processed. This is done by performing a lookup for
    matching records in the current internal GlobalKTable state. In
    contrast, processing GlobalKTable input records will only update the
    internal GlobalKTable state and will not produce any result records.

    1. If a GlobalKTable is materialized as a key-value store, most of the methods for iterating over and mutating KeyValueStore implementations use the synchronized keyword to prevent interference from multiple threads updating the state store concurrently.

    2. You might be able to reduce the latency by using an in-memory key-value store, or by using a custom state store implementation.

    3. Interacting with state stores is controlled via a set of interfaces in Kafka Streams, for example KeyValueStore, so in this sense you’re not interacting directly with RocksDB APIs.

    Login or Signup to reply.
  2. GlobalKTables are updated async. Hence, there is no guarantee whatsoever when the different instances are updated.

    Also, the "global thread" uses a dedicated "global consumer" that you can fine tune individually to reduce latency: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#naming

    RocksDB is integrated via JNI and the JNI interface does not expose all features of RocksDB. Furthermore, the "table" abstraction "hides" RocksDB to some extent. However, you can tune RocksDB via rocksdb.config.setter (https://docs.confluent.io/current/streams/developer-guide/config-streams.html#rocksdb-config-setter).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search