When there are updates to the underlying topic of a GlobalKTable, what is the logic by which all instances of a KStream app get the latest data? Below are my follow-up questions:

- Would the GlobalKTable be locked at the record level or the table level while the updates are happening?
- According to this blog: Kafka GlobalKTable Latency Issue, can the latency go up to 0.5 s?! If so, is there any alternative to reduce the latency?
- Since GlobalKTable uses RocksDB by default as the state store, are all features of RocksDB available to use?

I understand that a GlobalKTable should not be used for use cases that require frequent updates to the lookup data. Is there any other key-value store that we can use for use cases that might require updates to the table data – Redis, for example?

I could not find much documentation about GlobalKTable and its internals. Is there any documentation available?
2 Answers
The Javadocs for KStream#join() are pretty clear that joins against a GlobalKTable only occur as records in the stream are processed. Therefore, to answer your question, there are no automatic updates pushed to the joined KStreams: new messages would need to be processed in them in order for them to see the updated table data.

If a GlobalKTable is materialized as a key-value store, most of the methods for iterating over and mutating KeyValueStore implementations use the synchronized keyword to prevent interference from multiple threads updating the state store concurrently.

You might be able to reduce the latency by using an in-memory key-value store, or by using a custom state store implementation.
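The per-record join behavior described above can be sketched as follows; the topic names and String serdes here are assumptions for illustration, not part of the question:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class GlobalTableJoinExample {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Lookup data; every application instance holds a full local copy.
        GlobalKTable<String, String> lookup = builder.globalTable(
                "lookup-topic", Consumed.with(Serdes.String(), Serdes.String()));

        KStream<String, String> events = builder.stream(
                "events-topic", Consumed.with(Serdes.String(), Serdes.String()));

        // The join fires only when a record arrives on "events-topic";
        // an update to "lookup-topic" alone never emits output downstream.
        events.join(lookup,
                    (eventKey, eventValue) -> eventKey,   // map each event to its lookup key
                    (eventValue, lookupValue) -> eventValue + "|" + lookupValue)
              .to("enriched-topic", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }
}
```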
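One way to act on the in-memory suggestion is to materialize the GlobalKTable with the built-in in-memory store supplier instead of the RocksDB default; again, the topic and store names are assumptions:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

public class InMemoryGlobalTableExample {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Back the GlobalKTable with an in-memory key-value store,
        // trading restore cost on restart for lower read latency.
        builder.globalTable(
                "lookup-topic",
                Materialized.<String, String>as(
                                Stores.inMemoryKeyValueStore("lookup-store"))
                        .withKeySerde(Serdes.String())
                        .withValueSerde(Serdes.String()));

        return builder.build();
    }
}
```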
Interacting with state stores is controlled via a set of interfaces in Kafka Streams, for example KeyValueStore, so in this sense you’re not interacting directly with RocksDB APIs.

GlobalKTables are updated asynchronously. Hence, there is no guarantee whatsoever about when the different instances will see an update.
Also, the "global thread" uses a dedicated "global consumer" that you can fine-tune individually to reduce latency: https://docs.confluent.io/current/streams/developer-guide/config-streams.html#naming
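Such fine-tuning works by prefixing consumer settings with `global.consumer.` (Kafka Streams provides `StreamsConfig.globalConsumerPrefix()` for this); the particular fetch values below are illustrative assumptions, not recommendations:

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

public class GlobalConsumerTuning {
    public static Properties config() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "lookup-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Settings carrying the "global.consumer." prefix apply only to the
        // consumer used by the global thread that maintains GlobalKTables,
        // not to the regular per-thread consumers.
        props.put(StreamsConfig.globalConsumerPrefix(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG), 100);
        props.put(StreamsConfig.globalConsumerPrefix(ConsumerConfig.FETCH_MIN_BYTES_CONFIG), 1);

        return props;
    }
}
```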
RocksDB is integrated via JNI, and the JNI interface does not expose all features of RocksDB. Furthermore, the "table" abstraction "hides" RocksDB to some extent. However, you can tune RocksDB via rocksdb.config.setter (https://docs.confluent.io/current/streams/developer-guide/config-streams.html#rocksdb-config-setter).
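A minimal sketch of such a setter, implementing the `RocksDBConfigSetter` interface from Kafka Streams; the specific option values are illustrative assumptions only, as sensible tuning depends entirely on the workload:

```java
import java.util.Map;

import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.CompressionType;
import org.rocksdb.Options;

public class CustomRocksDBConfig implements RocksDBConfigSetter {
    @Override
    public void setConfig(String storeName, Options options, Map<String, Object> configs) {
        // Illustrative tuning only: enable LZ4 compression and cap the
        // number of in-memory write buffers (memtables) per store.
        options.setCompressionType(CompressionType.LZ4_COMPRESSION);
        options.setMaxWriteBufferNumber(2);
    }
}
```

The setter is registered via the config named in the link above, e.g. `props.put(StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG, CustomRocksDBConfig.class);`, and is then invoked once per RocksDB-backed store as it is opened.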