Producer-side compression in Apache Kafka
I have enabled Snappy compression on the producer side with a batch size of 64 KB. I am processing messages of 1 KB each and have set the linger time to infinity. Does this mean that the producer won't send anything to the Kafka output topic until I have processed 64 messages?
In other words, will the producer send each message to Kafka individually, or wait for 64 messages and send them in a single batch?
I ask because the offsets are increasing one by one rather than in multiples of 64.
Edit: I am using the Flink-Kafka connectors.
2 Answers
Messages are batched by the producer to minimize network usage, not so that they are written "as a batch" into Kafka's commit log. What you are seeing is correct behaviour on Kafka's part: each message has to be accounted for individually, i.e. its key/partition relationship is identified, it is appended to the commit log, and only then is the offset incremented. Until the first two steps are done, the offset is not incremented.
There is also data replication to take care of, depending on configuration, and the message-tracking systems are updated for each message received (to support the lag APIs).
Also note that the batch.size parameter considers the size of the ready-to-ship message, i.e. after it has been serialized by your favourite serializer and compressed.
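To illustrate why offsets advance by one even when records arrive together, here is a minimal toy model of a single partition's log. It is only a sketch: `ToyPartitionLog` and `append_batch` are hypothetical names made up for this example, not Kafka's broker code, and replication and compression are ignored.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyPartitionLog:
    """Hypothetical stand-in for one partition's log (not Kafka's real code)."""
    records: List[Tuple[int, bytes]] = field(default_factory=list)
    next_offset: int = 0

    def append_batch(self, batch: List[bytes]) -> List[int]:
        """Append a whole producer batch; every record still gets its own offset."""
        assigned = []
        for value in batch:
            self.records.append((self.next_offset, value))
            assigned.append(self.next_offset)
            self.next_offset += 1
        return assigned


log = ToyPartitionLog()
first = log.append_batch([b"x" * 1024 for _ in range(64)])   # one batch of 64 records
second = log.append_batch([b"y" * 1024 for _ in range(64)])  # a second batch of 64

print(first[:3], first[-1])  # offsets assigned inside the first batch
print(second[0])             # first offset of the second batch
```

In this model a consumer reading the log sees offsets 0, 1, 2, ... with no jump of 64, regardless of how many records travelled in each batch.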
We clearly observed that the messages arrived all of a sudden as a batch in the topic (while watching through the console consumer). Serialization and compression may be changing the data size and so affecting batching. Try sending more messages, ~100 or so.
– AbhishekN
Aug 13 at 23:29
I sent over 100,000 messages, each 1 KB in size, with a batch size of 128 KB, and we still observed the messages arriving one by one when using the Flink-Kafka connectors, whereas we observed them arriving in a batch when using the native Kafka connectors. So is this a bug in the Flink-Kafka connectors?
– Urjit Patel
Aug 14 at 5:28
I missed that you are using Flink-Kafka. Flink internally hands records one by one to executors (as defined by the parallelism). This is intended behaviour: if you define a batch window of e.g. 5 sec, all records will be buffered internally by Flink and then provided to the executors one by one for processing. I'm not sure what exactly you wanted to observe. I have created a new chat room; please ping me there if you need to discuss further: chat.stackoverflow.com/rooms/info/178030/apache-kafka
– AbhishekN
Aug 14 at 17:30
The batch.size default is 16384, and a batch.size of 0 will disable batching. From the docs:
The producer will attempt to batch records together into fewer requests whenever multiple records are being sent to the same partition. This helps performance on both the client and the server. This configuration controls the default batch size in bytes.
No attempt will be made to batch records larger than this size.
Requests sent to brokers will contain multiple batches, one for each partition with data available to be sent.
To be clear, a request consists of multiple batches, and each batch consists of messages being sent to the same partition. A final note: no attempt will be made to batch records larger than this size.
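The relationship between records, batches and requests described in the docs can be sketched roughly as follows. This is a simplified illustration, not the producer's actual accumulator: `build_batches` and `next_request` are hypothetical helpers, sizes are raw payload lengths (no record overhead or compression), and real requests are additionally grouped per broker.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BATCH_SIZE = 16384  # bytes; mirrors the documented batch.size default


def build_batches(records: List[Tuple[int, bytes]],
                  batch_size: int = BATCH_SIZE) -> Dict[int, List[List[bytes]]]:
    """Group (partition, payload) pairs into per-partition batches of at most batch_size bytes."""
    batches: Dict[int, List[List[bytes]]] = defaultdict(list)
    used: Dict[int, int] = {}
    for partition, payload in records:
        size = len(payload)
        # Open a new batch when none exists yet or the payload would not fit.
        # An oversized payload therefore ends up alone in its own batch.
        if partition not in used or used[partition] + size > batch_size:
            batches[partition].append([])
            used[partition] = 0
        batches[partition][-1].append(payload)
        used[partition] += size
    return dict(batches)


def next_request(batches: Dict[int, List[List[bytes]]]) -> Dict[int, List[bytes]]:
    """A request carries one batch for each partition that has data available."""
    return {partition: queue[0] for partition, queue in batches.items() if queue}


records = [(0, b"a" * 1024)] * 20 + [(1, b"b" * 1024)] * 5 + [(1, b"c" * 20000)]
batches = build_batches(records)
request = next_request(batches)

for partition, batch in sorted(request.items()):
    print(partition, len(batch), sum(len(p) for p in batch))
```

Here partition 0 fills one batch with sixteen 1 KB records and starts a second one, while the 20,000-byte record on partition 1 is not batched with anything else.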
The linger.ms default value is 0, which means it is disabled. From the docs:
This setting gives the upper bound on the delay for batching: once we get batch.size worth of records for a partition it will be sent immediately regardless of this setting, however if we have fewer than this many bytes accumulated for this partition we will 'linger' for the specified time waiting for more records to show up. This setting defaults to 0
So, according to the documentation, once a partition has accumulated batch.size worth of records, the request will be sent immediately regardless of linger.ms.
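The rule quoted above can be written down as a small decision function. This is a sketch of the documented rule only, under the assumption that nothing else triggers a send; `ready_to_send` is a hypothetical helper, and the real producer has further triggers (for example an explicit flush or close) that are left out here.

```python
def ready_to_send(accumulated_bytes: int,
                  waited_ms: float,
                  batch_size: int = 16384,
                  linger_ms: float = 0) -> bool:
    """Toy version of the documented batch.size / linger.ms rule for one partition."""
    if accumulated_bytes <= 0:
        return False  # nothing buffered, nothing to send
    if accumulated_bytes >= batch_size:
        return True   # a full batch goes out regardless of linger.ms
    return waited_ms >= linger_ms  # otherwise wait up to linger.ms for more records


KB = 1024
INF = float("inf")

# Settings from the question: 64 KB batches, linger time set to infinity.
print(ready_to_send(64 * KB, waited_ms=0, batch_size=64 * KB, linger_ms=INF))
print(ready_to_send(10 * KB, waited_ms=10_000, batch_size=64 * KB, linger_ms=INF))

# Defaults: linger.ms = 0, so even a single 1 KB record is sent right away.
print(ready_to_send(1 * KB, waited_ms=0))
```

Under these assumptions, an infinite linger time means a partial batch is never sent because of the timer, only once it reaches batch.size.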
At least in the Kafka output topic we should observe the messages arriving in batches, but we are not observing this behaviour. We are receiving the messages one by one. EDIT: I am using the Flink-Kafka connectors.
– Urjit Patel
Aug 13 at 6:32