How does Thrift handle Zlib flush markers being split over multiple messages?


I have an application with a C++ server and a C# client that communicate using Apache Thrift. I use TZlibTransport.cpp for zlib compression on the server, and a wrapper around Ionic.Zlib to decompress the data in the client. This works most of the time.



I noticed that in very specific situations the client would crash with one of the following errors:


Thrift.Protocol.TProtocolException: Missing version in readMessageBegin, old client?
at Thrift.Protocol.TBinaryProtocol.ReadMessageBegin()

Ionic.Zlib.ZlibException: Bad state (invalid block type)
at Ionic.Zlib.InflateManager.Inflate(FlushType flush)



I found that in all the cases where these errors are occurring, the server was sending two packages, one just over 1024 bytes (which is the size of the compressed write buffer that TZlibTransport.cpp uses), and one of 5-8 bytes. Looking at the data on the second package, I noticed that it was the flush marker that zlib uses, added twice,


ff ff 00 00 00 ff ff



with the first part of the first marker at the end of the previous package. If I increase the size of the buffer slightly, so that it has enough space to write the marker in one package, the crash does not occur, so I believe that the marker being added twice is what causes the problem. Simply changing the buffer size is not a solution, however, because the error would then just occur somewhere else in the application.
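For reference, the shape of the marker can be reproduced with a minimal sketch using Python's standard `zlib` module. It assumes the marker comes from a `Z_SYNC_FLUSH`-style flush (an assumption, not something confirmed about TZlibTransport here), and the payload is made up. The split of the observed bytes into a leftover part and a second marker is an interpretation of the description above, not output captured from Thrift.

```python
import zlib

# Hypothetical payload; any input works for inspecting the flush marker.
payload = b"hello thrift " * 100

deflater = zlib.compressobj()
stream = deflater.compress(payload) + deflater.flush(zlib.Z_SYNC_FLUSH)

# After a sync flush, the output ends with the length fields of an
# empty stored block.
marker_tail = stream[-4:]

# Bytes of the second package, as quoted in the question.
observed = bytes.fromhex("ff ff 00 00 00 ff ff")

# Interpretation: the first two bytes complete a marker whose "00 00"
# ended the previous package; the rest is a whole second marker
# (one block-header byte followed by 00 00 ff ff).
leftover = observed[:2]
second_marker = observed[2:]

print(marker_tail.hex())
print(leftover == marker_tail[2:], second_marker[1:] == marker_tail)
```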



I have looked into zlib and found that this is expected behaviour if it is not given enough space in the buffer (https://github.com/madler/zlib/issues/149). However, I haven't been able to find anybody who has come across this causing a problem with Thrift.



My question therefore is whether it is expected that, for specific data lengths, Thrift will split the marker over multiple packages, and how the client is supposed to handle this.




1 Answer



It looks like the problem is not that the marker was emitted twice, but rather simply that the first marker didn't entirely fit in the buffer. Had the output been just ff ff, you would have exactly the same problem and the same error message. ff cannot start a deflate stream, because it gives an invalid block type (3).
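A minimal sketch of why `ff` is rejected, using Python's standard `zlib` module. It assumes a fresh inflater reading raw deflate data (negative `wbits`), which is an illustrative setup and not necessarily how the Ionic.Zlib wrapper is configured. The helper names are hypothetical.

```python
import zlib

def first_block_header(byte):
    """Decode the 3-bit deflate block header held in the low bits of a byte."""
    bfinal = byte & 0x01         # bit 0: last-block flag
    btype = (byte >> 1) & 0x03   # bits 1-2: block type (3 is reserved)
    return bfinal, btype

def try_inflate_raw(data):
    """Inflate data as raw deflate; return the error text, or None on success."""
    # Negative wbits selects raw deflate with no zlib header (assumed setup).
    inflater = zlib.decompressobj(wbits=-15)
    try:
        inflater.decompress(data)
    except zlib.error as exc:
        return str(exc)
    return None

header = first_block_header(0xFF)
error_text = try_inflate_raw(b"\xff\xff")
print(header, error_text)
```

The low three bits of `ff` decode to a final-block flag of 1 and block type 3, so the inflater fails as soon as it tries to read a block header from those bytes.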





From your description, it sounds like there is a bug in Thrift: it does not ensure and/or check that all of the compressed data actually fits in the buffer.





Thanks for the response; it has helped me understand what is going wrong. In this case, should the first package be padded with zeros so that all of the marker is in the second package? Would this cause any problems with zlib?
– ldsrc
Aug 13 at 8:14





@ldsrc: Could you file a JIRA ticket and (if possible) add a pull request? That would be really awesome!
– JensG
Aug 13 at 12:19






@JensG I have filed a JIRA ticket for it: THRIFT-4620.
– ldsrc
Aug 13 at 13:49





@ldsrc To ensure that the current deflate block ends on a byte boundary, an empty stored block is emitted, which is three zero bits, the resulting last byte filled out with zeros as needed to get to a byte boundary, and then the bytes 00 00 ff ff. For the inflate operation on a single packet to complete properly and be in a state for the bytes in the next block, all of that should fit in the previous packet. Padding with zeros is not a thing that inflate expects or will handle. Bottom line: the packet needs to be bigger, or the amount of data compressed needs to be less so that it fits.
– Mark Adler
Aug 13 at 16:59
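As a side note on the per-packet point, here is a minimal sketch with Python's standard `zlib` module of what happens when one inflater object is kept alive across both packets. The payload and the split position are made up for illustration, and this says nothing about how TZlibTransport or the Ionic.Zlib wrapper actually manage their inflate state.

```python
import zlib

# Hypothetical message body.
payload = b"example thrift payload " * 64

deflater = zlib.compressobj()
stream = deflater.compress(payload) + deflater.flush(zlib.Z_SYNC_FLUSH)

# Split inside the flush marker: the first packet ends with 00 00 and
# the second packet is just ff ff.
cut = len(stream) - 2
first_packet, second_packet = stream[:cut], stream[cut:]

# One inflater sees both packets in order, so its state carries over.
inflater = zlib.decompressobj()
recovered = inflater.decompress(first_packet)
recovered += inflater.decompress(second_packet)

print(len(payload), len(recovered), recovered == payload)
```

Under these assumptions the data is recovered in full, because the same inflater is still waiting for the rest of the stored-block header when the second packet arrives. The failure described above arises when each packet has to be inflated to completion on its own.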





