TensorFlow Serving frequent request timeouts




The problem we encounter is the following. Serving is configured to load and serve 7 models, and as the number of models increases, Serving requests time out more frequently. Conversely, with fewer models loaded, request timeouts are insignificant. On the client side, the timeout is set to 5 seconds.
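To put numbers on how timeouts vary per model, a client-side tally along these lines could help. This is only a minimal sketch: `send_request` is a hypothetical callable standing in for whatever client code issues the inference request, and it is assumed to raise `TimeoutError` when the 5-second limit is exceeded.

```python
import time
from typing import Callable, Dict, List


def measure_timeouts(send_request: Callable[[str, float], None],
                     model_names: List[str],
                     requests_per_model: int = 20,
                     timeout_s: float = 5.0) -> Dict[str, dict]:
    """Call send_request(model_name, timeout_s) repeatedly and tally timeouts.

    send_request is a hypothetical, caller-supplied function; it is assumed
    to raise TimeoutError when a request takes longer than timeout_s.
    """
    stats = {}
    for name in model_names:
        latencies = []
        timeouts = 0
        for _ in range(requests_per_model):
            start = time.perf_counter()
            try:
                send_request(name, timeout_s)
            except TimeoutError:
                timeouts += 1
                continue
            # Only successful requests contribute to the latency figures.
            latencies.append(time.perf_counter() - start)
        stats[name] = {
            "timeouts": timeouts,
            "timeout_rate": timeouts / requests_per_model,
            "max_latency_s": max(latencies) if latencies else None,
        }
    return stats
```

Running this once per deployment size (for example with 3, 5 and 7 models loaded) would show whether the timeout rate rises evenly across models or is concentrated in a few of them.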



Interestingly, the maximum batch processing duration is approximately 700 ms, with a configured maximum batch size of 10. The average batch processing duration is ~60 ms.
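A rough way to relate these figures to the 5-second client timeout is sketched below. It rests on an unverified assumption: that batches for different models run one at a time on the single GPU, so a request may wait behind one worst-case batch per loaded model. Whether Serving actually behaves this way is part of what we are asking; `worst_case_wait_ms` is a name made up for this illustration.

```python
CLIENT_TIMEOUT_MS = 5000  # client-side timeout from the question
MAX_BATCH_MS = 700        # observed maximum batch processing duration


def worst_case_wait_ms(num_models: int, max_batch_ms: int) -> int:
    """Upper bound on wait time under the serialization assumption.

    Assumes (not confirmed) that one worst-case batch per loaded model
    is processed sequentially before the request completes.
    """
    return num_models * max_batch_ms


for n in (3, 5, 7, 8):
    wait = worst_case_wait_ms(n, MAX_BATCH_MS)
    print(f"{n} models: up to {wait} ms, exceeds timeout: {wait > CLIENT_TIMEOUT_MS}")
```

Under that assumption, 7 models at 700 ms each give 4900 ms, which sits just below the 5000 ms timeout, so even small additional queueing could push requests over the limit.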



We've checked the TensorFlow Serving logs, but found no warnings or errors. In addition, we've monitored the network on the GPU machines running Serving and on the hosts sending inference requests to Serving, but no network issues were identified either.



Decreasing the number of loaded and served models is, however, not the solution we are after, because it requires setting up multiple distinct GPU instances, each loading and serving only a subset of the models.



OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
TensorFlow Serving installed from (source or binary): source
TensorFlow Serving version: 1.9
TensorFlow Serving runs on multiple AWS g2.2xlarge instances. We run TensorFlow Serving using Docker, with the base image nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04.





What could be the root cause of such behaviour? How is Serving expected to handle requests when multiple models are loaded in memory? How does it change the model context?








