Python Pandas hanging in Docker container for small volume of data







I am trying to load about 44 MB of data into a Pandas DataFrame. This is the code:


import pandas as pd
from sqlalchemy import create_engine

# Oracle connection via the cx_Oracle driver
engineor = create_engine('oracle+cx_oracle://xxxxx:xxx@xxx:1521/?service_name=XXX')
sql = "select * from accounts where date >= '10-MAY-18 06.00.16.170000000 PM'"
do = pd.read_sql(sql, engineor)
do.info(memory_usage='deep')



The above query returns around 70k rows, roughly 44 MB in size.



When I run this from my local machine (Win 7) in Anaconda, the data loads into the DataFrame without any issues in a minute or two. However, when I run the same thing in a Docker container (Linux based), it just hangs.



I verified that the Docker container has sufficient memory, and memory usage doesn't grow over time (the data is quite small, ~44 MB). The query just gets submitted and hangs indefinitely; I cannot kill it with Ctrl+C or Ctrl+Z and have to disconnect from the machine and log back in.



I tried matching the Pandas version of the Anaconda install on my local machine, but it didn't help much; it still hangs. The only remaining differences are the Python version (3.5.3 in the Docker container vs 3.6.3 locally) and the OS (Anaconda runs on Windows, while the container is Linux based). I am not sure whether these make any difference.



Any suggestions on how to overcome this?
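One way to narrow down where the hang occurs is to turn on SQLAlchemy's statement logging with `echo=True`, which prints every connection event and SQL statement as it happens, so you can see whether the process stalls while connecting or while fetching rows. The sketch below uses an in-memory `sqlite://` URL as a stand-in for the Oracle DSN, since the diagnostic flag works the same way for any dialect:

```python
from sqlalchemy import create_engine, text

# echo=True makes the engine log every statement and connection event
# to stderr, so a hang shows up at a specific step in the log.
# sqlite:// is a stand-in here for the oracle+cx_oracle:// DSN.
engine = create_engine("sqlite://", echo=True)

with engine.connect() as conn:
    result = conn.execute(text("select 1")).scalar()

print(result)  # 1
```

If the log shows the connection succeeding but no rows arriving, the problem is in the fetch rather than the network path.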





I suspect your container is unable to access the Oracle database. You could try getting the logs for the container (identify the container ID using docker [ps|container] ls and then docker logs [ID]). If your container image includes a shell, you could try accessing the shell and pinging the DB: docker exec --interactive --tty [ID] bash (if not bash, try sh or ash). I suspect that, because you're using a Windows host, you're having to run Docker in a VM, and thus the host networking (and your DNS names) isn't being extended to your container.
– DazWilkin
Aug 10 at 15:05







I can confirm that there are no such issues. It loads the data consistently and without problems when the volume is small (hundreds or up to a few thousand rows). It runs into this issue when the volume is big. Another inconsistent behavior: when I try to load 30k rows, it works sometimes and hangs most of the time.
– CuriP
Aug 10 at 15:14





Are you running Docker in a VM on a Windows machine? It's possible that resources are being constrained by the VM and/or there are networking problems with that configuration. I'm unfamiliar with Docker on Windows. Were you able to see anything in the logs?
– DazWilkin
Aug 10 at 15:21





It is on a Linux machine, and it is not a VM. The Python log is just stuck at step one without any updates. I tried running the above code from the command line too, because the logs were not helpful, but as I mentioned earlier, it just hangs and I can't get out with Ctrl+C or Ctrl+Z. It is more or less similar to this issue (except mine is read_sql): stackoverflow.com/questions/48430886/… . I am not sure how to incorporate the workaround suggested there.
– CuriP
Aug 10 at 15:34






Unsure. You may want to try chunking the reads with the chunksize parameter of read_sql (see pandas.pydata.org/pandas-docs/stable/generated/…).
– DazWilkin
Aug 10 at 15:47
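The chunking suggestion above can be sketched as follows. Passing `chunksize` to `pandas.read_sql` returns an iterator of small DataFrames instead of one giant fetch, so a stall would surface on a specific chunk and memory stays bounded. The in-memory SQLite table here is a hypothetical stand-in for the Oracle `accounts` table, purely to keep the example self-contained:

```python
import sqlite3

import pandas as pd

# Stand-in for the Oracle "accounts" table: a small in-memory SQLite table.
conn = sqlite3.connect(":memory:")
pd.DataFrame({"id": range(10), "val": range(10)}).to_sql(
    "accounts", conn, index=False
)

# With chunksize set, read_sql yields DataFrames of at most 3 rows each
# instead of materializing the whole result set at once.
chunks = pd.read_sql("select * from accounts", conn, chunksize=3)
df = pd.concat(chunks, ignore_index=True)

print(len(df))  # 10
```

Against the real Oracle engine, the same pattern applies: replace `conn` with the SQLAlchemy engine and keep the `chunksize` argument; processing each chunk inside the loop (rather than concatenating) keeps peak memory low for larger extracts.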









