MPI4PY shared memory - memory usage spike on access

I'm using shared memory to share a large numpy array (write once, read many) with mpi4py, using shared windows. I find that I can set up the shared array without a problem; however, if I try to access the array on any process other than the lead process, my memory usage spikes beyond reasonable limits. Here is a simple code snippet that illustrates the application:


from mpi4py import MPI
import numpy as np
import time
import sys

shared_comm = MPI.COMM_WORLD

is_leader = shared_comm.rank == 0

# Set up a large array as example
_nModes = 45
_nSamples = 512*5

float_size = MPI.DOUBLE.Get_size()

size = (_nModes, _nSamples, _nSamples)
if is_leader:
    total_size = np.prod(size)
    nbytes = total_size * float_size
else:
    nbytes = 0

# Create the shared memory, or get a handle based on shared communicator
win = MPI.Win.Allocate_shared(nbytes, float_size, comm=shared_comm)
# Construct the array; Shared_query(0) returns the leader's (rank 0) segment
buf, itemsize = win.Shared_query(0)
_storedZModes = np.ndarray(buffer=buf, dtype='d', shape=size)

# Fill the shared array with only the leader rank
if is_leader:
    _storedZModes[...] = np.ones(size)

shared_comm.Barrier()

# Access the array - if we don't do this, then memory usage is as expected.
# If I do this, then I find that memory usage goes up to twice the size,
# as if it's copying the array on access
if shared_comm.rank == 1:
    # Do a (bad) explicit sum to make clear it is not a copy problem within numpy sum()
    SUM = 0.
    for i in range(_nModes):
        for j in range(_nSamples):
            for k in range(_nSamples):
                SUM = SUM + _storedZModes[i, j, k]

# Wait for a while to make sure slurm notices any issues before finishing
time.sleep(500)
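
(A side note on the communicator: everything here runs on a single node, so COMM_WORLD spans only shared memory and Allocate_shared accepts it. My understanding is that the more portable pattern is to derive the shared-memory communicator explicitly, which on one node behaves the same; a sketch of that variant, with the rest of the setup unchanged:)

from mpi4py import MPI

# Split COMM_WORLD into a per-node shared-memory communicator instead of
# using it directly; on a single node the result is equivalent.
shared_comm = MPI.COMM_WORLD.Split_type(MPI.COMM_TYPE_SHARED)
is_leader = shared_comm.rank == 0
# ... the remaining allocation / fill / sum is exactly as in the snippet above ...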



With the above setup, the shared array should take about 2.3GB, which is confirmed when running the code and querying it. If I submit to a queue via slurm on 4 cores on a single node, with 0.75GB per process, it runs fine as long as I don't do the sum. However, if I do the sum (as shown, or using np.sum or similar), then slurm complains that the memory usage has been exceeded. This does not happen if the leader rank does the sum.



With 0.75GB per process, the total memory allocated is 3GB, which leaves about 0.6GB for everything other than the shared array. That should clearly be plenty.
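
For reference, the arithmetic behind those figures (the array is stored as 8-byte doubles; the names below are just for this back-of-the-envelope check):

# Back-of-the-envelope check of the sizes quoted above.
n_modes, n_samples, double_bytes = 45, 512 * 5, 8

array_gb = n_modes * n_samples * n_samples * double_bytes / 1e9
print("shared array : %.2f GB" % array_gb)                  # ~2.36 GB

budget_gb = 4 * 0.75                                        # 4 slurm tasks x 0.75 GB each
print("slurm budget : %.2f GB" % budget_gb)                 # 3.00 GB
print("headroom     : %.2f GB" % (budget_gb - array_gb))    # ~0.64 GB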



It seems that accessing the memory on any process other than the leader is copying the memory, which is clearly useless. Have I done something wrong?
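
One way to narrow this down would be to watch the reading rank's resident set size while it touches the window. Below is a minimal sketch of that check, assuming Linux (RSS read from /proc/self/status) and the same dimensions as above; if rank 1's RSS grows by roughly the size of the window, that could just mean the shared pages are being charged to whichever rank faults them in, rather than a real copy being made.

from mpi4py import MPI
import numpy as np

def rss_mb():
    # Resident set size of this process in MB, read from Linux's /proc interface.
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) / 1024.0   # reported in kB
    return float("nan")

comm = MPI.COMM_WORLD
float_size = MPI.DOUBLE.Get_size()
size = (45, 512 * 5, 512 * 5)
nbytes = int(np.prod(size)) * float_size if comm.rank == 0 else 0

# Same shared-window setup as in the question.
win = MPI.Win.Allocate_shared(nbytes, float_size, comm=comm)
buf, itemsize = win.Shared_query(0)
arr = np.ndarray(buffer=buf, dtype='d', shape=size)

if comm.rank == 0:
    arr[...] = 1.0
comm.Barrier()

if comm.rank == 1:
    before = rss_mb()
    total = arr.sum()          # faults in every page of the shared window
    after = rss_mb()
    print("rank 1: RSS %.0f MB -> %.0f MB (sum = %g)" % (before, after, total))

win.Free()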








