Allocate and free memory in the loop (C + MPI)

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



Allocate and free memory in the loop (C + MPI)



Please see my following code snippet (floatalloc2 is for allocating a 2D contiguous array with datatype float, see Appendix if interested):


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv)

float **p=NULL, **buffer=NULL;
int it, nt=3, i, j, k, NP, MYID, nx=1, nz=2, nsrc=3, isrc;

MPI_Init ( &argc, &argv );
MPI_Comm_size ( MPI_COMM_WORLD, &NP );
MPI_Comm_rank ( MPI_COMM_WORLD, &MYID );

p = floatalloc2(nx,nz);
memset(p[0],0,nz*nx*sizeof(float));

for (it=0; it<nt; it++)
for (isrc=MYID; isrc<nsrc; isrc+=NP)
for (j=0; j<nz; j++)
for (i=0; i<nx; i++)
p[j][i] += 1.5 + (float)(isrc) + (float)(j);





for (k=0;k<nsrc-1;k++)
if (MYID==k)
buffer = floatalloc2(nx,nz);
memset(buffer[0],0,nz*nx*sizeof(float));
buffer = p;

else
buffer = floatalloc2(nx,nz);
memset(buffer[0],0,nz*nx*sizeof(float));

MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&buffer[0][0],nx*nz,MPI_FLOAT,k,MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
for (j=0; j<nz; j++)
for (i=0; i<nx; i++)
printf("it=%d,k=%d,Node %d,p[%d][%d]=%fn",it,k,MYID,j,i,p[j][i]);


free(*buffer);free(buffer); /*w/o this line is ok while not ok with this line */




MPI_Finalize();
exit(0);



As a thumb of rule, C frees memory once it was allocated, even in the loop. But here, if I don't add free(*buffer);free(buffer);, it's quite good. However, if with the free, the results are wrong. So what's wrong in my code?


free(*buffer);free(buffer);


free



Appendix for floatalloc2:


/*@out@*/ void *sf_alloc (size_t n, size_t size )
/*< output-checking allocation >*/

void *ptr;

size *= n;

if (0>=size) sf_error("%s: illegal allocation (%d bytes)",__FILE__,size);

ptr = malloc (size);

if (NULL == ptr)
sf_error ("%s: cannot allocate %lu bytes:", __FILE__,size);

return ptr;



/*@out@*/ float *sf_floatalloc (size_t n)
/*< float allocation >*/

float *ptr;
ptr = (float*) sf_alloc (n,sizeof(float));
return ptr;



/*@out@*/ float **floatalloc2 (size_t n1 , size_t n2 )
/*< float 2-D allocation, out[0] points to a contiguous array >*/

size_t i2;
float **ptr;

ptr = (float**) sf_alloc (n2,sizeof(float*));
ptr[0] = sf_floatalloc (n1*n2);
for (i2=1; i2 < n2; i2++)
ptr[i2] = ptr[0]+i2*n1;

return ptr;





Doesn't the library that provides floatalloc2 not provide something like floatunalloc2? Most likely your free calls are not in correspondence with the memory allocations.
– Bathsheba
Aug 10 at 14:31



floatalloc2


floatunalloc2


free





@Bathsheba I've added an appendix for floatalloc2
– coco
Aug 10 at 14:36


floatalloc2





There must be a library function to free this for you. Even if you manage to replicate that, there's no guarantee it will work in subsequent versions of whatever library you're using.
– Bathsheba
Aug 10 at 14:38


free





Since you are using MPI, you are probably interested in high-performance computing. In that case, you should know that double indirection (float **p, i.e. p[row][column] referring to an element, with p and p[row] being pointers) is much slower than using a linear array with index arithmetic, i.e. float *all; with all[row*columns + column] referring to an element, on current architectures. This means that this approach will never be as efficient as a linear array would be.
– Nominal Animal
Aug 10 at 14:38


float **p


p[row][column]


p


p[row]


float *all;


all[row*columns + column]





@NominalAnimal Thanks for your comment. At current stage my goal is to get the code run correctly. I'll think about efficiency at next stage.
– coco
Aug 10 at 14:44




2 Answers
2



Let's say we use float **p = floatalloc2(columns, rows). Then, p is allocated for rows float pointers (float *), and p[0] for columns*rows floats. If the library does not provide a floatfree2() or free2() function, you can free the memory allocated for the 2D array using free(*p); free(p);, in that order. This is what OP is doing, too.


float **p = floatalloc2(columns, rows)


p


rows


float *


p[0]


columns*rows


floatfree2()


free2()


free(*p); free(p);



The inner loop,


for (k=0;k<nsrc-1;k++)
if (MYID==k)
buffer = floatalloc2(nx,nz);
memset(buffer[0],0,nz*nx*sizeof(float));
buffer = p;
else
buffer = floatalloc2(nx,nz);
memset(buffer[0],0,nz*nx*sizeof(float));

...
free(*buffer);free(buffer); /*w/o this line is ok while not ok with this line */



The buffer = p; line replaces the 2D array allocated and initialized on the preceding two lines; this is essentially a memory leak. I suppose the idea was to copy the contents of the p to it, but I cannot be sure.


buffer = p;


p



Furthermore, later on, the free(*buffer); free(buffer); line ends up freeing the 2D array described by p on the k == MYID iteration.


free(*buffer); free(buffer);


p


k == MYID



One approach to solve this is to let buffer alias p when k == MYID:


buffer


p


k == MYID


for (k = 0; k < nsrc-1; k++)
if (MYID == k)
buffer = p;
else
buffer = floatalloc2(nx,nz);
memset(buffer[0], 0, nz*nx*sizeof(float));

...
if (buffer != p)
free(*buffer);
free(buffer);




This way, when k == MYID, buffer actually points to p; otherwise it is dynamically allocated and freed for each iteration. Obviously, when buffer aliases p, we do not free it, because that would free p.


k == MYID


buffer


p


buffer


p


p



You could add free(*p); free(p); just before MPI_Finalize(), to free the memory allocated for p too, but it isn't strictly necessary because the process is about to exit anyway. (It is useful, however, if you use e.g. Valgrind to look for memory leaks, or wish to show that you (the programmer) do track dynamic allocations correctly; in that case, adding a comment there is probably useful.)


free(*p); free(p);


MPI_Finalize()


p



A much better approach is to allocate the buffer only once, and then reuse it for each iteration:


float **p, **buffer;

p = floatalloc2(nx, nz);
memset(p[0], 0, nz*nx*sizeof p[0][0]);

buffer = floatalloc2(nx, nz);
memset(buffer[0], 0, nz*nx*sizeof buffer[0][0]);

for (it=0; it < nt; it++)
for (isrc=MYID; isrc < nsrc; isrc+=NP)
for (j=0; j<nz; j++)
for (i=0; i<nx; i++)
p[j][i] += 1.5 + (float)(isrc) + (float)(j);




for (k=0; k<nsrc-1; k++)
float **data;

if (MYID == k)
data = p;
else
data = buffer;
memset(buffer[0], 0, nz*nx*sizeof buffer[0][0]);


MPI_Barrier(MPI_COMM_WORLD);
MPI_Bcast(&(data[0][0]), nx*nz, MPI_FLOAT, k, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);

for (j=0; j<nz; j++)
for (i=0; i<nx; i++)
printf("it=%d,k=%d,Node %d,data[%d][%d]=%f, p[%d][%d]=%fn",
it, k, MYID, j, i, data[j][i], p[j][i]);





free(*buffer);
free(buffer);

free(*p);
free(p);



Note that the sizeof p[0][0] and sizeof buffer[0][0] are statements that evaluate to the size (in chars) of each element of p/buffer, and do not actually examine any memory. Those are allowed, and safe, even if p == NULL or p[0] == NULL, because the compiler simply examines the type of the expression (to the right of the sizeof operator). To remind myself of that, I never use parentheses around the variable reference on the right of the sizeof operator. (When you use a type, like float, the parentheses are required. Then, I write it as sizeof (float).) This helps me remember that sizeof is an operator, and does not behave like a function, although it may look like a function.


sizeof p[0][0]


sizeof buffer[0][0]


p


buffer


p == NULL


p[0] == NULL


sizeof


sizeof


float


sizeof (float)


sizeof



The float **data; is just a reference (or alias) to the actual data in p or buffer; all we need to use it is to assign it, either data = p; or data = buffer;. Because it is a reference or alias, pointing to exactly the same data, we do not free() it at all. If we did, we'd just end up freeing the original data, p or buffer.


float **data;


p


buffer


data = p;


data = buffer;


free()


p


buffer



In a comment to the question, I mentioned that using double indirection (float **data, where data[row][column] is an element, and both data[row] and data are pointers) is not very efficient. In practice, a structure is used instead. For example:


float **data


data[row][column]


data[row]


data


typedef struct
int rows;
int cols;
float *data;
float2d;
#define FLOAT2D_INIT 0, 0, NULL

static inline void float2d_alloc(float2d *m, const int rows, const int cols)

do
if (!m)
fprintf(stderr, "float2d_alloc(): No matrix specified.n");
break;


if (rows < 1 while (0);

/* MPI_Abort(MPI_COMM_WORLD, 1); */
exit(1);


static inline void float2d_free(float2d *m)

if (m)
free(m->data);
m->rows = 0;
m->cols = 0;
m->data = NULL;




If you have float2d p; allocated, then to access the element on row r, column c, you use p.data[r*p.cols + c]. All the data is consecutive in memory; there being p.cols*p.rows elements, with total size being p.cols*p.rows*sizeof p.data[0]. To refer to the array of floats on row r, you can use p.data + r*p.rows (which is equivalent to &(p.data[r*p.rows])).


float2d p;


r


c


p.data[r*p.cols + c]


p.cols*p.rows


p.cols*p.rows*sizeof p.data[0]


r


p.data + r*p.rows


&(p.data[r*p.rows])



If you are interested in this approach, and work a lot with matrices (or dense 2D arrays of data), you might wish to look here for an example of structures that takes this a step further. It basically allows one to create realtime views (references to the exact same data) to any regular rectangular part (can be e.g. diagonal, a submatrix, some consecutive odd rows, even rows, etc.) of another matrix, with the code keeping tabs (counting references to) the data, so that when you destroy a matrix that was the final user of some data, the data gets freed automatically. Each element access does require two multiplications and one addition instead of just one multiplication and one addition, but it turns out to be neglible overhead in practice, and the versatility of the matrix structures definitely balances that out.





Very Excellent and detailed answer. As you said, A much better approach is to allocate the buffer only once, and then reuse it for each iteration. I found that if I allocate the buffer only once, and in the for loop, I use for (j=0; j<nz; j++) for (i=0; i<nx; i++) buffer[j][i] = p[j][i]; (so called deep copy) can also get the expected results without freeing buffer
– coco
Aug 10 at 15:52


A much better approach is to allocate the buffer only once, and then reuse it for each iteration


for (j=0; j<nz; j++) for (i=0; i<nx; i++) buffer[j][i] = p[j][i];


buffer





"I suppose the idea was to copy the contents of the p to it, but I cannot be sure." Yes, exactly, you're right.
– coco
Aug 10 at 16:01





I read your answer word by word. It's really great answer and I love it. Thanks.
– coco
Aug 10 at 16:27





But I have a small question, you defined the alias float **data in the code body. Can we do this? Because I was taught variables are defined in the very beginning of the code.
– coco
Aug 10 at 23:14


float **data





@coco: Yes. Even in C89, you can declare variables at the start of each block ( ... ).
– Nominal Animal
Aug 11 at 16:59


...



A different approach is to use VLAs:


#include <stdio.h>
#include <stdlib.h>

void print_matrix(int nx, int ny, int a[nx][ny])

printf("Printing matrix at %pn", &a[0][0]);
for (int i=0; i < nx; i++)
printf("%d", a[i][0]);
for (int j=1; j < ny; j++)
printf(", %d", a[i][j]);
printf("n");



int main(int argc, char *argv)

int nx = 3, nz = 2;

/* two different syntaxes */
int (*A)[nx][nz] = malloc(sizeof(int[nx][nz]));
int (*B)[nz] = malloc(sizeof(int[nx][nz]));

for (int i = 0; i < nx; i++)
for (int k = 0; k < nz; k++)
(*A)[i][k] = (i+1)*(k+1);
B[i][k] = (i+1)*(k+1);

print_matrix(nx, nz, *A);
print_matrix(nx, nz, B);

free(A);
free(B);



I strongly recommend the free book "Modern C" available here: https://gustedt.wordpress.com/2016/11/25/modern-c-is-now-feature-complete/





Tip: int (*A)[nx][nz] = malloc(sizeof(int[nx][nz])); may be simplified to int (*A)[nx][nz] = malloc(sizeof *A); and int (*B)[nz] = malloc(sizeof *B * nx);
– chux
Aug 10 at 16:47



int (*A)[nx][nz] = malloc(sizeof(int[nx][nz]));


int (*A)[nx][nz] = malloc(sizeof *A);


int (*B)[nz] = malloc(sizeof *B * nx);





@chux, thanks for highlighting that. I go back and forth on which syntax I prefer.
– ptb
Aug 10 at 17:33






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

Creating a leaderboard in HTML/JS