Issue with simple atomic counter test in OpenGL compute shader

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



Issue with simple atomic counter test in OpenGL compute shader



I've been trying to wrap my head around memory synchronization and coherency by trying trivial examples.



In this, I'm dispatching a compute shader with 8x8x1 size work groups. The number of work groups is sufficient to cover the screen, which is 720x480.



Compute shader code:


#version 450 core

layout (local_size_x = 8, local_size_y = 8, local_size_z = 1) in;

layout (binding = 0, rgba8) uniform image2D u_fboImg;

layout (binding = 0, offset = 0) uniform atomic_uint u_counters[100];

void main()



This is what I get: (The heights of the jagged bars are different each time, but on average about that height)



enter image description here



This is what I would expect, and is the result of hard coding the for loop to go to 400.



enter image description here



Strangely enough, if I decrease the number of work groups in the dispatch, say by halving the x value (would now only cover half the screen), the bars get bigger:



enter image description here



Finally to prove there isn't some other nonsense going on, here i'm just coloring based on local invocation id:



enter image description here



*Edit: Forgot to mention the dispatch is followed immediately by glMemoryBarrier(GL_ALL_BARRIER_BITS);


glMemoryBarrier(GL_ALL_BARRIER_BITS);





Does your implementation actually allow 100 atomic counters?
– Nicol Bolas
Aug 13 at 3:22





Sure doesn't. 8 is the value I get for GL_MAX_COMPUTE_ATOMIC_COUNTER_BUFFERS. I don't know why i didn't think to check that.
– Daskie
Aug 13 at 3:37





That's the number of atomic counter buffers, not counters. I was asking about ATOMIC_COUNTER_BUFFER_SIZE. For example, most NVIDIA implementations allow 64KB of atomic counter buffer size, so you could have 16K counters. By contrast, lots of AMD hardware only has 32 bytes of buffer size, so you can only write 8 counters.
– Nicol Bolas
Aug 13 at 3:38



ATOMIC_COUNTER_BUFFER_SIZE





I'm not getting a value back from glGetIntegerv for GL_ATOMIC_COUNTER_BUFFER_SIZE, although it is defined. Additionally, i don't see it listed as an option in the docs: khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/glGet.xml
– Daskie
Aug 13 at 3:41





Atomic counters are an OpenGL 4.x feature; you wouldn't see them on a page for OpenGL 2.1. And the full name of the enumerator is GL_MAX_ATOMIC_COUNTER_BUFFER_SIZE.
– Nicol Bolas
Aug 13 at 3:44



GL_MAX_ATOMIC_COUNTER_BUFFER_SIZE




1 Answer
1



Unless otherwise stated, all shader invocations for a particular shader stage, including the compute shader stage, execute independently of one another, in an order which is undefined. And calling memoryBarrier does not change this fact. This means that, when the stuff after the memoryBarrier is called, there is no guarantee that the value from the atomic counter has been incremented by all of the shader invocations that will eventually do so.


memoryBarrier


memoryBarrier



So what you're seeing is exactly what one would expect to see: the invocations writing somewhat random values, depending on the implementation-dependent order that the invocations just so happen to be executed in.



What you're wanting to do is execute all of the atomic increments for all invocations, then read those values and draw stuff based on what you read. Your code as written cannot do that.



While compute shaders do have some ability to manipulate the order of execution of invocations, this only works for invocations within the same work group (this is in fact why work groups exist). That is, you can have invocations ordered to a degree in a work group, but never between work groups.



The simple fix for this is to turn it into 2 compute shader dispatch operations. The first does all of the incrementing. The second will read the values and write the results to the image.



More clever solutions would involve employing work groups. That is, group your work so that whatever would have incremented the same atomic counter will execute within the same work group. This way, you don't even need atomic counters; you just use shared variables (which can perform atomic operations). You call barrier() after you do all of the incrementing of the shared variable; that ensures that all invocations have executed at least that far before any invocation continues past that point. So all of the incrementing is done.


barrier()






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard