openCL - Creating sub-buffers returns errorcode 13
Hi, I am new to OpenCL and am using the C++ wrapper. I am trying to run the same kernel on two devices simultaneously. I create a buffer, split it into chunks using sub-buffers, and pass those chunks to the kernel, dispatching it twice: once to command queue 1 and once to command queue 2, each with a different chunk of the main buffer.
When I run it, creating one of the sub-buffers fails with error -13. All the other sub-buffers are created successfully.
Any guidance will be much appreciated.
I am using OpenCL 1.1.
//Creating main buffer
cl::Buffer zeropad_buf(openclObjects.context,CL_MEM_READ_ONLY| CL_MEM_COPY_HOST_PTR,(size+2)*(size+2)*cshape[level][1]*sizeof(float),zeropad);
cl::Buffer output_buf(openclObjects.context,CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR ,cshape[level][0]*size*size*sizeof(float),output_f);
//Creating sub_buffers for zeropad_buf
size_t zeropad_buf_size = (size+2)*(size+2)*cshape[level][1]*sizeof(float);
size_t output_buf_size = cshape[level][0]*size*size*sizeof(float);
cl_buffer_region zero_rgn_4core = {0, zeropad_buf_size/2};
cl_buffer_region zero_rgn_2core = {zeropad_buf_size/2, zeropad_buf_size/2}; // The sub-buffer created from this region fails with error -13
cl_buffer_region output_rgn_4core = {0, output_buf_size/2};
cl_buffer_region output_rgn_2core = {output_buf_size/2, output_buf_size/2};
cl::Buffer zeropad_buf_4Core = zeropad_buf.createSubBuffer(CL_MEM_READ_ONLY,CL_BUFFER_CREATE_TYPE_REGION, &zero_rgn_4core);
cl::Buffer zeropad_buf_2Core = zeropad_buf.createSubBuffer(CL_MEM_READ_ONLY,CL_BUFFER_CREATE_TYPE_REGION, &zero_rgn_2core);
std::cout<<"zero_pad sub-buffer created"<<std::endl;
cl::Buffer output_buf_4Core = output_buf.createSubBuffer(CL_MEM_READ_WRITE,CL_BUFFER_CREATE_TYPE_REGION, &output_rgn_4core);
cl::Buffer output_buf_2Core = output_buf.createSubBuffer(CL_MEM_READ_WRITE,CL_BUFFER_CREATE_TYPE_REGION, &output_rgn_2core);
1 Answer
From the documentation:
CL_MISALIGNED_SUB_BUFFER_OFFSET is returned in errcode_ret if there are no devices in context associated with buffer for which the origin value is aligned to the CL_DEVICE_MEM_BASE_ADDR_ALIGN value.
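To see why only the second region would fail, it can help to check the offsets by hand. The following is a minimal sketch, not code from the question: is_aligned_offset and example_zeropad_bytes are hypothetical helpers, and the 1024-bit alignment, size = 32 and 3 channels are assumed values chosen for illustration. The OpenCL specification reports CL_DEVICE_MEM_BASE_ADDR_ALIGN in bits, so the sketch divides by 8 to get bytes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: true if a sub-buffer origin (in bytes) satisfies a
// device's CL_DEVICE_MEM_BASE_ADDR_ALIGN value, which OpenCL reports in bits.
bool is_aligned_offset(std::size_t origin_bytes, unsigned align_bits)
{
    assert(align_bits >= 8 && align_bits % 8 == 0);
    const std::size_t align_bytes = align_bits / 8;
    return origin_bytes % align_bytes == 0;
}

// Assumed example dimensions (not from the question): size = 32, 3 channels.
// Mirrors the expression (size+2)*(size+2)*channels*sizeof(float).
std::size_t example_zeropad_bytes()
{
    const std::size_t size = 32, channels = 3;
    return (size + 2) * (size + 2) * channels * sizeof(float);
}
```

With these assumed numbers and a 4-byte float, the buffer is 13872 bytes, so the second region would start at byte 6936, which is not a multiple of 128. An origin of 0 is always aligned, which is consistent with only the second sub-buffer failing.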
It looks like you might need to align your split region offsets and sizes to lie on integer multiples of the least common multiple (LCM) of the CL_DEVICE_MEM_BASE_ADDR_ALIGN properties of all of your devices.
By this, I mean something like the following:
Assuming the devices you are using are in a variable:
std::vector<cl::Device> devices;
Query the CL_DEVICE_MEM_BASE_ADDR_ALIGN property for each device:
cl_uint total_alignment_requirement = 1;
for (cl::Device& dev : devices)
{
    cl_uint device_mem_base_align = 0;
    // The spec reports this value in bits; treating it as a byte count over-aligns, which is still valid.
    if (CL_SUCCESS == dev.getInfo(CL_DEVICE_MEM_BASE_ADDR_ALIGN, &device_mem_base_align))
        total_alignment_requirement = std::lcm(total_alignment_requirement, device_mem_base_align);
}
Then, when it comes to allocating zeropad, make sure the memory is aligned to total_alignment_requirement. For example, if you're currently allocating it with malloc(), use posix_memalign() instead. (Even better, don't create the buffer using CL_MEM_USE_HOST_PTR and let OpenCL allocate the memory if you can.)
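As a rough illustration of the malloc() to posix_memalign() switch, here is a minimal sketch for POSIX systems. alloc_aligned_floats is a hypothetical helper, not part of OpenCL, and align_bytes is assumed to be the combined alignment requirement already expressed in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical helper: allocate `count` floats whose address is a multiple of
// `align_bytes`. posix_memalign requires the alignment to be a power of two
// and a multiple of sizeof(void*). Returns nullptr on failure; the caller
// releases the memory with free().
float* alloc_aligned_floats(std::size_t count, std::size_t align_bytes)
{
    assert(align_bytes % sizeof(void*) == 0);
    void* ptr = nullptr;
    if (posix_memalign(&ptr, align_bytes, count * sizeof(float)) != 0)
        return nullptr;
    return static_cast<float*>(ptr);
}
```

The returned pointer could then be passed as the host pointer when creating the buffer. posix_memalign() is not available on every platform, so other toolchains would need their own aligned-allocation routine.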
Finally, your regions need to be aligned too:
size_t zeropad_split_pos = zeropad_buf_size / 2;
zeropad_split_pos -= zeropad_split_pos % total_alignment_requirement;
cl_buffer_region zero_rgn_4core = {0, zeropad_split_pos};
cl_buffer_region zero_rgn_2core = {zeropad_split_pos, zeropad_buf_size - zeropad_split_pos};
This ensures that the first region starts and ends on an address that is a multiple of total_alignment_requirement, and the second region starts on an aligned address too.
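The rounding step can be tried out without an OpenCL installation. The sketch below repeats the same arithmetic in a standalone function; Region is a stand-in for cl_buffer_region, and Split and split_aligned are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for cl_buffer_region so the sketch compiles without OpenCL headers.
struct Region { std::size_t origin; std::size_t size; };

struct Split { Region first; Region second; };

// Split a buffer of total_bytes into two regions whose boundary is rounded
// down to a multiple of align_bytes (alignment given in bytes).
Split split_aligned(std::size_t total_bytes, std::size_t align_bytes)
{
    assert(align_bytes > 0);
    std::size_t split_pos = total_bytes / 2;
    split_pos -= split_pos % align_bytes;  // round down to an aligned boundary
    return Split{ Region{0, split_pos},
                  Region{split_pos, total_bytes - split_pos} };
}
```

For example, split_aligned(13872, 128) places the boundary at byte 6912 rather than 6936, so the two halves are slightly unequal and the kernel launches have to account for that. If the buffer is smaller than twice the alignment, the first region comes out empty, and a zero-size region is not valid for a sub-buffer, so that case needs separate handling.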
(I haven't tested this code, but it should be close to correct. Note that std::lcm is a recent C++ standard library feature, added in C++17, so if it's not available in your toolchain, you'll need to supply your own lcm function.)
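For toolchains without std::lcm, a replacement is short to write. This is one possible sketch based on Euclid's algorithm; gcd_u32 and lcm_u32 are hypothetical names, and unlike std::lcm this version assumes both arguments are non-zero.

```cpp
#include <cassert>
#include <cstdint>

// Greatest common divisor via Euclid's algorithm.
std::uint32_t gcd_u32(std::uint32_t a, std::uint32_t b)
{
    while (b != 0) {
        const std::uint32_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// Least common multiple. Dividing before multiplying keeps the intermediate
// value small, but the result can still overflow for very large inputs.
std::uint32_t lcm_u32(std::uint32_t a, std::uint32_t b)
{
    assert(a != 0 && b != 0);
    return (a / gcd_u32(a, b)) * b;
}
```

Since alignment values are typically powers of two, the LCM across devices usually ends up being just the largest of them.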
Thanks for your response. But I don't understand what it means. Is there an example code or an explanation with an example perhaps?
– Sanjay Rakshit
Aug 6 at 21:30
I'm not aware of example code, but I've edited my answer to give you an idea of the kind of code that's required.
– pmdj
Aug 7 at 14:21
Thanks a ton. Just what I was looking for.
– Sanjay Rakshit
Aug 7 at 20:56