Meaning of the CL_DEVICE… parameters

I've implemented a function that retrieves some information about my OpenCL devices. Specifically, I have this device:

1. Vendor: NVIDIA Corporation
 1. Device: GeForce GTX 1070
  1.1 Hardware Version: OpenCL 1.2 CUDA
  1.2 Software Version: 391.24
  1.3 OpenCL C version: OpenCL C 1.2
  1.4 Address bits: 64
  1.5 Max Work Item Dimensions: 3
  1.6 Work Item Sizes: 1024 1024 64
  1.7 Work group size: 1024
  1.8 Parallel compute units: 15

And I need to be sure I understand some of them (specifically work groups/items).

Given that I have Work Item Sizes: 1024 1024 64, does this mean that when I instantiate the kernel I can use a total of 2^26 work items? Work group size: 1024 means, I guess, the maximum number of work items per work group (useful to know in case I need to use barriers, etc.). I'm not sure about Parallel compute units because, given the name, I'd expect this to be covered somehow by the work items. So what's the meaning of Parallel compute units (CL_DEVICE_MAX_COMPUTE_UNITS), and how do they relate to Work items?

And one more question: is there any relationship between Address bits and Work items?

Thank you

1 Answer

What's the meaning of Parallel compute units

On CPUs, this is the number of logical processors. On NVIDIA GPUs, it is the number of "Streaming Multiprocessors"; on AMD GPUs they are actually called "Compute Units".
The point of exposing them in OpenCL is that on some devices you can "carve up" the device by its compute units (device fission, clCreateSubDevices in OpenCL 1.2) and launch kernels independently on each partition.

Given I have Work Item Sizes : 1024 1024 64 this means that when I instantiate the kernel I can use a total amount of 2^26 work items, is this correct?

Incorrect. These are the maximums for each dimension of a single work-group. The Work group size limit is the maximum for the product of all dimensions. In other words, if your maximum "Work group size" is 1024, you could use a local size of e.g. [1024,1,1], [128,8,1] or [4,16,4], but launching with [2000,1,1] or [100,100,1] will fail. Go ahead and try it.

The reason for such a (usually) small limit is related to barriers, and also to local memory sizes (which are relatively tiny on most GPUs).
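To make the two limits concrete, here is a minimal sketch in plain C of the check described above. The helper name local_size_is_valid is made up for illustration and is not part of the OpenCL API; the limits are passed in by the caller, e.g. the values reported for the GTX 1070 in the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `local` is an acceptable work-group shape.
 * max_item_sizes : CL_DEVICE_MAX_WORK_ITEM_SIZES (one limit per dimension)
 * max_group_size : CL_DEVICE_MAX_WORK_GROUP_SIZE (limit on the product)
 * Hypothetical helper for illustration, not an OpenCL function. */
bool local_size_is_valid(const size_t *local, size_t work_dim,
                         const size_t *max_item_sizes,
                         size_t max_group_size)
{
    assert(local != NULL && max_item_sizes != NULL && work_dim >= 1);

    size_t total = 1;
    for (size_t d = 0; d < work_dim; ++d) {
        /* Each dimension has its own ceiling (and must be non-zero). */
        if (local[d] == 0 || local[d] > max_item_sizes[d])
            return false;
        /* Compare via division so the running product cannot overflow. */
        if (local[d] > max_group_size / total)
            return false;
        total *= local[d];
    }
    return true;
}
```

With the limits {1024, 1024, 64} and 1024, this accepts [1024,1,1], [128,8,1] and [4,16,4], and rejects [2000,1,1] (exceeds the per-dimension limit) and [100,100,1] (product is 10000). It also rejects [1,1,128]: the product is fine, but the third dimension is capped at 64.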

Also, this is actually explained in the documentation for clEnqueueNDRangeKernel:

local_work_size

Points to an array of work_dim unsigned values that describe the number of work-items that make up a work-group (also referred to as the size of the work-group) that will execute the kernel specified by kernel. The total number of work-items in a work-group is computed as local_work_size[0] * ... * local_work_size[work_dim - 1]. The total number of work-items in the work-group must be less than or equal to the CL_DEVICE_MAX_WORK_GROUP_SIZE value
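Note that this limit is about local_work_size only. The total number of work-items in a launch is set by global_work_size, which can be far larger than 1024. As a rough sketch, assuming OpenCL 1.x rules (where the global size must be evenly divisible by the local size in each dimension), a common pattern is to round the global size up and have the kernel skip the padding items. The helper names below are hypothetical, not OpenCL functions.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of `local` that is >= `global`.
 * Hypothetical helper; assumes one dimension and local > 0. */
size_t round_up_global(size_t global, size_t local)
{
    assert(local > 0);
    size_t remainder = global % local;
    return remainder == 0 ? global : global + (local - remainder);
}

/* Number of work-groups launched along that dimension. */
size_t work_group_count(size_t global, size_t local)
{
    return round_up_global(global, local) / local;
}
```

For example, 1000000 elements with a local size of 256 gives a padded global size of 1000192, i.e. 3907 work-groups. The kernel would then need a guard such as `if (get_global_id(0) < n)` so the 192 padding work-items do nothing.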
