Throughput Reported by Visual Profiler

For devices with compute capability 2.0 or greater, the Visual Profiler can be used to collect several different memory throughput measures. The throughput metrics described below can be displayed in the Details view.

The Requested Global Load Throughput and Requested Global Store Throughput values indicate the global memory throughput requested by the kernel, and thus correspond to the effective bandwidth obtained by the calculation in Effective Bandwidth Calculation.
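As a rough illustration of the calculation that the requested throughput corresponds to, the sketch below assumes the usual definition of effective bandwidth: bytes read plus bytes written, divided by 10^9, divided by the elapsed time in seconds. The function name and parameters are hypothetical and are not part of any profiler API.

```cpp
#include <cassert>
#include <cstdint>

// Effective bandwidth in GB/s, where 1 GB is taken as 10^9 bytes:
//   ((bytes_read + bytes_written) / 10^9) / seconds
// Hypothetical helper for illustration only.
double effective_bandwidth_gbs(std::uint64_t bytes_read,
                               std::uint64_t bytes_written,
                               double seconds) {
    assert(seconds > 0.0);
    const double total_bytes =
        static_cast<double>(bytes_read) + static_cast<double>(bytes_written);
    return (total_bytes / 1e9) / seconds;
}
```

For example, a kernel that copies a 2048 x 2048 matrix of 4-byte floats reads and writes 16,777,216 bytes each; if it took a hypothetical 1 ms, the function would report about 33.55 GB/s.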

Because the minimum memory transaction size is larger than most word sizes, the actual memory throughput required for a kernel includes the transfer of data not used by the kernel. For global memory accesses, this actual throughput is reported by the Global Load Throughput and Global Store Throughput values.
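The gap between requested and actual traffic can be sketched with a deliberately simplified model. It assumes a warp of 32 threads, an aligned base address, a fixed transaction (segment) size passed in by the caller, and that each touched segment is transferred exactly once. Real hardware behavior depends on the compute capability and cache configuration, so the numbers are illustrative only, and all names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Bytes actually moved for one warp-wide access under the simplified model:
// thread `lane` accesses the word at byte offset lane * stride_words * word_bytes,
// and every distinct segment touched is transferred in full.
std::size_t warp_bytes_transferred(std::size_t word_bytes,
                                   std::size_t stride_words,
                                   std::size_t segment_bytes,
                                   std::size_t warp_size = 32) {
    assert(word_bytes > 0 && segment_bytes > 0);
    std::set<std::size_t> segments;  // indices of distinct segments touched
    for (std::size_t lane = 0; lane < warp_size; ++lane) {
        const std::size_t first_byte = lane * stride_words * word_bytes;
        const std::size_t last_byte = first_byte + word_bytes - 1;
        for (std::size_t s = first_byte / segment_bytes;
             s <= last_byte / segment_bytes; ++s) {
            segments.insert(s);
        }
    }
    return segments.size() * segment_bytes;
}

// Bytes the warp actually asked for (one word per thread).
std::size_t warp_bytes_requested(std::size_t word_bytes,
                                 std::size_t warp_size = 32) {
    return word_bytes * warp_size;
}
```

Under these assumptions, with 4-byte words and 32-byte segments, a unit-stride access moves 128 bytes for 128 bytes requested, a stride of 2 moves 256 bytes, and a stride of 8 moves 1,024 bytes, because every thread then lands in its own segment.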

Both numbers are useful. The actual memory throughput shows how close the code is to the hardware limit, and comparing the effective or requested bandwidth with the actual bandwidth gives a good estimate of how much bandwidth is wasted by suboptimal coalescing of memory accesses. For global memory accesses, this comparison of requested memory bandwidth to actual memory bandwidth is reported by the Global Memory Load Efficiency and Global Memory Store Efficiency metrics.
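The efficiency comparison amounts to a ratio of requested to actual throughput, expressed as a percentage. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and the inputs can be in any unit as long as both use the same one.

```cpp
#include <cassert>

// Ratio of requested to actual throughput as a percentage.
// 100% means no transferred bytes were wasted; lower values
// indicate data moved but not used by the kernel.
double memory_efficiency_percent(double requested_throughput,
                                 double actual_throughput) {
    assert(actual_throughput > 0.0);
    return 100.0 * requested_throughput / actual_throughput;
}
```

For instance, a kernel that requests 128 bytes per warp but causes 256 bytes to be transferred would come out at 50% under this calculation.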

Note: the Visual Profiler uses 1,024^3 when converting B/sec to GB/sec.
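The choice of divisor matters when comparing profiler output with a hand-computed bandwidth that uses 10^9 bytes per GB. The sketch below shows both conversions side by side; the function names are hypothetical.

```cpp
#include <cassert>

// Conversion with a divisor of 1,024^3, as the profiler note describes.
double bytes_per_sec_to_profiler_gb(double bytes_per_sec) {
    assert(bytes_per_sec >= 0.0);
    return bytes_per_sec / (1024.0 * 1024.0 * 1024.0);
}

// Conversion with a divisor of 10^9 (decimal gigabytes).
double bytes_per_sec_to_decimal_gb(double bytes_per_sec) {
    assert(bytes_per_sec >= 0.0);
    return bytes_per_sec / 1e9;
}
```

The same raw rate therefore reads about 7.4% higher in decimal GB/sec than in the profiler's units, so values should be converted to a common convention before they are compared.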