Python | ShareTechnote

CUDA

CUDA stands for Compute Unified Device Architecture. It is a platform made by NVIDIA that lets programmers use NVIDIA graphics cards for general-purpose calculations. Graphics cards are normally used to process images and video, but with CUDA they can be used for all sorts of calculations, including those for machine learning.

It is a parallel computing platform and application programming interface (API) model created by NVIDIA. In other words, it works like a translator: it takes the instructions written by programmers and translates them into a form that the graphics card can execute. This allows the graphics card to do many calculations at the same time, which can make programs run faster.

In CUDA, the instructions for the graphics card are written in small pieces called kernels. Each kernel is like a mini-program that can be run many times at once on the graphics card. This is how CUDA makes programs run faster: by doing many things at once.
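
To make the idea of a kernel more concrete, here is a minimal pure-Python sketch that imitates, one call at a time on the CPU, how a grid of threads each runs the same small function on a different element. The names launch and double_kernel are hypothetical and exist only for this illustration; a real GPU runs these invocations in parallel rather than in a loop.

```python
def launch(kernel, grid_dim, block_dim, *args):
    """Call `kernel` once per (block, thread) pair, as a GPU would do in parallel."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


def double_kernel(block_idx, block_dim, thread_idx, data):
    # Each invocation handles exactly one element, chosen by its global index.
    idx = block_idx * block_dim + thread_idx
    if idx < len(data):  # Guard: the last block may have spare threads.
        data[idx] *= 2


values = [1.0] * 10
launch(double_kernel, 3, 4, values)  # 3 blocks x 4 threads = 12 >= 10 elements
print(values)
```

The point of the sketch is only the indexing pattern: every "thread" works out which element is its own from its block number and its position inside the block.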

The purpose of this note is not to explain CUDA technology itself. My personal interest is just to use the graphics card in my laptop with the CUDA framework from Python, so most of the content covers the software setup needed to make the graphics card work with Python and CUDA.

Get CUDA Compatible Graphics Card
Software Prerequisites
Get CUDA Toolkit Ready
Get Anaconda Ready
Install pycuda with Anaconda
Graphics Card Detection Check
CUDA Operation Check

Get CUDA Compatible Graphics Card

Before purchasing my new laptop, I wanted to make sure it had a CUDA-compatible graphics card. So the first step was to find out which laptop graphics cards are CUDA compatible. In May 2023, I asked Bing Chat for a list of CUDA-compatible laptop graphics cards. It gave me a short list that led me to https://developer.nvidia.com/cuda-gpus, which provides official information from NVIDIA.

In the end, the laptop I decided to buy has an NVIDIA GeForce RTX 3060 Laptop GPU, the card reported by the detection check later in this note.

Software Prerequisites

Even if you have CUDA-compatible hardware, it doesn't mean that you can use it right away. You need a set of software components to make the hardware work. In short, there are two major software components:

C compiler (cl.exe in Windows, gcc in Linux or Mac OS)
CUDA compiler (nvcc)

Sounds simple? It only SOUNDS simple. I think most of the CUDA setup problems you will encounter are related to these two components.

Once this software is on your PC, there are a few things you need to make sure of:

The software (compilers) and all the necessary dependencies are properly installed.
The directories (locations) of those compilers (both cl and nvcc) are added to the system variable Path.
A system variable named CUDA_PATH exists and points to the CUDA Toolkit installation directory (the parent of the bin directory that contains nvcc).

The first item is easy: just download the necessary packages and install them. Most problems are related to the 2nd and 3rd items. Ideally, installing the software should take care of them automatically, but in many cases it does not.

The simplest way to check whether the compilers are installed and their paths are properly set is as follows:

Open the Windows command prompt
Run the following commands from a directory outside the compiler directories (e.g., C:\). If these commands do not show an error message, it is highly likely that the compilers and the required paths are properly set up.

cl -help
nvcc --help
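
The same check can also be sketched from Python using only the standard library. The function name check_cuda_build_tools is hypothetical and not part of any CUDA package; it simply looks up each compiler on the search path and reads the CUDA_PATH variable. On Linux you would pass "gcc" instead of "cl".

```python
import os
import shutil


def check_cuda_build_tools(compilers=("cl", "nvcc")):
    """Return where each compiler was found on PATH (None if missing) plus CUDA_PATH."""
    report = {name: shutil.which(name) for name in compilers}
    report["CUDA_PATH"] = os.environ.get("CUDA_PATH")  # None if the variable is not set
    return report


for item, location in check_cuda_build_tools().items():
    print(f"{item}: {location if location else 'NOT FOUND'}")
```

Any line that prints NOT FOUND points at one of the three items listed above that still needs attention.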

Get CUDA Toolkit Ready

Download the CUDA Toolkit from https://developer.nvidia.com/cuda-downloads and install it. In my case (version 12.1), nvcc was installed in the following directory:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
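
Once the toolkit is installed, one way to confirm which version of nvcc is actually being picked up is to run it from Python and read the release number. This is a minimal sketch that assumes the output of nvcc --version contains text of the form "release X.Y"; the helper names parse_nvcc_release and installed_nvcc_release are hypothetical.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract (major, minor) from `nvcc --version` output, or None if not present."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_nvcc_release():
    """Run nvcc if it is on PATH and return its release, otherwise None."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


print(installed_nvcc_release())
```

If this prints None, nvcc is either not installed or its bin directory is not on Path.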

Get Anaconda Ready

Download and install Anaconda from https://www.anaconda.com/download

Install pycuda with Anaconda

$> C:\ProgramData\anaconda3\Scripts\conda install -c conda-forge pycuda
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 23.3.1
  latest version: 23.5.0

Please update conda by running

    $ conda update -n base -c defaults conda

Or to minimize the number of packages updated during conda update use

    conda install conda=23.5.0

## Package Plan ##

  environment location: C:\ProgramData\anaconda3

  added / updated specs:
    - pycuda

The following NEW packages will be INSTALLED:

  boost            conda-forge/win-64::boost-1.78.0-py310h220cb41_4
  boost-cpp        conda-forge/win-64::boost-cpp-1.78.0-h5b4e17d_0
  cudatoolkit      conda-forge/win-64::cudatoolkit-11.8.0-h09e9e62_11
  mako             conda-forge/noarch::mako-1.2.4-pyhd8ed1ab_0
  pycuda           conda-forge/win-64::pycuda-2022.2.2-py310ha2c4f5d_0
  python_abi       conda-forge/win-64::python_abi-3.10-2_cp310
  pytools          conda-forge/noarch::pytools-2022.1.14-pyhd8ed1ab_0
  ucrt             conda-forge/win-64::ucrt-10.0.22621.0-h57928b3_0
  vc14_runtime     conda-forge/win-64::vc14_runtime-14.34.31931-h5081d32_16

The following packages will be UPDATED:

  ca-certificates  pkgs/main::ca-certificates-2023.01.10~ --> conda-forge::ca-certificates-2023.5.7-h56e8100_0
  certifi          pkgs/main/win-64::certifi-2022.12.7-p~ --> conda-forge/noarch::certifi-2023.5.7-pyhd8ed1ab_0
  openssl          pkgs/main::openssl-1.1.1t-h2bbff1b_0 --> conda-forge::openssl-1.1.1u-hcfcfb64_0
  vs2015_runtime   pkgs/main::vs2015_runtime-14.27.29016~ --> conda-forge::vs2015_runtime-14.34.31931-hed1258a_16

Proceed ([y]/n)? y

Downloading and Extracting Packages

Preparing transaction: done
Verifying transaction: done
Executing transaction: | "By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html"
done
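
After the installation finishes, a quick way to confirm that the packages are visible to Python is a small standard-library check like the one below. It is only a sketch: the helper name package_status is hypothetical, and the script has to be run with the Python interpreter of the Anaconda environment that conda installed into.

```python
from importlib import metadata, util


def package_status(name):
    """Return the installed version of `name`, or None if it cannot be imported."""
    if util.find_spec(name) is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found for it.
        return "unknown version"


for pkg in ("pycuda", "numpy"):
    print(f"{pkg}: {package_status(pkg) or 'not installed'}")
```

If pycuda is reported as not installed, the most likely cause is that the script ran under a different Python than the one conda installed into.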

Graphics Card Detection Check

Cuda_test_01.py

import pycuda.driver as cuda

def check_cuda_compatibility():
    # Initialize the CUDA driver API before querying devices
    cuda.init()
    device_count = cuda.Device.count()

    if device_count == 0:
        print("No CUDA compatible device detected.")
    else:
        print(f"{device_count} CUDA compatible device(s) detected.")
        for i in range(device_count):
            device = cuda.Device(i)
            print(f"Device #{i + 1}: {device.name()}")

check_cuda_compatibility()

Result

1 CUDA compatible device(s) detected.

Device #1: NVIDIA GeForce RTX 3060 Laptop GPU

CUDA Operation Check

Cuda_test_02.py

import pycuda.autoinit  # Initializes CUDA and creates a context on import
import pycuda.driver as cuda
import pycuda.compiler
import numpy

device = cuda.Device(0)  # Replace 0 with the ID of your GPU if you have more than one
print(device.get_attributes()[cuda.device_attribute.MAX_THREADS_PER_BLOCK])

# Create a numpy array of 10000 elements, initialized to 1
a = numpy.ones(10000).astype(numpy.float32)

# Allocate memory on the GPU
a_gpu = cuda.mem_alloc(a.size * a.dtype.itemsize)

# Copy the numpy array to the GPU
cuda.memcpy_htod(a_gpu, a)

# Create a CUDA function (also known as a "kernel") that multiplies each element by 2
mod = pycuda.compiler.SourceModule("""
__global__ void multiply_by_2(float *a)
{
    // Global index: block offset plus the thread's position within its block
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < 10000)
    {
        a[idx] *= 2;
    }
}
""")

# Get the function from the module
func = mod.get_function("multiply_by_2")

# Call the function on the GPU: 40 blocks x 256 threads = 10240 threads >= 10000 elements
func(a_gpu, block=(256,1,1), grid=(40,1))

# Copy the result back to the CPU
a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)

# Print the result
print(a_doubled)

Result

1024

[2. 2. 2. ... 2. 2. 2.]
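
The launch above uses block=(256,1,1) and grid=(40,1). The arithmetic behind those numbers can be sketched in plain Python: if each thread handles one element, selected by blockIdx.x * blockDim.x + threadIdx.x, the grid needs enough blocks to cover all 10000 elements, and the if (idx < 10000) guard in the kernel skips the leftover threads. The helper name blocks_needed is hypothetical.

```python
def blocks_needed(n_elements, threads_per_block):
    """Smallest number of blocks whose threads cover all elements (ceiling division)."""
    return (n_elements + threads_per_block - 1) // threads_per_block


n = 10000
threads = 256
blocks = blocks_needed(n, threads)
print(blocks, blocks * threads)  # 40 blocks, 10240 threads in total
```

Computing the grid size this way, instead of hard-coding 40, keeps the launch correct if the array length or the block size changes.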