
CUDA

 

CUDA stands for Compute Unified Device Architecture. It is a technology made by NVIDIA that lets programmers use NVIDIA graphics cards to do general calculations. Normally, graphics cards are used to process images and videos, but with CUDA they can be used for all sorts of calculations, including those for Machine Learning.

 

It is a parallel computing platform and application programming interface (API) model created by NVIDIA. In other words, it works like a translator. It takes the instructions written by programmers and translates them into a language that the graphics card can understand. This allows the graphics card to do many calculations at the same time, which can make programs run faster.

 

In CUDA, the instructions for the graphics card are written in small pieces called kernels. Each kernel is like a mini-program that can be run many times at once on the graphics card. This is how CUDA makes programs run faster: by doing many things at once.
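As a small preview of what this looks like from Python, here is a minimal sketch of a kernel that adds two arrays element by element, using the same pycuda pattern as the test scripts later in this note. The kernel name add_vectors, the array size, and the launch configuration are just illustrative choices.

import numpy
import pycuda.autoinit            # creates a CUDA context on the first GPU
import pycuda.compiler
import pycuda.gpuarray as gpuarray

# The kernel: a mini-program that every GPU thread runs once.
# Each thread computes its own index and adds one pair of elements.
mod = pycuda.compiler.SourceModule("""
    __global__ void add_vectors(float *a, float *b, float *c, int n)
    {
        int idx = threadIdx.x + blockIdx.x * blockDim.x;
        if (idx < n)
            c[idx] = a[idx] + b[idx];
    }
""")
add_vectors = mod.get_function("add_vectors")

n = 1024
a = gpuarray.to_gpu(numpy.ones(n, dtype=numpy.float32))
b = gpuarray.to_gpu(numpy.full(n, 2.0, dtype=numpy.float32))
c = gpuarray.empty_like(a)

# Launch 4 blocks of 256 threads: 1024 copies of the kernel run in parallel.
add_vectors(a, b, c, numpy.int32(n), block=(256, 1, 1), grid=(4, 1))
print(c.get()[:5])                # expected: [3. 3. 3. 3. 3.]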

 

The purpose of this note is not to explain CUDA technology itself. My personal interest is simply to use the graphics card in my laptop through the CUDA framework from Python, so most of the contents are about the software setup process needed to make the graphics card work with Python under the CUDA framework.

 

 

 

Get CUDA Compatible Graphics Card

 

Before I purchased my new laptop, I wanted it to have a graphics card with CUDA compatibility. So the first step was to find out which laptop graphics cards are CUDA compatible. In May 2023, I first checked with Bing Chat and asked it to give me a list of CUDA compatible laptop graphics cards. It gave me a short list that led me to the site https://developer.nvidia.com/cuda-gpus, which provides official information from NVIDIA.

 

Finally, the laptop that I decided to buy is the one with the graphics card shown below: an NVIDIA GeForce RTX 3060 Laptop GPU.

 

 

 

 

 

Software Prerequisites

 

Even if you have CUDA compatible hardware, it doesn't mean that you can use it right away. You need a set of software components to make the hardware work. In short, there are two major software components, as follows:

  • C compiler (cl.exe on Windows, gcc on Linux or Mac OS)
  • CUDA compiler (nvcc)

Sounds simple? It just SOUNDS simple, but I think most of the CUDA setup problems you would encounter are related to these two components.

 

When you get this software ready on your PC, there are also a few things you need to make sure of, as follows.

  • The software (compilers) and all the necessary dependencies are properly installed.
  • The directories (locations) of those compilers (both cl and nvcc) are added to the system variable Path.
  • A system variable named CUDA_PATH is created and the path for nvcc is assigned to it.

The first item can easily be done: just download the necessary package and install it. Most of the problems are related to the 2nd or 3rd items. Ideally, just installing the software should take care of the 2nd and 3rd automatically, but in many cases that doesn't seem to happen.
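A quick way to check the 2nd and 3rd items without opening the Windows environment-variable dialog is a small Python sketch like the one below. It only reads the environment: shutil.which() reports whether an executable can be reached through Path, and os.environ shows whether CUDA_PATH is set.

import os
import shutil

# Item 3: is the CUDA_PATH system variable defined?
cuda_path = os.environ.get("CUDA_PATH")
print("CUDA_PATH =", cuda_path if cuda_path else "NOT SET")

# Item 2: can the compilers be reached through the Path variable?
for exe in ("cl", "nvcc"):
    location = shutil.which(exe)   # None if the executable is not found on Path
    print(f"{exe}: {location if location else 'NOT FOUND on Path'}")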

 

The simplest way to check whether the compilers are installed and the paths for the compilers are properly set is as follows:

  • Run the Windows command line tool.
  • Run the following commands from a directory outside of the compiler directories (e.g., C:\). If these commands do not show you an error message, it is highly likely that the compilers and the required paths are properly set up (a small script that automates this check is shown right after this list).
    • cl -help
    • nvcc --help
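If you prefer to script this check instead of typing the commands by hand, a sketch like the following just runs the same two commands from Python and reports whether they could be started at all. A FileNotFoundError means the executable could not be found through Path; the exact help text the compilers print is not important here.

import subprocess

# Run each compiler's help command, exactly as you would in the command line tool.
for cmd in (["cl", "-help"], ["nvcc", "--help"]):
    try:
        subprocess.run(cmd, capture_output=True, check=False)
        print(f"{cmd[0]}: found and runnable")
    except FileNotFoundError:
        print(f"{cmd[0]}: NOT found - check the Path variable")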

 

 

 

Get CUDA Toolkit Ready

 

Download the CUDA Toolkit from https://developer.nvidia.com/cuda-downloads and install it.

 

 

 

 

 

After installation, the toolkit binaries (including nvcc) are located in a directory like the following (the version number in the path depends on the toolkit version you installed):

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
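If nvcc is installed but still cannot be found when compiling from Python, one workaround worth trying is to append the toolkit's bin directory to Path for the current Python session before importing pycuda.compiler. This is only a sketch: the directory below is from my installation (v12.1) and has to be adjusted to the version actually installed, and it does not change the system-wide setting.

import os

# Toolkit bin directory from my installation - adjust v12.1 to your installed version.
cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin"

# Append it to Path for this Python process only.
if cuda_bin not in os.environ["PATH"]:
    os.environ["PATH"] = os.environ["PATH"] + os.pathsep + cuda_bin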

 

 

 

Get Anaconda Ready

 

Download and install Anaconda from: https://www.anaconda.com/download

 

 

 

 

Install pycuda with Anaconda

 

Run the following command to install pycuda from the conda-forge channel (the full installation log from my PC is shown right after it):

 $> C:\ProgramData\anaconda3\Scripts\conda install -c conda-forge pycuda

 

 

 $> C:\ProgramData\anaconda3\Scripts\conda install -c conda-forge pycuda

Collecting package metadata (current_repodata.json): done

Solving environment: done

 

 

==> WARNING: A newer version of conda exists. <==

  current version: 23.3.1

  latest version: 23.5.0

 

Please update conda by running

 

    $ conda update -n base -c defaults conda

 

Or to minimize the number of packages updated during conda update use

 

     conda install conda=23.5.0

 

 

 

## Package Plan ##

 

  environment location: C:\ProgramData\anaconda3

 

  added / updated specs:

    - pycuda

 

 

The following NEW packages will be INSTALLED:

 

  boost              conda-forge/win-64::boost-1.78.0-py310h220cb41_4

  boost-cpp          conda-forge/win-64::boost-cpp-1.78.0-h5b4e17d_0

  cudatoolkit        conda-forge/win-64::cudatoolkit-11.8.0-h09e9e62_11

  mako               conda-forge/noarch::mako-1.2.4-pyhd8ed1ab_0

  pycuda             conda-forge/win-64::pycuda-2022.2.2-py310ha2c4f5d_0

  python_abi         conda-forge/win-64::python_abi-3.10-2_cp310

  pytools            conda-forge/noarch::pytools-2022.1.14-pyhd8ed1ab_0

  ucrt               conda-forge/win-64::ucrt-10.0.22621.0-h57928b3_0

  vc14_runtime       conda-forge/win-64::vc14_runtime-14.34.31931-h5081d32_16

 

The following packages will be UPDATED:

 

  ca-certificates    pkgs/main::ca-certificates-2023.01.10~ --> conda-forge::ca-certificates-2023.5.7-h56e8100_0

  certifi            pkgs/main/win-64::certifi-2022.12.7-p~ --> conda-forge/noarch::certifi-2023.5.7-pyhd8ed1ab_0

  openssl              pkgs/main::openssl-1.1.1t-h2bbff1b_0 --> conda-forge::openssl-1.1.1u-hcfcfb64_0

  vs2015_runtime     pkgs/main::vs2015_runtime-14.27.29016~ --> conda-forge::vs2015_runtime-14.34.31931-hed1258a_16

 

 

Proceed ([y]/n)? y

 

 

Downloading and Extracting Packages

 

Preparing transaction: done

Verifying transaction: done

Executing transaction: | "By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html"

 

done
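Once the conda command finishes, a quick import check like the sketch below confirms that pycuda is installed and reports which CUDA version it was built against (VERSION_TEXT is pycuda's own version string; get_version() returns the CUDA version as a tuple).

import pycuda
import pycuda.driver as cuda

print("pycuda version     :", pycuda.VERSION_TEXT)
print("built against CUDA :", cuda.get_version())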

 

 

 

Graphics Card Detection Check  

 

Cuda_test_01.py

import pycuda.driver as cuda

 

def check_cuda_compatibility():

    cuda.init()

    device_count = cuda.Device.count()

    

    if device_count == 0:

        print("No CUDA compatible device detected.")

    else:

        print(f"{device_count} CUDA compatible device(s) detected.")

        for i in range(device_count):

            device = cuda.Device(i)

            print(f"Device #{i + 1}: {device.name()}")

 

check_cuda_compatibility()

Result

1 CUDA compatible device(s) detected.

Device #1: NVIDIA GeForce RTX 3060 Laptop GPU
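Once the detection check passes, a few more details about the card can be read from the same Device object. The sketch below adds the compute capability and the total memory, both of which are standard pycuda Device calls.

import pycuda.driver as cuda

cuda.init()
for i in range(cuda.Device.count()):
    device = cuda.Device(i)
    major, minor = device.compute_capability()
    print(f"Device #{i + 1}: {device.name()}")
    print(f"  Compute capability : {major}.{minor}")
    print(f"  Total memory       : {device.total_memory() // (1024 ** 2)} MB")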

 

 

 

CUDA Operation Check  

 

Cuda_test_02.py

import pycuda.autoinit

import pycuda.driver as cuda

import pycuda.compiler

import numpy

 

device = cuda.Device(0)  # Replace 0 with the ID of your GPU if you have more than one

print(device.get_attributes()[cuda.device_attribute.MAX_THREADS_PER_BLOCK])

 

# Create a numpy array of 10000 elements, initialized to 1

a = numpy.ones(10000).astype(numpy.float32)

 

# Allocate memory on the GPU

a_gpu = cuda.mem_alloc(a.size * a.dtype.itemsize)

 

# Copy the numpy array to the GPU

cuda.memcpy_htod(a_gpu, a)

 

# Create a CUDA function (also known as a "kernel") that multiplies each element by 2

mod = pycuda.compiler.SourceModule("""

    __global__ void multiply_by_2(float *a)

    {

        int idx = threadIdx.x;

        if (idx < 10000)

        {

            a[idx] *= 2;

        }

    }

""")

 

# Get the function from the module

func = mod.get_function("multiply_by_2")

 

# Call the function on the GPU

func(a_gpu, block=(256,1,1), grid=(40,1))

 

# Copy the result back to the CPU

a_doubled = numpy.empty_like(a)

cuda.memcpy_dtoh(a_doubled, a_gpu)

 

# Print the result

print(a_doubled)

Result

1024

[2. 2. 2. ... 1. 1. 1.]
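The trailing 1s in the result are expected with the kernel as written: the index is computed from threadIdx.x only, so all 40 blocks work on the same first 256 elements and the rest of the array is never touched. A corrected sketch is shown below; it computes a global index from the block index as well, so the 40 blocks of 256 threads cover all 10000 elements (the if guard handles the extra 240 threads). With this change the whole array should come back as 2s.

import numpy
import pycuda.autoinit
import pycuda.driver as cuda
import pycuda.compiler

a = numpy.ones(10000).astype(numpy.float32)
a_gpu = cuda.mem_alloc(a.nbytes)
cuda.memcpy_htod(a_gpu, a)

mod = pycuda.compiler.SourceModule("""
    __global__ void multiply_by_2(float *a, int n)
    {
        // Global index: each block handles its own slice of 256 elements
        int idx = threadIdx.x + blockIdx.x * blockDim.x;
        if (idx < n)
        {
            a[idx] *= 2;
        }
    }
""")
func = mod.get_function("multiply_by_2")
func(a_gpu, numpy.int32(10000), block=(256, 1, 1), grid=(40, 1))

a_doubled = numpy.empty_like(a)
cuda.memcpy_dtoh(a_doubled, a_gpu)
print(a_doubled)   # all 10000 elements should now be 2.0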