GPU-Accelerated Machine Learning With RAPIDS

Rajat Roy
3 min read · Jul 22, 2023

Introduction

As a Data Science practitioner, I often deal with large datasets and need to run large model-training experiments. These tasks can take a long time and sometimes even run out of memory.

Well, worry no more. NVIDIA provides a framework called RAPIDS that lets you run pandas-style data processing, visualize large datasets, and even do scikit-learn-style feature engineering and machine learning model training on the GPU.

About Rapids

The RAPIDS data science framework runs end-to-end pipelines on the GPU behind familiar Python interfaces. It is built on optimized NVIDIA CUDA primitives and takes advantage of high-bandwidth GPU memory for better performance.

We can access RAPIDS on a free GPU runtime in Google Colab.

Start by installing RAPIDS in a Colab notebook using the following commands.

!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!python rapidsai-csp-utils/colab/pip-install.py

Code

For a pandas-like interface, RAPIDS provides cuDF, which covers data cleansing, transformation, feature selection, and feature engineering tasks. We can start by importing cudf as follows.

import cudf
cudf.__version__
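
As a quick illustration of the pandas-like API (a minimal sketch with made-up example data), typical cleansing and transformation steps look exactly like they do in pandas, but the work happens on the GPU:

import cudf

# Build a small GPU DataFrame (hypothetical example values).
df = cudf.DataFrame({
    "price": [10.0, 12.5, None, 9.0],
    "quantity": [3, 5, 2, 4],
})

# Fill missing values and derive a new feature, pandas-style.
df["price"] = df["price"].fillna(df["price"].mean())
df["revenue"] = df["price"] * df["quantity"]

print(df.head())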

The cuML library provides GPU-accelerated machine learning algorithms behind a scikit-learn-like interface.

import cuml
cuml.__version__
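
To see the scikit-learn-style API on something simple first (a minimal sketch with toy data; the clustering benchmark below is the real comparison), a linear regression in cuML looks like this:

import cudf
from cuml.linear_model import LinearRegression

# Toy data following y = 2*x + 1 (hypothetical values).
X = cudf.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0]})
y = cudf.Series([3.0, 5.0, 7.0, 9.0, 11.0])

model = LinearRegression()
model.fit(X, y)
print(model.predict(X))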

Let's try clustering a sample dataset and compare runtimes by running KMeans first on the CPU and then on the GPU.

Import the packages.

import cudf
import cupy
from cuml.cluster import KMeans as cuKMeans
from cuml.datasets import make_blobs
from sklearn.cluster import KMeans as skKMeans

Initialize the sample size, number of features, and number of clusters.

n_samples = 1000000
n_features = 100

n_clusters = 8
random_state = 0

Create the sample data on the GPU, then copy it to CPU (NumPy) arrays.

device_data, device_labels = make_blobs(
    n_samples=n_samples,
    n_features=n_features,
    centers=n_clusters,
    random_state=random_state,
    cluster_std=0.1,
)

# Copy CuPy arrays from GPU memory to host memory (NumPy arrays).
# This is done to later compare CPU and GPU results.
host_data = device_data.get()
host_labels = device_labels.get()

Run KMeans clustering on the CPU and record the time taken to execute the code.

kmeans_sk = skKMeans(
    init="k-means++",
    n_clusters=n_clusters,
    random_state=random_state,
    n_init="auto",
)
%timeit kmeans_sk.fit(host_data)

Similarly, run KMeans clustering on the GPU and record its execution time.

kmeans_cuml = cuKMeans(
    init="k-means++",
    n_clusters=n_clusters,
    random_state=random_state,
)

%timeit kmeans_cuml.fit(device_data)

Comparing the execution times, the CPU took 5.15 seconds to run the clustering, whereas the GPU finished the same task in about 180 milliseconds.
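
Since the labels were copied to the host earlier, we can also sanity-check that both models find essentially the same clusters. Here is a minimal sketch (assuming cuML returns its labels as a CuPy array, matching the CuPy input) using scikit-learn's adjusted Rand score:

from sklearn.metrics import adjusted_rand_score

# Compare each model's labels against the ground-truth blob labels.
score_sk = adjusted_rand_score(host_labels, kmeans_sk.labels_)
score_cuml = adjusted_rand_score(host_labels, kmeans_cuml.labels_.get())

print("sklearn adjusted Rand score:", score_sk)
print("cuML adjusted Rand score:", score_cuml)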

Conclusion

By leveraging RAPIDS we see a clear improvement in runtime. RAPIDS is easy to adopt and can help significantly reduce model training time.

I hope you will give it a try.

I have a Message for You

Are you a programming, AI, or machine learning enthusiast? Then you’ll love my blog on Medium! I regularly post about these topics and share my insights on the latest trends and tools in data science. If you find my content helpful, please like and follow my blog. And if you want to show some extra support, you can give a tip by clicking the button below. Thanks for your time and support!

