NVIDIA + CUDA + cuDNN + TensorFlow (Py311) Setup on Fresh Ubuntu

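Overview

This guide sets up a fresh Ubuntu system for GPU-accelerated TensorFlow: the NVIDIA driver, Miniconda with a Python 3.11 environment named Py311, CUDA Toolkit 12.1, cuDNN 8.9.7, and TensorFlow 2.16.2, followed by a GPU check and a short training run.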


Step 1: Base System Preparation

Update Ubuntu


sudo apt update
sudo apt upgrade -y
sudo apt install -y build-essential wget curl git

Reboot


sudo reboot

Step 2: Install NVIDIA Driver

Detect Recommended Driver


ubuntu-drivers devices

Install NVIDIA Driver (example: 535)


# Replace 535 with the version that ubuntu-drivers marks as "recommended"
sudo apt install -y nvidia-driver-535
sudo reboot

Verify Driver


nvidia-smi
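
The same check can be scripted. The following is a minimal sketch, not part of the original guide: it calls nvidia-smi with its CSV query options and reports the GPU name and driver version. The helper names parse_gpu_csv and query_gpus are made up for illustration.

```python
import shutil
import subprocess


def parse_gpu_csv(text):
    """Parse 'name, driver_version' CSV rows into (name, driver) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a GPU name containing commas stays intact.
        name, _, driver = line.rpartition(",")
        rows.append((name.strip(), driver.strip()))
    return rows


def query_gpus():
    """Return [(gpu_name, driver_version), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU reported; check that the driver loaded after the reboot.")
    for name, driver in gpus:
        print(f"{name}: driver {driver}")
```

An empty result after the reboot usually points at the driver not being loaded, which is worth resolving before moving on to the CUDA steps.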

Step 3: Install Miniconda

Download Miniconda


wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Restart Shell


exec bash

Step 4: Create Python 3.11 Environment

Create and Activate Py311


conda create -n Py311 python=3.11 -y
conda activate Py311

Verify Python


python -V
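
For a scripted version of this check, the sketch below compares the running interpreter and the active Conda environment against what this guide expects. It assumes Conda exports the active environment name in the CONDA_DEFAULT_ENV variable; check_environment is a hypothetical helper name.

```python
import os
import sys


def check_environment(version_info, env_name, want_version=(3, 11), want_env="Py311"):
    """Return a list of problems; an empty list means the environment matches."""
    problems = []
    found_version = tuple(version_info[:2])
    if found_version != want_version:
        problems.append(
            f"Python {found_version[0]}.{found_version[1]} is active, "
            f"expected {want_version[0]}.{want_version[1]}"
        )
    if env_name != want_env:
        problems.append(f"Conda environment is {env_name!r}, expected {want_env!r}")
    return problems


if __name__ == "__main__":
    # CONDA_DEFAULT_ENV is assumed to hold the name of the activated environment.
    issues = check_environment(sys.version_info, os.environ.get("CONDA_DEFAULT_ENV"))
    if issues:
        for issue in issues:
            print("Problem:", issue)
    else:
        print("Py311 is active with Python 3.11")
```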

Step 5: Install CUDA Toolkit 12.1 (Runfile)

Download CUDA 12.1 Runfile


wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run

Run Installer (Toolkit Only)


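# --toolkit skips the bundled 530.30.02 driver (the driver is already installed in Step 2); --silent implies EULA acceptance
sudo sh cuda_12.1.1_530.30.02_linux.run --silent --toolkit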

Persist CUDA Environment


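sudo tee /etc/profile.d/cuda-12.1.sh > /dev/null << 'EOF'
export CUDA_HOME=/usr/local/cuda-12.1
export PATH=/usr/local/cuda-12.1/bin:$PATH
# Append the previous LD_LIBRARY_PATH only if it is set, to avoid an empty entry
export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
EOF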

source /etc/profile.d/cuda-12.1.sh
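
To confirm that a new shell actually picked up these variables, something like the following sketch could be run inside it. It only inspects the environment and assumes the /usr/local/cuda-12.1 prefix used in this guide; missing_cuda_entries is a hypothetical helper name.

```python
import os

CUDA_ROOT = "/usr/local/cuda-12.1"  # install prefix used throughout this guide


def missing_cuda_entries(env, cuda_root=CUDA_ROOT):
    """Return the names of variables that do not reference the CUDA install."""
    missing = []
    if env.get("CUDA_HOME") != cuda_root:
        missing.append("CUDA_HOME")
    expected_dirs = {
        "PATH": f"{cuda_root}/bin",
        "LD_LIBRARY_PATH": f"{cuda_root}/lib64",
    }
    for var, directory in expected_dirs.items():
        # Compare whole path entries, not substrings, to avoid false matches.
        entries = env.get(var, "").split(os.pathsep)
        if directory not in entries:
            missing.append(var)
    return missing


if __name__ == "__main__":
    missing = missing_cuda_entries(os.environ)
    if missing:
        print("Not set for CUDA 12.1:", ", ".join(missing))
    else:
        print("CUDA environment variables look correct")
```

Files in /etc/profile.d are normally read by login shells, so a variable reported as missing may only mean the current shell was started before the file existed.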

Verify CUDA


nvcc --version
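
If the check should be automated, the sketch below extracts the release number from the nvcc output and compares it with 12.1. It assumes the output contains a "release X.Y" fragment, as current nvcc builds print; the helper names are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return (major, minor) from nvcc --version output, or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def installed_cuda_release():
    """Run nvcc --version and return its release tuple, or None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    if release is None:
        print("nvcc not found on PATH; re-check the CUDA environment file")
    elif release != (12, 1):
        print(f"Found CUDA {release[0]}.{release[1]}, expected 12.1")
    else:
        print("CUDA Toolkit 12.1 is on PATH")
```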

Step 6: Install cuDNN 8.9.7 (Correct Version)

Download cuDNN


wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz

Extract and Install


tar -xf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
sudo cp -r cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/* /usr/local/cuda-12.1/include/
sudo cp -r cudnn-linux-x86_64-8.9.7.29_cuda12-archive/lib/* /usr/local/cuda-12.1/lib64/
sudo ldconfig

Verify cuDNN


ls /usr/local/cuda-12.1/lib64/libcudnn*
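
Listing the libraries shows that files were copied but not which version they are. As a rough sketch, the version can be read from the cudnn_version.h header that the archive places in the include directory. This assumes the header defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL; parse_cudnn_version is a hypothetical helper name.

```python
import re
from pathlib import Path

# Header location after the copy step above (assumed layout).
CUDNN_HEADER = Path("/usr/local/cuda-12.1/include/cudnn_version.h")


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cudnn_version.h text, or None if incomplete."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = rf"^\s*#\s*define\s+{name}\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    if not CUDNN_HEADER.is_file():
        print(f"{CUDNN_HEADER} not found; the include files may not have been copied")
    else:
        version = parse_cudnn_version(CUDNN_HEADER.read_text())
        print("cuDNN version:", version)
```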

Step 7: Install TensorFlow (GPU)

Install TensorFlow


pip install --upgrade pip
pip install tensorflow==2.16.2
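
Before importing TensorFlow, it can help to confirm that pip installed the pinned version into the active environment. The sketch below uses the standard-library importlib.metadata module and does not import TensorFlow itself; installed_version is a hypothetical helper name.

```python
from importlib import metadata

EXPECTED_TENSORFLOW = "2.16.2"  # version pinned in the install command above


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("tensorflow")
    if found is None:
        print("tensorflow is not installed in this environment")
    elif found != EXPECTED_TENSORFLOW:
        print(f"tensorflow {found} is installed, expected {EXPECTED_TENSORFLOW}")
    else:
        print(f"tensorflow {found} is installed")
```

Running this with the Py311 environment active checks the same interpreter that the later test scripts will use.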

Step 8: Verify TensorFlow GPU

GPU Test Script (save as gpu_test.py)


import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup.
# This must be set before any GPU is initialized.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

print("TensorFlow:", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPUs:", gpus)

Run


python gpu_test.py

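Expected Output

A working setup prints the TensorFlow version (2.16.2), Built with CUDA: True, and a GPUs list with one PhysicalDevice entry per detected GPU. An empty GPUs list usually means TensorFlow could not find the driver or load the CUDA or cuDNN libraries.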


Step 9: Sample Training Script (GPU)


import time

import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all at startup.
gpus = tf.config.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# Synthetic regression task: the target is the sum of the 10 input features.
x = tf.random.normal((1_000_000, 10))
y = tf.reduce_sum(x, axis=1, keepdims=True)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1),
])

model.compile(optimizer="adam", loss="mse")

# Pin training to the first GPU and time it with a monotonic clock.
with tf.device("/GPU:0"):
    start = time.perf_counter()
    model.fit(x, y, epochs=3, batch_size=4096)
    print(f"Training time: {time.perf_counter() - start:.2f} s")

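Final State

After completing these steps, the system has the NVIDIA driver from Step 2, CUDA Toolkit 12.1 in /usr/local/cuda-12.1 with cuDNN 8.9.7 copied into it, and a Conda environment named Py311 with Python 3.11 and TensorFlow 2.16.2.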