
Run production-ready lightweight Kubernetes using K3s in a Q Blocks instance


K3s is a production-ready, lightweight Kubernetes distribution that allows easy and scalable container orchestration. Read more on the K3s official GitHub repo: https://github.com/k3s-io/k3s.

Pre-requisites:

  • You need a Pro / Business Q Blocks instance

  • Ask Q Blocks support (support@qblocks.cloud) to enable K3s support on your instance

Once the pre-requisites are fulfilled, we can proceed with the K3s setup.

Steps to bring up a K3s cluster inside a Q Blocks GPU instance:

  1. Make sure nvidia-smi is running inside the container
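
You can quickly verify this by running nvidia-smi; it should list the instance GPU(s) and the installed driver version:

nvidia-smi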

  2. Install Docker

sudo apt-get update
sudo apt-get install -y docker.io
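
To confirm Docker was installed correctly, you can check its version and service status:

sudo docker --version
sudo systemctl status docker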
  3. Install nvidia-container-toolkit

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | sudo tee /etc/apt/sources.list.d/libnvidia-container.list

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
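
To verify the toolkit installation, you can check that the nvidia runtime binary referenced in the next step is present:

which nvidia-container-runtime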
  4. Set the nvidia runtime as the default container runtime:

By default, K3s prefers the containerd runtime, but for GPUs to work we need nvidia as the default runtime. So we set the nvidia runtime as the default in the Docker daemon configuration file:

sudo vim /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
sudo systemctl restart docker
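
You can confirm that nvidia is now the default Docker runtime before moving on; the output should include a "Default Runtime: nvidia" line:

sudo docker info | grep -i runtime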
  5. Now we will set up the K3s cluster using the Docker runtime:

First, we install K3s:

sudo curl -sfL https://get.k3s.io | sh -s - --docker
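
The installer registers K3s as a systemd service, so you can check that it is active:

sudo systemctl status k3s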
  6. Make sure the K3s cluster is up and running

Wait for 5-10 seconds for the cluster to come up and then run this command:

sudo k3s kubectl get pods --all-namespaces
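
In a fresh cluster the default K3s system pods (coredns, local-path-provisioner, metrics-server, traefik) should reach Running or Completed status; the exact pod name suffixes will vary, but the output should look similar to:

NAMESPACE     NAME                           READY   STATUS    RESTARTS   AGE
kube-system   coredns-...                    1/1     Running   0          30s
kube-system   local-path-provisioner-...     1/1     Running   0          30s
kube-system   metrics-server-...             1/1     Running   0          30s
kube-system   traefik-...                    1/1     Running   0          25s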
  7. Install the NVIDIA device plugin daemon for K3s:

This makes the instance GPU available to the K3s cluster.

sudo k3s kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.1/nvidia-device-plugin.yml
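
The manifest creates a DaemonSet named nvidia-device-plugin-daemonset in the kube-system namespace; you can wait for it to roll out with:

sudo k3s kubectl rollout status daemonset/nvidia-device-plugin-daemonset -n kube-system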

  8. Check the logs of nvidia-device-plugin to confirm the GPUs are detected:

Get the name of the nvidia pod launched in step 7 from this command's output:

sudo k3s kubectl get pods --all-namespaces

Add the pod name to the command below:

sudo k3s kubectl logs <daemon set pod name> -n kube-system

This should return an output like this:

I0812 05:23:47.267089       1 main.go:154] Starting FS watcher.
I0812 05:23:47.267213       1 main.go:161] Starting OS watcher.
I0812 05:23:47.267548       1 main.go:176] Starting Plugins.
I0812 05:23:47.267563       1 main.go:234] Loading configuration.
I0812 05:23:47.267689       1 main.go:242] Updating config with default resource matching patterns.
I0812 05:23:47.267884       1 main.go:253] 
Running with config:
{
  "version": "v1",
  "flags": {
    "migStrategy": "none",
    "failOnInitError": false,
    "nvidiaDriverRoot": "/",
    "gdsEnabled": false,
    "mofedEnabled": false,
    "plugin": {
      "passDeviceSpecs": false,
      "deviceListStrategy": [
        "envvar"
      ],
      "deviceIDStrategy": "uuid",
      "cdiAnnotationPrefix": "cdi.k8s.io/",
      "nvidiaCTKPath": "/usr/bin/nvidia-ctk",
      "containerDriverRoot": "/driver-root"
    }
  },
  "resources": {
    "gpus": [
      {
        "pattern": "*",
        "name": "nvidia.com/gpu"
      }
    ]
  },
  "sharing": {
    "timeSlicing": {}
  }
}
I0812 05:23:47.267893       1 main.go:256] Retreiving plugins.
I0812 05:23:47.268313       1 factory.go:107] Detected NVML platform: found NVML library
I0812 05:23:47.268378       1 factory.go:107] Detected non-Tegra platform: /sys/devices/soc0/family file not found
I0812 05:23:47.279615       1 server.go:165] Starting GRPC server for 'nvidia.com/gpu'
I0812 05:23:47.280859       1 server.go:117] Starting to serve 'nvidia.com/gpu' on /var/lib/kubelet/device-plugins/nvidia-gpu.sock
I0812 05:23:47.283115       1 server.go:125] Registered device plugin for 'nvidia.com/gpu' with Kubelet
  9. Validate that GPUs are detected by the K3s cluster node:

sudo k3s kubectl describe node -A | grep nvidia
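
If the device plugin registered correctly, the GPU resource should appear under both the node's Capacity and Allocatable sections, with output along these lines:

  nvidia.com/gpu:     1
  nvidia.com/gpu:     1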
  10. If the GPU is recognised and the DaemonSet is not throwing an error, it's time to do a test run and make sure a pod can access the GPU. Make sure to run this container only on a node with a GPU.

Make sure the Docker image used for testing has the same or a lower CUDA version than the one supported by the NVIDIA driver on the instance.
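
You can check the maximum CUDA version supported by the instance driver from the nvidia-smi header (the test image below uses CUDA 11.2):

nvidia-smi | grep "CUDA Version"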

  11. Create a .yaml file k3sgputest.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  restartPolicy: Never
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda11.2.1-ubuntu18.04
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
  12. Run the GPU pod:

sudo k3s kubectl apply -f k3sgputest.yaml
sudo k3s kubectl logs gpu-pod
  13. Wait for 5-10 seconds for the pod to load and run. If it ran successfully, it will display a log like this:

[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done

This confirms that the K3s cluster was able to detect the GPU and that pods are able to run code on GPUs inside the Kubernetes cluster.
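
Once you have verified GPU access, you can clean up the test pod:

sudo k3s kubectl delete -f k3sgputest.yaml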

If you face any difficulty in setting up K3s, please reach out to us at support@qblocks.cloud.
