# Deploying a GPU VM with Bicep

## Objective

This lab demonstrates how to provision an Azure Virtual Machine (VM) with GPU acceleration using Bicep, the Azure-native Infrastructure-as-Code (IaC) language. You’ll deploy a GPU-enabled Ubuntu VM, connect to it via SSH, install the NVIDIA drivers, and validate GPU availability with `nvidia-smi`.
## Prerequisites

Before starting, ensure you have:

- ✅ Access to an Azure subscription with available GPU quota (e.g., for the `Standard_NC6s_v3` family used below)
- ✅ An SSH key pair generated locally
- ✅ A resource group for your deployment (or one created with Terraform)
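If you still need a key pair, one way to generate it locally (the file name and comment here are only examples, not required by the lab):

```shell
# Generate an ed25519 key pair; the file name is only an example.
ssh-keygen -t ed25519 -f ./azure-gpu-lab-key -N "" -C "azureuser@gpuvm"

# The public half is what goes into the Bicep template (keyData).
cat ./azure-gpu-lab-key.pub
```

Keep the private key out of version control.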
## Folder structure

```text
bicep-vm-gpu/
├── main.bicep
├── parameters.json
└── README.md
```

## Bicep template overview
### main.bicep
Defines the VM configuration, network interface, and OS profile.
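The VM resource below references a `nic` symbol that must also be declared in `main.bicep`. A minimal sketch of that network interface and the virtual network it depends on (the names, address ranges, and API version here are assumptions, not part of the original lab):

```bicep
resource vnet 'Microsoft.Network/virtualNetworks@2022-07-01' = {
  name: 'vnet-gpu-lab'
  location: resourceGroup().location
  properties: {
    addressSpace: {
      addressPrefixes: [ '10.0.0.0/16' ]
    }
    subnets: [
      {
        name: 'default'
        properties: {
          addressPrefix: '10.0.0.0/24'
        }
      }
    ]
  }
}

resource nic 'Microsoft.Network/networkInterfaces@2022-07-01' = {
  name: 'nic-gpu-inference'
  location: resourceGroup().location
  properties: {
    ipConfigurations: [
      {
        name: 'ipconfig1'
        properties: {
          subnet: {
            id: vnet.properties.subnets[0].id
          }
          privateIPAllocationMethod: 'Dynamic'
        }
      }
    ]
  }
}
```

To SSH in from outside Azure, you would also attach a public IP address and a network security group allowing port 22.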
```bicep
resource vm 'Microsoft.Compute/virtualMachines@2022-11-01' = {
  name: 'vm-gpu-inference'
  location: resourceGroup().location
  properties: {
    hardwareProfile: {
      vmSize: 'Standard_NC6s_v3'
    }
    osProfile: {
      computerName: 'gpuvm'
      adminUsername: 'azureuser'
      linuxConfiguration: {
        disablePasswordAuthentication: true
        ssh: {
          publicKeys: [
            {
              path: '/home/azureuser/.ssh/authorized_keys'
              keyData: '<your-public-ssh-key>'
            }
          ]
        }
      }
    }
    storageProfile: {
      imageReference: {
        publisher: 'Canonical'
        offer: '0001-com-ubuntu-server-jammy'
        sku: '22_04-lts-gen2'
        version: 'latest'
      }
      osDisk: {
        createOption: 'FromImage'
      }
    }
    networkProfile: {
      networkInterfaces: [
        {
          id: nic.id
        }
      ]
    }
  }
}
```

## Deployment steps
### 1. Log in and set your subscription

```shell
az login
az account set --subscription "<your-subscription-id>"
```

### 2. Create a resource group

```shell
az group create --name rg-ai-lab --location eastus
```

### 3. Deploy the Bicep template
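The deploy command below passes `parameters.json`, whose contents the lab does not show. If you parameterize the template (for example, an `adminPublicKey` parameter in place of the hard-coded `keyData` value — an assumed name, not one declared in the snippet above), it might look like:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "adminPublicKey": {
      "value": "<your-public-ssh-key>"
    }
  }
}
```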
```shell
az deployment group create \
  --resource-group rg-ai-lab \
  --template-file main.bicep \
  --parameters @parameters.json
```

## Validation
After deployment completes, connect to your VM:

```shell
ssh azureuser@<public-ip>
```

Check the GPU status:

```shell
nvidia-smi
```

✅ Expected output: a table listing the VM's NVIDIA GPU (a Tesla V100 on `Standard_NC6s_v3`) with the active driver and CUDA versions.
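To script the same check, here is a small Python sketch that shells out to `nvidia-smi` and degrades gracefully on machines where it is absent (the function name is ours, not part of the lab):

```python
import shutil
import subprocess


def gpu_names() -> list[str]:
    """Return GPU model names reported by nvidia-smi, or [] if unavailable."""
    # If the driver (and thus nvidia-smi) is not installed, report no GPUs.
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(gpu_names())  # e.g. a single Tesla V100 entry on this VM size
```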
## Optional: Install NVIDIA drivers manually

If drivers are not pre-installed (Azure's NVIDIA GPU Driver Extension is an alternative way to have them installed for you):

```shell
sudo apt update && sudo apt install -y build-essential dkms
wget https://us.download.nvidia.com/XFree86/Linux-x86_64/535.54.03/NVIDIA-Linux-x86_64-535.54.03.run
chmod +x NVIDIA-Linux-*.run
sudo ./NVIDIA-Linux-*.run
```

Reboot and re-check:
```shell
sudo reboot
nvidia-smi
```

## Next steps
- Attach a data disk or mount a Blob Storage container for datasets
- Containerize your model inference workload with Docker + CUDA
- Connect to an Azure ML workspace for managed experimentation
## Cleanup

To remove the VM and its resources:

```shell
az group delete --name rg-ai-lab --yes --no-wait
```