Creating an AKS Cluster with GPU using Terraform
Objective
Provision an Azure Kubernetes Service (AKS) cluster with a dedicated GPU node pool for AI and inference workloads.
This lab demonstrates how to:
Use Terraform for Infrastructure-as-Code (IaC)
Deploy a GPU-enabled AKS node pool
Prepare your cluster for AI workloads such as Azure OpenAI, MLflow, or custom inference containers
Prerequisites
Before running this lab, make sure you have:
✅ Terraform CLI installed (v1.5+ recommended)
✅ Access to an Azure subscription with permission to create resource groups and AKS clusters
✅ Quota for GPU-enabled VM SKUs (e.g., Standard_NC6s_v3 or Standard_NC4as_T4_v3)
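The commands later in this lab call az and kubectl in addition to terraform, so a quick pre-flight check can save a failed run. The sketch below is one way to do it. missing_tools is a hypothetical helper name, not part of any CLI, and the eastus region in the commented quota query is an assumption.

```shell
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper, not part of Terraform or the Azure CLI.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Usage: prints nothing when all three CLIs are installed.
#   missing_tools terraform az kubectl
#
# GPU quota per VM family (eastus is an assumed region; use your own):
#   az vm list-usage --location eastus --output table | grep -i "NC"
```

If the quota query shows a limit of 0 for the NC family you plan to use, request an increase before applying the configuration.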
Folder Structure
terraform-aks-gpu/
├── main.tf
├── variables.tf
├── outputs.tf
└── README.md
Configuration
1. Define environment variables
Authenticate and set your default subscription:
az login
az account set --subscription "<your-subscription-id>"
2. Initialize Terraform
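If the configuration uses the azurerm provider (an assumption, since main.tf is not shown on this page), the provider can read the target subscription from the ARM_SUBSCRIPTION_ID environment variable. Version 4.x of the provider requires the subscription to be set, either this way or in the provider block. A minimal sketch that copies it from the active Azure CLI session before terraform init runs; use_active_subscription is a hypothetical helper name.

```shell
# Export the subscription selected in the Azure CLI so that Terraform's
# azurerm provider can pick it up from the environment.
# use_active_subscription is a hypothetical helper name.
use_active_subscription() {
  ARM_SUBSCRIPTION_ID="$(az account show --query id --output tsv)" || return 1
  export ARM_SUBSCRIPTION_ID
}

# Usage:
#   use_active_subscription && echo "Using subscription $ARM_SUBSCRIPTION_ID"
```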
terraform init
3. Review and validate the plan
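Before generating the plan, the configuration can be checked statically. The sketch below chains two standard Terraform subcommands, terraform fmt -check and terraform validate. check_config is a hypothetical wrapper name. Run it from the terraform-aks-gpu/ directory after terraform init, since validate needs the providers to be installed.

```shell
# check_config is a hypothetical wrapper around two built-in Terraform checks.
check_config() {
  # Fail if any .tf file is not in canonical format (no files are rewritten).
  terraform fmt -check -recursive || return 1
  # Check syntax and internal consistency of the configuration.
  terraform validate
}

# Usage:
#   check_config && echo "configuration looks valid"
```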
terraform plan -out=tfplan
4. Apply the configuration
terraform apply "tfplan"
What this deployment creates
Resource group: a logical container for all deployed resources
AKS cluster: a managed Kubernetes cluster configured with a default node pool
GPU node pool: a secondary node pool using Standard_NC6s_v3 (or similar)
Managed identity: used for AKS and node pool operations
Network resources: VNet, subnets, NSG (if defined)
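This page does not show main.tf, so the following is only a sketch of how the resources listed above might be declared with the azurerm provider. The names rg-ai-lab and aks-ai-cluster and the pool name gpu are taken from the validation commands in this lab. The eastus region, the Standard_D4s_v3 system pool size, the node counts, and the provider version constraint are assumptions. The sketch omits the network resources and writes to a scratch directory (aks-gpu-sketch/, a hypothetical name) so that it cannot overwrite an existing configuration.

```shell
# Write an illustrative main.tf into a scratch directory.
# Every value marked "assumed" below is a placeholder, not the lab's real setting.
mkdir -p aks-gpu-sketch
cat > aks-gpu-sketch/main.tf <<'EOF'
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumed constraint
    }
  }
}

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "lab" {
  name     = "rg-ai-lab"
  location = "eastus" # assumed region
}

resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks-ai-cluster"
  location            = azurerm_resource_group.lab.location
  resource_group_name = azurerm_resource_group.lab.name
  dns_prefix          = "aks-ai-cluster"

  # System pool that runs cluster add-ons; size and count are assumed.
  default_node_pool {
    name       = "system"
    node_count = 1
    vm_size    = "Standard_D4s_v3"
  }

  # Managed identity used for AKS and node pool operations.
  identity {
    type = "SystemAssigned"
  }
}

# Secondary pool for GPU workloads. AKS labels its nodes agentpool=gpu.
resource "azurerm_kubernetes_cluster_node_pool" "gpu" {
  name                  = "gpu"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
  vm_size               = "Standard_NC6s_v3"
  node_count            = 1 # assumed count
}
EOF
```

In a real layout, the hard-coded names and sizes would more likely live in variables.tf, with cluster details exposed through outputs.tf.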
Validation
After deployment, verify your GPU node pool:
az aks nodepool list \
--cluster-name aks-ai-cluster \
--resource-group rg-ai-lab \
--query "[].{Name:name,VMSize:vmSize,NodeCount:count,Mode:mode}"
You can also connect to your cluster:
az aks get-credentials --resource-group rg-ai-lab --name aks-ai-cluster
kubectl get nodes -o wide
Check that the GPU node pool is labeled and ready:
kubectl get nodes -l "agentpool=gpu"
Next steps
Install the NVIDIA device plugin so that pods can request GPUs through the nvidia.com/gpu resource
Deploy your first inference workload (see the yaml-inferencia-api/ lab)
Integrate monitoring with Prometheus + DCGM Exporter
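Once the NVIDIA device plugin is running, each GPU node should advertise an nvidia.com/gpu resource to the scheduler. A minimal sketch for checking that, assuming the GPU pool is named gpu as in the validation step; gpu_capacity is a hypothetical helper name.

```shell
# List the nodes of a pool with the number of GPUs each one advertises.
# The pool name defaults to "gpu"; pass another name as the first argument.
# gpu_capacity is a hypothetical helper name.
gpu_capacity() {
  kubectl get nodes -l "agentpool=${1:-gpu}" \
    -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
}

# Usage:
#   gpu_capacity
```

A GPUS value of <none> usually means the device plugin has not yet registered the GPU on that node.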
Cleanup
To remove all resources:
terraform destroy