AI services: Azure ML, Azure OpenAI, Front Door, Purview
Example 1. Creating a GPU VM with Bicep
A virtual machine cannot exist in isolation.
It needs networking, disks, and authentication.
This example is intentionally minimal but complete.
What this example includes
VNet and subnet
Network Security Group allowing SSH
Public IP and NIC
Ubuntu 22.04 GPU-capable VM
SSH key authentication
main.bicep
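The file below is one possible shape for this template. Resource names, address ranges, API versions, and the Standard_NC4as_T4_v3 VM size are illustrative assumptions; the `sshPublicKey` parameter matches the deploy command shown later.

```bicep
param location string = resourceGroup().location
param sshPublicKey string
param adminUsername string = 'azureuser'

// VNet with a single subnet for the GPU VM
resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' = {
  name: 'vnet-ai'
  location: location
  properties: {
    addressSpace: { addressPrefixes: ['10.0.0.0/16'] }
    subnets: [
      { name: 'subnet-gpu', properties: { addressPrefix: '10.0.1.0/24' } }
    ]
  }
}

// NSG that allows inbound SSH only
resource nsg 'Microsoft.Network/networkSecurityGroups@2023-09-01' = {
  name: 'nsg-ssh'
  location: location
  properties: {
    securityRules: [
      {
        name: 'allow-ssh'
        properties: {
          priority: 1000
          direction: 'Inbound'
          access: 'Allow'
          protocol: 'Tcp'
          sourcePortRange: '*'
          destinationPortRange: '22'
          sourceAddressPrefix: '*'
          destinationAddressPrefix: '*'
        }
      }
    ]
  }
}

resource pip 'Microsoft.Network/publicIPAddresses@2023-09-01' = {
  name: 'pip-gpu-vm'
  location: location
  sku: { name: 'Standard' }
  properties: { publicIPAllocationMethod: 'Static' }
}

resource nic 'Microsoft.Network/networkInterfaces@2023-09-01' = {
  name: 'nic-gpu-vm'
  location: location
  properties: {
    ipConfigurations: [
      {
        name: 'ipconfig1'
        properties: {
          subnet: { id: vnet.properties.subnets[0].id }
          publicIPAddress: { id: pip.id }
        }
      }
    ]
    networkSecurityGroup: { id: nsg.id }
  }
}

// Ubuntu 22.04 VM on a GPU-capable size, SSH key auth only
resource vm 'Microsoft.Compute/virtualMachines@2023-09-01' = {
  name: 'vm-gpu'
  location: location
  properties: {
    hardwareProfile: { vmSize: 'Standard_NC4as_T4_v3' }
    storageProfile: {
      imageReference: {
        publisher: 'Canonical'
        offer: '0001-com-ubuntu-server-jammy'
        sku: '22_04-lts-gen2'
        version: 'latest'
      }
      osDisk: { createOption: 'FromImage' }
    }
    osProfile: {
      computerName: 'vm-gpu'
      adminUsername: adminUsername
      linuxConfiguration: {
        disablePasswordAuthentication: true
        ssh: {
          publicKeys: [
            {
              path: '/home/${adminUsername}/.ssh/authorized_keys'
              keyData: sshPublicKey
            }
          ]
        }
      }
    }
    networkProfile: {
      networkInterfaces: [{ id: nic.id }]
    }
  }
}
```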
Deploy

```bash
az group create --name rg-ai --location eastus2

az deployment group create \
  --resource-group rg-ai \
  --template-file main.bicep \
  --parameters sshPublicKey="$(cat ~/.ssh/id_rsa.pub)"
```
After deployment:
Connect via SSH
Install NVIDIA drivers
Validate with nvidia-smi
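The post-deployment steps above can be sketched as follows. The VM name, admin user, and driver-install approach are assumptions carried over from the example; adjust to your deployment.

```bash
# Fetch the VM's public IP and connect
ip=$(az vm show -d -g rg-ai -n vm-gpu --query publicIps -o tsv)
ssh azureuser@"$ip"

# On the VM: install the recommended NVIDIA driver, then verify
sudo apt update && sudo ubuntu-drivers install
sudo reboot
# After reconnecting, the GPU should appear here:
nvidia-smi
```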
Example 2. AKS cluster with GPU node pool (Terraform)
Terraform is ideal for composable, multi-environment platforms, especially when AKS is the control plane.
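A minimal sketch, assuming the `azurerm` provider: the cluster gets a small system pool, and GPU workloads land on a dedicated, tainted node pool. Names, region, and the Standard_NC4as_T4_v3 size are placeholders, not prescriptions.

```hcl
resource "azurerm_resource_group" "ai" {
  name     = "rg-ai-aks"
  location = "eastus2"
}

resource "azurerm_kubernetes_cluster" "main" {
  name                = "aks-ai"
  location            = azurerm_resource_group.ai.location
  resource_group_name = azurerm_resource_group.ai.name
  dns_prefix          = "aksai"

  # CPU-only system pool for cluster services
  default_node_pool {
    name       = "system"
    node_count = 2
    vm_size    = "Standard_D4s_v5"
  }

  identity {
    type = "SystemAssigned"
  }
}

# Dedicated GPU pool, tainted so only GPU workloads schedule onto it
resource "azurerm_kubernetes_cluster_node_pool" "gpu" {
  name                  = "gpu"
  kubernetes_cluster_id = azurerm_kubernetes_cluster.main.id
  vm_size               = "Standard_NC4as_T4_v3"
  node_count            = 1
  node_taints           = ["sku=gpu:NoSchedule"]
}
```

The taint keeps expensive GPU nodes reserved: pods must carry a matching toleration to schedule there.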
💡 Inline resources are great for learning. Use modules in production.
Automating IaC with GitHub Actions
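One way this pipeline can look, assuming a Terraform layout under `infra/` and OIDC federation to Azure; the secret names and paths are placeholders.

```yaml
name: deploy-infra
on:
  push:
    branches: [main]
    paths: ['infra/**']

permissions:
  id-token: write   # OIDC login to Azure, no stored credentials
  contents: read

jobs:
  terraform:
    runs-on: ubuntu-latest
    environment: production   # gate with environment approvals
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: infra
      - run: terraform plan -out=tfplan
        working-directory: infra
      - run: terraform apply -auto-approve tfplan
        working-directory: infra
```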
💡 Combine with protected branches, reviewers, and environment approvals.
Recommended patterns
Separate modules for network, compute, storage, and observability
Parameterize region, SKU, and scale limits
Automate inference rollouts
New model → Storage update → AKS rollout → Endpoint refresh
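The rollout chain above might be driven by commands like these; the storage account, container, model file, and deployment name are hypothetical.

```bash
# 1. Publish the new model artifact to storage
az storage blob upload \
  --account-name aimodels \
  --container-name models \
  --name model-v2.bin \
  --file ./model-v2.bin \
  --overwrite

# 2. Restart the serving deployment so pods pull the updated artifact
kubectl rollout restart deployment/inference-server
kubectl rollout status deployment/inference-server
```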
Pro insight
“If you can destroy and recreate your entire AI environment safely, you control it.”
Security and governance with IaC
Managed Identity instead of secrets
Key Vault injected via policy
Private networking by default
Azure Policy to enforce SKU, region, and tagging
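As one example of the policy enforcement above, the built-in "Allowed locations" definition can be assigned at the resource-group scope; the assignment name and region list here are assumptions, and `<sub-id>` is your subscription ID.

```bash
# Look up the built-in "Allowed locations" policy definition
policy_id=$(az policy definition list \
  --query "[?displayName=='Allowed locations'].name" -o tsv)

# Deployments outside the approved region now fail at submission time
az policy assignment create \
  --name restrict-regions \
  --scope /subscriptions/<sub-id>/resourceGroups/rg-ai \
  --policy "$policy_id" \
  --params '{"listOfAllowedLocations": {"value": ["eastus2"]}}'
```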
Hands-on recap
Validate:
SSH access
GPU visibility
Cost and quota alignment
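For the quota check, regional vCPU usage can be compared against limits per SKU family; the NC filter below is an example, adjust to the family you deployed.

```bash
# Current usage vs. quota for GPU (NC-family) vCPUs in the region
az vm list-usage --location eastus2 -o table | grep -i "NC"
```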
Advanced curiosity
You can estimate average request size from TPM (tokens per minute) and QPS (queries per second): average tokens per request ≈ TPM / (QPS × 60).
Getting this number right is critical to avoid both throttling and over-provisioning.
👉 See Chapter 8 for deep dives on TPM, RPM, PTUs, and performance modeling.
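As a back-of-the-envelope check (the full treatment is in Chapter 8), dividing TPM by requests per minute gives average tokens per request; the quota and traffic figures below are made up for illustration.

```shell
# With a 60,000 TPM quota and 10 queries per second:
TPM=60000
QPS=10
echo $(( TPM / (QPS * 60) ))   # average tokens per request → 100
```

If real requests average more than this, the TPM quota throttles before QPS does; if they average much less, the quota is over-provisioned.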