🧪 Mini-Labs Overview

Welcome to the hands-on labs section of the AI for Infra Pros — The Practical Handbook for Infrastructure Engineers. Each lab demonstrates how to apply the infrastructure concepts from the book in real-world Azure environments.

Lab scope and expectations

These labs are infrastructure-focused and designed for:

Provisioning GPU-enabled environments
Deploying inference-ready workloads
Validating performance, access, and observability

They do not cover:

Model training or fine-tuning
Data science experimentation
Advanced MLOps pipelines

The goal is to help infrastructure engineers confidently run AI workloads, not build models from scratch.

Lab index

| Lab | Description | Technologies |
| --- | --- | --- |
| Terraform AKS GPU Cluster | Provision an Azure Kubernetes Service cluster with a dedicated GPU node pool for AI workloads. | Terraform, AKS, GPU, IaC |
| Bicep VM with GPU | Deploy a single GPU-enabled VM using Azure Bicep to host AI inference workloads. | Bicep, Azure CLI, NVIDIA Drivers |
| YAML Inference API (Azure ML) | Publish a trained model as an inference endpoint using Azure Machine Learning and YAML configuration. | Azure ML, YAML, CLI, REST API |

Prerequisites

Before running any of the labs:

Have an active Azure Subscription
Install the latest Azure CLI
Install Terraform and/or Bicep depending on the lab
Ensure GPU quotas are available in your target region
- Common SKUs:
  - Standard_NC4as_T4_v3 (T4 inference; smallest size in the NCasT4_v3 series)
  - Standard_NC6s_v3 (V100)
- Check quotas with:
  az vm list-usage --location eastus --output table
Install and update the Azure ML CLI extension:
```
az extension add -n ml
az extension update -n ml
```
Tested with Azure CLI >= 2.55.0
Authenticate with Azure:
```
az login
```
Have sufficient permissions (Owner or Contributor on the target Resource Group)
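
The checks above can be bundled into a small preflight script. The following is a minimal sketch, not part of the labs themselves: the function names (`version_ge`, `gpu_quota`, `preflight`) are hypothetical, the `'NC'` filter is an assumption that only covers the NC families listed above, and `sort -V` assumes GNU coreutils or a recent macOS. The quota values that Azure reports are vCPU counts per VM family, not GPU counts.

```shell
#!/usr/bin/env bash
# Preflight sketch for the prerequisites above. All function names are illustrative.

MIN_CLI_VERSION="2.55.0"   # version the labs were tested with

# Return 0 when dotted version $1 is greater than or equal to version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Print NC-family quota rows for a region: family, used vCPUs, limit, free vCPUs.
# Assumption: the usage entries expose name.value, currentValue and limit.
gpu_quota() {
  local location="$1"
  az vm list-usage --location "$location" \
    --query "[?contains(name.value, 'NC')].[name.value, currentValue, limit]" \
    --output tsv |
    awk -F'\t' '{ printf "%s\t%s\t%s\t%d\n", $1, $2, $3, $3 - $2 }'
}

# Run the checks in order and stop at the first one that fails.
preflight() {
  local location="${1:-eastus}" cli_version
  command -v az >/dev/null 2>&1 ||
    { echo "Azure CLI not found" >&2; return 1; }
  cli_version="$(az version --query '"azure-cli"' --output tsv)"
  version_ge "$cli_version" "$MIN_CLI_VERSION" ||
    { echo "Azure CLI $cli_version is older than $MIN_CLI_VERSION" >&2; return 1; }
  az account show --output none 2>/dev/null ||
    { echo "Not logged in; run 'az login'" >&2; return 1; }
  az extension show --name ml --output none 2>/dev/null ||
    { echo "Azure ML extension missing; run 'az extension add -n ml'" >&2; return 1; }
  gpu_quota "$location"
}

# Example (run against your own subscription):
# preflight westus3
```

A family whose free column is 0 cannot host a new GPU node or VM until a quota increase is approved, so it is worth checking this before starting a lab.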

⚠️ Cost warning

These labs may create GPU-backed resources, which can incur significant costs if left running.

Always:

Use the smallest GPU SKU possible
Complete validation steps promptly
Delete resource groups after finishing

GPU resources can cost $0.90–$30+/hour depending on SKU.
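
To make that range concrete, the sketch below multiplies an hourly rate by the number of hours a resource stays up. The rates are only the two ends of the range quoted above, not prices for any specific SKU, and `estimate_cost` is a hypothetical helper name.

```shell
# estimate_cost RATE HOURS: print RATE * HOURS in dollars, two decimals.
estimate_cost() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

estimate_cost 0.90 8    # low end of the range, one working day  -> 7.20
estimate_cost 30 8      # high end of the range, one working day -> 240.00
estimate_cost 30 72     # high end, left running over a weekend  -> 2160.00
```

The last line is the case the warning is about: the same resource that costs a few hundred dollars for a day of lab work costs thousands if it is forgotten for three days.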

Lab workflow

All labs follow a similar structure:

Provision infrastructure (VM, AKS, or AML workspace)
Configure access, security, and monitoring
Deploy models or containers for inference
Validate performance and connectivity
Clean up resources to avoid unnecessary costs
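
The five steps above can be sketched as a small driver script. This is only an illustrative skeleton under stated assumptions: the resource group name and the function names are hypothetical, steps 2 to 4 are placeholders because their content differs per lab, and the script defaults to a dry run that prints the `az` commands instead of executing them.

```shell
#!/usr/bin/env bash
# Workflow skeleton. Names and defaults are placeholders, not lab requirements.

RESOURCE_GROUP="${RESOURCE_GROUP:-rg-ai-lab-demo}"   # hypothetical name
LOCATION="${LOCATION:-westus3}"
DRY_RUN="${DRY_RUN:-1}"                              # 1 = print commands only

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

provision() { run az group create --name "$RESOURCE_GROUP" --location "$LOCATION"; }
configure() { echo "# step 2: configure access, security, and monitoring (lab-specific)"; }
deploy()    { echo "# step 3: deploy the model or container (lab-specific)"; }
validate()  { echo "# step 4: validate performance and connectivity (lab-specific)"; }
cleanup()   { run az group delete --name "$RESOURCE_GROUP" --yes --no-wait; }

# Run steps 1-4 in order, then always clean up, even if a step failed.
lab_workflow() {
  local status=0
  provision && configure && deploy && validate || status=$?
  cleanup
  return "$status"
}

# Example: DRY_RUN=1 prints the plan, DRY_RUN=0 executes it.
# lab_workflow
```

Running cleanup unconditionally is a deliberate choice here: a lab that fails halfway has usually already created billable resources.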

Recommendations

Prefer West US 3 or West Europe. These regions have historically offered broader GPU SKU availability, but quotas still apply
Always tag resources with project and owner names
Store deployment logs for auditing and rollback
For production-grade deployments, add Private Endpoints and Azure Policy validation
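
The tagging and logging recommendations can be combined in one wrapper. The sketch below assumes bash (it relies on `PIPESTATUS`). The helper `log_run`, the `./deploy-logs` directory, and the tag values are all hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Logging sketch: keep a timestamped record of every deployment command.

LOG_DIR="${LOG_DIR:-./deploy-logs}"   # hypothetical location for stored logs

# log_run NAME CMD...: run CMD, show its output, and append it to a log file.
# Returns the exit status of CMD, not of tee.
log_run() {
  local name="$1"; shift
  mkdir -p "$LOG_DIR"
  local log_file
  log_file="$LOG_DIR/${name}-$(date -u +%Y%m%dT%H%M%SZ).log"
  echo "\$ $*" >"$log_file"            # record the command line itself
  "$@" 2>&1 | tee -a "$log_file"
  return "${PIPESTATUS[0]}"
}

# Example: create a tagged resource group and keep the output for auditing.
# The group name and tag values are placeholders.
# log_run create-rg az group create \
#   --name rg-ai-lab-demo --location westus3 \
#   --tags project=ai-infra-labs owner=jane.doe
```

Because each log starts with the exact command that was run, the files double as a record of what to undo when rolling back.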

Cleanup reminder

After finishing a lab, remember to delete the created resources to prevent billing surprises:

az group delete --name <your-resource-group> --yes --no-wait
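
Because `--no-wait` returns before the deletion finishes, it can be useful to confirm that the group is really gone. The following is a hedged sketch: `delete_group` is a hypothetical helper name, and it assumes `az group exists` prints `true` or `false` and that `az group wait --deleted` blocks until the deletion completes.

```shell
# delete_group RG: request deletion of a resource group and wait until it is gone.
delete_group() {
  local rg="$1"
  if [ "$(az group exists --name "$rg")" != "true" ]; then
    echo "Resource group '$rg' not found; nothing to delete."
    return 0
  fi
  az group delete --name "$rg" --yes --no-wait || return 1
  echo "Deletion of '$rg' requested; waiting for it to finish..."
  az group wait --name "$rg" --deleted || return 1
  echo "Resource group '$rg' deleted."
}

# Example:
# delete_group <your-resource-group>
```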

References

“You don’t scale AI with PowerPoint — you scale it with Infrastructure as Code.”


Last updated 1 month ago
