🧪Mini-Labs Overview

Welcome to the hands-on labs section of the AI for Infra Pros — The Practical Handbook for Infrastructure Engineers. Each lab demonstrates how to apply the infrastructure concepts from the book in real-world Azure environments.

Lab scope and expectations

These labs are infrastructure-focused and designed for:

  • Provisioning GPU-enabled environments

  • Deploying inference-ready workloads

  • Validating performance, access, and observability

They do not cover:

  • Model training or fine-tuning

  • Data science experimentation

  • Advanced MLOps pipelines

The goal is to help infrastructure engineers confidently run AI workloads, not build models from scratch.

Lab index

Lab
Description
Technologies

Provision an Azure Kubernetes Service cluster with a dedicated GPU node pool for AI workloads.

Terraform, AKS, GPU, IaC

Deploy a single GPU-enabled VM using Azure Bicep to host AI inference workloads.

Bicep, Azure CLI, NVIDIA Drivers

Publish a trained model as an inference endpoint using Azure Machine Learning and YAML configuration.

Azure ML, YAML, CLI, REST API

Prerequisites

Before running any of the labs:

  • Have an active Azure Subscription

  • Install the latest Azure CLI

  • Install Terraform and/or Bicep depending on the lab

  • Ensure GPU quotas are available in your target region

    • Common SKUs:

      • Standard_NCas_T4_v3 (T4 inference)

      • Standard_NC6s_v3 (V100)

    • Check quotas with:

  • Install and update the Azure ML CLI extension:

    Tested with Azure CLI >= 2.55.0

  • Authenticate with Azure:

  • Have sufficient permissions (Owner or Contributor on the target Resource Group)

⚠️ Cost warning

These labs may create GPU-backed resources, which can incur significant costs if left running.

Always:

  • Use the smallest GPU SKU possible

  • Complete validation steps promptly

  • Delete resource groups after finishing

GPU resources can cost $0.90–$30+/hour depending on SKU.

Lab workflow

All labs follow a similar structure:

  1. Provision infrastructure (VM, AKS, or AML workspace)

  2. Configure access, security, and monitoring

  3. Deploy models or containers for inference

  4. Validate performance and connectivity

  5. Clean up resources to avoid unnecessary costs

Recommendations

  • Prefer West US 3 or West Europe — they historically offer broader GPU SKU availability, but quotas still apply

  • Always tag resources with project and owner names

  • Store deployment logs for auditing and rollback

  • For production-grade deployments, add Private Endpoints and Azure Policy validation

Cleanup reminder

After finishing a lab, remember to delete the created resources to prevent billing surprises:

References

“You don’t scale AI with PowerPoint — you scale it with Infrastructure as Code.”

Last updated