Building an Inference API with YAML

Objective

Deploy a scikit-learn model as a managed online endpoint in Azure Machine Learning (AML) using YAML definitions and the Azure ML CLI v2.

You will:

  1. Create or reuse an Azure ML workspace

  2. Train a tiny model locally (diabetes regression), producing model.pkl

  3. Register the model in Azure ML

  4. Create a managed online endpoint

  5. Create a deployment from YAML and send all traffic to it

  6. Invoke the endpoint and validate results

  7. Clean up

This lab is intentionally written step-by-step and assumes you are new to AML endpoints.


Prerequisites

  • Azure CLI installed

  • Azure ML CLI extension installed:

    az extension add -n ml -y
    az extension update -n ml

  • kubectl is not required

  • Python 3.9+ locally

  • RBAC: Contributor on the resource group, plus permissions for AML operations

References:

  • Online endpoint YAML schema: https://learn.microsoft.com/azure/machine-learning/reference-yaml-endpoint-online?view=azureml-api-2

  • Managed online deployment YAML schema: https://learn.microsoft.com/azure/machine-learning/reference-yaml-deployment-managed-online?view=azureml-api-2

  • Azure ML inference server guidance: https://learn.microsoft.com/azure/machine-learning/how-to-inference-server-http?view=azureml-api-2


Folder structure
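
One possible layout for the lab is sketched below. Every directory and file name here (aml-endpoint-lab, train.py, src/score.py, env/environment.yml, endpoint.yml, deployment.yml, sample-request.json) is an assumption chosen for illustration; only model/model.pkl, environment.yml, and requirements.txt are named elsewhere in this lab.

```shell
# Hypothetical working directory for the lab; rename freely.
mkdir -p aml-endpoint-lab/model aml-endpoint-lab/src aml-endpoint-lab/env
cd aml-endpoint-lab

# Files that later steps are assumed to create:
#   requirements.txt      local training dependencies (Step 1)
#   train.py              local training script (Step 1)
#   model/model.pkl       trained model, output of Step 1
#   endpoint.yml          endpoint definition (Step 3)
#   src/score.py          scoring script (Step 4)
#   env/environment.yml   conda environment for the deployment (Step 4)
#   deployment.yml        deployment definition (Step 4)
#   sample-request.json   test payload (Step 5)
```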


Step 0: Set defaults (subscription, RG, workspace)

Set defaults so you can omit --resource-group and --workspace-name later:
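
A minimal sketch of this step, assuming you are already signed in with az login. The subscription ID, resource group, workspace name, and region are placeholders, not values prescribed by this lab. Skip the two create commands if you are reusing an existing workspace.

```shell
# Placeholder values (assumptions); replace them with your own.
SUBSCRIPTION_ID="<your-subscription-id>"
RESOURCE_GROUP="rg-aml-lab"
WORKSPACE_NAME="ws-aml-lab"
LOCATION="eastus"

az account set --subscription "$SUBSCRIPTION_ID"

# Only needed when creating a new resource group and workspace.
az group create --name "$RESOURCE_GROUP" --location "$LOCATION"
az ml workspace create --name "$WORKSPACE_NAME" \
  --resource-group "$RESOURCE_GROUP" --location "$LOCATION"

# Store defaults so later az ml commands can omit
# --resource-group and --workspace-name.
az configure --defaults group="$RESOURCE_GROUP" \
  workspace="$WORKSPACE_NAME" location="$LOCATION"
```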

Quick sanity check:
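
One way to run the check, assuming the defaults above are in place. The workspace query succeeds only if the default workspace exists and your account can read it.

```shell
# Show which subscription the CLI is currently using.
ACTIVE_SUBSCRIPTION="$(az account show --query name -o tsv)"
echo "Active subscription: ${ACTIVE_SUBSCRIPTION:-unknown (run az login)}"

# List the stored CLI defaults (group, workspace, location).
az configure --list-defaults -o table

# Resolve the default workspace.
az ml workspace show --query "{name:name, location:location}" -o table
```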


Step 1: Train a tiny sample model locally

Install deps:
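
A minimal sketch, assuming scikit-learn and joblib are the only packages the training script needs. The package list is an assumption. Pin versions if you want the local and deployed environments to match exactly.

```shell
# Minimal local dependencies for training (assumed list).
cat > requirements.txt <<'EOF'
scikit-learn
joblib
EOF

python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```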

Train:
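
The training script could look like the sketch below. The file name train.py and the choice of Ridge regression are assumptions; any scikit-learn regressor saved to model/model.pkl would serve the rest of the lab equally well.

```shell
cat > train.py <<'EOF'
"""Train a small regression model on the scikit-learn diabetes dataset."""
from pathlib import Path

import joblib
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Built-in diabetes dataset: 442 rows, 10 numeric features.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Ridge regression is an arbitrary choice for this sketch.
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")

# Serialize the fitted model to ./model/model.pkl.
out_dir = Path("model")
out_dir.mkdir(exist_ok=True)
joblib.dump(model, out_dir / "model.pkl")
print(f"Saved {out_dir / 'model.pkl'}")
EOF

python train.py
```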

Expected:

  • ./model/model.pkl is created


Step 2: Register the model in Azure ML
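
A sketch of the registration command, assuming the model file from Step 1 is at ./model/model.pkl. The model name diabetes-sklearn and version 1 are hypothetical.

```shell
# Hypothetical model name; reuse an existing value if one is already set.
MODEL_NAME="${MODEL_NAME:-diabetes-sklearn}"

# Upload the pickle file and register it as version 1 of a custom model.
az ml model create \
  --name "$MODEL_NAME" \
  --version 1 \
  --path ./model/model.pkl \
  --type custom_model
```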

Confirm:
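
For example, assuming the hypothetical model name diabetes-sklearn and version 1:

```shell
MODEL_NAME="${MODEL_NAME:-diabetes-sklearn}"

# Show the registered version, then list all versions under that name.
az ml model show --name "$MODEL_NAME" --version 1 \
  --query "{name:name, version:version}" -o table
az ml model list --name "$MODEL_NAME" -o table
```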


Step 3: Create the online endpoint

The endpoint YAML contains only endpoint-level settings (name, auth, identity). Deployments are created separately.
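
A minimal endpoint definition might look like the following sketch. The file name endpoint.yml and the endpoint name diabetes-endpoint-demo are assumptions. Endpoint names must be unique within an Azure region, so expect to change the name.

```shell
# Hypothetical endpoint name; must be unique within the Azure region.
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"

# Endpoint-level settings only: name and auth mode.
cat > endpoint.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: diabetes-endpoint-demo
auth_mode: key
EOF

# --name overrides the name written in the YAML file.
az ml online-endpoint create --name "$ENDPOINT_NAME" --file endpoint.yml
```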

Wait until provisioning completes:
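
The create command normally blocks until the operation finishes unless --no-wait is passed. To check the state explicitly, a query along these lines should work, assuming the hypothetical endpoint name used in this lab:

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"

# "Succeeded" means the endpoint is ready to accept deployments.
ENDPOINT_STATE="$(az ml online-endpoint show --name "$ENDPOINT_NAME" \
  --query provisioning_state -o tsv)"
echo "Endpoint provisioning state: ${ENDPOINT_STATE:-unknown}"
```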


Step 4: Create the deployment and route traffic
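
The sketch below writes a scoring script, a conda environment, and a deployment definition, then creates the deployment. All of it is illustrative: the file paths, the deployment name blue, the model reference azureml:diabetes-sklearn:1, the base image tag, the instance type, and the request shape {"data": [[...]]} are assumptions. Pin scikit-learn in the environment to the version you trained with, since a mismatch can break unpickling.

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"
mkdir -p src env

# Scoring script: init() runs once at container start, run() once per request.
cat > src/score.py <<'EOF'
import json
import os

import joblib
import numpy as np

model = None


def init():
    """Load the registered model from the directory Azure ML mounts it in."""
    global model
    # Assumes the model was registered from the single file model.pkl.
    model_path = os.path.join(os.environ["AZUREML_MODEL_DIR"], "model.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    """Score a request shaped like {"data": [[f1, ..., f10], ...]}."""
    data = np.array(json.loads(raw_data)["data"])
    return model.predict(data).tolist()
EOF

# Conda environment for the container; keep it small so startup stays fast.
cat > env/environment.yml <<'EOF'
name: diabetes-inference
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip
  - pip:
      - azureml-inference-server-http
      - scikit-learn
      - joblib
      - numpy
EOF

# Deployment definition; image tag and instance type are assumptions.
cat > deployment.yml <<'EOF'
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: diabetes-endpoint-demo
model: azureml:diabetes-sklearn:1
code_configuration:
  code: ./src
  scoring_script: score.py
environment:
  conda_file: ./env/environment.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
instance_type: Standard_DS3_v2
instance_count: 1
EOF

# --all-traffic routes 100% of endpoint traffic to this deployment.
az ml online-deployment create --name blue --endpoint-name "$ENDPOINT_NAME" \
  --file deployment.yml --all-traffic
```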

Check status:
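
For instance, assuming the hypothetical deployment name blue:

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"

# Provisioning state of the deployment itself.
DEPLOYMENT_STATE="$(az ml online-deployment show --name blue \
  --endpoint-name "$ENDPOINT_NAME" --query provisioning_state -o tsv)"
echo "Deployment provisioning state: ${DEPLOYMENT_STATE:-unknown}"

# Traffic split on the endpoint; blue should carry 100 after --all-traffic.
az ml online-endpoint show --name "$ENDPOINT_NAME" --query traffic
```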

If it fails, get logs:
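
A sketch of the log command, again assuming the deployment is named blue. The line count is an arbitrary choice.

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"
LOG_LINES=100

# Logs from the inference server container (the default container).
az ml online-deployment get-logs --name blue \
  --endpoint-name "$ENDPOINT_NAME" --lines "$LOG_LINES"
```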


Step 5: Invoke the endpoint

Get the scoring URI:
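
For example, with the hypothetical endpoint name used in this lab:

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"

# The scoring URI is the HTTPS URL that accepts inference requests.
SCORING_URI="$(az ml online-endpoint show --name "$ENDPOINT_NAME" \
  --query scoring_uri -o tsv)"
echo "Scoring URI: ${SCORING_URI:-unavailable}"
```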

Invoke via CLI (recommended for first test):
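
A sketch of a test call. The payload shape must match whatever your scoring script parses. The {"data": [[...]]} layout and the file name sample-request.json are assumptions, and the feature values are illustrative numbers on the scale of the diabetes dataset (10 features per row).

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"

# Two example rows, 10 features each (illustrative values).
cat > sample-request.json <<'EOF'
{
  "data": [
    [0.038, 0.051, 0.062, 0.022, -0.044, -0.035, -0.043, -0.003, 0.020, -0.018],
    [-0.002, -0.045, -0.051, -0.026, -0.008, -0.019, 0.074, -0.039, -0.068, -0.092]
  ]
}
EOF

# The CLI handles authentication, which makes it a convenient first test.
az ml online-endpoint invoke --name "$ENDPOINT_NAME" \
  --request-file sample-request.json
```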

Expected output:

  • JSON list of numeric predictions


Step 6: Optional hardening (quick pointers)

  • Switch auth_mode from key to aad_token (Microsoft Entra ID token authentication) for enterprise use cases.

  • Add Private Link for private endpoints and lock down public access for production (not covered in this lab)

  • Use managed identity for downstream access (Storage, Key Vault).


Cleanup

Delete deployment first (optional):
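
A sketch assuming the deployment is named blue. Azure ML may refuse to delete a deployment that still receives traffic, so the traffic is set to zero first.

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"

# Remove all traffic from the deployment before deleting it.
az ml online-endpoint update --name "$ENDPOINT_NAME" --traffic "blue=0"

# --yes skips the confirmation prompt.
az ml online-deployment delete --name blue \
  --endpoint-name "$ENDPOINT_NAME" --yes
```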

Delete endpoint:
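
For example, with the hypothetical endpoint name used in this lab. Deleting the endpoint also removes any deployments still attached to it.

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"

# --no-wait returns immediately while deletion continues in the background.
az ml online-endpoint delete --name "$ENDPOINT_NAME" --yes --no-wait
```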

Optionally delete the resource group:
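
Only do this if the resource group was created for the lab, because the command removes every resource in it, including the workspace. The group name rg-aml-lab is a placeholder.

```shell
RESOURCE_GROUP="${RESOURCE_GROUP:-rg-aml-lab}"

# Irreversible: deletes the workspace, storage, and everything else in the group.
az group delete --name "$RESOURCE_GROUP" --yes --no-wait
```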


Troubleshooting quick guide

Endpoint created but deployment is stuck

  • Check logs:
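
A sketch assuming the deployment is named blue. The storage-initializer container is worth checking when the deployment stalls before the scoring script starts, since it handles downloading the model and code.

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"
INIT_CONTAINER="storage-initializer"

# Logs from the container that downloads the model and code assets.
az ml online-deployment get-logs --name blue \
  --endpoint-name "$ENDPOINT_NAME" --container "$INIT_CONTAINER" --lines 100
```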

401 Unauthorized

  • If using auth_mode: key, fetch keys:
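
For example, the key can be fetched and passed as a bearer token in a raw HTTP call. This sketch assumes the hypothetical endpoint name from this lab and a request file named sample-request.json in the current directory.

```shell
ENDPOINT_NAME="${ENDPOINT_NAME:-diabetes-endpoint-demo}"

# Fetch the primary key and the scoring URI.
ENDPOINT_KEY="$(az ml online-endpoint get-credentials --name "$ENDPOINT_NAME" \
  --query primaryKey -o tsv)"
SCORING_URI="$(az ml online-endpoint show --name "$ENDPOINT_NAME" \
  --query scoring_uri -o tsv)"

# Key auth expects the key in the Authorization header as a bearer token.
curl -sS -X POST "$SCORING_URI" \
  -H "Authorization: Bearer $ENDPOINT_KEY" \
  -H "Content-Type: application/json" \
  --data @sample-request.json
```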

Import errors in scoring

  • Ensure environment.yml and requirements.txt list every package your scoring script imports

  • Prefer minimal requirements. Large environments slow down deployment startup.
