# Building an Inference API with YAML

## Objective
This lab walks you through deploying a machine learning model as an inference endpoint using Azure Machine Learning (AML) via a YAML configuration file.
You’ll deploy a simple example model (such as scikit-learn diabetes regression) and expose it securely as an HTTPS endpoint.
## Prerequisites
Before starting, make sure you have:
- ✅ Azure CLI ML extension installed: `az extension add -n ml -y`
- ✅ An Azure Machine Learning workspace
- ✅ A registered model in the workspace (you can use the sample `sklearn-diabetes`)
- ✅ Python 3.9+ and the `azureml` SDK if testing locally
- ✅ Proper RBAC permissions for AML deployments
## Folder structure
```
yaml-inference-api/
├── endpoint.yml
├── score.py
├── requirements.txt
└── README.md
```

## 1. endpoint.yml — YAML configuration for the endpoint
```yaml
name: infer-demo
auth_mode: key
traffic:
  blue: 100
deployments:
  - name: blue
    model: azureml:sklearn-diabetes:1
    instance_type: Standard_DS2_v2
    code_configuration:
      code: .
      scoring_script: score.py
```

### Explanation
| Key | Description |
| --- | --- |
| `name` | The name of your endpoint (unique per workspace) |
| `auth_mode` | Authentication mode (`key` or `aml_token`) |
| `deployments` | List of model versions and configurations |
| `model` | Reference to a registered model in your AML workspace |
| `instance_type` | VM SKU used to run the endpoint |
| `code` | Path to the folder containing your scoring code (here, the project root) |
| `scoring_script` | The Python file that defines the inference logic |
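Before deploying, it can help to sanity-check the file locally. Below is a minimal sketch, assuming PyYAML is installed (`pip install pyyaml`); `check_endpoint_yaml.py` is a hypothetical helper, not one of the lab's required files:

```python
# check_endpoint_yaml.py: a hypothetical local sanity check for endpoint.yml.
# Assumes PyYAML is installed (pip install pyyaml).
import yaml

with open("endpoint.yml") as f:
    cfg = yaml.safe_load(f)

# Traffic percentages across deployments must sum to 100.
assert sum(cfg["traffic"].values()) == 100, "traffic must total 100%"

# Every deployment referenced in `traffic` should actually be defined.
deployment_names = {d["name"] for d in cfg["deployments"]}
missing = set(cfg["traffic"]) - deployment_names
assert not missing, f"traffic routes to undefined deployments: {missing}"

print("endpoint.yml looks consistent:", cfg["name"])
```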
## 2. score.py — Inference logic

Example `score.py`:
```python
import json

import joblib
import numpy as np
from azureml.core.model import Model


def init():
    # Runs once when the container starts: locate and load the registered model.
    global model
    model_path = Model.get_model_path("sklearn-diabetes")
    model = joblib.load(model_path)


def run(raw_data):
    # Runs per request: parse the JSON payload, score it, return predictions.
    try:
        data = np.array(json.loads(raw_data)["data"])
        result = model.predict(data)
        return result.tolist()
    except Exception as e:
        # Return a dict (not a pre-serialized string) so the error response
        # is JSON-encoded exactly once.
        return {"error": str(e)}
```
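You can exercise the `run()` logic before touching Azure. The sketch below (`local_test.py`, a hypothetical helper, not one of the lab's required files) trains a throwaway model on the same diabetes dataset so no workspace is needed:

```python
# local_test.py: a hypothetical local smoke test for the scoring logic.
# Trains a stand-in model so no AML workspace or registered model is needed.
import json

import joblib
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

# Train and save a throwaway model locally.
X, y = load_diabetes(return_X_y=True)
joblib.dump(Ridge().fit(X, y), "model.pkl")

# Mimic run(): parse the JSON payload, predict, return a JSON-friendly list.
model = joblib.load("model.pkl")
raw_data = json.dumps({"data": X[:1].tolist()})
data = np.array(json.loads(raw_data)["data"])
print(model.predict(data).tolist())
```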
## 3. requirements.txt — Dependencies

```
numpy
scikit-learn
joblib
azureml-core
```

## Deployment steps
### Step 1: Log in and set workspace
```bash
az login
az account set --subscription "<your-subscription-id>"
az configure --defaults group=<your-rg> workspace=<your-workspace>
```
### Step 2: Create the online endpoint

```bash
az ml online-endpoint create --name infer-demo --file endpoint.yml
```
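The lab only requires the CLI, but if you prefer to check provisioning status from Python, here is a rough equivalent using the v2 SDK. The `azure-ai-ml` and `azure-identity` packages are assumptions beyond the lab's prerequisites, and the placeholder IDs are yours to fill in:

```python
# check_endpoint.py: an optional status check via the v2 Python SDK (sketch).
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-rg>",
    workspace_name="<your-workspace>",
)

endpoint = ml_client.online_endpoints.get(name="infer-demo")
print(endpoint.provisioning_state)  # expect "Succeeded" when ready
print(endpoint.scoring_uri)         # the HTTPS URL clients will call
```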
### Step 3: Test the endpoint

Once the deployment finishes, you can test the API with:
```bash
az ml online-endpoint invoke \
  --name infer-demo \
  --request-file sample.json
```

Example `sample.json`:
```json
{
  "data": [[0.038075906, 0.05068012, 0.06169621, 0.02187235, -0.0442235, -0.03482076, -0.04340085, -0.00259226, 0.01990749, -0.01764613]]
}
```

✅ Expected output: a JSON response with numeric predictions.
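The CLI handles authentication for you. To see what clients actually send, here is a minimal raw-HTTPS call with `requests`; the scoring URI shape and the key below are assumptions, so fetch your real values with `az ml online-endpoint show --name infer-demo` and `az ml online-endpoint get-credentials --name infer-demo`:

```python
# invoke_endpoint.py: calling the endpoint over raw HTTPS instead of the CLI.
# The URI and key are placeholders; retrieve yours via the az commands above.
import json

import requests

scoring_uri = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
key = "<your-endpoint-key>"

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",  # key auth also uses a Bearer header
}
payload = {"data": [[0.038075906, 0.05068012, 0.06169621, 0.02187235,
                     -0.0442235, -0.03482076, -0.04340085, -0.00259226,
                     0.01990749, -0.01764613]]}

response = requests.post(scoring_uri, headers=headers, data=json.dumps(payload))
print(response.status_code, response.json())
```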
## Security options
- Use `auth_mode: aml_token` to integrate with Azure Active Directory
- Add a Private Endpoint for network isolation
- Configure Managed Identity for secure access to other services (like Blob Storage)
## Troubleshooting
| Issue | Fix |
| --- | --- |
| Endpoint fails to start | Check model dependencies in `requirements.txt` |
| HTTP 401 Unauthorized | Verify the endpoint `auth_mode` and your key/token |
| Slow inference | Try a GPU-based SKU such as `Standard_NC6s_v3` |
| Endpoint timeout | Increase the request timeout in AML or optimize model loading |
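When the endpoint fails to start, the container logs are usually the fastest diagnostic. A sketch using the v2 Python SDK's `get_logs` (the `azure-ai-ml` package is an assumption, not otherwise required by this lab):

```python
# get_logs.py: pull container logs for the "blue" deployment (sketch).
# The first place to look for init()/run() errors and missing dependencies.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<your-subscription-id>",
    resource_group_name="<your-rg>",
    workspace_name="<your-workspace>",
)

print(ml_client.online_deployments.get_logs(
    name="blue", endpoint_name="infer-demo", lines=50))
```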
## Cleanup
To remove the endpoint and resources:
```bash
az ml online-endpoint delete --name infer-demo --yes
```