Visual Glossary - Translating AI for Infrastructure Engineers
“Infrastructure and AI don’t speak different languages — they just have distinct technical dialects.”
Overview
This visual glossary was created for professionals who already master infrastructure, networking, automation, and observability, and want to understand how those concepts translate into the world of Artificial Intelligence.
Each term includes:
✅ A practical definition
🔄 An analogy to the infrastructure world
💡 A real-world application in technical operations
Terms Table: Infrastructure ↔ Artificial Intelligence
| Term | Definition | Infrastructure analogy |
| --- | --- | --- |
| Inference | Running a trained model on new data to generate a response. | Like a GET request that returns a prediction or computation. |
| Training | Teaching a model using labeled examples. | Like setting a performance baseline through repeated tests. |
| Model | The trained artifact that represents the AI’s “brain.” | Like a VM image or OVA ready to deploy in production. |
| Dataset | The data used to train or test a model. | Like log input in a SIEM or historical metrics in monitoring. |
| GPU | Graphics processor optimized for massive parallel computation. | Like an NVMe SSD — expensive but critical for performance. |
| TPU | AI-specific chip (Tensor Processing Unit). | Like a dedicated hardware appliance for acceleration. |
| Inference Latency | Time between model input and response. | Like ping between app and database — just as critical. |
| Fine-tuning | Adjusting an existing model with domain-specific data. | Like customizing a base IaC template with environment-specific parameters. |
| Embedding | Numeric vector representing the semantic meaning of text or images. | Like a semantic hash — searching by “idea,” not word. |
| Vector Database | Database that stores and retrieves embeddings via similarity search. | Like DNS — but for meanings (“find me something similar”). |
| LLM (Large Language Model) | Language model with billions of parameters, trained on vast amounts of natural language text. | Like an operating system for AI — the base for other applications. |
| Prompt | Text sent to the model to guide its output. | Like a SQL query — but for intelligent text. |
| Prompt Injection | Malicious input designed to override model instructions. | Like a SQL injection on a model API. |
| Token | Fragment of text processed by the model. | Like a network packet — the model reads in chunks, not whole words. |
| Rate limiting / Quotas | Limits on requests or tokens over time. | Like API throttling rules on an ingress or gateway. |
| MLOps | CI/CD, versioning, and lifecycle management for models. | Like a CI/CD pipeline for machine learning. |
| Azure Machine Learning (AML) | Managed platform for AI development and deployment. | Like Azure DevOps — but for models and pipelines. |
| Inference Endpoint | Public or private API exposing a trained model. | Like an App Service or Function — but for AI inference. |
| RAG (Retrieval Augmented Generation) | Combines an LLM with retrieval from your own data. | Like querying an indexed datastore before generating a response (see the sketch after this table). |
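To make the Embedding, Vector Database, and RAG rows concrete, here is a minimal Python sketch. The `embed()` function is a deliberately crude stand-in (a character-frequency vector), not a real embedding model, and the documents and question are invented; a vector database performs the same similarity search at scale with proper embeddings.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: a character-frequency vector.
    In production you would call a real embedding model instead."""
    vec = np.zeros(256)
    for ch in text.lower():
        vec[ord(ch) % 256] += 1
    return vec

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A "vector database" in miniature: documents stored next to their embeddings.
documents = [
    "Reset a user's VPN token via the self-service portal.",
    "Rotate storage account keys every 90 days.",
]
index = [(doc, embed(doc)) for doc in documents]

# RAG step 1 -- retrieval: find the document most similar to the question.
question = "How do I renew VPN access?"
q_vec = embed(question)
best_doc, _ = max(index, key=lambda item: cosine_similarity(q_vec, item[1]))

# RAG step 2 -- generation: ground the prompt with the retrieved context
# before sending it to the LLM.
prompt = f"Answer using this context:\n{best_doc}\n\nQuestion: {question}"
print(prompt)
```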
If You Already Understand Infrastructure...
| If you already do this in infrastructure… | …you can do this with AI |
| --- | --- |
| Provision VMs with specific specs | Create inference endpoints with allocated GPU and memory |
| Balance traffic with health probes | Scale model APIs using latency and error metrics |
| Automate deploys with Bicep/Terraform | Deploy models using YAML or CLI in Azure ML (see the sketch after this table) |
| Troubleshoot using logs and metrics | Observe inference with Application Insights and GPU metrics |
| Replicate databases | Retrain models with updated data |
| Use SNMP/telemetry | Monitor GPU usage via Prometheus and DCGM |
| Create failover with Front Door | Configure multi-region fallback across endpoints |
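For the “Deploy models using YAML or CLI in Azure ML” row, the same operation is also available from Python through the `azure-ai-ml` SDK. A hedged sketch, assuming an existing workspace and an already-registered model; the subscription, workspace, model, and instance type values below are placeholders, and custom (non-MLflow) models typically also need an environment and scoring script.

```python
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

# Connect to an existing workspace (all identifiers are placeholders).
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# The inference endpoint: the AI equivalent of an App Service in front of the model.
endpoint = ManagedOnlineEndpoint(name="docs-inference", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# The deployment: the model plus the GPU SKU and replica count allocated to it.
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="docs-inference",
    model="azureml:my-model:1",        # placeholder registered model reference
    instance_type="Standard_NC6s_v3",  # GPU SKU -- adjust to your quota
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()
```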
Visual diagrams
1. AI model lifecycle

Visualizes how models move from training to inference and continuous improvement.
2. Simplified infrastructure architecture for AI

Shows how networking, compute, security, and observability support AI workloads.
Quick checklists
AI environment readiness
Performance and cost
Security and governance
Practical use cases
Case 1: Internal chat with Azure OpenAI (Standard)
Scenario: Internal chatbot running on AKS and calling Azure OpenAI.
Challenge: High latency and throttling.
Solution:
Implement local caching for repeated prompts (see the sketch after this list)
Monitor with Application Insights
Migrate to PTU-C (provisioned throughput) for stable latency
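A minimal sketch of the caching idea above, assuming a single replica; `call_model()` is a placeholder for the real Azure OpenAI request, and in a multi-replica AKS deployment a shared cache such as Redis would replace the in-process dictionary.

```python
import hashlib

_cache = {}  # prompt hash -> response (in-process; use a shared cache for multiple pods)

def call_model(prompt: str) -> str:
    """Placeholder for the real Azure OpenAI request."""
    return f"(model answer for: {prompt})"

def cached_completion(prompt: str) -> str:
    # Normalize before hashing so trivial whitespace/case differences still hit the cache.
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_model(prompt)  # tokens and latency are only paid on a miss
    return _cache[key]

print(cached_completion("What is our VPN policy?"))
print(cached_completion("  what is our VPN policy? "))  # cache hit, no API call
```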
Case 2: Data extraction on GPU VMs
Scenario: Automated batch inference on PDFs.
Solution:
Automate provisioning with Azure CLI and Terraform
Run jobs during off-peak windows on Spot VMs
Centralize logging in Log Analytics (a minimal batch driver is sketched after this list)
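A hedged sketch of the batch driver for this case: walk a folder of PDFs, send each one to an inference endpoint, and emit one log line per file (picked up by your agent and forwarded to Log Analytics). The scoring URL, key handling, and payload shape are placeholders.

```python
import logging
from pathlib import Path

import requests

ENDPOINT = "https://<your-endpoint>/score"  # placeholder scoring URL
API_KEY = "<key-from-key-vault>"            # fetch from Key Vault in a real job

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

def extract(pdf_path: Path) -> dict:
    """Send one PDF to the inference endpoint and return the parsed response."""
    with pdf_path.open("rb") as f:
        resp = requests.post(
            ENDPOINT,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": (pdf_path.name, f, "application/pdf")},
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()

for pdf in sorted(Path("/data/inbox").glob("*.pdf")):
    try:
        result = extract(pdf)
        logging.info("processed %s fields=%d", pdf.name, len(result))
    except Exception:
        logging.exception("failed %s", pdf.name)
```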
Case 3: Multi-region deployment with fallback
Scenario: Global startup serving GPT-4 across multiple regions.
Solution:
Azure Front Door with health probes
Retry logic via API Management (client-side fallback sketched after this list)
Token quota watchdog per region
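Front Door and API Management handle most of the routing, but a client-side fallback is a cheap last line of defense. A minimal sketch, assuming two regional endpoints (the URLs are placeholders and authentication is omitted); it moves to the next region on throttling (429) or server errors.

```python
import requests

# Ordered by preference; both URLs are placeholders for regional endpoints.
REGIONS = [
    "https://myapp-eastus.example.com/v1/chat",
    "https://myapp-westeurope.example.com/v1/chat",
]

def chat(payload: dict, timeout: int = 30) -> dict:
    last_error = None
    for url in REGIONS:
        try:
            resp = requests.post(url, json=payload, timeout=timeout)
            if resp.status_code in (429, 500, 502, 503, 504):
                last_error = RuntimeError(f"{url} returned {resp.status_code}")
                continue  # throttled or unhealthy -- try the next region
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as exc:
            last_error = exc  # network failure -- try the next region
    raise RuntimeError("all regions failed") from last_error
```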
Best practices for infrastructure professionals
Training is expensive. Inference is constant.
Prompt = input. Model = brain. Response = output.
Idle GPU equals wasted cost.
AI logs may contain sensitive data. Always encrypt.
Tokens directly impact both cost and latency. Optimize continuously (see the back-of-the-envelope sketch below).
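A back-of-the-envelope sketch for the token point above, using the `tiktoken` tokenizer (the cl100k_base encoding used by recent OpenAI models); the per-1,000-token price is an illustrative placeholder, not a real quote.

```python
import tiktoken

PRICE_PER_1K_TOKENS = 0.01  # illustrative placeholder -- check your actual pricing

enc = tiktoken.get_encoding("cl100k_base")

prompt = "Summarize last night's change window for the on-call handover. " * 50
tokens = len(enc.encode(prompt))

cost_per_call = tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"{tokens} prompt tokens -> ~${cost_per_call:.4f} per call")
print(f"At 10,000 calls/day: ~${cost_per_call * 10_000:,.2f}/day")
```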
Conclusion
This glossary was built to help infrastructure professionals feel confident and fluent in applied AI vocabulary. You already master the essentials — now you speak the language too.
“From VMs to inference, from logs to tokens — the future of infrastructure is cognitive.”