May 7, 2026

Preparing for the AWS AI Practitioner Certification

A condensed, exam-shaped study guide — concepts, AWS services, and the question patterns that actually show up.

These are my consolidated notes for AWS Certified AI Practitioner (AIF-C01). The exam is foundational — broad rather than deep — so the trick is recognising the pattern in each question, not memorising algorithm internals. I’ve organised this around how the questions actually read.

Exam logistics

65 questions in 90 minutes, with a scaled passing score of 700/1000; you can take it at a test centre or online.

Part 1 — Machine learning foundations

The vocabulary that trips people up

| Term | Meaning |
| --- | --- |
| Feature | An input attribute. (Square footage of a house.) |
| Label | The target output. (House price.) |
| Weights | How important each feature is to the model’s prediction. |
| Parameters | Internal variables learned during training. |

Parameters = weights + biases + others. All weights are parameters; not all parameters are weights. Expect a question on this.
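A minimal sketch (with made-up numbers) of a two-feature linear model, to make the distinction concrete: both the weights and the bias are parameters, but only the weights are weights.

```python
# Linear model: price = w1 * sqft + w2 * bedrooms + bias
weights = [120.0, 15_000.0]  # one weight per feature (illustrative values)
bias = 50_000.0              # a parameter, but not a weight

def predict(features):
    # Weighted sum of the features, shifted by the bias term.
    return sum(w * x for w, x in zip(weights, features)) + bias

print(predict([1500, 3]))  # 1,500 sq ft, 3 bedrooms -> 275000.0
```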

When ML is the right tool — and when it isn’t

Good fits: recommendations, demand forecasting, image/video recognition, sentiment analysis, translation, personalization, fraud detection.

Bad fits — pick these in “when should we not use ML?” questions: a simple deterministic rule already solves the problem, there is little or no training data, every decision must be fully explainable, or the cost of building and running a model outweighs the benefit.

The three learning paradigms

| Paradigm | Data | What it does | Examples |
| --- | --- | --- | --- |
| Supervised | Labeled | Predict | Spam classification, house-price regression |
| Unsupervised | Unlabeled | Discover patterns | Customer clustering, anomaly detection |
| Reinforcement | Reward signals | Learn from interaction | Game-playing, AWS DeepRacer |

Supervised algorithms worth knowing

| Algorithm | Task | Typical use |
| --- | --- | --- |
| Linear regression | Regression | Sales / price forecasting |
| Logistic regression | Classification | Spam detection |
| Decision tree | Both | Credit approval |
| Random forest | Both | Fraud detection |
| Support Vector Machine (SVM) | Both | Image classification |
| K-Nearest Neighbours (KNN) | Both | Distance-based recommendations |
| Naive Bayes | Classification | Text classification |
| Gradient boosting | Both | Tabular data (frequent Kaggle winner) |
| Neural networks / deep learning | Both | Most modern systems |
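To make the labeled-data idea concrete, here is a minimal sketch using scikit-learn (my choice of library, not something the exam requires): logistic regression trained on toy spam features.

```python
from sklearn.linear_model import LogisticRegression

# Toy features: [number of links, number of exclamation marks]; label 1 = spam.
X = [[8, 5], [0, 0], [6, 7], [1, 0], [9, 9], [0, 1]]
y = [1, 0, 1, 0, 1, 0]  # supervised learning needs these labels

model = LogisticRegression().fit(X, y)
print(model.predict([[7, 4]]))        # -> [1] (spam)
print(model.predict_proba([[7, 4]]))  # class probabilities
```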

Unsupervised algorithms

The usual suspects: K-means and hierarchical clustering (grouping), principal component analysis (dimensionality reduction), isolation forests (anomaly detection), and association-rule mining such as Apriori (market-basket analysis).
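And the unsupervised counterpart: same toy setup, but no labels. A hedged scikit-learn sketch where K-means discovers two customer groups on its own.

```python
from sklearn.cluster import KMeans

# Toy customer data: [orders per month, average basket size], no labels given.
X = [[1, 20], [2, 25], [1, 22], [9, 90], [10, 95], [8, 85]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # e.g. [0 0 0 1 1 1], two clusters discovered rather than taught
```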

The ML lifecycle

Business problem
  → Problem formulation
  → Data collection & integration
  → Preprocessing & visualization
  → Model training
  → Evaluation
  → Tuning & feature engineering
  → Deployment
  → Monitoring → (loop back to data)

Two stages that look similar but aren’t:

Inference modes (memorise this — easy points)

| Mode | Use when |
| --- | --- |
| Real-time | Low-latency interactive predictions; persistent endpoint |
| Serverless | Intermittent traffic; you don’t want idle cost |
| Asynchronous | Payloads up to 1 GB, processing up to 1 hour, queue-based |
| Batch transform | Large offline jobs, no endpoint needed |
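A hedged boto3 sketch of the two invocation styles; the endpoint names and bucket are hypothetical, and it assumes the endpoints already exist and credentials are configured.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

# Real-time: synchronous call to a persistent endpoint; low latency, small payload.
resp = runtime.invoke_endpoint(
    EndpointName="my-realtime-endpoint",  # hypothetical
    ContentType="application/json",
    Body=b'{"features": [1500, 3]}',
)

# Asynchronous: the payload is staged in S3 and the request is queued;
# this is the mode for up-to-1-GB payloads and up-to-1-hour processing.
resp = runtime.invoke_endpoint_async(
    EndpointName="my-async-endpoint",     # hypothetical
    InputLocation="s3://my-bucket/payloads/request.json",
)
```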

Part 2 — Foundation models and Generative AI

The hierarchy

AI ⊃ Machine Learning ⊃ Deep Learning ⊃ Generative AI

Foundation Models (FMs)

Very large models pre-trained once on broad data, then adapted to many downstream tasks through prompting, RAG, or fine-tuning, rather than building one model per task.

FM lifecycle

Data Selection → Pre-training → Optimization → Evaluation
  → Deployment → Feedback & continuous improvement → (loop)

LLMs — a special case of FM

Input → Tokenization & encoding → Word embedding → Decoding → Output

Limitations to remember

Hallucination (fluent but wrong output), a knowledge cutoff at training time, a finite context window, bias inherited from training data, and non-deterministic answers.

Part 3 — Customizing foundation models

The AWS exam loves the effort/cost spectrum:

Prompt Engineering  →  RAG  →  Fine-tuning  →  Continued Pre-training  →  Train from scratch
       ↑ cheap, fast                                         expensive, slow ↑

Pick the leftmost option that solves the problem.

Prompt engineering techniques

| Technique | What it is |
| --- | --- |
| Zero-shot | Just ask the question; no examples |
| One-shot | Show one example of the task |
| Few-shot | Show 2+ examples |
| Chain-of-thought | “Think step by step” — surfaces reasoning |
| Negative prompting | Explicitly say what not to include |
| ReAct | Chain-of-thought + tool/API calls (e.g., REST integration) |

Prompt templates are reusable formats with placeholders for variable input. They give you consistency, fewer errors, and easy iteration, and they show up in workflow questions.
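A template is nothing exotic: a fixed instruction string with slots. A minimal sketch (the names are illustrative):

```python
# Reusable template: the instructions stay fixed, only the slots vary per request.
TEMPLATE = (
    "You are a support assistant for {product}.\n"
    "Answer in under 100 words, in a friendly tone.\n"
    "Question: {question}"
)

prompt = TEMPLATE.format(product="Acme Router", question="How do I reset it?")
print(prompt)
```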

RAG (Retrieval-Augmented Generation)

Inject external knowledge at query time: retrieve relevant documents from a knowledge store and pass them to the model along with the question. Pick RAG when answers must reflect current or proprietary data, when you need citations back to sources, or when fine-tuning would be too slow or costly.
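A self-contained toy sketch of the flow; a real system would use embeddings plus a vector store for retrieval and an actual model call at the end, but the shape is the same.

```python
# Toy RAG: retrieve relevant text, then ground the prompt in it.
DOCS = [
    "Our refund window is 30 days from delivery.",
    "Support hours are 9am-5pm CET, Monday to Friday.",
]

def retrieve(question, top_k=1):
    # Stand-in for embedding similarity search: crude keyword overlap.
    overlap = lambda d: len(set(question.lower().split()) & set(d.lower().split()))
    return sorted(DOCS, key=overlap, reverse=True)[:top_k]

def build_prompt(question):
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What is the refund window?"))  # context contains the 30-day doc
```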

AWS vector stores you should recognize: Amazon OpenSearch Service/Serverless (the Bedrock Knowledge Bases default), Aurora PostgreSQL with pgvector, Amazon Neptune, Amazon DocumentDB, and Amazon MemoryDB.

Fine-tuning vs. continued pre-training

|  | Fine-tuning | Continued pre-training |
| --- | --- | --- |
| Data | Labeled, domain-specific | Unlabeled, domain-specific |
| Goal | Improve a specific task | Adapt the model to a new domain |
| Modifies weights? | Yes | Yes |

If a question mentions labeled data → fine-tuning. Unlabeled → continued pre-training.

Inference-time parameters

Original tokens → Temperature → Top-K → Top-P → Random selection

“Reduce randomness in the LLM’s output” → lower the temperature.
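A hedged Amazon Bedrock sketch using the Converse API (the model ID is just one example). Temperature and Top-P sit in the shared inferenceConfig, while Top-K is model-specific and passed separately, which is itself a plausible exam nuance.

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

resp = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": "Define RAG in one sentence."}]}],
    inferenceConfig={
        "temperature": 0.2,  # lower -> less random, more repeatable output
        "topP": 0.9,         # nucleus sampling: keep the top 90% probability mass
        "maxTokens": 200,
    },
)
print(resp["output"]["message"]["content"][0]["text"])
```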

Guardrails

Amazon Bedrock Guardrails add configurable safeguards on top of a model: content filters for harmful categories, denied topics, word filters, and PII detection/redaction. One guardrail configuration can be reused across models and applications.

Part 4 — Evaluation metrics

Classification

| Metric | Use |
| --- | --- |
| Accuracy | Overall % correct (misleading on imbalanced data) |
| Precision | Of predicted positives, how many were truly positive |
| Recall | Of actual positives, how many we caught |
| F1 | Harmonic mean of precision and recall — best for imbalanced datasets |
| AUC-ROC | Binary classification; especially imbalanced scenarios like fraud detection |
| Confusion matrix | Visualizes TP / TN / FP / FN |

F1 is the go-to answer for imbalanced binary classification unless the question stresses ranking / threshold tradeoffs — then it’s AUC-ROC.
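Worked numbers make these definitions stick. A small scikit-learn check (again an assumed library), where precision, recall, and F1 all come out to 0.75:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 4 actual positives
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # 4 predicted positives, 3 of them correct

print(precision_score(y_true, y_pred))  # 3/4 = 0.75: correct among predicted positives
print(recall_score(y_true, y_pred))     # 3/4 = 0.75: caught among actual positives
print(f1_score(y_true, y_pred))         # harmonic mean -> 0.75
```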

Regression

MAE (mean absolute error: the average miss), MSE (mean squared error: penalises large misses harder), RMSE (the square root of MSE, back in the original units), and R² (the share of variance the model explains).

Text generation

| Metric | Use |
| --- | --- |
| BLEU | Machine translation (compares n-grams to reference) |
| ROUGE | Summarization |
| BERTScore | Semantic similarity using contextual embeddings (e.g., comparing a chatbot’s response to an expert answer) |

F1 does not evaluate text generation. Don’t pick it for a generation task.

Bias and variance

High bias means underfitting: the model is too simple and misses the pattern even on training data. High variance means overfitting: the model memorises training noise. If accuracy is high on training data but low on test data, it’s overfitting.

Part 5 — AWS services cheat sheet

SageMaker family

| Service | What it does |
| --- | --- |
| SageMaker Canvas | No-code ML predictions |
| SageMaker Autopilot | AutoML — automated build & deploy |
| SageMaker Data Wrangler | Import, prepare, transform, featurize |
| SageMaker Feature Store | Centralized feature repository for training & inference |
| SageMaker JumpStart | Pre-trained open-source models (great for summarization questions) |
| SageMaker Model Cards | Document key model details for governance |
| SageMaker Model Monitor | Drift detection in production |
| SageMaker Clarify | Bias reports + bias-drift monitoring |
| SageMaker inference endpoints | Hosted prediction endpoints; AWS manages infra |

Model Monitor + Clarify together watch four dimensions: data quality, model quality, bias drift, feature attribution drift.

Higher-level AI services (no ML expertise required)

| Service | Use it for |
| --- | --- |
| Amazon Comprehend | NLP — sentiment, entities, key phrases (e.g., analyzing customer reviews) |
| Amazon Rekognition | Image and video analysis |
| Amazon Textract | Extract text and structured data from documents, PDFs, images (invoices, receipts) |
| Amazon Personalize | Recommendations and user-segment targeting (e.g., marketing campaigns) |
| Amazon Q | Code assistant — chat about code, completions, security scans, language upgrades |
| Amazon Augmented AI (A2I) | Human-in-the-loop review of ML predictions for high precision |

Security, governance, compliance

| Service | Purpose |
| --- | --- |
| AWS Artifact | Self-service compliance reports (ISO, SOC, PCI), HIPAA BAAs |
| AWS Audit Manager | Continuous compliance auditing — including a prebuilt GenAI framework |
| AWS Config | Resource configuration history & compliance |
| AWS Trusted Advisor | Best-practice recommendations across cost, security, performance, resilience |
| Amazon GuardDuty | ML-based threat detection across AWS accounts |
| Amazon Macie | Discover sensitive data inside S3 buckets |
| Amazon Inspector | Vulnerability scanning for workloads |

Data lineage = tracking the flow and transformation of data, for privacy/compliance. Recognize the term.

Part 6 — Question patterns that show up

If you see these phrases, the answer is almost always:

| The question says… | The answer is… |
| --- | --- |
| “Generates new data resembling existing data” | GAN (Generative Adversarial Network) |
| “AI plays a complex strategy game” | Reinforcement learning |
| “Adapt a pre-trained model to a new task” | Transfer learning |
| “Generate images from text descriptions” | Diffusion models / Stable Diffusion |
| “High train accuracy, low test accuracy” | Overfitting |
| “Reduce randomness in LLM output” | Lower the temperature |
| “Near-real-time inference, payloads up to 1 GB / up to 1 hour” | Asynchronous inference |
| “Find sensitive data in S3” | Macie |
| “Threat detection across AWS” | GuardDuty |
| “Analyze customer review sentiment” | Comprehend |
| “Extract data from invoices / receipts” | Textract |
| “Audit a GenAI app for compliance” | Audit Manager (prebuilt framework) |
| “Recommend a customer segment for a campaign” | Personalize |
| “Document model details for governance” | SageMaker Model Cards |
| “Imbalanced binary classification metric” | F1 or AUC-ROC |
| “Bias report on the model” | SageMaker Clarify |
| “No-code ML for business users” | SageMaker Canvas |
| “AutoML — build & deploy automatically” | SageMaker Autopilot |
| “Pre-trained model for summarization” | SageMaker JumpStart |

Final prep tips

  1. Read AWS’s official exam guide once more in the last week. It’s short and tells you exactly what they care about.
  2. Memorize the SageMaker family. It’s worth roughly 8–10 questions.
  3. Know inference modes by their constraints — payload size, latency, async vs. sync. Easy points.
  4. Watch for “does NOT do X” wording. The trap is usually a service that almost fits but is missing a feature (e.g., Rekognition can’t summarize text; Personalize can’t generate images).
  5. Hit 80%+ on a full-length practice exam before booking the real one.

Good luck. The exam rewards pattern recognition over depth — if you’ve internalised the cheat sheet above, you’ll have time to spare.