ML Problem Types

Correctly framing your problem is the most critical first step in any ML project. Choosing the wrong problem type leads to wrong algorithms, wrong metrics, and wasted effort. Understanding the distinction between supervised and unsupervised learning, and between classification, regression, and clustering, will guide every downstream decision.

Quick Reference

Problem Type	Output	Examples	Algorithms
Binary Classification	Yes/No, 0/1	Fraud or not, churn or not, spam or not	XGBoost, Logistic Regression, Linear Learner, Random Forest
Multi-class Classification	One of N categories	Image labels, document type, product category	XGBoost, Random Forest, CNN, BlazingText
Regression	Continuous numeric value	Price prediction, demand quantity, temperature	XGBoost, Linear Learner, Linear Regression
Forecasting	Future values over time	Sales forecast, demand planning, stock prices	DeepAR, ARIMA, CNN-QR, Exponential Smoothing
Clustering	Group assignments (no labels)	Customer segmentation, anomaly grouping	K-Means, DBSCAN
Anomaly Detection	Normal vs anomalous	Fraud detection, defect detection, network intrusion	Random Cut Forest, Isolation Forest, IP Insights
Recommendation	Ranked items for a user	Product recommendations, content suggestions	Factorization Machines, Collaborative Filtering
Topic Modeling	Topics within documents	Categorize news articles, discover themes	LDA, Neural Topic Model (NTM)
Object Detection	Bounding boxes + labels	Find cars in images, detect faces	SSD, YOLO, Faster R-CNN
Semantic Segmentation	Pixel-level labels	Autonomous driving, medical imaging	FCN, U-Net, DeepLab
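
The Output column also decides how predictions should be scored, which is one concrete way a mis-framed problem leads to the wrong metric. Here is a minimal sketch in plain Python; no ML library is assumed, and the churn labels and house prices are made up for illustration. Accuracy counts exact matches, which suits categorical outputs. Mean absolute error measures distance, which suits continuous outputs.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that exactly match the label (suits classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between prediction and target (suits regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Binary classification: churn (1) vs. no churn (0); made-up labels.
churn_true = [1, 0, 1, 1]
churn_pred = [1, 0, 0, 1]
print(accuracy(churn_true, churn_pred))  # 0.75

# Regression: house prices; made-up values.
price_true = [205_000.0, 300_000.0]
price_pred = [200_000.0, 310_000.0]
print(mean_absolute_error(price_true, price_pred))  # 7500.0
print(accuracy(price_true, price_pred))  # 0.0: close predictions, but none exact
```

Accuracy reports 0.0 on the price predictions even though both are within a few percent of the truth. This is why regression is scored with error metrics rather than match counts.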

Decision Flow

Has labeled data? → Supervised (classification, regression, forecasting)
No labels? → Unsupervised (clustering, anomaly detection, topic modeling)
Predict a category? → Classification
Predict a number? → Regression
Predict future values? → Forecasting
Group similar items? → Clustering
Find unusual items? → Anomaly Detection
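
The flow above can be written as a small lookup function. This is a literal encoding sketch: the goal phrases mirror the lines above, and `frame_problem` and the two dictionaries are hypothetical helpers, not part of any library. Real projects blur these lines. For example, fraud can be framed as binary classification once labels exist, and forecasting's "labels" are simply the series' own later values.

```python
from typing import Tuple

# Hypothetical goal -> problem-type tables mirroring the Decision Flow lines.
SUPERVISED_GOALS = {
    "predict a category": "Classification",
    "predict a number": "Regression",
    "predict future values": "Forecasting",
}
UNSUPERVISED_GOALS = {
    "group similar items": "Clustering",
    "find unusual items": "Anomaly Detection",
}


def frame_problem(goal: str, has_labels: bool) -> Tuple[str, str]:
    """Return (learning paradigm, problem type) for a goal, following the flow."""
    key = goal.strip().lower()
    if key in SUPERVISED_GOALS:
        if not has_labels:
            # Supervised problem types learn from labeled examples.
            raise ValueError(f"{goal!r} is supervised and needs labeled data")
        return "Supervised", SUPERVISED_GOALS[key]
    if key in UNSUPERVISED_GOALS:
        return "Unsupervised", UNSUPERVISED_GOALS[key]
    raise ValueError(f"unrecognized goal: {goal!r}")


print(frame_problem("Predict a number", has_labels=True))      # ('Supervised', 'Regression')
print(frame_problem("Group similar items", has_labels=False))  # ('Unsupervised', 'Clustering')
```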

Flashcards

Question

What type of ML problem is 'predict whether a transaction is fraudulent or not'?

Answer

Binary Classification — the output is one of two categories (fraud/not fraud). Common algorithms: XGBoost, Logistic Regression, Linear Learner.
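
To show what "one of two categories" means in practice, here is a from-scratch sketch of Logistic Regression trained by batch gradient descent on log loss. It uses plain Python, not a library API. The model outputs a fraud probability, and a threshold turns that probability into a 0/1 label. Several pieces are assumptions for illustration only: the single feature (a hypothetical standardized "amount anomaly score"), the six training transactions, the learning rate, and the 0.5 threshold. Real systems tune the threshold.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, lr: float = 0.1, epochs: int = 1000):
    """Fit p(fraud | x) = sigmoid(w*x + b) by batch gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of mean log loss: (p - y) * x for w, (p - y) for b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def predict(w: float, b: float, x: float, threshold: float = 0.5) -> int:
    """Binary output: 1 = fraud, 0 = not fraud (threshold is an assumption)."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Made-up training data: hypothetical anomaly score -> fraud label.
scores = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(scores, labels)

print(predict(w, b, 1.8))   # 1 (fraud)
print(predict(w, b, -1.8))  # 0 (not fraud)
```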

Common Misconception

"Identify groups of customers" = Clustering (K-Means), NOT Semantic Segmentation. "Customer segmentation" is a business term for clustering. Semantic Segmentation is a computer vision technique for pixel-level image labeling.
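
To make the distinction concrete, note that customer segmentation groups rows of customer features, not image pixels. Below is a minimal K-Means sketch (Lloyd's algorithm) in plain Python. The two features (annual spend in $k, visits per month) and the six customers are made up. The deterministic farthest-first initialization is a simplification chosen so the result is reproducible; library implementations typically use random or k-means++ seeding instead.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster points (tuples of floats) into k groups; returns (labels, centroids)."""
    # Farthest-first initialization: start at points[0], then repeatedly add
    # the point farthest from all centroids chosen so far (an assumption).
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


# Made-up customers: (annual spend in $k, visits per month).
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 9.0), (8.5, 8.7), (7.8, 9.3)]
segments, _ = kmeans(customers, k=2)
print(segments)  # [0, 0, 0, 1, 1, 1]
```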
