ML Problem Types
Correctly framing your problem is the critical first step in any ML project. Choosing the wrong problem type leads to the wrong algorithms, the wrong metrics, and wasted effort. Understanding the distinction between supervised and unsupervised learning, and the differences among classification, regression, and clustering, will guide every downstream decision.
Quick Reference
| Problem Type | Output | Examples | Algorithms |
|---|---|---|---|
| Binary Classification | Yes/No, 0/1 | Fraud or not, churn or not, spam or not | XGBoost, Logistic Regression, Linear Learner, Random Forest |
| Multi-class Classification | One of N categories | Image labels, document type, product category | XGBoost, Random Forest, CNN, BlazingText |
| Regression | Continuous numeric value | Price prediction, demand quantity, temperature | XGBoost, Linear Learner, Linear Regression |
| Forecasting | Future values over time | Sales forecast, demand planning, stock prices | DeepAR, ARIMA, CNN-QR, Exponential Smoothing |
| Clustering | Group assignments (no labels) | Customer segmentation, anomaly grouping | K-Means, DBSCAN |
| Anomaly Detection | Normal vs anomalous | Fraud detection, defect detection, network intrusion | Random Cut Forest, Isolation Forest, IP Insights |
| Recommendation | Ranked items for a user | Product recommendations, content suggestions | Factorization Machines, Collaborative Filtering |
| Topic Modeling | Topics within documents | Categorize news articles, discover themes | LDA, Neural Topic Model (NTM) |
| Object Detection | Bounding boxes + labels | Find cars in images, detect faces | SSD, YOLO, Faster R-CNN |
| Semantic Segmentation | Pixel-level labels | Autonomous driving, medical imaging | FCN, U-Net, DeepLab |
Decision Flow
- Has labeled data? → Supervised (classification, regression, forecasting)
- No labels? → Unsupervised (clustering, anomaly detection, topic modeling)
- Predict a category? → Classification
- Predict a number? → Regression
- Predict future values? → Forecasting
- Group similar items? → Clustering
- Find unusual items? → Anomaly Detection
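The flow above can be sketched as a small lookup function. This is only an illustration: the function name `suggest_problem_type` and the goal strings are hypothetical labels chosen for this example, not part of any library, and the supervised/unsupervised split simply mirrors the two groupings listed above.

```python
def suggest_problem_type(has_labels: bool, goal: str) -> str:
    """Map a goal to a problem type, following the decision flow above.

    The goal strings are hypothetical keys invented for this sketch.
    """
    goal_to_type = {
        "category": "Classification",
        "number": "Regression",
        "future values": "Forecasting",
        "group similar items": "Clustering",
        "find unusual items": "Anomaly Detection",
    }
    # Problem types the flow lists under "Supervised"; these need labels.
    supervised = {"Classification", "Regression", "Forecasting"}

    problem_type = goal_to_type.get(goal)
    if problem_type is None:
        raise ValueError(f"Unknown goal: {goal!r}")
    if problem_type in supervised and not has_labels:
        raise ValueError(f"{problem_type} is supervised and needs labeled data")
    return problem_type


print(suggest_problem_type(True, "category"))              # Classification
print(suggest_problem_type(False, "group similar items"))  # Clustering
```

Raising an error when a supervised type is requested without labels makes the first question in the flow (do you have labeled data?) an explicit check rather than an afterthought.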
Flashcards
What type of ML problem is 'predict whether a transaction is fraudulent or not'?
Binary Classification. The output is one of two categories (fraud or not fraud). Common algorithms: XGBoost, Logistic Regression, Linear Learner.
"Identify groups of customers" = Clustering (K-Means), NOT Semantic Segmentation. "Customer segmentation" is a business term for clustering. Semantic Segmentation is a computer vision technique for pixel-level image labeling.
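To make the clustering side of this distinction concrete, here is a minimal pure-Python sketch of K-Means (Lloyd's algorithm) applied to customer segmentation. The customer data, the feature choice, and the starting centroids are all made up for illustration; a real project would normally scale the features and use a library implementation.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: mean of the members of each cluster.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (orders per year, average order value in dollars).
# No labels are supplied; the groups are discovered from the features alone.
customers = [(2, 20.0), (3, 25.0), (1, 22.0), (40, 210.0), (38, 190.0), (42, 200.0)]
labels, centroids = kmeans(customers, centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The output is a group assignment per customer, which is what the Clustering row in the table describes. Semantic Segmentation, by contrast, would output a label for every pixel of an image.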