# Neural Network Architectures
Neural networks are the foundation of deep learning. Understanding which architecture to use for which problem type — and why — is essential for building effective ML systems. This section covers the major architectures, activation functions, and the powerful technique of transfer learning.
## Quick Reference
### Architectures
| Architecture | Best For | How It Works |
|---|---|---|
| CNN | Images: classification, detection, segmentation | Convolutional layers extract spatial features → Pooling reduces dimensionality → FC layers classify |
| RNN | Sequential data: text, time series | Hidden state passes info between time steps. Suffers from vanishing gradient on long sequences |
| LSTM | Long sequences: text, time series, speech | Gates (forget, input, output) control information flow, mitigating the vanishing gradient problem |
| GRU | Same as LSTM but simpler/faster | 2 gates vs LSTM's 3. Fewer parameters. Choose when speed matters |
| Transformer | NLP: translation, text generation, classification | Self-attention processes all positions in parallel. Basis for BERT, GPT |
| BERT | NLP: text classification, entity recognition, Q&A | Bidirectional Transformer, pre-trained, fine-tune for specific tasks |
| Encoder-Decoder (Seq2Seq) | Translation, summarization, chatbots | Encoder compresses input → Decoder generates output. Attention helps focus |
| Autoencoder | Dimensionality reduction, anomaly detection, denoising | Encoder compresses → bottleneck → Decoder reconstructs |
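The CNN pipeline in the table above (convolution extracts spatial features, pooling reduces dimensionality) can be sketched in a few lines of numpy. This is an illustration only: real CNNs stack many layers and learn the kernels via backpropagation, and the edge-detector kernel here is a hand-picked assumption.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no-padding) 2D convolution of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel is the kernel's response at that location.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling; truncates edges that don't fit."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.random.rand(8, 8)
edge_kernel = np.array([[1.0, -1.0], [1.0, -1.0]])  # crude vertical-edge detector

features = conv2d(image, edge_kernel)  # (7, 7): spatial feature map
pooled = max_pool(features)            # (3, 3): reduced dimensionality
```

After convolution and pooling, a real network would flatten `pooled` and feed it to fully connected layers for classification.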
### Activation Functions
| Function | Output Range | Use Case |
|---|---|---|
| Sigmoid | 0 to 1 | Binary classification output layer |
| Softmax | 0 to 1 (sums to 1) | Multi-class classification output layer |
| ReLU | 0 to infinity | Hidden layers (default choice) |
| Tanh | -1 to 1 | Hidden layers, RNNs |
| Linear | -infinity to infinity | Regression output layer |
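The output ranges in the table are easy to verify directly. A minimal numpy sketch of the three most common functions (the max-subtraction in softmax is a standard numerical-stability trick, not part of the mathematical definition):

```python
import numpy as np

def sigmoid(x):
    """Squashes any real number into (0, 1) -- binary classification output."""
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    """Maps a vector of logits to probabilities that sum to 1 -- multi-class output."""
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

def relu(x):
    """Zeroes out negatives, passes positives through -- default for hidden layers."""
    return np.maximum(0.0, x)

logits = np.array([2.0, -1.0, 0.5])
probs = softmax(logits)  # each entry in (0, 1), entries sum to 1
```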
### Transfer Learning Steps
- Load pre-trained model (e.g., ImageNet ResNet, BERT)
- Keep all pre-trained weights in early/middle layers (universal features)
- Replace the last fully connected (FC) layer with a new layer matching your number of classes
- Fine-tune on your dataset (optionally freeze early layers)
## Flashcards
Which neural network architecture should you use for image data?
CNN (Convolutional Neural Network) — convolutional layers extract spatial features (edges, shapes, textures), pooling layers reduce dimensionality, and fully connected layers produce the final classification.
Which activation function should you use for each type of output layer?
Binary classification output = sigmoid (or softmax with 2 neurons). Regression output = linear. Multi-class = softmax. Never use a linear activation for a classification output — this is a common mistake that leads to unconstrained output values.