Neural Network Architectures

Neural networks are the foundation of deep learning. Knowing which architecture fits which problem type, and why, is essential for building effective ML systems. This section covers the major architectures, common activation functions, and transfer learning.

Quick Reference

Architectures

| Architecture | Best For | How It Works |
| --- | --- | --- |
| CNN | Images: classification, detection, segmentation | Convolutional layers extract spatial features → pooling reduces dimensionality → fully connected (FC) layers classify |
| RNN | Sequential data: text, time series | A hidden state passes information between time steps. Suffers from vanishing gradients on long sequences |
| LSTM | Long sequences: text, time series, speech | Gates (forget, input, output) control information flow, which mitigates the vanishing gradient problem |
| GRU | Same tasks as LSTM, with a simpler and faster model | 2 gates vs. LSTM's 3, so fewer parameters. Choose when speed matters |
| Transformer | NLP: translation, text generation, classification | Self-attention processes all positions in parallel. Basis for BERT and GPT |
| BERT | NLP: text classification, entity recognition, Q&A | Bidirectional Transformer that is pre-trained, then fine-tuned for specific tasks |
| Encoder-Decoder (Seq2Seq) | Translation, summarization, chatbots | Encoder compresses the input → decoder generates the output. Attention helps the decoder focus on the relevant parts of the input |
| Autoencoder | Dimensionality reduction, anomaly detection, denoising | Encoder compresses → bottleneck → decoder reconstructs the input |
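
To make the "fewer parameters" point about GRU concrete, here is a minimal sketch that counts the weights in one recurrent layer. It assumes the textbook formulation with a single bias vector per block; the function names and the sizes 128 and 256 are illustrative, and library implementations may report slightly different totals (for example, if they keep separate input and recurrent biases).

```python
def block_params(input_size: int, hidden_size: int) -> int:
    """Weights in one gate or candidate block: input weights + recurrent weights + bias."""
    return hidden_size * input_size + hidden_size * hidden_size + hidden_size


def rnn_params(input_size: int, hidden_size: int) -> int:
    # A plain RNN has a single block that updates the hidden state.
    return block_params(input_size, hidden_size)


def lstm_params(input_size: int, hidden_size: int) -> int:
    # 3 gates (forget, input, output) + 1 candidate cell state = 4 blocks.
    return 4 * block_params(input_size, hidden_size)


def gru_params(input_size: int, hidden_size: int) -> int:
    # 2 gates (reset, update) + 1 candidate hidden state = 3 blocks.
    return 3 * block_params(input_size, hidden_size)


if __name__ == "__main__":
    d, h = 128, 256  # illustrative input and hidden sizes
    for name, count in [("RNN", rnn_params), ("GRU", gru_params), ("LSTM", lstm_params)]:
        print(f"{name}: {count(d, h):,} parameters")
```

Under these assumptions a GRU layer always has three quarters of the parameters of an LSTM layer with the same sizes, which is where its speed advantage comes from.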

Activation Functions

| Function | Output Range | Use Case |
| --- | --- | --- |
| Sigmoid | 0 to 1 | Binary classification output layer |
| Softmax | 0 to 1 (outputs sum to 1) | Multi-class classification output layer |
| ReLU | 0 to infinity | Hidden layers (default choice) |
| Tanh | -1 to 1 | Hidden layers, RNNs |
| Linear | -infinity to infinity | Regression output layer |
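
The output ranges in the table can be checked with a small sketch of each function in plain Python. The function names are illustrative, and the numerically stable forms of sigmoid and softmax used here are one common choice rather than the only one.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use an equivalent form that avoids overflow in exp().
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(xs: list) -> list:
    """Map a list of scores to values between 0 and 1 that sum to 1."""
    m = max(xs)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relu(x: float) -> float:
    """Pass positive values through unchanged and clip negatives to 0."""
    return max(0.0, x)


def tanh(x: float) -> float:
    """Map a real number into the range -1 to 1."""
    return math.tanh(x)


def linear(x: float) -> float:
    """Identity: the output is unbounded, which suits regression targets."""
    return x


if __name__ == "__main__":
    scores = [2.0, 1.0, -1.0]
    print("sigmoid:", [sigmoid(s) for s in scores])
    print("softmax:", softmax(scores))
    print("relu:   ", [relu(s) for s in scores])
    print("tanh:   ", [tanh(s) for s in scores])
    print("linear: ", [linear(s) for s in scores])
```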

Transfer Learning Steps

  1. Load a pre-trained model (e.g., ResNet trained on ImageNet, or BERT)
  2. Keep the pre-trained weights in the early and middle layers, which capture general-purpose features
  3. Replace the last fully connected (FC) layer with a new layer matching your number of classes
  4. Fine-tune on your dataset (optionally freezing the early layers so that only the later layers are updated)
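
The steps above can be sketched without any deep learning framework. This is a toy illustration only: `Layer`, `new_dense`, `adapt_pretrained`, and `trainable_layers` are hypothetical names invented here, the "pre-trained" weights are random stand-ins rather than a real checkpoint, and no training is performed. In a real framework the same steps map onto loading a checkpoint, disabling gradient updates for the base layers, and swapping the final layer.

```python
import random
from dataclasses import dataclass


@dataclass
class Layer:
    """Toy stand-in for a network layer: a name, a weight matrix, a trainable flag."""
    name: str
    weights: list  # weights[out][in]
    trainable: bool = True


def new_dense(name: str, n_in: int, n_out: int, seed: int = 0) -> Layer:
    """Create a fully connected layer with small random weights."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return Layer(name, weights)


def adapt_pretrained(layers: list, num_classes: int, freeze_base: bool = True) -> list:
    """Keep the base layers, replace the head, and mark what will be trained."""
    base, old_head = layers[:-1], layers[-1]
    for layer in base:
        # Step 2: base weights are kept as-is (the flag is updated in place);
        # freezing excludes these layers from later updates.
        layer.trainable = not freeze_base
    # Step 3: the new head takes the same inputs as the old one
    # but has one output per class in the new task.
    n_in = len(old_head.weights[0])
    return base + [new_dense("fc_new", n_in, num_classes)]


def trainable_layers(layers: list) -> list:
    """Step 4: only these layers would be updated during fine-tuning."""
    return [layer.name for layer in layers if layer.trainable]


if __name__ == "__main__":
    # Step 1 (assumed): pretend these were loaded from a 1000-class checkpoint.
    pretrained = [
        new_dense("early", 8, 16, seed=1),
        new_dense("middle", 16, 16, seed=2),
        new_dense("fc", 16, 1000, seed=3),
    ]
    model = adapt_pretrained(pretrained, num_classes=5)
    print(trainable_layers(model))  # ['fc_new']
    print(len(model[-1].weights))   # 5
```

Passing `freeze_base=False` corresponds to fine-tuning the whole network, which is usually paired with a small learning rate so the pre-trained features are not overwritten.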

Flashcards

Question: Which neural network architecture should you use for image data?

Answer: CNN (Convolutional Neural Network). Convolutional layers extract spatial features (edges, shapes, textures), pooling layers reduce dimensionality, and fully connected layers produce the final classification.
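
The first two stages of that pipeline, convolution and pooling, can be sketched in plain Python. The kernel below is hand-picked to respond to vertical edges purely for illustration; in a real CNN the kernel values are learned, and the function names here are not from any library.

```python
def conv2d(image: list, kernel: list) -> list:
    """Stride-1 2D cross-correlation without padding, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(fmap: list, size: int = 2) -> list:
    """Non-overlapping max pooling; edge cells that do not fill a window are dropped."""
    return [
        [
            max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


if __name__ == "__main__":
    # 5x5 image: dark columns (0) on the left, bright columns (1) on the right.
    image = [[0, 0, 1, 1, 1] for _ in range(5)]
    kernel = [[-1, 1], [-1, 1]]  # responds to dark-to-bright vertical edges
    features = conv2d(image, kernel)
    pooled = max_pool2d(features)
    print(features)  # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
    print(pooled)    # [[2, 0], [2, 0]]
```

The feature map is nonzero only where the edge sits, and pooling halves each dimension while keeping that response. A fully connected layer would then take the flattened pooled values as input.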

Common Misconception

Binary classification output uses sigmoid (or softmax with 2 neurons), multi-class classification output uses softmax, and regression output uses linear. Do not use a linear activation for a classification output. This is a common mistake: the output values are unconstrained, so they cannot be interpreted as class probabilities.
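
The equivalence between sigmoid and a 2-neuron softmax mentioned above can be checked with a short derivation. The notation is introduced here for illustration: z_1 and z_2 are the two output logits and σ is the sigmoid function.

```latex
\[
\operatorname{softmax}(z_1, z_2)_1
  = \frac{e^{z_1}}{e^{z_1} + e^{z_2}}
  = \frac{1}{1 + e^{-(z_1 - z_2)}}
  = \sigma(z_1 - z_2)
\]
```

The middle step divides numerator and denominator by e^{z_1}. A 2-neuron softmax is therefore a sigmoid applied to the difference of the two logits, so both options produce a value between 0 and 1, whereas a linear output does not.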