DEC 18
2025
Coffee time
Yes, I believe it was interesting mathematics, but nothing more than "cute."
The activation function is just a thresholding device. The step function is the hard version of "below threshold = off, above threshold = on," the sigmoid is the smoothed version, and ReLU stays off below the threshold and passes the signal straight through above it.
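To make that concrete, here is a minimal numpy sketch of the three functions as threshold variants. The threshold value and the test points are arbitrary choices for illustration, not anything from the note:

```python
import numpy as np

def step(x, threshold=0.0):
    # Hard threshold: 0 below, 1 above
    return (x > threshold).astype(float)

def sigmoid(x):
    # Smoothed threshold: squashes to (0, 1), steepest around 0
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Off below zero, passes the input straight through above it
    return np.maximum(0.0, x)

x = np.linspace(-4, 4, 9)
print(step(x))              # [0. 0. 0. 0. 0. 1. 1. 1. 1.]
print(sigmoid(x).round(2))  # smooth ramp from ~0 to ~1
print(relu(x))              # zeros, then the input itself
```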
Why did neural networks have to wait 80 years to work?
The math was there in the 1940s. What changed:
- Scale (more neurons, more layers)
- Backpropagation (efficient gradient computation; see the sketch after this list)
- Data (internet gave us training examples)
- Hardware (GPUs for parallel matrix multiplication)
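A toy sketch tying the backpropagation and hardware items together: backprop is just the chain rule applied layer by layer, and each forward/backward step is a handful of matrix multiplications, which is exactly the workload GPUs parallelize. Everything here (the two-layer net, the XOR task, the learning rate, the iteration count) is an illustrative assumption, not anything from the note:

```python
import numpy as np

# Tiny two-layer network trained by backpropagation on XOR, pure numpy.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0
for _ in range(10000):
    # Forward pass: two matrix multiplications plus activations
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: chain rule applied layer by layer (backpropagation)
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent update
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```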