
Yes, this is it

The mathematics underneath all of it is accumulation and threshold.

vacuum tube:     charge accumulates → threshold voltage → current flows
transistor:      charge accumulates → threshold voltage → channel opens
neuron:          potential accumulates → threshold → spike
perceptron:      weighted sum accumulates → threshold → fires

The same pattern: gather, compare, switch.

Mathematically: f(Σ inputs) → binary decision

The activation function is just a smoothed threshold. Sigmoid, ReLU, step function - all variations of "below threshold = off, above threshold = on."
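
A minimal sketch of that pattern in NumPy (the weights, bias, and input are made-up numbers, purely for illustration): one unit accumulates a weighted sum, and the same sum can be pushed through a hard step, a sigmoid, or a ReLU.

    import numpy as np

    def step(z):                        # hard threshold: off below, on above
        return (z > 0).astype(float)

    def sigmoid(z):                     # smoothed threshold
        return 1.0 / (1.0 + np.exp(-z))

    def relu(z):                        # zero below the threshold, passes the sum above it
        return np.maximum(0.0, z)

    # gather (weighted sum), compare (the bias plays the role of the threshold), switch (activation)
    w = np.array([0.7, -0.4, 1.1])      # illustrative weights
    b = -0.5
    x = np.array([1.0, 0.0, 1.0])

    z = w @ x + b                       # accumulate: z = 1.3, above threshold
    print(step(z), sigmoid(z), relu(z)) # 1.0, ~0.79, 1.3: all three say "on"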

What made neural networks wait 80 years to work?

The math was there in the 1940s. What changed:

  • Scale (more neurons, more layers)
  • Backpropagation (efficient gradient computation)
  • Data (internet gave us training examples)
  • Hardware (GPUs for parallel matrix multiplication)

But the fundamental operation is unchanged from McCulloch-Pitts 1943: weighted sum → threshold.

So what's the next primitive?

The transistor replaced the vacuum tube. Same math, different substrate. Neural networks are still running on transistors, still doing weighted sums.

What mathematical operation exists that we haven't built hardware for? Or that we haven't figured out how to compose into intelligence?

What are you thinking?

A few threads worth pulling:

The primitive we have is linear → nonlinear. Weighted sum is a linear operation. The activation function breaks linearity. Stack them and you get universal approximation. But it's fundamentally about carving up space with hyperplanes.
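
To make the hyperplane picture concrete, here is a hand-wired sketch in NumPy (weights chosen by hand, not learned): one threshold unit cannot separate XOR, but stacking the same sum→threshold primitive twice can. The two hidden units each carve the plane with a hyperplane (one behaves like OR, one like AND), and the output unit thresholds their difference.

    import numpy as np

    def step(z):                  # the same hard threshold as before
        return (z > 0).astype(float)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

    # Hidden layer: two hyperplanes over the inputs.
    # h1 fires when at least one input is on (OR), h2 only when both are (AND).
    W1 = np.array([[1.0, 1.0],
                   [1.0, 1.0]])
    b1 = np.array([-0.5, -1.5])
    H = step(X @ W1.T + b1)

    # Output layer: OR minus AND, thresholded, is XOR.
    w2 = np.array([1.0, -1.0])
    b2 = -0.5
    y = step(H @ w2 + b2)

    print(y)                      # [0. 1. 1. 0.]: XOR, which no single hyperplane can produce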

What other mathematical operations compose interestingly?

  • Multiplication / gating: Attention is essentially this — dynamically computing which weights matter. But it's still built from dot products underneath. (A sketch follows the list.)

  • Binding / unbinding: Hyperdimensional computing uses a different algebra — high-dimensional random vectors, element-wise multiplication for binding, addition for bundling. A different structure from sum→threshold. (Sketched below.)

  • Min/max (tropical semiring): Replace sum-product with max-plus, or its mirror min-plus. Changes the geometry. Shows up in shortest-path problems and morphological neural networks. (Also sketched below.)

  • Phase / interference: Waves accumulate differently than scalars. Complex-valued networks exist but haven't dominated. Quantum uses this natively — amplitudes interfere before collapsing.
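
A sketch of the gating item above, with toy vectors in place of learned projections: the values are multiplied by weights computed on the fly from the input, which is the multiplicative part, but those weights still come out of dot products and a softmax.

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    # one query attending over three keyed values (all numbers made up)
    q = np.array([1.0, 0.0])
    K = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.7, 0.7]])
    V = np.array([10.0, 20.0, 30.0])

    scores = K @ q            # dot products decide which values matter
    gates = softmax(scores)   # dynamic, input-dependent weights
    out = gates @ V           # multiplicative gating of the values
    print(gates, out)         # the output leans toward the best-matching key's value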
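
A sketch of binding and bundling with random ±1 hypervectors (the dimension and the role/filler names are arbitrary choices here): element-wise multiplication binds a role to a filler, addition bundles several bindings into one vector, and multiplying by the role again unbinds, leaving something close to the original filler.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 10_000                                  # high dimension is what makes the algebra work

    def hv():                                   # random ±1 hypervector
        return rng.choice([-1, 1], size=D)

    color, shape = hv(), hv()                   # roles
    red, circle = hv(), hv()                    # fillers

    record = color * red + shape * circle       # bind with *, bundle with +

    probe = record * color                      # a ±1 vector is its own multiplicative inverse

    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(cos(probe, red), cos(probe, circle))  # ~0.7 vs ~0.0: red is recovered, circle is not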
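
And a sketch of the tropical item, using the min-plus variant tied to shortest paths: replace sum with min and product with plus, and repeated "matrix multiplication" of a graph's edge-weight matrix yields all-pairs shortest paths.

    import numpy as np

    INF = np.inf

    def minplus(A, B):
        # tropical matrix product: result[i, j] = min over k of (A[i, k] + B[k, j])
        return np.min(A[:, :, None] + B[None, :, :], axis=1)

    # small weighted graph as an edge-weight matrix (INF means no edge)
    A = np.array([[0.0, 3.0, INF],
                  [INF, 0.0, 1.0],
                  [7.0, INF, 0.0]])

    D = minplus(minplus(A, A), A)   # paths of up to three edges: enough for this graph
    print(D)                        # D[0, 2] == 4.0, the cost of the 0 -> 1 -> 2 route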

What do you think about this?
