Neural Networks — The Idea Behind Modern AI

Most explanations of neural networks start with the brain. “It works just like neurons firing.” Then they show a diagram of circles and lines, and you nod along — but you still do not know what the network is actually doing.

The brain analogy is not wrong. It is just not useful. It describes the inspiration, not the mechanism. And the mechanism is what matters — because it is the mechanism that explains why these systems are so capable, and why they fail the way they do.

Nearly every generative AI model you use today (ChatGPT, Claude, SAP Joule, the image generator) is built on neural networks, and so are many of the fraud detection systems behind your bank. This post explains the idea they all share.

🔗 Foundation reading

This post sits between two others on this site. "AI vs ML vs Deep Learning" places neural networks in the broader AI landscape. "How Generative AI Works" takes the transformer, a specific type of neural network, and explains how it generates text. Read this post first if either of those feels incomplete.

The brain analogy — useful and misleading

In the human brain, neurons fire electrical signals to other neurons. Whether a neuron fires depends on the strength of the signals it receives. Over time, connections that are used frequently get stronger. That is learning, in biological terms.

Artificial neural networks borrow this structure. Connected nodes. Signals that pass between them. Connections that strengthen during training. The analogy is genuine enough.

But biological neurons are electrochemical systems of staggering complexity. Artificial neurons are something far simpler — they are mathematical functions. Once you understand the function, the rest follows.

What a neuron actually does

Strip away everything and a single artificial neuron does four things. It takes one or more numbers as input. It multiplies each input by a weight — a number that says how important that input is. It adds all those results together, plus a bias value. Then it passes the result through an activation function, which decides what signal to send forward.

That is it. Input × weight, sum, activation. A function.

Here is a concrete example. Imagine a neuron trying to help predict whether a house will sell above asking price. Its inputs might be: square footage, number of bedrooms, and proximity to a school. Each input gets a weight reflecting its importance — proximity to a school might carry more weight than an extra bedroom. The neuron sums the weighted inputs and passes the result through an activation function, which produces the output signal.
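The house example can be sketched in a few lines of Python. The input values, the weights, the bias and the choice of a sigmoid activation are all made up for illustration; a real network would learn the weights and bias during training.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical inputs, already scaled to 0..1:
# square footage, number of bedrooms, proximity to a school.
house = [0.8, 0.5, 0.9]
# Hypothetical weights: school proximity counts for more than an extra bedroom.
weights = [1.2, 0.3, 2.0]
bias = -2.0

signal = neuron(house, weights, bias)
print(f"output signal: {signal:.3f}")  # about 0.713
```

The weighted sum here is 0.96 + 0.15 + 1.8 - 2.0 = 0.91, and the sigmoid turns that into a signal of roughly 0.71.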

💡 Note

What the activation function does: Without it, each layer is just a weighted sum, and a stack of weighted sums collapses into a single linear equation, no matter how many layers you stack. The activation function introduces non-linearity, which is what lets the network learn complex, real-world patterns that a straight line could never capture.
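Here is a small sketch of that collapse, using one-input "layers" so the arithmetic stays readable. The weights and the ReLU activation are arbitrary choices made for this illustration.

```python
import math

def linear(x: float, w: float, b: float) -> float:
    """One layer with no activation: multiply by a weight and add a bias."""
    return w * x + b

def relu(z: float) -> float:
    """A common activation: pass positives through, clamp negatives to zero."""
    return max(0.0, z)

# Arbitrary weights and biases for two stacked layers.
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x: float) -> float:
    return linear(linear(x, w1, b1), w2, b2)

def collapsed(x: float) -> float:
    # The same two layers rewritten as a single line: slope w2*w1, offset w2*b1 + b2.
    return (w2 * w1) * x + (w2 * b1 + b2)

def with_activation(x: float) -> float:
    return linear(relu(linear(x, w1, b1)), w2, b2)

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
stack_is_one_line = all(math.isclose(two_linear_layers(x), collapsed(x)) for x in xs)
print(stack_is_one_line)                  # True
print([with_activation(x) for x in xs])   # [0.5, 0.5, 0.5, -2.5, -8.5]
```

Without the activation, two layers behave exactly like one line. With ReLU in between, the output is flat on the left and slopes downwards on the right, a shape no single straight line can produce.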

[Figure: a single artificial neuron with three weighted inputs, a bias value, an activation function and a single output]

How layers build understanding

One neuron is almost useless on its own. The power comes from connecting thousands of them in layers.

Every neural network has three types of layers. The input layer receives the raw data — pixel values for an image, token embeddings for text, numerical values for a spreadsheet. The hidden layers do the work. The output layer produces the final result — a classification, a probability, a generated token.
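To make the three layer types concrete, here is a minimal sketch of a network with three inputs, two hidden neurons and one output. The layer sizes, the weights and the choice of ReLU and sigmoid activations are assumptions picked for illustration, not taken from any real model.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One layer: each row of `weights` holds the weights of one neuron."""
    return [
        activation(sum(x * w for x, w in zip(inputs, row)) + b)
        for row, b in zip(weights, biases)
    ]

# Hypothetical weights: 3 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.2]

def forward(raw_input):
    hidden = dense(raw_input, hidden_weights, hidden_biases, relu)   # hidden layer
    return dense(hidden, output_weights, output_biases, sigmoid)     # output layer

prediction = forward([1.0, 2.0, 3.0])  # the list itself plays the role of the input layer
print(prediction)  # one probability-like value, about 0.525
```

The input layer does no computation of its own. It simply hands the raw numbers to the first hidden layer.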

What makes layers interesting is what each one learns. In an image recognition network, the first hidden layer learns to detect edges — basic lines and contrasts. The next layer combines those edges into shapes, and the layer after that turns shapes into objects.

By the final hidden layers, the network is recognising faces, cars, and cats — not because anyone told it to, but because that is what emerged from training on millions of images.

📌 The key insight

Nobody programs the features. Nobody tells the network “look for ears in this layer.” The network finds the features itself, through training. This is what sets neural networks apart from the rule-based and hand-engineered approaches that came before: the representation is learned, not designed.

[Figure: neural network layers, showing an input layer, three hidden layers with increasing abstraction from edges to shapes to objects, and an output layer]

How a network learns — weights and training

Before training, a network’s weights are set randomly. It makes predictions — and they are terrible. Training is the process of making them less terrible, systematically.

Here is how one training cycle works. The network takes a labelled example — say, an image of a cat with the label “cat” — and passes it forward through every layer. That is the forward pass: data in, prediction out.

That prediction is compared to the correct answer. The difference is measured as a loss — a single number representing how wrong the network was.
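As a rough sketch of what "a single number" means here, the snippet below uses cross-entropy, one common loss for classification. The post does not say which loss is used, so treat that choice, and the probabilities, as assumptions.

```python
import math

def cross_entropy(predicted: dict[str, float], label: str) -> float:
    """Loss is small when the correct label gets a high probability."""
    return -math.log(predicted[label])

# Hypothetical network outputs for one image whose correct label is "cat".
good_guess = {"cat": 0.7, "dog": 0.2, "bird": 0.1}
bad_guess = {"cat": 0.1, "dog": 0.7, "bird": 0.2}

loss_good = cross_entropy(good_guess, "cat")
loss_bad = cross_entropy(bad_guess, "cat")
print(f"{loss_good:.3f} vs {loss_bad:.3f}")  # 0.357 vs 2.303
```

The worse the prediction, the larger the number. Training is the search for weights that make it small.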

Then comes backpropagation. The error is propagated backwards through the network, layer by layer. For each weight, the algorithm calculates how much that weight contributed to the error. Weights that contributed more to the mistake get adjusted more. This adjustment process, repeated across millions of training examples, is gradient descent — the network is always moving its weights in the direction that reduces error.
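Backpropagation is the chain rule applied from the output backwards. The sketch below uses the smallest possible network: one hidden neuron feeding one output neuron, with sigmoid activations, a squared-error loss and no biases. All of those are simplifying assumptions. It then checks the result against a brute-force numerical estimate.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    hidden = sigmoid(w1 * x)        # hidden neuron
    output = sigmoid(w2 * hidden)   # output neuron
    return hidden, output

def loss_fn(output, target):
    return (output - target) ** 2

def backward(x, target, w1, w2):
    """Return (dLoss/dw1, dLoss/dw2) using the chain rule, output layer first."""
    hidden, output = forward(x, w1, w2)
    # Error signal at the output neuron (before its activation).
    d_output = 2.0 * (output - target) * output * (1.0 - output)
    grad_w2 = d_output * hidden
    # Pass the error signal back through w2 into the hidden neuron.
    d_hidden = d_output * w2 * hidden * (1.0 - hidden)
    grad_w1 = d_hidden * x
    return grad_w1, grad_w2

def numerical_gradient(x, target, w1, w2, eps=1e-6):
    """Nudge each weight slightly and measure how the loss changes."""
    def loss_at(a, b):
        return loss_fn(forward(x, a, b)[1], target)
    g1 = (loss_at(w1 + eps, w2) - loss_at(w1 - eps, w2)) / (2 * eps)
    g2 = (loss_at(w1, w2 + eps) - loss_at(w1, w2 - eps)) / (2 * eps)
    return g1, g2

x, target = 1.5, 1.0      # hypothetical training example
w1, w2 = 0.4, -0.6        # hypothetical starting weights
analytic = backward(x, target, w1, w2)
numeric = numerical_gradient(x, target, w1, w2)
print(analytic, numeric)  # the two pairs should agree closely
```

The numerical check needs two extra forward passes per weight, which is hopeless at scale. Backpropagation gets every gradient from one backward sweep, which is why it is the method used in practice.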

📝 Note

Gradient descent in plain English — Imagine you’re blindfolded on a hilly landscape and you want to reach the lowest point. You feel the slope under your feet and take a small step in the downhill direction. Then you feel again and step again. Gradient descent does this in a mathematical landscape of error values, always nudging weights toward lower error. It rarely finds the perfect minimum — but it finds a good one.
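The blindfolded walk can be sketched with a one-weight "landscape". The error function, the starting point and the step size (usually called the learning rate) are invented for this example.

```python
def error(w: float) -> float:
    """A made-up error landscape with its lowest point at w = 3."""
    return (w - 3.0) ** 2

def slope(w: float) -> float:
    """The derivative of the error: which way is uphill, and how steep."""
    return 2.0 * (w - 3.0)

w = -4.0             # arbitrary starting position
learning_rate = 0.1  # size of each step

for _ in range(100):
    w -= learning_rate * slope(w)   # step in the downhill direction

print(round(w, 6))  # 3.0
```

This landscape has a single valley, so the walk ends at the true minimum. Real networks have millions of weights and a far bumpier landscape, which is why they usually settle for a good low point rather than the perfect one.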

After enough training cycles — millions or billions of examples — the weights settle into values that let the network make accurate predictions on data it has never seen before. That generalisation is the whole point.
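Here is the whole cycle in miniature, as a sketch under heavy simplification: a single neuron with no activation function learns the made-up rule y = 2x + 1 from six examples, then is tested on inputs it never saw. The data, the learning rate and the number of cycles are all assumptions chosen to keep the example small.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up data following y = 2x + 1.
train = [(x, 2 * x + 1) for x in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]]
unseen = [(x, 2 * x + 1) for x in [0.3, 0.5, 0.9]]  # never used for training

# Weights start random.
w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
learning_rate = 0.1

def mean_squared_error(data, w, b):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

loss_before = mean_squared_error(unseen, w, b)

for _ in range(2000):
    # Forward pass and loss gradient, averaged over the training examples.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
    grad_b = sum(2 * (w * x + b - y) for x, y in train) / len(train)
    # Gradient descent step.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

loss_after = mean_squared_error(unseen, w, b)
print(f"w={w:.3f}, b={b:.3f}, unseen loss {loss_before:.3f} -> {loss_after:.6f}")
```

The loss on the unseen examples drops to almost zero even though the neuron never trained on them. That is generalisation at its smallest scale.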

Why depth matters — and when it doesn’t

More layers mean the network can learn more abstract, more complex patterns. GPT-3’s published architecture has 96 layers — and that depth is a big part of what lets large models handle nuanced language.

But depth is not always the answer. A simple classification problem — spam or not spam, fraud or not fraud based on ten structured fields — often needs nothing more than a shallow network or a different model type entirely. Stacking 50 layers onto a straightforward problem adds training cost and complexity without improving the result.
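One way to see the cost of unnecessary depth is to count the weights and biases that training has to adjust. The sketch below assumes fully connected layers and a hypothetical width of 16 neurons per hidden layer.

```python
def parameter_count(layer_sizes: list[int]) -> int:
    """Weights plus biases for a fully connected network with these layer sizes."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# Ten structured input fields, one output (fraud or not fraud).
shallow = [10, 16, 1]            # one hidden layer
deep = [10] + [16] * 50 + [1]    # fifty hidden layers

print(parameter_count(shallow))  # 193
print(parameter_count(deep))     # 13521
```

The deep version has about seventy times as many numbers to train. On a simple tabular problem, that extra capacity has little to learn.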

| Architecture | What it is | What it is used for |
|---|---|---|
| Feedforward network | The basic form: data flows one direction through layers | Tabular data, simple classification, regression |
| CNN (Convolutional) | Uses filters that slide across input to detect spatial patterns | Images, video, medical imaging |
| RNN (Recurrent) | Loops outputs back as inputs; handles sequential data | Time series, older language models |
| Transformer | Uses self-attention: every input attends to every other input simultaneously | Language models, LLMs, GPT, Claude, SAP Joule |
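To give a feel for the CNN row, here is a sketch of a filter sliding across a single row of pixels. The pixel values and the filter are hand-picked for illustration. In a real CNN the filter values are learned, and the filter slides across two dimensions rather than one.

```python
def slide_filter(signal: list[float], kernel: list[float]) -> list[float]:
    """Slide the kernel along the signal, taking a weighted sum at each position."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]

row = [0, 0, 0, 1, 1, 1]   # dark pixels on the left, bright pixels on the right
edge_filter = [-1, 1]      # responds where brightness rises from left to right

response = slide_filter(row, edge_filter)
print(response)  # [0, 0, 1, 0, 0]
```

The filter stays silent over flat regions and fires only at the position where dark turns to bright. The same small set of weights is reused at every position, which is what makes convolutional layers efficient on images.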

📝 Note

The transformer is still a neural network — Its architecture is different from a basic feedforward network, but it follows the same principles: layers of nodes, learned weights, trained by backpropagation. How Generative AI Works covers how the transformer uses self-attention specifically.

[Figure: side-by-side comparison of a shallow neural network with one hidden layer (left) and a deep neural network with five hidden layers (right)]

At a glance — neural networks

| Concept | One-line summary |
|---|---|
| Neural network | A mathematical system of connected nodes arranged in layers that learns patterns from data |
| Neuron (node) | A single function: multiply inputs by weights, sum them, apply an activation function |
| Weight | A number controlling how much influence one node has on the next; adjusted during training |
| Bias | An offset added to the weighted sum; gives each neuron more flexibility to fit the data |
| Activation function | Introduces non-linearity; without it, stacking layers has no effect |
| Input layer | Receives the raw data: pixels, token embeddings, numerical values |
| Hidden layers | Where the learning happens; each layer builds on the abstractions of the previous one |
| Output layer | Produces the final result: a class label, a probability, a generated token |
| Forward pass | Data moving from input to output through the network, producing a prediction |
| Loss | A measure of how wrong the network’s prediction was; the number training aims to reduce |
| Backpropagation | The algorithm that calculates how much each weight contributed to the error |
| Gradient descent | The process of adjusting weights step by step to reduce loss over many training examples |
| Deep learning | Machine learning using neural networks with many hidden layers |

What to take away

A neural network is not a simulation of a brain. It is a function — a very large, layered mathematical function that gets better at pattern recognition through repetition. The weights start random. Training adjusts them. Given enough data and enough training cycles, the weights settle into values that let the network generalise to examples it has never seen.

That is both simpler and more profound than the brain analogy suggests. Simpler, because the underlying mechanics are just multiplication, addition and a non-linear function — repeated millions of times. More profound, because nobody designs the features the network learns. They emerge. An image network that nobody told to look for ears learns to look for ears because ears are statistically useful for identifying cats.

Nearly every AI model you interact with today, from the LLM answering your questions to many of the fraud detection systems approving your transactions, is a network of weights that were adjusted, layer by layer, until the loss got low enough. That is the idea behind all of it.

🔗 Related posts on this site

How Generative AI Works — Tokens, Embeddings and the Transformer — takes the transformer architecture and explains exactly how it generates text, one token at a time.
AI vs ML vs Deep Learning — What Is the Actual Difference? — places neural networks in the broader landscape: where deep learning ends and machine learning begins.
What is a Large Language Model (LLM)? — the LLM post explains what the model is; this post explains the architecture underneath it.
AI Hallucinations — Why They Happen — hallucination is a direct consequence of how neural networks predict rather than verify — now you understand why.

Published on rakeshnarayan.com — Articles

URL: https://rakeshnarayan.com/articles/neural-networks-the-idea-behind-modern-ai/