Neural Networks — The Idea Behind Modern AI
Most explanations of neural networks start with the brain. “It works just like neurons firing.” Then they show a diagram of circles and lines, and you nod along — but you still do not know what the network is actually doing.
The brain analogy is not wrong. It is just not useful. It describes the inspiration, not the mechanism. And the mechanism is what matters — because it is the mechanism that explains why these systems are so capable, and why they fail the way they do.
Almost every AI model you use today — ChatGPT, Claude, SAP Joule, the image generator, quite possibly the fraud detection system behind your bank — is built on neural networks. This post explains the idea behind them.
🔗 Foundation reading
This post sits between two others on this site. AI vs ML vs Deep Learning places neural networks in the broader AI landscape. How Generative AI Works takes the transformer — a specific type of neural network — and explains how it generates text. Read this post first if either of those feels incomplete.
The brain analogy — useful and misleading
In the human brain, neurons fire electrical signals to other neurons. Whether a neuron fires depends on the strength of the signals it receives. Over time, connections that are used frequently get stronger. That is learning, in biological terms.
Artificial neural networks borrow this structure. Connected nodes. Signals that pass between them. Connections that strengthen during training. The analogy is genuine enough.
But biological neurons are electrochemical systems of staggering complexity. Artificial neurons are something far simpler — they are mathematical functions. Once you understand the function, the rest follows.
What a neuron actually does
Strip away everything and a single artificial neuron does four things. It takes one or more numbers as input. It multiplies each input by a weight — a number that says how important that input is. It adds all those results together, plus a bias value. Then it passes the result through an activation function, which decides what signal to send forward.
That is it. Input × weight, sum, activation. A function.
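Written as a formula, a single neuron with n inputs computes the expression below. The symbols are introduced here for illustration: x_i for the inputs, w_i for their weights, b for the bias and f for the activation function.

```latex
y = f\left( \sum_{i=1}^{n} w_i x_i + b \right)
```

For example, with two inputs x = (2, 3), weights w = (0.5, -1), bias b = 1 and the common ReLU activation f(z) = max(0, z), the weighted sum is 0.5 × 2 + (-1) × 3 + 1 = -1, so the neuron outputs max(0, -1) = 0.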
Here is a concrete example. Imagine a neuron trying to help predict whether a house will sell above asking price. Its inputs might be: square footage, number of bedrooms, and proximity to a school. Each input gets a weight reflecting its importance — proximity to a school might carry more weight than an extra bedroom. The neuron sums the weighted inputs and passes the result through an activation function, which produces the output signal.
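A minimal Python sketch of that single neuron follows. The feature values, weights and bias are hypothetical numbers chosen for illustration (real inputs would usually be scaled to similar ranges first), and the sigmoid is just one common activation function, used here because it squashes the output into the range 0 to 1.

```python
import math

def sigmoid(z: float) -> float:
    """Activation function: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One artificial neuron: multiply, sum, add the bias, apply the activation."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical, already-scaled inputs for one house (0 = low, 1 = high):
# square footage, number of bedrooms, proximity to a school
house = [0.8, 0.5, 0.9]
# Hypothetical weights: school proximity matters more than an extra bedroom
weights = [1.2, 0.3, 1.5]
bias = -1.5

score = neuron(house, weights, bias)
print(f"signal sent forward: {score:.3f}")
```

Here the weighted sum is 0.96 + 0.15 + 1.35 - 1.5 = 0.96, and the sigmoid turns that into a signal of about 0.723. Training, covered below, is what would replace these made-up weights with useful ones.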
💡 Note
What the activation function does — Without it, the network is just a chain of multiplications — which collapses into a single linear equation, no matter how many layers you stack. The activation function introduces non-linearity, which is what lets the network learn complex, real-world patterns that a straight line could never capture.
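The collapse is easy to check numerically. This sketch uses single-number "layers" with hypothetical weights: two stacked linear layers give exactly the same outputs as one linear layer, while putting a ReLU (max(0, z), one common activation) between them produces a bend that no single straight line can reproduce. With matrices the same algebra holds: W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2).

```python
def linear(w: float, b: float):
    """Return a one-input 'layer' computing w * x + b, with no activation."""
    return lambda x: w * x + b

def relu(z: float) -> float:
    """A common activation function: keep positives, zero out negatives."""
    return max(0.0, z)

layer1 = linear(2.0, 1.0)    # hypothetical weights and biases
layer2 = linear(-3.0, 0.5)

def stacked(x: float) -> float:
    """Two linear layers with nothing in between."""
    return layer2(layer1(x))

# Algebra: -3 * (2x + 1) + 0.5 = -6x - 2.5, which is just one linear layer
collapsed = linear(-6.0, -2.5)

def stacked_with_relu(x: float) -> float:
    """The same two layers, with a non-linear activation in between."""
    return layer2(relu(layer1(x)))

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
print("stacked:  ", [stacked(x) for x in xs])
print("collapsed:", [collapsed(x) for x in xs])
print("with ReLU:", [stacked_with_relu(x) for x in xs])
```

The first two lines print identical values. The ReLU version is flat at 0.5 for inputs below -0.5 and falls with slope -6 above it; that kink is the non-linearity.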
How layers build understanding
One neuron is almost useless on its own. The power comes from connecting thousands of them in layers.
Every neural network has three types of layers. The input layer receives the raw data — pixel values for an image, token embeddings for text, numerical values for a spreadsheet. The hidden layers do the work. The output layer produces the final result — a classification, a probability, a generated token.
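To make the three layer types concrete, here is a minimal sketch of a tiny fully connected network: 3 input values, a hidden layer of 2 neurons, and 1 output neuron. Every weight and bias is a hypothetical number; in a trained network they would be learned. Each hidden and output neuron is the same weighted-sum-plus-activation function described above, just applied many times. ReLU (max(0, z)) and sigmoid are common activation choices, used here for illustration.

```python
import math

def relu(z: float) -> float:
    return max(0.0, z)

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """A fully connected layer: each row of `weights` holds one neuron's weights."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical parameters for a 3 -> 2 -> 1 network
HIDDEN_W = [[0.5, -0.2, 0.1],
            [0.3, 0.8, -0.5]]
HIDDEN_B = [0.0, 0.1]
OUTPUT_W = [[1.0, -1.0]]
OUTPUT_B = [0.2]

def forward(x):
    """Input layer -> hidden layer (ReLU) -> output layer (sigmoid probability)."""
    hidden = dense(x, HIDDEN_W, HIDDEN_B, relu)
    output = dense(hidden, OUTPUT_W, OUTPUT_B, sigmoid)
    return hidden, output[0]

hidden, prob = forward([1.0, 2.0, 3.0])
print("hidden layer activations:", hidden)
print("output probability:", round(prob, 3))
```

Deep networks differ mainly in scale: more layers, far wider layers, and weights set by training rather than by hand.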
What makes layers interesting is what each one learns. In an image recognition network, the first hidden layer learns to detect edges — basic lines and contrasts. The next layer combines those edges into shapes, and the layer after that turns shapes into objects.
By the final hidden layers, the network is recognising faces, cars, and cats — not because anyone told it to, but because that is what emerged from training on millions of images.
📌 The key insight
Nobody programs the features. Nobody tells the network “look for ears in this layer.” The network finds the features itself, through training. This is what sets neural networks apart from earlier approaches that relied on hand-designed rules and features: the representation is learned, not designed.
How a network learns — weights and training
Before training, a network’s weights are set randomly. It makes predictions — and they are terrible. Training is the process of making them less terrible, systematically.
Here is how one training cycle works. The network takes a labelled example — say, an image of a cat with the label “cat” — and passes it forward through every layer. That is the forward pass: data in, prediction out.
That prediction is compared to the correct answer. The difference is measured as a loss — a single number representing how wrong the network was.
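The article does not tie itself to a particular loss function. As one common example for a yes/no label like "cat", this sketch uses binary cross-entropy: it is small when the network gives the true label a high probability and grows quickly when the network is confidently wrong.

```python
import math

def binary_cross_entropy(p: float, label: int) -> float:
    """Loss for one example: p is the predicted probability that the label is 1."""
    eps = 1e-12                      # avoid log(0) for extreme predictions
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# The true label is 1 ("cat"); compare three different predictions
for p_cat in (0.9, 0.5, 0.1):
    loss = binary_cross_entropy(p_cat, 1)
    print(f"predicted P(cat) = {p_cat:.1f} -> loss = {loss:.3f}")
```

The three losses come out to about 0.105, 0.693 and 2.303: the further the prediction is from the truth, the larger the single number that training tries to push down.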
Then comes backpropagation. The error is propagated backwards through the network, layer by layer, and for each weight the algorithm calculates how much that weight contributed to the error (its gradient). Gradient descent then uses those gradients to nudge every weight a small step in the direction that reduces the error, so weights that contributed more to the mistake get adjusted more. Repeated across millions of training examples, this cycle is how the network learns.
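The whole cycle can be sketched for the smallest possible case: a single sigmoid neuron trained on a hypothetical four-example dataset (logical AND), using binary cross-entropy (one common loss for yes/no labels). With only one neuron there are no hidden layers to pass the error back through, so backpropagation reduces to one application of the chain rule; for this particular pairing of sigmoid and cross-entropy, the gradient for each weight works out to (prediction - label) × input. A real network repeats the same step layer by layer, usually through a library that computes the gradients automatically.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical labelled examples: the label is 1 only when both inputs are 1
DATA = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]

def predict(w, b, x):
    """Forward pass for one neuron."""
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def mean_loss(w, b):
    """Average binary cross-entropy over the dataset."""
    total = 0.0
    for x, y in DATA:
        p = predict(w, b, x)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(DATA)

def train(epochs: int = 2000, lr: float = 1.0, seed: int = 0):
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)]  # random starting weights
    b = 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in DATA:
            err = predict(w, b, x) - y   # dLoss/dz for sigmoid + cross-entropy
            grad_w[0] += err * x[0]      # chain rule: dz/dw_i = x_i
            grad_w[1] += err * x[1]
            grad_b += err                # dz/db = 1
        # Gradient descent: move each parameter against its average gradient
        w = [wi - lr * gi / len(DATA) for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / len(DATA)
    return w, b

w, b = train()
print("final loss:", round(mean_loss(w, b), 4))
print("predictions:", [round(predict(w, b, x), 2) for x, _ in DATA])
```

The learning rate, epoch count and seed are arbitrary choices for this toy problem; the point is that the loss falls and the predictions end up on the right side of 0.5 for every example.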
📝 Note
Gradient descent in plain English — Imagine you’re blindfolded on a hilly landscape and you want to reach the lowest point. You feel the slope under your feet and take a small step in the downhill direction. Then you feel again and step again. Gradient descent does this in a mathematical landscape of error values, always nudging weights toward lower error. It rarely finds the perfect minimum, but in practice it usually finds one that is good enough.
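The blindfolded walk translates almost directly into code. This sketch runs plain gradient descent on a made-up one-dimensional landscape, f(x) = x^4 - 3x^2 + x, which has two valleys. The learning rate (the size of each downhill step), the number of steps and the starting points are hypothetical choices. Depending on where it starts, the walker settles in the deeper valley near x ≈ -1.30 or in the shallower one near x ≈ 1.13: a good minimum, not always the best one.

```python
def f(x: float) -> float:
    """A made-up error landscape with two valleys."""
    return x**4 - 3 * x**2 + x

def slope(x: float) -> float:
    """Derivative of f: the slope you feel under your feet."""
    return 4 * x**3 - 6 * x + 1

def descend(start: float, learning_rate: float = 0.01, steps: int = 1000) -> float:
    x = start
    for _ in range(steps):
        x -= learning_rate * slope(x)   # take a small step downhill
    return x

for start in (2.0, -2.0):
    end = descend(start)
    print(f"start {start:+.1f} -> settles at x = {end:.3f}, error f(x) = {f(end):.3f}")
```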
After enough training cycles — millions or billions of examples — the weights settle into values that let the network make accurate predictions on data it has never seen before. That generalisation is the whole point.
Why depth matters — and when it doesn’t
More layers mean the network can learn more abstract, more complex patterns. The largest GPT-3 model (175 billion parameters) has 96 layers in its published architecture, and that depth is part of what lets large models handle nuanced language.
But depth is not always the answer. A simple classification problem — spam or not spam, fraud or not fraud based on ten structured fields — often needs nothing more than a shallow network or a different model type entirely. Stacking 50 layers onto a straightforward problem adds training cost and complexity without improving the result.
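Some quick arithmetic shows what that extra training cost looks like. A fully connected layer from n inputs to m neurons has n × m weights plus m biases. This sketch counts parameters for two hypothetical networks on a ten-field problem: a shallow one with a single hidden layer of 16 neurons, and a deep one with 50 hidden layers of 64 neurons each. The layer sizes are illustrative assumptions, not recommendations.

```python
def count_parameters(layer_sizes: list[int]) -> int:
    """Weights plus biases in a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

shallow = [10, 16, 1]            # 10 inputs -> 16 hidden -> 1 output
deep = [10] + [64] * 50 + [1]    # 10 inputs -> 50 hidden layers of 64 -> 1 output

print("shallow:", count_parameters(shallow))
print("deep:   ", count_parameters(deep))
```

That is 193 parameters versus 204,609, roughly a thousand times more weights to train, store and tune, for a problem the shallow network (or a simpler model type) may already handle well.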
| Architecture | What it is | What it is used for |
|---|---|---|
| Feedforward network | The basic form — data flows one direction through layers | Tabular data, simple classification, regression |
| CNN (Convolutional) | Uses filters that slide across input to detect spatial patterns | Images, video, medical imaging |
| RNN (Recurrent) | Feeds its hidden state back in at each step, so it can handle sequential data | Time series, older language models |
| Transformer | Uses self-attention: each token weighs its relevance to the other tokens in the sequence, computed in parallel | Language models, LLMs, GPT, Claude, SAP Joule |
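The CNN row's "filters that slide across input" can be shown in a few lines. This sketch slides a hand-picked 2×2 filter over a tiny made-up grayscale image whose left half is dark (0) and right half is bright (9). The filter responds only where brightness changes from left to right, the kind of edge detector the early layers of an image network tend to learn. In a real CNN the filter values are learned during training rather than set by hand, and, like most deep learning libraries, this computes what is technically cross-correlation (the filter is not flipped).

```python
def conv2d(image: list[list[int]], kernel: list[list[int]]) -> list[list[int]]:
    """Slide `kernel` over `image` (no padding, stride 1), summing elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Made-up 3x4 image: dark on the left, bright on the right
image = [[0, 0, 9, 9],
         [0, 0, 9, 9],
         [0, 0, 9, 9]]

# Hand-picked vertical-edge filter: right pixel minus left pixel
vertical_edge = [[-1, 1],
                 [-1, 1]]

for row in conv2d(image, vertical_edge):
    print(row)
```

Each output row is [0, 18, 0]: zero over the flat regions and a strong response exactly where the dark-to-bright edge sits.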
📝 Note
The transformer is still a neural network — Its architecture is different from a basic feedforward network, but it follows the same principles: layers of nodes, learned weights, trained by backpropagation. How Generative AI Works covers how the transformer uses self-attention specifically.
At a glance — neural networks
| Concept | One-line summary |
|---|---|
| Neural network | A mathematical system of connected nodes arranged in layers that learns patterns from data |
| Neuron (node) | A single function: multiply inputs by weights, sum them, apply an activation function |
| Weight | A number controlling how much influence one node has on the next — adjusted during training |
| Bias | An offset added to the weighted sum — gives each neuron more flexibility to fit the data |
| Activation function | Introduces non-linearity; without it, stacked layers collapse into a single linear function |
| Input layer | Receives the raw data — pixels, token embeddings, numerical values |
| Hidden layers | Where the learning happens — each layer builds on the abstractions of the previous one |
| Output layer | Produces the final result — a class label, a probability, a generated token |
| Forward pass | Data moving from input to output through the network, producing a prediction |
| Loss | A measure of how wrong the network’s prediction was — the number training aims to reduce |
| Backpropagation | The algorithm that calculates how much each weight contributed to the error |
| Gradient descent | The process of adjusting weights step by step to reduce loss over many training examples |
| Deep learning | Machine learning using neural networks with many hidden layers |
What to take away
A neural network is not a simulation of a brain. It is a function — a very large, layered mathematical function that gets better at pattern recognition through repetition. The weights start random. Training adjusts them. Given enough data and enough training cycles, the weights settle into values that let the network generalise to examples it has never seen.
That is both simpler and more profound than the brain analogy suggests. Simpler, because the underlying mechanics are just multiplication, addition and a non-linear function, repeated millions of times. More profound, because nobody designs the features the network learns. They emerge. An image network that nobody told to look for ears can still learn to look for ears, because ears are statistically useful for identifying cats.
Almost every AI model you interact with today — from the LLM answering your questions to the fraud detection system approving your transaction — is a network of weights that were adjusted, layer by layer, until the loss got low enough. That is the idea behind all of it.
🔗 Related posts on this site
How Generative AI Works — Tokens, Embeddings and the Transformer — takes the transformer architecture and explains exactly how it generates text, one token at a time.
AI vs ML vs Deep Learning — What Is the Actual Difference? — places neural networks in the broader landscape: where deep learning ends and machine learning begins.
What is a Large Language Model (LLM)? — the LLM post explains what the model is; this post explains the architecture underneath it.
AI Hallucinations — Why They Happen — hallucination is a direct consequence of how neural networks predict rather than verify — now you understand why.
Published on rakeshnarayan.com — Articles
URL: https://rakeshnarayan.com/articles/neural-networks-the-idea-behind-modern-ai/


