Neural Networks — The Idea Behind Modern AI

November 5, 2025 · Updated January 28, 2026 · 7 min read

Most explanations of neural networks start with the brain. “It works just like neurons firing.” Then they show a diagram of circles and lines, and you nod along — but you still do not know what the network is actually doing.

The brain analogy is not wrong. It is just not useful. It describes the inspiration, not the mechanism. And the mechanism is what matters — because it is the mechanism that explains why these systems are so capable, and why they fail the way they do.

Almost every AI model you use today — ChatGPT, Claude, SAP Joule, the image generator, often the fraud detection system behind your bank — is built on neural networks. This post explains the idea behind all of them.

🔗 Foundation reading

This post sits between two others on this site. AI vs ML vs Deep Learning places neural networks in the broader AI landscape. How Generative AI Works takes the transformer — a specific type of neural network — and explains how it generates text. Read this post first if either of those feels incomplete.

The brain analogy — useful and misleading

In the human brain, neurons fire electrical signals to other neurons. Whether a neuron fires depends on the strength of the signals it receives. Over time, connections that are used frequently get stronger. That is learning, in biological terms.

Artificial neural networks borrow this structure. Connected nodes. Signals that pass between them. Connections that strengthen during training. The analogy is genuine enough.

But biological neurons are electrochemical systems of staggering complexity. Artificial neurons are something far simpler — they are mathematical functions. Once you understand the function, the rest follows.

What a neuron actually does

Strip away everything and a single artificial neuron does four things. It takes one or more numbers as input. It multiplies each input by a weight — a number that says how important that input is. It adds all those results together, plus a bias value. Then it passes the result through an activation function, which decides what signal to send forward.

That is it. Input × weight, sum, activation. A function.

Here is a concrete example. Imagine a neuron trying to help predict whether a house will sell above asking price. Its inputs might be: square footage, number of bedrooms, and proximity to a school. Each input gets a weight reflecting its importance — proximity to a school might carry more weight than an extra bedroom. The neuron sums the weighted inputs and passes the result through an activation function, which produces the output signal.
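
The house-price neuron can be sketched in a few lines of Python. This is only an illustration: the input values, weights, and bias are hand-picked and hypothetical, and the sigmoid is just one common choice of activation function, not something the example above prescribes.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)

# Hypothetical values, scaled to roughly 0..1 so that no feature dominates
# just because of its units.
inputs = [0.8, 0.5, 0.9]     # square footage, bedrooms, proximity to a school
weights = [1.2, 0.4, 2.0]    # school proximity carries the most weight
bias = -2.0

output = neuron(inputs, weights, bias)
print(f"signal sent forward: {output:.3f}")
```

In a real network nobody picks these weights by hand. They start random and are adjusted during training.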

💡 Note

What the activation function does — Without it, the network is just a chain of weighted sums — which collapses into a single linear equation, no matter how many layers you stack. The activation function introduces non-linearity, which is what lets the network learn complex, real-world patterns that a straight line could never capture.
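
A small sketch makes the collapse visible. It assumes one-input, one-neuron layers and made-up weights, and it uses ReLU as the example activation. Without the activation, two layers give exactly the same answers as a single linear equation. With it, they do not.

```python
def linear(x, w, b):
    """One layer with a single neuron and no activation."""
    return w * x + b

def relu(z):
    """A common activation: pass positives through, clip negatives to zero."""
    return max(0.0, z)

# Made-up weights and biases for two stacked layers.
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_layers_no_activation(x):
    return linear(linear(x, w1, b1), w2, b2)

def collapsed(x):
    # The same thing written as ONE linear equation.
    return linear(x, w2 * w1, w2 * b1 + b2)

def two_layers_with_relu(x):
    return linear(relu(linear(x, w1, b1)), w2, b2)

for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, two_layers_no_activation(x), collapsed(x), two_layers_with_relu(x))
```

The second and third columns match on every row. The last column bends at the point where ReLU starts clipping, which is the non-linearity a single straight line cannot reproduce.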

How layers build understanding

One neuron is almost useless on its own. The power comes from connecting thousands of them in layers.

Every neural network has three types of layers. The input layer receives the raw data — pixel values for an image, token embeddings for text, numerical values for a spreadsheet. The hidden layers do the work. The output layer produces the final result — a classification, a probability, a generated token.
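
Here is a minimal sketch of those three layer types, assuming a tiny fully connected network with three inputs, four hidden neurons, and one output. The sizes, the random untrained weights, and the helper names `dense` and `random_layer` are all illustrative choices, not a standard API.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: every neuron sees every input."""
    return [
        activation(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def random_layer(n_inputs, n_neurons, rng):
    """Untrained weights (one row per neuron) and zero biases."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    return weights, [0.0] * n_neurons

rng = random.Random(0)
hidden_w, hidden_b = random_layer(3, 4, rng)   # 3 inputs -> 4 hidden neurons
output_w, output_b = random_layer(4, 1, rng)   # 4 hidden -> 1 output neuron

raw_input = [0.8, 0.5, 0.9]                              # input layer: raw data
hidden = dense(raw_input, hidden_w, hidden_b, relu)      # hidden layer: the work
prediction = dense(hidden, output_w, output_b, sigmoid)  # output layer: a probability
print(len(hidden), prediction)
```

Because the weights are random, the printed probability is meaningless. The structure is the point: each layer's output is simply the next layer's input.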

What makes layers interesting is what each one learns. In an image recognition network, the first hidden layer learns to detect edges — basic lines and contrasts. The next layer combines those edges into shapes, and the layer after that turns shapes into objects.

By the final hidden layers, the network is recognising faces, cars, and cats — not because anyone told it to, but because that is what emerged from training on millions of images.

📌 The key insight

Nobody programs the features. Nobody tells the network “look for ears in this layer.” The network finds the features itself, through training. This is what sets neural networks apart from the hand-engineered approaches that came before — the representation is learned, not designed.

How a network learns — weights and training

Before training, a network’s weights are set randomly. It makes predictions — and they are terrible. Training is the process of making them less terrible, systematically.

Here is how one training cycle works. The network takes a labelled example — say, an image of a cat with the label “cat” — and passes it forward through every layer. That is the forward pass: data in, prediction out.

That prediction is compared to the correct answer. The difference is measured as a loss — a single number representing how wrong the network was.
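
To make the loss concrete, here is a sketch of two commonly used loss functions. Which one a given network uses is a design choice that the description above does not specify, and the prediction values are made up. The label "cat" is encoded as a target of 1.0.

```python
import math

def squared_error(prediction: float, target: float) -> float:
    """Distance between prediction and target, squared."""
    return (prediction - target) ** 2

def binary_cross_entropy(prediction: float, target: float) -> float:
    """Punishes confident wrong answers much harder than squared error does."""
    eps = 1e-12                                  # avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

target = 1.0   # the image really is a cat
for p in (0.9, 0.5, 0.1):   # the network's predicted probability of "cat"
    print(p, round(squared_error(p, target), 3), round(binary_cross_entropy(p, target), 3))
```

Either way the result is a single number that shrinks as the prediction approaches the correct answer.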

Then comes backpropagation. The error is propagated backwards through the network, layer by layer. For each weight, the algorithm calculates how much that weight contributed to the error. Weights that contributed more to the mistake get adjusted more. This adjustment process, repeated across millions of training examples, is gradient descent — the network is always moving its weights in the direction that reduces error.
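
The smallest possible backpropagation example is a chain of two neurons. The sketch below assumes sigmoid activations, a squared-error loss, no biases, and made-up starting weights. It computes each weight's share of the error with the chain rule, then takes one gradient descent step.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2):
    """Two-neuron chain: x -> hidden -> prediction (biases omitted for brevity)."""
    hidden = sigmoid(w1 * x)
    prediction = sigmoid(w2 * hidden)
    return hidden, prediction

def loss_fn(prediction, target):
    return (prediction - target) ** 2

def backward(x, target, w1, w2):
    """Chain rule, applied from the loss back towards the input."""
    hidden, prediction = forward(x, w1, w2)
    d_loss_d_pred = 2.0 * (prediction - target)
    d_z2 = d_loss_d_pred * prediction * (1.0 - prediction)   # through the output sigmoid
    grad_w2 = d_z2 * hidden
    d_hidden = d_z2 * w2                                      # error passed back one layer
    grad_w1 = d_hidden * hidden * (1.0 - hidden) * x          # through the hidden sigmoid
    return grad_w1, grad_w2

x, target = 1.5, 1.0
w1, w2 = 0.3, -0.4          # made-up starting weights
learning_rate = 0.5

before = loss_fn(forward(x, w1, w2)[1], target)
grad_w1, grad_w2 = backward(x, target, w1, w2)
w1 -= learning_rate * grad_w1    # step against the gradient
w2 -= learning_rate * grad_w2
after = loss_fn(forward(x, w1, w2)[1], target)
print(f"loss before: {before:.4f}, after one step: {after:.4f}")
```

Here `grad_w2` is larger in magnitude than `grad_w1`, so `w2` moves further. That is the "contributed more, adjusted more" rule in action. Real frameworks compute these gradients automatically for millions of weights.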

📝 Note

Gradient descent in plain English — Imagine you’re blindfolded on a hilly landscape and you want to reach the lowest point. You feel the slope under your feet and take a small step in the downhill direction. Then you feel again and step again. Gradient descent does this in a mathematical landscape of error values, always nudging weights toward lower error. It rarely finds the perfect minimum — but it finds a good one.
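
The blindfolded walk translates almost word for word into code. This sketch assumes a made-up, bowl-shaped error landscape with a single weight. A real network has millions of weights and a far bumpier landscape, which is why it usually settles for a good minimum rather than the perfect one.

```python
def error(w):
    """A made-up error landscape whose lowest point is at w = 3."""
    return (w - 3.0) ** 2 + 1.0

def slope(w):
    """Derivative of error(w): the slope you feel under your feet."""
    return 2.0 * (w - 3.0)

w = -4.0           # arbitrary starting position on the hillside
step_size = 0.1    # the learning rate: how big each step is

for step in range(50):
    w -= step_size * slope(w)   # feel the slope, step downhill

print(f"ended near w = {w:.4f} with error {error(w):.4f}")
```

Too large a step size overshoots the valley. Too small a step size takes forever to get there. Choosing it is one of the practical arts of training.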

After enough training cycles — millions or billions of examples — the weights settle into values that let the network make accurate predictions on data it has never seen before. That generalisation is the whole point.
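
Generalisation is easy to check in a toy setting: train on one set of examples, then measure accuracy on examples held back from training. The sketch below assumes a single sigmoid neuron and synthetic data with an invented rule (the label is 1 when the two inputs sum to more than 1). Real training uses far larger networks and datasets, but the loop has the same shape.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

def make_example(rng):
    """Synthetic data (an assumption): label is 1 when x0 + x1 > 1."""
    x = [rng.random(), rng.random()]
    return x, (1.0 if x[0] + x[1] > 1.0 else 0.0)

def accuracy(data, w, b):
    hits = sum((predict(x, w, b) > 0.5) == (y == 1.0) for x, y in data)
    return hits / len(data)

rng = random.Random(42)
train = [make_example(rng) for _ in range(400)]
unseen = [make_example(rng) for _ in range(100)]   # never used for training

w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]       # weights start random
b = 0.0
learning_rate = 0.1

untrained_accuracy = accuracy(unseen, w, b)
for epoch in range(50):
    for x, y in train:
        # For a sigmoid output with cross-entropy loss, the gradient with
        # respect to the weighted sum is simply (prediction - label).
        error = predict(x, w, b) - y
        w[0] -= learning_rate * error * x[0]
        w[1] -= learning_rate * error * x[1]
        b -= learning_rate * error

print(f"accuracy on unseen data: {untrained_accuracy:.2f} -> {accuracy(unseen, w, b):.2f}")
```

The second number is the one that matters. High accuracy on the training set alone would only show that the network memorised its examples.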

Why depth matters — and when it doesn’t

More layers mean the network can learn more abstract, more complex patterns. The largest GPT-3 model’s published architecture has 96 layers — and that depth is a big part of what lets large models handle nuanced language.

But depth is not always the answer. A simple classification problem — spam or not spam, fraud or not fraud based on ten structured fields — often needs nothing more than a shallow network or a different model type entirely. Stacking 50 layers onto a straightforward problem adds training cost and complexity without improving the result.
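
One rough way to see the cost of unnecessary depth is to count trainable parameters. The sketch below assumes fully connected layers and an arbitrary width of 16 neurons per hidden layer. Both are illustrative choices.

```python
def parameter_count(layer_sizes):
    """Weights plus biases for a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

shallow = [10, 16, 1]              # ten structured fields, one hidden layer
deep = [10] + [16] * 50 + [1]      # same width, 50 hidden layers

print(parameter_count(shallow))    # 193
print(parameter_count(deep))       # 13521
```

That is roughly 70 times as many weights to train and store. On a problem the shallow network already solves, they buy nothing.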

Architecture	What it is	What it is used for
Feedforward network	The basic form — data flows one direction through layers	Tabular data, simple classification, regression
CNN (Convolutional)	Uses filters that slide across input to detect spatial patterns	Images, video, medical imaging
RNN (Recurrent)	Loops outputs back as inputs — handles sequential data	Time series, older language models
Transformer	Uses self-attention — every input attends to every other input simultaneously	Language models, LLMs, GPT, Claude, SAP Joule
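
To give a feel for the "filters that slide across input" in the CNN row, here is a one-dimensional sketch. The row of pixel values is made up and the filter is hand-picked to respond to a dark-to-bright edge. In a real CNN the filter values are learned weights, and the filters slide over two-dimensional images.

```python
def slide_filter(signal, kernel):
    """Slide the kernel across the signal, taking a weighted sum at each position."""
    width = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(width))
        for start in range(len(signal) - width + 1)
    ]

row_of_pixels = [0, 0, 0, 9, 9, 9]   # dark on the left, bright on the right
edge_kernel = [-1, 1]                # hand-picked here; a CNN would learn it

print(slide_filter(row_of_pixels, edge_kernel))   # [0, 0, 9, 0, 0]
```

The output is zero wherever neighbouring pixels match and spikes exactly where the brightness changes. The same small filter is reused at every position, which is why CNNs need far fewer weights than a fully connected network on the same image.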

📝 Note

The transformer is still a neural network — Its architecture is different from a basic feedforward network, but it follows the same principles: layers of nodes, learned weights, trained by backpropagation. How Generative AI Works covers how the transformer uses self-attention specifically.
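
For a first feel of self-attention, here is a deliberately simplified sketch. It assumes made-up two-dimensional embeddings and uses the raw vectors as queries, keys, and values. A real transformer first multiplies them by learned weight matrices and runs many attention heads in parallel.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Each vector becomes a weighted average of every vector in the sequence.

    Simplification: queries, keys, and values are the raw vectors themselves.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # How strongly does this position match each position (itself included)?
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        attention = softmax(scores)
        outputs.append([
            sum(a * value[i] for a, value in zip(attention, vectors))
            for i in range(dim)
        ])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up embeddings
for row in self_attention(tokens):
    print([round(v, 3) for v in row])
```

Every output row blends information from all three inputs at once. That is the "every input attends to every other input" idea from the table.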

At a glance — neural networks

Concept	One-line summary
Neural network	A mathematical system of connected nodes arranged in layers that learns patterns from data
Neuron (node)	A single function: multiply inputs by weights, sum them, add a bias, apply an activation function
Weight	A number controlling how much influence one node has on the next — adjusted during training
Bias	An offset added to the weighted sum — gives each neuron more flexibility to fit the data
Activation function	Introduces non-linearity — without it, stacked layers collapse into a single linear function
Input layer	Receives the raw data — pixels, token embeddings, numerical values
Hidden layers	Where the learning happens — each layer builds on the abstractions of the previous one
Output layer	Produces the final result — a class label, a probability, a generated token
Forward pass	Data moving from input to output through the network, producing a prediction
Loss	A measure of how wrong the network’s prediction was — the number training aims to reduce
Backpropagation	The algorithm that calculates how much each weight contributed to the error
Gradient descent	The process of adjusting weights step by step to reduce loss over many training examples
Deep learning	Machine learning using neural networks with many hidden layers

What to take away

A neural network is not a simulation of a brain. It is a function — a very large, layered mathematical function that gets better at pattern recognition through repetition. The weights start random. Training adjusts them. Given enough data and enough training cycles, the weights settle into values that let the network generalise to examples it has never seen.

That is both simpler and more profound than the brain analogy suggests. Simpler, because the underlying mechanics are just multiplication, addition and a non-linear function — repeated millions of times. More profound, because nobody designs the features the network learns. They emerge. An image network that nobody told to look for ears learns to look for ears because ears are statistically useful for identifying cats.

Nearly every AI model you interact with today — from the LLM answering your questions to many of the fraud detection systems approving your transactions — is a network of weights that were adjusted, layer by layer, until the loss got low enough. That is the idea behind all of it.

🔗 Related posts on this site

How Generative AI Works — Tokens, Embeddings and the Transformer — takes the transformer architecture and explains exactly how it generates text, one token at a time.
AI vs ML vs Deep Learning — What Is the Actual Difference? — places neural networks in the broader landscape: where deep learning ends and machine learning begins.
What is a Large Language Model (LLM)? — the LLM post explains what the model is; this post explains the architecture underneath it.
AI Hallucinations — Why They Happen — hallucination is a direct consequence of how neural networks predict rather than verify — now you understand why.

Published on rakeshnarayan.com — Articles

URL: https://rakeshnarayan.com/articles/neural-networks-the-idea-behind-modern-ai/
