Book highlights: “Models of the Mind: How Physics, Engineering and Mathematics Have Shaped Our Understanding of the Brain” by Grace Lindsay

2026.06.16
book-highlights books systems-thinking

LM → my personal comments

Alfred Whitehead, a revered twentieth-century mathematician whose work we will encounter in Chapter 3, has been paraphrased as saying: ‘The ultimate goal of mathematics is to eliminate any need for intelligent thought.’

LM: Loved this!

And while it was traditionally the domain of adventurous physicists or wandering mathematicians, today ‘theoretical’ or ‘computational’ neuroscience is a fully developed subdivision of the neuroscience enterprise with dedicated journals, conferences, textbooks and funding sources.

As he sat for days putting in the value of the voltage at one point in time just to calculate what it would be at the next one-ten-thousandth of a second, Huxley actually found the work somewhat suspenseful. As he said in his Nobel lecture: ‘It was quite often exciting … Would the membrane potential get away into a spike, or die in a subthreshold oscillation? Very often my expectations turned out to be wrong, and an important lesson I learnt from these manual computations was the complete inadequacy of one’s intuition in trying to deal with a system of this degree of complexity.’

LM: in really complex systems, intuition is not a good tool for solving problems. You need data and reasoning from first principles.
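
FYI. The step-by-step calculation Huxley describes (take the voltage now, compute the voltage one ten-thousandth of a second later, repeat) is numerical integration. A minimal sketch of that idea, assuming a much simpler passive "leaky" membrane rather than the actual Hodgkin-Huxley equations; the function name and all parameter values here are made up for illustration:

```python
def euler_step(voltage_mv, current, dt_ms=0.1, tau_ms=10.0, rest_mv=-65.0, resistance=10.0):
    """Advance a leaky membrane by one small time step (forward Euler).

    Assumption: a passive membrane, dV/dt = (-(V - rest) + R * I) / tau.
    This is NOT the Hodgkin-Huxley model, only the same stepping procedure.
    """
    dv_dt = (-(voltage_mv - rest_mv) + resistance * current) / tau_ms
    return voltage_mv + dt_ms * dv_dt


voltage = -65.0
trace = [voltage]
for _ in range(1000):  # 1000 steps of 0.1 ms = 100 ms of simulated time
    voltage = euler_step(voltage, current=1.5)
    trace.append(voltage)

print(round(trace[0], 2), round(trace[-1], 2))
```

Each pass through the loop is one of the hand calculations from the quote. With a constant input, this toy membrane just drifts towards rest + resistance * current; the real equations are nonlinear, which is where intuition stops helping.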

The ability to spot logic at play in the interactions of neurons came from McCulloch’s discerning eye. As a physiologist, he knew that neurons were more complex than his simple drawings and equations suggested. They had membranes, ion channels and forking paths of dendrites. But the theory didn’t need their full complexity. So, like an impressionist painter using only the necessary strokes, he intentionally highlighted only the elements of neural activity required for the story he wanted to tell. In doing so, he demonstrated the artistry inherent to model-building; it is a subjective and creative process to decide which facts belong in the foreground.

LM:
Good software engineering is also about creating proper abstractions to model reality. You don’t need to make the model perfect; you just need it to fit your current use case.
Creating good abstractions is an art.
The map is not the territory (but a good map is sometimes better than the whole thing, as you need fewer “tokens” to parse the map than the territory).
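
FYI. The "logic at play in the interactions of neurons" from the highlight above can be sketched with a threshold unit: a neuron that fires only when enough of its inputs are active. This is a simplified illustration, not McCulloch's exact formulation; in particular, using a negative weight to stand in for inhibition is an assumption of this sketch.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (1) if the weighted sum of binary inputs reaches the threshold, else stay silent (0)."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0


def AND(a, b):
    # Both inputs must be active to reach the threshold of 2.
    return mp_neuron([a, b], [1, 1], threshold=2)


def OR(a, b):
    # A single active input is enough.
    return mp_neuron([a, b], [1, 1], threshold=1)


def NOT(a):
    # Assumption: one inhibitory (negative-weight) input and a threshold of 0.
    return mp_neuron([a], [-1], threshold=0)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```

No membranes, ion channels or dendrites appear anywhere in this model, which is the point: only the strokes needed for the logic story are kept.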

“The Navy last week demonstrated the embryo of an electronic computer named the Perceptron which, when completed in about a year, is expected to be the first non-living mechanism able to ‘perceive, recognise, and identify its surroundings without human training or control.’ […] Dr. Frank Rosenblatt, research psychologist at the Cornell Aeronautical Laboratory, Inc., Buffalo, NY, designer of the Perceptron, conducted the demonstration. The machine, he said, would be the first electronic device to think as the human brain. Like humans, Perceptron will make mistakes at first, ‘but it will grow wiser as it gains experience’, he said”. This summary, from an article entitled ‘Electronic “brain” teaches itself’, appeared in the 13 July 1958 edition of the New York Times, opposite a letter to the editor about the ongoing debate on whether smoking causes cancer.

LM: this miscalculation is hilarious and beautiful! The thing that was supposed to take a year ended up taking 6 decades, but it’s happening exactly as predicted.

The backpropagation algorithm was necessary to boost artificial neural networks to the point where they could reach near-human levels of performance on some tasks. As a learning rule for neural networks, it really works. Unfortunately, that doesn’t mean it works like the brain. While the perceptron learning rule was something that could be seen at play between real neurons, the backpropagation algorithm is not. It was designed as a mathematical tool to make artificial neural networks work, not a model of how the brain learns (and its inventors were very clear on that from the start). The reason for this is that real neurons can typically only know about the activity of the neurons they’re connected to – not about the activity of the neurons those neurons connect to and so on and so on. For this reason, there is no obvious way for real neurons to implement the chain rule. They must be doing something different. For some researchers – particularly researchers in the field of artificial intelligence – the artificial nature of backpropagation is no problem. Their goal is to build computers that can think, by whatever means necessary. But for other scientists – neuroscientists in particular – finding the learning algorithm of the brain is paramount. We know the brain is good at getting better; we see it when we learn a musical instrument, how to drive or how to read a new language. The question is how.

LM: Different domains, different objectives.
FYI. Backpropagation computes how much each weight contributed to the error, then nudges them all in the direction that reduces it.
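
A minimal sketch of that idea, assuming the smallest network that still needs the chain rule: one input, one tanh hidden unit, one linear output, and a squared-error loss. The starting weights, learning rate and target are arbitrary values chosen for illustration.

```python
import math


def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # hidden unit activity
    y = w2 * h             # network output
    return h, y


def gradients(w1, w2, x, target):
    """Gradients of loss = 0.5 * (y - target)**2 with respect to both weights."""
    h, y = forward(w1, w2, x)
    err = y - target                     # dLoss/dy
    grad_w2 = err * h                    # contribution of the output weight
    grad_h = err * w2                    # error passed backwards to the hidden unit
    grad_w1 = grad_h * (1 - h ** 2) * x  # chain rule through tanh
    return grad_w1, grad_w2


def train(x=1.0, target=0.5, w1=0.3, w2=0.2, lr=0.1, steps=500):
    for _ in range(steps):
        g1, g2 = gradients(w1, w2, x, target)
        w1 -= lr * g1  # nudge each weight against its gradient
        w2 -= lr * g2
    return w1, w2


w1, w2 = train()
_, y = forward(w1, w2, 1.0)
print("output after training:", round(y, 3))
```

Note that computing `grad_w1` requires `w2`, a weight that belongs to a connection further downstream. That is the kind of non-local knowledge the book says real neurons typically don't have.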

In their hunt to understand how the mind learns from supervision, modern researchers are doing just what McCulloch did. They’re looking at the piles of facts we have about the biology of the brain and trying to see in it a computational structure. Today, they are guided in their search by the workings of artificial systems. Tomorrow, the findings from biology will again guide the building of artificial intelligence. This back-and-forth defines the symbiotic relationship between these two fields.

LM: Amazing how one domain inspires the other and vice versa!

It is now established science that experience leads to the activation of neurons and that activating neurons can alter the connections between them.

In the language of physics, a fully retrieved memory is an example of an attractor. An attractor is, in short, a popular pattern of activity. It is one that other patterns of activity will evolve towards, just as water is pulled down a drain. A memory is an attractor because the activation of a few of the neurons that form the memory will drive the network to fill in the rest. Once a network is in an attractor state, it remains there with the neurons fixed in their ‘on’ or ‘off’ positions. Always fond of describing things in terms of energy, physicists consider attractors ‘low energy’ states. They’re a comfortable position for a system to be in; that is what makes them attractive and stable.

LM: A retrieved memory is a pattern of neuron activity that has settled into a ‘low energy’ state.
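
FYI. A minimal sketch of a memory as an attractor, assuming a tiny Hopfield-style network with a Hebbian storage rule and units that are either on (+1) or off (-1). The eight-unit pattern and the helper names are made up for illustration.

```python
def train_hopfield(patterns):
    """Hebbian storage: units that are active together get a positive connection."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w, state, sweeps=5):
    """Update units one at a time until the network settles."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            field = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if field >= 0 else -1
    return state


def energy(w, state):
    n = len(state)
    return -0.5 * sum(w[i][j] * state[i] * state[j] for i in range(n) for j in range(n))


memory = [1, -1, 1, -1, 1, 1, -1, -1]
w = train_hopfield([memory])

cue = list(memory)
cue[0], cue[3] = -cue[0], -cue[3]  # corrupt two of the eight units

retrieved = recall(w, cue)
print("retrieved the stored memory:", retrieved == memory)
print("energy of cue:", energy(w, cue), "energy after recall:", energy(w, retrieved))
```

The partial cue sits at a higher energy than the stored pattern, and the update rule rolls the network downhill until the rest of the memory is filled in.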

Ring networks are a lovely solution to the complex problem of how to create robust and functional working memory systems. They are also beautiful mathematical objects. They display the desirable properties of simplicity and symmetry. They’re precise and finely tuned, elegant even.

Image source: Ring-shaped neuronal networks: a platform to study persistent activity

LM: I love how the author’s language shows her love for science and mathematics.

Because dopamine is associated with reward, this model also predicts that under conditions where a person is anticipating a large reward their working memory will be better – and that is exactly what has been found. When people are promised more reward in turn for remembering something, their working memory is better. Here, the concept of an attractor works as the thread that stitches chemical changes together with cognitive ones. It links ions to experiences.

As Einstein famously said with regard to the new science of quantum mechanics: ‘God does not play dice.’ So why should the brain? Could there be any good reason for evolution to produce noisy neurons? Some philosophers have claimed that the noise in the brain could be a source of our free will – a way to overcome a view of the mind as subject to the same deterministic laws of any machine. Yet others disagree. As British philosopher Galen Strawson wrote: ‘It may be that some changes in the way one is are traceable … to the influence of indeterministic or random factors. But it is absurd to suppose that indeterministic or random factors, for which one is by definition in no way responsible, can in themselves contribute in any way to one’s being truly morally responsible for how one is.’ In other words, following decisions based on a coin flip isn’t exactly ‘free’ either.

‘Those studying chaotic dynamics discovered that the disorderly behaviour of simple systems acted as a creative process. It generated complexity: richly organised patterns, sometimes stable and sometimes unstable, sometimes finite and sometimes infinite.’

LM: Antifragile

One professor purportedly described the aims more casually as ‘linking a camera to a computer and getting the computer to describe what it saw’. The goals of this project were not completed that summer. Nor the next. Nor many after that. Indeed, some of the core issues raised in the description of the summer project remain open problems to this day. The hubris on display in that memo is not surprising for its time. As discussed in Chapter 3, the 1960s saw an explosion in computing abilities and, in turn, naive hopes about automating even the most complex tasks. If computers could now do anything asked of them, it was just a matter of knowing what to ask for. With something as simple and immediate as vision, how hard could that be?

LM: Could we be going through the same naive hopes now with AI, i.e. expecting AGI to happen anytime soon?

The study of vision – of how patterns can be found in points of light – is full of direct influence from the biological to the artificial and vice versa. The harmony may not have been constant: when computer science embarked on methods that were useful but didn’t resemble the brain, the fields diverged. And when neuroscientists dig into the nitty-gritty detail of the cells, chemicals and proteins that carry out biological vision, computer scientists largely turn away. But the impacts of the mutual influence are still undeniable, and plainly visible in the most modern models and technologies.

These issues make template matching a challenge both for artificial visual systems and for the brain. The ideas on display in Pandemonium, however, represent a more distributed approach, as the features detected by the computational demons are shared across cognitive demons. The approach is also hierarchical. That is, Pandemonium breaks the problem of vision into two stages: first look for the simple things, then for the more complex.

LM: “Pandemonium” (which means a place of wild chaos and noise) perfectly captured that image of hundreds of little demons all shouting over each other simultaneously.

In the 1959 paper ‘What the frog’s eye tells the frog’s brain’ he and his co-authors describe four different types of ganglion cells that each responded to a different simple pattern.

To get a sense of what this means, you can hold a pen out horizontally in front of your face and move it up and down. You’ve just excited a group of neurons in your primary visual cortex. Tilt the pen another way and you’ll excite a different group (you’ve now got at-home, targeted brain stimulation for free!).

LM: 😆

In an amusing bending of science in on itself, convolutional neural networks have even been used by neuroscientists to help automatically detect where neurons are in pictures of brain tissue. Artificial neural networks are now looking at real ones.

LM: what a plot twist! 🤯

In truth, neuroscientists continue to spar and struggle over the neural code to this day. They host conferences centered on ‘Cracking the neural code’. They write papers with titles like ‘Seeking the neural code’, ‘Time for a new neural code?’ and even ‘Is there a neural code?’ They continue to find good evidence for Adrian’s original rate-based coding, but also some against it. Identifying the neural code can seem a more distant goal now than when MacKay and McCulloch wrote their first musings on it.

LM: Neural code -> how the pattern of spikes (action potentials) carries information

In the MSO, cells that receive inputs from each ear compare the relative timing of these two inputs. For example, one cell may be set up to detect sounds that arrive at both ears simultaneously. To do so, the signals from each ear would need to take the exact same amount of time to reach this MSO cell. This cell then fires when it receives two inputs at the exact same time and this response indicates that the sound hit both ears at the same time (see Figure 17). The cell next to this one, however, receives slightly asymmetric inputs. That is, the nerve fibre from one ear needs to travel a little farther to reach this cell than the nerve from the other ear. Because of this, one of the temporal signals gets delayed. The extra length the signal travels determines just how much extra time it takes. Let’s say the signal from the left ear takes an extra 100 microseconds to reach this MSO cell. Then, the only way this cell will receive two inputs at once is if the sound hits the left ear 100 microseconds before it hits the right. Therefore, this cell’s response (which, like the other cell, only comes when it receives two inputs at once) would signal a 100-microsecond difference.

In this way, a temporal code has been transformed into a spatial code: the position of the active neuron in this map carries information about the source of the sound.
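
FYI. A minimal sketch of the delay-line idea described above, assuming a row of five hypothetical MSO cells. Each cell is described by a single number: how many extra microseconds the left-ear signal needs to reach it compared with the right-ear signal (negative means the right-ear path is the longer one). Integer microseconds are used so that "at the exact same time" can be checked exactly.

```python
def active_cell(left_ear_us, right_ear_us, extra_left_delays_us):
    """Return the index of the cell that receives both inputs simultaneously, or None.

    left_ear_us / right_ear_us: times at which the sound hits each ear.
    extra_left_delays_us: per-cell extra travel time of the left-ear signal.
    """
    for index, extra_left in enumerate(extra_left_delays_us):
        left_input_arrives = left_ear_us + extra_left
        right_input_arrives = right_ear_us
        if left_input_arrives == right_input_arrives:
            return index  # coincidence: this cell fires
    return None


# Hypothetical map of cells, ordered by their built-in delay.
cells = [-200, -100, 0, 100, 200]

print(active_cell(0, 0, cells))    # sound hits both ears at once
print(active_cell(0, 100, cells))  # sound hits the left ear 100 microseconds first
print(active_cell(100, 0, cells))  # sound hits the right ear 100 microseconds first
```

The returned index is the spatial code: which cell fired tells you the timing difference, and therefore roughly where the sound came from.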

If the brain evolved within the constraints of information theory – and evolution tends to find pretty good solutions – then it makes sense to conclude that the brain is quite good at encoding information. ‘The safe course here is to assume that the nervous system is efficient,’ Barlow wrote in a 1961 paper. If this is true, any puzzle about why neurons are responding the way they are may be solved by assuming they are acting efficiently.

The redundancy of English means we could, in theory, be conveying the same amount of information with far fewer letters. In fact, in his original 1948 paper, Shannon estimated the redundancy of written English to be about 50 per cent. This is why, for example, ppl cn stll rd sntncs tht hv ll th vwls rmvd.
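
FYI. The vowel-dropping trick is easy to reproduce. The fraction of characters saved below is only an illustration of redundancy, not Shannon's estimate, which was derived in a different way.

```python
def strip_vowels(text):
    """Drop every a, e, i, o, u (upper or lower case) and keep everything else."""
    return "".join(ch for ch in text if ch.lower() not in "aeiou")


original = "people can still read sentences that have all the vowels removed"
compressed = strip_vowels(original)
saving = 1 - len(compressed) / len(original)

print(compressed)
print(f"characters saved: {saving:.0%}")
```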

In total, it’s estimated that up to three-fourths of the brain’s energy budget goes towards sending and receiving signals. And the brain – using 20 per cent of the body’s energy while accounting for only 2 per cent of its weight – is the most energetically expensive organ to run.

In his experiments on muscle stretch receptors, Adrian noticed that ‘there is a gradual decline in the frequency of the discharge under a constant stimulus’. Specifically, while keeping the weight applied to the muscle constant, the firing rate of the nerve would decrease by about half over 10 seconds. Adrian called this phenomenon ‘adaptation’ and defined it as ‘a decline in excitability caused by the stimulus’.

In his 1972 paper, Barlow argues for adaptation as a means of increasing efficiency: ‘If sensory messages are to be given a prominence proportional to their informational value, mechanisms must exist for reducing the magnitude of representation of patterns which are constantly present, and this is presumably the underlying rationale for adaptive effects.’ In other words – specifically, in the words of information theory – if the same symbol is being sent across the channel over and over, its presence no longer carries information. Therefore, it makes sense to stop sending it. And that is what neurons do: they stop sending spikes when they see the same stimulus over and over.

Barlow’s contribution was conspicuously cut out. The Soviets, it turned out, had a problem with the use of information theory to understand the brain. Considered part of the ‘bourgeois pseudoscience’ of cybernetics, it ran counter to official Soviet philosophy by equating man with machine.

FYI.
The idea that a brain could be modeled as an information-processing machine felt deeply dehumanizing and reductive to Soviet ideologues.
Cybernetics was developed largely by Norbert Wiener, an American. Western science was automatically suspect, especially anything that crossed into philosophy or social theory.
Soviet philosophy followed a rigid framework where reality evolves through contradictions and struggle. A mathematical, mechanistic view of mind didn’t fit that framework neatly.
If a machine could process information like a brain, that raised uncomfortable questions about free will, the soul, and what made Soviet man special and revolutionary.
By the late 1950s, after Stalin’s death, the Soviets quietly reversed course and embraced cybernetics — partly because they realized they were falling dangerously behind the West in computing and military technology. The same ideas that were “bourgeois pseudoscience” in 1950 became state-sponsored research by 1960.

Computer scientist Geoffrey Hinton offers the best advice for this problem: ‘To deal with hyper-planes in a fourteen-dimensional space, visualise a 3-D space and say “fourteen” to yourself very loudly.’

LM: 😆 That’s probably what I was doing during my linear algebra course at university

For example, redundancy is a smart feature to have in any biological system. Neurons are noisy and can die, which makes a system with redundant ones more robust. Furthermore, neurons tend to be heavily interconnected. With all their talking back and forth with each other, it’s unlikely that any one of them can remain very independent. Instead, their activity becomes correlated the same way that the opinions of people in the same social circles start to converge.

LM: loved this comparison! Micro = Macro / the part reflects the whole

They noticed that it was possible to have short path lengths in a network that was highly clustered. A cluster refers to a subset of nodes that are heavily interconnected, like the members of a family. In these networks, most nodes form edges just with other nodes in their cluster, but occasionally a connection is sent to a node in a distant cluster.

Why should the nervous system of a nematode have the same shape as the social network of humans? The biggest reason may be energy costs. Neurons are hungry. They require a lot of energy to stay in working order and adding more or longer axons and dendrites only ups the bill. A fully interconnected brain is, thus, a prohibitively expensive brain. Yet if connections become too sparse the very function of the brain – processing and routing information – breaks down. A balance must be struck between the cost of wiring and the benefit of information sharing. Small worlds do just this. In a small world, the more common connections are the relatively cheap ones between cells in a local cluster. The pricey connections between faraway neurons are rare, but there are enough to keep information flowing. Evolution, it seems, has found small worldness to be the smart solution.
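
FYI. A minimal sketch of the small-world effect, assuming a ring of 100 nodes where each node is wired only to its nearest neighbours (cheap, local connections), plus a handful of random long-range links (the pricey ones). Adding shortcuts instead of rewiring existing edges is a simplification, and the sizes and the random seed are arbitrary.

```python
import random
from collections import deque


def ring_lattice(n, k):
    """Each node links to its k nearest neighbours on each side of the ring."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for step in range(1, k + 1):
            j = (i + step) % n
            graph[i].add(j)
            graph[j].add(i)
    return graph


def add_shortcuts(graph, count, rng):
    """Add `count` random edges between nodes that are not yet connected."""
    nodes = list(graph)
    added = 0
    while added < count:
        a, b = rng.sample(nodes, 2)
        if b not in graph[a]:
            graph[a].add(b)
            graph[b].add(a)
            added += 1
    return graph


def average_path_length(graph):
    """Mean number of hops between pairs of nodes (breadth-first search from each node)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def average_clustering(graph):
    """Mean fraction of a node's neighbours that are also connected to each other."""
    scores = []
    for neighbours in graph.values():
        nbs = list(neighbours)
        k = len(nbs)
        if k < 2:
            scores.append(0.0)
            continue
        links = sum(1 for i in range(k) for j in range(i + 1, k) if nbs[j] in graph[nbs[i]])
        scores.append(2 * links / (k * (k - 1)))
    return sum(scores) / len(scores)


rng = random.Random(0)
lattice = ring_lattice(100, 2)
small_world = add_shortcuts(ring_lattice(100, 2), 10, rng)

lattice_stats = (average_path_length(lattice), average_clustering(lattice))
small_stats = (average_path_length(small_world), average_clustering(small_world))
print("local wiring only: path length %.2f, clustering %.2f" % lattice_stats)
print("with 10 shortcuts: path length %.2f, clustering %.2f" % small_stats)
```

A few long-range links shorten the average path while the network stays mostly clustered, which is the trade-off between wiring cost and information flow described above.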

according to philosopher of mind Michael Rescorla, the Bayesian approach is ‘our best current science of perception’.

LM: Bayesian approach: updating beliefs with new evidence. Your brain is constantly asking “given everything I know and what my senses just told me, what is the most probable explanation for this?”
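
FYI. A minimal sketch of that update, assuming a toy perception problem with two hypotheses. All the numbers are made up for illustration: the prior is what you believed before looking, the likelihood is how probable the new sensory evidence is under each hypothesis.

```python
def posterior(prior, likelihood):
    """Bayes' rule: posterior is proportional to prior * likelihood, then normalised."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: value / total for h, value in unnormalised.items()}


# Hypothetical scene: a shape moves in the garden at dusk.
prior = {"cat": 0.7, "fox": 0.3}       # cats are more common around here
likelihood = {"cat": 0.2, "fox": 0.6}  # P(seeing a bushy tail | animal)

belief = posterior(prior, likelihood)
print(belief)
```

Even though the prior favoured "cat", the evidence is three times more likely under "fox", so the most probable explanation flips.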

Prediction error is the grease that oils the wheels of learning.

LM: you don’t learn when what you expect to happen happens (even if you were expecting a good thing). You learn when what you expect to happen doesn’t happen (good or bad).
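
FYI. A minimal sketch of learning driven by prediction error, assuming a simple delta rule in the spirit of the Rescorla-Wagner model. The learning rate and number of trials are arbitrary.

```python
def update_value(value, reward, learning_rate=0.2):
    """Move the expectation a little towards what actually happened."""
    prediction_error = reward - value  # surprise: outcome minus expectation
    new_value = value + learning_rate * prediction_error
    return new_value, prediction_error


value = 0.0
errors = []
for trial in range(30):  # the same reward arrives on every trial
    value, error = update_value(value, reward=1.0)
    errors.append(error)

print("first error:", errors[0], "last error:", round(errors[-1], 4))

# Now the expected reward is withheld: a negative surprise.
_, omitted_error = update_value(value, reward=0.0)
print("error when the reward is omitted:", round(omitted_error, 4))
```

Once the reward is fully expected, the error is close to zero and almost nothing is learned, even though the reward is still good. Learning only restarts when the outcome differs from the expectation, in either direction.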

The paths of sequential decision-making and Pavlovian conditioning represent a victory of convergent scientific evolution. The trajectories of Bellman and Pavlov start with separate and substantial problems, each seething with their own demanding details. How should a hospital schedule its nurses and doctors to serve the most patients? What causes a dog to salivate when the sound of a buzzer hits its ears? These questions are seemingly worlds apart. But by peeling away the weight of the specifics – leaving only the bare bones of the problem to remain – their interlocking nature becomes clear. This is one of the roles of mathematics: to put questions disconnected in the physical world into the same conceptual space wherein their underlying similarities can shine through. The story of reinforcement learning is thus one of successful interdisciplinary interaction. It shows that psychology and engineering and computer science can work together to make progress on hard problems. It demonstrates how mathematics can be used to understand, and replicate, the ability of animals and humans to learn from their surroundings.

dopamine neurons encode the prediction errors necessary for temporal difference learning.

This study showed that the firing of dopamine neurons can signal the errors – both positive and negative – about predicted values that are needed for learning. It was thus an important point in shifting the understanding of dopamine from a pleasure molecule to a pedagogical one.

LM: super interesting this dopamine update!

One theory, put forth in 2004 by neuroscientist David Redish, tries to explain the addictive properties of drugs like amphetamine and cocaine in terms of the effects they have on dopamine release. It posits that these drugs cause a release of dopamine that is independent of the true prediction error. Specifically, by overdriving the dopamine neurons, these drugs send the false signal to the rest of the brain that the drug experience is always better than expected. This errant error signal still drives learning, pushing the estimated value of states associated with drug use higher and higher. Deforming the value function in this way is guaranteed to have detrimental effects on behaviour like the ones seen in addiction.
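
FYI. A minimal one-state sketch of that idea, loosely following the theory as summarised above rather than reproducing the published model. The assumption here is that the drug adds a dopamine surge that a correct prediction can never cancel out, so the learning signal never drops below that surge. The parameter values are made up.

```python
def learn(reward, drug_surge, trials=200, learning_rate=0.1):
    """Return the learned value of a state after repeated experience."""
    value = 0.0
    for _ in range(trials):
        error = reward - value  # the true prediction error
        # Assumption: the drug-induced dopamine is independent of the true error,
        # so the signal is always at least drug_surge ("better than expected").
        signal = max(error + drug_surge, drug_surge)
        value += learning_rate * signal
    return value


natural = learn(reward=1.0, drug_surge=0.0)
drug = learn(reward=1.0, drug_surge=0.2)
drug_longer = learn(reward=1.0, drug_surge=0.2, trials=400)

print("natural reward value:", round(natural, 3))
print("drug value after 200 trials:", round(drug, 3))
print("drug value after 400 trials:", round(drug_longer, 3))
```

With a natural reward, the value settles at the size of the reward and learning stops. With the errant signal, the value keeps climbing for as long as the drug is taken, which is the deformation of the value function the highlight describes.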

When aiming for something akin to a GUT (Grand Unified Theory), the hope is always to find the simplest set of principles that can explain the largest set of facts. With an object as dense and messy as the brain, that simple set may still be pretty complicated. To know in advance what level of detail and what magnitude of scale will be needed to capture the relevant features of brain function is impossible. It is only through the building and testing of models that progress on that question can be made.

NeuWrite community.

LM: might be useful if I someday decide to write another book 😆
