(Drivebycuriosity) - AI is eating the world. There are not many parts of the economy that are not yet influenced by Artificial Intelligence, and its influence is growing at exponential speed. In "Why Machines Learn: The Elegant Math Behind Modern AI", Anil Ananthaswamy describes how mathematicians developed the building blocks (the algorithms) for ChatGPT and other forms of machine learning (amazon).
Ananthaswamy narrates the evolution of AI: the breakthroughs, the setbacks and the slow fermentation of ideas. Scientists have been developing algorithms that can learn to discern patterns in data without being explicitly programmed to do so. "Machines can learn because of the extraordinary confluence of math and computer science, with more than a dash of physics and neuroscience added to the mix".
The book is full of information, spiced with mathematics & anecdotes. This humble blog can only present some tidbits here:
Inspired By Biology
The AI developers are inspired by biology and evolution. For instance, "even fruit flies are thought to use some form of an algorithm" to react to odors: when a fly senses a new odor, it finds the known odor most like it, for which it already has the neural mechanisms to respond behaviorally.
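That matching of a new input to the most similar stored example is essentially a nearest-neighbor lookup. A minimal sketch in Python (the odor vectors and responses are invented for illustration, not data from the book):

```python
import numpy as np

# Known "odors", each encoded as a feature vector, with a stored response.
# These vectors and labels are made-up examples.
known_odors = np.array([[0.9, 0.1, 0.0],   # e.g. ripe fruit -> approach
                        [0.1, 0.8, 0.2],   # e.g. vinegar    -> approach
                        [0.0, 0.1, 0.9]])  # e.g. smoke      -> avoid
responses = ["approach", "approach", "avoid"]

def react(new_odor):
    """Find the most similar known odor and reuse its response."""
    distances = np.linalg.norm(known_odors - new_odor, axis=1)
    return responses[int(np.argmin(distances))]

print(react(np.array([0.8, 0.2, 0.1])))  # closest to "ripe fruit" -> approach
```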
The scientists noticed that "our brains learn because connections between neurons strengthen when one neuron's output is consistently involved in the firing of another, and they weaken when this is not so".
Psychologist Frank Rosenblatt designed "artificial neurons that reconfigure as they learn, embodying information in the strength of their connections". The machine (the algorithm), once it had learned, contained knowledge in the strengths (weights) of its connections.
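Rosenblatt's perceptron captures this in a few lines: during learning nothing changes except the weights. A minimal sketch (the tiny AND dataset and learning rate are my own toy choices):

```python
import numpy as np

# Tiny training set: logical AND, inputs in {0,1}, labels in {0,1}.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

weights = np.zeros(2)
bias = 0.0
lr = 0.1  # learning rate

for _ in range(20):                       # a few passes over the data
    for x, target in zip(X, y):
        prediction = 1 if weights @ x + bias > 0 else 0
        error = target - prediction       # 0 if correct, +1/-1 if wrong
        weights += lr * error * x         # strengthen or weaken connections
        bias += lr * error

print(weights, bias)  # the learned "knowledge" lives entirely in these numbers
```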
Learning About Patterns
In 1982 the American physicist John Hopfield declared that neurobiological systems - our brains included - are dynamical systems and can be mathematically modeled as such. Given one instance of data, such a network can memorize it. But an awful lot of the learning our brains do is incremental: given enough data, we slowly learn about the patterns in them.
An LLM (Large Language Model) is an example of generative AI. It has learned an extremely complex, ultra-high-dimensional probability distribution over words, and it is capable of sampling from this distribution, conditioned on the input sequence of words. There are other types of generative AI, but the basic idea behind them is the same: they learn a probability distribution and then sample from it, either randomly or conditioned on some input, and produce an output that looks like the training data.
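On a toy scale the same principle can be shown with a bigram model: count which word follows which in some training text, then sample the next word conditioned on the current one. This is a drastic simplification of what an LLM does, and the "training corpus" is my own invented sentence:

```python
import random
from collections import defaultdict

# Tiny "training corpus"; a real LLM learns from vast amounts of text.
text = "the cat sat on the mat and the cat slept".split()

# Learn the conditional distribution: which words follow each word, how often.
following = defaultdict(list)
for current, nxt in zip(text, text[1:]):
    following[current].append(nxt)

def sample_next(word):
    """Sample the next word conditioned on the current one."""
    return random.choice(following[word])  # repeated entries act as probabilities

# Generate a short continuation conditioned on an input word.
word, output = "the", ["the"]
for _ in range(5):
    if not following[word]:                # no known continuation
        break
    word = sample_next(word)
    output.append(word)
print(" ".join(output))                    # e.g. "the cat sat on the mat"
```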
Ananthaswamy writes "Every deep neural network today - with millions, billions, possibly trillions of weights - uses some form of gradient descent for training". Google AI explains: "Gradient descent is an iterative optimization algorithm used in machine learning to find the minimum of a function by taking steps in the opposite direction of the function's gradient. It works by repeatedly calculating the gradient of a cost function and updating the model's parameters (like weights and biases) to reduce the cost. This process is repeated until a minimum is reached, which can be a local or global minimum".
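In miniature, the rule quoted above looks like this; the one-parameter quadratic cost is an invented toy, but the update step (move against the gradient) is gradient descent itself:

```python
# Minimize the toy cost f(w) = (w - 3)**2, whose gradient is 2 * (w - 3).
def gradient(w):
    return 2 * (w - 3)

w = 0.0               # initial parameter (weight)
learning_rate = 0.1

for step in range(50):
    w -= learning_rate * gradient(w)   # step opposite to the gradient

print(w)  # converges toward the minimum at w = 3
```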
Listening To Neurons
ChatGPT & Co. are based on huge networks. A network can solve a problem or perform a function that is beyond the capability of a single molecule or a single linear pathway. In such a network each neuron is listening to every other neuron: in a network of 100 neurons, neuron 1 gets inputs from the 99 other neurons, calculates the weighted sum of those inputs and sets its output to +1 if the weighted sum is greater than zero, otherwise to -1. Of course, these networks are simulations inside a computer, so they don't really have a physical energy. But one can use a formula to calculate a number that is analogous to physical energy.
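A minimal sketch of such a network in the spirit of Hopfield, with the thresholded weighted-sum update and an energy-like quantity E = -1/2 * sum over i,j of w_ij * s_i * s_j (the 10-neuron size and random symmetric weights are my own toy choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10                                   # toy network, far smaller than 100 neurons
W = rng.normal(size=(n, n))
W = (W + W.T) / 2                        # symmetric weights, as in Hopfield networks
np.fill_diagonal(W, 0)                   # a neuron does not listen to itself

state = rng.choice([-1, 1], size=n)      # each neuron's output is +1 or -1

def energy(s):
    """The number analogous to physical energy: E = -1/2 * sum_ij w_ij s_i s_j."""
    return -0.5 * s @ W @ s

for _ in range(5):                       # update the neurons one after another
    for i in range(n):
        weighted_sum = W[i] @ state      # inputs from all the other neurons
        state[i] = 1 if weighted_sum > 0 else -1
    print(energy(state))                 # this "energy" never increases
```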
The French-American computer scientist Yann André Le Cun said: "I always thought that human engineers would not be smart enough to conceive and design an intelligent machine. It will have to basically design itself through learning. I thought learning was an essential part of intelligence."
A team member who was training the neural network went on vacation and forgot to stop the training algorithm. When he came back, he found to his astonishment that the neural network had learned a general form of addition. It's as if it had understood something deeper about the problem than simply memorizing answers for the sets of numbers on which it was being trained.
Broadly Accurate Predictions
If the network works on something for long enough, which is a very long time, many orders of magnitude longer than it takes to memorize the training set, then suddenly it figures out the deeper underlying patterns, is able to generalize and can make broadly accurate predictions about the other problems in the dataset.
These large networks are extremely adept at machine learning, meaning figuring out the patterns that exist in data (or correlations between inputs and outputs) and using the knowledge to make predictions when given new inputs.
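In miniature, that is the whole workflow: fit a model to input-output pairs, then query it on inputs it has never seen. A toy sketch with a straight-line fit (the data points are invented):

```python
import numpy as np

# Training data: inputs and outputs that roughly follow y = 2x + 1.
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# "Learning": find the line (the pattern, the correlation) that best fits the data.
slope, intercept = np.polyfit(X, y, deg=1)

# "Prediction": apply the learned pattern to new, unseen inputs.
new_inputs = np.array([5.0, 10.0])
print(slope * new_inputs + intercept)   # roughly 11 and 21
```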
Most of the mathematics is beyond me, but Ananthaswamy reintroduced me to the magic of calculus - and I learned much more. Anyway, the book helps me to follow the discussion about AI, which is getting more important every day. Recommended!