Summary by Adrian Wilkins-Caruana
In a 2007 essay called The Origin of Circuits, Alan Bellows tells a fascinating story about an experiment that Dr. Adrian Thompson conducted in the 1990s. Thompson arranged a 10 × 10 array of logic gates on a field-programmable gate array (FPGA) and tried to evolve a program encoded by these gates to reliably distinguish between signals of two different audio frequencies. For sophisticated hardware of the time (like a dedicated signal processor), this was a trivial task, but Thompson found that even such a small array contained logic configurations that could reliably detect these signals.
Aside from this main result, two other things about Thompson’s experiment are remarkable. First, the logic configuration was updated iteratively in an evolutionary manner that’s quite similar to evolutionary machine-learning algorithms. Second, upon investigating the most successful configuration, Thompson noticed a section of the array that was logically disconnected from the array’s output yet essential: without it, the array couldn’t reliably classify the signals. This means the disconnected logic section was influencing the classification through some mechanism other than digital logic, and that the evolutionary algorithm seemed to account for the effects of this mechanism as it updated the program. (It turns out some of the logically disconnected gates were influencing the voltage of other nearby gates via magnetic flux.) I highly recommend giving Bellows’s essay a read if you haven’t before.
With three decades of hindsight, we can see that Thompson’s array of logic gates was an example of a physical neural network, or PNN. PNNs are neural-like networks that aren’t built from silicon chips (though they could be) but instead from components that harness other physical phenomena, like light or sound. In a sense, PNNs offer an alternative paradigm of machine learning, one which isn’t necessarily constrained by the limitations of digital logic. That is to say, PNNs can let us harness various physical phenomena to solve problems with machine learning. Today’s summary explores training PNNs, i.e., the different ways that PNNs’ parameters can be updated to solve particular problems.
Before exploring how PNNs are trained, let me quickly describe backpropagation (BP), the workhorse of traditional, digital neural-network training. When an NN makes a prediction (sometimes called a forward pass) and we know what that prediction should be, we can calculate the error in the network's prediction. We can then propagate the error backwards through the network (sometimes called a backward pass), updating the network’s parameters so that it gives a more accurate answer the next time it runs on similar inputs.
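To make the forward and backward passes concrete, here's a minimal sketch of BP on a two-weight network, y = w2 · tanh(w1 · x). Everything in it (the network shape, learning rate, and target function) is my own illustrative choice, not from the article.

```python
import math

def forward(x, w1, w2):
    h = math.tanh(w1 * x)   # forward pass: hidden activation...
    return h, w2 * h        # ...and the network's prediction

def train(data, w1=0.5, w2=0.5, lr=0.1, epochs=200):
    for _ in range(epochs):
        for x, target in data:
            h, y = forward(x, w1, w2)
            err = y - target                      # error in the prediction
            # Backward pass: chain rule from the loss back to each weight.
            grad_w2 = err * h
            grad_w1 = err * w2 * (1 - h * h) * x
            w2 -= lr * grad_w2                    # nudge each weight so the next
            w1 -= lr * grad_w1                    # prediction is more accurate
    return w1, w2

# Fit the network to y = 2 * tanh(x), which it can represent exactly
# (w1 = 1, w2 = 2).
data = [(i / 4, 2 * math.tanh(i / 4)) for i in range(-8, 9)]
w1, w2 = train(data)
```

After training, the prediction at any point in the data's range lands close to the target, even though each update only ever nudged the weights slightly.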
One way of training a PNN, called in silico training, mirrors BP quite closely. It involves digitally simulating and optimizing the physical parameters (θ) using a digital twin, an emulation of the physical hardware inside a computer. As with BP in traditional neural networks, in silico training uses the digital model to compute gradients and update weights, which are then applied to the physical system through some other process. Because the entire training process is simulated and the physical system only receives the learned weights at the end, this approach allows rapid, cost-effective iteration and testing of PNN architectures, but it can fail if the digital twin doesn’t faithfully model the hardware.
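Here's an illustrative toy sketch of the idea (the functions and numbers are my own, not from the review): a single parameter θ is tuned entirely inside a digital twin, and the hypothetical "physical" system, which has a small unmodeled gain error, only receives the result at the end.

```python
import math

def digital_twin(theta, x):
    return math.tanh(theta * x)          # our simulation of the hardware

def physical_system(theta, x):
    # Hypothetical hardware: close to the twin, but with a small
    # unmodeled 5% gain error.
    return math.tanh(1.05 * theta * x)

def train_in_silico(data, theta=0.2, lr=0.5, epochs=300):
    for _ in range(epochs):
        for x, target in data:
            y = digital_twin(theta, x)             # simulated forward pass
            err = y - target
            theta -= lr * err * (1 - y * y) * x    # gradient step, in simulation
    return theta

# Target behaviour: tanh(2x). The twin converges to theta near 2, but the
# physical system then runs at an effective gain near 2.1 -- the twin's
# imperfection carries over into the deployed weights.
data = [(i / 4, math.tanh(i / 2)) for i in range(-8, 9)]
theta = train_in_silico(data)
```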
Another training approach, called physics-aware training (PAT), is a hybrid of in situ (meaning not simulated) and in silico methods. In PAT, the physical system handles the forward pass, while the backward pass is performed by differentiating a digital model that approximates the physical system. The forward pass therefore reflects the true physics of the hardware, while the backward pass retains the versatility of running on a computer. You still need an accurate digital twin to model the backward pass, though, and the larger and more complicated the PNN is, the harder it is to make an accurate model of it.
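A toy sketch of PAT in the same spirit (again, my own illustrative numbers): the forward pass runs on the "physical" system, which has an unmodeled gain error, while the gradient comes from differentiating a digital model evaluated at the measured output. Because the hardware itself produces the forward output, training drives the hardware's output (not the model's) toward the target.

```python
import math

def physical_system(theta, x):
    return math.tanh(1.05 * theta * x)   # hardware with an unmodeled gain error

def train_pat(data, theta=0.2, lr=0.5, epochs=300):
    for _ in range(epochs):
        for x, target in data:
            y = physical_system(theta, x)   # forward pass on the hardware
            err = y - target
            # Backward pass through the digital model tanh(theta * x),
            # evaluated at the *measured* output y.
            theta -= lr * err * (1 - y * y) * x
    return theta

# Target behaviour: tanh(2x). Training settles near theta = 2 / 1.05, so the
# physical output matches the target despite the modelling error.
data = [(i / 4, math.tanh(i / 2)) for i in range(-8, 9)]
theta = train_pat(data)
```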
Both of the methods described above are, in a sense, cheating, since they aim to train PNNs using conventional digital NN techniques. There’s good reason for this, since BP has been shown to be much more effective than other techniques for training digital NNs. Still, as we’ve seen with in silico training and PAT, it can be hard to accurately backpropagate error signals through a PNN. Are there any other ways? Here are two:
Feedback alignment (FA) is an alternative to BP in which the transposed forward-weight terms in BP’s backward pass are replaced by fixed random matrices. Remarkably, the forward weights tend to adapt so that these random feedback signals still point the updates in a useful direction. This means that, unlike BP, we don’t need to know exactly what the weights were in the forward pass to know how to update them.
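Here's an illustrative FA sketch on a tiny two-unit network (all details are my own toy construction): the output weights w2 are updated exactly as in BP, but the error reaching the hidden layer is routed through a fixed random vector B instead of through w2 itself.

```python
import math
import random

random.seed(0)

def forward(x, W1, w2):
    h = [math.tanh(w * x) for w in W1]            # hidden activations
    y = sum(wi * hi for wi, hi in zip(w2, h))     # linear output
    return h, y

def train_fa(data, lr=0.05, epochs=300):
    W1 = [random.uniform(-1, 1) for _ in range(2)]
    w2 = [random.uniform(-1, 1) for _ in range(2)]
    B = [random.uniform(-1, 1) for _ in range(2)]  # fixed random feedback
    for _ in range(epochs):
        for x, target in data:
            h, y = forward(x, W1, w2)
            err = y - target
            # Output weights: same update as in BP.
            w2 = [wi - lr * err * hi for wi, hi in zip(w2, h)]
            # Hidden weights: the error is fed back through B, so we never
            # need to read out the forward weights w2 to compute this step.
            W1 = [w - lr * err * b * (1 - hi * hi) * x
                  for w, b, hi in zip(W1, B, h)]
    return W1, w2

data = [(i / 4, math.tanh(i / 4)) for i in range(-8, 9)]
W1, w2 = train_fa(data)
```

Despite the random feedback path, the network still fits the target well, which is the point: the exact backward weights don't need to be known (or even physically readable).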
Physical local learning uses a concept called local learning to train the weights in each block or layer independently (i.e., without any end-to-end BP). There are many ways local learning can be achieved, but they typically define some objective function on the layer’s activations, one that indicates whether the activations are doing something useful, like compressing the input or providing useful information for the next layer/block. Geoffrey Hinton’s forward-forward technique is one example of local learning, and one study has already used it to train an optical NN with a contrastive-based approach.
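Here's a heavily simplified, forward-forward-style sketch of a local rule (my own toy construction, not the article's method): a single layer raises its "goodness" (the sum of its squared activations) for real inputs and lowers it for corrupted ones, using only its own activations. No error signal ever crosses a layer boundary.

```python
import math
import random

random.seed(1)

def activations(W, x):
    return [math.tanh(sum(wij * xj for wij, xj in zip(wi, x))) for wi in W]

def goodness(W, x):
    # Local objective: sum of squared activations of this layer only.
    return sum(h * h for h in activations(W, x))

def train_local(positives, negatives, lr=0.1, epochs=200):
    W = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]
    for _ in range(epochs):
        for x, sign in [(p, +1) for p in positives] + [(n, -1) for n in negatives]:
            h = activations(W, x)
            for i in range(2):
                for j in range(2):
                    # d(goodness)/dW[i][j] = 2 h_i (1 - h_i^2) x_j;
                    # ascend for positives, descend for negatives.
                    W[i][j] += sign * lr * 2 * h[i] * (1 - h[i] ** 2) * x[j]
    return W

positives = [[1.0, 1.0], [0.9, 1.1]]    # "real" pattern: correlated inputs
negatives = [[1.0, -1.0], [-0.9, 1.1]]  # "corrupted": anti-correlated
W = train_local(positives, negatives)
```

After training, the layer's goodness is higher for the real pattern than for the corrupted one, which is the only signal a subsequent layer (or classifier) would need.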
These methods aren’t quite as effective as BP, though, so other studies try to reproduce BP without the need for a digital twin; they essentially encode the BP algorithm directly into the physical system. For this to work, the system needs to exploit a physical process that is linear and reciprocal, i.e., one in which a signal propagates from point B to point A exactly as it would from A to B, so that sending error signals backward through the same hardware performs the transposed operations that BP requires. Two examples are waves propagating through a linear medium in a photonic system, and a peculiar electrical device called a memristor crossbar array. There are also other techniques for in situ training, such as one called continual learning, which updates the model’s parameters as it’s used.
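The role reciprocity plays can be shown with a toy linear system (my own illustrative numbers): in a reciprocal medium, a signal injected at the output ports and propagated backward experiences the transpose of the forward transmission matrix, which is exactly the error term the backward pass of BP needs, with no digital twin involved.

```python
W = [[0.8, 0.3],
     [0.1, 0.5]]   # forward transmission: input ports -> output ports

def forward_pass(v):
    # Physical propagation from the input side to the output side: W @ v.
    return [sum(W[i][j] * v[j] for j in range(2)) for i in range(2)]

def backward_pass(delta):
    # Physical propagation in reverse. Reciprocity guarantees the medium
    # applies the transpose, so this readout is BP's error term W^T @ delta.
    return [sum(W[j][i] * delta[j] for j in range(2)) for i in range(2)]

x = [1.0, -0.5]
y = forward_pass(x)                  # [0.65, -0.15]
delta = [0.2, -0.1]                  # error measured at the output ports
grad_signal = backward_pass(delta)   # [0.15, 0.01] == W^T @ delta
```

The same piece of hardware thus performs both halves of BP, just driven from opposite ends.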
At the end of their review, the authors arrived at three qualities that would be great for a PNN to have, although nothing meets all three (yet):
They don’t depend on the model used.
They give a speed or efficiency advantage over regular NNs.
They are more resilient to noise.
But a PNN doesn’t need to have all three of these qualities to be useful. It just means a bit more effort might need to go into developing them, since we can’t yet say for certain things like, “Oh, this particular learning algorithm works best for this kind of PNN or this kind of application.” Given the current pace of AI developments, the possibility of realizing all three at once could be on the horizon, which could open the doors to an entirely new domain of AI.