We argue that kernel-based learning algorithms and, more generally, linear-in-the-parameters learning are more biologically plausible than has been supposed, and that they can be combined with neural-network ideas to gain the advantages of both approaches.
1. Linear-in-the-parameters learning is fast, but it seems to waste neurons because it does not permit as high a ratio of adjustable synapses to cells as nonlinear learning does. We show that the ratios become comparable as the number of output variables increases; that is, linear learning becomes plausible when one considers that a brain has to learn many different, high-dimensional tasks.
2. Fast linear algorithms such as recursive least squares (RLS) involve computations with large matrices, but we show that the matrices need not be represented in transmissible form (in cell firing). They can instead be stored in synapses, which are much more plentiful than cells in the brain, so there is, plausibly, enough storage space for these matrices.
3. Linear algorithms train just one layer of synapses, but we show how, with appropriate internal models, the process can be repeated at different stations in series to obtain supervised learning at many different layers.
4. We show that it is possible to back-propagate through kernels without the weight transport that is the implausible aspect of backprop, and so to obtain more effective feature-shaping than is normally possible with kernel methods.
5. We show that linear learning does not imply that most, or even necessarily any, neurons stay inside their linear ranges.
6. More speculatively, we point out that aspects of kernel-network learning agree with certain of our intuitions about learning and memory, e.g. that at least some kinds of memory consist largely of specific experiences, not blends, and that when an experience is repeated over and over, we remember later instances less well.
Supported by CIHR.
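For readers unfamiliar with RLS, the following is a minimal sketch of textbook recursive least squares applied to fixed Gaussian kernel features. It is not taken from this work: the function names (`rbf_features`, `rls_update`), the toy data, the kernel width, and the regularizer `delta` are all assumptions chosen for illustration. The point it illustrates is only that the large matrix `P` depends on the features alone, so a single `P` serves every output variable, while the weight matrix `W` gains one column per output.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian kernel features, one per center (hypothetical feature choice)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def rls_update(W, P, phi, y):
    """One textbook RLS step (no forgetting factor).

    W:   (n_features, n_outputs) weights, linear in the parameters
    P:   (n_features, n_features) inverse-correlation estimate, shared by all outputs
    phi: (n_features,) feature vector for the current example
    y:   (n_outputs,) target vector for the current example
    """
    P_phi = P @ phi
    gain = P_phi / (1.0 + phi @ P_phi)   # Kalman-style gain vector
    err = y - W.T @ phi                  # prediction error before the update
    W = W + np.outer(gain, err)          # every output column uses the same gain
    P = P - np.outer(gain, P_phi)        # rank-one downdate (P is symmetric)
    return W, P

# Toy problem (assumed for illustration): 2 inputs, 3 output variables.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.stack([np.sin(3 * X[:, 0]), X[:, 0] * X[:, 1], np.cos(2 * X[:, 1])], axis=1)

centers = rng.uniform(-1.0, 1.0, size=(20, 2))
Phi = rbf_features(X, centers, width=0.3)

delta = 1e-2                              # small ridge term; sets the initial P
W = np.zeros((Phi.shape[1], Y.shape[1]))
P = np.eye(Phi.shape[1]) / delta
for phi, y in zip(Phi, Y):
    W, P = rls_update(W, P, phi, y)

# Batch ridge regression with the same regularizer, for comparison.
W_batch = np.linalg.solve(Phi.T @ Phi + delta * np.eye(Phi.shape[1]), Phi.T @ Y)
```

With zero initial weights and `P` initialized to the identity divided by `delta`, the recursive estimate should agree with the batch ridge solution up to rounding error, which is a convenient sanity check on any RLS implementation.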