A year ago I posted about dataflow programming and linear models of computation:

http://anhinga-anhinga.livejournal.com/82757.html

It turns out that those dataflow matrix machines are a fairly powerful generalization of recurrent neural networks.

The main feature of dataflow matrix machines (DMMs) is vector neurons. While recurrent neural networks process streams of numbers, dataflow matrix machines process streams of representations of arbitrary vectors (linear streams).

Another important feature of DMMs is that neurons of arbitrary input and output arity are allowed, and a rich set of built-in transformations of linear streams is provided.

Recurrent neural networks are Turing-complete, but they constitute an esoteric programming language rather than a convenient general-purpose programming platform. DMMs provide a formalism friendly to handling sparse vectors, conditionals, and more, and there are indications that DMMs will grow into a powerful general-purpose programming platform, in addition to being a convenient machine learning platform.

In this context, it is possible to represent large classes of programs by matrices of real numbers, which allows us to modify programs in continuous fashion and to synthesize programs by synthesizing matrices of real numbers.
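For example, here is a minimal sketch in plain Clojure (with made-up names rather than the library's API) of how a tiny "program" reduces to a matrix of weights, so that interpolating between two matrices interpolates between two programs:

    ;; A "program" is just a nested map of connection weights.
    (def program-a {[:accum :x] {[:input :x] 1.0}})   ; pass the input through
    (def program-b {[:accum :x] {[:input :x] 0.0}})   ; block the input

    (defn blend
      "Elementwise (1 - alpha)*A + alpha*B over the nested weight maps."
      [alpha m-a m-b]
      (merge-with (fn [row-a row-b]
                    (merge-with (fn [w-a w-b]
                                  (+ (* (- 1.0 alpha) w-a) (* alpha w-b)))
                                row-a row-b))
                  m-a m-b))

    (blend 0.25 program-a program-b)
    ;; => {[:accum :x] {[:input :x] 0.75}}

A quarter of the way from program-a to program-b still lets three quarters of the signal through; nothing discrete happens along the way.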

Further details and preprints

Self-referential mechanism: Consider a linear stream of matrices describing the connectivity pattern and weights of a DMM. Select a dedicated neuron Self emitting such a stream on its output, and use the latest value of that stream as the current network matrix (matrix describing the connectivity pattern and weights of our DMM). A typical Self neuron would work as an accumulator taking additive updates from other neurons in the network. This mechanism enables reflection facilities and powerful dynamic self-modification facilities. In particular, the networks in question have facilities for dynamic expansion.
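A minimal sketch of the accumulator behavior of such a Self neuron (illustrative names and representation, not the actual implementation): on each cycle it folds the additive updates coming from other neurons into its current value, and that value serves as the network matrix on the next cycle.

    ;; The network matrix is represented here as a nested map of weights.
    (defn add-matrices [m1 m2]
      (merge-with (partial merge-with +) m1 m2))

    (defn self-step
      "One cycle of Self: accumulate additive updates into the current matrix."
      [current-matrix updates]
      (reduce add-matrices current-matrix updates))

    ;; An update can strengthen an existing connection or create a new one,
    ;; which is how the network expands dynamically.
    (self-step {[:a :out] {[:b :in] 0.5}}
               [{[:a :out] {[:b :in] 0.1}}
                {[:c :out] {[:b :in] 1.0}}])
    ;; => {[:a :out] {[:b :in] 0.6}, [:c :out] {[:b :in] 1.0}}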

The recent DMM-related preprints by our group:

https://arxiv.org/abs/1603.09002

https://arxiv.org/abs/1605.05296

https://arxiv.org/abs/1606.09470

https://arxiv.org/abs/1610.00831

Modern recurrent neural networks with good machine learning properties such as LSTM and Gated Recurrent Unit networks are naturally understood in the DMM framework as networks having linear and bilinear neurons in addition to neurons with more traditional sigmoid activation functions.

Our new open-source effort

The new open-source implementation of core DMM primitives in Clojure:

https://github.com/jsa-aerial/DMM

This open-source implementation features a new vector space of recurrent maps (space of "mixed rank tensors"), which allows us to represent a large variety of linear streams as streams of recurrent maps. The vector space of recurrent maps also makes it possible to express variadic neurons as neurons having just one argument.

Therefore a type of neuron is simply a function transforming recurrent maps, which is a great simplification compared to the formalism presented in the preprints above. See the design notes within this open-source implementation for further details.
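Here is a minimal sketch of the idea in plain Clojure (simplified; the repository's design notes describe the actual representation): recurrent maps are nested maps with numbers at the leaves, they add elementwise like vectors, and a "variadic" neuron becomes a one-argument function because all of its inputs arrive packaged inside a single recurrent map.

    ;; Recurrent maps as nested Clojure maps with numeric leaves.
    (defn rec-map-sum
      "Elementwise sum of two recurrent maps (the vector space addition)."
      [a b]
      (merge-with (fn [x y]
                    (if (and (map? x) (map? y)) (rec-map-sum x y) (+ x y)))
                  a b))

    (rec-map-sum {:x 1.0 :nested {:y 2.0}}
                 {:x 0.5 :nested {:y -1.0 :z 3.0}})
    ;; => {:x 1.5, :nested {:y 1.0, :z 3.0}}

    ;; A "two-input" neuron written as a one-argument function:
    ;; both inputs live under distinct keys of one recurrent map.
    (defn gated-sum
      "Multiply the sum of the :signal leaves by the :gate value."
      [{:keys [signal gate]}]
      (* gate (reduce + (vals signal))))

    (gated-sum {:signal {:x 2.0 :y 3.0} :gate 0.5})  ;; => 2.5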

Date: 2016-12-28 07:13 pm (UTC)
From: [identity profile] anhinga-anhinga.livejournal.com
I am trying to keep two views of this subject in mind at the same time: neural nets as a computational platform and neural nets as a machine learning platform.

The activation functions in the Wikipedia article I linked above all have one argument (and one result). What if we allow two arguments? For example, what if we allow a neuron to accumulate two linear combinations on its two inputs during the "down movement", and to multiply them together during the "up movement"?

It turns out that this is very powerful. For example, if we think of one of those inputs as the "main signal" and of the other as the "modulating signal", then what we get is a fuzzy conditional. By setting the modulating signal to zero, we can turn off parts of the network and redirect the signal flow in the network. By setting the modulating signal to one, we just let the signal through. By setting it to something between 0 and 1, we can attenuate the signal, and by setting it above 1 or below 0, we can amplify or negate the signal.
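Here is a minimal sketch of such a two-argument neuron in plain Clojure (illustrative, not the DMM library API): the "down movement" accumulates two linear combinations, and the "up movement" multiplies them.

    (defn dot [weights xs] (reduce + (map * weights xs)))

    (defn multiplicative-neuron
      "Accumulate a 'main' and a 'modulating' linear combination, then multiply."
      [main-weights mod-weights inputs]
      (* (dot main-weights inputs) (dot mod-weights inputs)))

    ;; With mod-weights selecting a single "gate" input, this acts as a
    ;; fuzzy conditional on a main signal of 3.0:
    (for [gate [0.0 0.5 1.0 2.0 -1.0]]
      (multiplicative-neuron [1 0] [0 1] [3.0 gate]))
    ;; => (0.0 1.5 3.0 6.0 -3.0)  ; block, attenuate, pass, amplify, negate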

This has been understood for decades. In particular, the earliest proof of Turing completeness of RNNs known to me, from 1987, features multiplication neurons prominently:

http://www.demo.cs.brandeis.edu/papers/neuring.pdf

But then it was mostly forgotten. In the 1990s, people managed to prove Turing completeness of RNNs without multi-argument neurons.

Of course, this does illustrate that theoretical Turing completeness and practical convenience of programming are not the same thing. (There was a recent talk by Edward Grefenstette of DeepMind, which I hope to discuss at some later point, arguing that the practical power of traditional RNNs is more like the power of finite state machines, their theoretical Turing completeness notwithstanding. One of the objectives of the DMM line of research is to boost the practical convenience of recurrent neural networks as a programming platform.)

The original RNNs had mediocre machine learning properties because of the problem of vanishing gradients. The first architecture which overcame this problem was LSTM in 1997:

https://en.wikipedia.org/wiki/Long_short-term_memory

LSTM and other architectures of this family eliminate the vanishing gradient problem by introducing "memory" and "gates" (multiplicative masks) as additional mechanisms. However, a more straightforward way to think about this is to introduce neurons with linear activation functions for memory and bilinear neurons (the multiplication neurons I am discussing here) for gates (for good references, see Appendix C of the last preprint of this series, https://arxiv.org/abs/1610.00831 ).
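To make that reformulation concrete, here is a minimal scalar sketch (illustrative only, not the full LSTM equations): the memory cell is a neuron with a linear activation accumulating gated contributions, and each gate is a bilinear (multiplication) neuron.

    (defn gate [g x] (* g x))                         ; bilinear neuron

    (defn memory-step
      "Linear neuron: new memory = forget-gated old memory + input-gated candidate."
      [memory forget-g input-g candidate]
      (+ (gate forget-g memory) (gate input-g candidate)))

    ;; forget-g = 1 and input-g = 0 carry the memory through unchanged,
    ;; which is what keeps gradients from vanishing over long time spans.
    (memory-step 5.0 1.0 0.0 0.7)   ;; => 5.0
    (memory-step 5.0 0.5 1.0 0.7)   ;; => 3.2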

So, here is the story. Neurons with two arguments are back; they are necessary for modern RNN architectures such as LSTM and Gated Recurrent Units to work. But the way these architectures are usually presented avoids saying explicitly that there are neurons with a two-argument activation function (namely, multiplication) here; instead, people talk about these things as "extra mechanisms added to RNNs", which makes them much more difficult to understand.
Edited Date: 2016-12-28 07:14 pm (UTC)
