anhinga_anhinga wrote on 2016-12-27 01:29 am
Dataflow matrix machines as generalized recurrent neural networks
A year ago I posted about dataflow programming and linear models of computation:
http://anhinga-anhinga.livejournal.com/82757.html
It turns out that those dataflow matrix machines are a fairly powerful generalization of recurrent neural networks.
The main feature of dataflow matrix machines (DMMs) is vector neurons. While recurrent neural networks process streams of numbers, dataflow matrix machines process streams of representations of arbitrary vectors (linear streams).
Another important feature of DMMs is that neurons of arbitrary input and output arity are allowed, and a rich set of built-in transformations of linear streams is provided.
Recurrent neural networks are Turing-complete, but as a programming language they are esoteric and not a convenient general-purpose programming platform. DMMs provide a formalism friendly to handling sparse vectors, conditionals, and more, and there are indications that DMMs will grow into a powerful general-purpose programming platform, in addition to being a convenient machine learning platform.
In this context, it is possible to represent large classes of programs by matrices of real numbers, which allows us to modify programs in continuous fashion and to synthesize programs by synthesizing matrices of real numbers.
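To make the "programs as matrices" idea concrete, here is a minimal sketch in plain Clojure (not the API of the implementation linked below; the neuron names and weights are made up) of a tiny network in the degenerate case where every linear stream is a stream of single numbers:

```clojure
;; Minimal sketch, not the library's API: a tiny DMM-style network in the
;; degenerate case where each linear stream carries single numbers, so it
;; behaves like an ordinary recurrent network.  The "program" is the weight
;; matrix w: (get-in w [to from]) is the weight from the output of neuron
;; `from` to the input of neuron `to`.

(def activation {:n1 (fn [x] (max 0.0 x))      ; ReLU neuron
                 :n2 (fn [x] (Math/tanh x))})  ; tanh neuron

(def w {:n1 {:n1 0.5, :n2 -1.0}
        :n2 {:n1 1.0}})

(defn down-movement
  "Each neuron's input is a linear combination of the current outputs."
  [w outputs]
  (into {} (for [[to row] w]
             [to (reduce + (for [[from weight] row]
                             (* weight (get outputs from 0.0))))])))

(defn up-movement
  "Apply each neuron's built-in transformation to its input."
  [activation inputs]
  (into {} (for [[k f] activation] [k (f (get inputs k 0.0))])))

(defn step [outputs]
  (up-movement activation (down-movement w outputs)))

;; Iterating `step` runs the network; editing the numbers in `w` modifies
;; the program continuously.
(take 4 (iterate step {:n1 1.0, :n2 0.0}))
```

In the full formalism the values flowing through the network are arbitrary vectors rather than single numbers, but the division of labor between the matrix and the built-in transformations stays the same.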
Further details and preprints
Self-referential mechanism: Consider a linear stream of matrices describing the connectivity pattern and weights of a DMM. Select a dedicated neuron Self emitting such a stream on its output, and use the latest value of that stream as the current network matrix (the matrix describing the connectivity pattern and weights of our DMM). A typical Self neuron works as an accumulator taking additive updates from other neurons in the network. This mechanism enables reflection and powerful dynamic self-modification; in particular, the networks in question can expand themselves dynamically.
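Here is a sketch of the accumulator behavior of such a Self neuron (illustrative names, with nested maps standing in for the network matrix; this is not the exact code of the implementation linked below):

```clojure
;; Sketch of a Self-style accumulator: the network matrix is a nested map
;; {to-neuron {from-neuron weight}}, and on each cycle Self adds the updates
;; emitted by other neurons to its accumulated value.  The latest accumulated
;; value is then used as the current network matrix.

(defn add-matrices
  "Elementwise sum of two nested {to {from weight}} maps."
  [m1 m2]
  (merge-with (fn [row1 row2] (merge-with + row1 row2)) m1 m2))

(defn self-step
  "New matrix = previous matrix + sum of this cycle's additive updates."
  [previous-matrix updates]
  (reduce add-matrices previous-matrix updates))

;; An update can strengthen an existing connection or recruit a new neuron,
;; which is how dynamic expansion looks at the matrix level:
(self-step {:n1 {:n2 1.0}}
           [{:n1 {:n2 0.25}}      ; strengthen an existing connection
            {:n3 {:n1 -0.5}}])    ; connect a previously unused neuron :n3
;; => {:n1 {:n2 1.25}, :n3 {:n1 -0.5}}
```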
The recent DMM-related preprints by our group:
https://arxiv.org/abs/1603.09002
https://arxiv.org/abs/1605.05296
https://arxiv.org/abs/1606.09470
https://arxiv.org/abs/1610.00831
Modern recurrent neural networks with good machine learning properties such as LSTM and Gated Recurrent Unit networks are naturally understood in the DMM framework as networks having linear and bilinear neurons in addition to neurons with more traditional sigmoid activation functions.
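For example, the coordinate-wise gating at the heart of LSTM and GRU cells is simply a bilinear neuron in this view; a schematic sketch (illustrative, not actual LSTM code):

```clojure
;; Schematic sketch: a "bilinear" neuron multiplies two linear streams
;; coordinate-wise, which is what an LSTM/GRU gate does to its candidate
;; values; a "linear" neuron just passes its accumulated input through.

(defn linear-neuron
  "Identity activation: the accumulated linear combination passes through."
  [v] v)

(defn bilinear-neuron
  "Coordinate-wise product of a gate vector and a signal vector."
  [gate signal]
  (mapv * gate signal))

(bilinear-neuron [0.9 0.1 0.0] [1.0 -2.0 3.0])
;; => [0.9 -0.2 0.0]
```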
Our new open source effort
The new open-source implementation of core DMM primitives in Clojure:
https://github.com/jsa-aerial/DMM
This open-source implementation features a new vector space of recurrent maps (space of "mixed rank tensors"), which allows us to represent a large variety of linear streams as streams of recurrent maps. The vector space of recurrent maps also makes it possible to express variadic neurons as neurons having just one argument.
Therefore a type of neuron is simply a function transforming recurrent maps, which is a great simplification compared to the formalism presented in the preprints above. See the design notes within this open-source implementation for further details.
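A sketch of the idea (illustrative function names, not the repository's exact code): recurrent maps are nested maps with numbers at the leaves, addition and scalar multiplication act on the leaves, and a "variadic" neuron is just a function of one recurrent map whose top-level keys play the role of its arguments.

```clojure
;; Sketch of recurrent maps as a vector space (illustrative, not the
;; repository's exact functions): nested maps with numbers at the leaves.

(defn rm-add
  "Add two recurrent maps: numbers add, maps merge recursively."
  [a b]
  (cond (and (map? a) (map? b)) (merge-with rm-add a b)
        (number? a)             (+ a (or b 0))
        :else                   b))

(defn rm-scale
  "Multiply every leaf of a recurrent map by the scalar c."
  [c m]
  (if (map? m)
    (into {} (for [[k v] m] [k (rm-scale c v)]))
    (* c m)))

;; A "variadic" neuron becomes a function of a single recurrent map whose
;; top-level keys name its arguments:
(defn sum-of-args [input]
  (reduce rm-add {} (vals input)))

(rm-add {:x {:a 1.0} :y 2.0} {:x {:a 0.5 :b 3.0}})
;; => {:x {:a 1.5, :b 3.0}, :y 2.0}

(sum-of-args {:arg1 {:a 1.0} :arg2 {:a 2.0 :b 1.0}})
;; => {:a 3.0, :b 1.0}
```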
no subject
On the one hand, the requirements depend on which training methods are to be applicable. For example, people occasionally consider the step function (y = 1 if x > 0 else 0), but its derivative is 0 everywhere it is defined, so if one wants to do gradient descent, this is not a good neuron function to have.
At the same time, one can consider an integral of the step function (ReLU: y = max(0, x)). People ignored it for a long time because of the discontinuity in the derivative, but it turns out to work fine in gradient methods, and it is actually one of the best neuron functions available.
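A tiny illustration of the contrast (plain Clojure, not library code):

```clojure
;; The step function has zero derivative everywhere the derivative is defined,
;; while ReLU has a useful nonzero derivative on half of its domain.

(defn step-fn   [x] (if (> x 0) 1.0 0.0))
(defn step-grad [_] 0.0)                     ; zero wherever it is defined

(defn relu      [x] (max 0.0 x))
(defn relu-grad [x] (if (> x 0) 1.0 0.0))    ; nonzero for positive inputs

(map (juxt relu relu-grad) [-1.0 0.5 2.0])
;; => ([0.0 0.0] [0.5 1.0] [2.0 1.0])
```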
***
A priori, we don't impose many requirements. When a new neuron is recruited into the active part of the network, this happens because Self has just changed the network matrix on the up movement. So the next thing that happens is that the inputs of that newly recruited neuron are computed on the down movement. Hence we don't even need to impose any particular requirements at zero.
In fact, we typically think about input neurons as neurons which ignore their own input (if any) and just inject their output into the network. So this would be an example of a neuron which is not trying to map zeros to zeros.
They are usually stateless, although when we described the general formalism in the second of those preprints, we specifically made sure to be general enough to allow state, if desired.
We found stochastic neurons quite interesting. E.g. all these funky self-organizing patterns in the videos linked from the first comment in http://anhinga-anhinga.livejournal.com/82757.html are obtained using neurons which are "randomized propagators" (they usually pass their input through unchanged, but with a small probability they output zero instead; this turns out to be enough for all the complexity in those videos).
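A sketch of such a neuron (illustrative, not the code actually used for those videos):

```clojure
;; Randomized propagator: pass the input through unchanged, but with a small
;; probability p emit zero instead.

(defn randomized-propagator [p input]
  (if (< (rand) p) 0.0 input))

;; e.g. (randomized-propagator 0.05 x) drops x roughly 5% of the time.
```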
no subject
This is one of the most interesting talks I've ever heard in my life:
http://anhinga-drafts.livejournal.com/29954.html