anhinga_anhinga wrote, 2016-12-27 01:29 am
Dataflow matrix machines as generalized recurrent neural networks
A year ago I posted about dataflow programming and linear models of computation:
http://anhinga-anhinga.livejournal.com/82757.html
It turns out that those dataflow matrix machines are a fairly powerful generalization of recurrent neural networks.
The main feature of dataflow matrix machines (DMMs) is the use of vector neurons. While recurrent neural networks process streams of numbers, dataflow matrix machines process streams of representations of arbitrary vectors (linear streams).
Another important feature of DMMs is that neurons of arbitrary input and output arity are allowed, and a rich set of built-in transformations of linear streams is provided.
Recurrent neural networks are Turing-complete, but as a programming language they are an esoteric one, not a convenient general-purpose programming platform. DMMs provide a formalism friendly to sparse vectors, conditionals, and more, and there are indications that DMMs will grow into a powerful general-purpose programming platform, in addition to being a convenient machine learning platform.
In this context, it is possible to represent large classes of programs by matrices of real numbers, which allows us to modify programs in continuous fashion and to synthesize programs by synthesizing matrices of real numbers.
Further details and preprints
Self-referential mechanism: Consider a linear stream of matrices describing the connectivity pattern and weights of a DMM. Select a dedicated neuron Self emitting such a stream on its output, and use the latest value of that stream as the current network matrix (the matrix describing the connectivity pattern and weights of our DMM). A typical Self neuron works as an accumulator taking additive updates from the other neurons in the network. This mechanism enables reflection and powerful dynamic self-modification; in particular, the networks in question can expand themselves dynamically.
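A minimal sketch of such an accumulating Self neuron, assuming the network matrix is represented as a nested map from input names to output names to weights (the representation and function names here are illustrative, not the API of the implementation linked below):

```clojure
;; Illustrative sketch (not the actual DMM implementation): the Self
;; neuron adds the incoming update to the current network matrix on
;; every cycle and emits the result, which becomes the new matrix.
;; Matrices are nested maps of the form {input {output weight}}.

(defn add-matrices
  "Elementwise sum of two matrices represented as nested maps."
  [m1 m2]
  (merge-with #(merge-with + %1 %2) m1 m2))

(defn self-neuron-step
  "One accumulator step: current matrix plus an additive update."
  [current-matrix update-matrix]
  (add-matrices current-matrix update-matrix))

;; Example: an update adding a new connection with weight 0.5.
(self-neuron-step {:input-1 {:output-1 1.0}}
                  {:input-1 {:output-2 0.5}})
;; => {:input-1 {:output-1 1.0, :output-2 0.5}}
```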
The recent DMM-related preprints by our group:
https://arxiv.org/abs/1603.09002
https://arxiv.org/abs/1605.05296
https://arxiv.org/abs/1606.09470
https://arxiv.org/abs/1610.00831
Modern recurrent neural networks with good machine learning properties such as LSTM and Gated Recurrent Unit networks are naturally understood in the DMM framework as networks having linear and bilinear neurons in addition to neurons with more traditional sigmoid activation functions.
Our new open source effort
The new open-source implementation of core DMM primitives in Clojure:
https://github.com/jsa-aerial/DMM
This open-source implementation features a new vector space of recurrent maps (space of "mixed rank tensors"), which allows us to represent a large variety of linear streams as streams of recurrent maps. The vector space of recurrent maps also makes it possible to express variadic neurons as neurons having just one argument.
Therefore a type of neuron is simply a function transforming recurrent maps, which is a great simplification compared to the formalism presented in the preprints above. See the design notes within this open-source implementation for further details.
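A rough sketch of the idea of recurrent maps and of one-argument (formerly variadic) neurons, under the assumption that recurrent maps are nested Clojure maps with numbers at the leaves; the actual definitions live in the repository's design notes:

```clojure
;; Illustrative sketch, not the repository's actual API: recurrent maps
;; are nested maps with numbers at the leaves; they form a vector space
;; under elementwise addition and scalar multiplication.

(defn rec-map-add
  "Elementwise sum of two recurrent maps of compatible shape."
  [x y]
  (if (and (map? x) (map? y))
    (merge-with rec-map-add x y)
    (+ x y)))

(defn rec-map-scale
  "Multiply a recurrent map by a scalar."
  [a x]
  (if (map? x)
    (into {} (map (fn [[k v]] [k (rec-map-scale a v)]) x))
    (* a x)))

;; A "variadic" neuron becomes a one-argument function: all of its
;; inputs are packed into a single recurrent map.
(defn gate-neuron
  "Multiplies the :signal input by the :modulation input."
  [{:keys [signal modulation]}]
  {:result (* signal modulation)})

(rec-map-add {:a {:b 1.0}} {:a {:b 0.5} :c 2.0})  ; => {:a {:b 1.5}, :c 2.0}
(gate-neuron {:signal 2.0 :modulation 0.5})       ; => {:result 1.0}
```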
no subject
In any case, I'll be happy to post further details/comments on DMMs and/or recurrent neural nets, if anyone wants them.
And, on another note, I'll be happy to post Clojure notes, if anyone wants to see those. If anyone is potentially interested in learning Clojure (a JVM-based modern Lisp) or in improving their Clojure skills, those notes might come in handy...
no subject
First of all, one should think about recurrent neural networks (RNNs) and about DMMs as "two-stroke engines". On the "up movement", the "activation functions" built into the neurons are applied to the neuron inputs, producing the neuron outputs. On the "down movement", the neuron inputs are recomputed from the neuron outputs using the network matrix. This cycle of "up movement"/"down movement" is repeated indefinitely.
The network matrix defines the topology and the weights of the network. The columns of the network matrix are indexed by the neuron outputs, and the rows of the network matrix are indexed by the neuron inputs. If a particular weight is zero, this means that the corresponding output and input are not connected, so the sparsity structure of the matrix defines the topology of the network.
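A minimal sketch of this two-stroke cycle for an ordinary scalar RNN, with the network matrix represented as a nested map from input names to output names to weights (the names and the representation are made up for illustration):

```clojure
;; Illustrative two-stroke cycle for a scalar RNN.
;; inputs/outputs: maps from names to numbers;
;; matrix: {input-name {output-name weight}}.

(defn up-movement
  "Apply each neuron's activation function to its input."
  [activations inputs]
  (into {} (map (fn [[k x]] [k ((activations k) x)]) inputs)))

(defn down-movement
  "Recompute each input as a linear combination of the outputs."
  [matrix outputs]
  (into {} (map (fn [[input row]]
                  [input (reduce + 0.0
                                 (map (fn [[output w]]
                                        (* w (get outputs output 0.0)))
                                      row))])
                matrix)))

(defn two-stroke-step
  "One full cycle: up movement followed by down movement."
  [activations matrix inputs]
  (down-movement matrix (up-movement activations inputs)))

;; Example: a single tanh neuron feeding back into itself with weight 0.5.
(two-stroke-step {:n #(Math/tanh %)} {:n {:n 0.5}} {:n 1.0})
;; => {:n 0.380797...}  (that is, 0.5 * tanh 1.0)
```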
It used to be traditional to have the same activation function for all neurons in the network, usually some kind of sigmoid, but these days it is commonplace to mix a few different kinds of activation functions within the same network. Besides sigmoid functions such as the logistic function ("soft step") and the hyperbolic tangent, a particularly popular function in recent years is ReLU (the rectifier, y = max(0, x)). The table in the following Wikipedia article compares a number of activation functions people are trying:
https://en.wikipedia.org/wiki/Activation_function
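For reference, a sketch of the three single-argument activation functions mentioned above, written out in Clojure:

```clojure
;; The three single-argument activation functions mentioned above.
(defn logistic [x] (/ 1.0 (+ 1.0 (Math/exp (- x)))))  ; "soft step"
(defn tanh-fn  [x] (Math/tanh x))                     ; hyperbolic tangent
(defn relu     [x] (max 0.0 x))                       ; rectifier
```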
no subject
The activation functions in the Wikipedia article linked above all have one argument (and one result). What if we allow two arguments? For example, what if we allow a neuron to accumulate two linear combinations on its two inputs during the "down movement", and to multiply them together during the "up movement"?
It turns out that this is very powerful. For example, if we think of one of those inputs as the "main signal" and of the other as the "modulating signal", then what we get is a fuzzy conditional. By setting the modulating signal to zero, we can turn off parts of the network and redirect the signal flow in the network. By setting the modulating signal to one, we just let the signal through. By setting it to something between 0 and 1, we can attenuate the signal, and by setting it above 1 or below 0, we can amplify or negate the signal.
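A sketch of such a two-argument multiplication neuron used as a fuzzy conditional (the function name is illustrative):

```clojure
;; A two-argument "multiplication neuron": the down movement accumulates
;; two linear combinations (the main and the modulating signal), and the
;; up movement multiplies them together.
(defn multiplication-neuron [main-signal modulating-signal]
  (* main-signal modulating-signal))

;; Used as a fuzzy conditional on a main signal of 3.0:
(multiplication-neuron 3.0 0.0)   ; => 0.0   (switch the signal off)
(multiplication-neuron 3.0 1.0)   ; => 3.0   (let the signal through)
(multiplication-neuron 3.0 0.5)   ; => 1.5   (attenuate)
(multiplication-neuron 3.0 -1.0)  ; => -3.0  (negate)
```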
This has been understood for decades. In particular, the first proof of Turing completeness of RNNs known to me, from 1987, features multiplication neurons prominently:
http://www.demo.cs.brandeis.edu/papers/neuring.pdf
But then it was mostly forgotten. In the 1990s people managed to prove Turing completeness of RNNs without using neurons with multiple arguments.
Of course, this does illustrate that theoretical Turing completeness and practical convenience of programming are not the same thing. (There was a recent talk by Edward Grefenstette of DeepMind, which I hope to discuss at some later point, arguing that the practical power of traditional RNNs is more like that of finite state machines, their theoretical Turing completeness notwithstanding. One of the objectives of the DMM line of research is to boost the practical convenience of recurrent neural networks as a programming platform.)
The original RNNs had mediocre machine learning properties because of the problem of vanishing gradients. The first architecture which overcame this problem was LSTM in 1997:
https://en.wikipedia.org/wiki/Long_short-term_memory
LSTM and other architectures of this family eliminate the problem of vanishing gradient by introducing "memory" and "gates" (multiplicative masks) as additional mechanisms. However, the more straightforward way to think about this is to introduce neurons with linear activation functions for memory and bilinear neurons (the multiplication neurons I am discussing here) for gates (for good references, see Appendix C of the last of the preprints of this series, https://arxiv.org/abs/1610.00831 ).
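A very simplified sketch of this "linear memory plus bilinear gates" view (not the actual LSTM equations; the function and parameter names are made up for illustration):

```clojure
;; Highly simplified sketch of the "memory + gates" view of LSTM-style
;; cells: the memory neuron has a linear (identity) activation and keeps
;; its previous value, while the gates are multiplication neurons that
;; mask the old memory and the new candidate value.
(defn gated-memory-step [memory candidate forget-gate input-gate]
  (+ (* forget-gate memory)       ; gate applied to the old memory
     (* input-gate  candidate)))  ; gate applied to the new candidate

(gated-memory-step 2.0 5.0 1.0 0.0)  ; => 2.0  (keep the old memory)
(gated-memory-step 2.0 5.0 0.0 1.0)  ; => 5.0  (replace it with the candidate)
```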
So, here is the story. Neurons with two arguments are back; they are necessary for modern RNN architectures such as LSTM and Gated Recurrent Units to work. But the way these modern architectures are usually presented avoids saying explicitly that we have neurons with a two-argument activation function (namely, multiplication) here; instead people talk about these things as "extra mechanisms added to RNNs", which makes them much more difficult to understand.
no subject
Andrej Karpathy's post "The Unreasonable Effectiveness of Recurrent Neural Networks": http://karpathy.github.io/2015/05/21/rnn-effectiveness/
I was particularly impressed by his examples of the nets learning to reproduce patterns observed in real-life C code and patterns observed in mathematical texts written in LaTeX. This is the post I recommend to people as a starting point to familiarize oneself with the impressive capabilities of the RNNs.
His examples are character-based nets, with characters represented as vectors whose dimension is the size of the alphabet in question. A particular character, say 'X', is represented on the input by a vector with value 1 at the coordinate corresponding to 'X' and zeros at the other coordinates. So a conventional scalar-based RNN of the kind he uses needs as many neurons as there are characters in the alphabet just to feed the characters into the network. This is problematic if one wants to program character-based algorithms in RNNs manually, and it would be problematic for large alphabets such as Unicode, which has more than 100,000 characters.
This is the first illustration of why it might be good to have vector neurons. If one has neurons which process linear streams, one can treat characters as sparse vectors, and then one can have character-processing nets with the number of neurons independent of the size of the alphabet.
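A sketch of the contrast, assuming characters are represented as Clojure maps with a single non-zero coordinate (a toy illustration, not the preprint's construction):

```clojure
;; One-hot encoding: a dense vector with one coordinate per character of
;; the alphabet (here a toy 4-character alphabet).
(defn one-hot [alphabet ch]
  (mapv #(if (= % ch) 1.0 0.0) alphabet))

(one-hot [\a \b \c \d] \c)  ; => [0.0 0.0 1.0 0.0]

;; Sparse encoding: a map with a single non-zero coordinate; its size
;; does not depend on the size of the alphabet (Unicode included).
(defn sparse-char [ch] {ch 1.0})

(sparse-char \c)  ; => {\c 1.0}
```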
For example, our preprint https://arxiv.org/abs/1606.09470 has a simple network with 9 neurons and 10 non-zero weights solving a standard coding interview problem with character strings (Section 1 explains the problem and Section 3, pages 8 and 9, solves it). It seems that this problem cannot be solved in a traditional RNN in such a way that the size of the RNN would be independent of the size of the alphabet (at least, there is no simple solution of this kind). This is one of the first indications that DMMs are indeed a more powerful programming platform than RNNs.
no subject
Reading Clojure code
Two resources are the most important when reading Clojure code:
http://clojure.org/reference/reader
(This page explains the meaning of all special symbols, and search engines are not effective with those.)
The community-based Clojure Docs. Google search would typically take you right there, but from that page you can also search for strange-looking function names which are not handled well by search engines:
https://clojuredocs.org/clojure.core/-%3E
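For example, the -> in that link is the "thread-first" macro; a quick illustration:

```clojure
;; The thread-first macro -> pipes a value through a chain of forms,
;; inserting it as the first argument of each form.
(-> 5 inc (* 2))               ; same as (* (inc 5) 2)              => 12
(-> {:a 1} (assoc :b 2) keys)  ; same as (keys (assoc {:a 1} :b 2)) => (:a :b)
```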
Installation
The most straightforward path is to install Leiningen, a lightweight alternative to Maven:
http://leiningen.org/
Then, when you type lein repl, Leiningen will download and install the latest stable Clojure (currently 1.8) automatically.
Books
A number of books are available online for free.
"Clojure for the brave and true" is a useful online book. Unlike many nice paper books, it covers some of the newer aspects (such as core.async, available since Clojure 1.6):
http://www.braveclojure.com/core-async/
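A tiny core.async example of the kind that book covers (assumes the org.clojure/core.async dependency is on the classpath):

```clojure
;; Minimal core.async example: a go block puts a value on a channel,
;; and the main thread takes it off with a blocking take.
(require '[clojure.core.async :as async])

(let [c (async/chan)]
  (async/go (async/>! c "hello from a go block"))
  (println (async/<!! c)))
```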
IDE (traditional style)
Major IDEs tend to offer a choice of plugins for Clojure. The most popular choice for Emacs is CIDER, the most popular choice for JetBrains IntelliJ IDEA is Cursive, etc.
Some attractive IDEs, e.g. Light Table and Nightcode, were developed from scratch as Clojure IDEs.
One attractive choice is the proto-repl plugin for Atom:
https://atom.io/packages/proto-repl and http://blog.element84.com/proto-repl-update.html
IDE (notebook style)
Jupyter notebooks have a selection of Clojure plugins.
Gorilla REPL is an attractive stand-alone IDE for Clojure in the style of Mathematica notebooks. See the two videos linked from the first paragraph of the following page for a short and impressive introduction:
http://gorilla-repl.org/renderer.html
The latest Clojure/conj conference
http://2016.clojure-conj.org/
In particular, I'd like to draw attention to the talk by Jason Gilman on Proto REPL in the Atom editor, which contained a very impressive demo of Proto REPL's capabilities: http://2016.clojure-conj.org/proto-repl/
The video of the talk: http://youtube.com/watch?v=buPPGxOnBnk