anhinga-anhinga (http://anhinga-anhinga.livejournal.com/) wrote in anhinga_anhinga, 2016-12-30 04:22 am (UTC):

Hi! Let me start by answering your questions, then point to some links.

1. Indeed, there is a theorem that any reasonable function can be approximated by a neural cascade. But pragmatically speaking, it's a nightmare, not something easy to use in practice. (E.g. try to craft a network performing multiplication of two reals. Also, in general, the more precise the approximation, the larger the cascade/subnetwork must be.) So it is convenient to allow adding built-in activation functions as needed; see the sketch below.
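Here is a minimal Python sketch of that convenience (purely illustrative; the registry and the names are mine, not something from our code): multiplication just becomes one more built-in neuron function next to the usual activations, instead of a large approximating subnetwork.

# Minimal sketch (illustrative only): built-in neuron functions kept in a
# registry, so multiplication is exact rather than approximated by a subnetwork.

import math

BUILT_INS = {
    "tanh":     lambda xs: math.tanh(sum(xs)),
    "relu":     lambda xs: max(0.0, sum(xs)),
    "multiply": lambda xs: xs[0] * xs[1],   # exact; no large subnetwork needed
}

def fire(neuron_type, inputs):
    """Apply the built-in function of the given neuron type to a list of inputs."""
    return BUILT_INS[neuron_type](inputs)

print(fire("multiply", [3.0, 4.0]))  # 12.0
print(fire("tanh", [0.1, 0.2]))      # about 0.2913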

2. It's up to the designer. E.g. even for multiplication, sometimes an asymmetric interpretation is convenient (one input gates or modulates the other), and sometimes it might be convenient to think in a symmetric fashion, with both inputs modulating each other.
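A tiny toy sketch of the two readings (again, just my own illustrative code; numerically they are the same function, the difference is only in how the designer thinks about it):

# Sketch (illustrative): the same multiplicative neuron under two readings.

def as_gate(gate, signal):
    # asymmetric reading: 'gate' modulates 'signal' (like a gating unit)
    return gate * signal

def as_product(x, y):
    # symmetric reading: the two inputs modulate each other
    return x * y

assert as_gate(0.5, 10.0) == as_product(0.5, 10.0) == 5.0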

3. Typically, we think about a particular neuron as having a type. And this type says, among other things, how many inputs the neuron has and what kinds of vectors are associated with them. For example, one input might take a vector representing a letter (so it would have the dimension of the alphabet in question), and another input might take a number (a one-dimensional vector), and so on. And the network can have neurons of different types.

So in that approach, the number of inputs for a single given neuron is fixed, but there can be neurons of different types. This is our approach in the preprints; a rough sketch of what such typed neurons might look like is below.
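A rough Python sketch of typed neurons (illustrative only; the names and the representation are mine, not the preprints' actual formalism). Each neuron type fixes the number of inputs and the dimension of each input.

# Sketch (illustrative): a neuron type fixes its arity and input dimensions.

from dataclasses import dataclass
from typing import Callable, List

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def one_hot(letter: str) -> List[float]:
    """A letter as a vector whose dimension is the size of the alphabet."""
    return [1.0 if c == letter else 0.0 for c in ALPHABET]

@dataclass
class NeuronType:
    name: str
    input_dims: List[int]                      # fixed number of inputs and their dimensions
    fn: Callable[[List[List[float]]], float]   # built-in function for this type

# One input takes a letter vector (dim 26), the other takes a number (dim 1):
scale_letter = NeuronType(
    name="scale-letter",
    input_dims=[len(ALPHABET), 1],
    fn=lambda inputs: sum(inputs[0]) * inputs[1][0],
)

print(scale_letter.fn([one_hot("q"), [2.5]]))  # 2.5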

Our latest version is more flexible (the one in our new open-source project in Clojure). There we use the vector space of recurrent maps to implement variadic neurons. Basically, recurrent maps allow us to assemble any number of arguments into a single argument, so we can code neurons with an arbitrary (and not necessarily fixed) number of inputs via functions of one argument.
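A rough Python analogue (the actual project uses Clojure's nested maps; the names here are mine and purely illustrative): think of a recurrent map as a nested dictionary with numbers at the leaves, so a "variadic" neuron is just an ordinary one-argument function over such a dictionary.

# Rough analogue of recurrent maps (illustrative; the real project is in Clojure).
# A recurrent map: a nested dict with numbers at the leaves.

def leaves(m):
    """Yield all numeric leaves of a nested dict."""
    for v in m.values():
        if isinstance(v, dict):
            yield from leaves(v)
        else:
            yield v

def sum_neuron(inputs):
    """One argument, but effectively any number of inputs packed inside it."""
    return sum(leaves(inputs))

# Three inputs this time; next time it could be five -- the arity is not fixed.
print(sum_neuron({"x": 1.0, "more": {"y": 2.0, "z": -0.5}}))  # 2.5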

4. Yes, we think about countable-sized networks, but with only a finite number of non-zero elements in the network matrix at any given time. Only the neurons with a non-zero weight associated with them are considered active. So at any given moment, only a finite subnetwork is active and the rest is silent. New neurons can be recruited from the silent part by making some of their associated weights non-zero. So as long as the network weights are allowed to change, new neurons can be created dynamically (by recruiting them from the countable silent part).
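A toy sketch of this sparse bookkeeping (illustrative only, my own names): only the finitely many non-zero weights are stored, and making a weight non-zero is exactly what recruits a silent neuron into the active subnetwork.

# Sketch (illustrative): a countable network stored sparsely -- only the
# finitely many non-zero weights are kept, so only a finite subnetwork is active.

weights = {}                     # (from_neuron, to_neuron) -> non-zero weight

def set_weight(src, dst, w):
    """Making a weight non-zero recruits silent neurons into the active part."""
    if w == 0.0:
        weights.pop((src, dst), None)
    else:
        weights[(src, dst)] = w

def active_neurons():
    return {n for edge in weights for n in edge}

set_weight(0, 1, 0.7)
set_weight(1, 2, -1.3)
print(active_neurons())          # {0, 1, 2}; neurons 3, 4, ... remain silent

set_weight(1, 41, 0.05)          # "creating" a new neuron = recruiting a silent one
print(active_neurons())          # now includes 41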

5. Actually, this works well with any reasonable representation of vectors in computers. Even very approximate representations are fine (e.g. probability distributions, which are often infinite-dimensional as vectors, can be approximately represented by samples, and then variants of a stochastic sum can be used to implement linear combinations).
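A small illustrative sketch of a stochastic sum (my own toy code, covering only the convex-combination case with non-negative coefficients): the distributions are given by samples, and a sample from a*P + (1-a)*Q is obtained by picking a sample from P with probability a and from Q otherwise.

# Sketch (illustrative): distributions represented by samples, and a convex
# combination a*P + (1-a)*Q implemented as a "stochastic sum".

import random

def stochastic_sum(a, samples_p, samples_q, n=10000):
    """Approximate samples from a*P + (1-a)*Q, with 0 <= a <= 1."""
    return [random.choice(samples_p) if random.random() < a else random.choice(samples_q)
            for _ in range(n)]

p = [random.gauss(0.0, 1.0) for _ in range(5000)]   # samples from P
q = [random.gauss(5.0, 1.0) for _ in range(5000)]   # samples from Q

mix = stochastic_sum(0.3, p, q)
print(sum(mix) / len(mix))   # close to 0.3*0 + 0.7*5 = 3.5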

I think this is somewhat difficult material. (I have been telling various parts of this story to people, and it's not always easy for them to understand.)

It's not too difficult compared to many math texts, but as computer science goes, it's not very simple.

I am happy to answer more questions. I can also put together more references on RNNs (let me know if I should).

But speaking about DMMs, the links above are the complete set of literature available at the moment. We discovered DMMs last summer, and this year we understood that what we have is a generalized version of RNNs; these preprints and the notes in the open-source projects (this one, and the previous one mentioned in the previous livejournal post) are all that exists on DMMs at the moment.

RNN-related literature is quite extensive. The link to Karpathy's post in my comments above is very useful. I found various Wikipedia pages helpful. On LSTMs see Appendix C of the last of our preprints and this awesome paper it talks about: https://www.arxiv.org/abs/1603.09420 (this is the best way to understand LSTMs and similar architectures).
