The following post, "The Unreasonable Effectiveness of Recurrent Neural Networks" by Andrej Karpathy, illustrates the remarkable power of LSTMs as a machine learning platform: http://karpathy.github.io/2015/05/21/rnn-effectiveness/
I was particularly impressed by his examples of nets learning to reproduce patterns observed in real-life C code and in mathematical texts written in LaTeX. This is the post I recommend to people as a starting point for familiarizing themselves with the impressive capabilities of RNNs.
His examples are character-based nets, with characters represented as vectors whose dimension is the size of the alphabet in question. A particular character, say 'X', is represented on the input by a vector with the value 1 at the coordinate corresponding to 'X' and zeros at all other coordinates. So the conventional scalar-based RNN he uses needs as many input neurons as there are characters in the alphabet just to feed the characters into the network. This is inconvenient if one wants to program character-based algorithms in RNNs by hand, and it becomes a real problem for large alphabets such as Unicode, which has more than 100,000 characters.
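To make the bookkeeping concrete, here is a minimal one-hot encoding sketch in Python (my own illustration of the encoding scheme, not code from Karpathy's post; the toy alphabet is an assumption):

    # One-hot character encoding: the input dimension equals the alphabet size,
    # so a scalar-based RNN needs at least that many input neurons.
    alphabet = ['a', 'b', 'c', 'x']   # toy alphabet; Unicode would need >100,000 entries
    index = {ch: i for i, ch in enumerate(alphabet)}

    def one_hot(ch):
        """Return the one-hot vector for a single character."""
        v = [0.0] * len(alphabet)
        v[index[ch]] = 1.0
        return v

    print(one_hot('x'))   # [0.0, 0.0, 0.0, 1.0]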
This is the first illustration of why it might be good to have vector neurons. If one has neurons which process linear streams (streams of vectors rather than streams of scalars), one can treat characters as sparse vectors, and then one can build character-processing nets whose number of neurons is independent of the size of the alphabet.
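Here is a small Python sketch of this idea (my own illustration, not the construction from the preprint): a character becomes a sparse vector with a single non-zero coordinate, stored as a dictionary, and a "vector neuron" that accumulates a stream of such inputs never needs to know the size of the alphabet.

    # A character as a sparse vector: one non-zero coordinate, keyed by the character itself.
    def char_as_sparse_vector(ch, value=1.0):
        return {ch: value}

    # One update step of an accumulating vector neuron: add the incoming sparse vector
    # to the state. Nothing here depends on the alphabet size.
    def accumulate(state, sparse_input):
        for coord, val in sparse_input.items():
            state[coord] = state.get(coord, 0.0) + val
        return state

    state = {}
    for ch in "abracadabra":
        state = accumulate(state, char_as_sparse_vector(ch))

    print(state)   # {'a': 5.0, 'b': 2.0, 'r': 2.0, 'c': 1.0, 'd': 1.0}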
For example, our preprint https://arxiv.org/abs/1606.09470 presents a simple network with 9 neurons and 10 non-zero weights that solves a standard coding interview problem on character strings (Section 1 explains the problem, and Section 3, pages 8 and 9, solves it). It seems that this problem cannot be solved by a traditional RNN whose size is independent of the size of the alphabet (at least, there is no simple solution of this kind). This is one of the first indications that DMMs (dataflow matrix machines) are indeed a more powerful programming platform than RNNs.