Anhinga anhinga

OpenAI Codex (paper of the year), JuliaCon 2021 starts on July 20

2021-07-19T09:09:29Z

"Evaluating Large Language Models Trained on Code", arxiv.org/abs/2107.03374

A detailed description of early (pre-GitHub Copilot) versions of OpenAI Codex. This is the "paper of the year" so far: we finally have real progress in AI-assisted computer programming (and difficulties of computer programming form the key bottleneck limiting the speed of progress).

See comments in dmm.dreamwidth.org/44860.html for details.

JuliaCon 2021 starts on July 20 with 8 days of workshops followed by 3 days of main conference. JuliaCon 2020 was great; this one is likely to be even better.

This is a fully virtual conference for the second year in a row. Registration is free and is required for access to interactive features, poster sessions, and such, but the bulk of the materials will be accessible via YouTube without registration. I created a post dmm.dreamwidth.org/46160.html with links, and I'll keep populating it with various comments as the conference progresses.

Cross-post: anhinga-anhinga.livejournal.com/85003.html


9 months since GPT-3 revolution

2021-02-28T08:44:10Z

On May 28, 2020, OpenAI published the GPT-3 paper, "Language Models are Few-Shot Learners", arxiv.org/abs/2005.14165

This was the "AlexNet moment of the Transformer Revolution", and the qualitative jump was even more significant than the AlexNet 2012 jump.

One extremely strange and remarkable property of GPT-3 is that purely linguistic knowledge in this model is often sufficient to guess a piece of correct computer code from a natural language description of a problem (even though we don't think this model "truly understands programming").

There was already quite a boom in these novel models (invented as recently as 2017) after BERT and GPT-2, but now the field has simply exploded: "efficient transformers", "vision transformers", "multimodal transformers", etc.

And a lot of interesting work has been done on hybrid models which combine transformers and other attention-based models with all kinds of other techniques. Hybrids of all kinds of methods with transformers and other attention-based models are probably the future. For example, the famous AlphaFold 2 by DeepMind, which "solved" protein folding in November 2020, was a hybrid model with an attention-based component at its center: deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology

All this means two things: 1) "True AI" can emerge at any moment; we are seeing a flood of breakthroughs now, and one of them (or a short sequence of them) might result in a much more radical shift than anything we've seen so far. I don't know if it will happen this year, but it's a possibility (from the invention of convolutional neural nets in 1989 it took more than 20 years to get to AlexNet; from the invention of Transformers in 2017 it took only 3 years to get to GPT-3; things can really start happening very fast).

2) If you are a practitioner in any field (especially if this field is machine learning of some kind), it makes sense to ponder hybrids between your favorite methods and "attention" (which is just a linear combination of high-dimensional vectors [sometimes with all coefficients being non-negative and summing up to 1]), hybrids between your favorite methods and matrix multiplication (which is just a way to compute a lot of linear combinations of high-dimensional vectors rapidly), hybrids between your favorite methods and Transformers (a certain way of arranging those matrix multiplications and interleaving them with modest neural connectors). This is likely to be a very fruitful thing, and this is how you can supercharge your favorite methods and produce novel results.
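To make the parenthetical remarks above concrete, here is a minimal sketch in plain Python (standard library only). It is an illustration under my own assumptions, not code from any particular Transformer implementation: the function names (softmax, attend, matmul, etc.) and the toy keys/values/queries are made up for this example, and the softmax variant shown is the case where all coefficients are non-negative and sum to 1.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def linear_combination(weights, vectors):
    """Weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def attend(query, keys, values):
    """One query: score each key, softmax the scores, mix the values."""
    weights = softmax([dot(query, k) for k in keys])
    return weights, linear_combination(weights, values)

def matmul(a, b):
    """Each row of the result is a linear combination of the rows of b."""
    return [linear_combination(row, b) for row in a]

# Toy data (hypothetical, chosen only for illustration).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
queries = [[2.0, 0.0], [0.0, 2.0]]

# Attention, one query at a time.
weight_rows = []
outputs = []
for q in queries:
    w, out = attend(q, keys, values)
    weight_rows.append(w)
    outputs.append(out)

# The same result for all queries at once, as a single matrix multiplication.
batched = matmul(weight_rows, values)
```

Here each row of weight_rows is a set of attention coefficients, and multiplying the weight matrix by the value matrix computes all the linear combinations in one go; real frameworks do the same thing with optimized matrix multiplication rather than Python loops.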

Cross-post: anhinga-anhinga.livejournal.com/84392.html


Julia programming language

2020-02-16T06:43:45Z

Julia is an unusual language. It is based around the idea of "eating your cake and having it too, again and again". It is flexible and very fast at the same time, with friendly, readable syntax, Lisp-strength macros, multiple dispatch, etc.:

https://julialang.org/blog/2012/02/why-we-created-julia/

Julia Flux is trying to become the next generation machine learning framework, and is also characterized by this approach of "eating your cake and having it too". If TensorFlow 1.0 is the past, and PyTorch is the leading state-of-the-art framework of the present, Julia Flux is quite likely to become the machine learning framework of the future; see the first comment in this blog post for details:

https://dmm.dreamwidth.org/23453.html

Does anyone here use Julia, or does anyone here know someone who uses Julia?

Crosspost: anhinga-anhinga.livejournal.com/84046.html


2019: shaders, dreamwidth, and more

2019-12-31T14:00:12Z

I hope for us all to have a creative and safe New Year.

Computer art news: I started to play with OpenGL shaders and with Shadertoy.com: dmm.dreamwidth.org/20076.html

Other news: books and stories, my own texts, open source activity and software experiments, employment change, etc: dmm.dreamwidth.org/23061.html

I generally shifted quite a bit towards dreamwidth and this blog during this year, and away from livejournal; this was not planned, but just happened "organically". Most of my activity this year was at dmm.dreamwidth.org

Crosspost: anhinga-anhinga.livejournal.com/83885.html


Self-referential neural nets in 2018

2018-12-05T18:03:16Z

We ran two series of experiments with self-referential neural nets with vector flows ("dataflow matrix machines") in 2018.

The ability of a neural net to modify itself on the fly was used to edit it interactively while it is running ("livecoding"). This also opens the way to have populations of neural nets editing each other.

Emergent "sleep-wake" behavior and other emergent bistability patterns were observed in randomly initialized neural nets (May 2019 update: a couple of video recordings of those behaviors are posted: https://youtu.be/_mZVVU8x3bs and https://youtu.be/CKVwsQEMNjY ). There is no theoretical understanding of these emergent dynamics yet.

Crosspost: anhinga-anhinga.livejournal.com/83697.html

Other blogs by this author:

https://dmm.dreamwidth.org/ (partial mirror: https://anhinga-travel.livejournal.com/ )
https://anhinga-drafts.livejournal.com/ (mirror: https://anhinga-drafts.dreamwidth.org/ )


Dataflow matrix machines as a bridge between programs and neural nets

2017-12-31T17:58:51Z

Continuing the last couple of posts

https://anhinga-anhinga.livejournal.com/82953.html

and( Read more... )
