Ask HN: AI/ML papers to catch up with current state of AI?

by hahnchen on 12/15/2023, 10:19 AM with 49 comments

I used to be into ML back in the R-CNN, GAN, ResNet era and would read papers/blogs.

Seems like ML is taking off recently and I want to get back into it! So far on my list I have Attention Is All You Need, QLoRA, the Llama papers, and Q-learning. Suggestions?

by hapanin on 12/15/2023, 3:59 PM

Since nobody is actually recommending papers, here's an incomplete reading list that I sent out to some master's students I work with so they can understand the current (academic) research my little team is doing:

Paper reference / main takeaways / link

instructGPT / main concepts of instruction tuning / https://proceedings.neurips.cc/paper_files/paper/2022/hash/b...

self-instruct / bootstrap off the model's own generations / https://arxiv.org/pdf/2212.10560.pdf

Alpaca / how alpaca was trained / https://crfm.stanford.edu/2023/03/13/alpaca.html

Llama 2 / probably the best chat model we can train on, focus on training method. / https://arxiv.org/abs/2307.09288

LongAlpaca / One of many ways to extend context, and a useful dataset / https://arxiv.org/abs/2309.12307

PPO / important training method / idk just watch a youtube video (or see the rough sketch of the clipped objective below)

Obviously these are specific to my work and are out of date by ~3-4 months but I think they do capture the spirit of "how do we train LLMs on a single GPU and no annotation team" and are frequently referenced simply by what I put in the "paper reference" column.
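
For the PPO row above, here is a rough PyTorch sketch of the clipped surrogate objective at its core (illustrative only, not the exact code from any of these papers; the function name and clip value are just placeholders):

    import torch

    def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
        # probability ratio between the current policy and the one that sampled the data
        ratio = torch.exp(logp_new - logp_old)
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        # pessimistic bound: take the elementwise minimum, negate because we minimize
        return -torch.min(unclipped, clipped).mean()

In RLHF-style training the advantages come from a reward/value model rather than an environment, but the objective is the same.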

by kozikow on 12/15/2023, 1:02 PM

My view is to focus on doing stuff. That's what I did. Pick a task you want the model to do, try fine-tuning Llama, play with the APIs from OpenAI, etc., googling and asking GPT along the way.
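
For example, playing with the OpenAI API can start as small as this (a minimal sketch using the official openai Python package; the model name and prompts are just placeholders):

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; use whatever model you have access to
        messages=[
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "Summarize the LoRA paper in two sentences."},
        ],
    )
    print(response.choices[0].message.content)

Once that works, swap the prompt for your actual task and iterate.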

Foundational model training has gotten so expensive that unless you can get hired by one of the "owns a nuclear power plant of GPUs" companies, you are not going to get any "research" done. And as the area got white-hot, those companies have more available talent than hardware nowadays. So just getting into the practitioner area is the best way to get productive with those models. And you improve as a practitioner by practicing, not by reading papers.

If you're at the computer, your time is best spent writing code and interacting with those models, in my opinion. If you cannot (e.g. on a commute), listen to some stuff (e.g. https://www.youtube.com/watch?v=zjkBMFhNj_g - anything from Karpathy on YouTube, or the https://www.youtube.com/@YannicKilcher channel).

by carlossouza on 12/15/2023, 1:57 PM

https://trendingpapers.com/

This tool can help you find what's new & relevant to read. It's updated every day (based on ArXiv).

You can filter by category (Computer Vision, Machine Learning, NLP, etc.) and by release date, but most importantly, you can rank by PageRank (a proxy of influence/readership), PageRank growth (to see the fastest-growing papers in terms of influence), total # of citations, etc.

by kasperni on 12/15/2023, 11:43 AM

Maybe this tweet by John Carmack can help you:

This is a great little book to take you from “vaguely understand neural networks” to the modern broad state of practice. I saw very little to quibble with. https://fleuret.org/francois/lbdl.html

by d_burfoot on 12/15/2023, 2:49 PM

Bear in mind that the ML skillset is now bifurcating into two components. On one side are the people who work at places like OpenAI/DeepMind/Mistral/etc., who have billion-dollar compute budgets. They are the ones who will create the foundational models. At this point a lot of this work is very technically narrow, dealing with CUDA, GPU issues, numerical stability, etc. On the other side are people who are using the models through the APIs in various ways. This is much more open-ended and potentially creative, but you don't need to know how Q-learning works to do this.

It's a bit analogous to the situation with microprocessors. There is a ton of deep technical knowledge about how chips work, but most of this knowledge isn't critical for mainstream programming.

by magoghm on 12/15/2023, 2:05 PM

The book that just came out, "Understanding Deep Learning", is an excellent overview of the current state of AI: https://udlbook.github.io/udlbook/

Read that first, then to keep up to date you can follow up with any papers that seem interesting to you. A good way to be aware of the interesting papers that come out is to follow @_akhaliq on X: https://twitter.com/_akhaliq

by jpdus on 12/15/2023, 11:54 AM

Hey, imho the best overall technical intro to LLMs (I guess that's your main interest, as you mentioned QLoRA + Llama) is by Simon Willison [1]. Additionally, or if you prefer videos, the recent 1h "busy person's intro" by Andrej Karpathy is great and dense as well [2].

[1] https://simonwillison.net/2023/Aug/3/weird-world-of-llms/ [2] https://youtu.be/zjkBMFhNj_g?si=M6pRX66NrRyPM8x-

EDIT: Maybe I misunderstood, as you asked about papers, not general intros. I don't think that reading papers is the best way to "catch up", as the pace is rapid and knowledge is very decentralized. I can confirm what Andrej recently wrote on X [3]:

"Unknown to many people, a growing amount of alpha is now outside of Arxiv, sources include but are not limited to:

- https://github.com/trending

- HN

- that niche Discord server

- anime profile picture anons on X

- reddit"

[3] https://twitter.com/karpathy/status/1733968385472704548

by antirez on 12/15/2023, 12:19 PM

This one is very good, and will provide certain key insights into the way you should think about NNs. -> https://www.amazon.it/Deep-Learning-Python-Francois-Chollet/...

This is a good explanation of the Transformer details -> https://www.youtube.com/watch?v=bCz4OMemCcA&ab_channel=UmarJ...

This is old but covers a lot of background that you need to know to understand the rest very well. What I like about this book is that it often explains the motivations behind certain choices in a very intuitive way. -> https://www.amazon.it/Natural-Language-Processing-Pytorch-Ap...

by knbknb on 12/15/2023, 2:26 PM

Once a week (at least!) some research group publishes another review paper to the cs.AI section on ArXiv. Look for new [papers with "survey" in the title](https://arxiv-sanity-lite.com/?q=survey&rank=time&tags=cs.AI...). You'll get surveys on every conceivable subtopic of ML/AI.

by lukeinator42 on 12/15/2023, 3:12 PM

I'd also add "Deep reinforcement learning from human preferences" https://proceedings.neurips.cc/paper_files/paper/2017/file/d... and "Training language models to follow instructions with human feedback" https://proceedings.neurips.cc/paper_files/paper/2022/file/b....

These papers outline the approach of reinforcement learning from human feedback (RLHF), which is being used to train many of these LLMs, such as ChatGPT.
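
The reward-model half of that pipeline is surprisingly compact; here's a rough sketch of the pairwise preference loss it is trained with (illustrative, not the papers' exact code; the function name is made up):

    import torch.nn.functional as F

    def preference_loss(reward_chosen, reward_rejected):
        # Bradley-Terry style objective: push the scalar reward of the
        # human-preferred response above the reward of the rejected one.
        return -F.logsigmoid(reward_chosen - reward_rejected).mean()

The resulting reward model then scores the policy's outputs during the RL (e.g. PPO) stage.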

by andyjohnson0 on 12/15/2023, 2:04 PM

I kind of despair of keeping up to date with ML, at least to the extent that I might ever get current enough to be paid to work with it. I did Andrew Ng's Coursera specialisation a few years back, I've worked through some of the developer-oriented courses, implemented some stuff, read more than a few books, read papers (the ones I might have a hope of understanding), and tried to get a former employer to take it seriously. But it seems like unless you have a PhD or big-co experience, it's very difficult to keep up to date by working in the field.

Notwithstanding the above, I'd agree with others here who suggest learning by doing/implementing, not reading papers.

by gschoeni on 12/15/2023, 9:52 PM

I put together a reading list for Andrej Karpathy's intro to LLMs that would be helpful for all of the latest LLM and multi-modal architectures:

https://blog.oxen.ai/reading-list-for-andrej-karpathys-intro...

by cs702 on 12/15/2023, 2:13 PM

Build something of personal interest to you. Start by looking for similar open-source projects online. Look at the online posts of the authors. Then look for the papers that you think will be useful for your project. Before you know it, you'll become an expert in your area of interest.

Above all, be wary of programmatic lists that claim to track the most important recent papers. There's a ridiculous amount of hype/propaganda and citation hacking surrounding new AI research, making it hard to discern what will truly stand the test of time. Tomas Mikolov just posted about this.[a]

---

[a] https://news.ycombinator.com/item?id=38654038

by maxlamb on 12/15/2023, 11:55 AM

Part 2 of the fast.ai course might be a good start: https://course.fast.ai/Lessons/part2.html

by auntienomen on 12/15/2023, 1:16 PM

I found Cosma Shalizi's notes on the subject pretty insightful.

http://bactra.org/notebooks/nn-attention-and-transformers.ht...

Definitely read through to the last section.

by eurekin on 12/15/2023, 12:32 PM

https://www.youtube.com/@algorithmicsimplicity - that series cleared up the fundamental question about transformers that I couldn't find an answer to in many of the recommended materials.

Here's also a nice tour of the building blocks, which could also double as transformers/tensorflow API reference documentation: https://www.youtube.com/watch?v=eMXuk97NeSI&t=207s

The #1 visualization of architecture and size progression: https://bbycroft.net/llm

by gurovich on 12/15/2023, 4:39 PM

This resource has been invaluable to me: https://paperswithcode.com/

From the past examples you give, it sounds like you were into computer vision. There have been a ton of developments since then, and I think you'd really enjoy the applications of some of those classic convolutional and variational encoder techniques in combination with transformers. A state-of-the-art multimodal non-autoregressive neural net model such as Google's Muse is a nice paper to work up to, since it exposes a breadth of approaches.

by sgt101 on 12/15/2023, 4:07 PM

No emergence

[2304.15004] Are Emergent Abilities of Large Language Models a Mirage? - arXiv https://arxiv.org/abs/2304.15004

Can't plan

https://openreview.net/forum?id=X6dEqXIsEW

No compositionality https://openreview.net/forum?id=Fkckkr3ya8

Apart from that it's great

by sthoward on 12/15/2023, 9:52 PM

Would suggest our weekly paper club called Arxiv Dive - https://lu.ma/oxenbookclub. You can see past ones on our blog (https://blog.oxen.ai/) - have covered papers like Mamba, CLIP, Attention is all you need, and more. We also do a "hands on" session with live code, models, and real world data on Fridays!

by aakashg99 on 12/18/2023, 6:38 AM

I recently started reading research papers related to GPTs and LLMs. I have listed them here, along with a short synopsis and links to their code and datasets:

https://www.thinkevolveconsulting.com/large-language-models-...

by youngprogrammer on 12/16/2023, 1:16 AM

Little late to this thread but from my list:

LLM (foundational papers)

* Attention is all you need - transformers + self attention

* BERT - first masked LM using transformers + self attention

* GPT-3 - big LLM decoder (basis of GPT-4 and most LLMs)

* InstructGPT or Tk-Instruct (instruction tuning enables improved zero-shot learning)

* Chain of Thought (improve performance via prompting)

some other papers that have become trendy, depending on your interest:

* RLHF - RL using human feedback

* LoRA - low-rank adapters for cheaper fine-tuning (see the sketch after this list)

* MoE - mixture of experts (sparse routing; loosely a kind of ensembling)

* self instruct - self label data

* constitutional ai - self alignment

* tree of thought - like CoT but a tree

* FlashAttention, Longformer - optimized attention mechanisms

* ReAct - agents (reasoning + acting)
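
For the LoRA entry above, the core trick fits in a few lines of PyTorch: freeze the pretrained weights and learn a low-rank update on top (a toy illustration of the idea, not the paper's actual implementation; the class name and defaults are made up):

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen pretrained linear layer plus a trainable low-rank update."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # pretrained weights stay frozen
            # effective weight becomes W + (alpha / rank) * B @ A
            self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
            self.B = nn.Parameter(torch.zeros(base.out_features, rank))
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

Only A and B are trained, which is why LoRA (and QLoRA on top of it) makes single-GPU fine-tuning practical.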

by lysecret on 12/15/2023, 2:56 PM

The good (and some might say bad) thing is that when it comes to fundamental technologies there are only 2 that are relevant:

1. Transformers
2. Diffusion

The benefit is that if you focus on understanding them both reeaaalllyy well, you are at the forefront of research ;)
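
If it helps, the piece of the transformer worth understanding reeaaalllyy well is scaled dot-product attention, and it's tiny; a minimal PyTorch sketch (the tensor layout is just one common convention):

    import torch
    import torch.nn.functional as F

    def scaled_dot_product_attention(q, k, v, mask=None):
        # q, k, v: (batch, heads, seq_len, head_dim)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)  # each query's distribution over positions
        return weights @ v

Everything else in the architecture (multi-head projections, MLPs, residuals, layer norm) is wrapped around this one function.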

Also, what is the reason you want to do this? If it is about building some kind of AI-enabled app, you don't have to read anything. Get an API key and let's go; the barrier has never been lower.

by pomatic on 12/15/2023, 2:16 PM

Posted in another thread, but sadly I got no replies...

Related question: how can I learn how to read the mathematical notation used in AI/ML papers? Is there a definitive work that describes the basics? I am a post-grad Engineer, so I know the fundamentals, but I'm really struggling with a lot of the Arxiv papers. Any pointers hugely appreciated.

by neduma on 12/16/2023, 4:09 AM

From ChatGPT:

>> To catch up with the current state of Artificial Intelligence and Machine Learning, it's essential to look at the latest and most influential research papers. Here are some categories and specific papers you might consider:

1. *Foundational Models and Large Language Models*:
- Papers on GPT (Generative Pre-trained Transformer) series, particularly the latest like GPT-4, which detail the advancements in language models.
- Research on BERT (Bidirectional Encoder Representations from Transformers) and its variants, which are pivotal in understanding natural language processing.

2. *Computer Vision*:
- Look into papers on Convolutional Neural Networks (CNNs) and their advancements.
- Research on object detection, image classification, and generative models like Generative Adversarial Networks (GANs).

3. *Reinforcement Learning*:
- Papers from DeepMind, like those on AlphaGo and AlphaZero, showcasing advances in reinforcement learning.
- Research on advanced model-free algorithms like Proximal Policy Optimization (PPO).

4. *Ethics and Fairness in AI*:
- Papers discussing the ethical implications and biases in AI, including work on fairness, accountability, and transparency in machine learning.

5. *Quantum Machine Learning*:
- Research on the integration of quantum computing with machine learning, exploring how quantum algorithms can enhance ML models.

6. *Healthcare and Bioinformatics Applications*:
- Papers on AI applications in healthcare, including drug discovery, medical imaging, and personalized medicine.

7. *Robotics and Autonomous Systems*:
- Research on the intersection of AI and robotics, including autonomous vehicles and drone technology.

8. *AI in Climate Change*:
- Papers discussing the use of AI in modeling, predicting, and combating climate change.

9. *Interpretable and Explainable AI*:
- Research focusing on making AI models more interpretable and explainable to users.

10. *Emerging Areas*:
- Papers on new and emerging areas in AI, such as AI in creative arts, AI for social good, and the integration of AI with other emerging technologies like the Internet of Things (IoT).

To find these papers, you can check academic journals like "Journal of Machine Learning Research," "Neural Information Processing Systems (NeurIPS)," and "International Conference on Machine Learning (ICML)," or platforms like arXiv, Google Scholar, and ResearchGate. Additionally, following key AI research labs like OpenAI, DeepMind, Facebook AI Research, and university research groups can provide insights into the latest developments.

by ricklamers on 12/15/2023, 12:19 PM

If you want good, up-to-date resources on the applied side, I'd recommend checking out https://hamel.dev/notes/

by hoerzu on 12/15/2023, 1:48 PM

In the Twitter section at the bottom there are usually good papers: https://news.mioses.com

by yieldcrv on 12/15/2023, 1:45 PM

you don't need papers; arXiv is self-aggrandizement from some meme in East Asia

just join communities on Discord or r/LocalLLaMA on Reddit

by voidz7 on 12/15/2023, 1:42 PM

Can I get some insights on AI and robotics? Some papers to implement so I can get my hands dirty?