Unrelated to AI:
See my latest Around the Web newsletter.
Watch my latest video essay: From Elephants to Metropolises.
[AI/ML + Bio] 🤖 🦠 🧬
[I]: 🦠 scGPT: toward building a foundation model for single-cell multi-omics using generative AI.
Summary: This paper introduces scGPT, a foundation model for single-cell multi-omics using generative AI, designed to advance cellular biology and genetic research by leveraging over 33 million single-cell sequencing data points.
Technical details: scGPT employs a generative pretraining transformer architecture tailored for non-sequential omics data, taking as input single-cell RNA sequencing (scRNA-seq) data from 33 million cells. The model is trained with a self-supervised objective, using a specially designed attention mask and generative training pipeline to jointly optimize cell and gene representations. Downstream, it delivers enhanced cell type annotation, multi-batch and multi-omic integration, perturbation response prediction, and gene network inference, addressing the gap in scalable, integrative analysis tools for single-cell research.
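To make the pretraining idea concrete, here is a minimal sketch, assuming a BERT-style masked-expression objective over (gene, binned-expression) token pairs; the architecture, dimensions, and masking scheme are illustrative stand-ins, not scGPT's actual code.

```python
# Minimal sketch of masked gene-expression pretraining in the spirit of scGPT.
# Illustrative only: the real model uses its own attention mask and a
# generative decoding scheme rather than plain BERT-style masking.
import torch
import torch.nn as nn

class CellTransformer(nn.Module):
    def __init__(self, n_genes=2000, n_expr_bins=51, d_model=128):
        super().__init__()
        self.gene_emb = nn.Embedding(n_genes, d_model)           # which gene
        self.expr_emb = nn.Embedding(n_expr_bins + 1, d_model)   # binned expression; id 51 = [MASK]
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_expr_bins)              # predict the masked bin

    def forward(self, gene_ids, expr_bins):
        # No positional encoding: gene tokens are an unordered set,
        # which is the "non-sequential omics data" point above.
        h = self.gene_emb(gene_ids) + self.expr_emb(expr_bins)
        return self.head(self.encoder(h))                        # (batch, genes, n_expr_bins)

model = CellTransformer()
genes = torch.randint(0, 2000, (8, 256))    # 8 cells, 256 genes each
expr = torch.randint(0, 51, (8, 256))       # binned expression values
mask = torch.rand(8, 256) < 0.15            # hide 15% of expression values
logits = model(genes, expr.masked_fill(mask, 51))
loss = nn.functional.cross_entropy(logits[mask], expr[mask])
```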
[II]: 📄 ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing
Summary: The study explores the potential of GPT-4, a large language model, in assisting with the review process of scientific papers, focusing on error detection, checklist verification, and evaluating the comparative quality of papers.
Some details: The research employs GPT-4 on three distinct tasks: identifying errors in computer science papers, verifying author-provided checklists for NeurIPS 2022 papers, and choosing the "better" paper from pairs of abstracts. For error identification, the authors wrote 13 short papers with intentional errors; GPT-4 detected the error in 7 of them, spanning mathematical and conceptual mistakes. On checklist verification, GPT-4 achieved 86.6% accuracy across 119 question-paper pairs. On the comparative evaluation, GPT-4 struggled, erring on 6 out of 10 pairs. The work demonstrates the potential of large language models as assistants for specific reviewing tasks while highlighting their current limitations in fully evaluating scientific papers.
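For intuition, here is a minimal sketch of the checklist-verification setup, assuming the standard OpenAI chat-completions API; the prompt wording is my own illustration, not the study's actual prompt.

```python
# Hedged sketch: ask GPT-4 whether a paper satisfies one NeurIPS checklist
# question. The prompt is illustrative, not the one used in the study.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def verify_checklist_item(paper_text: str, question: str) -> str:
    prompt = (
        "You are verifying a NeurIPS author checklist.\n"
        f"Checklist question: {question}\n\n"
        f"Paper:\n{paper_text}\n\n"
        "Answer 'Yes' or 'No', then justify briefly, quoting the paper."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic-ish output for a verification task
    )
    return resp.choices[0].message.content

# Usage (hypothetical file):
# print(verify_checklist_item(open("paper.txt").read(),
#                             "Did you report error bars for your main results?"))
```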
[III]: 🧬 An RNA foundation model enables discovery of disease mechanisms and candidate therapeutics.
Summary: This work introduces "BigRNA," a foundation model trained on genome-matched datasets to predict RNA biology aspects such as tissue-specific expression, RNA splicing, and microRNA sites, aiming to discover disease mechanisms and design RNA-based therapeutics.
Technical details: BigRNA is a transformer-based model trained on thousands of genome-matched RNA sequencing (RNA-seq) samples, covering 51 tissues from 70 individuals, learning to predict tissue-specific RNA expression from DNA sequence at 128 bp resolution. With fine-tuning, it also predicts other aspects of RNA biology, such as RNA-binding protein (RBP) specificity and microRNA binding sites, directly from DNA sequence. The model surpasses previous approaches at identifying pathogenic non-coding variants and their effects on gene expression and splicing, and it achieves notable accuracy in predicting the impact of steric blocking oligonucleotides (SBOs) on gene expression and splicing across multiple genes. Evaluated against known pathogenic variants, BigRNA demonstrated superior predictions of variant impact on RNA biology compared to existing models, highlighting its potential for personalized RNA therapeutic discovery.
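As a rough mental model of the sequence-to-expression setup, here is a hedged sketch: DNA goes in, per-tissue coverage comes out at 128 bp resolution. A convolution-plus-transformer trunk is a common choice for this kind of genomics model, but everything below is illustrative, not BigRNA's actual architecture.

```python
# Illustrative sequence-to-expression model: one-hot DNA in, predicted
# per-tissue expression out, one prediction per 128 bp bin.
import torch
import torch.nn as nn

class SeqToExpression(nn.Module):
    def __init__(self, n_tissues=51, d_model=128):
        super().__init__()
        # Seven stride-2 convolutions downsample 2**7 = 128x,
        # so each output position corresponds to one 128 bp bin.
        self.stem = nn.Sequential(
            nn.Conv1d(4, d_model, kernel_size=15, stride=2, padding=7), nn.ReLU(),
            *[m for _ in range(6) for m in
              (nn.Conv1d(d_model, d_model, kernel_size=5, stride=2, padding=2), nn.ReLU())],
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_tissues)   # tissue-specific coverage

    def forward(self, dna_onehot):              # (batch, 4, seq_len)
        h = self.stem(dna_onehot)               # (batch, d_model, seq_len / 128)
        h = self.trunk(h.transpose(1, 2))       # (batch, bins, d_model)
        return self.head(h)                     # (batch, bins, n_tissues)

model = SeqToExpression()
dna = torch.randn(2, 4, 16384)                  # stand-in for one-hot DNA
print(model(dna).shape)                         # torch.Size([2, 128, 51])
```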
[AI X Industry + Products] 🤖🖥👨🏿‍💻
[I]: ♊ Gemini 1.5 Pro is a Game Changer.
I mentioned Gemini 1.5 briefly in my blog on transformers. I am quite excited about this one.
“It shows dramatic improvements across a number of dimensions and 1.5 Pro achieves comparable quality to 1.0 Ultra, while using less compute… This new generation also delivers a breakthrough in long-context understanding. We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.”
For folks who don’t really get what ‘1 million tokens’ means, watch the following demo: Gemini 1.5 for summarization and code.
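And if you prefer numbers to demos, here is a quick back-of-envelope sketch, assuming the common rule of thumb of roughly 0.75 English words per token (an approximation, not Gemini's actual tokenizer):

```python
# Back-of-envelope: what fits in a 1,000,000-token context window?
# Assumes ~0.75 words per token, a rough English-text heuristic.
context_tokens = 1_000_000
words = context_tokens * 0.75        # ~750,000 words
novels = words / 90_000              # a typical novel runs ~90,000 words
print(f"~{words:,.0f} words, or roughly {novels:.0f} novels, in one prompt")
```

Even at that crude estimate, several novels' worth of text fits in a single prompt, which is exactly what the demo videos are showing off.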
[II]: 📑 NotebookLM: AI Powered Note Taking
I have been using NotebookLM from Google for some time, and I believe it is now open to the public, at least in the United States. It has been very useful for writing essays for my video essay project.
Here is a nice tutorial on how to use it.
[III]: 📹 OpenAI Sora: Text-to-Video
Sora’s various demos, which I am sure you might have seen, are mind-blowing.
“We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction.
Introducing Sora, our text-to-video model. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.”
[IV]: 🕸 LangGraph from LangChain
LangChain is perhaps the most powerful library for genAI apps out there. They recently released a new library for “building stateful, multi-actor applications with LLMs, built on top of (and intended to be used with) LangChain.”
It took me a little while to wrap my head around what was going on. Then I stumbled on an article: this essay explains LangGraph. Such a nice framework for building agentic apps!
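To give a flavor, here is a minimal sketch, hedged against API drift (check the current LangGraph docs): a node updates shared state, and a conditional edge loops back to it until a stopping condition routes to END.

```python
# Minimal LangGraph sketch: one node, one conditional edge, looping on shared
# state. The node stands in for an LLM call; the API shown matches early
# LangGraph releases and may have evolved since.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    draft: str
    revisions: int

def write(state: State) -> dict:
    # Stand-in for an LLM call that drafts or redrafts.
    return {"draft": state["draft"] + " (revised)", "revisions": state["revisions"] + 1}

def should_continue(state: State) -> str:
    return "revise" if state["revisions"] < 3 else "done"

graph = StateGraph(State)
graph.add_node("write", write)
graph.set_entry_point("write")
graph.add_conditional_edges("write", should_continue, {"revise": "write", "done": END})
app = graph.compile()

print(app.invoke({"draft": "first pass", "revisions": 0}))
# {'draft': 'first pass (revised) (revised) (revised)', 'revisions': 3}
```

The explicit state plus conditional edges is what makes cyclic, agent-style flows natural here, in contrast to LangChain's mostly linear chains.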
[V]: 🎟 Google One AI Premium Plan
“…Now we’re bringing even more value to the Google One AI Premium plan with Gemini in Gmail, Docs, Slides, Sheets and Meet (formerly Duet AI). […] you’ll be able to access Gemini capabilities directly within the Google products you’re already using, and get more done without jumping between tabs or apps.”
The first two months are free. I suppose there is no harm in trying.
[AI + Commentary] 📝🤖📰
[I]: 💻 Andrej Karpathy: Fun LLM Challenge
“Fun LLM challenge that I'm thinking about: take my 2h13m tokenizer video and translate the video into the format of a book chapter (or a blog post) on tokenization. Something like:
1. Whisper the video
2. Chop up into segments of aligned images and text
3. Prompt engineer an LLM to translate piece by piece
4. Export as a page, with links citing parts of original video
More generally, a workflow like this could be applied to any input video and auto-generate "companion guides" for various tutorials in a more readable, skimmable, searchable format. Feels tractable but non-trivial.”
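The workflow is concrete enough to sketch end to end. Below is a hedged outline in Python: step 1 uses the open-source whisper package, I skip the image alignment of step 2 for brevity, and the LLM rewrite of step 3 is stubbed out as a hypothetical rewrite_as_prose helper you would wire up to your model of choice.

```python
# Sketch of the video-to-chapter workflow. whisper is the open-source
# openai/whisper package; rewrite_as_prose is a hypothetical LLM call.
import whisper

def rewrite_as_prose(text: str) -> str:
    # Hypothetical: prompt an LLM to turn raw transcript text into
    # readable chapter prose. Identity placeholder until wired up.
    return text

def video_to_chapter(video_path: str) -> str:
    # 1. Whisper the video into timestamped segments.
    model = whisper.load_model("base")
    segments = model.transcribe(video_path)["segments"]

    # 2. Chop into coarse chunks (~5 minutes of transcript each).
    chunks, current, start = [], [], 0.0
    for seg in segments:
        current.append(seg["text"])
        if seg["end"] - start > 300:
            chunks.append((start, seg["end"], " ".join(current)))
            current, start = [], seg["end"]
    if current:
        chunks.append((start, segments[-1]["end"], " ".join(current)))

    # 3.-4. Rewrite each chunk as prose, citing the source timestamps.
    sections = []
    for t0, t1, text in chunks:
        prose = rewrite_as_prose(text)
        sections.append(f"{prose}\n\n[source: {t0:.0f}s-{t1:.0f}s of the video]")
    return "\n\n".join(sections)
```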
[II]: 💼 AI agents as a new distribution channel
“Every new platform satisfies two criteria: 1) it gives us new capabilities, and 2) it enables new distribution patterns (new user types, new channels, new ways of buying).”
Hence, the author argues, “AI agents are the next big product distribution channel.”
[III]: 🧠 OpenAI's Bet on a Cognitive Architecture
This essay from LangChain explains how to think about cognitive architectures in the looming agentic future, using OpenAI’s GPTs and Assistants API to guide the discussion.
[IV]: 📱 Harrison Chase on LangChain and LLM Apps
I am betting on LangChain to be the vehicle for the most powerful generative AI apps. I am currently building something with the library, and it’s pretty amazing what you can do.
[V]: 💻 Jeff Dean (Google): Exciting Trends in Machine Learning
Great (and clear) presentation on general-purpose machine learning (including applications in science and medicine), highlighting work at Google in particular.
Abstract: “In this talk I’ll highlight several exciting trends in the field of AI and machine learning. Through a combination of improved algorithms and major efficiency improvements in ML-specialized hardware, we are now able to build much more capable, general purpose machine learning systems than ever before. As one example of this, I’ll give an overview of the Gemini family of multimodal models and their capabilities. These new models and approaches have dramatic implications for applying ML to many problems in the world, and I’ll highlight some of these applications in science, engineering, and health. This talk will present work done by many people at Google.”