ε Pulse: Issue #12
Humanoid Robots 🦾, Agentic Workflows 👮♂, and Designing Antibiotics with Generative AI🧫.
Unrelated to AI:
See my latest Around the Web newsletter.
Watch my latest video essay: The Middle Ages: Not So Dark
[AI/ML + Bio] 🤖 🦠 🧬
[I]: 🧬 Evo: DNA foundation modeling from molecular to genome scale
Summary: This paper introduces Evo, a genomic foundation model that excels in both prediction and generation tasks across the molecular to genome scale. Using the StripedHyena architecture and trained on the comprehensive OpenGenome dataset, Evo demonstrates significant advancements in modeling long sequences at single-nucleotide resolution and in generating synthetic DNA sequences for complex biological applications. (See associated blog post)
Technical details: Evo leverages a hybrid architecture called StripedHyena, comprising 29 layers of data-controlled convolutional operators and 3 layers of multi-head attention with rotary position embeddings, achieving a model size of 7 billion parameters with a context length of up to 131 kilobases. The model is trained on the OpenGenome dataset, which includes over 80,000 bacterial and archaeal genomes and millions of predicted prokaryotic phage and plasmid sequences, covering 300 billion nucleotide tokens.
Evo's training employs a next-token prediction objective, requiring the model to learn the distribution of genomic data and identify biological sequence motifs. Through a unique scaling laws analysis, the StripedHyena architecture was identified as optimally scalable for DNA sequence modeling, outperforming several other architectures, including the Transformer architecture. In zero-shot function prediction tasks, Evo matches or surpasses domain-specific language models in predicting mutational effects on protein and ncRNA functions and regulatory DNA activity.
Furthermore, Evo demonstrates an unprecedented ability to generate biologically plausible CRISPR-Cas molecular complexes, transposable biological systems, and long DNA sequences with high coding density, illustrating its potential for advanced genomic design and research.
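The zero-shot mutational-effect prediction described above boils down to comparing sequence log-likelihoods under the autoregressive model: a mutation that makes the sequence less likely under the genomic language model is predicted to be deleterious. The toy sketch below illustrates that idea with a smoothed nucleotide bigram model standing in for the 7B-parameter StripedHyena model; the sequences, counting scheme, and scoring function are all illustrative assumptions, not Evo's actual pipeline.

```python
import math
from collections import defaultdict

def train_bigram(seqs):
    """Count nucleotide bigrams as a tiny stand-in for a genomic language model."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return counts

def log_likelihood(counts, seq, alpha=1.0, alphabet="ACGT"):
    """Add-alpha smoothed next-nucleotide log-likelihood of seq."""
    ll = 0.0
    for a, b in zip(seq, seq[1:]):
        total = sum(counts[a].values()) + alpha * len(alphabet)
        ll += math.log((counts[a][b] + alpha) / total)
    return ll

# Train on a couple of made-up "genomes".
genomes = ["ATGACGTACGATGACGT", "ATGACGAACGATGACGA"]
model = train_bigram(genomes)

# Zero-shot mutational-effect score: log-likelihood difference, mutant vs wild type.
wild_type = "ATGACGTACG"
mutant = "ATGTTTTACG"  # hypothetical mutation introducing unseen TT motifs
delta = log_likelihood(model, mutant) - log_likelihood(model, wild_type)
```

A negative `delta` flags the mutant as less plausible than the wild type; Evo applies the same likelihood-comparison logic at scale, across proteins, ncRNAs, and regulatory DNA.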
[II]: 🧫 Generative AI for designing and validating easily synthesizable and structurally novel antibiotics.
Summary: This paper introduces SyntheMol, a generative model that designs new antibiotics by exploring a vast chemical space of nearly 30 billion molecules, identifying compounds that are both novel and easily synthesizable. The model's utility is demonstrated through the design, synthesis, and validation of molecules with antibacterial activity against Acinetobacter baumannii and other pathogens, showcasing a significant advance in the application of AI for drug discovery.
Some details: The study employs SyntheMol to tackle the challenge of designing structurally novel antibiotics that are easily synthesizable. SyntheMol navigates a chemical space encompassing nearly 30 billion molecules by leveraging molecular building blocks and well-established chemical reactions. This exploration is guided by property prediction models that assess the potential antibacterial activity of the generated compounds. Specifically, the model operates within a framework where:
Inputs are derived from a curated set of approximately 13,000 molecules with known bioactivity against A. baumannii, alongside molecular features and a combinatorial chemical space constructed from about 132,000 molecular building blocks and 13 chemical reactions.
Modeling aspects include the use of a Monte Carlo tree search (MCTS) mechanism, guided by property prediction models (Chemprop, Chemprop-RDKit, and a random forest model), to iteratively select and combine building blocks through chemical reactions. This process yields molecules with predicted antibacterial activity, which are then synthesized and experimentally validated.
Outputs of the model are novel molecular structures with potential antibacterial efficacy, highlighted by the synthesis and validation of 58 molecules, six of which showed significant activity against A. baumannii and a broad spectrum of bacterial pathogens.
This work fills a critical gap by providing a scalable approach to drug discovery, overcoming limitations of existing AI methods that either scale poorly to vast chemical spaces or produce molecules with challenging synthesis paths.
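The core loop described above — assemble building blocks via reaction templates, score candidates with a property predictor, keep the most promising — can be sketched in a few lines. This is a deliberately simplified Monte Carlo search (random rollouts rather than SyntheMol's full MCTS with exploration bonuses), and the fragments, "reaction", and scoring function are placeholder assumptions; the real system uses ~132,000 Enamine building blocks, 13 reaction templates, and trained Chemprop-style predictors.

```python
import random

# Toy stand-ins: "building blocks" are SMILES-like fragments and a "reaction"
# simply concatenates two fragments.
BLOCKS = ["CCO", "c1ccccc1", "CC(=O)", "NCC", "OCC"]

def react(a, b):
    return a + b  # placeholder for applying a real reaction template

def score(molecule):
    # Hypothetical property predictor: rewards nitrogen- and oxygen-rich products.
    return molecule.count("N") + 0.1 * molecule.count("O")

def monte_carlo_search(n_rollouts=200, seed=0):
    """Simplified Monte Carlo search: random multi-step assemblies, keep the best."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(n_rollouts):
        mol = react(rng.choice(BLOCKS), rng.choice(BLOCKS))
        if rng.random() < 0.5:  # optionally extend with a third block
            mol = react(mol, rng.choice(BLOCKS))
        s = score(mol)
        if s > best_score:
            best, best_score = mol, s
    return best, best_score

best_mol, best_val = monte_carlo_search()
```

Because every candidate is built from purchasable blocks via known reactions, each output comes with a synthesis route for free — the property that lets SyntheMol hand chemists 58 molecules they could actually make.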
[AI X Industry + Products] 🤖🖥👨🏿💻
[I]: 🏦Microsoft Copilot for Finance.
“Introducing Microsoft Copilot for Finance, the AI assistant for finance professionals. Access insight while in the flow of work to support strategic decision-making and reduce the time spent on manual, repetitive work. By harnessing next-generation AI, Copilot automates time consuming tasks, like data consolidation, to empower finance professionals to focus on what truly matters – driving business performance.”
[II]: 🎼 Suno AI: AI generated music
“Suno is building a future where anyone can make great music. Whether you're a shower singer or a charting artist, we break barriers between you and the song you dream of making. No instrument needed, just imagination. From your mind to music.”
[III]: 💻 Automating Software Engineering
A startup, Cognition, recently released Devin, which it bills as the first AI software engineer. See the demo: AI trains AI.
I like Andrej Karpathy’s Commentary on this development:
“In my mind, automating software engineering will look similar to automating driving. E.g. in self-driving the progression of increasing autonomy and higher abstraction looks something like:
1. first the human performs all driving actions manually
2. then the AI helps keep the lane
3. then it slows for the car ahead
4. then it also does lane changes and takes forks
5. then it also stops at signs/lights and takes turns
6. eventually you take a feature complete solution and grind on the quality until you achieve full self-driving.

There is a progression of the AI doing more and the human doing less, but still providing oversight. In software engineering, the progression is shaping up similar:

1. first the human writes the code manually
2. then GitHub Copilot autocompletes a few lines
3. then ChatGPT writes chunks of code
4. then you move to larger and larger code diffs (e.g. Cursor copilot++ style, nice demo here)
5. ...

Devin is an impressive demo of what perhaps follows next: coordinating a number of tools that a developer needs to string together to write code: a Terminal, a Browser, a Code editor, etc., and human oversight that moves to increasingly higher level of abstraction. There is a lot of work not just on the AI part but also the UI/UX part. How does a human provide oversight? What are they looking at? How do they nudge the AI down a different path? How do they debug what went wrong? It is very likely that we will have to change up the code editor, substantially. In any case, software engineering is on track to change substantially. And it will look a lot more like supervising the automation, while pitching in high-level commands, ideas or progression strategies, in English.”
[IV]: 🦾 Humanoid Robots can now have Conversations with People.
The Figure 01 humanoid robot can now hold full conversations with people by integrating OpenAI's text and vision capabilities. I suppose it's only a matter of time.
[V]: 📱 Top 100 GenAI Consumer Apps.
“Six of the seven productivity apps on this list either offer or operate entirely through a Google Chrome extension. We expect more AI productivity tools to operate “in the flow” with the work users are already doing, removing the need to copy and paste a prompt and output between your workspace and an assistant like ChatGPT.”
“For companion products that do have mobile apps, engagement is unusually high. The most successful products in this category become a core part of the user’s daily life, becoming as commonplace as texting a friend (if not more so!).”
Link to report.
[VI]: ♊ Google’s Gemini 1.5
This is a really good thread on the capabilities of Google’s Gemini 1.5 Pro.
[AI + Commentary] 📝🤖📰
[I]: 🎙 Nice podcast on GenAI in the programming market space.
[II]: 💼 Marc Andreessen on AI and Dynamism
[III]: 🏗 Building a Bootstrapped AI SaaS
This is a rather long presentation on bootstrapping an AI SaaS, but it’s a very good resource, particularly for folks new to the space. It has nice content on the tech stack (~18 min mark) and on what you need to master to build, maintain, and scale an AI SaaS (~32 min mark).
[IV]: 🧬 Daphne Koller: The Convergence of A.I. and Digital Biology
“A brilliant pioneer in A.I. and its role in life science, and the Founder and CEO of insitro, shares the excitement for how these fields are coming together to change the future of medicine.”
[V]: 👨💻 AI startups require new strategies.
A fine essay on (re)thinking AI strategy for startups versus incumbents.
[VI]: 👮♂ Andrew Ng: What's next for AI agentic workflows
Ng talked about reflection, tool use, planning, and multi-agent collaboration. He recommends the following papers: Self-Refine, Reflexion, reflection agent, Gorilla, MM-ReAct, HuggingGPT, Chain-of-Thought, planning agent, AutoGen, and Communicative Agents for Software Development.
[VII]: RAG for long context LLMs
With the emergence of long-context Large Language Models (LLMs), some have argued that Retrieval-Augmented Generation (RAG) systems are dead. However, given that fact retrieval by long-context LLMs is not yet optimal, this talk argues that RAG remains relevant even as LLM context windows grow.
Some key points:
RAG overview: RAG is a technique where a large language model (LLM) retrieves relevant documents from a database in response to a query, then reasons about those documents to generate an answer.
Challenges of long-context LLMs for RAG: With larger context windows, LLMs might be able to perform retrieval and reasoning on their own, making RAG seem redundant.
Experiment with GPT-4: An experiment was conducted where GPT-4 was asked to retrieve and reason over multiple facts (needles) placed at different locations within a 120,000 token context window. The results showed that GPT-4 performed worse on retrieving needles placed earlier in the context window, suggesting that recent information is more easily accessible to the LLM.
Future of RAG: The presenter argues that RAG will remain important even with long-context LLMs. Here are some of the reasons why:
Focus on full documents: Instead of chunking documents into smaller pieces, future RAG systems might work with full documents.
Multi-representation indexing: This technique involves creating multiple representations (e.g., summaries) of a document and indexing them for retrieval.
Hierarchical indexing: This technique involves clustering documents and creating summaries at different levels of granularity, allowing the system to efficiently find relevant information.
Reasoning in retrieval and generation: Future RAG systems might incorporate reasoning throughout the process, including pre-retrieval (query analysis), retrieval (grading retrieved documents), and post-retrieval (grading generated responses).
Overall, the talk suggests that RAG will evolve to adapt to the capabilities of long-context LLMs and play an important role in future information retrieval systems.
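The multi-representation indexing idea above — embed and index a compact summary of each document, but hand the full document to the long-context LLM at generation time — can be sketched in a few lines. The bag-of-words "embedding", the documents, and the summaries below are toy assumptions for illustration; a real system would use a neural embedding model and a vector store.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use neural embedding models."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Multi-representation indexing: index a short summary of each document,
# but return the FULL document for the long-context LLM to reason over.
documents = {
    "doc1": "Full text about transformer attention ... (imagine many pages)",
    "doc2": "Full text about protein folding pipelines ... (imagine many pages)",
}
summaries = {
    "doc1": "transformer attention mechanisms in language models",
    "doc2": "protein structure prediction and folding",
}
index = {doc_id: embed(s) for doc_id, s in summaries.items()}

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(q, index[d]), reverse=True)
    return [documents[d] for d in ranked[:k]]
```

Retrieval stays cheap because it matches against short summaries, while generation benefits from the growing context window — which is exactly why the talk sees full-document RAG and long-context LLMs as complementary rather than competing.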