ε Pulse: Issue #15
Context Caching ⌨, Autonomous Agents in Chemistry ⚛, and The Five Stages of AI Grief ⚱.
[LLM in Science et al] 🤖 🦠 🧬
[I]: 🏥 CoQuest: Exploring Research Question Co-Creation with an LLM-based Agent
Interesting paper on leveraging LLM-based agents for generating research questions.
I read through the backend implementation of the agentic framework; it's based on AutoGPT.
In brief, the CoQuest system uses an LLM-based agent to generate research questions (RQs). The agent follows the ReAct framework, which structures its work as a "Think-Act-Observe" loop.
Think: The agent analyzes user input and context to determine the next action, simulating human research methods. It generates a chain of thought before deciding on the action.
Act and Observe: The agent executes the chosen action, such as searching and summarizing related works, hypothesizing use cases, refining RQs, or evaluating RQs through comparison. These actions are performed through API calls and Python functions. The results of the action are added to the context.
Create RQs: Based on the updated context and predefined prompts, the agent generates new RQs.
This iterative process allows the agent to refine and generate RQs based on user input and the results of previous actions.
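The Think-Act-Observe loop described above can be sketched in plain Python. This is a toy illustration, not CoQuest's actual implementation: the stub "LLM", the tool name, and the returned RQ are all invented for demonstration.

```python
# A minimal sketch of the Think-Act-Observe loop described above.
# The "LLM" here is a stub with canned answers; in CoQuest the reasoning
# and RQ generation come from a real model, and the tool names differ.

def react_loop(llm, tools, context, max_steps=3):
    """Iteratively think, act, observe, then draft research questions."""
    for _ in range(max_steps):
        # Think: the model reasons over the context and picks an action.
        thought, action, arg = llm.think(context)
        context.append(("thought", thought))
        if action == "create_rqs":
            break
        # Act + Observe: run the chosen tool, fold the result back into context.
        observation = tools[action](arg)
        context.append(("observation", observation))
    # Create RQs from the accumulated context.
    return llm.create_rqs(context)


class StubLLM:
    """Canned responses standing in for a real model."""
    def __init__(self):
        self.step = 0

    def think(self, context):
        self.step += 1
        if self.step == 1:
            return ("Need related work first", "search_papers", "LLM agents")
        return ("Enough context gathered", "create_rqs", None)

    def create_rqs(self, context):
        return ["How do LLM agents affect RQ novelty?"]


tools = {"search_papers": lambda q: f"3 papers found on '{q}'"}
rqs = react_loop(StubLLM(), tools, context=[])
```

Each pass through the loop appends to the shared context, which is what lets later "Think" steps build on earlier observations.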
You can check out the CoQuest demo on the author's GitHub page.
[II]: ⚛ A Review of Large Language Models and Autonomous Agents in Chemistry
I have only read a small part of this review article, but judging from the outline it covers a fair range of topics that are relevant even outside of chemistry.
[AI Engineering] 🤖🖥⚙
[I]: 🔖Context Caching for Gemini API
“In a typical AI workflow, you might pass the same input tokens over and over to a model. Using the Gemini API context caching feature, you can pass some content to the model once, cache the input tokens, and then refer to the cached tokens for subsequent requests. At certain volumes, using cached tokens is lower cost than passing in the same corpus of tokens repeatedly.”
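The "at certain volumes" claim can be made concrete with a back-of-envelope break-even calculation. The per-token prices below are hypothetical placeholders, not Gemini's actual rates; the point is the shape of the comparison, not the numbers.

```python
# Break-even sketch for context caching: re-sending a large shared
# context every request vs. caching it once and paying a discounted
# per-token rate plus a storage fee. All prices are hypothetical.

def cost_without_cache(n_requests, ctx_tokens, input_price):
    # Every request re-sends the full shared context.
    return n_requests * ctx_tokens * input_price

def cost_with_cache(n_requests, ctx_tokens, input_price,
                    cached_price, storage_cost):
    # Pay full price once to populate the cache, a discounted rate on
    # each subsequent reference, plus a flat storage fee.
    return (ctx_tokens * input_price
            + n_requests * ctx_tokens * cached_price
            + storage_cost)

# Hypothetical rates: cached reads at a quarter of the input price.
ctx, p_in, p_cached, storage = 100_000, 1e-6, 0.25e-6, 0.01
for n in (1, 10, 100):
    plain = cost_without_cache(n, ctx, p_in)
    cached = cost_with_cache(n, ctx, p_in, p_cached, storage, )
    print(n, round(plain, 4), round(cached, 4), cached < plain)
```

With these made-up rates, caching loses at one request (you still pay the full price once, plus storage) but wins comfortably by ten requests — exactly the volume effect the docs describe.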
[II] 👨💻On AI engineering and AI-failed processes.
This essay discusses the challenges of traditional machine learning (ML) and the potential benefits of LLMs. Traditional ML is hard partly because it demands extensive data preparation; LLMs lower that barrier considerably. However, LLM projects can still fail if unrealistic expectations are set or if the system is not continually monitored and improved.
LangSmith appears to be a tool that is helpful in this regard. I haven't used the tool a great deal, but I plan to use it in the future for LLM monitoring, testing and debugging (for more on this, see (VI) in this section).
[III] ⛓LangGraph Agents - Human-In-The-Loop (HIL) Breakpoints
I have implemented some HIL while coding agents in LangGraph in the past, but I hard-coded it, so I was glad to stumble on this video tutorial, even though it's not exactly what I need. The tutorial (from folks at LangChain) is about breakpoints, a feature in LangGraph agents. Breakpoints allow human approval intervention in the agent's workflow.
The presenter first explains the concept of breakpoints alongside checkpoints and threads, then demonstrates how to code breakpoints in LangGraph agents. Breakpoints let an agent stop before a sensitive action and get human approval to proceed, which is useful when the agent has access to sensitive tools.
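The breakpoint idea can be shown without any framework: pause before any "sensitive" tool call and proceed only on human approval. This is a framework-free sketch — LangGraph itself does this by compiling the graph with a checkpointer and breakpoints; the helper names below are illustrative, not LangGraph's API.

```python
# A framework-free sketch of the breakpoint idea: the agent pauses
# before any "sensitive" tool call and resumes only on human approval.
# (LangGraph implements this with checkpoints and breakpoints; the
# function and tool names here are invented for illustration.)

SENSITIVE = {"send_email", "delete_records"}

def run_agent(planned_actions, tools, approve):
    """Execute actions, stopping at a breakpoint before sensitive ones."""
    log = []
    for action, arg in planned_actions:
        if action in SENSITIVE and not approve(action, arg):
            log.append(f"blocked: {action}")
            continue  # human rejected; skip the sensitive step
        log.append(tools[action](arg))
    return log

tools = {
    "search": lambda q: f"results for {q}",
    "send_email": lambda to: f"emailed {to}",
}
# An auto-approver standing in for an interactive human check.
log = run_agent(
    [("search", "langgraph breakpoints"), ("send_email", "alice")],
    tools,
    approve=lambda action, arg: arg == "alice",
)
```

In a real LangGraph app, the pause survives process restarts because the agent's state lives in a checkpoint keyed by a thread ID, rather than in a running loop like this one.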
[IV] 👮♂AI Agentic Workflows And Their Potential For Driving AI Progress
This talk about agentic AI was given by Andrew Ng at the Snowflake Summit 2024.
He discusses the limitations of zero-shot prompting and introduces agentic workflows as a more effective solution. In brief, agentic workflows break down complex tasks into smaller steps, allowing the AI to perform research and revise its work iteratively, resulting in a much better final product.
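The draft-critique-revise pattern Ng describes can be sketched as a loop over stubbed model calls. The stubs below are toys invented for illustration; a real workflow would route each step through an LLM.

```python
# A toy version of the iterative agentic workflow: instead of one
# zero-shot pass, draft, critique, and revise in a loop until the
# critic is satisfied. All three "model" calls are stubbed.

def draft(task):
    return f"draft of {task}"

def critique(text):
    # Stub critic: demand revisions until the text has been revised twice.
    return "needs work" if text.count("revised") < 2 else "ok"

def revise(text):
    return f"revised {text}"

def agentic_write(task, max_rounds=5):
    text = draft(task)
    for _ in range(max_rounds):
        if critique(text) == "ok":
            break
        text = revise(text)
    return text

result = agentic_write("essay")
```

The `max_rounds` cap matters in practice: without it, a critic that is never satisfied would loop forever.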
Ng also talks about Landing AI's recently open-sourced vision agent, which can write code to complete tasks based on a prompt. For example, given a prompt to calculate the distance between a surfer and the nearest surfboard in a video, the vision agent can write code to automatically generate the desired outcome. See the code base and app.
[VI]: 🕵️ Building and Testing Reliable Agents
This presentation articulates for me why I have decided to stick to LangGraph (even though it is not the easiest framework in the world to master) instead of, say, CrewAI.
In the talk, Lance from LangChain compares two approaches to building a RAG app: a ReAct agent and a custom LangGraph agent. LangGraph agents are a promising approach for building reliable agents, especially for production applications where the control flow can be predetermined. However, ReAct agents may still be preferable for open-ended tasks that require a high degree of flexibility.
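The contrast is easiest to see as code: a ReAct agent decides its own next step at runtime, while a custom graph hard-wires the control flow. Below is a sketch of the predetermined side (retrieve → grade → generate) with stubbed retrieval and generation; the function names are mine, not from the talk.

```python
# A hard-wired RAG control flow: every request takes the same
# retrieve -> grade -> generate path, which makes the app easier to
# test and debug than a free-running agent. All steps are stubbed.

def retrieve(question):
    return ["doc about langgraph", "doc about cooking"]

def grade(question, docs):
    # Keep only documents that share a word with the question.
    words = set(question.lower().split())
    return [d for d in docs if words & set(d.split())]

def generate(question, docs):
    return f"answer to '{question}' using {len(docs)} doc(s)"

def rag_pipeline(question):
    docs = retrieve(question)
    docs = grade(question, docs)
    return generate(question, docs)

answer = rag_pipeline("what is langgraph")
```

A ReAct agent would instead ask the model, after each observation, which of these steps to run next — more flexible, but harder to make reliably repeatable.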
[AI X Industry + Products] 🤖🖥👨🏿💻
[I]: 🦿Elon Musk on Humanoid Robot Market
“Elon Musk says the Market for Humanoid robots like the Tesla Optimus Robot will be 1 Billion units per year, and Tesla can make $1 Trillion a year selling them. He predicts that they will ultimately sell for around $20,000 at a cost of $10,000, and Tesla will take at least a 10% market share of the humanoid robot market. Pretty impressive and the world is about to change. Musk believes that there will be at least 1 robot per human and perhaps much more than that.”
[II]: 🔊Eleven Labs: Text to Sound Effects
“Text to Sound is here. Our newest AI Audio model can generate sound effects, short instrumental tracks, soundscapes, and a wide variety of character voices, all from a text prompt.”
I have been using this feature to power my video essays. I love it.
[III]: 📹Runway ML: Gen3, Text to Video
I tried the earlier versions of Runway's text-to-video model and was disappointed enough that I cancelled my subscription. Now they have a new model out: Gen-3. I am yet to try it, but I hope it's great, especially since it doesn't appear OpenAI's Sora will be available to the general public anytime soon.
[IV]: 🎧DeepMind: Video to Audio
One practice I have been trying recently that brings my video essays alive is generating audio, such as sound effects, to add to the video (as I noted earlier). It turns out that DeepMind has built a model for generating audio, including dialogue, from video, even though they might not be releasing it soon, if at all. My takeaway is that, given all these building blocks, it's only a matter of time before we get a reliable essay-to-video or, who knows, script-to-film model.
Brave new world.
[V]: 🏥OpenAI's Push into Healthcare
“OpenAI is working with startup Color Health to expand the use of artificial intelligence in healthcare by applying its AI models to cancer screening and treatment.”
[AI + Commentary] 📝🤖📰
[I]: 👝LLMs won’t Lead to AGI
This podcast is about a new benchmark and prize for AI research. François Chollet, the guest, argues that current LLMs are not capable of achieving true intelligence. He proposes a benchmark called the Abstraction and Reasoning Corpus (ARC), which is designed to be resistant to memorization and to focus on the ability to solve novel problems.
The highlight of the discussion for me was teasing apart memorization from intelligence. It turns out that some of what I would call intelligence is actually memorization, if we go by Chollet's definition of intelligence (and I think he is right).
Some key takeaways:
Current LLMs are good at memorizing information but lack the ability to reason and solve new problems.
Chollet believes that true intelligence requires the ability to adapt to new situations and learn from small amounts of data.
The ARC benchmark is designed to test these abilities by presenting AI systems with novel puzzles that they have not seen before.
A $1 million prize is being offered to the first team that can solve the ARC benchmark.
In brief, Chollet believes that solving ARC will be a major milestone on the path to achieving artificial general intelligence (AGI).
Link to the ARC dataset.
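ARC tasks are distributed as small JSON objects: a few "train" input/output grid pairs plus "test" inputs, where a grid is a list of lists of color values (integers 0–9). A solver must infer the transformation from the train pairs alone — the memorization-resistant part. The tiny task below ("flip the grid horizontally") is invented for illustration, not taken from the dataset.

```python
# A hypothetical ARC-style task in the dataset's JSON shape: infer the
# rule from the "train" pairs, then apply it to the "test" input.

task = {
    "train": [
        {"input": [[1, 0], [2, 3]], "output": [[0, 1], [3, 2]]},
        {"input": [[5, 6, 0]], "output": [[0, 6, 5]]},
    ],
    "test": [{"input": [[7, 8], [0, 9]]}],
}

def flip_horizontal(grid):
    # Candidate program: reverse each row.
    return [list(reversed(row)) for row in grid]

def solves_training(program, task):
    # A candidate rule counts only if it fits every train pair.
    return all(program(pair["input"]) == pair["output"]
               for pair in task["train"])

prediction = flip_horizontal(task["test"][0]["input"])
```

Because each task uses a different hidden rule, a system cannot pass by retrieving memorized answers; it has to synthesize the rule from two or three examples — which is exactly the skill Chollet argues LLMs lack.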
[II]: 🕴How to Win At Enterprise AI
This essay is one of the most deeply researched and insightful pieces I am sharing in this newsletter. It takes a while to read, and I have not finished it myself, but it's very good.
In brief, the author discusses the challenges and opportunities that exist in enterprise AI. Enterprises are unsure where to invest in AI, but AI has the potential to improve workflows. The author argues that the key to success is focusing on workflow capture, not merely automation. AI can perform many tasks that people currently do, including knowledge work and managerial work. This will change the way enterprises operate. There are challenges, however, such as ensuring AI performs well in terms of accuracy and speed.
[III]: 💻Five Stages of AI Grief
This essay is a bit long and I didn't finish it, but it's of high quality. I don't seem to agree with some of the author's talking points, but it's hard not to appreciate high-quality thinking and writing when you come across it. Here are a few sections that I find quotable.
“Since the paleolithic cognitive revolution, human intelligence has artificialized many things — shelter, heat, food, energy, images, sounds, even life itself — but now, that intelligence itself is artificializable.”
“The stages of AI grief do not go in any order. This is not a psychological diagnosis; it is mere typology. The positions of real people in the real world don’t stay put inside simple categories. For example, AI Denial and AI Anger can overlap, as they often do for critics who claim in the same sentence that AI is not real and yet must be stopped at all costs.”
“As a kind of populist variation of the Turing Test, it compares human experience to a machine’s and concludes that the obvious differences between them are the precise measure of how unintelligent AI is.”
“The premise is that modern governments as we know them are the executives of the transformations to come and not an institutional form that will be overhauled if not absorbed by them. For better or worse, the latter scenario may be more plausible.”
[IV] 🎙 Podcast on AI and GenAI
A few podcast episodes I listened to over the past few weeks.