In this newsletter:
[LLM/AI for Science et al] 🤖 🦠 🧬
[I]: Will AI Become our Co-PI
It's already clear that knowledge work is going to undergo a massive restructuring, and many of these shifts (most notably in coding) are already well under way. Running a lab and doing science are knowledge work too. This perspective article asks the question: "Will AI become the de-facto lab Co-PI?"
The authors explore the transformative potential of LLMs in biomedical research. They argue that AI is poised to revolutionize the lab by taking over rote, "aggregation-type" tasks like literature reviews and data extraction, which currently consume a significant amount of researchers' time. This automation promises not only to accelerate data collection and reduce human error but also to free up trainees and principal investigators (PIs) to focus on higher-value "synthesis-type" work, such as experimental design, critical thinking, and interpreting complex findings. The authors suggest that by handling the burdensome aspects of research, AI can help realign scientific efforts toward more meaningful and innovative discovery.
Furthermore, they propose a future where AI evolves from a mere tool for aggregation into a true "Co-PI." They also wrote about AI's growing capabilities in "synthesis-type" research, including its use in drug discovery and its potential to guide robotic systems for autonomous experimentation. The article posits that LLMs, having processed vast amounts of scientific literature, could generate novel hypotheses by identifying patterns and connections that are beyond human capacity to see. While acknowledging significant risks (such as the "black box" nature of AI, data privacy concerns, and the potential for propagated bias) the authors advocate for a framework of cautious optimism and responsible integration.
[II]: Personal Health LLM for Sleep and Fitness Coaching
It is really worth stressing that the base model used for fine-tuning in this case is Gemini Ultra 1.0; imagine the life-changing capabilities when these methods are paired with next-generation models. Link to publication.
Some details: Inputs are text encodings of demographics plus up to 30 days of daily and aggregated metrics; for fitness, per‑exercise logs and synthetic readiness notes are included. For patient-reported outcome (PRO) prediction, 20 wearable features over at least 15 days per participant are summarized (a 20×15 matrix), with binary targets from 16 PROMIS sleep items. The model builds on Gemini Ultra 1.0 via two stages: (i) full‑model supervised fine-tuning on expert‑curated long‑form case studies, and (ii) training a small multilayer‑perceptron adapter that maps sensor‑feature summaries into PH‑LLM's latent token space for native multimodal conditioning during classification. Case‑study targets are expert texts (insights, etiologies, recommendations); PRO outputs are per‑item binary predictions evaluated by AUROC/AUPRC, where the adapter outperforms prompt‑only baselines and is competitive with logistic regression, while a CNN on the limited data underperforms. The paper fills a gap: general and medical LLMs rarely exploit continuous wearable data for individualized coaching, and benchmarks/rubrics for this setting have been lacking; the authors contribute datasets, rubrics, and a rigorous human/auto‑evaluation framework.
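To make the adapter idea concrete, here is a rough sketch (not the PH-LLM code, which is built on Gemini): a small MLP maps the flattened wearable-feature summary into the language model's embedding space so it can be consumed as a handful of "soft tokens" during classification. All dimensions and names below are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: an MLP adapter that projects a wearable-feature summary
# into an LLM's token-embedding space as "soft tokens" for classification.
import torch
import torch.nn as nn

N_FEATURES, N_DAYS = 20, 15      # e.g., 20 wearable features over 15 days
D_MODEL = 4096                   # hidden size of the (frozen) language model (assumed)
N_SOFT_TOKENS = 8                # how many virtual tokens the adapter emits (assumed)

class SensorAdapter(nn.Module):
    """Flattens the feature-by-day summary and projects it to soft tokens."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(N_FEATURES * N_DAYS, 1024),
            nn.GELU(),
            nn.Linear(1024, N_SOFT_TOKENS * D_MODEL),
        )

    def forward(self, sensor_summary: torch.Tensor) -> torch.Tensor:
        # sensor_summary: (batch, N_FEATURES, N_DAYS)
        x = sensor_summary.flatten(start_dim=1)
        return self.mlp(x).view(-1, N_SOFT_TOKENS, D_MODEL)

# In this setup only the adapter is trained; its output embeddings would be
# prepended to the prompt's token embeddings, and the model's yes/no logits
# (or a small binary head) would give the per-item PRO prediction.
adapter = SensorAdapter()
soft_tokens = adapter(torch.randn(4, N_FEATURES, N_DAYS))
print(soft_tokens.shape)  # torch.Size([4, 8, 4096])
```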
[III]: Publishing for Machines
In this thought-provoking essay, Andrew White highlights the urgent need to reinvent scientific publishing as AI agents become both prolific writers and readers of research. With over 10 million papers published annually and more to come from machine-driven science, traditional systems—relying on PDFs, locked metadata, and slow human-centric processes—are no longer sustainable. White envisions a future where publications are machine-accessible, emphasize hypothesis-driven narratives, provide "raw-ish" data, and encourage rapid, low-latency updates, making science more dynamic and transparent. He stresses the value of keeping human perspective in the loop, not just reducing papers to data and code, and sees automated review and AI-powered curation as vital for handling the tidal wave of new work. Established citation systems may give way to machine-first platforms, challenging traditional measures of impact and making it harder for researchers to navigate the swelling scientific corpus. Ultimately, he calls for community-driven, open, and adaptive structures—learning from experiments like arXiv and Wikipedia—to ensure science remains a robust, accessible dialogue for both machines and humans in the AI era.
[AI/LLM Engineering] 🤖🖥⚙
[I]: Open Deep Research
LangChain introduces an open-source deep research agent designed to conduct thorough research on a given topic. The video demonstrates the agent's capabilities by planning a hypothetical trip to Amsterdam and Norway. The agent is able to ask clarifying questions and conduct parallel research on subtopics, showcasing its ability to break down a complex problem into smaller, manageable parts. The video also highlights the agent's three-phase architecture: scoping the problem, research, and writing the report.
The video also covers the technical aspects of the agent, explaining how to run it locally and customize its features. Users can configure the agent to use different search tools, connect to various servers, and select different models for each step of the research process. The user interface used for the demonstration, the Open Agent Platform, is also introduced as an easy way to try out this powerful research tool.
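To make the architecture concrete, here is a hedged, plain-Python sketch of that three-phase shape (scope, parallel research, write). It is not the open deep research codebase; the llm and web_search placeholders stand in for whichever model and search tool you would configure.

```python
# Hedged sketch of a scope -> research -> write pipeline (assumed names throughout).
import asyncio

async def llm(prompt: str) -> str:
    # Placeholder: swap in your chat model of choice.
    return f"[model output for: {prompt[:60]}...]"

async def web_search(query: str) -> str:
    # Placeholder: swap in your configured search tool.
    return f"[search results for: {query}]"

async def scope(topic: str) -> list[str]:
    # Phase 1: clarify the request and break it into research subtopics.
    plan = await llm(f"Break this research topic into 3-5 subtopics:\n{topic}")
    return [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

async def research(subtopic: str) -> str:
    # Phase 2: each subtopic is researched independently, in parallel.
    evidence = await web_search(subtopic)
    return await llm(f"Summarize the findings on '{subtopic}':\n{evidence}")

async def write_report(topic: str, notes: list[str]) -> str:
    # Phase 3: a single writing pass stitches the notes into one report.
    return await llm(f"Write a report on '{topic}' from these notes:\n" + "\n\n".join(notes))

async def deep_research(topic: str) -> str:
    subtopics = await scope(topic)
    notes = await asyncio.gather(*(research(s) for s in subtopics))
    return await write_report(topic, list(notes))

print(asyncio.run(deep_research("Plan a trip to Amsterdam and Norway")))
```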
[II]: Deep Agents
Deep Agents, a Python package recently released by LangChain, empowers developers to build "deep agents": intelligent assistants that go far beyond simple tool loops. Unlike typical agents that perform quick, repetitive tasks, deep agents are crafted to handle complex, multi-step workflows by leveraging four core innovations: a detailed system prompt guiding behavior, a planning tool that helps structure long-term tasks, support for sub-agent delegation to manage task complexity, and an integrated virtual file system for persistent memory and state management. Inspired by Claude Code's architecture yet designed to be broadly customizable, the deepagents package allows you to configure your own agents with custom prompts, tools, and sub-agents, opening the door to powerful, domain-specific automation for research, coding, analytics, and more (a minimal sketch follows below). See the blog, video intro, technical walkthrough, and codebase.
Deep agents UI.
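Here is roughly what wiring one up looks like, based on the package's launch examples; treat the exact names and signatures as assumptions and check the codebase above for the current API. The internet_search tool here is a hypothetical stand-in.

```python
# Hedged sketch of the deepagents package (API details assumed, may have changed).
from deepagents import create_deep_agent

def internet_search(query: str) -> str:
    """Hypothetical search tool; swap in Tavily, SerpAPI, or your own."""
    return f"[search results for: {query}]"

research_instructions = (
    "You are a careful researcher. Plan with your todo list, delegate "
    "subtopics to sub-agents, and save drafts to the virtual file system."
)

# Tools first, then the system instructions (per the launch examples).
agent = create_deep_agent([internet_search], research_instructions)

# The agent is a LangGraph graph, so it is invoked like any other graph.
result = agent.invoke(
    {"messages": [{"role": "user", "content": "What is context engineering?"}]}
)
print(result["messages"][-1].content)  # final assistant message
```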
[III]: How Harvey Built Reliable AI Agents
One of the domain-specific AI startups that caught my eye at the beginning of this LLM revolution is Harvey. Harvey's VP of Engineering, Wilds, gave this talk at the latest LangChain conference. In brief, he talked about the complexities of building and evaluating legal AI. He outlined the challenges inherent in this field, such as the intricate nature of legal documents and the high stakes involved, where errors can have significant career repercussions. Wilds emphasized that the quality of legal AI is often subjective and nuanced, making automatic assessment a difficult task. Furthermore, the confidential nature of legal work presents obstacles to obtaining reliable data and feedback.
To address these challenges, Harvey has developed a multi-faceted approach to product building and evaluation. He explained their three primary evaluation methods: human preference judgments, model-based auto-evaluations (LLM-as-a-judge) using their internal "Big Law Bench," and a step-by-step evaluation for workflows and agents. He stressed the importance of combining rigorous evaluation with human judgment and taste. For his hot takes, he highlighted the need for "process data" – information on how tasks are actually performed within firms – to drive future advancements in agentic systems.
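For a sense of what model-based auto-evaluation looks like in practice, here is a hedged sketch of an LLM-as-a-judge grader that scores an answer against a per-task rubric. It is a generic illustration, not Harvey's internal Big Law Bench; the llm stub stands in for whichever judge model you use.

```python
# Hedged sketch of LLM-as-a-judge evaluation against a task rubric (names assumed).
import json

def llm(prompt: str) -> str:
    # Placeholder judge call; swap in your preferred model API.
    return json.dumps({"met_criteria": [], "score": 0.0, "rationale": "stub"})

def judge(task: str, answer: str, rubric: list[str]) -> dict:
    prompt = (
        "You are grading a legal-drafting answer.\n"
        f"Task: {task}\n"
        f"Answer: {answer}\n"
        "Rubric criteria:\n" + "\n".join(f"- {c}" for c in rubric) + "\n"
        'Return JSON: {"met_criteria": [...], "score": 0-1, "rationale": "..."}'
    )
    return json.loads(llm(prompt))

scores = judge(
    task="Draft an indemnification clause for a SaaS agreement.",
    answer="...model output...",
    rubric=["Identifies indemnifying party", "Caps liability", "Cites governing law"],
)
print(scores)
```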
[IV]: Applying Context Engineering
Here Lance from LangChain unpacks “context engineering”, the art of feeding AI agents just the right information at the right time to maximize performance while avoiding failure modes like context poisoning, distraction, and clash. Building on Drew Breunig’s “How to Fix Your Context,” Lance demonstrates six techniques in LangGraph: RAG for targeted retrieval, Tool Loadout for on-demand capability selection, Context Pruning and Summarization to cut bloat, Context Offloading to external memory for recall across sessions, and Quarantine to split work into multiple focused agents. Along the way, he shares code examples, performance considerations (like token creep), and trade-offs—cautioning that compression risks information loss, and multi-agent setups can lead to conflicting outputs unless tasks are loosely coupled. The takeaway: these six strategies can dramatically improve LLM agent reliability, but thoughtful implementation makes all the difference.
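As a concrete illustration of one of these techniques, here is a hedged sketch of context summarization: when the running message history exceeds a token budget, older turns are compressed into a summary while the most recent turns are kept verbatim. The llm placeholder and the rough token heuristic are assumptions for the sketch, not Lance's code.

```python
# Hedged sketch of context summarization for an agent's message history.

def llm(prompt: str) -> str:
    # Placeholder: call whichever chat model you use for summarization.
    return f"[summary of {len(prompt)} chars of history]"

def rough_token_count(text: str) -> int:
    return len(text) // 4  # crude heuristic; use a real tokenizer in practice

def compress_history(messages: list[dict], budget: int = 4000, keep_recent: int = 4) -> list[dict]:
    total = sum(rough_token_count(m["content"]) for m in messages)
    if total <= budget:
        return messages  # under budget, nothing to do
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = llm(
        "Summarize this conversation, keeping decisions and open questions:\n"
        + "\n".join(m["content"] for m in old)
    )
    # Compression risks information loss (the trade-off noted above), so the
    # originals could also be offloaded to external memory before discarding.
    return [{"role": "system", "content": f"Conversation so far: {summary}"}] + recent
```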
[V]: Opal: Google Labs
"Opal," Google Labs’ is the newest entry into the no-code AI sphere. Here, Sam Witteveen takes us through Opal’s capabilities—a user-friendly app builder that lets anyone create LLM and generative AI workflows using intuitive drag-and-drop tools and natural language. Think of it as Google’s answer to platforms like n8n and Lindy, but with seamless integration of Google’s powerful Gemini models and tools. The video demo showcases how easily users can chain tasks like research, writing, and image generation, personalize workflows by editing nodes, and rapidly build or remix apps for tasks like blog generation or literature reviews. At the time of the demo, Opal is currently in US-only preview, but its drag-and-drop workflow creation, code-free interface, and built-in AI integrations make it a game-changer for automating tasks and prototyping mini-apps. This is a big deal, I have tried it myself and I am impressed.
[AI X Industry + Products] 🤖🖥👨🏿💻
[I]: GPT-5 (OpenAI)
OpenAI has officially launched GPT‑5, its most advanced AI model to date, accessible to all ChatGPT users. According to OpenAI, GPT‑5 delivers substantial improvements in reasoning, coding, language generation, and multimodal integration, supporting extended context windows, enhanced safety via “safe completions,” and a reduction in hallucinated or overly flattering responses. The rollout introduces lighter “mini” and “nano” versions for efficiency, alongside the standard variant, equipping developers and everyday users with a powerful and more grounded AI experience. Watch the live demo here.
Other interesting things: you can use Study Mode + voice, and it is also available in Azure AI Foundry. I can't wait to unleash it in some of my projects.
[II]: Eleven Music
On August 5, 2025, ElevenLabs unveiled Eleven Music, an AI‑powered music generation service that transforms text prompts into full tracks, complete with vocals and instrumentation, ready for use. What sets it apart is its proactive licensing: the platform secured agreements with Merlin and Kobalt to draw from legal repertoires, sidestepping the copyright disputes currently challenging rivals like Suno and Udio. This forward‑thinking approach offers creators a compelling, rights‑safe route to streamlining music production through AI. More here.
I have tried it out, it works like magic.
[III]: ChatGPT Agent
OpenAI has unveiled the ChatGPT Agent, a significant step forward in AI that transforms the chatbot into an active assistant capable of performing complex tasks (see blog, announcement). This new product goes beyond generating text: it can browse the web, run code, interact with APIs, and work with various applications. It is designed to handle multi-step tasks from start to finish, such as planning events or analyzing data to create presentations. The agent operates within its own virtual computer environment, allowing it to seamlessly switch between different tools to accomplish the goals set by the user.
While the agent is powerful, OpenAI emphasizes that user control and safety are paramount. The system is designed to ask for permission before taking significant actions, and users can interrupt or take control at any point. This "human-in-the-loop" approach ensures that the AI's actions align with the user's intentions, mitigating risks associated with autonomous systems. The introduction of the ChatGPT Agent marks a pivotal moment in the evolution of AI, moving from a tool for information retrieval to a partner in productivity.
When I got access to ChatGPT Agent, my first query was to o3 (+ search): "Based on what you know about me, and what smart people are using ChatGPT Agent for, how should I use ChatGPT Agent, and what are the queries I should be sending to it?" It saved me a great deal of time getting started. In my personal opinion, given that this is the first version, the product is a game-changer. However, I suspect that vertical versions of the product will do best for niche workflows.
[IV]: Meta Personal Superintelligence
In a recent essay, Mark Zuckerberg outlines Meta's vision for the future of AI, centering on the concept of personal superintelligence. He expresses optimism that this technology will not only accelerate human progress in fields like science and health but also empower individuals on a personal level. Zuckerberg argues for a future where superintelligence assists people in achieving their own goals and aspirations, enhancing their daily lives and capabilities.
Zuckerberg draws a distinction between Meta's approach and that of other industry players. He advocates for providing individuals with their own personal superintelligence, rather than a centralized system that automates tasks and distributes the results. He envisions personal devices like smart glasses, which can understand our context, as the primary way we will interact with this technology, while acknowledging the safety concerns surrounding superintelligence. He suggests the next decade will be pivotal in shaping the development of this technology, which is a no-brainer for anyone who has been following the AI train.
[V]: ChatGPT Study Mode
OpenAI's new Study Mode, now live, marks a major pivot from simply answering questions to nudging students toward genuine understanding. Designed in collaboration with educators and learning scientists, Study Mode steers students through complex topics using questioning, skill-level assessment, interactive quizzes, and structured hints to guide them toward deeper engagement, rather than handing over immediate answers. It's built with intentional friction to prompt reflection, discourage shortcuts, and reshape ChatGPT into a personal tutor aimed at enhancing critical thinking and retention. I am currently learning some web programming languages and frameworks, and I have been using Study Mode. I can only say good things about the product at this point.
[VI]: NotebookLM Video Overviews
Google's NotebookLM has just rolled out Video Overviews, a visually rich format that transforms your document notes into narrated slide presentations complete with diagrams, images, quotes, and data visuals. Built to complement existing audio and text formats in the Studio, these new video summaries offer learners different narrative styles, especially helpful when digesting complex studies or workflow processes.
Additionally, the Studio panel has been redesigned to accommodate many more features, like mind maps, reports, audio overviews, and of course the new video overviews. I use NotebookLM for some of my writing, especially my long essays, and it's great. I recently tried the video overview; it wasn't perfect for my use cases, but it is a fantastic product, considering it is v1.
[VII]: Runway Aleph
Runway Aleph is a next‑generation, in‑context video AI model that revolutionizes post‑production workflows by enabling transformative edits directly on real footage, including changing camera angles, adding or removing objects, adjusting lighting or style, and recoloring scenes, all from simple text prompts.
Rather than generating content from scratch, Aleph gives you “endless coverage,” letting creators reshape scenes with cinematic flexibility: genre shifts, aging characters, green‑screen masking, or entirely new compositions—all seamlessly integrated.
See for yourself here; it's a pretty mind-blowing product if you ask me. I don't have a Runway subscription at the moment, so I haven't tried it out yet, but here is a YouTuber trying it out.
[VIII]: Comet: AI Browser
Perplexity's bold new AI browser, Comet, is restructuring how we use the web, turning browsing into a fluid, conversational experience. Built on Chromium and, as of this writing, in limited release, Comet integrates a deeply contextual AI assistant that can summarize content, manage multiple tabs, draft emails, schedule meetings, shop intelligently, and even execute multi-step workflows, essentially acting as a second brain for your online activity. I tried it out and I have to say its agentic efficiency is worthy of praise; the only thing standing in its way, to my mind, is status quo bias.
[AI + Commentary] 📝🤖📰
[I]: AI as an Ultimate form of Leverage
In this brilliant lecture, OpenAI researcher Hyung Won Chung explores the profound impact of AI, suggesting that its transformative power is widely underestimated. He introduces the concept of AI as a form of leverage. This idea is illustrated through a comparison to the gradual, almost imperceptible growth of a flower, which eventually blossoms into something beautiful and complex. Hyung asserts that AI provides this kind of leverage not only for individuals (or teams) but for humanity as a whole, amplifying our capabilities in unprecedented ways.
He elaborates on this by referencing Naval Ravikant's traditional types of leverage: human labor, capital, and the more recent additions of code and media. He then presents AI as a new, powerful category of leverage that is rapidly expanding. At the individual level, AI tools like ChatGPT make learning more efficient by simplifying complex topics and reducing the time needed to acquire new knowledge. Looking ahead, he envisions AI agents that combine the leverage of both human labor and code, enabling small teams to generate immense value and driving a new wave of innovation in startups. He concludes by highlighting what he believes to be AI's most critical role: accelerating scientific advancement by connecting disparate fields of human knowledge and fostering new discoveries.
[II]: Could AI Slow Science?
Could AI slow science? The authors of this brilliant essay argue yes, and for 'good' reasons (with the caveat that it might be field dependent, and it is definitely not a foregone conclusion). AI is already poised to restructure the scientific enterprise, which raises the question: what exactly is being restructured?
They wrote plainly about the production-progress paradox: scientific discovery has been slowing down despite an ever-growing output of papers. In fact, one theory is that slower progress is being caused precisely by faster production. So how will AI impact this? They wrote: "Most obviously, automating parts of the scientific process will make it even easier for scientists to chase meaningless productivity metrics..."
And there is more to worry about:
In brief, as AI accelerates paper production, it risks cementing established ideas while sidelining novel ones. Poor software practices in science only amplify this, raising the chance of undetected errors creeping into the record. And because AI prioritizes prediction over understanding, it may prop up flawed theories longer than necessary, delaying the kinds of paradigm shifts that drive true progress. Most critically, by bypassing human understanding, AI threatens to erode the tacit knowledge that sustains scientific creativity. And again, in a system already wired to reward paper counts over insight, this could mean even more output, and even less meaning.
A scathing, clear-eyed analysis.
[III]: Eric Schmidt on Software Moat
I recently listened to this podcast; lots of interesting things were discussed. Here is an interesting take on software moats, summarized below:
The modern software moat, he rightly pointed out, isn't built on traditional barriers like patents or manufacturing costs, but on the intelligent use of data and speed. The core concept is a virtuous cycle of learning that becomes a formidable competitive advantage.
A more detailed breakdown:
- The Learning Loop (the engine): This is the process where a company uses its product to gather data from user interactions, instantly analyzes that data to gain insights, and then uses those insights to improve the product or service. This isn't a slow, quarterly review process; it's an automated, instantaneous feedback mechanism. The product literally gets smarter with every user click, search, or interaction.
- Scale (the fuel): This learning engine is most powerful when fueled by a large volume of data, which is why this model is particularly effective for consumer-scale businesses. More users mean more data. More data leads to faster, more accurate learning. This superior intelligence allows the company to refine its product, making it more useful and appealing, which in turn attracts even more users. This creates a powerful network effect where the value of the service increases as more people use it, not just because of user connections, but because the core service itself becomes more intelligent.
- An Unstoppable Advantage (the result): A new competitor entering the market starts with zero data and a "dumb" product. The established company, however, has been learning from millions of users for months or years. Its learning slope is so steep that its advantage is not just in its current features but in its rate of improvement. For a competitor to catch up, they would need to not only replicate the product but also somehow replicate the entire history of learning, which is practically impossible. This creates a deep, defensible moat where the leader's advantage in understanding the customer and the market becomes insurmountable.
[IV]: Large language models as disrupters of misinformation
In a recent Nature Medicine article, Thomas Costello examines the disruptive potential of LLMs in the landscape of medical misinformation. He notes that while society has always relied on intermediaries like doctors, search engines, and social media to process complex information, this has made us vulnerable to polarization and distrust. As people increasingly turn from traditional sources to AI for convenience, Costello pushes back against pessimism, arguing for cautious optimism. He posits that unlike previous technologies that democratized information production (allowing anyone to have a voice), LLMs democratize the synthesis of knowledge, distilling information from vast libraries of expert documents rather than just the loudest headlines.
Costello’s optimism is rooted in the unique ability of LLMs to show users the evidence and reasoning behind expert conclusions, which could help bridge the "epistemic fragmentation" that divides society. He cites compelling evidence, including randomized trials where AI chatbots significantly increased vaccine uptake in Argentina and the US by engaging with user concerns directly. While acknowledging the significant risks of AI, such as bias, "hallucinations," and potential for misuse, Costello believes these are solvable problems.
[V]: AI As Profoundly Abnormal Technology
I have covered in this newsletter an approximation of the two prominent views on AI projections: that of the folks at the AI Futures Project, and that of the authors of "AI as a Normal Technology."
Here is a debate between members of the two camps from back in May, hosted by Americans for Responsible Innovation. A lot of ground is covered, but in brief, Sayash Kapoor (of AI as a Normal Technology) and Eli Lifland diverge significantly in their views on the pace of AI progress and its impact on work. Sayash envisions a gradual evolution driven by real-world deployment, feedback loops, and slow diffusion, with AI augmenting rather than replacing human roles over decades. He emphasizes that meaningful adoption requires experimentation, infrastructure, and trust, which inherently takes time. In other words, complementary AI innovation is not going to be a cakewalk. In contrast, Eli (of the AI Futures Project) anticipates a faster trajectory, driven by increasing data efficiency and simulation capabilities, leading to AIs that rapidly acquire new skills and eventually outperform humans across most cognitive domains (ASI). While Sayash sees a long-term redefinition of work involving human oversight and system design, Eli views such roles as transitional, expecting that even verification and direction tasks will ultimately be automated.
Another, even more detailed resource is a recently published essay, "AI As Profoundly Abnormal Technology," by Scott Alexander (of the AI Futures Project). He argues against the view that AI's development and diffusion will be slow and manageable, akin to previous technological shifts. He directly counters the "AI As Normal Technology" (AIANT) team's thesis by presenting evidence of AI's uniquely rapid and widespread adoption. Alexander points to the massive, institution-bypassing uptake of LLMs by professionals in medicine, law, and programming as proof that diffusion is happening at an unprecedented rate, driven by the most aggressive adopters rather than the most cautious institutions. He refutes the idea of strict "speed limits" on AI progress, asserting that capabilities like data efficiency are parameters that will improve, not fixed barriers, and that there is no logical reason to assume human-level performance in complex tasks like forecasting represents a theoretical peak.
Alexander extends this argument to the critical issue of safety, contending that the AIANT team's focus on control measures like auditing and monitoring is dangerously inadequate. Using the "Mossad vs. not-Mossad" cybersecurity analogy, he writes that a superintelligent AI is not a standard threat and cannot be managed with simple safeguards; it requires a focus on "alignment" to ensure it doesn't become an adversary in the first place. He strongly criticizes the dismissal of catastrophic outcomes as overly "speculative" or "non-immediate," comparing this stance to the failed reasoning of those who initially downplayed the COVID-19 pandemic. Alexander concludes that even if transformative AI is decades away, the magnitude of the challenge demands that we act now, viewing it as our generation's responsibility to prepare for what may be the greatest crisis humanity has ever faced.
[VI]: The Tiny Team Playbook
Folks at Latent Space put this Tiny Teams Playbook together from their recent AI engineering conference, essentially outlining the next major evolution in organizational design for the AI era. They define "Tiny Teams" as small, highly efficient groups that leverage AI to achieve outsized results, aspiring to the metric of having more millions in annual recurring revenue than employees ($M ARR > headcount). These nimble teams become the key to speed, resilience, and adaptability. This shift prioritizes inter-human trust and efficiency over sheer size, making it a critical new model for both startups and large corporations.
Drawing lessons from a survey of top-performing tiny teams like Gamma, Gumloop, and Bolt.new, they compiled a playbook of their shared principles. In hiring, the focus is on a small crew of senior generalists, selected through rigorous paid work trials and offered top-of-market salaries to foster a culture of low ego and high trust. Operationally, these teams run on almost no meetings, embrace ruthless prioritization ("Let Fires Burn"), and use AI agents as a "Chief of Staff" to automate tasks. Their product and tech philosophy is similarly streamlined, favoring simple, modular tech stacks and using internal benchmarks not only to improve but also to market their products effectively.
[X] 🎙 Podcast on AI and GenAI
(Additional) podcast episodes I listened to over the past few weeks:
Please share this newsletter with your friends and network if you found it informative!