[LLM/AI for Science et al]
A Gemma Model Helped Discover a New Potential Cancer Therapy Pathway
Google DeepMind and Yale University recently released Cell2Sentence-Scale 27B (C2S-Scale), a cutting-edge 27-billion-parameter model designed to decode the language of individual cells. Built on Google's open Gemma family, C2S-Scale's supercharged reasoning recently predicted a novel drug synergy: the kinase CK2 inhibitor silmitasertib, when combined with low-dose interferon, dramatically boosts antigen presentation in tumor cells, potentially making "cold" tumors visible to the immune system and paving new paths for immunotherapy. Lab tests confirmed the model's hypothesis, marking a milestone in biological discovery, where scaled AI models can not only improve existing science but generate transformative new ideas. See blog & model. C2S-Scale was pretrained on over 1 billion tokens comprising cell sentences (i.e., ranked gene names from scRNA-seq data), biomedical text, and metadata, then fine-tuned for predictive and generative biological tasks.
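The "cell sentence" representation itself is simple to picture: order a cell's genes by how strongly they are expressed and treat the ranked gene names as a sentence a language model can read. Here is a minimal sketch of that transform; the gene names, counts, and top-k cutoff are made up for illustration, and the real C2S pipeline has its own normalization and sequence lengths.

```python
import numpy as np

# One hypothetical cell: a handful of genes and their raw scRNA-seq counts.
genes = np.array(["CD3D", "GAPDH", "IL2RA", "FOXP3", "ACTB", "MKI67"])
counts = np.array([12, 230, 3, 0, 180, 7])

def cell_to_sentence(genes, counts, top_k=5):
    """Rank genes from highest to lowest expression and keep the top_k expressed ones."""
    order = np.argsort(-counts)                      # indices sorted by descending count
    ranked = [genes[i] for i in order if counts[i] > 0]
    return " ".join(ranked[:top_k])

print(cell_to_sentence(genes, counts))               # -> "GAPDH ACTB CD3D MKI67 IL2RA"
```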
AI Designed Phages
Researchers at Arc Institute and Stanford have made a groundbreaking leap by using generative AI models to design entirely new, viable bacteriophage genomes (viruses that infect E. coli). Unlike previous genome projects, this study validated AI-designed phages in the lab, with 16 out of hundreds of candidates proving infectious, and some even outperforming their natural counterparts. Leveraging the Evo 2 genome language model, scientists fine-tuned the system on nearly 15,000 viral genomes to expand biological possibilities far beyond existing species, offering a blueprint for future bioengineering. While the innovation signals exciting potential for customized living systems and synthetic biology, it also raises biosecurity questions, as open-source AI tools could potentially one day be used to build organisms with unprecedented traits. See blog from Asimov.
BenchSci: Map of Biomedical Reasoning
In today's rapidly evolving biomedical research landscape, BenchSci's ASCEND platform tackles the overwhelming challenge of scientific data overload with a hybrid of AI and knowledge graphs. Instead of relying on generic language models, which are often opaque, error-prone, and context-blind, ASCEND fuses generative AI with a rigorously structured, human-curated knowledge graph encompassing 400 million entities and over a billion relationships. This creates a transparent, traceable map that helps scientists instantly synthesize findings, verify evidence, and surface meaningful insights across vast, fragmented literature. ASCEND, they say, empowers teams to ask complex biological questions in natural language, like identifying markers in cancer research, and receive evidence-backed answers, with each assertion linked to its original source.
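The traceability idea is easy to make concrete: store each assertion as a triple together with the source it came from, so any answer can always point back to its evidence. A toy sketch below; the entities and PMIDs are placeholders, not BenchSci data, and ASCEND's actual graph is far richer.

```python
# Assertions as (subject, relation, object, source) tuples with provenance attached.
triples = [
    ("HER2", "is_overexpressed_in", "breast cancer", "PMID:0000001"),
    ("trastuzumab", "targets", "HER2", "PMID:0000002"),
]

def evidence_for(entity: str):
    """Return every assertion mentioning the entity, each with its original source."""
    return [t for t in triples if entity in (t[0], t[2])]

for subj, rel, obj, source in evidence_for("HER2"):
    print(f"{subj} {rel} {obj}  [{source}]")
```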
Asta DataVoyager
Asta DataVoyager from AI2 is designed to assist in scientific discovery by giving researchers intuitive, trustworthy AI tools for analyzing structured data. Tailored for scientists who face tedious manual data wrangling, the platform, they say, allows users to directly query CSVs, spreadsheets, and more using natural language, receiving concise answers with transparent code, copyable visuals, and clear methods documentation. The Cancer AI Alliance is already piloting Asta DataVoyager for federated cancer research, enabling secure, privacy-preserving analysis across institutions. Researchers can follow up with deeper questions, and results remain reproducible and fully under users' control, whether on the cloud or on private servers. In essence, scientific data science automation. I believe Future House has a similar product (a data-analysis crow, I think).
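To make the "transparent code" point concrete, here is the kind of copyable snippet such a tool might hand back for the question "What is the average tumor size per cohort?" The file and column names are invented, and this is a generic pandas sketch, not Asta's actual output format.

```python
import pandas as pd

df = pd.read_csv("trial_measurements.csv")       # the user's structured data (hypothetical file)
summary = (
    df.groupby("cohort")["tumor_size_mm"]        # hypothetical column names
      .agg(["mean", "count"])                    # average tumor size plus sample size per cohort
      .reset_index()
)
print(summary)
```

Because the generated code travels with the answer, a researcher can rerun, audit, or extend it instead of trusting an opaque number.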
Predicting the Onset of Diseases
Link to Paper.
[AI/LLM Engineering]
Agentic Coding and Agentic Software Testing
Agentic coding is revolutionizing software development by deploying AI to accelerate coding, but its reliability remains a key challenge, making robust agentic software testing essential. In his September 17 letter, Andrew Ng highlights that while AI coding agents boost productivity, they're prone to introducing subtle bugs, sometimes even security issues or "reward hacking," where agents alter test code to pass checks. He notes that while front-end bugs can be rapidly identified and fixed, back-end and infrastructure bugs are much harder to spot and can cause persistent downstream issues if not rigorously tested. Andrew stresses the importance of prioritizing tests, especially deep in the software stack, and encourages leveraging AI's efficiency in generating tests. The big takeaway: as agentic coding takes center stage, agentic testing must evolve alongside it to ensure software is both innovative and stable, echoing Meta's mantra: "Move fast with stable infrastructure."
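As one concrete picture of what "tests deep in the stack" might look like, here is a hedged pytest sketch: invariants pinned down for a hypothetical backend billing helper, the kind of check that should live outside the code an agent is allowed to rewrite.

```python
import pytest

def apply_discount(total_cents: int, percent: int) -> int:
    """Hypothetical backend helper: apply a percentage discount, staying in integer cents."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return max(total_cents - total_cents * percent // 100, 0)

@pytest.mark.parametrize("total,percent,expected", [
    (10_000, 0, 10_000),    # no discount
    (10_000, 100, 0),       # full discount
    (9_999, 33, 6_700),     # integer-cent rounding behaves as documented
])
def test_apply_discount(total, percent, expected):
    assert apply_discount(total, percent) == expected

def test_apply_discount_rejects_bad_percent():
    with pytest.raises(ValueError):
        apply_discount(10_000, 150)
```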
Human in the Loop for Deep Agents
Folks at LangChain are building out their DeepAgents library, this time with practical "human-in-the-loop" integration, empowering developers to add approval, editing, or custom response steps for AI agents operating sensitive real-world tools. Through clear Python examples, their tutorial shows how interrupt configurations allow humans to accept, revise, or mock tool actions before they're executed, backed by robust checkpointing to maintain agent state and ensure flexible oversight. The workflow brings transparency and control to agent operations, making DeepAgents viable for real business automation scenarios that demand responsive human supervision.
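Here is a minimal sketch of the approve-before-execute pattern, written against LangGraph's interrupt and checkpoint primitives (which DeepAgents builds on) rather than the exact configuration keys from their tutorial, which I have not reproduced here; the refund tool and model choice are illustrative.

```python
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from langgraph.types import Command, interrupt

def send_refund(order_id: str, amount: float) -> str:
    """Sensitive tool: pause the run and ask a human before actually issuing the refund."""
    decision = interrupt({"tool": "send_refund", "order_id": order_id, "amount": amount})
    if decision.get("approved"):
        return f"Refunded ${amount:.2f} for order {order_id}."
    return "Refund rejected by the human reviewer."

# The checkpointer persists state at the interrupt so the run can be resumed later.
agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools=[send_refund],
                           checkpointer=MemorySaver())

config = {"configurable": {"thread_id": "refund-demo"}}
agent.invoke({"messages": [("user", "Refund order 123 for $40")]}, config)  # pauses at the interrupt
agent.invoke(Command(resume={"approved": True}), config)                    # human approves, run resumes
```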
AI Commerce with Agent Payment Protocol (AP2)
Google Cloud recently unveiled the Agent Payments Protocol (AP2), an open standard designed to empower AI agents to securely and seamlessly initiate payments across a wide array of platforms and payment methods. Co-developed with over 60 industry leaders, including Mastercard, PayPal, Coinbase, and Salesforce, AP2 aims to address key challenges like authorization, authenticity, and accountability in agent-led commerce. By leveraging cryptographically secured digital Mandates and verifiable credentials, AP2 creates an auditable trail for transactions, supporting both real-time and delegated purchases. Its flexible, payment-agnostic framework not only enables smarter shopping experiences and crypto payments, but also lays the groundwork for future B2B applications and third-party agent marketplaces. Looks like the future of the Agent Web is shaping up: MCP, A2A, now AP2. Code base.
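To give a flavor of the mandate idea only: a mandate is a structured record of what the user authorized, made tamper-evident so the chain of accountability can be audited. The sketch below uses plain JSON plus an HMAC signature; AP2's real design uses verifiable credentials and its own schemas, so every field name here is invented.

```python
import hashlib, hmac, json

SECRET = b"demo-key-held-by-the-user-wallet"       # placeholder key material

mandate = {                                        # invented fields, not AP2's schema
    "agent_id": "shopping-agent-42",
    "merchant": "example-store.com",
    "max_amount_usd": 150.00,
    "expires": "2025-10-31T00:00:00Z",
}
payload = json.dumps(mandate, sort_keys=True).encode()
signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

# A verifier holding the key can recompute the signature to confirm the agent
# acted within the bounds the user authorized, leaving an auditable trail.
print(signature)
```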
Orchestrating Agents at Scale with AgentKit
From OpenAI's 2025 Dev Day: orchestrating agents at scale with its new AgentKit platform. AgentKit introduces a seamless, no-code approach to designing, deploying, and optimizing sophisticated agentic workflows. The platform shines with a drag-and-drop Agent Builder, ChatKit for instant UI embedding, and advanced evaluation tools for real-time tracing and grading of agent performance (evals). AgentKit enables both cloud and local deployment, exportable as Python or JavaScript. The highlight? Rapid iteration, from workflow tweaks to production deployment, in minutes, plus built-in optimization tools powered by real user data and AI-driven graders to ensure agents get smarter at scale. This will likely transform what used to take weeks into a process that's visually intuitive, auditable, and lightning fast. Read the full blog & agent builder 101.
Shipping with Codex
From OpenAI's 2025 Dev Day: In the session on Codex, OpenAI engineers offer an inside look at how they are leveraging Codex (a powerful AI software engineer) to transform the coding process from ideation to deployment. Codex now thrives in a range of environments from local IDEs to the cloud, simplifying pair programming, refactoring, and code review across platforms like VSCode, Cursor, and GitHub. The team shares real workflows, including how Codex accelerates feature builds, automatically verifies code visually and through tests, and performs thorough code reviews. With nearly all technical staff at OpenAI now using Codex daily, the AI agent is credited with dramatically increasing pull requests and fostering a culture of shipping faster and safer.
Here is an excellent playlist on Codex.
Context Engineering & Coding Agents with Cursor
From OpenAI's 2025 Dev Day: the Cursor team showcased the dramatic evolution of coding, from punch cards to the latest in autonomous coding agents. Their keynote traces how context engineering has rapidly advanced, shifting from basic next-action predictions to agents that can plan, search, and edit code across entire projects. With new agent harness optimizations, semantic search, and parallel agent orchestration, Cursor positions itself as more than just an IDE; it's a collaborative platform where AI does the heavy lifting and engineers can focus on invention and creativity. Their preview of the future: agents that not only write but deeply understand your codebase, automate tedious work, and leave you with more time for the creative challenges.
[AI X Industry + Products]
Sora 2
OpenAI has recently launched Sora 2, its next-generation video and audio generation model that brings major leaps in realism, physics simulation, and user control. Sora 2 can follow multi-shot instructions and create synchronized dialogue, soundscapes, and cinematic scenes. The debut comes alongside an invite-based iOS app powered by Sora 2 that, at the time of writing, lets users create, remix, and even insert themselves or friends directly into AI-generated videos with high-fidelity likeness and voice. I recently tried it, and it feels like TikTok 2.0. See Sam's blog about it. The API is also out.
alphaXiv: The Dialogue Platform for Research Papers
alphaXiv has been on my radar for a long time now (since last year), but I am just getting the chance to try it out. The platform transforms how scholars engage with preprints by allowing in-line, paper-specific commentary on works hosted on the arXiv preprint server. Simply swap "arXiv" in a paper's URL to "alphaXiv" and you're able to leave questions, ask for clarifications, highlight methods, or interact with authors and fellow readers. Its goal: to bridge the gap between informal social-media debates and formal peer review by offering a structured, public forum where technical clarification, conceptual discussion, and feedback can happen openly. I like it very much, and I believe this should be a blueprint for all academic publications.
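The URL trick from that description fits in a one-liner; the arXiv ID below is just a placeholder.

```python
def to_alphaxiv(arxiv_url: str) -> str:
    """Swap the arXiv host for alphaXiv so the paper opens alongside its discussion."""
    return arxiv_url.replace("arxiv.org", "alphaxiv.org", 1)

print(to_alphaxiv("https://arxiv.org/abs/2501.00001"))  # placeholder arXiv ID
```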
ChatGPT Pulse: Your AI-Driven Morning Briefing
Launched by OpenAI in September 2025, ChatGPT Pulse transforms the assistant from purely reactive to proactive. Each night it reviews your chat history, memories, feedback, and, if you opt in, connected apps like your calendar or email, then delivers a curated set of five to ten visual "cards" the next morning. These cards cover topics personal to you, from travel suggestions, workout ideas, and quick news digests to reminders tied to your schedule. The goal: help you start the day informed and focused, rather than buried in a flood of content. I haven't gotten the chance to use it, as it is currently in preview only for Pro users on mobile at the time of writing.
[AI + Commentary]
Sequoia Capital: $10 Trillion AI Revolution
Konstantine Buhler frames AI as a force moving even faster than the Industrial Revolution, compressing over 100 years of change into a handful of years, ushering in a $10 trillion cognitive revolution. The biggest opportunities lie in automating the vast US services sector, currently only 0.2% AI-powered, as startups and innovators race to build specialized, AI-driven solutions. Five immediate investment trends stand out: leveraging uncertainty through AI agents, emphasizing real-world performance over academic benchmarks, harnessing reinforcement learning, driving AI into physical-world processes, and treating compute as the new production engine. Sequoia also identifies five themes shaping the future: persistent memory, seamless AI communication protocols, AI voice, AI security, and open source models. The message is clear: those who embrace the cognitive revolution now will define the next era of transformative business and technology.
Software is Eating Labor
In this a16z LP Summit talk, partner Alex Rampell explores how software is entering a new phase, not just digitizing records, but fundamentally transforming the nature of labor and economic value itself. Rampell shares case studies from industries that moved from filing cabinets to databases, tracing software's evolution towards intelligent AI agents that can perform jobs end-to-end. With the US labor market valued at $13 trillion (far outstripping the software sector), he illustrates how outcome-based software is beginning to eat entire categories of labor, from customer support to bookkeeping and compliance. Through real-world examples, including autonomous negotiation and collections calls, Rampell argues that AI isn't only a cost-saver but expands markets by tackling demoralizing jobs, intermittent demand, and multi-language accessibility.
GDPval
OpenAI has recently introduced GDPval, an evaluation framework measuring how AI models perform on real-world, economically valuable tasks across 44 key occupations in the U.S. economy. By collaborating with seasoned professionals, GDPval crafts and assesses tasks reflecting daily knowledge work in sectors like healthcare, law, engineering, finance, retail, and more. Unlike previous academic benchmarks, GDPval focuses on deliverables derived from actual job products (think legal briefs, engineering designs, or customer support dialogs) and blends human and AI grading to gauge model outputs against industry standards. Recent results reveal that frontier models, especially Claude Opus 4.1 and GPT-5, are rapidly closing the gap with human experts, completing tasks up to 100 times faster and cheaper in controlled settings.
Learning in the Age of AI
In an era when AI-driven optimization shapes how we think about learning, Cosmos Institute's guest essayist Zachary Gartenberg asks us to look deeper than adaptive algorithms and performance metrics. Drawing on Plato and Augustine, the piece contends that authentic learning is not just iterative improvement, but a transformative posture: a readiness to wrestle with difficulty, assume responsibility, and tie down beliefs with reasons. While machine learning models simulate learning through feedback and error correction, human understanding requires agency and recognition, seeing beyond the frictionless promise of self-optimization apps and adaptive platforms. The essay argues for re-centering education around practices that cultivate inquiry, embrace productive struggle, and foster personal ownership of knowledge.
How People Use ChatGPT
Recent Publication from OpenAI, Duke, and Harvard.
Abstract:
Despite the rapid adoption of LLM chatbots, little is known about how they are used. We document the growth of ChatGPT's consumer product from its launch in November 2022 through July 2025, when it had been adopted by around 10% of the world's adult population. Early adopters were disproportionately male but the gender gap has narrowed dramatically, and we find higher growth rates in lower-income countries. Using a privacy-preserving automated pipeline, we classify usage patterns within a representative sample of ChatGPT conversations. We find steady growth in work-related messages but even faster growth in non-work-related messages, which have grown from 53% to more than 70% of all usage. Work usage is more common for educated users in highly-paid professional occupations. We classify messages by conversation topic and find that "Practical Guidance," "Seeking Information," and "Writing" are the three most common topics and collectively account for nearly 80% of all conversations. Writing dominates work-related tasks, highlighting chatbots' unique ability to generate digital outputs compared to traditional search engines. Computer programming and self-expression both represent relatively small shares of use. Overall, we find that ChatGPT provides economic value through decision support, which is especially important in knowledge-intensive jobs.
The Future of AI Coding
In this insightful discussion from Google I/O Connect China, Sam Witteveen interviews Aja Hammerly, developer relations head at Firebase Studio, on the evolving landscape of AI-powered coding. Hammerly shares her personal journey from initial skepticism to embracing AI tools, highlighting how platforms like Firebase Studio democratize development for both seasoned coders and non-coders through prototype-driven, collaborative experiences. They spoke about practical workflows, emphasizing that AI coding is not simply "one-shot" prompting but an iterative, pair-programming-like process. Key topics include effective prompting, the role of auto agent mode, integrating tools like MCPs for deeper context, and the future of customizable, user-driven app creation.
Podcast on AI and GenAI
(Additional) podcast episodes I listened to over the past few weeks:



