This is the inaugural newsletter of The Epsilon, and I have decided to call it the Pulse, or The ε Pulse for short. The goal here is to share content I find enjoyable in the world of AI, machine learning, and data science. The content covered might evolve over time, but for now, I will be covering AI/ML + Bio, industry news about AI, general data science, and insightful commentary on AI and the revolution that is underway. My goal is to publish an issue of the ε Pulse once a month. Also, if you have not already, feel free to check out my first illustrated essay on this blog: it’s a primer on machine learning, well suited for beginners.
[AI + Bio] 🤖 🦠 🧬
[I]: ♂Explainable AI meets Metabolomics: discriminating biological sex with urine.
Perhaps it’s befitting to start this newsletter with one of my recent papers. In this work, I used Tree-based SHapley Additive exPlanations (Tree SHAP) to interpret machine learning classification of biological sex from urine-based metabolomic data (amongst other things). Testosterone glucuronide was the most important metabolite for discriminating between males and females in this analysis. This work has been accepted for publication in PLOS ONE and should be out in a week. In the meantime, here is the manuscript on bioRxiv.
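If you are curious what that kind of analysis looks like in code, here is a minimal, hypothetical sketch assuming a scikit-learn tree ensemble and the shap library; the data and metabolite names below are illustrative placeholders, not the paper’s actual pipeline.

```python
# Hypothetical Tree SHAP sketch (not the paper's pipeline):
# train a tree ensemble on toy "metabolomics" data, then explain it with Tree SHAP.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Toy intensity matrix: rows = urine samples, columns = metabolite features (placeholders).
X = pd.DataFrame(
    rng.lognormal(size=(200, 3)),
    columns=["testosterone_glucuronide", "metabolite_B", "metabolite_C"],
)
# Toy binary label standing in for biological sex.
y = (X["testosterone_glucuronide"] > X["testosterone_glucuronide"].median()).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Tree SHAP computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global importance: mean absolute SHAP value per metabolite.
importance = np.abs(shap_values).mean(axis=0)
print(dict(zip(X.columns, importance.round(3))))
```

A `shap.summary_plot(shap_values, X)` call on the same values gives the familiar beeswarm view of per-sample contributions.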
[II]: 🦠Characterizing the interaction conformation between T-cell receptors and epitopes with deep learning.
This study focuses on creating a new computational method to predict and understand the interactions between T-cell receptors (TCRs) and epitopes, which is crucial for advancements in immunotherapy and antigen discovery. The researchers developed a deep-learning-based framework called TCR-Epitope Interaction Modelling at Residue Level (TEIM-Res), which uses TCR and epitope sequences to predict their interactions at the residue level. To overcome the issue of limited residue-level data, the team employed a few-shot learning strategy that incorporates the more abundant sequence-level binding information. The model demonstrated good prediction performance and potential applications, such as analyzing mutant TCR-epitope pairs, identifying key contacts, and discovering binding rules and patterns. Overall, this new model can help us gain a better understanding of TCR-epitope interactions and their underlying binding mechanisms.
And why is this important: unlocking the immune system's secrets is vital for tackling autoimmune disease, cancer, & infectious diseases. Demystifying T cell biology is crucial for developing safer & more effective cell therapies like CAR-T.
[III]: 🧪Enzyme function prediction using contrastive learning.
A new AI tool, CLEAN, developed by a team of researchers led by Huimin Zhao at the University of Illinois Urbana-Champaign, has shown the ability to predict enzyme functions based on amino acid sequences with enhanced accuracy and reliability compared to existing methods. By employing contrastive learning for making predictions, CLEAN offers potential benefits to various fields, including genomics, medicine, and industrial materials. The tool will allow researchers to conveniently input sequences and obtain results. Future plans for the team include expanding the AI's capabilities to characterize other proteins and refining its algorithms to suggest appropriate enzymes for specific reactions. See original paper.
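To make the "contrastive learning" part concrete, here is a generic, hedged PyTorch sketch of the idea: pull embeddings of enzymes that share a function label together and push unrelated ones apart. The dimensions and projection head are made up for illustration; this is not the CLEAN architecture itself.

```python
# Generic contrastive-learning illustration (triplet loss on protein embeddings);
# not the actual CLEAN model, just the core idea.
import torch
import torch.nn as nn

# Toy projection head mapping, e.g., protein-language-model embeddings to a contrastive space.
embed = nn.Sequential(nn.Linear(1280, 256), nn.ReLU(), nn.Linear(256, 128))
loss_fn = nn.TripletMarginLoss(margin=1.0)

# Placeholder embeddings: anchor and positive share an EC number, negative does not.
anchor = torch.randn(32, 1280)
positive = torch.randn(32, 1280)
negative = torch.randn(32, 1280)

loss = loss_fn(embed(anchor), embed(positive), embed(negative))
loss.backward()  # gradients pull same-function enzymes together, push others apart
print(loss.item())
```

At inference time, an unseen sequence can then be assigned the function whose embedding cluster sits closest to it.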
[IV]: 💊 Diffusion steps, twists, and turns for molecular docking.
A team of MIT researchers has developed a molecular docking model called DiffDock, which uses diffusion generative models to accelerate drug development and reduce the likelihood of adverse side effects. The model represents a paradigm shift in computational drug design. Traditional molecular docking methods can be time-consuming and costly, with most drug candidates failing clinical trials. DiffDock takes a "blind docking" approach, generating and ranking multiple candidate binding poses across the whole protein rather than requiring a pre-specified pocket, allowing for more accurate and efficient drug development. Combined with protein folding techniques, DiffDock could enable large parts of the drug development process to be conducted in silico, identifying potential off-target side effects before clinical trials take place. This approach has the potential to revolutionize biological research and drug discovery, making drug target identification more accessible and cost-effective. See original paper.
[Data Science] 📊ℹ💻
[I]: ℹPassive Aggressive Classifier
I have been working lately on applying AutoML to my work, so I have buried my head in all sorts of automated ML strategies (Bayesian optimization, evolutionary algorithms, meta-learning) and their respective libraries. Anyway, I built an ensemble the other day and saw that it contained something called `Passive Aggressive`. Talk about a surprise; it was the first time I had come across the algorithm. So how does it work? It updates its weights iteratively, adjusting them based on the margin between the predicted class and the true class while maintaining a balance between aggressiveness and passivity. The algorithm stays passive when the prediction is correct, and it becomes aggressive when the prediction is incorrect, making larger updates to the weights to minimize the loss. Here is the original paper from 2006.
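Here is a hedged, minimal sketch of that idea using scikit-learn's `PassiveAggressiveClassifier` in an online loop; the dataset and batch sizes are arbitrary, just enough to show the passive/aggressive updates in action.

```python
# Minimal online-learning sketch with a passive-aggressive classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import PassiveAggressiveClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# C bounds how large an "aggressive" correction is allowed to be.
clf = PassiveAggressiveClassifier(C=1.0, random_state=0)

# Stream the data in batches: no update when the margin is satisfied (passive),
# a corrective weight update when it is violated (aggressive).
classes = np.unique(y)
for start in range(0, len(X), 100):
    batch = slice(start, start + 100)
    clf.partial_fit(X[batch], y[batch], classes=classes)

print(f"training accuracy: {clf.score(X, y):.3f}")
```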
[II]: 〽Observable Plot Network Graphs
I have always wanted to pick up visualization with Observable Plot, but I have never had the time. It’s a really good data viz library. Here is a video tutorial on Hands-on Visualization. Also see an introduction to Graph visualization. Absolutely stunning!
[AI X Industry] 🤖🖥👨🏿💻
[I]: 🏢How A.I. businesses will scale and succeed
A.I. innovation is surging, but long-term utility remains unclear. To achieve mass adoption, businesses should focus on micro-utility, understand nuance in intent & emotion, and prioritize customer needs over A.I. hype.
[II]: 📹Text2Video
Text2Video-Zero can turn written text into videos without needing any extra training or adjustments. These are precursors to the kind of technology that will revolutionize content creation, making it more accessible and affordable for everyone. On the flip side, there will be misuse. See also Runway’s Gen2.
[III]: 👨Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI
A 2.5-hour conversation between Lex Fridman and Sam Altman.
[IV]: 🏥Microsoft Research head Peter Lee on the applications of GPT-4 in medicine and life sciences
Peter Lee discussed the potential of OpenAI's GPT-4 to revolutionize the healthcare sector. Lee believes that GPT-4 could increase efficiency, bring more empathy to patient interactions, and boost biomedical research. The AI-powered tool could support diagnoses, improve doctor-patient conversations, and reduce paperwork. Despite its limitations, such as producing false responses or subtle mistakes, GPT-4 is capable of correcting its errors when asked to review its output. Microsoft subsidiary Nuance is already incorporating GPT-4 into a medical note-taking system. Other potential use cases include generating orders for lab tests and prescriptions, and aiding in data interoperability. However, understanding GPT-4's underlying algorithms and training process remains a challenge for researchers.
[AI + Commentary] 📝🤖📰
[I]: 🤖 Tyler Cowen: Existential risk, AI, and the inevitable turn in human history
Tyler argues that with AI we are entering "moving history," and that we face real uncertainties & challenges. He says we should embrace the tech for its benefits & to tackle existential risks, and that society's resilience & self-confidence are key in confronting a future shaped by AI. I'm not sure I agree with all of his arguments. Max Tegmark takes a different position. My intuition/position is closer to Tegmark's.
[II]: 🧑🏭Goldman Sachs AI report on the labor market
Commentary on the recent AI report from Goldman Sachs. An interesting take for me was the author's projection on cost-cutting strategies in the new era: middle-tier employees might be targeted for cost-cutting, with AI-enabled junior workers collaborating with a much leaner group of senior professionals. Furthermore, there is an anticipated 7% boost from productivity growth, leading to a faster-growing economy that generates new demands and job opportunities. However, rapid AI deployment could result in job losses outpacing new role creation (this has largely been my concern, which I presented in my essay).
[III]: 🦾AI as an Omni-use technology
See Tweet
[IV]: 🧠Cognition for all
In this essay, the author delves into the transformative impact of large language models (LLMs) and AI in various fields, predicting that in the next few years, AI will become increasingly accessible, running on personal devices, creating art and media, automating mundane tasks, and revolutionizing professions. The author also expresses concerns about potential misuse and the need for responsible development, while acknowledging AI's potential to improve individual processing capacity and drive societal change.
[V]: 😕The A.I. Dilemma
Earlier this year, I wrote out my thoughts on AI, highlighting Murphy's law. But what does a biochemist know about these things, you could say. Fast-forward roughly three months and you get this extremely well-articulated presentation on the AI dilemma, illustrating Murphy's law in the context of what the presenters call "Gollem-class" AIs. Everyone should watch it. It is the most important piece of content I have shared in this newsletter.