From Granite to Generative Computing
IBM’s Foundation Models, Unstructured Document Processing, and Generative Computing.
A couple days ago, I attended the IBM Tech Exchange Dev Day at Georgia Tech on Open Source AI. I stayed for most of the programming, and I ended up learning more than I thought I would.
First, Granite: IBM’s Foundation Models
The workshop unfolded in three chunks. The first was on Granite, a family of open, enterprise-focused foundation models developed by IBM. It was the first time I had heard of Granite, perhaps because I haven’t worked on many projects that required open-source models (apart from a few chemistry foundation models).
A point was made in an opening presentation about the ‘openness’ of open-source models. I hadn’t really thought about this until I listened to Marc Andreessen a while back on TBPN, discussing open source vs. closed source AI. I had wrongly assumed that when people say open source, they mean open ’everything’.
Case in point: IBM Granite is more open and transparent than OpenAI's GPT-OSS, especially regarding training data and documentation. So, properly speaking, one would call OpenAI's GPT-OSS an open-weight model.
Anyways, you use Granite for what you use LLMs for: they are language models. In the workshop we walked through 5 labs, which you can find here.
In brief, they demonstrate how to perform key NLP tasks. For example, one lab summarizes long documents by breaking them into chapters, summarizing each chapter, and then creating a final summary from those pieces. The labs also build RAG systems, starting with a text-only RAG and advancing to a multimodal version that can process text, tables, and even images within a PDF, using a vision model to understand the visual content.
Beyond these tasks, the series explores other AI applications. One lab covers entity extraction from text using Granite and Pydantic class-based entity definitions (a rough sketch of that pattern follows below). Another lab shifts to a completely different domain, using a specialized Granite model (Tiny Time Mixer) for time-series forecasting to predict future energy demand.
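To make the entity-extraction idea concrete, here is a small, self-contained sketch. The schema and the generate callable are hypothetical stand-ins (the latter would be a Granite chat-completion call in the lab); only the Pydantic parts are real API.

```python
from pydantic import BaseModel


class Person(BaseModel):
    name: str
    role: str


class Entities(BaseModel):
    people: list[Person]
    organizations: list[str]


def extract_entities(text: str, generate) -> Entities:
    """`generate` is a hypothetical stand-in for a Granite chat call that returns JSON."""
    prompt = (
        "Extract people (name, role) and organizations from the text below. "
        f"Respond only with JSON matching this schema: {Entities.model_json_schema()}\n\n{text}"
    )
    raw_json = generate(prompt)
    # Pydantic validates the model's output against the declared entity classes.
    return Entities.model_validate_json(raw_json)


# Canned response standing in for the model, just to show the round trip:
canned = '{"people": [{"name": "Ada Lovelace", "role": "mathematician"}], "organizations": ["IBM"]}'
print(extract_entities("...", lambda prompt: canned))
```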
Second, Docling: “Pandas” for Document AI
This workshop was my favorite, largely because Docling is such a workhorse library. Docling (see paper) can parse a wide range of document formats, including PDFs, images, and Microsoft Office files, into a unified, richly structured format. Over a year ago, I briefly experimented with the unstructured library, but aside from that, I don’t have much mileage in unstructured document processing.
The first lab built the foundation: converting PDFs into structured, AI-ready formats like Markdown and JSON. It showed how to extract tables, images, and text, and even introduced advanced features such as using vision models to automatically generate descriptions for images within a document.
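To give a sense of the basic flow, here is a minimal sketch of a Docling conversion (the file name is a placeholder):

```python
from docling.document_converter import DocumentConverter

# Parse a PDF into Docling's unified document representation.
converter = DocumentConverter()
result = converter.convert("report.pdf")  # also handles DOCX, PPTX, images, URLs, ...
doc = result.document

print(doc.export_to_markdown())  # Markdown view of the parsed structure

# Tables come back as structured objects, not flattened text.
for table in doc.tables:
    print(table.export_to_dataframe())  # requires pandas
```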
The second lab focused on the critical step of intelligent chunking, preparing the extracted text for RAG. It compared different strategies, such as the structure-aware HierarchicalChunker and the size-balancing HybridChunker, and explained how adding contextual information to chunks improves retrieval quality.
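A minimal sketch of chunking with the HybridChunker (the file name is a placeholder, and the metadata fields reflect my reading of the Docling docs):

```python
from docling.chunking import HybridChunker
from docling.document_converter import DocumentConverter

doc = DocumentConverter().convert("report.pdf").document

# HybridChunker follows the document hierarchy while keeping chunks within a
# tokenizer-aware size budget.
chunker = HybridChunker()
for chunk in chunker.chunk(doc):
    # Each chunk carries its text plus metadata (e.g. the headings it sits under),
    # which can be prepended as context before embedding for retrieval.
    print(chunk.meta.headings, chunk.text[:80])
```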
Another lab integrated all these concepts into a multimodal RAG system with a unique feature: visual grounding. Using Granite, the system processed text, tables, and generated image descriptions, storing everything in a Milvus vector database. Crucially, Docling preserved the exact location (page number and bounding box) of each piece of information. When a user asked a question, the system retrieved relevant content and generated an answer—while also displaying an image of the original document page with sources visually highlighted. Transparent and verifiable.
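That grounding works because Docling keeps provenance on every parsed item. A rough sketch of reading it (attribute names are my recollection of Docling's document model, so treat them as assumptions):

```python
from docling.document_converter import DocumentConverter

doc = DocumentConverter().convert("report.pdf").document

# Each text/table/picture item records which page it came from and its bounding
# box on that page, which is what lets an answer be highlighted on the source page.
for item, _level in doc.iterate_items():
    for prov in getattr(item, "prov", []):
        print(type(item).__name__, "page", prov.page_no, "bbox", prov.bbox)
```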
I am itching to start a project that leverages Docling.
You can find the workshop on GitHub.
Lastly, Mellea: A Framework for Generative Computing
Much of this workshop was new to me, but first, let’s set the stage.
Today’s way of working with LLMs is often hampered by ad-hoc prompting and the tendency to treat AI like a conversational partner. Persona-driven interfaces lead to unpredictable and fragile solutions. IBM argues for a shift in perspective: LLMs should be treated as programmable computational engines—just like any other software component.
By thinking this way, we can build more seamless and reliable workflows where models are steered by structured, repeatable instructions rather than brittle prompts.
To advance this vision, IBM introduced the concept of generative computing—a programming paradigm that integrates models into software as deeply as traditional components. Their new open-source library, Mellea, embodies this principle: it allows developers to specify high-level goals and constraints, validate outcomes, and programmatically repair errors. The aim is to move beyond prompt engineering toward processes that are robust, testable, and maintainable. The broader implication is a shift from imperative and inductive programming to generative programming, where software and AI work as tightly coupled systems.
As for the workshop itself, the tutorial introduces the core "Instruct-Validate-Repair" design pattern. This pattern allows developers to specify requirements for an LLM's output, validate the response, and automatically attempt to fix it if it fails. For building more complex, modular programs, the workshop demonstrates how to use the @generative decorator to encapsulate LLM calls within standard Python functions, creating reusable libraries of "generative stubs" that can be composed together.
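To give a flavor of what that looks like in code, here is a rough sketch pieced together from my reading of the Mellea docs; the import paths, parameter names, and calling conventions are assumptions on my part and may not match the released API exactly.

```python
import mellea
from mellea import generative
from mellea.stdlib.sampling import RejectionSamplingStrategy  # assumed module path

m = mellea.start_session()  # defaults to a local Granite model, as I understand it

# Instruct-Validate-Repair: declare requirements next to the instruction and let a
# sampling strategy retry (repair) until the output validates or the budget runs out.
summary = m.instruct(
    "Summarize the attendee feedback into three bullet points.",
    requirements=[
        "Use exactly three bullet points",
        "Do not mention any attendee by name",
    ],
    strategy=RejectionSamplingStrategy(loop_budget=3),
)
print(summary)


# A "generative stub": the docstring is the specification, the body stays empty,
# and the LLM fills in the behavior when the stub is called with a session.
@generative
def triage_ticket(ticket_text: str) -> str:
    """Classify the support ticket as 'bug', 'feature-request', or 'question'."""


print(triage_ticket(m, ticket_text="The export button crashes the app."))
```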
The tutorial then progresses to more advanced applications. For example, it introduces an object-oriented paradigm through the MObject pattern and the @mify decorator. This feature allows developers to turn regular Python classes into "generative objects," giving the LLM structured access to not only the object's data but also its methods, effectively providing the model with a set of tools to operate on the data. Pretty neat.
You can find the tutorials on Mellea in this GitHub Repo.