Caption This: Teaching Machines to Narrate Charts

Every chart is a chain of compressions. A dataset becomes a visual. A visual becomes an insight. Each step throws away information deliberately, keeping only what matters. The last step, turning a chart into a caption worth reading, is the hardest. And it is the one that language models handle worst.

Jaidev has spent the last year trying to fix that. Not by generating more captions, which turns out to be straightforward, but by figuring out how to curate the right one. The problem is that a model can produce dozens of plausible captions for any chart. Deciding which one is accurate, useful, and suited to a particular audience and context still takes a human.

His response is an extended version of PlotCaptions, an open dataset of more than 150,000 charts paired with captions, built to serve two purposes. First, it is a retrieval corpus that grounds generation, giving models precedent rather than asking them to guess. Second, it is a record of human judgment about what good captions look like, used to train a ranking model.

The pipeline that emerges is simple: retrieve similar charts and their best captions, generate candidate captions, and rank the candidates against what humans have consistently preferred. This talk walks through what that system gets right, what it still gets wrong, and where the curation gap remains open.
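To make the retrieve, generate, rank loop concrete, here is a minimal sketch under stated assumptions. It is not the speaker's implementation. It assumes charts have already been embedded as numeric vectors. The generator is a stub standing in for a language model. The ranker is a Bradley-Terry style linear scorer fit on pairwise human preferences, a technique chosen here purely for illustration. All names (Example, retrieve, generate_candidates, caption_features, train_ranker, rank, TREND_WORDS) and the toy data are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical vocabulary of trend verbs, used as one crude caption feature.
TREND_WORDS = {"rose", "fell", "increased", "decreased", "peaked", "doubled", "declined"}


@dataclass
class Example:
    """One corpus entry: a chart's (assumed, precomputed) vector and its preferred caption."""
    chart_vec: list[float]
    caption: str


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: list[float], corpus: list[Example], k: int = 2) -> list[Example]:
    """Step 1: return the k corpus charts most similar to the query chart."""
    return sorted(corpus, key=lambda ex: cosine(query, ex.chart_vec), reverse=True)[:k]


def generate_candidates(retrieved: list[Example]) -> list[str]:
    """Step 2 (stub): a real system would prompt a language model with the
    retrieved captions as precedent; here we reuse them plus one generic caption."""
    return [ex.caption for ex in retrieved] + ["This chart shows some data."]


def caption_features(caption: str) -> list[float]:
    """Hypothetical features: mentions a number, uses a trend verb, runs past 25 words."""
    words = [w.strip(".,;:").lower() for w in caption.split()]
    has_number = 1.0 if any(ch.isdigit() for ch in caption) else 0.0
    has_trend = 1.0 if any(w in TREND_WORDS for w in words) else 0.0
    too_long = max(0, len(words) - 25) / 25
    return [has_number, has_trend, too_long]


def train_ranker(pairs: list[tuple[str, str]], epochs: int = 200, lr: float = 0.1) -> list[float]:
    """Step 3a: fit weights w on (preferred, rejected) pairs by gradient ascent
    on log sigmoid(w . (f(preferred) - f(rejected))), i.e. a Bradley-Terry model."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for good, bad in pairs:
            diff = [g - b for g, b in zip(caption_features(good), caption_features(bad))]
            p = 1.0 / (1.0 + math.exp(-sum(wi * di for wi, di in zip(w, diff))))
            w = [wi + lr * (1.0 - p) * di for wi, di in zip(w, diff)]
    return w


def rank(candidates: list[str], w: list[float]) -> list[str]:
    """Step 3b: order candidates by learned score, best first."""
    def score(c: str) -> float:
        return sum(wi * fi for wi, fi in zip(w, caption_features(c)))
    return sorted(candidates, key=score, reverse=True)


# Toy usage with made-up data (illustrative only).
corpus = [
    Example([1.0, 0.0, 0.2], "Revenue rose 40% between 2019 and 2023."),
    Example([0.9, 0.1, 0.3], "Sales are displayed by year."),
    Example([0.0, 1.0, 0.0], "Temperature anomalies by region."),
]
prefs = [
    ("Revenue rose 40% between 2019 and 2023.", "Sales are displayed by year."),
    ("Unemployment fell to 4% in 2022.", "This chart shows some data."),
]
w = train_ranker(prefs)
cands = generate_candidates(retrieve([1.0, 0.05, 0.25], corpus))
print(rank(cands, w)[0])
```

The design keeps the three stages separable: retrieval supplies precedent, generation proposes, and the ranker encodes human preference. In practice each stage would be replaced by something far stronger (learned chart embeddings, an actual language model, a richer preference model), but the data flow stays the same.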

The dataset, code, and trained models will be released before the talk. The audience is invited to use them, test them, and beat the benchmarks.
