A good chart compresses data. A good insight compresses the chart. This chain of lossy compressions, from dataset to visual to meaning, is what data visualization is fundamentally about.
The question Jaidev has been working on is whether a machine can learn to do it well.
His hypothesis is that the three elements of any chart (the raw pixels, the declarative specification in a language like Vega, and the insights or captions drawn from it) are not three separate things: they are one thing seen from three angles. He calls this the chart triplet. A skilled analyst given any one of the three can construct reasonable approximations of the other two.
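As a concrete illustration of the idea, the triplet can be sketched as a single record holding all three views of the same chart. The field names and the spec below are illustrative assumptions, not taken from the talk or its dataset:

```python
from dataclasses import dataclass

@dataclass
class ChartTriplet:
    """Three views of one chart: rendered pixels, declarative spec, prose insight."""
    pixels: bytes   # e.g. the rendered PNG bytes
    spec: dict      # declarative grammar, e.g. a Vega-Lite-style spec
    caption: str    # the insight or caption drawn from the chart

# A minimal Vega-Lite-style spec (illustrative):
spec = {
    "mark": "bar",
    "encoding": {
        "x": {"field": "month", "type": "ordinal"},
        "y": {"field": "sales", "type": "quantitative"},
    },
}

triplet = ChartTriplet(
    pixels=b"",  # placeholder; a real pipeline would render the spec to an image
    spec=spec,
    caption="Sales peak in December, roughly double the monthly average.",
)
```

The hypothesis, in these terms, is that a model given any one field of `ChartTriplet` should be able to approximate the other two.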
This talk walks through a series of experiments testing that hypothesis: building dense, unified representations of the chart triplet, training visual and language transformer models to navigate between pixels, specs, and insights, and evaluating whether the outputs clear the bar between merely generated and genuinely useful.
The problem this is ultimately trying to solve is not generation. LLMs can already build charts. The problem is curation: distinguishing the chart worth shipping from the nine others that are technically fine. Jaidev's bet is that a meaningful representation of the chart triplet is what makes that distinction learnable.
Before the talk, he will release an open dataset of over 150,000 charts with their Vega specs and a million accompanying captions, along with all code and trained models.