Submissions | VizChitra 2026
Sarkaar Data Nahi Deta: Constructing Visualizable Datasets from Indian Government Data
Karnav
Associate•Boston Consulting Group (BCG)
Description
What is your session about?
This session is about the variety of data and information with varying degrees of usability that the Indian government publishes across different platforms, and how the community can turn it into useful datasets, narratives and insights.
For two years as an undergraduate at Ashoka, I worked with its Centre for Digitalization, AI and Society on a broad project to extract meaningful data from various Indian government platforms for social science research. This work gave me deep context on a series of problems and open questions that are fertile ground for the community to come together to discuss and solve:
Indian open data is fragmented, its path to publication is opaque and arbitrary, and it is often available on the internet without being genuinely accessible.
The key constraint on making this data accessible is often UI/UX and visualization: these are massive, often underprocessed datasets that are difficult to visualize. How can this data be visualized and presented in a way that the common Indian can consume?
Over the past few years, platforms like Data for India and Ashoka's Centre for Economic Data and Analysis have made major progress in visualizing key Indian data points and crafting narratives around them. Often, however, their analysis has been limited by the amount of work it takes to make obscure datasets usable and visualizable. How can their approach be extended to more complex datasets that require substantial processing and compute? Is an open-source approach possible?
My particular focus is on data with a geospatial aspect (land use, environmental clearances, real estate, and urban governance and expenditure), as that is the data I have worked with extensively.
In particular, I'd like to anchor the discussion on two rich sources of open government data that we were able to compile and generate useful insights from: the BhuNaksha platform (https://app4bhunakshaodisha.nic.in:8443/bhunaksha/), which holds plot-by-plot land ownership and characteristics data, and the Parivesh platform (https://environmentclearance.nic.in/proposal_status_new1.aspx), which holds records of all environmental clearances applied for and granted to industrial projects in India. I have been able to scrape, clean, match, process and visualize these datasets with some success, and can discuss the lessons learned from these projects and the next steps for more data like this.
These topics are relevant to data viz because the government is willingly providing vast collections of data to the common Indian, and the primary barrier to consuming them is our inability to convert large tables into easy-to-consume graphs, exploratory websites and factoids. Data viz has the power to solve this problem, to the benefit of both the community and the nation.
I think the ideal structure for the session is an introduction and short talk, followed by a guided discussion and an interactive activity, and then free-flowing discussion (either open floor or in groups):
5-10 mins: Introduction, context-setting and scope of discussion
5-10 mins: Elaboration on the problem and case studies (BhuNaksha, Parivesh, Data for India and 1-2 examples outside India)
10-15 mins: Guided discussion on participants' experiences working in the domain, problems and solutions, and opinions on the path forward
10-20 mins: Interactive activity (stickies and cards handed out to participants to mark their views on the most interesting Stakeholders, Problems and Domains, Approaches, and Tools/Platforms), followed by discussion
The intended audience is primarily data journalists, as well as professionals and hobbyists in policy, open-source data, and across the social sciences and impact ecosystem who are interested in Indian government-scale data and evaluation.
==============
This session presents two case studies on land and environmental records, which the Indian government publishes across different platforms with varying degrees of usability, and on how these records can be turned into useful datasets, narratives and insights.
For two years as an undergraduate at Ashoka, I worked with its Centre for Digitalization, AI and Society on a broad project to extract meaningful data from various Indian government platforms for social science research. My particular focus was on data with a geospatial aspect: land use, environmental clearances, real estate, and urban governance and expenditure.
In particular, I'd like to anchor the discussion on two case studies of rich open data sources published by the government, which we were able to compile and generate useful insights from: the BhuNaksha platform (https://app4bhunakshaodisha.nic.in:8443/bhunaksha/), which holds plot-by-plot land ownership and characteristics data, and the Parivesh platform (https://environmentclearance.nic.in/proposal_status_new1.aspx), which holds records of all environmental clearances applied for and granted to industrial projects in India. I have been able to scrape, clean, match, process and visualize these datasets with some success, and can discuss the lessons learned from these projects and the next steps for more data like this.
I'd like to talk about the challenges of scraping and processing this data; navigating conflicting sources of information about the platforms; working with the government officials on the ground who collect, digitize and upload the data onto the platforms; and the common features and problems of Indian government platforms, along with what we learned from this undertaking. I'd also like to walk through the specifics of this domain and discuss the rich new data that the government is beginning to release publicly.
These topics are relevant to data viz because the government is willingly providing vast collections of data to the common Indian, and the primary barrier to consuming them is our inability to convert large tables into easy-to-consume graphs, exploratory websites and factoids. Data viz has the power to solve this problem, to the benefit of both the community and the nation.
The intended audience is primarily data journalists, as well as professionals and hobbyists in policy, open-source data, and across the social sciences and impact ecosystem who are interested in Indian government-scale data and evaluation.