LokDhaba: Acquiring, Visualizing and Disseminating Data on Indian Elections

ACM Compass 2020


Despite the importance of elections in India, the world’s largest democracy, data on Indian electoral outcomes has historically not been easily available for political analysis. This is due to the problems inherent in assembling any archive of social and political data spanning many decades. In this paper, we shed light on these problems and present solutions in the context of a system we built called LokDhaba. LokDhaba includes the first freely available, structured and cleaned data archive on Indian electoral outcomes at the national or state level from 1962 onwards. To build this archive, we overcame challenges in data scraping, parsing, cleaning, consistency checking and integration across multiple sources, with the help of some novel tools. LokDhaba is being used extensively by political scientists, researchers, journalists and others to better understand long-term electoral trends in India.