
Surf

An Entity Mapping and Resolution System for Indian Names

Surf runs from a .jar file, so please make sure that Java is installed on your system before opening it. If Surf does not load automatically when you open the .jar file, visit localhost:9099 in your browser.

Surf is a freely available, open-source program for entity resolution. It was designed specifically to resolve ambiguous or misspelled name values in datasets by assigning the same unique ID to values that refer to the same entity. Surf uses string-matching algorithms to cluster these values and a human-in-the-loop system to resolve the clusters. Surf has been used extensively over several years to generate the Individual Incumbency Dataset (TCPD-IID). Anyone can upload their own dataset to Surf to perform similar entity resolution and ID assignment.
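To make the idea of clustering and ID assignment concrete, here is a minimal Python sketch. It is not Surf's implementation: the similarity measure (the standard library's difflib.SequenceMatcher), the 0.8 threshold, the greedy grouping, the sample names and the function names cluster_names and assign_ids are all assumptions chosen for illustration. In a human-in-the-loop workflow, the clusters would be treated as suggestions for a reviewer to confirm or split before IDs are finalised.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop periods, and collapse repeated whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def similarity(a, b):
    """Similarity score in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_names(names, threshold=0.8):
    """Greedily group names: each name joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster.
    The threshold is an illustrative assumption, not a Surf setting."""
    clusters = []
    for name in names:
        for cluster in clusters:
            if similarity(name, cluster[0]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def assign_ids(clusters, prefix="ID"):
    """Give every name in a cluster the same ID (hypothetical ID format)."""
    return {
        name: f"{prefix}{index:04d}"
        for index, cluster in enumerate(clusters, start=1)
        for name in cluster
    }


if __name__ == "__main__":
    names = [
        "Rajesh Kumar",
        "Rajesh Kumaar",
        "Sunita Devi",
        "Suneeta Devi",
        "Amit Shah",
    ]
    suggested = cluster_names(names)
    # A reviewer would confirm or split these suggested clusters here.
    for name, entity_id in assign_ids(suggested).items():
        print(entity_id, name)
```

Greedy grouping against the first member of each cluster keeps the sketch short, but it depends on input order; a real tool would likely compare against more of the cluster and leave borderline cases to the reviewer.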

The tool shares its name with Surf, a popular washing detergent in India, except that our tool is used for cleaning strings (such as names of people and places) instead of clothes!

The following video covers how files are loaded into Surf, the various algorithms used and Surf’s user interface.

Contact us if you are interested in using Surf for other datasets.