Inferring a webpage's geoprovenance

Geoprovenance Inference Algorithm

This page uses a geoprovenance inference algorithm to guess the country primarily responsible for the information on a web page. For each URL you enter in the search box below, the publisher is geolocated in several ways: by querying the domain's contact address through a whois lookup, by looking up the company's headquarters on Wikidata, by reading the domain suffix (e.g. '.uk'), and by applying three other algorithms. Combined, these six methods achieve 91% accuracy.

Details can be found in our paper, and reference implementations in Python and JavaScript are available in our GitHub repo.

Shilad Sen, Heather Ford, Dave Musicant, Mark Graham, Os Keyes, and Brent Hecht. 2015. Barriers to the Localness of Volunteered Geographic Information. Proceedings of CHI 2015. New York: ACM Press.

Note that this page shows a simplified algorithm with lower accuracy than the version we used in our paper.

Enter a URL below or click a suggested URL to see our algorithm in action.

Component algorithms:

Language algorithm: (61% accuracy)

Detects the language of a webpage and looks up the countries most likely to speak it.

WhoIs algorithm: (80% accuracy)

Performs a WhoIs query to obtain the contact address associated with the web domain.
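As a rough illustration of the parsing half of this step, the sketch below pulls a country code out of WhoIs response text that has already been fetched. The function name `whois_country`, the choice of the "Registrant Country" field, and the sample record are assumptions made for this example; real WhoIs output varies by registry, and the reference implementations may work differently.

```python
def whois_country(whois_text):
    """Return the registrant country from raw WhoIs text, or None if absent.

    Assumes the common "Registrant Country: XX" line; other registries
    use different field names, which this sketch does not handle.
    """
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "registrant country" and value.strip():
            return value.strip().upper()
    return None


# Made-up sample record, for illustration only.
SAMPLE = """Domain Name: example.org
Registrant Organization: Example Org
Registrant Country: us
"""

print(whois_country(SAMPLE))
```

Returning None when the field is missing lets a caller treat the WhoIs component as abstaining rather than guessing.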

Prior algorithm: (30% accuracy)

Represents the overall likelihood of a country to produce sources.
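One simple way to obtain such a prior, assuming a list of already-labeled source countries is available, is to use relative frequencies. The function name `country_prior` and the toy input are hypothetical; the page does not say how its prior was estimated.

```python
from collections import Counter


def country_prior(source_countries):
    """Map each country to its share of the labeled sources."""
    counts = Counter(source_countries)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {country: n / total for country, n in counts.items()}


# Toy labels, not real data.
prior = country_prior(["US", "US", "GB", "DE"])
print(prior)
```

Because the prior ignores the URL entirely, it always proposes the same most frequent country, which fits its role as a weak baseline.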

Wikidata algorithm: (93% accuracy)

Looks up the headquarters country for organizations present in Wikidata.

TLD algorithm: (98% accuracy)

Identifies the country associated with a URL's top-level domain (TLD), such as .uk, .de, or .cn.
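A minimal sketch of a TLD lookup follows. The table `CCTLD_TO_COUNTRY` is a deliberately tiny, hypothetical stand-in for a full country-code TLD list, and `tld_country` is a name chosen for this example.

```python
from urllib.parse import urlparse

# Hypothetical three-entry table; a real one would cover every country-code TLD.
CCTLD_TO_COUNTRY = {"uk": "GB", "de": "DE", "cn": "CN"}


def tld_country(url):
    """Return the country implied by the URL's final domain label, or None."""
    host = urlparse(url).hostname  # lowercased by urlparse; None if no host
    if not host:
        return None
    suffix = host.rstrip(".").rsplit(".", 1)[-1]
    return CCTLD_TO_COUNTRY.get(suffix)


print(tld_country("https://www.bbc.co.uk/news"))
```

Generic TLDs such as .com carry no country signal, so the function returns None for them instead of guessing.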

GeoIP algorithm: (45% accuracy)

Geocodes the web server's IP address.
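This page does not describe how the six components are combined. Purely as an illustration, the sketch below uses a weighted vote in which each component's guess counts in proportion to the accuracy listed above. This scheme, the name `combine_guesses`, and the `WEIGHTS` table are assumptions for the example and should not be read as the method used in the paper.

```python
from collections import defaultdict

# Accuracies as listed on this page, reused here only as illustrative vote weights.
WEIGHTS = {
    "language": 0.61,
    "whois": 0.80,
    "prior": 0.30,
    "wikidata": 0.93,
    "tld": 0.98,
    "geoip": 0.45,
}


def combine_guesses(guesses):
    """Pick the country with the highest total weight.

    `guesses` maps a component name to a country code, or to None when
    that component has no answer for the URL.
    """
    scores = defaultdict(float)
    for component, country in guesses.items():
        if country is not None:
            scores[country] += WEIGHTS[component]
    return max(scores, key=scores.get) if scores else None


print(combine_guesses({"tld": "DE", "geoip": "US", "whois": None}))
```

A vote like this lets a single strong signal (for example the TLD) outweigh one weak signal, while several weaker signals that agree can still override it.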
