Methodology

How we built the dataset behind the Famous People Map

Data Source

Our dataset is derived from "A cross-verified database of notable people, 3500BC-2018AD", a peer-reviewed dataset published in Scientific Data (Nature Portfolio, 2022) by Laouenan, Bhargava, Eyméoud, Gergaud, Plique & Wasmer.

The researchers assembled 2.29 million unique notable individuals by cross-referencing Wikidata with seven language editions of Wikipedia (English, French, German, Italian, Spanish, Portuguese, and Swedish). Information was deduplicated and cross-verified across sources to minimize errors, with manual checks confirming error rates well below 1% for the most documented individuals.

How Notability Is Ranked

Each person in the database receives a notability index computed from five dimensions:

  1. Wikipedia editions — number of language editions with a biography
  2. Biography length — total word count across all available biographies
  3. Page views — average annual views (2015–2018) across all editions
  4. Completeness — number of non-missing fields (birth date, gender, occupation)
  5. External references — total external links and sources from Wikidata

Each dimension is converted to a quantile score, and the five quantile scores are summed to produce the final notability ranking. This composite approach balances global recognition (page views, editions) with depth of documentation (biography length, references, completeness).
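As a rough illustration of the summing step, here is a minimal Python sketch. It assumes a "quantile score" means the share of people whose value on a dimension is less than or equal to a given person's value; the original paper may define it differently. The field names (`editions`, `words`, `views`, `completeness`, `references`) and the three sample people are hypothetical.

```python
from bisect import bisect_right

# Hypothetical field names for the five dimensions listed above.
DIMENSIONS = ("editions", "words", "views", "completeness", "references")


def quantile_scores(values):
    """For each value, return the fraction of values less than or equal to it.

    Assumption: this empirical-CDF definition stands in for the paper's
    quantile computation, which may differ in detail.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def notability_index(people):
    """Sum the per-dimension quantile scores for each person."""
    totals = [0.0] * len(people)
    for dim in DIMENSIONS:
        scores = quantile_scores([p[dim] for p in people])
        for i, score in enumerate(scores):
            totals[i] += score
    return totals


# Made-up sample records, for illustration only.
people = [
    {"name": "A", "editions": 7, "words": 12000, "views": 900000,
     "completeness": 3, "references": 40},
    {"name": "B", "editions": 3, "words": 2500, "views": 15000,
     "completeness": 3, "references": 12},
    {"name": "C", "editions": 1, "words": 300, "views": 800,
     "completeness": 2, "references": 2},
]

scores = notability_index(people)
ranked = sorted(zip(people, scores), key=lambda pair: pair[1], reverse=True)
for person, score in ranked:
    print(person["name"], round(score, 3))
```

Because every dimension is rescaled to the same 0 to 1 range before summing, a person with enormous page views cannot dominate the index through that dimension alone.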

Note: The ranking is designed for comparing groups and distributions — not individual head-to-head comparisons. Contemporary figures naturally score higher on page views, and certain fields (culture, sports) are overrepresented relative to business or governance.

What We Show on the Map

To create the interactive map, we applied three processing steps to the original 2.29 million records:

1. Filter to geolocated records

Only individuals with known birth coordinates (latitude & longitude) are included. This reduced the dataset from 2,291,817 to 1,704,190 records.

2. Deduplicate by location

Many individuals share the exact same birth coordinates (typically at city level). When multiple people map to the same point, we keep only the highest-ranked person per coordinate to prevent visual clutter.

3. Final dataset

After deduplication, 152,953 unique individuals appear on the map, each representing the most notable person born at that specific location.

Step                                        Count
Total records in source database            2,291,817
Records with birth coordinates              1,704,190
Final unique-coordinate records (on map)    152,953
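The filter and deduplication steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used to build the map: the record fields (`lat`, `lon`, `notability`) are hypothetical, a larger `notability` value is assumed to mean a higher rank, coordinates are compared for exact equality, and ties keep the first record seen.

```python
def build_map_records(records):
    """Keep geolocated records, then one top-ranked person per coordinate."""
    best = {}
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue  # Step 1: drop records without birth coordinates.
        key = (lat, lon)
        # Step 2: keep only the highest-ranked person at each exact point.
        # Assumption: higher notability means higher rank; ties keep the
        # record encountered first.
        if key not in best or rec["notability"] > best[key]["notability"]:
            best[key] = rec
    return list(best.values())


# Made-up sample records, for illustration only.
sample = [
    {"name": "P1", "lat": 48.8566, "lon": 2.3522, "notability": 4.2},
    {"name": "P2", "lat": 48.8566, "lon": 2.3522, "notability": 3.1},
    {"name": "P3", "lat": None, "lon": None, "notability": 4.9},
    {"name": "P4", "lat": 51.5074, "lon": -0.1278, "notability": 2.0},
]

on_map = build_map_records(sample)
print([rec["name"] for rec in on_map])
```

With the counts reported above, the first step retains about 74% of the source records (1,704,190 of 2,291,817), and the second keeps roughly 9% of those (152,953 of 1,704,190).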

Occupation Categories

The original research classifies individuals into domains of influence. We group these into the profession categories shown in the map's filter:

Athletes — Football, Baseball, Basketball, etc.
Politicians — Politician, Activist, Minister, etc.
Actors — Actor, Director, Presenter, etc.
Writers — Writer, Novelist, Playwright, etc.
Musicians — Singer, Composer, Musician, etc.
Scientists — Physicist, Chemist, Biologist, etc.
Artists — Painter, Sculptor, Photographer, etc.
Academics — Professor, Historian, Philosopher, etc.
Religious — Priest, Bishop, Missionary, etc.
Military — Officer, Commander, Soldier, etc.
Jurists — Lawyer, Judge, Diplomat, etc.
Business — Entrepreneur, Merchant, Manager, etc.
Poets — Poet, Lyricist, etc.
Nobility — Aristocrat, Monarch, Sovereign, etc.
Explorers — Explorer, Inventor, Engineer, etc.
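
One simple way to represent this grouping is a lookup table from occupation label to category. The Python sketch below uses only a few of the example labels listed above; the full mapping used for the map is not reproduced here, and the `categorize` helper and its `"Uncategorized"` fallback are hypothetical.

```python
# A partial, illustrative mapping built from example labels listed above.
CATEGORY_EXAMPLES = {
    "Athletes": ["Football", "Baseball", "Basketball"],
    "Politicians": ["Politician", "Activist", "Minister"],
    "Scientists": ["Physicist", "Chemist", "Biologist"],
    "Explorers": ["Explorer", "Inventor", "Engineer"],
}

# Invert to label -> category, lowercased for case-insensitive lookup.
LABEL_TO_CATEGORY = {
    label.lower(): category
    for category, labels in CATEGORY_EXAMPLES.items()
    for label in labels
}


def categorize(label, fallback="Uncategorized"):
    """Return the map category for an occupation label.

    Assumption: labels missing from the table fall into a catch-all bucket.
    """
    return LABEL_TO_CATEGORY.get(label.strip().lower(), fallback)


print(categorize("Chemist"))
print(categorize(" inventor "))
print(categorize("Astronaut"))
```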

Citizenship

Country assignments are determined by cross-verifying citizenship data from Wikipedia and Wikidata. Historical entities (e.g., "Russian Empire", "Ming Dynasty") are mapped to their modern-equivalent countries. The two sources agree in 95% of cases; when they conflict, priority is given to structured Infobox data.
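
The resolution rule can be sketched as follows. The function name `resolve_country`, the fallback to Wikidata when no Infobox value exists, and the two modern-equivalent entries (Russia, China) are assumptions made for illustration; the actual lookup table used by the researchers is not shown here.

```python
# Illustrative entries only; the real historical-to-modern table is assumed.
MODERN_EQUIVALENT = {
    "Russian Empire": "Russia",
    "Ming Dynasty": "China",
}


def resolve_country(infobox, wikidata):
    """Pick a country from two sources and normalize historical names.

    The Infobox value wins whenever it is present, which covers the
    conflict case. Assumption: Wikidata is used when the Infobox is empty.
    """
    chosen = infobox if infobox is not None else wikidata
    if chosen is None:
        return None  # Neither source has citizenship data.
    return MODERN_EQUIVALENT.get(chosen, chosen)


print(resolve_country("Russian Empire", "Russia"))
print(resolve_country(None, "Ming Dynasty"))
print(resolve_country("France", "Germany"))
```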

Citation

If you use or reference this data, please cite the original research:

Laouenan, M., Bhargava, P., Eyméoud, J.‑B., Gergaud, O., Plique, G. & Wasmer, E. "A cross-verified database of notable people, 3500BC-2018AD." Scientific Data 9, 290 (2022). doi.org/10.1038/s41597-022-01369-4