About Forebears Names
Forebears Names is a free service providing access to the largest geospatial database of forename and surname distribution and demographics. It provides the approximate incidence of forenames and surnames, produced from a database of 4,044,546,938 people (55.5% of living people in 2014). As of September 2019 it covers 27,662,801 forenames and 27,206,821 surnames in 236 jurisdictions. The geospatial data can be viewed on an interactive map and in table form. Statistics can be viewed at a global, continental, georegional, national and (multi-)regional scope.
Brief History
Forebears Names was introduced to Forebears in June 2012, with the launch of the website. At that time the scope was limited to England, Scotland, Wales and The Channel Islands, covering around 425,000 surnames listed in the 1881 census of The United Kingdom. In April 2013 over 64,000 surnames were added with data from the Ireland census of 1901. This initial version was presented using HighMaps.
From April 2013 Forebears began the long process of compiling data for the first global surname mapping facility. At the time the most expansive source was Public Profiler's Worldnames project, which covers twenty-six countries with a sample of 300 million people. Owing to the number of data sources, differences in formatting and writing scripts, and a host of other problems, the facility was not launched until September 2014. The data was derived from a sample of 1,587,475,724 people, covering 227 sovereign states, dependencies and territories. An update followed a few months later, fixing a number of issues and adding a small amount of new data.
Immediately work commenced on an expansive update, including more jurisdictions, more depth and a larger sample. The update was initially projected for February 2016, but tasks always took longer than anticipated and new tasks regularly presented themselves. As there was no standard format to the data sources being used, extracting them and arranging them in a universal manner took up to two months in the case of one country.
Once the data was compiled for the update, it took a further six months to correctly assign individuals to identifiable administrative divisions. This was because the data came from hundreds of individual sources that did not use a universal way of denoting location.
A further two months were required to rebuild the geospatial statistics generation script and to build a new website and mapping interface. The second version of global surname mapping was released on the 5th of September 2018. This build was produced from a sample of 3,936,342,242 people, covering 26,211,602 surnames in 236 jurisdictions. With this update the visual presentation of the data moved to Leaflet, and first-level administrative divisions were added for many countries; these were later bolstered with second-level administrative divisions.
Work on another major update commenced in December 2018, focusing on the addition of forename data. It took longer than expected because the disparate data sources took time to fashion into a single format. This is the current version of the project, produced from a global sample of 4,044,546,938 people and covering 27,662,801 forenames and 27,206,821 surnames in 236 jurisdictions. This was released in late August 2019 and included
In early 2019 Forebears began a process of adding demographic data for surnames, such as the distribution of religious faith and average income. This was added for forenames in August 2019.
Going forward, the project's emphasis is on increasing the data sample; developing and expanding an API that predicts demographic factors from a name input; adding more demographic data; and, to a lesser extent, adding historic data.
Forebears Names data has been used by publicly traded companies, banks, national security contractors, marketers, The Federal Reserve and has been cited in over 60 academic studies.
Process
The creation of geospatial data for names has three stages: the extraction of data from sources and its conversion to a universal database format; the sanitisation of the resulting databases and the referencing of people to geographic regions; and the compilation of the geospatial data itself.
1) Database Creation
The first stage in producing geospatial data is importing each data source (there are over 350) into its own database table in a universal format. The basic format is forename, middle name and surname. The source data has come in many formats. Some were easy to import, such as CSVs, Excel spreadsheets, database dumps and standalone databases; others were problematic, particularly PDFs, of which around 40-50 million pages have been parsed. The character encoding of each source is checked and, if necessary, converted to UTF-8, the encoding used for all data.
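As a rough illustration of the encoding check, the Python sketch below decodes a raw source export by trying UTF-8 first and then falling back to other encodings, and maps each row to the forename, middle name and surname shape described above. The candidate encodings and the helper names (`to_utf8_text`, `to_record`) are assumptions for illustration; Forebears has not published its import code.

```python
# Hypothetical sketch: decode a raw source export and map rows to the universal shape.
# The candidate encodings and their order are assumptions, not Forebears' actual list.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def to_utf8_text(raw: bytes) -> tuple[str, str]:
    """Decode bytes with the first candidate encoding that succeeds.

    Returns (text, encoding). latin-1 maps every byte, so it acts as a last resort;
    the resulting str can then be written back out as UTF-8.
    """
    for encoding in CANDIDATE_ENCODINGS:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding could decode the source")


def to_record(forename: str, middle: str, surname: str) -> dict:
    """Universal row shape: forename, middle name and surname."""
    return {"forename": forename.strip(), "middle": middle.strip(), "surname": surname.strip()}
```

A successful fallback decode does not prove the guess was right (cp1252 and latin-1 accept most byte sequences), which is why each source's encoding still has to be checked individually.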
2) Sanitisation and Geospatial Referencing
Once a source has been imported to a database table, various facets of its data are sanitised and assessed for their integrity. Specifically, names are sanitised to remove any character other than Latin alphabetic characters, hyphens (-), spaces ( ) and apostrophes ('). Various changes are made to fix common errors, such as the name McDonald appearing as “Mc Donald”, names beginning with titles such as “Dr”, “Mr” and “Mrs”, and names beginning with hyphens.
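A minimal sketch of the name sanitisation rules above follows, assuming names arrive as Python strings. It keeps only Latin letters (detected through their Unicode character names), hyphens, spaces and apostrophes, strips a leading title, removes leading hyphens and re-joins a detached “Mc”. The title list, the conversion of typographic apostrophes and the function names are hypothetical.

```python
import re
import unicodedata

# Illustrative title list; the full list Forebears strips is not published.
TITLES = {"Dr", "Mr", "Mrs", "Ms", "Miss"}


def is_latin_letter(ch: str) -> bool:
    """True for alphabetic characters whose Unicode name marks them as Latin."""
    return ch.isalpha() and "LATIN" in unicodedata.name(ch, "")


def sanitise_name(name: str) -> str:
    # Compose accents first so "O" + combining diaeresis survives as one Latin "Ö".
    name = unicodedata.normalize("NFC", name)
    # Assumption: typographic apostrophes are converted rather than dropped.
    name = name.replace("\u2019", "'")
    # Keep Latin letters, hyphens, spaces and apostrophes; drop everything else.
    name = "".join(ch for ch in name if is_latin_letter(ch) or ch in "-' ")
    name = re.sub(r" +", " ", name).strip()
    # Drop a leading title such as "Dr" or "Mrs" (full stops were removed above).
    words = name.split(" ")
    if len(words) > 1 and words[0] in TITLES:
        name = " ".join(words[1:])
    # Remove leading hyphens.
    name = name.lstrip("-").strip()
    # Re-join a detached prefix: "Mc Donald" -> "McDonald".
    return re.sub(r"\bMc (?=[A-Z])", "Mc", name)
```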
In some cases source data has only included a single name string and not a specifically defined forename and surname. In these cases the name parts are extracted, taking into account particles, such as “de la”, “bin” and “van”.
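The particle-aware split can be sketched as follows, assuming forename-first order and attaching any particles that directly precede the last word to the surname. The particle set and `split_full_name` are illustrative; the real particle list, and how Forebears treats patronymic particles such as “bin”, are not documented here.

```python
# Illustrative particle list; the real list Forebears uses is not published.
PARTICLES = {"de", "la", "del", "da", "dos", "van", "der", "von", "bin", "binti"}


def split_full_name(full: str) -> tuple[str, str, str]:
    """Split "forename [middle ...] [particles] surname" into (forename, middle, surname).

    Assumes forename-first order; particles directly before the last word are kept
    with the surname, e.g. "Maria de la Cruz" -> ("Maria", "", "de la Cruz").
    """
    tokens = full.split()
    if not tokens:
        return "", "", ""
    if len(tokens) == 1:
        return "", "", tokens[0]  # assumption: a lone word is treated as a surname
    start = len(tokens) - 1  # index where the surname begins
    # Walk left over particles, but never swallow the first word (the forename).
    while start > 1 and tokens[start - 1].lower() in PARTICLES:
        start -= 1
    return tokens[0], " ".join(tokens[1:start]), " ".join(tokens[start:])
```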
Forenames are assessed to ascertain whether they contain more than one name, and any extra components are assigned to the middle name. A forename may contain only one name, although a name with a particle, such as “Abd Rahaman” or “La Toya”, counts as one name. One current exception is where the forename was derived from a writing script other than Latin and it was determined that the forename should contain a space. This will be changed in a future update.
Forenames are always derived from the first part of a given name. In some cases the forename is an initial, in which case the initial is assigned to the middle name and the forename is blank.
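The forename rules in the two preceding paragraphs can be sketched in the same way. This is a hypothetical helper, not Forebears' code: a leading particle from an illustrative set binds to the following word, and a forename that is a bare initial is moved into the middle name, leaving the forename blank.

```python
# Illustrative set of leading particles that bind to the next word in a forename.
FORENAME_PARTICLES = {"abd", "la"}


def is_initial(token: str) -> bool:
    """A single letter, optionally followed by a full stop, e.g. "J" or "J."."""
    return len(token.rstrip(".")) == 1


def split_given_names(given: str) -> tuple[str, str]:
    """Return (forename, middle name) from the given-name part of a record."""
    tokens = given.split()
    if not tokens:
        return "", ""
    # A leading particle and the following word count as one forename.
    take = 2 if len(tokens) > 1 and tokens[0].lower() in FORENAME_PARTICLES else 1
    forename, extra = " ".join(tokens[:take]), tokens[take:]
    if is_initial(forename):
        # An initial is moved to the middle name and the forename is left blank.
        return "", " ".join([forename] + extra)
    return forename, " ".join(extra)
```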
Surnames from the Spanish and Portuguese traditions, where individuals usually have a surname from each of their mother and father, are stored separately as far as Forebears has been able to discern them from the source data.
Multiple sources were obtained with corrupted diacritic marks (accents). In these cases the data was recovered by reference to other sources for the country in question.
Some sources encoded names in ASCII, omitting diacritic marks where they should be present. In such cases the diacritics have been restored.
A limited number of sources had a significant minority of names back-to-front (surname as forename and vice versa). In these cases Forebears has rearranged the names correctly as far as possible. Reversed names also occur occasionally as a human data-entry error in any database.
Some sources were in writing scripts other than Latin. This presents an issue in that it is not known how each individual may convert their name into the Latin alphabet. This process of conversion is known as transliteration or Romanisation. Further the majority will not have a Latin rendering of their name. The solution Forebears has used, as much as possible, is to use the most prevalent trends in transliteration to systematically convert all names in a given writing script to Latin.
Forebears uses the following methods for transliteration:
- Arabic (Hassaniya): conversion tables, Government of Mauritania
- Arabic (standard): Forebears proprietary
- Armenian: ICU modified
- Azerbaijani: 'ə$' => 'e', 'Ə' => 'A', 'ə' => 'a'
- Bengali: conversion tables, Government of West Bengal
- Bulgarian: Forebears proprietary
- Burmese: Forebears proprietary
- Chinese: ICU
- Dhivehi: conversion tables, Government of The Maldives
- Farsi: Forebears proprietary
- Georgian: conversion tables, Government of Georgia
- Greek: Forebears proprietary
- Gujarati: conversion tables, Government of Gujarat
- Hebrew: Forebears proprietary
- Hindi: conversion tables, Governments of Uttar Pradesh and Rajasthan
- Japanese: jTalk
- Kannada: ICU modified
- Khmer: Forebears proprietary
- Korean: ICU
- Macedonian: Forebears proprietary
- Marathi: conversion tables, Government of Maharashtra
- Mongolian: Forebears proprietary
- Nepali: ICU modified
- Oriya: conversion tables, Government of Odisha
- Russian: Forebears proprietary
- Serbian: Forebears proprietary
- Thai: RTGS
- Tibetan: conversion tables, Government of Bhutan
- Ukrainian: Forebears proprietary
- Urdu: Forebears proprietary
- Uzbek: Forebears proprietary
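The Azerbaijani entry is the only rule in the list spelled out in full, so it can be shown directly. The sketch assumes the three rules are applied in the order listed, so the word-final rule `'ə$' => 'e'` runs before the general `'ə' => 'a'`; because `$` anchors to the end of the string, it also assumes each name part (forename or surname) is passed separately.

```python
import re


def romanise_azerbaijani(name: str) -> str:
    """Apply the three Azerbaijani rules listed above, in the order given."""
    name = re.sub("ə$", "e", name)  # 'ə$' => 'e': a final schwa becomes "e"
    name = name.replace("Ə", "A")   # 'Ə' => 'A'
    return name.replace("ə", "a")   # 'ə' => 'a'
```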
The gender of individuals is sanitised to include only male, female and, in a minority of cases, other. X and Y are used to denote the gender of individuals who appear with no forename or whose forename is an initial. This maintains the sex ratio of usable forenames when producing statistics.
Dates of birth are checked to be valid and within a reasonable time period (i.e. not born in 1500).
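The gender and date-of-birth checks might look like the sketch below. Everything specific in it is an assumption: the accepted gender spellings, which of X and Y is used for which sex (the text does not say), and the earliest acceptable birth year of 1890.

```python
from datetime import date
from typing import Optional

# Illustrative spellings seen in sources, mapped to the three permitted values.
GENDER_ALIASES = {"m": "male", "male": "male", "f": "female", "female": "female",
                  "o": "other", "other": "other"}
# Hypothetical: the text does not say which of X and Y stands for which sex.
NO_USABLE_FORENAME = {"male": "X", "female": "Y"}
EARLIEST_BIRTH_YEAR = 1890  # hypothetical cut-off for a "reasonable" date of birth


def sanitise_gender(raw: str, forename: str) -> Optional[str]:
    """Map a raw gender label to male/female/other, or X/Y when the forename is unusable."""
    gender = GENDER_ALIASES.get(raw.strip().lower())
    if gender in NO_USABLE_FORENAME and len(forename.rstrip(".")) <= 1:
        return NO_USABLE_FORENAME[gender]  # blank forename or a bare initial
    return gender  # None if the label was not recognised


def valid_birth_date(year: int, month: int, day: int, today: date) -> bool:
    """Reject impossible dates (e.g. 30 February) and implausible birth years."""
    try:
        born = date(year, month, day)
    except ValueError:
        return False
    return born.year >= EARLIEST_BIRTH_YEAR and born <= today
```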
These are the basic functions regularly performed on the data. Many sources required specific attention, such as extracting names from elaborate strings that include patronymic and matrimonial references, or Hungary, where many women appeared with their husband's forename and a suffix denoting “wife of”.
In countries where many people do not have surnames (such as Indonesia and Myanmar), the part of the name that would serve as a surname in a Western context has been treated as the surname.
As of September 2019, 145 jurisdictions appear with at least one level of administrative divisions. For example, within The United States name distribution statistics can be viewed at state and county/independent city level. Assigning individuals to administrative divisions was often simple, as many sources delineated individuals in this way. For others, divisions had to be inferred from postal code and/or city, which was not always simple owing to changes in postal codes and administrative boundaries, among a variety of other issues. Administrative divisions are assigned from GeoPostcodes' global postal code database.
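Inferring divisions from postal code and city can be sketched as a pair of lookups, postal code first and city second. The tables are tiny illustrative stand-ins, not GeoPostcodes data or its API, and the city fallback is keyed by (city, state) because a bare city name is often ambiguous; anything unresolved is left unassigned.

```python
from typing import Optional, Tuple

Division = Tuple[str, str]  # (first-level division, second-level division)

# Illustrative entries only; not drawn from GeoPostcodes.
POSTCODE_TO_DIVISION = {
    "90210": ("California", "Los Angeles County"),
    "10001": ("New York", "New York County"),
}
# Fallback by (city, state) because names such as "Springfield" are ambiguous.
CITY_TO_DIVISION = {
    ("springfield", "IL"): ("Illinois", "Sangamon County"),
}


def assign_division(postcode: Optional[str], city: Optional[str],
                    state: Optional[str]) -> Optional[Division]:
    """Infer an individual's divisions from postcode first, then city; None if unresolved."""
    if postcode:
        match = POSTCODE_TO_DIVISION.get(postcode.strip())
        if match:
            return match
    if city and state:
        return CITY_TO_DIVISION.get((city.strip().lower(), state))
    return None  # left unassigned; counted only at the jurisdiction level
```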
Once administrative divisions are assigned, the resultant taxonomy is verified against GeoPostcodes' data and other sources to ensure there are no omissions, duplications or erroneous additions.
The incidence of a name in a jurisdiction's administrative division may be lower than in the jurisdiction owing to some individuals not being assigned to a division.
A small minority of administrative divisions are missing from the source data and appear as empty.
The percentage of each administrative division's population that is represented can vary.
Forebears have assigned administrative divisions based on those used at the time individuals were referenced to administrative divisions. Administrative divisions will not be updated to account for future changes.
A small number of individuals are not assigned to a place owing to insufficient or ambiguous geographic references. Some people who could have been assigned to administrative divisions, but only by cataloguing each one individually, have not been assigned to divisions, as doing so would have been an extremely inefficient use of resources and would have considerably delayed the project.
3) Compilation
This process is followed for each jurisdiction.
1) Firstly, individuals are grouped by diacritic-sensitive name and by their lowest-level administrative division (in The United States, counties), or by no division if they are not assigned to one. When compiling forenames, gender is added to the grouping. Only the lowest administrative division for each individual is used because some jurisdictions do not have a universal structure for delineating divisions.
1a) Due to an unequal gender ratio in the data for Macedonia, Tajikistan, Turkmenistan and Uzbekistan, the incidence of surnames is adjusted to the sex ratio of the country.
1b) A small number of jurisdictions with a small sample have the names of immigrants moved to another table, so that their incidence is not scaled up to find the approximate number of people with that name.
1c) Due to an over-representation of guest workers in Bahrain, Hong Kong, Kuwait, Macau, Oman, Qatar, Saudi Arabia, Singapore and Taiwan, the incidence of names is modified to be in line with the representation of the various ethnicities.
1d) Western forenames of Chinese people in China, Hong Kong, Macau, Singapore and Taiwan are ignored, e.g. Toby Ng.
2) The built data is re-combined to be case-insensitive.
3) When building forename statistics, empty forenames are removed before the statistics are built.
4) When building forename statistics each administrative division (or the entire jurisdiction if no division) is assessed against the sex ratio of the country and the incidence of forenames is adjusted to bring it in line with the jurisdiction's sex ratio if need be.
5) When building forename statistics names are merged to combine incidence for the same name with different genders, e.g. males and females with the name Alex.
6) The population of each administrative division within the current jurisdiction is retrieved from a database table. This is used to find the multiplier by which the sample for each administrative division needs to be adjusted. Counts within divisions are adjusted, while any not within a division are left as is.
7) Any names ignored in step 1b) are reintroduced.
8) When building surname statistics any blank surnames are deleted. They are deleted at this stage as there are a number of countries where many people have no surname, such as India.
9) If the current jurisdiction has administrative divisions the built statistics are now used to create the higher level administrative divisions (including the jurisdiction) from the lowest division.
10) With the statistics built, the percentage share of all names and the rank of each name are calculated for each administrative division and for the jurisdiction. Rankings use the standard competition (“1224”) method: the name that occurs most in the area is ranked first, and names are then ranked in descending order of incidence. Names that occur the same number of times share the same rank, and the next name's rank is one more than the number of names ranked above it.
| Surname | Incidence | Rank |
|---|---|---|
| Wang | 100 | 1 |
| Li | 90 | 2 |
| Chong | 90 | 2 |
| Chen | 80 | 4 |
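The ranking rule in step 10 can be checked against the table above with a short sketch. Because ties share a rank and the following rank skips ahead, this is what is usually called standard competition (“1224”) ranking. The function name and the row layout are hypothetical.

```python
def rank_names(incidence: dict[str, int]) -> list[tuple[str, int, float, int]]:
    """Return (name, incidence, percentage share, rank) rows, most frequent first.

    Ties share a rank and the next rank skips ahead ("1224" ranking).
    """
    total = sum(incidence.values())
    rows = []
    rank, previous = 0, None
    ordered = sorted(incidence.items(), key=lambda item: item[1], reverse=True)
    for position, (name, n) in enumerate(ordered, start=1):
        if n != previous:
            # A new incidence value takes its position in the list as its rank.
            rank, previous = position, n
        rows.append((name, n, 100.0 * n / total, rank))
    return rows
```

Python's `sorted` remains stable with `reverse=True`, so tied names keep their input order; the text does not specify how tied names are ordered within a rank.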
With each jurisdiction built, they are compiled to produce the incidence, percentage of all names and rank at a global, continental, georegional (e.g. Western Europe) and onoregional* level. Finally each name is assessed to determine the country in which it has the highest incidence and is most numerous compared to other names.
*Onoregions are regions delineated by Forebears denoting areas within georegions that share similar naming traditions.
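Steps 1, 2, 4, 5, 6 and 9 of the per-jurisdiction process can be sketched together in Python. This is a minimal illustration under stated assumptions, not Forebears' implementation: each hypothetical person record carries a `name`, a `gender` and a `division` holding only the lowest-level division (or `None`); division populations and a child-to-parent map are supplied as plain dictionaries; case variants are merged under the most common spelling (the text does not say which spelling is displayed); and the sex-ratio step rescales one division's male and female counts so that males make up a target share. Steps 1a to 1c, 3, 7 and 8 are omitted.

```python
from collections import Counter, defaultdict
from typing import Iterable


def group(people: Iterable[dict], by_gender: bool = False) -> Counter:
    """Step 1: count people per (name, lowest division[, gender]).

    Names are compared exactly, so diacritics and letter case still matter here.
    """
    counts = Counter()
    for person in people:
        key = (person["name"], person.get("division"))  # division None = unassigned
        if by_gender:
            key += (person["gender"],)
        counts[key] += 1
    return counts


def merge_case(counts: Counter) -> Counter:
    """Step 2: merge names that differ only in letter case, keeping the commonest spelling."""
    spellings = defaultdict(Counter)
    for key, n in counts.items():
        spellings[(key[0].lower(),) + key[1:]][key[0]] += n
    merged = Counter()
    for folded_key, variants in spellings.items():
        display = variants.most_common(1)[0][0]
        merged[(display,) + folded_key[1:]] = sum(variants.values())
    return merged


def adjust_sex_ratio(division_counts: Counter, male_share: float) -> Counter:
    """Step 4: rescale one division's (name, division, gender) counts so that males
    make up `male_share` of the male + female total; other codes are untouched."""
    males = sum(n for key, n in division_counts.items() if key[-1] == "male")
    females = sum(n for key, n in division_counts.items() if key[-1] == "female")
    total = males + females
    factor = {
        "male": male_share * total / males if males else 1.0,
        "female": (1 - male_share) * total / females if females else 1.0,
    }
    return Counter({key: n * factor.get(key[-1], 1.0) for key, n in division_counts.items()})


def merge_genders(counts: Counter) -> Counter:
    """Step 5: combine one forename across genders, e.g. male and female Alex."""
    merged = Counter()
    for (name, division, _gender), n in counts.items():
        merged[(name, division)] += n
    return merged


def scale(counts: Counter, population: dict) -> Counter:
    """Step 6: scale each division's sample up to its population; unassigned counts stay as is."""
    sample = Counter()
    for (_name, division), n in counts.items():
        if division is not None:
            sample[division] += n
    scaled = Counter()
    for (name, division), n in counts.items():
        multiplier = 1.0 if division is None else population[division] / sample[division]
        scaled[(name, division)] = n * multiplier
    return scaled


def roll_up(counts: Counter, parent: dict, jurisdiction: str) -> Counter:
    """Step 9: add each lowest-division count to every ancestor division and the jurisdiction."""
    totals = Counter()
    for (name, division), n in counts.items():
        totals[(name, jurisdiction)] += n
        while division is not None:
            totals[(name, division)] += n
            division = parent.get(division)
    return totals
```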
Limitations
1) The primary limitation of Forebears Names is the inability to obtain data on all living individuals. Approximations derived from a small percentage of a population miss many names and can produce moderate inaccuracies in rankings in balanced samples, and much larger ones in imbalanced samples. Forebears seeks to address this by continually seeking new source data. However, Forebears is the largest geospatial names database, produced from a sample eight times larger than the nearest comparable service. As such, Forebears provides the most comprehensive data for most jurisdictions, and the only data in many cases. There is also very little publicly available data on the distribution of forenames. To Forebears' knowledge, there are three country-level services built from a larger sample than Forebears; these are listed in Appendix III.
2) Beyond censuses, which are typically conducted every ten years and not made publicly available, the currency of sources varies. The most commonly used source is voter lists, which cover most or a large portion of a country's adult population. However these include deceased people, sometimes in small but notable quantities. They also may not be updated after someone moves.
3) Some sources may be biased towards certain ethnic groups or towards those with higher incomes, and neither ethnicity nor income is distributed equally across names.
4) Source data may contain human data-input errors and some data is self-reported, which may include names like “fghfghfghf” or “Jones Brothers Ltd”. Where these have been identified they have been removed.
5) Due to human data-input errors names occasionally occur back to front, e.g. Smith as a forename and John as a surname. In some databases from developing countries this was more common and in those cases Forebears attempted to rectify the issue as much as is possible.
6) Some sources are biased towards certain age groups: the young (5-18), adults (18+) or the most economically active (25/30-60/70). This will most notably cause inaccuracies in forename distribution, as trends in naming babies can change dramatically over a generation. It will also affect the accuracy of surname distribution where immigration is a factor.
Appendix I: Sample Sizes
Below is a table showing the percentage of each jurisdiction's population that appears in Forebears Names' source data.
| Jurisdiction | Sample size (% of population) |
|---|---|
| Georgia | 100 |
| Spain | 100 |
| Israel | 100 |
| Armenia | 100 |
| Czech Republic | 100 |
| United States | 100 |
| Pitcairn Islands | 100 |
| Ukraine | 99.6281 |
| Taiwan | 99.6114 |
| Abkhazia | 98.9871 |
| Sweden | 97.2559 |
| Bulgaria | 96.517 |
| Kosovo | 96.4718 |
| China | 94.6195 |
| Saint Lucia | 94.2909 |
| Norway | 93.5677 |
| South Korea | 92.9869 |
| Trinidad and Tobago | 91.5853 |
| Slovenia | 89.6028 |
| Finland | 88.354 |
| Chile | 88.0213 |
| Indonesia | 87.6476 |
| Anguilla | 86.1433 |
| Saint Vincent and The Grenadines | 85.9841 |
| Poland | 83.9834 |
| Marshall Islands | 79.4265 |
| Lesotho | 78.8843 |
| Philippines | 76.5383 |
| Cook Islands | 73.0529 |
| Peru | 72.0211 |
| Costa Rica | 71.8452 |
| Monaco | 71.4844 |
| Scotland | 70.9169 |
| Croatia | 69.1766 |
| United States Virgin Islands | 68.9192 |
| Mexico | 68.8096 |
| Turkey | 66.8737 |
| Grenada | 66.7275 |
| Norfolk Island | 66.4639 |
| Maldives | 65.8779 |
| Guyana | 65.5974 |
| Nauru | 65.2068 |
| Lebanon | 65.2032 |
| Saint Kitts and Nevis | 65.1228 |
| Argentina | 64.9276 |
| Venezuela | 64.7 |
| Iceland | 63.928 |
| Puerto Rico | 63.6864 |
| Saint Pierre and Miquelon | 62.4733 |
| El Salvador | 62.4351 |
| India | 62.3176 |
| Nicaragua | 61.8202 |
| Jersey | 60.9373 |
| Panama | 60.911 |
| Slovakia | 60.4236 |
| British Virgin Islands | 58.8361 |
| England | 57.8076 |
| Azerbaijan | 57.2106 |
| Canada | 56.9876 |
| Denmark | 56.8946 |
| Cayman Islands | 56.6993 |
| Wales | 55.6053 |
| Papua New Guinea | 54.6311 |
| Australia | 54.0804 |
| Honduras | 52.9872 |
| Cape Verde | 52.9174 |
| Sao Tome and Principe | 51.9546 |
| Uruguay | 51.3979 |
| Cambodia | 51.1247 |
| Kyrgyzstan | 50.9016 |
| Bhutan | 50.4298 |
| Russia | 49.0728 |
| Liechtenstein | 48.1295 |
| Montserrat | 48.0699 |
| Brazil | 47.4398 |
| Saint Helena Ascension and Tristan Da Cunha | 47.1741 |
| Switzerland | 46.5344 |
| Bermuda | 45.4084 |
| Macedonia | 45.1393 |
| Belarus | 44.9081 |
| Belize | 44.7883 |
| Montenegro | 44.6058 |
| Netherlands | 44.5946 |
| Moldova | 44.5832 |
| Pakistan | 44.4554 |
| San Marino | 44.3631 |
| Benin | 44.3528 |
| Nepal | 44.0495 |
| Isle of Man | 43.9562 |
| Paraguay | 43.45 |
| New Zealand | 42.6687 |
| Vietnam | 42.1542 |
| American Samoa | 42.0328 |
| South Ossetia | 41.1361 |
| South Africa | 40.8694 |
| Palestine | 40.6801 |
| Turks and Caicos Islands | 40.4197 |
| Niger | 39.9727 |
| Belgium | 39.9002 |
| Northern Ireland | 39.816 |
| Colombia | 39.6764 |
| Niue | 39.4296 |
| Senegal | 38.9956 |
| Liberia | 38.6844 |
| Nigeria | 38.4484 |
| Uganda | 37.9698 |
| Gibraltar | 37.6763 |
| Northern Mariana Islands | 37.0039 |
| Germany | 36.8024 |
| Antigua and Barbuda | 36.608 |
| Latvia | 36.2917 |
| Barbados | 35.2443 |
| Mauritania | 32.4168 |
| Austria | 32.3281 |
| Ireland | 32.3002 |
| Jordan | 32.077 |
| Solomon Islands | 32.0136 |
| Ecuador | 31.5766 |
| Zimbabwe | 31.4489 |
| Andorra | 31.3692 |
| Luxembourg | 31.2874 |
| Greenland | 30.8668 |
| Iran | 30.8147 |
| Jamaica | 30.6644 |
| Transnistria | 30.6377 |
| Ivory Coast | 30.1093 |
| France | 29.7655 |
| Falkland Islands | 29.2581 |
| Cyprus | 28.9219 |
| Estonia | 28.3213 |
| Hungary | 27.4044 |
| Guam | 26.9932 |
| United Arab Emirates | 26.7623 |
| Bosnia and Herzegovina | 26.6208 |
| Serbia | 26.3722 |
| Aruba | 25.9586 |
| Burkina Faso | 25.8066 |
| Cameroon | 25.7179 |
| New Caledonia | 24.4795 |
| Greece | 23.8838 |
| Yemen | 23.4968 |
| Italy | 23.2548 |
| Bahamas | 22.074 |
| Kazakhstan | 21.8617 |
| French Polynesia | 20.754 |
| Singapore | 20.0904 |
| Dominica | 19.9777 |
| Guernsey | 19.904 |
| Botswana | 19.5054 |
| Iraq | 19.2098 |
| Malta | 18.9377 |
| Oman | 18.8746 |
| Lithuania | 18.8726 |
| Zambia | 18.6179 |
| Mauritius | 17.981 |
| Saint Barthelemy | 17.5029 |
| Suriname | 16.9967 |
| Mongolia | 16.4875 |
| Namibia | 15.7669 |
| Romania | 15.3673 |
| DRCongo | 15.2121 |
| Malaysia | 15.0674 |
| Algeria | 14.2567 |
| Dominican Republic | 13.9566 |
| Japan | 13.8864 |
| Brunei | 13.6021 |
| Faroe Islands | 12.8317 |
| Seychelles | 12.1355 |
| Micronesia | 11.4133 |
| Portugal | 11.1799 |
| Kuwait | 11.1786 |
| Thailand | 11.0061 |
| Qatar | 10.0262 |
| Kenya | 9.7568 |
| Wallis and Futuna | 9.5775 |
| Albania | 9.2685 |
| Hong Kong | 8.439 |
| Haiti | 8.2689 |
| Tuvalu | 7.9927 |
| Palau | 7.6766 |
| Bahrain | 7.6359 |
| Guatemala | 7.4689 |
| Tanzania | 7.0847 |
| Saint Martin | 6.9815 |
| Cuba | 6.0307 |
| Bolivia | 5.8926 |
| Kiribati | 5.555 |
| Vanuatu | 5.4279 |
| Tonga | 5.0643 |
| Tunisia | 5.0363 |
| Syria | 4.7243 |
| Fiji | 4.7199 |
| Djibouti | 4.5811 |
| Swaziland | 4.393 |
| Malawi | 4.2594 |
| Samoa | 4.0158 |
| Macau | 3.9076 |
| Morocco | 3.5919 |
| Gabon | 3.4761 |
| Uzbekistan | 2.8366 |
| Saudi Arabia | 2.7967 |
| Somalia | 2.2936 |
| Afghanistan | 2.2096 |
| Tajikistan | 2.0175 |
| Ghana | 1.7606 |
| Northern Cyprus | 1.6708 |
| Sri Lanka | 1.6422 |
| Mali | 1.6245 |
| Turkmenistan | 1.5976 |
| Togo | 1.3964 |
| Bangladesh | 1.1586 |
| Gambia | 0.9721 |
| Egypt | 0.9354 |
| Equatorial Guinea | 0.8192 |
| Libya | 0.763 |
| Ethiopia | 0.7492 |
| Angola | 0.7433 |
| Rwanda | 0.5641 |
| Sudan | 0.5336 |
| Comoros | 0.4863 |
| East Timor | 0.4733 |
| Madagascar | 0.4584 |
| Congo | 0.4549 |
| Myanmar | 0.4233 |
| Guinea | 0.3532 |
| South Sudan | 0.3144 |
| Mozambique | 0.2871 |
| Sierra Leone | 0.2744 |
| Burundi | 0.2134 |
| Laos | 0.2033 |
| Guinea Bissau | 0.1708 |
| Chad | 0.1692 |
| Central African Republic | 0.1564 |
| Eritrea | 0.1473 |
| North Korea | 0.0095 |
Appendix II: Sources
Owing to its proprietary nature, Forebears does not cite the sources it uses unless required to by law or the data is historic.
Partial list of sources:
- Hagstova Føroya. (2015). Boys names 2001-2014. Retrieved from URL
- Hagstova Føroya. (2015). Female names 2001-2014. Retrieved from URL
- Hagstova Føroya. (2015). Surnames 2001-2014. Retrieved from URL
- Instituto Nacional de Estadística. (2018). Frecuencias de apellidos [Surname frequencies]. Retrieved from URL
- Instituto Nacional de Estadística. (2018). Frecuencias de nombres [Forename frequencies]. Retrieved from URL
- Ministerstvo Vnitra České Republiky. (2016). Četnost jmen a příjmení [Frequency of forenames and surnames]. Retrieved from URL
- Statistični urad Republike Slovenije. (2018). Imena dečkov, Slovenija, letno [Boys' names, Slovenia, annual].
- Statistični urad Republike Slovenije. (2018). Imena deklic, Slovenija, letno [Girls' names, Slovenia, annual].
- Statistični urad Republike Slovenije. (2018). Priimki, Slovenija, letno [Surnames, Slovenia, annual].
- The Church of Jesus Christ of Latter-day Saints. (2001). 1880 United States Census.
- The Church of Jesus Christ of Latter-day Saints. (2001). 1881 British Census.
- The National Archives of Ireland. (2007). Ireland 1901 Census.
- 전자가족관계등록시스템 [Electronic Family Relationship Registration System]. (2019). 아기 이름 빈도 [Baby name frequency].
- 통계청 [Statistics Korea]. (2015). 성, 가족 기원 및 종교 관련 항목 조사 [Survey of items relating to surname, family origin and religion].
Appendix III: Services With a Larger Sample
The following services provide surname distribution statistics with a larger sample than Forebears.
Belgium: Familienaam provides distribution data from Belgium's 1998 and 2008 population registers.
France: Filae.com provides distribution data based on birth data from l'Institut National de la Statistique et des Études Économiques. This data is based on births and not where people were living in a given year.
Netherlands: The Meertens Instituut provides distribution data on all people with Dutch nationality who lived in The Netherlands in 2007.
Forebears provides data on name distribution and is not involved in the delineation of nations. The website does not specifically define any of the following disputed territories as nations or part of nations:
Abkhazia, Artsakh, Ceuta and Melilla, Crimea, Golan Heights, Israel, Kashmir, Kosovo, Northern Cyprus, Palestine, Sahrawi Republic, Somaliland, South Ossetia, Taiwan and Transnistria.
They all appear as they are controlled de facto. This decision was taken because the records Forebears uses are obtained from sources within the de facto jurisdiction; the de jure or other claimants have few or no records of the names of the people who live there. The administrative divisions within disputed territories also often differ from the de jure administrative divisions. This is current as of April 2014.
The delineations have not been made for political reasons and Forebears does not comment on boundary disputes.
The Donetsk People's Republic and The Luhansk People's Republic are not included as they had not been established at the time of creation, and the situation in this area is unclear. The Autonomous Administration of North and East Syria is not included as no data has been obtained covering its boundaries. It appears as part of Syria.
Forebears uses a database of around four billion people to produce statistics relating to forenames and surnames; the statistics are therefore approximations.
Due to the amount of resources it would require to keep an accurate record of everyone alive Forebears does not seek to make edits to the underlying data.
Names containing diacritical marks (or accents) will be considered different to those that don't. For example the name Öztürk (568,848 bearers) is treated differently to Ozturk (7,920 bearers).
Names may contain any character in the Latin Unicode range as well as apostrophe ('), hyphen (-) and space ( ). These are also considered differently. For example, Jones-Williams is considered different to Jones Williams.
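The comparison rules above amount to using the exact string as the counting key. The sketch below assumes Unicode NFC normalisation is applied so that a precomposed “Ö” and an “O” followed by a combining diaeresis count as the same spelling; the text does not say whether Forebears does this. `strip_diacritics` shows the accent folding that the text says is not applied, under which Öztürk and Ozturk would collapse into one key.

```python
import unicodedata


def name_key(name: str) -> str:
    """Key under which a name's incidence is counted.

    Diacritics, hyphens and spaces are all significant. NFC normalisation (an
    assumption) only makes precomposed and decomposed encodings of the same
    letter compare equal; it never folds "Ö" into "O".
    """
    return unicodedata.normalize("NFC", name)


def strip_diacritics(name: str) -> str:
    """The folding that is not applied: map accented letters onto their base letters."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```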
Due to the copyright status of various sources used and Forebears' own privacy concerns, no details on living people will be disclosed under any circumstances. If you wish to locate people with a given name, it is recommended that you consult white pages or hire a private investigator.
The source data used to compile incidence does not always list a place or region of residence. This generally relates to prisoners, military personnel, police and foreign nationals living in a country and, to a lesser degree, to people whose specified place of residence could not be determined. For this reason a country may list, say, ten people with a name, but only a total of nine across all its regions.
As of the 4th of September 2018 the Forebears database contains 26,445,869 surnames, of which 234,267 are considered extinct.
The previous edition of the database released on the 15th of September 2014 included 11,303,059 surnames. Before that the database released on the 26th of April 2013 included 488,661 surnames. The initial database launched on the 20th of June 2012 included 424,349 surnames.
Forebears does not remove names from the website. Names cannot be owned and the factual information relating to the meaning or distribution of a surname cannot be subject to copyright.
- Some data sources used to produce the name statistics contain both a Latin and a non-Latin rendering of a name. In these cases the Latin rendering was used as is
- Latin and non-Latin data was used for a number of countries
- In the case that a standardised transliteration method was used, non-Latin forms were consistently transliterated to the same Latin rendering
Yes. Data can be accessed via API, CSV/Excel upload and web interface at OnoGraph.
Forebears does not assist those who want to mine data from the website. Due to huge levels of data mining that peaked at over 75% of total requests to the website, the site now uses a hard firewall to block such requests and a soft one to return random data when unauthorised robot access is detected.
No; though a commercially available API is planned.

