Facebook’s Data Team published a very interesting post on Facebook’s blog late last night that revealed the ethnic make-up of Facebook as well as trends in adoption over the past several years. How, you ask, when Facebook doesn’t request demographic information from its users upon registration? They used a mixture modeling method based on surnames. The whole experiment was started to answer the question “How diverse are the ethnic backgrounds of the people using Facebook?” In the blog post they explain more about how surnames were used to determine race:
Comparing people’s surnames on Facebook with data collected by the U.S. Census Bureau, we are able to estimate the racial breakdown of Facebook users over the history of the site.
They are calling this “Friendship Diversity.” They outlined the methodology in the post as well:
The U.S. Census Bureau’s Genealogy Project publishes a data set containing the frequency of popular surnames along with a breakdown by race and ethnicity. These data are the key to our analysis, so we will spend some time describing them in some detail. An example of the raw data is shown below for the three most-frequent surnames in the census: Smith, Johnson and Williams. These data provide the rank in the population, the total count of people with the name, their proportion per 100,000 Americans, and the percent for various races: White, Black, Asian/Pacific Islander, American-Indian/Alaskan Native, two or more races and Hispanic respectively. This data set allows us to predict what a person’s race is based solely on his or her surname. While these predictions will often be wrong, in aggregate they will be correct. For example, suppose you select 10,000 people with the name Smith from the U.S. population at random. The data above suggest that 7,335 of them will be White, 2,222 will be Black and so on. Certain names will be more predictive of a certain race, while others will predict a wide array of ethnic backgrounds. The table below shows the top three names within the top 1,000 ordered by the percent in a given group. It shows that some ethnicities have distinctive surnames while others do not. For instance, 98.1% of individuals with the name Yoder are White while the most predictive name for American Indian / Alaskan Native individuals only has 4.4% in that group. For this reason, we will only look at White, Black, Asian/Pacific Islander and Hispanic predictions in our analysis….Finally, we adjust the estimates in our analyses with Internet adoption rates based on values from the National Telecommunications and Information Administration report on the Networked Nation. We use the percent of households with Internet access as a proxy for the addressable Internet population of each race or ethnicity.
That’s a mouthful. You can read more about their methodology in the original blog post from the data team. What is interesting are the results. Facebook’s users are 11% African-American, with a surge in adoption in 2009, up from 7% in 2005. African-Americans make up 12% of the US population. For Latinos, this methodology shows a wider gap: they currently make up 15% of the US population but only 9% of Facebook’s users, up from 3% in late 2005. Pew recently released a study saying 44% of Twitter’s users are African-American and Latino.
The data team at Facebook plans to use first names in the future, as well as friend connections, to get a better understanding of the “diversity of interpersonal relationships.”
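To make the "wrong individually, right in aggregate" idea from the quoted methodology concrete, here is a minimal Python sketch. It is not Facebook's code: the table name SURNAME_PERCENT and the function expected_counts are hypothetical, and the only figures in it are the two Smith percentages implied by the quote (7,335 and 2,222 per 10,000). The remaining groups for Smith are not given in the quote, so they are left out rather than guessed, and the Internet adoption adjustment is not modeled at all.

```python
from collections import Counter

# Percent of people with a given surname in each group.
# Only the two Smith values are taken from the quoted post; a real table
# would hold many surnames and all groups from the census surname file.
SURNAME_PERCENT = {
    "SMITH": {"White": 73.35, "Black": 22.22},
}


def expected_counts(surnames):
    """Return the expected number of people per group for a list of surnames.

    No individual is assigned a race. Each person contributes a fraction
    to every group, in proportion to the surname's percentages.
    """
    name_counts = Counter(name.upper() for name in surnames)
    totals = {}
    for name, how_many in name_counts.items():
        row = SURNAME_PERCENT.get(name)
        if row is None:
            continue  # surname missing from the table: skipped in this sketch
        for group, percent in row.items():
            totals[group] = totals.get(group, 0.0) + how_many * percent / 100.0
    return totals


# The example from the quote: 10,000 people named Smith.
counts = expected_counts(["Smith"] * 10_000)
print({group: round(value) for group, value in counts.items()})
```

Run on 10,000 Smiths, the sketch reproduces the 7,335 and 2,222 figures from the quote. Summing fractional contributions, rather than labeling each person with the most likely group, is what keeps the aggregate estimate sensible even though any single guess would often be wrong.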