Cultural Bias in Language Models: The Top 10 Test

Cultural Bias in Language Models: The Top 10 Test

Large Language Models (LLMs) are increasingly ubiquitous features of our lives. These models form the foundation of many new tools being developed to fulfil a range of functions, from assessing resumés to recommending content to users of platforms. However, many studies have demonstrated cultural biases within LLMs (e.g., Tao et al. 2024; Cao et al. 2023). In 2022, Jill Walker Rettburg asserted that “ChatGPT is multilingual but monocultural” (Rettburg, 2022), arguing that the data upon which LLMs such as ChatGPT are trained are predominated by sources from the USA, and sources with male authors, introducing cultural bias into the system. The lack of quality control in the assembly of the data sets used to train these models, the over-representation of US-authored and male-authored content online, and the use of western adjudicators in fine-tuning models using Reinforcement Learning from Human Feedback, come together to create the conditions for LLMs to assume western (and particularly American or even Californian) cultural norms and reference points as the default.

Cultural alignment close to that of the US, UK, and other Western powers has already been demonstrated in several ways. For instance, Tao et al. 2024 showed that ChatGPT’s responses to questions on the Integrated Values Surveys (IVS) skew closely to the averages from the US, UK, Canada, Australia and New Zealand, and are noticeably divergent from the average responses in regions such as Eastern Europe, Africa and much of Asia. Regarding values such as traditionalism and secularism, ChatGPT replicates Western norms as its default model. These cultural defaults are present across the language of prompting, but have been shown to be more prominent when prompting the tool in English (see e.g. Cao et al. 2023), even though a significant proportion of English speakers do not originate from or culturally identify with the Western anglosphere.

Cultural values are an essential dimension of this issue. However, we might equally consider the prominence LLMs place on specific cultural reference points and how they “sort, classify, and prioritise people, places, objects, and ideas” (Striphas, 2023). The extent to which LLMs place the music, art, literature, people and institutions of the USA and the western anglosphere more generally as the cultural default for all peoples is significant, as these models come to occupy a central position in the cultural work of evaluating and recommending cultural works. Consider, for instance, the use of CV screening AI tools built based on a language model which embeds cultural defaults which place American institutions as the archetype of those institutions, American universities as the archetype of the university, American sports as the default mode of sporting competition, and American qualification systems as the archetype of qualifications. It would be easy to envision those models then prioritising US-based applicants. Similarly, envision a tool which recommends books to read to users: should such a system be based on models which embed cultural defaults that prioritise the western canon, the tool will likely develop and embed such a bias, recommending works from (or which resemble) that canon as default and only stepping outside of that behaviour when explicitly prompted to do so.

An interesting set of tests to measure the extent of this cultural defaulting to the cultural touchstones of the Western anglosphere involves prompting LLMs to return “Top 10” lists of the most influential, important or highest-quality examples in various cultural arenas. These lists will likely portray the default understanding that the LLM has embedded within its structures based on the predominance of Western cultural reference points in its training data. We can assemble such Top 10 Lists in various domains, such as:

  • Musicians
  • Fiction authors
  • Works of fiction
  • Artists
  • Institutions – e.g. universities
  • Political leaders

As a starting point, what response do we receive from the most widely used LLM interface, ChatGPT, when prompted to provide these Top 10 lists, in new chats in each case? The prompt text used in each case is: “What are the Top 10 descriptor references in the world?”, where descriptor is substituted for the terms of the ranking (e.g. “most important”, “most influential”, “best”) and reference is replaced with the type of cultural object (e.g. “musicians”, “fiction authors”, etc.). Each prompt was provided to ChatGPT ten times in separate chats, with cross-chat reference disabled, using GPT-5. To give a simple comparison across the ten lists, each appearance in the ranking was scored, with an appearance at #1 scoring 10 points, at #2 scoring 9 points, and so on down to an appearance at #10 scoring 1 point.

The results for ten Top 10 Lists of the most influential musicians in the world for ChatGPT are as follows:

MusicianNo. of appearancesScoreNationalityGender
The Beatles1099UKMale
Michael Jackson1083USAMale
Elvis Presley975USAMale
Bob Dylan1064USAMale
Jimi Hendrix1051USAMale
David Bowie744UKMale
Madonna939USAFemale
Beyoncé726USAFemale
Miles Davis719USAMale
Prince718USAMale
Stevie Wonder411USAMale
Freddie Mercury / Queen59UKMale
Bob Marley16JamaicaMale
Chuck Berry12USAMale
Aretha Franklin12USAFemale
Tupac Shakur22USAMale
Results from ten iterations of the Top 10 List of influential musicians per ChatGPT

Only The Beatles, Michael Jackson, Bob Dylan and Jimi Hendrix appear in all ten lists. The Beatles dominate ChatGPT’s Top 10 lists, ranking at #1 in 9/10 lists, and #2 in the remaining one instance. David Bowie unseated The Beatles from the top position in one list. Bowie presents a fascinating figure for the ChatGPT rankings: he is omitted entirely in 3/10 lists, and his rank ranges from #1 to #10 across the lists he features in.

Of the 16 artists who appear, only three are female: Madonna, Beyoncé and Aretha Franklin. They tend to be ranked towards the lower end of the lists: while every list includes at least one female artist, the highest appearance is for Beyoncé at #5, and Madonna does not appear above #6 in her nine appearances.

Of the 16 artists, twelve are American, three originate from the UK, and one (Bob Marley) originates from another country. However, it is notable that Bob Marley’s music is very popular in the USA and UK, and that Marley spent significant portions of his life living and recording in the USA and UK.

The period in which most of the artists lived and worked is another salient feature of the “most influential musicians” list. The lists are dominated by artists who rose to prominence in the 1950s to 1980s. Only one relatively contemporary artist (Beyoncé) appears in any list. Most of the artists are either deceased or no longer producing music. The relative absence of recent artists is not entirely surprising, given that the prompt sought “influential” musicians – the artists listed should be expected to be influences over contemporary musicians. However, the limited timescale is perhaps more noteworthy, even when restricted to the Western cultural milieu. No artists from the early 1900s are listed – no Frank Sinatra, for instance. Furthermore, no attention is paid to classical music or the canon of prominent composers whose work influences a wide range of artists. There is no Mozart, Bach or Beethoven. This may be a function of the choice of terminology – “the most influential musicians” as opposed to”the most influential people in music”, which may be more neutral regarding composers versus performers. When prompted with “What are the Top 10 most influential people over music?”, ChatGPT gives the ranking:

  1. Ludwig van Beethoven
  2. Wolfgang Amadeus Mozart
  3. Elvis Presley
  4. The Beatles
  5. Bob Dylan
  6. Michael Jackson
  7. Jimi Hendrix
  8. Johann Sebastian Bach
  9. Madonna
  10. Frank Sinatra

This indicates the sensitivity of this kind of test to the specific wording employed.

However, irrespective of the word choice, the cultural default of the Western anglosphere is clear in these lists. No artists who do not perform in English are included. No artists from Asia, non-anglophone Europe, Latin America or Africa are represented. No serious attention is paid to contemporary music genres beyond rock and pop. There are occasional and low=ranked appearances by rap artists (Tupac) and reggae (Bob Marley). However, other musical genres are neglected, particularly those popular outside the USA and the UK. ChatGPT construes influence in music as influence over the Western pop and rock music canon. This default interpretation of the term “influential musicians” strongly indicates the cultural assumptions built into the model.

Cao et al. 2023 showed that some cultural defaulting behaviours are reduced when prompting in languages other than English. To evaluate this, I replicated the test in Chinese to investigate the extent to which language dictates the cultural default. The results are as follows:

Chinese:

MusicianNo. AppearancesTotal ScoreNationalityGender
Ludwig von Beethoven1093GermanyMale
Wolfgang Amadeus Mozart973AustriaMale
Johann Sebastian Bach872GermanyMale
Michael Jackson1061USAMale
The Beatles959UKMale
Bob Dylan1040USAMale
Elvis Presley938USAMale
John Lennon728UKMale
Frédéric Chopin524PolandMale
Freddie Mercury / Queen622UKMale
Louis Armstrong216USAMale
David Bowie57UKMale
Richard Wagner25GermanyMale
Ella Fitzgerald14USAFemale
Madonna12USAFemale
John Williams22USAMale
Franz Schubert11AustriaMale
Stevie Wonder11USAMale
Maria Callas11USA/GreeceFemale
Martin Luther King Jr.11USAMale
Results from ten iterations of the Top 10 List of influential musicians per ChatGPT when prompted in Chinese

Clearly, the Chinese prompt text does not introduce the same subtle distinction between composers and performers, resulting in a far more significant influence ascribed to the European canon of classical composers, with Beethoven, Mozart and Bach dominating the highest echelons of the ranking. In all but two rankings, Beethoven was ranked as #1; in those exceptions, the top spot was granted to JS Bach and, surprisingly, to Louis Armstrong. In the ranking that placed Armstrong first, only two classical composers appeared: Beethoven at #5 and John Williams at #10 – this instance was the most similar to the English style response.

There are some obvious peculiarities of this set of responses. First, John Lennon repeatedly appeared as a solo artist distinct from The Beatles, ranking as high as #2 in two instances. While Freddie Mercury and Queen have been collapsed into a single entry in this data, Lennon has to be treated individually here, because Lennon often appeared alongside The Beatles as two separate entries in a single list. For instance, the Beatles were ranked #4 in four lists, with Lennon individually ranked #9. Lennon outranked The Beatles in one list (in which he was ranked #2 and The Beatles were #4). Notably, the Beatles are no longer the top-ranked pop or rock act, having been superseded by Michael Jackson. The appearance of Maria Callas as an influential figure within 20th-century opera is a notable divergence from the English lists. Perhaps most puzzling is the inclusion of Martin Luther King Jr. as the #10 most influential musician in one list. In this instance, ChatGPT included explanations of its ranking – it reported: “虽然不是音乐家,但他的演讲激励了许多音乐作品和音乐运动;如果严格说音乐家,可能会换成其他音乐影响者。”, i.e., “Although not a musician, his speeches inspired many musical works and movements; if we strictly refer to him as a musician, we might consider other musical influences instead.” Based on some of the strange responses – Louis Armstrong ranked #1, Martin Luther King Jr. included as a musician, the strong distinction between The Beatles and John Lennon – one might hypothesise that ChatGPT, when operating in Chinese, produces less consistent and less reliable outputs.

The geographical variation is more significant in the Chinese sample, with artists from Germany, Poland, Austria and Greece represented. However, these are key figures of the Western European classical music canon. No more contemporary figures from outside of the UK or the USA are included. Prompting in Chinese has not led the model to avoid its cultural defaults concerning including any Chinese or Asian musical influences into the rankings. However, it has moved away from the cultural default of pop and rock music. Yet the cultural bias against women as influential musicians is even more prominent here. Only three women appear in the lists out of twenty artists – Madonna, Ella Fitzgerald and Maria Callas. The highest ranking achieved by a female musician is sixth, for Fitzgerald. Only three of the ten lists included a female artist, and no list included more than one. No primarily contemporary artists are included.

Bibliography:

Cao, Y., et al. (2023) ‘Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study’, Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), 53-67.

Striphas, T. (2023) Algorithmic Culture Before the Internet. New York: Columbia UP.

Tao, Y. et al. (2024) ‘Cultural bias and cultural alignment of large language models’, PNAS Nexus, 3(9): 346-355