
The Ethics Of Natural Language Models

February 18, 2021
Joshua Powers

In our previous blog post exploring Artificial Intelligence (AI) ethics, we drew on a discipline of AI called Natural Language Processing (NLP). NLP is widely used to automate tasks involving written and spoken language. But how do NLP models trained on human-written text handle the intended and unintended biases present in those texts, and what can we do about it?

AI researchers generally view natural language as a deterministic tool that humans use consciously and intentionally to communicate information with each other. The literature of NLP and computational linguistics is filled with assumptions about the neutral straightforwardness of speech and text. The truth of a declarative statement, if considered at all, is usually assumed to be easily discerned and represented, without reference to the context in which it was written or spoken. In reality, these assumptions do not give language the respect it deserves. Language can also ‘use’ humans as tools by propagating ideas and influencing behavior. If language did not have this ability, we would not have poets or advertising copywriters.

This is all rather philosophical—what does it mean for the ethics of building a language model using machine learning (ML) and putting it to work in AI technologies like chatbots, search engines, and artificial authors? Machine-learned models are built from human-generated linguistic utterances. The big ones—BERT, RoBERTa, XLNet—are trained on huge collections of documents drawn from openly available sources such as Wikipedia, Google Books, and Reddit.[1] These sources represent a large sample of language usage within the English-speaking world. However, they will not necessarily represent the kind of language you would want, say, a customer service representative to use with your clients; prominently missing from the training of these models is the intent an author had when they produced their words.

As an example, the intent of the Wikipedia article on Racism in the United States is to describe the history and scope of discrimination against groups based on their race or ethnicity in the US. However, when we open the hood on a language modeling algorithm, we find that it is not interested in the fact that the page accurately and fairly meets its intent. Instead, most algorithms see this text as a series of 10-word windows. Each time a word co-occurs with another inside this 10-word window, the model learns something about the language. When the algorithm repeats this process over billions of utterances, the model has learned what AI researchers believe is enough to perform various tasks in the language.
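To make this concrete, here is a minimal sketch in Python of what learning from 10-word windows amounts to: the algorithm simply tallies which words appear together inside each window, with no notion of the author's intent. The function name, the toy sentence, and the exact counting scheme are illustrative assumptions, not the internals of any particular model.

```python
from collections import Counter
from itertools import combinations

WINDOW = 10  # the window size described above; an assumption for illustration

def cooccurrence_counts(tokens, window=WINDOW):
    """Count unordered word pairs that appear together inside a sliding window."""
    counts = Counter()
    for start in range(max(1, len(tokens) - window + 1)):
        span = set(tokens[start:start + window])
        for pair in combinations(sorted(span), 2):
            counts[pair] += 1
    return counts

# Toy sentence standing in for billions of real utterances.
tokens = "the model only learns which words tend to appear near which other words".split()
for pair, count in cooccurrence_counts(tokens).most_common(5):
    print(pair, count)
```

All the model ever sees is the resulting table of pair counts; everything it "knows" about a word is inferred from the company that word keeps.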

Here is a look at some of the information which such a modeling algorithm would consume and learn from when reading this Wikipedia page:

Words appearing within a 10-word window of ‘black’ but not within a 10-word window of ‘white’ in the Wikipedia article on Racism in the United States:

angry, assaulted, bloody, conflict, convict, demeaning, diminished, disease, disrespect, explosion, gangs, guilt, gun, intrinsically, low-skilled, mammy, misconduct, offenses, poor, poorly, prison, robbed, terror

Words appearing within a 10-word window of ‘white’ but not within a 10-word window of ‘black’ in the same article:

able-bodied, affluent, beneficiaries, citizens, civilization, college, democracy, economy, educational, electoral, elite, faculty, guardian, illuminating, independence, normative, officer, organized, persistent, person, scientific, senators, winning

This Wikipedia article is descriptive of a tragic history and state of affairs. It accurately records historical eras, racially charged incidents, and structural inequities. However, none of this context is available to the algorithm which is learning about language one 10-word window at a time. Not one word in the second list is associated with ‘black’ by such an algorithm reading this content. Obviously, NLP algorithms are trained on much more than a single Wikipedia article, but it is hardly controversial to observe that more Internet-available content has been written about Black people in prison, gangs, and conflict than about White people in prison, gangs, and conflict.
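For readers who want to see how a contrast like the one above can be produced, here is a hedged sketch: collect every word that shares a 10-word window with ‘black’, every word that shares one with ‘white’, and take the set differences. The helper function and the one-sentence placeholder text are assumptions for illustration; the lists above were drawn from the full Wikipedia article.

```python
WINDOW = 10  # same illustrative window size as before

def words_near(tokens, target, window=WINDOW):
    """Return every word that shares at least one sliding window with `target`."""
    near = set()
    for start in range(max(1, len(tokens) - window + 1)):
        span = tokens[start:start + window]
        if target in span:
            near.update(w for w in span if w != target)
    return near

# Placeholder text standing in for the article's token stream.
article = ("gangs and prison sentences fell hardest on black residents of the city "
           "while affluent white senators debated education policy in college halls").split()

targets = {"black", "white"}
near_black = words_near(article, "black") - targets
near_white = words_near(article, "white") - targets

print("near 'black' only:", sorted(near_black - near_white))
print("near 'white' only:", sorted(near_white - near_black))
```

Even in this tiny example, the contrastive sets pick up ‘gangs’ and ‘prison’ on one side and ‘college’ and ‘education’ on the other, purely from window co-occurrence and with no awareness of what the sentence is actually about.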

In conclusion, it is likely that your favorite massively trained NLP model reflects both the intended and unintended biases present in human-written texts. The model has no way to represent the intent behind the utterances in its training data, and no mechanism for understanding why a text has the characteristics it does.

The appropriate near-term response is to be skeptical about applying a model trained in this way to a task that requires an ethical context for successful execution. The longer-term response is for AI researchers and practitioners to begin to develop models of the intent behind texts and utterances. In applying the models, practitioners need to think about the likely consequences such models of language could have on their recipients. This is not an easy task, and unfortunately, companies such as Google and Facebook do not seem to be on board with such an undertaking. Instead, it is up to the people and organizations spending their dollars on these advanced NLP technologies to insist that their construction and application follow ethical guidelines such as those we analyzed in our previous post.


[1] Dev Technology used the RoBERTa model of natural language to win the GSA’s 2020 AI/ML Challenge

Joshua Powers

Technical Director of AI & Machine Learning

Dev Technology