Large language models (LLMs) are not only very energy intensive, but can also reproduce the biases and stereotypes acquired during their training. Microsoft researchers have developed open-source tools and datasets for testing content moderation systems: (De)ToxiGen and AdaTest. These could lead to more reliable LLMs, models similar to OpenAI’s GPT-3 that can parse and generate text with human-like sophistication. Their work was presented at the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022).
However, while LLMs can be adapted to a wide range of applications, they carry risks: they are built on masses of human-written text taken from the Internet. As a result, they can produce inappropriate and harmful language that reproduces the stereotypes conveyed by the authors of those texts. Content moderation tools have been developed to flag or filter such language in specific contexts, but the datasets available to train these tools often fail to capture the complexities of potentially inappropriate and toxic language, particularly hate speech.
(De)ToxiGen: Leveraging large language models to develop more robust hate speech detection tools
To address this toxicity issue, a team of researchers from Microsoft, MIT, the Allen Institute for AI, Carnegie Mellon University, and the University of Washington developed ToxiGen, a dataset for training content moderation tools that can be used to flag malicious language, and published their study, “ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection”, on arXiv.
Toxic language detection systems often erroneously flag text that mentions minority groups as toxic, since these groups are frequently the target of online hate. “Such over-reliance on spurious correlations also causes systems to struggle to recognize implicitly toxic language,” according to the researchers, who, to alleviate these problems, created ToxiGen, a new large-scale, machine-generated dataset of 274,000 toxic and benign statements about 13 minority groups.
According to Microsoft, ToxiGen is one of the largest publicly available hate speech datasets.
Ece Kamar, Partner Research Area Manager at Microsoft Research and project leader for AdaTest and (De)ToxiGen, told TechCrunch:
“We recognize that any content moderation system will have flaws, and these models constantly need to be improved. The goal of (De)ToxiGen is to enable developers of AI systems to more effectively find risks or problems in any existing content moderation technology. Our experience shows that the tool can be used to test many existing systems, and we look forward to learning from the community about new environments that would benefit from this tool.”
To generate the samples, the researchers fed an LLM examples of neutral speech and hate speech targeting 13 minority groups, including Black, Muslim, Asian, Latino, and Native American people, people with physical and cognitive disabilities, and LGBTQ people. The statements come from existing datasets, as well as from news articles, opinion pieces, podcast transcripts, and other similar public text sources.
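This demonstration-based generation step can be illustrated with a short sketch. It is not the authors’ code: the function name, prompt wording, and seed statements below are all assumptions made for illustration. The idea is simply that curated seed statements about a target group are concatenated into a few-shot prompt, and an LLM is then asked to continue in the same (benign or toxic) style.

```python
# Illustrative sketch of demonstration-based prompting, not ToxiGen's
# actual pipeline. Seed statements are numbered and joined into a prompt;
# an LLM would be asked to produce the next numbered item in kind.

def build_fewshot_prompt(demonstrations, n=5):
    """Turn seed statements into a numbered few-shot prompt; the LLM is
    expected to generate item n+1 in the same style."""
    lines = [f"{i + 1}. {s}" for i, s in enumerate(demonstrations[:n])]
    lines.append(f"{min(len(demonstrations), n) + 1}.")  # cue the continuation
    return "\n".join(lines)

# In the real dataset, seeds come from existing datasets, news articles,
# podcast transcripts, etc.; these placeholders only show the mechanics.
benign_seeds = [
    "many cultures have rich storytelling traditions",
    "communities thrive when everyone has access to education",
    "immigration has shaped the cuisine of most major cities",
]

prompt = build_fewshot_prompt(benign_seeds)
print(prompt)
# A real pipeline would now send `prompt` to an LLM (e.g. GPT-3) and label
# the generated continuation with the target group and toxicity class.
```

The continuation cue (a bare numbered line) is one common way to coax a model into extending a list in the style of its examples; the actual prompting strategy used by the researchers may differ.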
The team demonstrated the limitations of AI in detecting toxicity: they fooled a number of AI-powered content moderation tools with statements from (De)ToxiGen, including the content filter used by OpenAI in the OpenAI API (which provides access to GPT-3).
The team explained that the statement-creation process for ToxiGen, dubbed (De)ToxiGen, was designed to reveal weaknesses in certain moderation tools by guiding an LLM to create statements likely to be misidentified by those tools. Through a study of three human-written toxicity datasets, the team found that starting with a tool and fine-tuning it on ToxiGen could “significantly” improve the tool’s performance.
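The fine-tuning idea can be sketched with a toy, stdlib-only example. This is not the researchers’ pipeline, and all data below is invented: a simple bag-of-words classifier is first fit on human-written examples, then further trained on machine-generated adversarial statements, so that implicitly toxic phrasings it previously missed get caught.

```python
# Toy sketch (not the authors' method) of improving a toxicity classifier
# by continuing training on machine-generated adversarial examples.
from collections import defaultdict

class ToyToxicityClassifier:
    def __init__(self, lr=1.0):
        self.w = defaultdict(float)  # per-token weight; positive leans toxic
        self.lr = lr

    def score(self, text):
        return sum(self.w[t] for t in text.lower().split())

    def predict(self, text):
        return self.score(text) > 0  # True = flagged as toxic

    def fit(self, pairs, epochs=10):
        # Simple perceptron updates on (text, is_toxic) pairs.
        for _ in range(epochs):
            for text, is_toxic in pairs:
                if self.predict(text) != is_toxic:
                    delta = self.lr if is_toxic else -self.lr
                    for t in text.lower().split():
                        self.w[t] += delta

human_data = [("you are awful trash", True), ("have a nice day", False)]
clf = ToyToxicityClassifier()
clf.fit(human_data)

# An implicitly toxic, machine-generated statement is initially missed;
# fine-tuning on generated adversarial examples repairs the blind spot.
adversarial = [("that group simply cannot be trusted", True),
               ("every group deserves equal trust", False)]
clf.fit(adversarial)
```

In practice the classifiers being improved are neural moderation models, not perceptrons; the sketch only shows the shape of the train-then-fine-tune loop.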
AdaTest: an adaptive testing and debugging process for NLP models inspired by the test-debugging cycle in traditional software engineering
The article “Partnering people with large language models to find and fix bugs in NLP systems” was published by Scott Lundberg and Marco Tulio Ribeiro, both principal researchers. AdaTest is a process for adaptive testing and debugging of NLP models, inspired by the test-debug cycle of traditional software development. AdaTest promotes a partnership between the user and a large language model (LM): the LM proposes tests, which the user validates and organizes, giving feedback that in turn steers the LM toward better tests.
AdaTest, short for “human-AI team approach to adaptive testing and debugging,” debugs a model by instructing it to generate a large number of tests, while a human steers the generation by running valid tests and selecting and organizing them into semantically related topics. The goal is to target the model at specific areas of interest and use the tests to fix bugs and retest the model. This last step of the debugging loop is important because, once the tests are used to repair the model, they are no longer test data but training data.
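The loop just described can be sketched schematically. This is not the actual AdaTest API: every function name below is an assumption, and the LLM and human-review steps are stubbed out so the skeleton of propose, review, test, and retrain is visible.

```python
# Schematic sketch of a test-debug loop in the spirit of AdaTest.
# The stubs stand in for a real LLM (propose_tests) and a real human
# reviewer (human_review); names and logic are illustrative assumptions.

def propose_tests(seed_tests):
    """Stand-in for the LM step: generate variations of human seed tests.
    A real system would prompt a large language model here."""
    return [s.replace("movie", "film") for s in seed_tests] + seed_tests

def human_review(tests):
    """Stand-in for the human step: keep only valid, on-topic tests."""
    return [t for t in tests if t.strip()]

def run_tests(model, tests):
    """Testing phase: collect the inputs the model gets wrong."""
    return [t for t in tests if not model(t)]

def debug_loop(model_factory, train, seeds, rounds=3):
    model = model_factory(train)
    for _ in range(rounds):
        candidates = human_review(propose_tests(seeds))
        failures = run_tests(model, candidates)
        if not failures:
            break
        # Debugging phase: failed tests stop being test data and become
        # training data, and the model is retrained on them.
        train = train + failures
        model = model_factory(train)
    return model

# Usage with a toy "model" that only accepts inputs it was trained on:
toy_factory = lambda train: (lambda text: text in train)
model = debug_loop(toy_factory, train=["good movie"],
                   seeds=["good movie", "bad movie"])
```

The key property the sketch preserves is the one the article emphasizes: a test that triggers a fix is folded into the training set and must not be reused as held-out test data.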
Ece Kamar explains:
“AdaTest is a tool that leverages the existing capabilities of large language models to add diversity to human-created seed tests. In particular, AdaTest puts people at the center to initiate and guide the generation of test cases. We use unit tests as a language to express appropriate or desired behavior for various inputs. This allows a person to create unit tests to express desired behavior using different inputs and pronouns. Because the ability of current large-scale models to add diversity to every unit test is limited, there may be cases where automatically generated unit tests need human review or correction. This is where we benefit from the fact that AdaTest is not an automation tool, but a tool that helps people investigate and identify problems.”
The research team conducted an experiment to see whether AdaTest made it easier for both experts trained in ML and NLP and non-experts to write tests and find errors in models. The results showed that experts using AdaTest discovered, on average, five times more model errors per minute, while non-experts, who had no programming training, were ten times more successful at finding errors in a given content moderation model (the Perspective API).
ToxiGen and AdaTest, with their dependencies and source code, are available on GitHub.
Sources of the article:
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar.
AdaTest: Adaptive Testing and Debugging of NLP Models
Scott Lundberg, Marco Tulio Ribeiro, and Ece Kamar.