Anthropic's moral compass architect suggested AI overcorrection could address historical injustices

By Fox News · Fox News

One of Anthropic’s Artificial Intelligence (AI) philosophy architects argued that intentional discrimination could be a way to combat stigmas on topics of race and gender. In a 2023 paper authored alongside a number of other AI researchers, Amanda Askell, a philosopher hired by Anthropic to develop their AI’s moral compass, argued companies might benefit from a kind of overcorrection toward stereotypes. But, the paper explained, that would require human input on how to modify its answers. "Larger models can over-correct, especially as the amount of [human input] training increases. This may be desirable in certain contexts, such as those in which decisions attempt to correct for historical injustices against marginalized groups, if doing so is in accordance with local laws," Askell wrote. PALANTIR'S SHYAM SANKAR: AMERICANS ARE 'BEING LIED TO' ABOUT AI JOB DISPLACEMENT FEARS The comment referred to an experiment on how Anthropic’s models dealt with the race of students. "In the discrimination experiment, the 175B parameter model discriminates against Black versus White students by 3% in the Q condition and discriminates in favor of Black students by 7% in the Q+IF+CoT condition," the paper notes, referring to one AI trained without human corrections and a second one trained with the help of input. Askell was joined by four other authors: Deep Ganguli, Nicholas Schiefer, Thomas Kiao and Kamilė Lukošiūtė. The paper’s contents have surfaced as AI companies increasingly wrestle with the ethics their models are trained on — the presuppositions and moral determinations that inform its outputs. It also highlights the challenges engineers face in training models on human content while simultaneously trying to leave behind certain human behaviors. The question of ethics has forced Anthropic in particular into the spotlight in recent weeks. The company made headlines earlier this year for clashing with the Department of War over restrictions that prevent its technology from b…