All Stories

  1. Impersonating the Crowd: Evaluating LLMs' Ability to Replicate Human Judgment in Misinformation Assessment
  2. PILs of Knowledge: A Synthetic Benchmark for Evaluating Question Answering Systems in Healthcare
  3. Mapping and Influencing the Political Ideology of Large Language Models using Synthetic Personas
  4. The Elusiveness of Detecting Political Bias in Language Models: The Impact of Question Wording