What is it about?
Large language models such as ChatGPT have become popular tools for creative writing and ideation. While any individual model output might seem compelling and novel, reading through multiple texts produced by the same prompt can be a deflating experience. In this study, we reveal a lack of imagination in current language models – stories generated by these models show very little variation, often echoing similar plot elements across outputs. In contrast, human-written narratives tend to be one-of-a-kind, or Sui Generis. To quantify this phenomenon, we introduce the Sui Generis score, an automatic metric that measures the uniqueness of a plot element among alternative storylines generated using a large language model. Evaluating on a hundred short stories, we find that model-generated stories often contain “echoes” of plot elements that repeat across generations, while plots from the original human-written stories are rarely echoed. Interestingly, the Sui Generis score also aligns with human perceptions of surprise. This suggests that the score can be a proxy measure of surprise or interestingness, which may find uses in both model improvements and collaborative writing tools.
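As a rough illustration of the idea behind such a score, the toy sketch below rates a plot element by how rarely it is echoed across alternative continuations. It is not the paper's definition. The word-overlap echo test, the 0.6 threshold, the add-one smoothing, and the names `is_echo` and `uniqueness_score` are all assumptions made for this example.

```python
import math
import re


def tokens(text):
    """Lowercased set of words in a text."""
    return set(re.findall(r"[a-z']+", text.lower()))


def is_echo(element, story, threshold=0.6):
    """Hypothetical echo test: share of the element's words that occur in the story."""
    words = tokens(element)
    if not words:
        return False
    return len(words & tokens(story)) / len(words) >= threshold


def uniqueness_score(element, alternatives, threshold=0.6):
    """Negative log of the smoothed echo rate; higher means the element is rarer."""
    echoes = sum(is_echo(element, story, threshold) for story in alternatives)
    rate = (echoes + 1) / (len(alternatives) + 2)  # add-one smoothing avoids log(0)
    return -math.log(rate)


# Made-up alternative continuations standing in for model generations.
alternatives = [
    "The detective finds the missing letter in the attic.",
    "A detective discovers the missing letter hidden in the attic.",
    "The detective finds a letter in the attic and reads it.",
]
common = "detective finds letter in attic"
rare = "gardener burns the map"

print(uniqueness_score(common, alternatives))  # low: echoed in every alternative
print(uniqueness_score(rare, alternatives))    # higher: echoed in none
```

A real system would need a semantic notion of "echo" rather than word overlap, since the same plot element can be phrased in many ways.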
Why is it important?
There is an ongoing debate about how useful large language models are for tasks that require human-level creativity. This study contributes to that discussion by introducing a novel metric that quantifies how unique the plot elements in model-generated content are. The metric can be used to build more advanced writing tools that help writers identify segments where more creativity is needed. It can also help drive progress toward large language models that generate more diverse and creative content.
Perspectives
There is a growing fear that human jobs, even those that require creativity, will be replaced by AI models. However, this work suggests that AI models alone still cannot generate the kind of creative content that humans produce. I believe that good creative content in the future will be the product of hybrid systems with both humans and AI in the loop.
Weijia Xu
Microsoft Corp
Read the Original
This page is a summary of: Echoes in AI: Quantifying lack of plot diversity in LLM outputs, Proceedings of the National Academy of Sciences, August 2025.
DOI: 10.1073/pnas.2504966122.