In brief

  • Researchers have revealed a "super prompt," built on a technique called verbalized sampling, that roughly doubles model creativity.
  • It works by asking the model to list several responses with probability estimates before choosing one.
  • The method offers an easy, training-free fix for AI’s growing sameness problem—though skeptics warn it may add noise, not insight.

A new paper proposes a deceptively simple “magic prompt” that could unlock suppressed creativity inside language models. The authors show that by asking the model to verbalize a probability distribution over several candidate responses—rather than producing just one answer—you can recover much of the diversity lost through standard alignment techniques.

The technique allegedly works not just for jokes or stories, but for any use case where you want a model to explore the space of ideas, not collapse to the same few “safe” outputs.

"You can make ChatGPT 2x as creative with one sentence," wrote Weiyan Shi, an assistant professor at Northeastern University and one of the principals behind the study.

The key is this super prompt, which you can cut and paste and use before the rest of your prompt:

"Generate 5 responses with their corresponding probabilities, sampled from the full distribution:"

Because the model offers multiple candidates with confidences, you can sample from that richer distribution instead of being forced into its top pick. In effect, the trick makes the model reveal the spread of what it considers plausible, and then you choose among those options.
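For readers who want to try it programmatically, here is a minimal sketch of how the prompt might be wired into a chat-completion API and how one candidate could then be picked in proportion to its stated probability. The model name, the JSON response format, and the helper function are illustrative assumptions, not details from the paper.

```python
# Minimal sketch (not from the paper): prepend the verbalized-sampling
# instruction, ask the model to return candidates as JSON, then sample one
# candidate weighted by the probability the model verbalized.
# Assumptions: OpenAI's Python SDK, a reply shaped like
# [{"response": ..., "probability": ...}, ...], and the "gpt-4o" model name.
import json
import random

from openai import OpenAI

SUPER_PROMPT = (
    "Generate 5 responses with their corresponding probabilities, "
    "sampled from the full distribution:"
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def verbalized_sample(task: str, model: str = "gpt-4o") -> str:
    """Request several candidates plus probabilities, then pick one at random,
    weighted by the probabilities the model reported."""
    reply = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                f"{SUPER_PROMPT} {task}\n"
                "Return only JSON: a list of objects with keys "
                "'response' and 'probability'."
            ),
        }],
    )
    candidates = json.loads(reply.choices[0].message.content)
    weights = [c["probability"] for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]["response"]


print(verbalized_sample("Write a one-line joke about coffee."))
```

Sampling in proportion to the verbalized probabilities, rather than always taking the top candidate, is what preserves the diversity the technique aims to recover.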

The paper, "Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity," and blog post were authored by researchers affiliated with the Stanford University, Northeastern, and West Virginia University. The researchers specialize in natural language processing, machine learning interpretability, and the study of how alignment methods shape model behavior.

The authors argue that the “magic prompt” works by counteracting what they call typicality bias, a byproduct of human-preference training. Annotators often favor responses that feel familiar, conventional, or fluent, even when they’re not superior—a bias that sharpens the model’s output toward a few “typical” options. By asking for a distribution instead of a single answer, the model is encouraged to spread probability mass again, restoring the diversity it learned during pretraining.

In tests across tasks like joke writing, story generation, and synthetic data creation, the technique yielded diversity gains on the order of 1.6 to 2.1 times over ordinary prompting—without sacrificing factual accuracy or safety. The authors call this “an inference-time remedy” that mitigates mode collapse without retraining the model.

Some caveats: The researchers acknowledged limitations of their "magic prompt." The technique's effectiveness depends on the model producing well-calibrated probability estimates that accurately reflect its internal confidence. If those estimates are unreliable, the resulting distribution of responses may be misleading.

Furthermore, the process of generating multiple responses and their probabilities inevitably incurs a higher computational cost. The authors also noted that for tasks where a single, correct answer is desired, such as identifying the capital of a country, increased diversity is not a desirable outcome.
