Source: Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models
Summary
This paper introduces STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking), an LLM-based system designed to automate the “pre-writing” stage of generating grounded, long-form articles (like Wikipedia pages). The system addresses the challenges of researching a topic and preparing an outline from scratch.
Key Entities
Concepts
- Pre-writing Stage: The research and planning phase before drafting an article.
- Perspective-Guided Question Asking: Using diverse perspectives (e.g., event planner vs. layperson) to generate more in-depth research questions.
- Simulated Conversation: Multi-turn dialogue between a writer and an expert to iteratively gather information.
- FreshWiki Dataset: A curated dataset of recent, high-quality Wikipedia articles used to evaluate the system and avoid data leakage.
- Outline-driven RAG: A baseline method using retrieval-augmented generation to create outlines.
Key Claims
- STORM outperforms outline-driven RAG baselines in article organization (25% increase) and coverage (10% increase).
- Experienced Wikipedia editors found STORM helpful for the pre-writing stage, particularly for new topics.
- Challenges identified include source bias transfer, over-association of unrelated facts (red herring fallacy), and the need for better neutrality.
- The system is implemented using the DSPy framework.