Burstiness

Why real writing comes in clusters — and AI-generated text doesn't.

// The Concept

Burstiness describes the tendency of certain words, phrases, or topics to appear in clusters rather than spread uniformly throughout a text. It's one of the most reliable statistical fingerprints separating human writing from machine-generated output.

Human writing is naturally bursty. When you write about a topic you understand deeply, you fixate. You go deep for three paragraphs, hammering a subtopic with specific vocabulary, examples, and elaboration. Then you surface, transition, and dive into the next cluster. This creates a distinctive statistical pattern: terms bunch together in concentrated bursts, separated by gaps where different vocabulary dominates.

AI-generated text does the opposite. Language models are trained to produce text that is statistically probable across the entire training distribution. The result is a smooth, even distribution of terms — like a metronome. The model revisits themes at regular intervals, sprinkles related keywords uniformly, and maintains a consistent level of detail throughout. This uniformity is precisely what makes it detectable.

The difference is structural, not cosmetic. You can't fix AI uniformity by swapping in synonyms or adding transitional phrases. The distribution pattern is embedded at the paragraph and section level, reflecting fundamentally different generative processes.

// How It Works

Burstiness is commonly measured as deviation from a Poisson distribution. Under a Poisson model, word occurrences are independent events: each occurrence is equally likely at any position in the document. Real text violates this assumption dramatically.

// Burstiness measurement
// Given: word W appears N times in a document of length L

// Poisson expectation (uniform distribution):
Expected gap = L / N
Variance     = L / N          // Poisson: variance equals mean

// Actual human text:
Observed variance >> Expected variance
// Words cluster → long gaps + short gaps → high variance

// Burstiness index B:
B = (sigma - mu) / (sigma + mu)
// B = -1 → perfectly periodic (anti-bursty)
// B =  0 → Poisson random
// B = +1 → maximally bursty (all clustered)

// Typical scores:
Human expert writing    B = 0.4 - 0.7   // highly bursty
AI-generated text       B = 0.0 - 0.2   // near-uniform
Academic papers         B = 0.5 - 0.8   // very bursty (deep sections)
Marketing copy (human)  B = 0.3 - 0.5   // moderately bursty
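The index above can be computed directly from a word's occurrence positions. A minimal Python sketch (the `burstiness` helper name is mine, not from any library):

```python
import statistics

def burstiness(positions):
    """Burstiness index B = (sigma - mu) / (sigma + mu) over inter-occurrence gaps.

    positions: sorted token indices where the word occurs.
    Returns a value in [-1, 1]: negative = periodic, ~0 = Poisson-like, positive = bursty.
    """
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return 0.0  # too few occurrences to estimate a distribution
    mu = statistics.mean(gaps)
    sigma = statistics.pstdev(gaps)
    if sigma + mu == 0:
        return 0.0
    return (sigma - mu) / (sigma + mu)

# Clustered occurrences: short gaps within a burst, one long gap between bursts
clustered = [10, 12, 14, 16, 200, 202, 204]
# Evenly spaced occurrences: every gap identical
uniform = [10, 40, 70, 100, 130, 160, 190]

print(burstiness(clustered))  # positive (bursty)
print(burstiness(uniform))    # -1.0 (perfectly periodic)
```

Note that perfectly even spacing scores below zero: a metronome is even less bursty than random Poisson placement, which matches the "-1 → perfectly periodic" end of the scale.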

The Katz K-mixture model formalizes this by modeling document-level word frequencies as a mixture of Poisson distributions rather than a single one. The insight: a word's probability of appearing increases after it has already appeared. Once a human writer introduces a concept, they elaborate on it — creating the burst. This "contagion" effect, where one occurrence makes the next more likely, is absent in standard language model decoding.

Church and Gale's work on Poisson mixtures showed that this bursty behavior follows a predictable pattern across languages and genres. The degree of burstiness varies by content type and topic, but its presence is a near-universal feature of human-authored text. Its absence is equally telling.
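The mixture idea can be sketched with a two-component model: most documents mention a word at a low background rate, while documents "about" the topic use it at a much higher rate. The parameter values below are illustrative, not fitted to any corpus:

```python
import math

def poisson_pmf(k, lam):
    """P(k occurrences) under a single Poisson with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def two_poisson_pmf(k, alpha=0.9, lam_low=0.02, lam_high=5.0):
    """Two-Poisson mixture: a fraction alpha of documents mention the word
    rarely (lam_low); the remainder are 'about' the topic and use it
    heavily (lam_high). The heavy tail this produces is the statistical
    signature of burstiness at the document level."""
    return alpha * poisson_pmf(k, lam_low) + (1 - alpha) * poisson_pmf(k, lam_high)

# Versus a single Poisson with the same overall mean rate, the mixture puts
# more mass on k = 0 and on large k: occurrences concentrate in few documents.
mean_rate = 0.9 * 0.02 + 0.1 * 5.0
print(two_poisson_pmf(0), poisson_pmf(0, mean_rate))
```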

// Why It Matters for Search

AI content detectors use burstiness as a primary detection signal, often combined with perplexity to form a two-dimensional classification space. Text that is both low-perplexity (predictable) and low-burstiness (uniform) falls squarely into the AI-generated quadrant. Text that is moderate-perplexity and high-burstiness reads as authentically human.
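The two-dimensional space can be sketched as a toy quadrant classifier. The cutoff values here are illustrative placeholders, not thresholds from any real detector:

```python
def classify(perplexity, burstiness_index, ppl_cut=30.0, burst_cut=0.25):
    """Toy quadrant classifier over the two detection axes.

    Low perplexity = predictable text; low burstiness = uniform distribution.
    Both cutoffs are made-up round numbers for illustration only.
    """
    low_ppl = perplexity < ppl_cut
    low_burst = burstiness_index < burst_cut
    if low_ppl and low_burst:
        return "likely AI-generated"
    if not low_ppl and not low_burst:
        return "likely human"
    return "inconclusive"  # mixed signals: one axis each way

print(classify(perplexity=12.0, burstiness_index=0.10))  # likely AI-generated
print(classify(perplexity=55.0, burstiness_index=0.55))  # likely human
```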

But burstiness matters beyond detection. Google's helpful content system evaluates whether content reads like it was written by someone with genuine expertise. Expertise produces bursty text because experts go deep on subtopics before surfacing. A cardiologist writing about heart disease will spend three paragraphs on left ventricular ejection fraction, using highly technical vocabulary in a concentrated burst, then shift to patient communication in entirely different language. A generalist — or a language model — distributes medical terminology evenly across the whole piece.

This is why the "write for humans first" advice actually has a statistical basis. When you write from genuine knowledge, you naturally produce bursty text. You can't help it. The things you know deeply get deep treatment. The things you know superficially get surface treatment. That asymmetry is the burstiness signal, and it's exactly what quality evaluation systems are learning to reward.

For AI-driven search systems like Perplexity.ai and Google's AI Overviews, burstiness signals source quality. When these systems choose which sources to cite, they're evaluating coherence, depth, and authority — all of which correlate with bursty writing patterns. Content that goes deep on its core topic, rather than spreading thin across many tangentially related keywords, earns higher authority signals.

// In Practice

Write in focused bursts. This is the single most actionable takeaway from understanding burstiness. When you discuss entity schema, go DEEP for three to four paragraphs before moving on. Don't sprinkle entity-related mentions evenly through your content like seasoning — concentrate them like a focused argument.

Structure your content around topic clusters, not keyword lists. Each section should dive into one specific aspect with the vocabulary appropriate to that aspect. Let the language change as the topic changes. A section on technical implementation should sound different from a section on strategic implications — different sentence lengths, different terminology density, different levels of abstraction.

If you're using AI to assist with content, prompt it to write in focused sections rather than holistic summaries. "Write 400 words specifically about the mathematical basis of burstiness measurement" produces more bursty output than "Write a comprehensive overview of burstiness." The prompt architecture mirrors the expert writing process: depth before breadth, focused clusters before smooth transitions.

Audit your existing content for distribution patterns. Read through a piece and mark where your core topic terms appear. If they're evenly scattered, you've got a burstiness problem. Restructure so that each section creates a clear cluster of related terminology, with natural gaps in between.
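Marking term positions can be automated with a crude distribution map. A sketch (the `occurrence_map` helper is hypothetical, and the substring match is deliberately loose to catch plurals):

```python
def occurrence_map(text, term, bins=40):
    """Audit aid: divide the document into equal slices and mark each slice
    '#' if the term occurs there, '.' if not. A bursty document shows runs
    of '#' separated by long '.' gaps; an even scatter of '#' suggests a
    uniformity problem."""
    words = text.lower().split()
    row = ["."] * bins
    for i, w in enumerate(words):
        if term.lower() in w:
            row[min(i * bins // max(len(words), 1), bins - 1)] = "#"
    return "".join(row)

doc = "schema markup schema types schema fields " + "filler " * 60 + "schema again schema"
print(occurrence_map(doc, "schema"))  # '#' bunched at both ends, '.' between
```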

Is burstiness the same as keyword density?

No, and this is a critical distinction. Keyword density measures frequency — how many times a word appears relative to total word count. Burstiness measures distribution pattern — whether those appearances cluster together or spread evenly. You could have identical keyword density in two documents with completely different burstiness scores. One clusters the keyword into focused paragraphs; the other distributes it uniformly. Same density, opposite burstiness signals.
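The distinction is easy to demonstrate with two hypothetical documents of identical density. Comparing the variance of inter-occurrence gaps:

```python
import statistics

def gap_variance(positions):
    """Variance of the gaps between successive occurrences of a word."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return statistics.pvariance(gaps)

# Both hypothetical documents are 100 tokens long and contain the keyword
# 5 times: identical keyword density of 5%.
clustered = [10, 12, 14, 16, 90]  # one tight burst, then one late mention
uniform = [10, 30, 50, 70, 90]    # evenly sprinkled every 20 tokens

print(gap_variance(clustered))  # large: three short gaps plus one long gap
print(gap_variance(uniform))    # 0.0: every gap is identical
```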

Can AI write bursty content?

With deliberate prompt architecture, yes — but it doesn't happen by default. Standard prompts like "write an article about X" produce uniformly distributed output. To get bursty AI content, structure your prompts to mirror how human experts think: write each section independently with focused instructions, vary the depth and specificity across sections, and resist the urge to ask the model to "ensure all topics are covered throughout." That instruction is literally asking for anti-bursty writing.
