AI detectors primarily rely on three statistical measures to distinguish human from AI writing: perplexity, burstiness, and token probability patterns. Understanding these measures explains how detectors work, why they fail, and why no detector is perfectly reliable.
1. Perplexity: How Predictable Is the Text?
Perplexity measures how "surprising" a text is to a language model: the lower the perplexity, the more confidently the model could have predicted each word. Because AI models generate text by favoring high-probability next words, their output tends to have low perplexity (easy to predict). Human writers make less predictable word choices, which usually results in higher perplexity.
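As a rough illustration, perplexity is the exponential of the average negative log-probability that the model assigned to each token actually in the text. The sketch below assumes those per-token probabilities have already been obtained from a language model; the values are invented, not output from any real model, and `perplexity` is a hypothetical helper name. Real detectors may tokenize, window, or aggregate differently.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(-(1/N) * sum(log p_i)), where p_i is the probability
    the model assigned to the i-th token that actually appears in the text."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(not 0.0 < p <= 1.0 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical per-token probabilities (invented, not from a real model).
predictable = [0.9, 0.8, 0.85, 0.9]   # model was confident at every step
surprising = [0.2, 0.05, 0.3, 0.1]    # model was often "surprised"

print(f"predictable: {perplexity(predictable):.2f}")
print(f"surprising:  {perplexity(surprising):.2f}")
```

With these made-up values the predictable sequence scores about 1.16 and the surprising one about 7.6. A perplexity of 1 would mean the model assigned probability 1 to every token.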
Why perplexity fails: Well-structured academic writing and technical documents naturally have low perplexity because they use standard terminology. A chemistry paper will use predictable chemistry terms regardless of who wrote it.
2. Burstiness: How Varied Are Sentence Lengths?
Burstiness measures variation in sentence structure and length. Human writing naturally alternates between short, punchy sentences and longer, complex ones. AI tends to produce more uniform sentence lengths, so low burstiness is treated as a key detection signal.
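Detectors do not all define burstiness the same way. One simple proxy, assumed here purely for illustration, is the coefficient of variation of sentence lengths: the standard deviation of word counts divided by their mean. The sentence splitter is deliberately naive (it splits on ".", "!" or "?" followed by whitespace), and `sentence_lengths` and `burstiness` are hypothetical helper names.

```python
import re
import statistics


def sentence_lengths(text):
    """Word counts per sentence, using a naive split on ., ! or ? plus whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Coefficient of variation of sentence lengths (population stdev / mean).
    0.0 means every sentence has the same length; higher means more variation."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.pstdev(lengths) / statistics.mean(lengths)


uniform = "The cat sat on the mat. The dog lay on the rug. The bird sat on the branch."
varied = ("Stop. The storm had been building over the hills for hours "
          "before anyone noticed it. We ran.")

print(sentence_lengths(uniform), round(burstiness(uniform), 2))
print(sentence_lengths(varied), round(burstiness(varied), 2))
```

The uniform passage has sentence lengths of 6, 6 and 6 words and scores 0.0; the varied one (1, 14 and 2 words) scores roughly 1.04. Using a ratio rather than the raw standard deviation keeps long and short documents roughly comparable.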
Why burstiness fails: Writers trained to follow formal conventions, such as many ESL learners and writers who adhere to a style guide, often produce naturally uniform text. Newer AI models have also learned to vary sentence length more convincingly.
3. Token Probability Patterns
Detectors run text through a language model and check whether each word was the model's "top prediction" for that position. A high match rate suggests AI authorship. The problem is that common phrases and standard vocabulary frequently match the top predictions, so writing about popular topics produces high-probability sequences regardless of who wrote it.
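The top-prediction check can be sketched as a match rate: at each position, rank the model's candidate tokens by probability and see whether the word the writer actually used is among the top k. With k=1 this is the strict "top prediction" check described above. The sketch assumes the per-position distributions have already been obtained from a language model; the distributions below are invented for illustration, and `top_k_match_rate` is a hypothetical name.

```python
def top_k_match_rate(positions, k=1):
    """Fraction of positions where the token actually written is among the
    model's k most probable candidates. k=1 is the strict "top prediction" check.

    positions: list of (distribution, actual_token) pairs, where distribution
    maps candidate tokens to the probability a model assigned them.
    """
    if not positions:
        raise ValueError("need at least one position")
    hits = 0
    for distribution, actual in positions:
        # Candidates ordered from most to least probable.
        ranked = sorted(distribution, key=distribution.get, reverse=True)
        if actual in ranked[:k]:
            hits += 1
    return hits / len(positions)


# Invented distributions for the text "the sky is falling" (not real model output).
positions = [
    ({"the": 0.4, "a": 0.3, "my": 0.1}, "the"),
    ({"sky": 0.5, "sea": 0.2, "cat": 0.1}, "sky"),
    ({"is": 0.7, "was": 0.2}, "is"),
    ({"blue": 0.6, "grey": 0.2, "falling": 0.01}, "falling"),
]

print(top_k_match_rate(positions))       # strict top-1 check
print(top_k_match_rate(positions, k=3))  # looser top-3 check
```

Three of the four words were the model's top choice, giving 0.75; widening to k=3 also counts "falling" and returns 1.0. The stock phrase "the sky is" matches effortlessly, which is exactly why ordinary human writing on familiar topics can score high.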
The Fundamental Problem
- AI models are trained on human writing, so they mimic human patterns by design
- Good human writing follows conventions that AI also follows
- The better your writing, the more "AI-like" it may appear to detectors
- There is no reliable "signature" that definitively marks AI text
- As AI models improve, they become harder to detect, and the arms race continues
What This Means in Practice
- No detector is perfect: detectors make educated probability guesses, not definitive determinations
- Results need context: a high AI score is not proof of AI use
- Multiple signals matter more than any single tool result
- Human judgment combined with process evidence (drafts, notes) is more reliable than a detector score alone
Frequently Asked Questions
What is perplexity in AI detection?
Perplexity measures how predictable the text is to a language model. AI-generated text tends to favor the most probable next word, resulting in low perplexity. Human writing is less predictable, resulting in higher perplexity. However, formal academic writing also has low perplexity, which can cause false positives.
What is burstiness and why does it matter?
Burstiness measures variation in sentence length. Humans naturally mix very short and very long sentences, while AI tends to produce more uniform sentence lengths. Low burstiness is one of the key signals AI detectors use to identify machine-generated content.
Why do AI detectors produce false positives?
Because good human writing shares the same characteristics detectors look for: formal vocabulary, consistent structure, logical transitions, and predictable word choices. Non-native English speakers and academic writers are particularly vulnerable to false positives.