Introduction
vibetell is a tool designed to analyze credentials for indicators of LLM generation. It does not evaluate cryptographic randomness; instead, it identifies signatures of LLM generation that password strength tools often miss.
Each analysis returns one of three verdicts:
- LLM Likely — Multiple signals fired with strong agreement.
- LLM Possible — One signal fired or partial agreement; warrants review.
- Inconclusive — No detection signals fired.
These verdicts reflect signal intensity, not a conclusive determination of origin. A credential flagged as LLM Likely is consistent with LLM generation, but vibetell cannot prove how a credential was created.
The blind spot
The gap in existing tools
Modern strength meters measure what characters appear, not how they are ordered, and LLM-generated credentials score highly on all of them. Autoregressive generation appears to leave a structural fingerprint that CSPRNG passwords rarely carry — consistent character-class alternation. vibetell's core metric, the Same-Class Transition rate (SCT), measures exactly this. For the full methodology, see the paper.
G7$kL9#mQ2&xP4!wN8@v
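The example password above alternates character classes on every step, so its same-class transition rate is zero. A minimal sketch of what an SCT computation might look like (the function names and the four-class partition are assumptions for illustration, not vibetell's documented implementation):

```python
import string

def char_class(c):
    # Partition characters into four classes: lower, upper, digit, symbol.
    if c in string.ascii_lowercase:
        return "lower"
    if c in string.ascii_uppercase:
        return "upper"
    if c in string.digits:
        return "digit"
    return "symbol"

def sct_rate(password):
    """Fraction of adjacent character pairs that share a character class."""
    pairs = list(zip(password, password[1:]))
    same = sum(char_class(a) == char_class(b) for a, b in pairs)
    return same / len(pairs)

print(sct_rate("G7$kL9#mQ2&xP4!wN8@v"))  # 0.0 -- perfect class alternation
print(sct_rate("abcd"))                  # 1.0 -- every adjacent pair is lowercase
```

A CSPRNG password lands somewhere in between, because it has no reason to avoid same-class neighbors.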
Signal distribution
How LLM and random passwords score
The two distributions barely overlap. LLMs cycle character types so rigidly that the vast majority of their passwords have zero same-class adjacent pairs, pulling the entire distribution to the left. Toggle to multi-layered to add the other signals, which catch LLM-generated credentials that are harder to spot with structure alone.
Holdout validation
Tested on 18 models from 12 labs that it was never trained on
The holdout experiment suggests the detection approach generalizes well, though recall varies across models. The low FPR makes vibetell a valuable tool in an auditing pipeline.
Why should I care?
LLMs are embedded in the tools that write your code — Copilot, Claude Code, ChatGPT. If a coding agent generates credentials for a .env file or a service configuration on its own, the result looks strong by every conventional measure. A 2026 report by the security lab Irregular estimated that an LLM-generated password carries roughly 27 bits of realistic entropy despite appearing to have ~98 bits — a gap large enough to make brute-force feasible. An attacker who correctly guesses that a credential was LLM-generated can apply a mask brute-force attack and crack it in mere hours. The same problem applies to naive users who generate passwords with LLM tools. vibetell is the first tool to detect whether a credential seems LLM-generated.
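The cited figures imply a staggering reduction in search space. Back-of-envelope arithmetic, using only the two entropy estimates from the report above:

```python
# Apparent vs. realistic entropy of an LLM-generated password,
# per the Irregular report figures cited above.
apparent_bits = 98
realistic_bits = 27

# Every bit of entropy doubles the keyspace, so the gap is 2^(98-27).
ratio = 2 ** (apparent_bits - realistic_bits)
print(f"realistic keyspace is {ratio:.2e}x smaller")  # ~2.36e+21x smaller
```

A keyspace more than 10^21 times smaller is the difference between "unbreakable on paper" and "feasible for a motivated attacker with commodity GPUs."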
How is vibetell useful?
- A new verification axis. Existing password strength tools measure one dimension: can this credential be guessed by a rule-based attack? None ask whether it has an anomalous structure. These are orthogonal questions. vibetell adds a missing axis to password strength assessment.
- Credential auditing. Scan codebases and config files for credentials silently generated by AI agents — ones that pass conventional strength checks but are actually weak. This could be integrated with tools like trufflehog.
- Verifying CSPRNG delegation. LLMs sometimes generate credentials directly instead of delegating to a secure generator as instructed. There is now a way to confirm that a key-generation task was most likely delegated to a secure generator.
- Breach forensics. Knowing whether a leaked credential is structurally consistent with LLM generation can narrow down how it was created and what else the same system may have produced. Ongoing testing suggests model-specific quirks might exist — particular character preferences and structural templates that differ by model.
- Research and education. A live demonstration of an easily measurable problem with LLM random sampling; and a concrete illustration of why high apparent entropy and actual strength are not the same thing.
Can vibetell detect passwords from any LLM?
Yes. The structural signal (SCT) is parameter-free — it measures deviation from a mathematical baseline, not a fitted profile. The vocabulary signal (LLR) was built from Claude and GPT output but fires on models outside that set. Our most likely hypothesis is that it's not capturing model-specific quirks — it's capturing what instruction-tuned autoregressive generation does to character preferences in general. In the same sense that no one would say zxcvbn is "fitted to specific humans" because it was built on human password data, vibetell isn't "fitted to specific LLMs".
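The "mathematical baseline" for SCT can be computed directly: for uniformly random characters, the probability that two independent draws share a class is the sum of the squared class probabilities. A sketch under an assumed 94-character printable-ASCII alphabet (the class sizes here are an assumption, not vibetell's documented alphabet):

```python
# Assumed class sizes over printable ASCII (26 lower, 26 upper,
# 10 digits, 32 symbols) -- illustrative, not vibetell's exact alphabet.
sizes = {"lower": 26, "upper": 26, "digit": 10, "symbol": 32}
total = sum(sizes.values())

# P(adjacent pair shares a class) = sum of squared class probabilities.
baseline_sct = sum((n / total) ** 2 for n in sizes.values())
print(round(baseline_sct, 3))  # ~0.28
```

Under these assumptions a truly random password should have roughly 28% same-class adjacent pairs; LLM output clustering at 0% is what makes the deviation measurable without any fitted parameters.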
Why does this work on a single credential?
vibetell isn't measuring randomness — it's detecting indicators of autoregressive generation. One sample is enough when you know what pattern to look for, the same way malware signatures or EXIF metadata work. Most LLM passwords carry a specific, measurable structural fingerprint that genuinely random passwords almost never produce by chance.
Does this only apply to passwords?
The tool uses "password" throughout, but the detection applies to any gibberish-looking credential — API keys, secret tokens, .env values, signing keys, and so on. The structural bias is a property of how LLMs generate character sequences, not of how those sequences are used. If an LLM produced it, the fingerprint is there regardless of what it's called.
Why doesn't vibetell analyze passwords containing words or recognizable patterns?
vibetell is designed specifically for gibberish credentials — strings that look random to the eye. Passwords built from words, phrases, or word-plus-number combinations occupy a completely different structural space and require different detection methods. More importantly, they're already caught by existing tools: zxcvbn and similar analyzers are excellent at identifying dictionary words, keyboard walks, and predictable substitutions. The blind spot vibetell fills is the credential that defeats all of those checks — pure gibberish that scores maximum entropy everywhere yet was produced by an LLM.
What does INCONCLUSIVE mean?
No indicators of autoregressive generation were found. It does not mean the password is random or safe — the tool detects specific patterns, and their absence is honest silence, not a certificate of randomness.
Why LLM_POSSIBLE instead of LLM_LIKELY?
LLM_LIKELY requires both signals to agree. LLM_POSSIBLE means one fired without the other — usually the structure is LLM-like but the character choices don't match expected vocabulary, or vice versa. It's a real signal, not a near-miss.
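The verdict logic described above and in the introduction reduces to a small decision table. A hypothetical sketch of the mapping (function and parameter names are illustrative, not vibetell's API):

```python
def verdict(structural_fired: bool, vocab_fired: bool) -> str:
    # Both signals agree -> high-confidence verdict.
    if structural_fired and vocab_fired:
        return "LLM_LIKELY"
    # Exactly one signal fired -> worth reviewing, not a near-miss.
    if structural_fired or vocab_fired:
        return "LLM_POSSIBLE"
    # No signal fired -> honest silence, not a certificate of randomness.
    return "INCONCLUSIVE"
```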
Could a genuine random password get flagged?
Yes, but rarely. At LLM_LIKELY fewer than 1 in 100,000 genuinely random passwords are flagged — a threshold deliberately tuned for precision, so that a LIKELY verdict can be acted on directly. At LLM_POSSIBLE the net is wider by design, catching more LLM passwords at the cost of a higher false-positive rate of about 3 in 1,000 across mixed lengths. The FPR is highest at shorter lengths and falls as length increases.
Does it work on short passwords?
Detection degrades below 16 characters — fewer adjacent pairs means less structural signal. The tool stays conservative rather than false-alarming: at length 12, the FPR at LLM_LIKELY is still near zero. The minimum supported length is 12 characters; below that, the tool returns no verdict rather than guessing.
What if someone tries to evade detection?
The realistic threat is AI coding agents generating credentials silently, with no evasion intent. For deliberate evasion, the simplest path is to call secrets.token_urlsafe() — which vibetell correctly classifies as random, so the problem solves itself. In our testing, explicitly instructing models to avoid the pattern didn't reliably remove the fingerprint.
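The safe path really is a one-liner in Python's standard library. A sketch of what proper CSPRNG delegation looks like:

```python
import secrets
import string

# URL-safe token backed by the OS CSPRNG -- the right tool for API keys.
token = secrets.token_urlsafe(32)  # 32 random bytes, base64url-encoded

# A random password over an explicit alphabet, again via the CSPRNG.
alphabet = string.ascii_letters + string.digits + "!@#$%^&*"
password = "".join(secrets.choice(alphabet) for _ in range(20))
```

Credentials produced this way have no reason to alternate character classes, which is exactly why they land in the random distribution rather than the LLM one.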
Will this still work as models improve?
In our testing, the bias has been observed across different architectures, parameter scales, and labs. We don't have a clear answer for what would fix it, short of training models to delegate credential generation to a CSPRNG. Until then, we expect vibetell to correctly classify LLM-generated secrets in the vast majority of cases.
Is my password sent anywhere?
No. All analysis runs entirely in your browser. No data leaves your device. You can disconnect from the internet after loading the page and the tool will still work.