OpenDataReportPrivacy is an open-source crawler bot that continuously scans the public web (search indexes, paste sites, forums and code repositories) for accidentally exposed personal data, leaked AI chat transcripts, and credentials. When something is found, affected people and organizations get notified before bad actors do.
Shared AI chat links get indexed by Google. Developers paste API keys into public gists. Misconfigured S3 buckets dump PII into the open. Most of it stays there for months, silently, until someone malicious finds it.
"Share" links from ChatGPT, Claude, Gemini and others get crawled and indexed, exposing private prompts, source code, contracts and medical info.
API keys, tokens, .env files and database dumps end up on Pastebin, public repos and forum threads, often by accident.
Spreadsheets, CRM exports and customer lists get indexed by search engines because of a single misconfigured permission.
By the time the people involved find out, the data has been scraped, traded and used. We aim to cut that window from months to minutes.
A transparent, auditable pipeline: every step is open-source.
Distributed crawlers respectfully scan paste sites, public repos, search indexes and known AI chat-share domains. robots.txt is always honored.
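As a rough illustration of what honoring robots.txt can look like, the sketch below uses Python's standard urllib.robotparser. The user-agent string and the sample rules are assumptions made for this example, not the project's actual values.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string, used only for this example.
USER_AGENT = "OpenDataReportPrivacyBot"

# Sample robots.txt content; a real crawler would fetch this from the target host.
SAMPLE = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(SAMPLE, "https://example.com/share/abc"))
    print(is_allowed(SAMPLE, "https://example.com/private/notes"))
```

A crawler built this way would call is_allowed before every fetch and skip any URL the site owner has excluded.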
A pipeline of regex rules, entropy checks and ML classifiers flags PII, secrets and leaked transcripts locally, without storing the raw content.
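The regex and entropy stages could look roughly like the sketch below. The patterns, the 3.5-bit threshold and the function names are illustrative assumptions, and the ML classifier stage is not shown.

```python
import math
import re
from collections import Counter

# Illustrative patterns; a production rule set would be far larger.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TOKEN_RE = re.compile(r"[A-Za-z0-9_\-]{20,}")

# Assumed cutoff in bits per character; random-looking secrets score high.
ENTROPY_THRESHOLD = 3.5


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan(text: str) -> list[tuple[str, str]]:
    """Return (category, match) pairs for emails and high-entropy tokens."""
    findings = [("email", match) for match in EMAIL_RE.findall(text)]
    for candidate in TOKEN_RE.findall(text):
        # Long runs of repeated characters pass the regex but fail this check.
        if shannon_entropy(candidate) >= ENTROPY_THRESHOLD:
            findings.append(("high_entropy_token", candidate))
    return findings
```

The entropy check is what keeps long but harmless strings, such as a row of repeated characters, from being reported as secrets.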
Findings are hashed, scored and cross-checked against prior reports to eliminate false positives before any human ever sees them.
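One way to picture the hashing, scoring and cross-checking step is the sketch below. The category weights, the score formula and the in-memory set of prior fingerprints are all assumptions for illustration; the real scoring model is not described here.

```python
import hashlib

# Hypothetical per-category weights; unknown categories get a low default.
CATEGORY_WEIGHT = {"api_key": 0.9, "email": 0.4}


def fingerprint(category: str, value: str) -> str:
    """Hash a finding so it can be compared without keeping the raw value."""
    return hashlib.sha256(f"{category}\x00{value}".encode("utf-8")).hexdigest()


def triage(findings, prior: set, min_score: float = 0.5) -> list:
    """Keep findings that are new and score at least min_score.

    findings is an iterable of (category, value, detector_confidence) tuples.
    prior holds fingerprints of earlier reports and is updated in place.
    """
    fresh = []
    for category, value, confidence in findings:
        fp = fingerprint(category, value)
        score = CATEGORY_WEIGHT.get(category, 0.1) * confidence
        if fp in prior or score < min_score:
            continue  # already reported, or too weak to forward
        prior.add(fp)
        fresh.append((fp, category, round(score, 2)))
    return fresh
```

Because only fingerprints are compared, a repeated sighting of the same secret on another page can be recognized without the raw value being kept.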
Affected domains, security contacts (security.txt), and verified individuals receive an encrypted report; reports are never made public.
Drop-in Python detectors for emails, IBANs, SSNs, API keys, JWTs, leaked chat HTML structures and more.
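To give a feel for what such detectors might look like, here is a minimal sketch of two of them: an IBAN check based on the standard mod-97 checksum, and a structural JWT check. The function names are hypothetical and do not reflect the project's actual API.

```python
import base64
import json
import re

# Compact IBAN shape: country code, two check digits, then the account part.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def is_valid_iban(candidate: str) -> bool:
    """Validate an IBAN using the mod-97 checksum."""
    iban = candidate.replace(" ", "").upper()
    if not IBAN_RE.fullmatch(iban):
        return False
    rearranged = iban[4:] + iban[:4]
    # int(ch, 36) maps digits to 0-9 and letters A-Z to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_ibans(text: str) -> list[str]:
    """Return compact IBAN candidates in text that pass the checksum."""
    return [m for m in IBAN_RE.findall(text) if is_valid_iban(m)]


def looks_like_jwt(candidate: str) -> bool:
    """True if candidate has three dot-separated parts and a JSON header with 'alg'."""
    parts = candidate.split(".")
    if len(parts) != 3:
        return False
    try:
        padded = parts[0] + "=" * (-len(parts[0]) % 4)
        header = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return False
    return isinstance(header, dict) and "alg" in header
```

Checksum and structure checks like these reject most strings that merely resemble an IBAN or a token, which keeps regex-only false positives down.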
Run the crawler so that raw matches never touch disk: only cryptographic fingerprints leave the worker.
Contribute spare compute. Workers coordinate via a lightweight gossip protocol, with no central data lake.
Automated security.txt & abuse-contact lookup, with templated, signed disclosure emails.
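A minimal sketch of the contact-lookup step, assuming the file has already been fetched from /.well-known/security.txt (the location defined by RFC 9116). The function names are hypothetical, and signature verification and email templating are left out.

```python
def parse_security_txt(text: str) -> dict[str, list[str]]:
    """Collect 'Name: value' fields from security.txt text, keyed by lowercase name."""
    fields: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything without a field separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


def disclosure_contacts(text: str) -> list[str]:
    """Return the Contact values in the order the file lists them."""
    return parse_security_txt(text).get("contact", [])


SAMPLE = """\
# Example security.txt
Contact: mailto:security@example.com
Contact: https://example.com/report
Expires: 2030-01-01T00:00:00.000Z
"""

if __name__ == "__main__":
    print(disclosure_contacts(SAMPLE))
```

When no Contact field is present the list is empty, at which point a pipeline like this would fall back to the abuse-contact lookup.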
Aggregated, anonymized statistics so the world can see how leaky the web really is.
Designed with data-minimization, lawful basis and right-to-erasure baked into the architecture.
In 2024, thousands of "Share" links from popular AI assistants were silently picked up by search engines. They contained private prompts, source code, contracts and medical information.
OpenDataReportPrivacy detects these pages within minutes of their being indexed, extracts the affected entities (domain, employer, individual) and notifies them before a malicious scraper finds the page.
OpenDataReportPrivacy is 100% open-source and governed by its contributors. Whether you write code, run a worker, translate docs, or report bugs, there's a place for you.
OpenDataReportPrivacy only crawls publicly accessible URLs that any search engine could reach, respects robots.txt and rate limits, and never attempts to bypass authentication. Findings are disclosed responsibly to the affected parties and are never sold or published.
Those services react to confirmed dumps after the fact. OpenDataReportPrivacy is proactive: it watches the live public web for newly exposed material and notifies the affected parties within minutes.
No. By default, the worker only stores a cryptographic fingerprint, the source URL, and a category label. The original content stays on the original page.
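A stored record of that shape might be sketched as follows. The keyed HMAC-SHA256 fingerprint, the per-deployment key and the class and function names are assumptions; the text above only says "cryptographic fingerprint", and a keyed hash is used here because a plain hash of a short value such as an email address can be guessed by brute force.

```python
import hashlib
import hmac
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FindingRecord:
    """The three fields kept per finding: fingerprint, source URL, category."""
    fingerprint: str
    source_url: str
    category: str


def make_record(raw_match: str, source_url: str, category: str, key: bytes) -> FindingRecord:
    """Build a record from a raw match without retaining the match itself."""
    digest = hmac.new(key, raw_match.encode("utf-8"), hashlib.sha256).hexdigest()
    return FindingRecord(fingerprint=digest, source_url=source_url, category=category)


if __name__ == "__main__":
    # Hypothetical key and values, for demonstration only.
    record = make_record("sk-live-example", "https://example.com/paste/1", "api_key", b"demo-key")
    print(asdict(record))
```

The raw match exists only as a function argument, so nothing in the record can be reversed into the original secret without the key.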
Yes. Domain owners can verify ownership (DNS TXT) and receive free, prioritized alerts for any finding that mentions them. The service is always free for individuals.
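Ownership verification over DNS TXT usually works as a challenge: the service issues a random token, the owner publishes it in a TXT record, and the service looks for it. The sketch below assumes that flow; the record prefix and function names are hypothetical, and fetching the TXT records is left to the caller.

```python
import secrets

# Hypothetical prefix that marks the verification record.
PREFIX = "odrp-verify="


def new_challenge() -> str:
    """Create the TXT value a domain owner is asked to publish."""
    return PREFIX + secrets.token_hex(16)


def is_verified(expected: str, txt_records: list[str]) -> bool:
    """True if any TXT record equals the expected challenge value.

    Resolvers often return TXT values wrapped in double quotes, so
    surrounding quotes and whitespace are stripped before comparing.
    """
    return any(record.strip().strip('"') == expected for record in txt_records)


if __name__ == "__main__":
    challenge = new_challenge()
    records = ["v=spf1 -all", f'"{challenge}"']  # stand-in for a real DNS answer
    print(is_verified(challenge, records))
```

Since only someone who controls the zone can publish the record, a match is reasonable evidence of ownership.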
Every report includes step-by-step remediation guidance: revoking exposed credentials, requesting search-engine removal, and contacting the platform that originally hosted the data.