Open-source · MIT licensed · Community-driven

Your private data is leaking.
We find it first.

OpenDataReportPrivacy is an open-source crawler bot that continuously scans the public web (search indexes, paste sites, forums and code repositories) for accidentally exposed personal data, leaked AI chat transcripts and credentials. When something is found, the affected people and organizations are notified before bad actors can exploit it.

See how it works → View source code

The problem

Sensitive data ends up on the public web every single day.

Shared AI chat links get indexed by Google. Developers paste API keys into public gists. Misconfigured S3 buckets dump PII into the open. Most of it stays there for months — silently — until someone malicious finds it.

🤖

Leaked AI conversations

"Share" links from ChatGPT, Claude, Gemini and others get crawled and indexed, exposing private prompts, source code, contracts and medical info.

🔑

Exposed credentials

API keys, tokens, .env files and database dumps end up on Pastebin, public repos and forum threads — often by accident.

🪪

PII in the wild

Spreadsheets, CRM exports and customer lists get indexed by search engines because of a single misconfigured permission.

⏱️

Discovered too late

By the time the people involved find out, the data has been scraped, traded and used. We aim to cut that window from months to minutes.

How it works

Crawl. Detect. Verify. Notify.

A transparent, auditable pipeline — every step is open-source.

1

Crawl the public web

Distributed crawlers respectfully scan paste sites, public repos, search indexes and known AI chat-share domains. robots.txt is always honored.
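
As a rough illustration of the robots.txt check, here is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent string, the helper names and the one-second default delay are assumptions made for this example, not the project's actual crawler code.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "OpenDataReportPrivacyBot"  # hypothetical user-agent string


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True only if robots.txt allows our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


def polite_delay(parser: RobotFileParser, default: float = 1.0) -> float:
    """Seconds to wait between requests: the site's Crawl-delay, else a default."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default
```

A worker would call `may_fetch` before every request and sleep for `polite_delay` seconds between requests to the same host.
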
2

Detect sensitive content

A pipeline of regex rules, entropy checks and ML classifiers flags PII, secrets and leaked transcripts — locally, without storing the raw content.
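
To make the regex and entropy stages concrete, here is a minimal sketch in Python. The two patterns, the category labels and the 4.0-bit threshold are illustrative assumptions; the ML classifier stage is left out entirely.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; a real rule set would be far larger.
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
TOKEN_RE = re.compile(r"\b[A-Za-z0-9_]{24,}\b")


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan(text: str, entropy_threshold: float = 4.0) -> list[tuple[str, str]]:
    """Return (category, match) pairs for emails and high-entropy tokens."""
    hits = [("pii.email", m.group()) for m in EMAIL_RE.finditer(text)]
    for m in TOKEN_RE.finditer(text):
        # Long but repetitive strings score low and are skipped.
        if shannon_entropy(m.group()) >= entropy_threshold:
            hits.append(("secret.high_entropy", m.group()))
    return hits
```

The entropy check is what separates a random-looking API key from a long identifier or a run of padding characters that happens to match the token pattern.
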
3

Verify & deduplicate

Findings are hashed, scored and cross-checked against prior reports to drop duplicates and filter out likely false positives before any human ever sees them.
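
One way the hashing and cross-checking could look is sketched below. The `FindingIndex` class, the normalization rules and the 0.9 confidence cut-off are assumptions for illustration; the real scoring logic is not described on this page.

```python
import hashlib


def fingerprint(category: str, value: str) -> str:
    """SHA-256 over the category and a normalized value; the raw value is not kept."""
    normalized = value.strip()
    if category == "pii.email":
        normalized = normalized.lower()  # treat addresses as case-insensitive
    return hashlib.sha256(f"{category}\x00{normalized}".encode("utf-8")).hexdigest()


class FindingIndex:
    """Remembers fingerprints of earlier reports so repeats are dropped."""

    def __init__(self, min_confidence: float = 0.9) -> None:
        self.min_confidence = min_confidence
        self._seen: set[str] = set()

    def accept(self, category: str, value: str, confidence: float) -> bool:
        """Return True only for a new finding that clears the confidence bar."""
        if confidence < self.min_confidence:
            return False  # likely false positive, never queued for review
        fp = fingerprint(category, value)
        if fp in self._seen:
            return False  # already reported
        self._seen.add(fp)
        return True
```

Only the fingerprint is stored in the index, so a repeated sighting of the same value can be recognized without keeping the value itself.
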
4

Responsibly notify

Affected domain owners, their security contacts (found via security.txt) and verified individuals receive an encrypted report. Nothing is ever published.
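
The security.txt lookup can be sketched as follows. The function names are hypothetical, and the parser only assumes the plain `Name: value` line format defined in RFC 9116; fetching the file and verifying any signature are out of scope here.

```python
def parse_security_txt(text: str) -> dict[str, list[str]]:
    """Collect field values from a security.txt body, keyed by lowercase field name."""
    fields: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not "Name: value".
        if not line or line.startswith("#") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields


def disclosure_contacts(text: str) -> list[str]:
    """Contact values in file order, which RFC 9116 treats as order of preference."""
    return parse_security_txt(text).get("contact", [])
```

A notifier would try the first contact returned and fall back to the next one, or to an abuse contact, if delivery fails.
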

Features

Built for privacy, by privacy people.

🧩

Modular detectors

Drop-in Python detectors for emails, IBANs, SSNs, API keys, JWTs, leaked chat HTML structures and more.
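
A drop-in detector might look like the sketch below. The `detector` decorator, the `REGISTRY` dictionary and the `run_all` helper are hypothetical names invented for this example, not the project's actual plugin API; the IBAN check itself is the standard mod-97 validation.

```python
import re
from typing import Callable, Iterator

Detector = Callable[[str], Iterator[str]]
REGISTRY: dict[str, Detector] = {}  # hypothetical plugin registry


def detector(name: str) -> Callable[[Detector], Detector]:
    """Decorator that registers a detector function under a dotted name."""
    def register(func: Detector) -> Detector:
        REGISTRY[name] = func
        return func
    return register


IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def iban_checksum_ok(iban: str) -> bool:
    """Mod-97 check: move the first 4 chars to the end, map letters to 10..35."""
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


@detector("pii.iban")
def find_ibans(text: str) -> Iterator[str]:
    """Yield IBAN-shaped strings whose checksum validates."""
    for match in IBAN_RE.finditer(text):
        if iban_checksum_ok(match.group()):
            yield match.group()


def run_all(text: str) -> list[tuple[str, str]]:
    """Run every registered detector and return (detector name, match) pairs."""
    return [(name, hit) for name, func in REGISTRY.items() for hit in func(text)]
```

Pairing the pattern with a checksum keeps random alphanumeric strings that merely look like IBANs from being reported, and adding a new detector is a matter of decorating one more function.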

🔐

Zero-retention mode

Run the crawler so that raw matches never touch disk — only cryptographic fingerprints leave the worker.

🌍

Federated workers

Contribute spare compute. Workers coordinate via a lightweight gossip protocol — no central data lake.

📬

Responsible disclosure

Automated security.txt & abuse-contact lookup, with templated, signed disclosure emails.

📊

Public dashboard

Aggregated, anonymized statistics so the world can see how leaky the web really is.

🪪

GDPR & CCPA aligned

Designed with data-minimization, lawful basis and right-to-erasure baked into the architecture.

A real example

Leaked AI chats — indexed and forgotten.

In 2024, thousands of "Share" links from popular AI assistants were silently picked up by search engines. They contained:

Full names, addresses, and medical histories
Proprietary source code and unreleased product specs
Internal incident reports and legal correspondence
Customer support transcripts with full PII

OpenDataReportPrivacy detects these pages within minutes of their being indexed, extracts the affected entities (domain, employer, individual) and notifies them before a malicious scraper finds the data.

OpenDataReportPrivacy — scanner

$ OpenDataReportPrivacy scan --source ai-share-links
[ok] crawler online · 12 workers
[scan] chat.example-ai.com/s/9f3a... 200
[hit]  detector=pii.email           confidence=0.97
[hit]  detector=pii.phone           confidence=0.92
[hit]  detector=secret.openai_key   confidence=0.99
[link] entity → acme-corp.com (security.txt found)
[mail] disclosure sent · ref #LS-2025-08F31
$ _

Community-owned

Help build a less-leaky internet.

OpenDataReportPrivacy is 100% open-source and governed by its contributors. Whether you write code, run a worker, translate docs, or report bugs — there's a place for you.

Contribute on GitHub

312

Contributors worldwide

Languages supported

100%

Open source & auditable

No data sold. Ever.

FAQ

Frequently asked questions

Is this legal?

OpenDataReportPrivacy only crawls publicly accessible URLs that any search engine could reach, respects robots.txt and rate limits, and never attempts to bypass authentication. Findings are disclosed responsibly to the affected parties — never sold or published.

How is this different from "have I been pwned"-style services?

Those services react to confirmed dumps after the fact. OpenDataReportPrivacy is proactive: it watches the live public web for newly exposed material and aims to notify the affected parties within minutes.

Do you store the leaked content?

No. By default, the worker only stores a cryptographic fingerprint, the source URL, and a category label. The original content stays on the original page.

Can my company subscribe to alerts for our domain?

Yes — domain owners can verify ownership (DNS TXT) and receive free, prioritized alerts for any finding that mentions them. Always free for individuals.

I found my data via OpenDataReportPrivacy — what now?

Every report includes step-by-step remediation guidance: revoking exposed credentials, requesting search-engine removal, and contacting the platform that originally hosted the data.
