Open-source · MIT licensed · Community-driven

Your private data is leaking.
We find it first.

OpenDataReportPrivacy is an open-source crawler bot that continuously scans the public web (search indexes, paste sites, forums and code repositories) for accidentally exposed personal data, leaked AI chat transcripts, and credentials. When something is found, affected people and organizations get notified before bad actors do.

The problem

Sensitive data ends up on the public web every single day.

Shared AI chat links get indexed by Google. Developers paste API keys into public gists. Misconfigured S3 buckets dump PII into the open. Most of it stays there for months, silently, until someone malicious finds it.

🤖

Leaked AI conversations

"Share" links from ChatGPT, Claude, Gemini and others get crawled and indexed, exposing private prompts, source code, contracts and medical info.

🔑

Exposed credentials

API keys, tokens, .env files and database dumps end up on Pastebin, public repos and forum threads, often by accident.

🪪

PII in the wild

Spreadsheets, CRM exports and customer lists get indexed by search engines because of a single misconfigured permission.

โฑ๏ธ

Discovered too late

By the time the people involved find out, the data has been scraped, traded and used. We aim to cut that window from months to minutes.

How it works

Crawl. Detect. Verify. Notify.

A transparent, auditable pipeline: every step is open-source.

  1. Crawl the public web

    Distributed crawlers respectfully scan paste sites, public repos, search indexes and known AI chat-share domains. robots.txt is always honored.

  2. Detect sensitive content

    A pipeline of regex rules, entropy checks and ML classifiers flags PII, secrets and leaked transcripts locally, without storing the raw content.

  3. Verify & deduplicate

    Findings are hashed, scored and cross-checked against prior reports to eliminate false positives before any human ever sees them.

  4. Responsibly notify

    Affected domains, security contacts (security.txt) and verified individuals receive an encrypted report. The public never does.
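To make the detection step more concrete, here is a minimal sketch of how regex rules and an entropy check might work together. The patterns, the 4.0 bits-per-character threshold and the names `detect` and `shannon_entropy` are illustrative assumptions, not the project's actual rule set, and the ML classifiers are left out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for illustration; a real rule set would be far larger.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_=-]{20,}")  # long opaque strings


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def detect(page_text: str, entropy_threshold: float = 4.0) -> list[tuple[str, str]]:
    """Return (category, match) pairs; matches are held in memory only."""
    findings = [("email", m.group()) for m in EMAIL_RE.finditer(page_text)]
    for m in TOKEN_RE.finditer(page_text):
        # Long strings that look random are more likely to be secrets
        # than long strings made of a few repeated characters.
        if shannon_entropy(m.group()) >= entropy_threshold:
            findings.append(("high_entropy_token", m.group()))
    return findings
```

The entropy filter is what keeps a long run of repeated characters from being reported as a secret, while a random-looking 32-character token passes the threshold.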

Features

Built for privacy, by privacy people.

🧩

Modular detectors

Drop-in Python detectors for emails, IBANs, SSNs, API keys, JWTs, leaked chat HTML structures and more.
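As an illustration of what "drop-in" could mean in practice, the sketch below uses a small registry that detectors join with a decorator. The registry, the `register` decorator and the `(category, match)` return shape are assumptions made for this example. The project's real plugin interface may look different.

```python
import re
from typing import Callable, Iterator

# A detector takes page text and yields (category, matched_text) pairs.
Detector = Callable[[str], Iterator[tuple[str, str]]]
REGISTRY: dict[str, Detector] = {}  # hypothetical plugin registry


def register(name: str) -> Callable[[Detector], Detector]:
    """Decorator that adds a detector function to the registry."""
    def wrap(func: Detector) -> Detector:
        REGISTRY[name] = func
        return func
    return wrap


def iban_checksum_ok(candidate: str) -> bool:
    """ISO 13616 mod-97 check: move 4 chars to the end, map A=10..Z=35."""
    rearranged = candidate[4:] + candidate[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


@register("jwt")
def detect_jwt(text: str) -> Iterator[tuple[str, str]]:
    # Three base64url segments; header and payload usually start with "eyJ".
    pattern = r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"
    for m in re.finditer(pattern, text):
        yield ("jwt", m.group())


@register("iban")
def detect_iban(text: str) -> Iterator[tuple[str, str]]:
    for m in re.finditer(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b", text):
        if iban_checksum_ok(m.group()):  # drop look-alikes that fail mod-97
            yield ("iban", m.group())


def run_all(text: str) -> list[tuple[str, str]]:
    """Run every registered detector over the text."""
    return [hit for detector in REGISTRY.values() for hit in detector(text)]
```

Under this design, adding a detector means writing one decorated function. The checksum step in the IBAN detector shows how a structural check can cut false positives before any scoring happens.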

Zero-retention mode

Run the crawler so that raw matches never touch disk: only cryptographic fingerprints leave the worker.
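One way the fingerprinting behind zero-retention mode might work is a keyed hash computed in memory, sketched below. The choice of HMAC-SHA256, the per-deployment key and the `fingerprint` function are assumptions for illustration. The source does not say which scheme the project uses.

```python
import hashlib
import hmac


def fingerprint(raw_match: str, key: bytes) -> str:
    """Return a hex HMAC-SHA256 fingerprint of a match.

    The raw value is only ever held in memory. A keyed hash is used here
    because low-entropy values (emails, SSNs) could be guessed from a
    plain unkeyed digest by brute force.
    """
    normalized = raw_match.strip().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()
```

The same input and key always give the same 64-character fingerprint, so duplicates can still be recognized without keeping the original text.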

Federated workers

Contribute spare compute. Workers coordinate via a lightweight gossip protocol, with no central data lake.

📬

Responsible disclosure

Automated security.txt & abuse-contact lookup, with templated, signed disclosure emails.
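The security.txt lookup could be sketched roughly as follows. The well-known path and the `Contact` field come from RFC 9116. The function names are hypothetical, and fetching the file over HTTPS is deliberately left out so that the example stays self-contained.

```python
def security_txt_url(domain: str) -> str:
    """Build the RFC 9116 well-known location for a domain."""
    return f"https://{domain}/.well-known/security.txt"


def parse_contacts(body: str) -> list[str]:
    """Extract Contact values from the text of a security.txt file."""
    contacts = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and comments
            continue
        # Split on the first colon only, since values such as
        # "mailto:..." and "https://..." contain colons themselves.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "contact":  # field names are case-insensitive
            contacts.append(value.strip())
    return contacts
```

Contacts are returned in file order, which matters because RFC 9116 treats the order of `Contact` fields as the order of preference.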

📊

Public dashboard

Aggregated, anonymized statistics so the world can see how leaky the web really is.

🪪

GDPR & CCPA aligned

Designed with data-minimization, lawful basis and right-to-erasure baked into the architecture.

A real example

Leaked AI chats: indexed and forgotten.

In 2024 thousands of "Share" links from popular AI assistants were silently picked up by search engines. They contained:

  • Full names, addresses, and medical histories
  • Proprietary source code and unreleased product specs
  • Internal incident reports and legal correspondence
  • Customer support transcripts with full PII

OpenDataReportPrivacy detects these pages within minutes of their being indexed, extracts the affected entities (domain, employer, individual) and notifies them before a malicious scraper does.

Community-owned

Help build a less-leaky internet.

OpenDataReportPrivacy is 100% open-source and governed by its contributors. Whether you write code, run a worker, translate docs, or report bugs, there's a place for you.

  • 312 contributors worldwide
  • 47 languages supported
  • 100% open source & auditable
  • 0 data sold. Ever.

FAQ

Frequently asked questions

Is this legal?

OpenDataReportPrivacy only crawls publicly accessible URLs that any search engine could reach, respects robots.txt and rate limits, and never attempts to bypass authentication. Findings are disclosed responsibly to the affected parties and are never sold or published.
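For readers curious how a crawler can respect robots.txt and rate limits, here is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent string and helper names are assumptions, and the robots.txt body is passed in as text rather than fetched.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "OpenDataReportPrivacyBot"  # hypothetical user-agent string


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True only if the rules allow our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


def delay_seconds(parser: RobotFileParser, default: float = 1.0) -> float:
    """Use the site's Crawl-delay if present, else a conservative default."""
    delay = parser.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default
```

A worker would call `may_fetch` before every request and sleep for `delay_seconds` between requests to the same host.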

How is this different from "Have I Been Pwned"-style services?

Those services react to confirmed dumps after the fact. OpenDataReportPrivacy is proactive: it watches the live public web for newly exposed material and notifies the affected parties within minutes.

Do you store the leaked content?

No. By default, the worker only stores a cryptographic fingerprint, the source URL, and a category label. The original content stays on the original page.
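A stored record along these lines might look like the sketch below. The three fields mirror the answer above. The `Finding` and `SeenIndex` names are hypothetical, and a plain SHA-256 digest stands in for whatever fingerprinting scheme the project actually uses.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Everything kept about a match: no raw content, only metadata."""
    fingerprint: str
    source_url: str
    category: str


def make_finding(raw_match: str, source_url: str, category: str) -> Finding:
    # The raw match is hashed immediately and then discarded.
    digest = hashlib.sha256(raw_match.encode("utf-8")).hexdigest()
    return Finding(digest, source_url, category)


class SeenIndex:
    """In-memory stand-in for the cross-check against prior reports."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str]] = set()

    def is_new(self, finding: Finding) -> bool:
        key = (finding.fingerprint, finding.source_url)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Because the record is keyed on fingerprint and URL, the same leak found twice on the same page produces one report rather than two.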

Can my company subscribe to alerts for our domain?

Yes. Domain owners can verify ownership with a DNS TXT record and receive free, prioritized alerts for any finding that mentions them. Alerts are always free for individuals.
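DNS TXT verification usually works as a challenge: the service issues a random token, the owner publishes it as a TXT record, and the service checks for it. The sketch below assumes a hypothetical `odrp-verification=` record prefix and takes the TXT records as an already-resolved list, so no DNS library is involved.

```python
import hmac
import secrets

PREFIX = "odrp-verification="  # hypothetical TXT record prefix


def new_challenge() -> str:
    """Create the value a domain owner is asked to publish as a TXT record."""
    return PREFIX + secrets.token_hex(16)


def is_verified(expected: str, txt_records: list[str]) -> bool:
    """Check whether any resolved TXT record matches the issued challenge."""
    want = expected.encode("utf-8")
    for record in txt_records:
        # Resolvers often return TXT data wrapped in double quotes.
        got = record.strip().strip('"').encode("utf-8")
        if hmac.compare_digest(want, got):  # constant-time comparison
            return True
    return False
```

Unrelated records such as SPF entries are simply ignored, so the owner can add the challenge alongside existing TXT data.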

I found my data via OpenDataReportPrivacy. What now?

Every report includes step-by-step remediation guidance: revoking exposed credentials, requesting search-engine removal, and contacting the platform that originally hosted the data.