# FractionsOfACent

Scans public GitHub for leaked API keys, DB connection strings, private keys, and JWTs. Files courtesy issues on the source repo and tracks remediation over time. C# CLI with rate-limit-aware daemon mode plus a Blazor review/visualization UI. Hash-and-discard: stores SHA-256 fingerprints, never the secret itself.

A public-service credential-leak detector for public GitHub. Finds exposed API keys, DB URIs, and private keys; opens courtesy issues so owners can rotate; tracks whether leaks get remediated. Hash-and-discard — the credential itself is never stored. Detection, disclosure, and measurement — not exploitation.

A public-service credential-disclosure pipeline for public GitHub repositories. It detects leaked credentials, opens a courtesy issue on the leaker's repo asking them to rotate, and tracks whether the leak gets remediated. It originated as a Master's thesis dataset (LLM API key prevalence) and now covers a broader credential surface; see Exposure types below.

The system records metadata only — the credentials themselves are never persisted, logged, or returned from any function. The defensible hash-and-discard property is preserved across the new patterns and the new auto-notify path.
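As a rough illustration of the hash-and-discard idea, the sketch below reduces a matched string to a SHA-256 fingerprint and lets only the digest leave the function. The helper name `Fingerprint` and the lowercase-hex encoding are assumptions made for this example; the project's actual code may be organised differently.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper (not taken from the project): hash the matched
// credential and return only the hex digest. The raw value is never
// stored, logged, or returned.
static string Fingerprint(string secret)
{
    byte[] digest = SHA256.HashData(Encoding.UTF8.GetBytes(secret));
    return Convert.ToHexString(digest).ToLowerInvariant();
}

// Placeholder input, not a real credential.
string fp = Fingerprint("sk-example-not-a-real-key");
Console.WriteLine(fp.Length); // 64 hex characters
```

Because the digest is deterministic, the same leaked value found in two files yields the same fingerprint, which is what allows deduplication and remediation tracking without retaining the secret.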

## Pipeline at a glance

```text
                ┌──────────────────────────────────────────────────┐
                │                                                  │
   GitHub  ─►   1. Scan  ─►  2. Notify (gated)  ─►  3. Recheck  ───┘
   Code Search                                      (every run)
   + Contents   └─ writes findings                  └─ writes
                                                       remediation_checks

   leaker repo  ◄─── auto-issue (only when auto_inform=true)
```

Each invocation runs all three phases in order and persists everything to the SQLite DB at --out. With --loop, the binary becomes a daemon that paces itself against GitHub rate limits and re-runs forever.
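The control flow described above can be sketched as follows. This is only an outline under stated assumptions: the phase functions are stand-ins that record that they ran, the names `RunOnceAsync` and `RunLoopAsync` are hypothetical, and the `maxPasses` parameter exists solely so the example terminates.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var trace = new List<string>();

// Stand-ins for the real phases; each one only records that it ran.
Task ScanAsync(CancellationToken ct)    { trace.Add("scan");    return Task.CompletedTask; }
Task NotifyAsync(CancellationToken ct)  { trace.Add("notify");  return Task.CompletedTask; }
Task RecheckAsync(CancellationToken ct) { trace.Add("recheck"); return Task.CompletedTask; }

// One invocation: all three phases, in order.
async Task RunOnceAsync(CancellationToken ct)
{
    await ScanAsync(ct);
    await NotifyAsync(ct);
    await RecheckAsync(ct);
}

// Loop mode: repeat until cancelled, sleeping between passes.
// maxPasses is for this sketch only; a daemon would pass null.
async Task RunLoopAsync(TimeSpan interval, int? maxPasses, CancellationToken ct)
{
    for (int pass = 0; maxPasses is null || pass < maxPasses; pass++)
    {
        await RunOnceAsync(ct);
        await Task.Delay(interval, ct);
    }
}

await RunLoopAsync(TimeSpan.FromMilliseconds(10), 2, CancellationToken.None);
Console.WriteLine(string.Join(",", trace)); // scan,notify,recheck,scan,notify,recheck
```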

## Why this exists

Leaked LLM keys, cloud credentials, payment-provider tokens, and DB connection strings are an active abuse vector on public GitHub. Providers run secret-scanning partner programs, and GitHub's Push Protection blocks many such secrets at push time. What's missing is the public-service tier: a third party that observes the leaks, files a courteous notice on the leaker's own repo, and then watches whether the leak gets remediated. That's what this project does.

The research artifact and the public-service operator are the same binary. Aggregate measurements (leak rate per provider, time-to-remediate, notice-to-remediation conversion) fall out for free as the pipeline runs.

## Exposure types

Findings are categorized into four broad types via the exposure_types SQLite lookup table, joined to findings.exposure_type:

| Type | What it covers | Auto-inform default |
|---|---|---|
| ApiKey | Provider tokens: Anthropic, OpenAI (incl. legacy), Google Gemini, AWS access keys, GitHub PATs (classic + fine + OAuth + app variants), Stripe live secret/restricted, Slack bot/user/webhook, Discord webhooks, Twilio, SendGrid, Mailgun, npm, PyPI, DigitalOcean PAT/OAuth, Shopify (private + access), Square access/secret, JWT | false |
| ConnectionString | Postgres / MySQL / MongoDB / Redis URIs containing inline user:pass@host | false |
| PrivateKey | PEM-encoded private key blocks: RSA, OpenSSH, EC, PGP | false |
| PlainTextPassword | Contextual password = "..." literals + opt-in shape patterns. Opt-in only via --include-passwords because of the false-positive rate | false |
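To make the ConnectionString category concrete, here is a minimal sketch of matching a database URI with inline `user:pass@host`. The regular expression and the `TryDetect` helper are illustrative assumptions; the project's real patterns live in Patterns.cs and may differ. Note that the sketch reads only the scheme and host groups, never the password group.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative pattern only: a known DB scheme followed by user:pass@host.
var connectionString = new Regex(
    @"\b(?<scheme>postgres(?:ql)?|mysql|mongodb(?:\+srv)?|rediss?)://(?<user>[^\s:/@]+):(?<pass>[^\s@/]+)@(?<host>[^\s/:?]+)",
    RegexOptions.IgnoreCase | RegexOptions.Compiled);

// Hypothetical helper: reports non-secret metadata (scheme and host) only.
bool TryDetect(string line, out string scheme, out string host)
{
    Match m = connectionString.Match(line);
    scheme = m.Success ? m.Groups["scheme"].Value.ToLowerInvariant() : "";
    host = m.Success ? m.Groups["host"].Value : "";
    return m.Success;
}

// Placeholder URI, not a real credential.
if (TryDetect("DATABASE_URL=postgres://admin:hunter2@db.example.com:5432/app",
        out string foundScheme, out string foundHost))
{
    Console.WriteLine($"{foundScheme} credential URI at {foundHost}");
    // postgres credential URI at db.example.com
}
```

A URI without inline credentials, such as `postgres://db.example.com:5432/app`, does not match, because the pattern requires a `user:pass@` segment before the host.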

Every type defaults to auto_inform = false, so the CLI's auto-notify pass does nothing until you flip a category on in the Web UI. This is deliberate: a false positive that auto-files a public issue against an innocent repo causes real reputational harm. You review, then approve.

From the Web UI you can flip a whole category to auto-inform, send notices manually per finding, or both.
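The per-category gate might look roughly like the following. The dictionary is an assumed in-memory stand-in for the exposure_types table, and `EligibleForAutoNotice` is a hypothetical name; the point is only that a finding is considered for an automatic notice when its category has been switched on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed in-memory mirror of exposure_types: type name -> auto_inform.
// Every category starts at false, matching the documented default.
var autoInform = new Dictionary<string, bool>
{
    ["ApiKey"] = false,
    ["ConnectionString"] = false,
    ["PrivateKey"] = false,
    ["PlainTextPassword"] = false,
};

// Hypothetical gate: keep only findings whose category is switched on.
// Unknown categories are treated as off.
List<(string Repo, string ExposureType)> EligibleForAutoNotice(
    IEnumerable<(string Repo, string ExposureType)> findings) =>
    findings
        .Where(f => autoInform.TryGetValue(f.ExposureType, out bool on) && on)
        .ToList();

var sample = new[] { ("octo/demo", "ApiKey"), ("octo/other", "PrivateKey") };
Console.WriteLine(EligibleForAutoNotice(sample).Count); // 0

autoInform["PrivateKey"] = true; // what flipping a category on would amount to
Console.WriteLine(EligibleForAutoNotice(sample).Count); // 1
```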

## Research ethics & non-retention

### Methodology precedent

Hash-and-discard methodology follows:

Meli, M., McNiece, M. R., & Reaves, B. (2019). How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories. NDSS. https://www.ndss-symposium.org/ndss-paper/how-bad-can-it-git-characterizing-secret-leakage-in-public-github-repositories/

## Repository layout

```text
FractionsOfACent/
├── v2/
│   ├── Shared/                 # FractionsOfACent.Shared (library)
│   │   ├── Db.cs               # SQLite schema, migrations, queries
│   │   ├── Finding.cs
│   │   ├── Notice.cs           # Notice + RemediationCheck records
│   │   ├── NoticeService.cs    # Issue-opening + notice persistence
│   │   ├── GitHubClient.cs     # Search, fetch, refetch, open-issue
│   │   ├── Patterns.cs         # ProviderPattern[] + ExposureTypes
│   │   └── Settings.cs
│   ├── Cli/                    # FractionsOfACent.Cli (exe)
│   │   ├── Program.cs          # arg parsing, --loop daemon mode
│   │   ├── Scraper.cs          # 3-phase pipeline (scan/notify/recheck)
│   │   └── Report.cs
│   └── Blazor/                 # FractionsOfACent.Blazor (Blazor Server)
│       ├── Program.cs          # DI + render pipeline
│       ├── VizData.cs          # Visualizations data plumbing
│       ├── Components/
│       │   ├── Pages/
│       │   │   ├── Findings.razor          # Tab 1: paginated table
│       │   │   └── Visualizations.razor    # Tab 2: charts + KPIs
│       │   ├── CumulativeChart.razor
│       │   ├── HistogramChart.razor
│       │   ├── ProviderBarChart.razor
│       │   └── DonutChart.razor
│       ├── wwwroot/app.css
│       └── appsettings.json
└── v1/          # retired; see v1/DEPRECATED.md
```

The C# Cli and the Web app both read/write the same SQLite file. WAL

## Running

### Scanner CLI

```bash
cd v2/Cli
dotnet build
export GITHUB_TOKEN=ghp_xxx           # or settings.json (see below)

# Single pass (default behaviour):
dotnet run -- --out ../../findings.db --max-per-provider 50

# Daemon mode: never quits, paces itself against rate limits
dotnet run -- --out ../../findings.db --loop 30m

# Include the opt-in PlainTextPassword patterns (high FP rate):
dotnet run -- --out ../../findings.db --include-passwords --loop 1h
```

Useful flags:

### Web UI

```bash
cd v2/Blazor
dotnet run --urls http://localhost:5000
```

Two tabs:

The Web app reads appsettings.json for FractionsOfACent:DbPath (default ../../findings.db so it points at the repo root). Override the notice template by setting NoticeChannel / NoticeTitle / NoticeBody keys; otherwise the in-code default in NoticeService.cs is used.

## GitHub PAT

Both the CLI and the Web app read the token from (in order):

  1. GITHUB_TOKEN env var

  2. %APPDATA%\MindAttic\FractionsOfACent\settings.json (Windows) or ~/.config/MindAttic/FractionsOfACent/settings.json:

    { "github_token": "github_pat_..." } 

A fine-grained PAT with public-repo read and Issues: write is sufficient for the full pipeline. (Issues:write is needed because the Notify pass opens issues; if you only ever scan, public-repo read is enough.)
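The lookup order above (environment variable first, then settings.json) could be implemented along these lines. `SettingsPath` and `ResolveToken` are hypothetical helpers written for this sketch, and the choice of `Environment.SpecialFolder` values is an assumption about how the documented paths are derived. The token itself is never printed.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Documented locations: %APPDATA%\... on Windows, ~/.config/... elsewhere.
static string SettingsPath()
{
    string root = OperatingSystem.IsWindows()
        ? Environment.GetFolderPath(Environment.SpecialFolder.ApplicationData)
        : Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.UserProfile), ".config");
    return Path.Combine(root, "MindAttic", "FractionsOfACent", "settings.json");
}

// Hypothetical resolver: env var wins, then the "github_token" key in the
// settings file. Returns null when neither source yields a token.
static string ResolveToken(string envValue, string settingsPath)
{
    if (!string.IsNullOrWhiteSpace(envValue)) return envValue.Trim();
    if (!File.Exists(settingsPath)) return null;

    using JsonDocument doc = JsonDocument.Parse(File.ReadAllText(settingsPath));
    return doc.RootElement.TryGetProperty("github_token", out JsonElement token)
           && token.ValueKind == JsonValueKind.String
        ? token.GetString()
        : null;
}

string resolved = ResolveToken(Environment.GetEnvironmentVariable("GITHUB_TOKEN"), SettingsPath());
Console.WriteLine(resolved is null ? "no token configured" : "token found");
```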

## Rate limits

GitHub authenticated rate limits:

The pipeline's HandleRateLimitAsync respects Retry-After, X-RateLimit-Remaining=0/X-RateLimit-Reset, and falls back to a 60s back-off for secondary limits without an explicit hint. The --loop mode is designed to ride this out indefinitely. Do not rotate multiple PATs to multiply the budget — that violates GitHub's Acceptable Use Policy. For higher legitimate throughput, apply for GitHub Research Access (academic study).
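The precedence described above can be expressed as a small pure function. This is a sketch, not the project's HandleRateLimitAsync: the name `BackoffDelay` and its signature are assumptions, and it handles only the integer-seconds form of Retry-After. X-RateLimit-Reset is interpreted as Unix epoch seconds, as GitHub documents it.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper mirroring the documented precedence.
static TimeSpan BackoffDelay(IReadOnlyDictionary<string, string> headers, DateTimeOffset now)
{
    // 1. Retry-After: number of seconds to wait.
    if (headers.TryGetValue("Retry-After", out string retryAfter)
        && int.TryParse(retryAfter, NumberStyles.None, CultureInfo.InvariantCulture, out int seconds))
        return TimeSpan.FromSeconds(seconds);

    // 2. Primary limit exhausted: wait until X-RateLimit-Reset.
    if (headers.TryGetValue("X-RateLimit-Remaining", out string remaining) && remaining == "0"
        && headers.TryGetValue("X-RateLimit-Reset", out string reset)
        && long.TryParse(reset, NumberStyles.None, CultureInfo.InvariantCulture, out long epoch))
    {
        TimeSpan wait = DateTimeOffset.FromUnixTimeSeconds(epoch) - now;
        return wait > TimeSpan.Zero ? wait : TimeSpan.Zero;
    }

    // 3. Secondary limit without an explicit hint: fixed 60 s back-off.
    return TimeSpan.FromSeconds(60);
}

// HTTP header names are case-insensitive, so the lookup is too.
var observedAt = DateTimeOffset.FromUnixTimeSeconds(1_700_000_000);
var response = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["x-ratelimit-remaining"] = "0",
    ["x-ratelimit-reset"] = "1700000090",
};
Console.WriteLine(BackoffDelay(response, observedAt)); // 00:01:30
```

Keeping the calculation separate from the actual `Task.Delay` call makes the precedence easy to unit-test with fixed timestamps.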

## What this tool does NOT do