TaxRateCollector
Implementation Plan — Proof of Concept

US Sales & Use Tax
Rate Collection System

Scrape, sanitize, store, diff, and calculate tax rates across US jurisdictions — from a single Blazor Server app backed by SQLite.

C# / .NET 8 Blazor Server SQLite + EF Core HtmlAgilityPack CsvHelper

The system is a 6-layer pipeline: Schedule → Scrape → Store → Diff → Alert/Update → Calculate. For the PoC we collapse this into a Blazor Server app with SQLite, a background scraper service, and a Razor-based UI.

Scheduler
PoC Approach
IHostedService with a Timer (BackgroundService) — runs on configurable interval, default daily.
Production Path
Hangfire / Quartz.NET / Azure Functions Timer Trigger
Scraper
PoC Approach
HttpClient + HtmlAgilityPack targeting 3–5 sample state tax sites. Each jurisdiction gets its own IScrapeStrategy implementation.
Production Path
Playwright/Puppeteer for JS-rendered pages, queue-based fan-out across 14k jurisdictions
Storage
PoC Approach
SQLite via EF Core — two logical snapshots: "current" and "previous" rate rows with an EffectiveDate column.
Production Path
PostgreSQL / SQL Server with partitioned history tables
Diff Engine
PoC Approach
LINQ comparison of today's scraped set vs latest stored set. Flags inserts, deletes, and value changes.
Production Path
Temporal tables or CDC (Change Data Capture)
Alert / Update
PoC Approach
In-app toast notifications + a ChangeLog table the UI reads. Optional email via SmtpClient.
Production Path
SNS / Slack webhook / SendGrid integration
Tax Calculator
PoC Approach
Simple endpoint: TaxAmount = price × quantity × tax_rate. Rate fetched by jurisdiction key from SQLite.
Production Path
Dedicated microservice with caching layer

SQLite with EF Core. Four tables cover the full pipeline: jurisdictions, rate snapshots, a change log, and scrape run metadata.

Jurisdictions
ColumnTypeNote
IdINTEGER PKAuto-increment
StateCodeTEXTe.g. 'IL'
JurisdictionNameTEXTe.g. 'Cook County'
JurisdictionTypeTEXTState | County | City | District
FipsCodeTEXTFederal code — unique key
SourceUrlTEXTURL to scrape
IsActiveBOOLEANSoft-delete flag
TaxRates
ColumnTypeNote
IdINTEGER PK
JurisdictionIdINTEGER FK→ Jurisdictions
RateREALe.g. 0.0625 for 6.25%
RateTypeTEXTGeneral | Food | Clothing | Digital
EffectiveDateTEXT (ISO)When rate became active
ScrapedAtTEXT (ISO)Timestamp of scrape
ScrapeRunIdINTEGER FK→ ScrapeRuns
RawValueTEXTOriginal text before sanitization
IsCurrentBOOLEANTrue for latest snapshot
ChangeLog
ColumnTypeNote
IdINTEGER PK
JurisdictionIdINTEGER FK→ Jurisdictions
ChangeTypeTEXTRateChanged | NewJurisdiction | Removed
OldRateREALnull for new entries
NewRateREALnull for removals
DetectedAtTEXT (ISO)
AcknowledgedBOOLEANUI dismiss flag
ScrapeRuns
ColumnTypeNote
IdINTEGER PK
StartedAtTEXT (ISO)
CompletedAtTEXT (ISO)null if still running
StatusTEXTRunning | Completed | Failed
TotalScrapedINTEGERCount of jurisdictions hit
ChangesDetectedINTEGERCount of diffs found
ErrorCountINTEGER

Strategy pattern — one class per source format. A ScrapeOrchestrator loops through active jurisdictions, invokes the matching strategy, sanitizes results, and persists them.

Illinois DOR — tax.illinois.gov
Technique: HTML table parse via HtmlAgilityPack — rates listed in a <table> by county.
Sanitize: Strip '%' suffix, parse to decimal, validate 0 < rate < 0.20
California CDTFA — cdtfa.ca.gov
Technique: CSV download link — fetch file, parse with CsvHelper.
Sanitize: Map city names to FIPS codes, normalize rate format
Texas Comptroller — comptroller.texas.gov
Technique: PDF/Excel file download — parse with ClosedXML or iTextSharp.
Sanitize: Merge combined city+county rows, split into individual rate records
1. ScrapeOrchestrator pulls all active Jurisdictions from DB
2. Groups by scraper strategy (based on StateCode or SourceUrl pattern)
3. Calls IScrapeStrategy.ScrapeAsync(jurisdiction) for each
4. Strategy returns RawScrapeResult { RawValue, ParsedRate, Confidence }
5. Sanitizer validates: rate range, format, duplicate detection
6. Persisted to TaxRates with IsCurrent = true (previous rows flipped to false)
7. DiffEngine compares new vs old, writes to ChangeLog

Single Blazor Server solution with clear vertical slices. The solution separates concerns without over-engineering — no microservices for the PoC.

TaxRateCollector/ ├── TaxRateCollector.sln ├── src/ │ ├── TaxRateCollector.Web/ ← Blazor Server host │ │ ├── Program.cs ← DI, EF, hosted service registration │ │ ├── Components/ │ │ │ ├── Layout/ │ │ │ │ ├── MainLayout.razor │ │ │ │ └── NavMenu.razor │ │ │ └── Pages/ │ │ │ ├── Dashboard.razor ← Overview + run trigger │ │ │ ├── Rates.razor ← Browse / search rates │ │ │ ├── ChangeLog.razor ← Diff results + alerts │ │ │ ├── ScrapeRuns.razor ← History of runs │ │ │ └── TaxCalc.razor ← price × qty × rate tool │ │ ├── wwwroot/ │ │ └── appsettings.json │ │ │ ├── TaxRateCollector.Core/ ← Domain + interfaces │ │ ├── Entities/ │ │ │ ├── Jurisdiction.cs │ │ │ ├── TaxRate.cs │ │ │ ├── ChangeLogEntry.cs │ │ │ └── ScrapeRun.cs │ │ ├── Interfaces/ │ │ │ ├── IScrapeStrategy.cs │ │ │ ├── IScrapeOrchestrator.cs │ │ │ ├── IDiffEngine.cs │ │ │ └── ITaxCalculator.cs │ │ └── Enums/ │ │ ├── JurisdictionType.cs │ │ ├── ChangeType.cs │ │ └── ScrapeStatus.cs │ │ │ ├── TaxRateCollector.Infrastructure/ ← EF, scrapers, services │ │ ├── Data/ │ │ │ ├── AppDbContext.cs │ │ │ └── Migrations/ │ │ ├── Scrapers/ │ │ │ ├── ScrapeOrchestrator.cs │ │ │ ├── Strategies/ │ │ │ │ ├── IllinoisTableScraper.cs │ │ │ │ ├── CaliforniaCsvScraper.cs │ │ │ │ └── TexasExcelScraper.cs │ │ │ └── Sanitizer.cs │ │ ├── Services/ │ │ │ ├── DiffEngine.cs │ │ │ ├── TaxCalculator.cs │ │ │ ├── AlertService.cs │ │ │ └── ScrapeSchedulerService.cs ← BackgroundService │ │ └── Seeding/ │ │ └── JurisdictionSeeder.cs │ │ │ └── TaxRateCollector.Tests/ │ ├── ScraperTests/ │ ├── DiffEngineTests/ │ ├── SanitizerTests/ │ └── TaxCalcTests/ └── README.md

Core contracts that drive the system. These are the interfaces you build against — implementations live in Infrastructure.

IScrapeStrategy.cs
public interface IScrapeStrategy { string StrategyKey { get; } // e.g. "IL-TABLE", "CA-CSV" bool CanHandle(Jurisdiction jurisdiction); Task<IReadOnlyList<RawScrapeResult>> ScrapeAsync( Jurisdiction jurisdiction, CancellationToken ct = default); } public record RawScrapeResult( string RawValue, // "6.25%" decimal? ParsedRate, // 0.0625 string? RateType, // "General" string? JurisdictionHint, // county/city name from source float Confidence); // 0.0–1.0
IDiffEngine.cs
public interface IDiffEngine { Task<DiffReport> DetectChangesAsync( int scrapeRunId, CancellationToken ct = default); } public record DiffReport( int TotalCompared, IReadOnlyList<RateChange> Changes); public record RateChange( int JurisdictionId, ChangeType Type, decimal? OldRate, decimal? NewRate);
Sanitizer.cs — core logic
public static class RateSanitizer { public static decimal? Parse(string raw) { if (string.IsNullOrWhiteSpace(raw)) return null; // Strip %, whitespace, commas var cleaned = raw.Trim().TrimEnd('%').Trim(); if (!decimal.TryParse(cleaned, out var value)) return null; // If > 1, assume percentage (6.25 → 0.0625) if (value > 1m) value /= 100m; // Validate realistic tax range return value is > 0m and < 0.20m ? value : null; } }
ScrapeSchedulerService.cs
public class ScrapeSchedulerService : BackgroundService { private readonly IServiceScopeFactory _scopeFactory; private readonly ILogger<ScrapeSchedulerService> _logger; private readonly TimeSpan _interval = TimeSpan.FromHours(24); protected override async Task ExecuteAsync( CancellationToken stoppingToken) { while (!stoppingToken.IsCancellationRequested) { using var scope = _scopeFactory.CreateScope(); var orchestrator = scope.ServiceProvider .GetRequiredService<IScrapeOrchestrator>(); try { await orchestrator.RunFullScrapeAsync(stoppingToken); } catch (Exception ex) { _logger.LogError(ex, "Scrape cycle failed"); } await Task.Delay(_interval, stoppingToken); } } }

Five pages cover the full workflow. Blazor Server's SignalR connection enables real-time scrape progress without polling.

Summary cards: total jurisdictions, last run time, changes detected, error count
"Run Now" button triggers ScrapeOrchestrator via injected service
Live progress bar during scrape (update via InvokeAsync + StateHasChanged)
Recent changes feed (last 10 from ChangeLog)
Searchable, sortable table of all current rates
Filter by state, jurisdiction type, rate range
Click row to see full history for that jurisdiction
Export to CSV button
Timeline view of all detected changes
Color-coded: green (new), yellow (changed), red (removed)
Acknowledge / dismiss individual changes
Filter by date range and change type
History table of all scrape executions
Status badges: Running, Completed, Failed
Drill into per-jurisdiction results for any run
Error details expandable per failed jurisdiction
Input fields: Jurisdiction (autocomplete), Price, Quantity
Real-time result: Tax Amount = price × qty × rate
Shows which rate is being used and its effective date
Handles combined rates (state + county + city + district)

Three 1-week sprints take you from empty repo to working PoC. Week 4 is buffer / polish.

Sprint 1 — Foundation Week 1
dotnet new blazorserver, add project references, NuGet packages
Define all entities + enums in TaxRateCollector.Core
AppDbContext + SQLite connection string + initial migration
JurisdictionSeeder — seed 15–20 jurisdictions across 3 states
Basic MainLayout + NavMenu with all 5 page stubs
Dashboard page showing seeded jurisdiction count
✓ Deliverable: App runs, DB seeds, navigation works
Sprint 2 — Scraper Pipeline Week 2
Implement IScrapeStrategy for Illinois (HTML table)
Implement IScrapeStrategy for California (CSV download)
RateSanitizer with unit tests (edge cases: '6.25%', '0.0625', 'N/A', '')
ScrapeOrchestrator — loop, invoke strategy, persist
DiffEngine — compare current vs previous, write ChangeLog
ScrapeSchedulerService — BackgroundService with manual trigger
Dashboard "Run Now" button wired to orchestrator
✓ Deliverable: End-to-end scrape → store → diff working
Sprint 3 — UI & Calculator Week 3
Rates Browser page with search/filter/sort
Change Log page with timeline and color coding
Scrape Runs page with drill-down
Tax Calculator page with jurisdiction autocomplete
Live progress feedback during scrape on Dashboard
Error handling + logging polish
README with setup instructions
✓ Deliverable: Full PoC demo-ready
Microsoft.EntityFrameworkCore.Sqlite Microsoft.EntityFrameworkCore.Design HtmlAgilityPack CsvHelper ClosedXML Serilog.AspNetCore Blazored.Toast