The system is a 6-layer pipeline: Schedule → Scrape → Store → Diff → Alert/Update → Calculate. For the PoC we collapse this into a Blazor Server app with SQLite, a background scraper service, and a Razor-based UI.
Scheduler
PoC Approach
IHostedService with a Timer (BackgroundService) — runs on configurable interval, default daily.
Production Path
Hangfire / Quartz.NET / Azure Functions Timer Trigger
Scraper
PoC Approach
HttpClient + HtmlAgilityPack targeting 3–5 sample state tax sites. Each jurisdiction gets its own IScrapeStrategy implementation.
Production Path
Playwright/Puppeteer for JS-rendered pages, queue-based fan-out across 14k jurisdictions
Storage
PoC Approach
SQLite via EF Core — two logical snapshots: "current" and "previous" rate rows with an EffectiveDate column.
Production Path
PostgreSQL / SQL Server with partitioned history tables
Diff Engine
PoC Approach
LINQ comparison of today's scraped set vs latest stored set. Flags inserts, deletes, and value changes.
Production Path
Temporal tables or CDC (Change Data Capture)
Alert / Update
PoC Approach
In-app toast notifications + a ChangeLog table the UI reads. Optional email via SmtpClient.
Production Path
SNS / Slack webhook / SendGrid integration
Tax Calculator
PoC Approach
Simple endpoint: TaxAmount = price × quantity × tax_rate. Rate fetched by jurisdiction key from SQLite.
Production Path
Dedicated microservice with caching layer
SQLite with EF Core. Four tables cover the full pipeline: jurisdictions, rate snapshots, a change log, and scrape run metadata.
Jurisdictions
| Column | Type | Note |
| Id | INTEGER PK | Auto-increment |
| StateCode | TEXT | e.g. 'IL' |
| JurisdictionName | TEXT | e.g. 'Cook County' |
| JurisdictionType | TEXT | State | County | City | District |
| FipsCode | TEXT | Federal code — unique key |
| SourceUrl | TEXT | URL to scrape |
| IsActive | BOOLEAN | Soft-delete flag |
TaxRates
| Column | Type | Note |
| Id | INTEGER PK | |
| JurisdictionId | INTEGER FK | → Jurisdictions |
| Rate | REAL | e.g. 0.0625 for 6.25% |
| RateType | TEXT | General | Food | Clothing | Digital |
| EffectiveDate | TEXT (ISO) | When rate became active |
| ScrapedAt | TEXT (ISO) | Timestamp of scrape |
| ScrapeRunId | INTEGER FK | → ScrapeRuns |
| RawValue | TEXT | Original text before sanitization |
| IsCurrent | BOOLEAN | True for latest snapshot |
ChangeLog
| Column | Type | Note |
| Id | INTEGER PK | |
| JurisdictionId | INTEGER FK | → Jurisdictions |
| ChangeType | TEXT | RateChanged | NewJurisdiction | Removed |
| OldRate | REAL | null for new entries |
| NewRate | REAL | null for removals |
| DetectedAt | TEXT (ISO) | |
| Acknowledged | BOOLEAN | UI dismiss flag |
ScrapeRuns
| Column | Type | Note |
| Id | INTEGER PK | |
| StartedAt | TEXT (ISO) | |
| CompletedAt | TEXT (ISO) | null if still running |
| Status | TEXT | Running | Completed | Failed |
| TotalScraped | INTEGER | Count of jurisdictions hit |
| ChangesDetected | INTEGER | Count of diffs found |
| ErrorCount | INTEGER | |
Strategy pattern — one class per source format. A ScrapeOrchestrator loops through active jurisdictions, invokes the matching strategy, sanitizes results, and persists them.
Target Strategies (PoC)
Illinois DOR — tax.illinois.gov
Technique: HTML table parse via HtmlAgilityPack — rates listed in a <table> by county.
Sanitize: Strip '%' suffix, parse to decimal, validate 0 < rate < 0.20
California CDTFA — cdtfa.ca.gov
Technique: CSV download link — fetch file, parse with CsvHelper.
Sanitize: Map city names to FIPS codes, normalize rate format
Texas Comptroller — comptroller.texas.gov
Technique: PDF/Excel file download — parse with ClosedXML or iTextSharp.
Sanitize: Merge combined city+county rows, split into individual rate records
Pipeline Steps
1. ScrapeOrchestrator pulls all active Jurisdictions from DB
2. Groups by scraper strategy (based on StateCode or SourceUrl pattern)
3. Calls IScrapeStrategy.ScrapeAsync(jurisdiction) for each
4. Strategy returns RawScrapeResult { RawValue, ParsedRate, Confidence }
5. Sanitizer validates: rate range, format, duplicate detection
6. Persisted to TaxRates with IsCurrent = true (previous rows flipped to false)
7. DiffEngine compares new vs old, writes to ChangeLog
Single Blazor Server solution with clear vertical slices. The solution separates concerns without over-engineering — no microservices for the PoC.
TaxRateCollector/
├── TaxRateCollector.sln
├── src/
│ ├── TaxRateCollector.Web/
│ │ ├── Program.cs
│ │ ├── Components/
│ │ │ ├── Layout/
│ │ │ │ ├── MainLayout.razor
│ │ │ │ └── NavMenu.razor
│ │ │ └── Pages/
│ │ │ ├── Dashboard.razor
│ │ │ ├── Rates.razor
│ │ │ ├── ChangeLog.razor
│ │ │ ├── ScrapeRuns.razor
│ │ │ └── TaxCalc.razor
│ │ ├── wwwroot/
│ │ └── appsettings.json
│ │
│ ├── TaxRateCollector.Core/
│ │ ├── Entities/
│ │ │ ├── Jurisdiction.cs
│ │ │ ├── TaxRate.cs
│ │ │ ├── ChangeLogEntry.cs
│ │ │ └── ScrapeRun.cs
│ │ ├── Interfaces/
│ │ │ ├── IScrapeStrategy.cs
│ │ │ ├── IScrapeOrchestrator.cs
│ │ │ ├── IDiffEngine.cs
│ │ │ └── ITaxCalculator.cs
│ │ └── Enums/
│ │ ├── JurisdictionType.cs
│ │ ├── ChangeType.cs
│ │ └── ScrapeStatus.cs
│ │
│ ├── TaxRateCollector.Infrastructure/
│ │ ├── Data/
│ │ │ ├── AppDbContext.cs
│ │ │ └── Migrations/
│ │ ├── Scrapers/
│ │ │ ├── ScrapeOrchestrator.cs
│ │ │ ├── Strategies/
│ │ │ │ ├── IllinoisTableScraper.cs
│ │ │ │ ├── CaliforniaCsvScraper.cs
│ │ │ │ └── TexasExcelScraper.cs
│ │ │ └── Sanitizer.cs
│ │ ├── Services/
│ │ │ ├── DiffEngine.cs
│ │ │ ├── TaxCalculator.cs
│ │ │ ├── AlertService.cs
│ │ │ └── ScrapeSchedulerService.cs
│ │ └── Seeding/
│ │ └── JurisdictionSeeder.cs
│ │
│ └── TaxRateCollector.Tests/
│ ├── ScraperTests/
│ ├── DiffEngineTests/
│ ├── SanitizerTests/
│ └── TaxCalcTests/
└── README.md
Core contracts that drive the system. These are the interfaces you build against — implementations live in Infrastructure.
IScrapeStrategy.cs
public interface IScrapeStrategy
{
string StrategyKey { get; } // e.g. "IL-TABLE", "CA-CSV"
bool CanHandle(Jurisdiction jurisdiction);
Task<IReadOnlyList<RawScrapeResult>> ScrapeAsync(
Jurisdiction jurisdiction,
CancellationToken ct = default);
}
public record RawScrapeResult(
string RawValue, // "6.25%"
decimal? ParsedRate, // 0.0625
string? RateType, // "General"
string? JurisdictionHint, // county/city name from source
float Confidence); // 0.0–1.0
IDiffEngine.cs
public interface IDiffEngine
{
Task<DiffReport> DetectChangesAsync(
int scrapeRunId,
CancellationToken ct = default);
}
public record DiffReport(
int TotalCompared,
IReadOnlyList<RateChange> Changes);
public record RateChange(
int JurisdictionId,
ChangeType Type,
decimal? OldRate,
decimal? NewRate);
Sanitizer.cs — core logic
public static class RateSanitizer
{
public static decimal? Parse(string raw)
{
if (string.IsNullOrWhiteSpace(raw)) return null;
// Strip %, whitespace, commas
var cleaned = raw.Trim().TrimEnd('%').Trim();
if (!decimal.TryParse(cleaned, out var value))
return null;
// If > 1, assume percentage (6.25 → 0.0625)
if (value > 1m) value /= 100m;
// Validate realistic tax range
return value is > 0m and < 0.20m ? value : null;
}
}
ScrapeSchedulerService.cs
public class ScrapeSchedulerService : BackgroundService
{
private readonly IServiceScopeFactory _scopeFactory;
private readonly ILogger<ScrapeSchedulerService> _logger;
private readonly TimeSpan _interval = TimeSpan.FromHours(24);
protected override async Task ExecuteAsync(
CancellationToken stoppingToken)
{
while (!stoppingToken.IsCancellationRequested)
{
using var scope = _scopeFactory.CreateScope();
var orchestrator = scope.ServiceProvider
.GetRequiredService<IScrapeOrchestrator>();
try
{
await orchestrator.RunFullScrapeAsync(stoppingToken);
}
catch (Exception ex)
{
_logger.LogError(ex, "Scrape cycle failed");
}
await Task.Delay(_interval, stoppingToken);
}
}
}
Five pages cover the full workflow. Blazor Server's SignalR connection enables real-time scrape progress without polling.
Summary cards: total jurisdictions, last run time, changes detected, error count
"Run Now" button triggers ScrapeOrchestrator via injected service
Live progress bar during scrape (update via InvokeAsync + StateHasChanged)
Recent changes feed (last 10 from ChangeLog)
Searchable, sortable table of all current rates
Filter by state, jurisdiction type, rate range
Click row to see full history for that jurisdiction
Export to CSV button
Timeline view of all detected changes
Color-coded: green (new), yellow (changed), red (removed)
Acknowledge / dismiss individual changes
Filter by date range and change type
History table of all scrape executions
Status badges: Running, Completed, Failed
Drill into per-jurisdiction results for any run
Error details expandable per failed jurisdiction
Input fields: Jurisdiction (autocomplete), Price, Quantity
Real-time result: Tax Amount = price × qty × rate
Shows which rate is being used and its effective date
Handles combined rates (state + county + city + district)
Three 1-week sprints take you from empty repo to working PoC. Week 4 is buffer / polish.
dotnet new blazorserver, add project references, NuGet packages
Define all entities + enums in TaxRateCollector.Core
AppDbContext + SQLite connection string + initial migration
JurisdictionSeeder — seed 15–20 jurisdictions across 3 states
Basic MainLayout + NavMenu with all 5 page stubs
Dashboard page showing seeded jurisdiction count
✓ Deliverable: App runs, DB seeds, navigation works
Implement IScrapeStrategy for Illinois (HTML table)
Implement IScrapeStrategy for California (CSV download)
RateSanitizer with unit tests (edge cases: '6.25%', '0.0625', 'N/A', '')
ScrapeOrchestrator — loop, invoke strategy, persist
DiffEngine — compare current vs previous, write ChangeLog
ScrapeSchedulerService — BackgroundService with manual trigger
Dashboard "Run Now" button wired to orchestrator
✓ Deliverable: End-to-end scrape → store → diff working
Rates Browser page with search/filter/sort
Change Log page with timeline and color coding
Scrape Runs page with drill-down
Tax Calculator page with jurisdiction autocomplete
Live progress feedback during scrape on Dashboard
Error handling + logging polish
README with setup instructions
✓ Deliverable: Full PoC demo-ready
NuGet Packages
Microsoft.EntityFrameworkCore.Sqlite
Microsoft.EntityFrameworkCore.Design
HtmlAgilityPack
CsvHelper
ClosedXML
Serilog.AspNetCore
Blazored.Toast