Data platform · Cloud · Full-stack .NET

RA Import Platform

A production-grade data-aggregation and contact-enrichment engine that scrapes regulatory records for residential assisted-living facilities across all 50 U.S. states and DC, normalizes them into a single source of truth, verifies contact data, and feeds marketing systems, all running fully automated on Azure.

50 U.S. states + DC covered

59 source-specific scrapers

~15.5k lines of service code

.NET 10, containerized on Azure

What it does

A single automated pipeline turns fragmented public health-department data into a clean, sales-ready contact database.

🗺️

Nationwide coverage

Harvests every state's assisted-living registry — from modern open-data APIs to PDF-only records and CAPTCHA-gated portals — into one consistent dataset.

✅

Verified contacts

Enriches each facility with a website, email, and phone, then validates deliverability so only real, mailable contacts reach the marketing team.

🔁

Always current

Runs on a schedule, tracks change history per facility, and exports ready-to-import lists for ActiveCampaign (email) and Postalytics (direct mail).

Architecture

Layered pipeline — acquire → normalize → enrich → export — behind a secured API, orchestrated by scheduled background workers.

Technology stack

Modern .NET, real browser automation, and a normalized SQL model — deployed as a container on Azure with CI/CD.

Platform & language

C# · .NET 10 · ASP.NET Core Web API · Swagger / OpenAPI · Async/await throughout

Data & scraping

Azure SQL · Dapper 2.1 · Microsoft Playwright 1.54 · HtmlAgilityPack · ClosedXML (Excel) · PdfPig (PDF) · CsvHelper

Cloud & DevOps

Docker multi-stage · Azure Container Apps · Azure Container Registry · Azure DevOps CI/CD · Azure Key Vault · Managed Identity

Integrations

Google Places API · ZeroBounce verification · ActiveCampaign · Postalytics

Engineering highlights for technical reviewers

The problems that made this hard, and the patterns used to solve them.

Registry-driven scraper fan-out

A single KnownStates map registers 59 scrapers as keyed DI services. Adding a state is a one-line registration — no factory or switch logic. Multi-track states (OH has 3 license systems, CA/AZ have 3 sources each) coexist cleanly.
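The registration pattern could look roughly like the sketch below, using the keyed-service APIs in Microsoft.Extensions.DependencyInjection (.NET 8 and later). The `IStateScraper` contract, the two scraper classes, and the `OH-RC` key are hypothetical stand-ins; only the `KnownStates` name comes from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical contract: the real scraper interface is not shown on this page.
public interface IStateScraper
{
    Task<IReadOnlyList<string>> ScrapeAsync(CancellationToken ct);
}

// Placeholder scrapers that return no rows; real ones would fetch and parse a source.
public sealed class TexasScraper : IStateScraper
{
    public Task<IReadOnlyList<string>> ScrapeAsync(CancellationToken ct) =>
        Task.FromResult<IReadOnlyList<string>>(Array.Empty<string>());
}

public sealed class OhioResidentialCareScraper : IStateScraper
{
    public Task<IReadOnlyList<string>> ScrapeAsync(CancellationToken ct) =>
        Task.FromResult<IReadOnlyList<string>>(Array.Empty<string>());
}

public static class KnownStates
{
    // One entry per source. A multi-track state simply gets several keys.
    public static readonly IReadOnlyDictionary<string, Type> Scrapers =
        new Dictionary<string, Type>
        {
            ["TX"] = typeof(TexasScraper),
            ["OH-RC"] = typeof(OhioResidentialCareScraper), // assumed key format
        };

    // Registers every map entry as a keyed singleton: no factory, no switch.
    public static IServiceCollection AddStateScrapers(this IServiceCollection services)
    {
        foreach (var (key, scraperType) in Scrapers)
            services.AddKeyedSingleton(typeof(IStateScraper), key, scraperType);
        return services;
    }

    // Looks a scraper up by its key; throws if the key was never registered.
    public static IStateScraper Resolve(IServiceProvider provider, string key) =>
        provider.GetRequiredKeyedService<IStateScraper>(key);
}
```

With this shape, supporting a new source means adding one dictionary entry; callers resolve by key and never reference a concrete scraper type.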

Stable identity & idempotent upserts

Every facility gets a deterministic FacilityKey (SHA-256 of canonical name + address). Re-scrapes upsert by key, so history and hard-won enrichment data survive re-runs instead of being overwritten.
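As an illustration, a deterministic key of this kind can be derived with the built-in `SHA256` class. The canonicalization rules here (trim, upper-case, collapse whitespace) and the newline separator are assumptions; the page does not spell out the real rules.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

public static class FacilityKey
{
    // Assumed canonical form: trimmed, upper-cased, whitespace runs collapsed to one space.
    private static string Canon(string value) =>
        Regex.Replace(value.Trim().ToUpperInvariant(), @"\s+", " ");

    public static string Compute(string name, string address)
    {
        // A newline cannot survive canonicalization, so it is a safe field separator.
        string canonical = Canon(name) + "\n" + Canon(address);
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
        return Convert.ToHexString(hash); // 64 upper-case hex characters
    }
}
```

Because the key depends only on the canonical name and address, `FacilityKey.Compute("Sunrise Manor ", "12  Oak St")` and `FacilityKey.Compute("sunrise manor", "12 oak st")` produce the same value, which is what lets a re-scrape find and update the existing row.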

Normalized schema + JSON grab-bag

Core entities (Facility, Address, Person, Email, Phone, Website) are relational; volatile state-specific fields live in an ISJSON-checked Details column with an append-only history table — schema stays stable as 50 states' quirks change.
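The snapshot-plus-history idea can be sketched in memory as follows. This is a stand-in for the SQL tables, not the actual data access code: `FacilityDetailsStore` and `DetailsChange` are hypothetical names, and the rule "append a history row only when the JSON actually differs" is an assumption about how the timeline is maintained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

public sealed record DetailsChange(string FacilityKey, string Json, DateTimeOffset ChangedAt);

public sealed class FacilityDetailsStore
{
    private readonly Dictionary<string, string> _current = new(); // latest snapshot per facility
    private readonly List<DetailsChange> _history = new();        // append-only timeline

    public IReadOnlyList<DetailsChange> History => _history;

    // Returns true when the snapshot changed and a history row was appended.
    public bool Upsert(string facilityKey, string detailsJson, DateTimeOffset now)
    {
        // Parsing rejects malformed input, playing the role of the ISJSON check.
        JsonNode? incoming = JsonNode.Parse(detailsJson);

        if (_current.TryGetValue(facilityKey, out string? existing) &&
            JsonNode.DeepEquals(JsonNode.Parse(existing), incoming))
        {
            return false; // same content: nothing new to record
        }

        _current[facilityKey] = detailsJson;
        _history.Add(new DetailsChange(facilityKey, detailsJson, now));
        return true;
    }
}
```

State-specific fields can come and go inside the JSON without a migration, while the history list keeps every distinct version that was ever seen.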

Cost-aware enrichment

Email verification is metered per call, so verdicts are cached in-process and in SQL to avoid re-billing. A Facebook fallback with a login-wall circuit breaker recovers contacts the primary path misses.
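A two-tier verdict cache along these lines might look like the sketch below. Everything named here is hypothetical: `IVerdictStore` stands in for the SQL-backed cache, `InMemoryVerdictStore` is a test double for it, and `paidLookup` represents the metered verification call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public enum EmailVerdict { Valid, Invalid, Unknown }

// Hypothetical seam for the persistent (SQL) tier.
public interface IVerdictStore
{
    Task<EmailVerdict?> FindAsync(string email, CancellationToken ct);
    Task SaveAsync(string email, EmailVerdict verdict, CancellationToken ct);
}

// Test double standing in for the SQL table.
public sealed class InMemoryVerdictStore : IVerdictStore
{
    private readonly ConcurrentDictionary<string, EmailVerdict> _rows =
        new(StringComparer.OrdinalIgnoreCase);

    public Task<EmailVerdict?> FindAsync(string email, CancellationToken ct) =>
        Task.FromResult<EmailVerdict?>(_rows.TryGetValue(email, out var v) ? v : null);

    public Task SaveAsync(string email, EmailVerdict verdict, CancellationToken ct)
    {
        _rows[email] = verdict;
        return Task.CompletedTask;
    }
}

public sealed class CachedEmailVerifier
{
    private readonly ConcurrentDictionary<string, EmailVerdict> _memory =
        new(StringComparer.OrdinalIgnoreCase);
    private readonly IVerdictStore _store;
    private readonly Func<string, CancellationToken, Task<EmailVerdict>> _paidLookup;

    public CachedEmailVerifier(
        IVerdictStore store, Func<string, CancellationToken, Task<EmailVerdict>> paidLookup)
    {
        _store = store;
        _paidLookup = paidLookup;
    }

    public async Task<EmailVerdict> VerifyAsync(string email, CancellationToken ct = default)
    {
        string key = email.Trim();

        // Tier 1: in-process cache, free and fastest.
        if (_memory.TryGetValue(key, out var cached)) return cached;

        // Tier 2: persistent store, survives restarts.
        EmailVerdict? stored = await _store.FindAsync(key, ct);

        // Tier 3: the metered call, reached only when both caches miss.
        EmailVerdict verdict = stored ?? await _paidLookup(key, ct);
        if (stored is null) await _store.SaveAsync(key, verdict, ct);

        _memory[key] = verdict;
        return verdict;
    }
}
```

This sketch does not de-duplicate concurrent first-time lookups of the same address, and it caches every verdict indefinitely; a real implementation would likely want a policy for expiring inconclusive results.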

Heterogeneous source handling

One pipeline absorbs ArcGIS/Socrata REST feeds, JS-heavy portals via real Chromium, Excel workbooks, and PDF-only state records parsed positionally — plus a reCAPTCHA-gated portal handled via snapshot.
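To make "parsed positionally" concrete, here is a minimal sketch of turning positioned words into table rows. It assumes a PDF library has already produced each word's text, left edge, and baseline; the `PdfWord` record, the column boundaries, and the row tolerance are all illustrative and not taken from the actual parsers.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a word extracted from a PDF page (origin at bottom-left).
public readonly record struct PdfWord(string Text, double Left, double Baseline);

public static class PositionalTable
{
    // columnStarts: ascending, non-empty list of x-coordinates where each column begins.
    public static List<string[]> ToRows(
        IEnumerable<PdfWord> words, double[] columnStarts, double rowTolerance = 2.0)
    {
        // 1. Walk top to bottom; start a new row when the baseline drops past the tolerance.
        var lines = new List<List<PdfWord>>();
        foreach (var word in words.OrderByDescending(w => w.Baseline))
        {
            var last = lines.Count > 0 ? lines[^1] : null;
            if (last is null || last[0].Baseline - word.Baseline > rowTolerance)
                lines.Add(last = new List<PdfWord>());
            last.Add(word);
        }

        // 2. Within a row, read left to right and place each word in the
        //    right-most column whose start is at or before the word's left edge.
        var rows = new List<string[]>();
        foreach (var line in lines)
        {
            string[] cells = Enumerable.Repeat("", columnStarts.Length).ToArray();
            foreach (var word in line.OrderBy(w => w.Left))
            {
                int col = Math.Max(0, Array.FindLastIndex(columnStarts, s => word.Left >= s));
                cells[col] = cells[col].Length == 0 ? word.Text : cells[col] + " " + word.Text;
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

The column boundaries would have to be tuned per document layout, which is part of why PDF-only sources are among the more fragile ones to maintain.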

Cloud-native operations

Multi-stage Docker image bakes Chromium + system deps for headless scraping in-container. Background workers run scrapes on an interval without blocking Kestrel startup; a warm-up service rebuilds in-memory state from SQL on boot.
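An interval-driven worker of this kind is commonly built on `BackgroundService` and `PeriodicTimer`. The sketch below is one way to do it, not the project's actual worker: `ScheduledScrapeWorker` and the injected `runAllScrapes` delegate are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

public sealed class ScheduledScrapeWorker : BackgroundService
{
    private readonly Func<CancellationToken, Task> _runAllScrapes; // hypothetical scrape entry point
    private readonly TimeSpan _interval;
    private readonly ILogger<ScheduledScrapeWorker> _log;

    public ScheduledScrapeWorker(
        Func<CancellationToken, Task> runAllScrapes,
        TimeSpan interval,
        ILogger<ScheduledScrapeWorker> log)
    {
        _runAllScrapes = runAllScrapes;
        _interval = interval;
        _log = log;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Hand control back to the host right away so startup is never
        // held up by the first (potentially long) scrape.
        await Task.Yield();

        using var timer = new PeriodicTimer(_interval);
        try
        {
            do
            {
                try
                {
                    await _runAllScrapes(stoppingToken);
                }
                catch (Exception ex) when (ex is not OperationCanceledException)
                {
                    // One failed run must not kill the schedule.
                    _log.LogError(ex, "Scheduled scrape failed; retrying on the next tick");
                }
            }
            while (await timer.WaitForNextTickAsync(stoppingToken));
        }
        catch (OperationCanceledException)
        {
            // Host is shutting down: exit quietly.
        }
    }
}
```

The loop runs once immediately and then once per tick. `PeriodicTimer` does not queue overlapping ticks, so a scrape that outlasts the interval delays the next run rather than stacking up concurrent ones.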

Data & API surface

Normalized SQL model

Facility — identity, source, active/seen/scraped timestamps
Address / Person / FacilityPerson — typed addresses, role-carrying links (Owner, Administrator, Agent…)
Email / Phone / Website — owned by facility or person; carry source & verification state
FacilityDetails + History — current JSON snapshot plus append-only change timeline
ScrapeRun — per-run audit: counts, success, errors

API endpoints

POST /scrape — trigger an on-demand state scrape
POST /enrich · /enrich-all — run contact enrichment
GET /latest · /latest-multi — results as JSON or CSV
GET /status — facility counts & scrape-run history
GET /states — catalog of supported sources
Secured with X-Api-Key; documented via Swagger
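For reference, an `X-Api-Key` check is typically a small piece of ASP.NET Core middleware. The version below is a sketch under assumptions: the class name is hypothetical, a single shared key is assumed, and the key is passed in by the caller (for example after being read from Key Vault).

```csharp
using System;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;

public sealed class ApiKeyMiddleware
{
    private const string HeaderName = "X-Api-Key";
    private readonly RequestDelegate _next;
    private readonly byte[] _expected;

    public ApiKeyMiddleware(RequestDelegate next, string expectedKey)
    {
        // Refuse to start with an empty key, which would match a missing header.
        if (string.IsNullOrWhiteSpace(expectedKey))
            throw new ArgumentException("An API key must be configured.", nameof(expectedKey));

        _next = next;
        _expected = Encoding.UTF8.GetBytes(expectedKey);
    }

    public async Task InvokeAsync(HttpContext context)
    {
        // A missing header yields an empty string, which never matches.
        byte[] supplied = Encoding.UTF8.GetBytes(context.Request.Headers[HeaderName].ToString());

        // Constant-time comparison avoids leaking how many leading bytes matched.
        if (!CryptographicOperations.FixedTimeEquals(supplied, _expected))
        {
            context.Response.StatusCode = StatusCodes.Status401Unauthorized;
            return;
        }

        await _next(context);
    }
}
```

It would be wired in with `app.UseMiddleware<ApiKeyMiddleware>(apiKey)`; where that call sits in the pipeline decides which routes it protects.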

Capabilities demonstrated (services & hiring)

What building and running this system proves I can deliver.

Full-stack .NET / ASP.NET Core · Web scraping & browser automation at scale · Data engineering & pipeline design · Relational schema design & SQL · Third-party API integration · Data quality & email deliverability · Docker & containerization · Azure cloud (Container Apps, SQL, Key Vault, ACR) · CI/CD pipeline authoring · Background-job / scheduler design · Cost optimization · API design & security · Marketing-ops integration (ActiveCampaign, direct mail)