A Linux container wakes every hour, scrapes 40 Smyths Toys product pages, diffs against the previous run, and commits the results to this repo. This page is rendered from those commits.
This is a small, single-purpose product watcher. Every hour it pulls the same 40 Smyths Toys product pages, parses each one into structured fields (name, price, availability, image), and compares against the previous run's snapshot.
If anything moved — a price drop, a stockout, a renamed product — that change shows up in the feed below within an hour of it happening on the live site. If a run fails, the failure itself becomes a row in the timeline, with the stderr tail attached.
There is no database and no server. The scraper writes JSON files into this repository; GitHub Pages serves the dashboard; the browser does the rendering. View source on this page — the fetch paths are the data contract.
Tor daemon up, Xvfb virtual display up, then a self-check verifies that Tor control auth + NEWNYM works (refuses to start if not), then patchright launches a real headed Chrome.
SIGNAL NEWNYM to Tor to rotate the exit circuit, then retry.
data/latest-changes.json.
data/timeline.json, refresh data/run-summary.json, commit + push.
Reverse-chronological. Newest run at top. Click any row to see captcha counts, Tor rotations, duration, and (on failures) the stderr tail.
Field-level diff from the most recent run. Price moves are coloured down / up.
Compared against what? Each run is diffed against the immediately previous run's snapshot — the product records stored in this repo at data/runs/<previous-timestamp>.ndjson. Matching is by product SKU. The very first run has nothing to compare against, so every product counts as new; from the second run onward you see only what actually moved (price, in-stock, name, image, category).
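A minimal sketch of that SKU-keyed comparison, written in Python for illustration only. The field names (`sku`, `in_stock`, and so on) and both helper functions are assumptions, not the scraper's actual schema or code, and products that vanish between runs are not handled here.

```python
import json

# Fields compared between runs, following the list in the text above.
# The exact key names are assumed for this sketch.
TRACKED_FIELDS = ("price", "in_stock", "name", "image", "category")


def load_ndjson(text):
    """Parse NDJSON text (one JSON object per line) into a dict keyed by SKU."""
    records = {}
    for line in text.splitlines():
        line = line.strip()
        if line:
            record = json.loads(line)
            records[record["sku"]] = record
    return records


def diff_snapshots(previous, current):
    """Return change entries between two SKU-keyed snapshots."""
    changes = []
    for sku, record in current.items():
        old = previous.get(sku)
        if old is None:
            # No earlier record for this SKU: the product counts as new.
            changes.append({"sku": sku, "kind": "new"})
            continue
        for field in TRACKED_FIELDS:
            if old.get(field) != record.get(field):
                changes.append({
                    "sku": sku,
                    "kind": "changed",
                    "field": field,
                    "before": old.get(field),
                    "after": record.get(field),
                })
    return changes


# Hypothetical data: one price drop and one product not seen before.
previous = load_ndjson('{"sku": "A1", "price": 19.99, "in_stock": true}\n')
current = load_ndjson(
    '{"sku": "A1", "price": 14.99, "in_stock": true}\n'
    '{"sku": "B2", "price": 5.0, "in_stock": false}\n'
)
changes = diff_snapshots(previous, current)
first_run = diff_snapshots({}, current)
```

With an empty previous snapshot, as on the very first run, every product comes back as `new`, which matches the behaviour described above.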
When a Tor exit gets challenged, Smyths shows an hCaptcha “select in order” doodle puzzle. The free vision-LLM solver reads the legend, locates each target, and clicks in order. Below is what it actually faced this run — the coloured rings are where the solver clicked — with the verdict.
Real problems from building + running this, newest first. The Tor-rotation bug below was found and fixed on this very deployment — the failed run on the timeline is it.
tor --hash-password uses a random salt each call). Every SIGNAL NEWNYM failed auth (515), so the scraper could never leave a bad exit. Three-part fix: the hash is now generated at build time from the real password so it always matches; rotateExit() now parses Tor's reply and only reports success on a real 250 OK (it used to report success even on auth failure); and the container runs a boot-time self-check that refuses to start if rotation auth is broken.
503/429 under load, so the captcha solver couldn't read the puzzle. → Providers are chained (Gemini native → Gemini → OpenRouter Qwen3-VL → Groq → Nvidia) and rotate on rate-limit/5xx. When all free tiers are saturated, the lane rotates the Tor exit instead; a fresh exit often isn't challenged at all. This is transient overload that the run recovers from, not a code bug.
patchright, a patched fork of Playwright that strips the CDP automation signals Imperva fingerprints.
Xvfb (a virtual display) in the container. Nothing is drawn to a physical screen, but Chrome behaves as if it had one; this is exactly what lets it run in a Linux container with no GUI.
Tor instance; on consecutive failures the runner sends SIGNAL NEWNYM over the control port to rotate the exit circuit (free, instant).
scale: 'css' to page.screenshot() so the solver and the page share one coordinate system.
apt-get failure during the Docker build (mirror hiccup). → The Dockerfile is idempotent; a plain rerun cleared it. No source change required.
Two layers: fast automatic recovery inside each run, and an agent loop that monitors and repairs the code itself.
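The reply-checking part of the Tor-rotation fix described above can be sketched roughly as follows. This is a Python illustration, not the project's rotateExit(). The helper names are hypothetical, and the socket wiring to the control port is left out so that only the decision logic is shown: rotation counts as successful only when both the AUTHENTICATE and the SIGNAL NEWNYM replies end in `250 OK`.

```python
def is_ok(reply):
    """True only if the final line of a Tor control reply is exactly '250 OK'."""
    lines = reply.strip().splitlines()
    return bool(lines) and lines[-1].strip() == "250 OK"


def rotation_succeeded(auth_reply, signal_reply):
    """Report success only when both control-port replies are '250 OK'.

    auth_reply is the raw text received after AUTHENTICATE, signal_reply the
    raw text received after SIGNAL NEWNYM. An empty or missing reply (for
    example because Tor closed the connection after a failed auth) counts
    as a failure.
    """
    return is_ok(auth_reply) and is_ok(signal_reply)


# Illustrative replies; the 515 text is an example of an auth failure line.
good = rotation_succeeded("250 OK\r\n", "250 OK\r\n")
bad_auth = rotation_succeeded("515 Authentication failed\r\n", "")
```

A boot-time self-check could call the same helper once and refuse to start when it returns False, which is the behaviour the fix describes.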
SIGNAL NEWNYM picks a fresh circuit.
The scraper itself: patchright runner, Tor wiring, vision-LLM captcha solver, Dockerfile, hourly cron unit. Request access from Philippos.
This site. Hourly the scraper commits new JSON files here; GitHub Pages serves them; this page renders them client-side.
The one-shot precursor: a single backfill of 58,605 UK Smyths products. This project is the recurring-watcher follow-up, deliberately scoped to 40 URLs to prove robustness + diffing.
The captcha solver here is mirrored into the hydrascrape codebase on Tamas & Amine's side as a drop-in module.