Updates April 27
PGrid
- multiple issues related to eBay listings changing what they point to
- manually failed items still had their price history shown on details pages. fixed this by updating the queries + how the ClickHouse MV (materialized view) regenerates
- eBay listing changes could not only make cheaper items show incorrectly on details pages but also give expensive items in the same category an inaccurate price history. found this on the 5080 page.
- updated the internal CLI manual fail command so it can also delete price history for a set number of days. there are multiple eBay price history issues to clean up.
- to handle all these title changes, i added new logic in the generic pipeline that uses an LLM to check whether a title change still represents the same product. this includes a new table in the DB to track these checks, which allows retries if LLM calls fail. also added metrics to track this.
- newegg scraper stopped working. one Codex prompt with browser use fixed it. verified with newegg audit scripts. the issue was the source URLs being crawled.
- this ended up causing an issue: i was saving newegg URLs without cleaning up the query parameters, so the new URLs, which had different query parameters, were treated as new items. this required a migration script and updates to the newegg crawler.
- added a new clickhouse table for daily price history of all items excluding eBay/Amazon. this allows price history graphs without marketplaces (for the US, where price history doesn’t look clean because eBay listings aren’t actually new)
- added ai browser cli commands so agents can use Perplexity/Gemini workflows more easily without extra MCP setup
- added delete snapshots support to item fail command and delete price history option to item reclassify command
- laptop model specs search prompt iterations (spent most of the week here)
- tightened model identity rules to enforce brand + model_name + generation_name + screen size + form factor as the dedupe identity
- added observed part numbers/skus into model research prompts so titles can be mapped to the correct oem model identity more reliably
- shifted from platform_name wording to generation_name
- switched model expansion flow to gemini grounded two-pass (identity pass, then generation expansion pass). Perplexity was not as accurate
- iterating on the rules for generation_name; this is still not fully reliable
- fixed screen size specs search regression
- relaxed one over-strict verification check that was causing multiple failures
- started laptop variant specs search but haven’t run it yet
- handled featured deals cheaper-item edge cases and updated cohorts/notification rules for ssd+ram deal scans
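
for reference, the newegg URL cleanup mentioned above could look roughly like the sketch below. this is not the actual crawler code: the function name and the example URL are made up, and it assumes the URL path alone identifies an item, so the whole query string can be dropped before saving.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_item_url(url: str) -> str:
    """Return the URL without query string or fragment.

    Assumption (hypothetical helper): the path alone identifies the item,
    so tracking/query parameters can be discarded before the URL is saved.
    """
    parts = urlsplit(url.strip())
    # Drop a trailing slash so "/p/x" and "/p/x/" map to the same item.
    path = parts.path.rstrip("/") or "/"
    # Scheme and host are case-insensitive, so normalize them to lowercase.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


# Two crawls of the same (made-up) listing with different query parameters.
first = canonical_item_url("https://www.newegg.com/p/EXAMPLE-ID?cm_sp=homepage")
second = canonical_item_url("https://WWW.newegg.com/p/EXAMPLE-ID/?utm_source=feed#reviews")
print(first == second)
```

a migration would apply the same function to the URLs already stored, then merge the rows that collapse to the same canonical URL.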
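
the laptop dedupe identity (brand + model_name + generation_name + screen size + form factor) can be sketched as a normalized key. this is only an illustration: the class, the field types, the normalization rules, and the sample values are assumptions, not the real pipeline.

```python
from dataclasses import dataclass


def _norm(text: str) -> str:
    # Lowercase and collapse whitespace so cosmetic differences don't split a model.
    return " ".join(text.lower().split())


@dataclass(frozen=True)
class ModelIdentity:
    # Hypothetical record holding the five identity fields from the notes above.
    brand: str
    model_name: str
    generation_name: str
    screen_size_in: float
    form_factor: str

    def key(self) -> tuple:
        # Assumption: screen sizes are compared at 0.1 inch precision.
        return (
            _norm(self.brand),
            _norm(self.model_name),
            _norm(self.generation_name),
            round(self.screen_size_in, 1),
            _norm(self.form_factor),
        )


def dedupe(models: list) -> list:
    """Keep the first model seen for each identity key, preserving order."""
    seen = {}
    for model in models:
        seen.setdefault(model.key(), model)
    return list(seen.values())


# Made-up sample data: a and b differ only cosmetically, c has a different screen size.
a = ModelIdentity("Acme", "Swift 14", "Gen 3", 14.0, "Clamshell")
b = ModelIdentity("acme", "swift  14", "gen 3", 14.04, "clamshell")
c = ModelIdentity("Acme", "Swift 14", "Gen 3", 16.0, "Clamshell")
unique = dedupe([a, b, c])
print(len(unique))
```

part numbers/skus are deliberately left out of the key here, since the notes describe them as inputs that help map titles to an identity rather than as part of the identity itself.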