Updates September 22
PGrid
- Started scraping the US eBay and Amazon sites for GPUs, CPUs and SSDs using the new sharded search scraping method. Went back and forth on its architecture, which ate a bit of time.
- Amazon US appears to have more bot protections than Amazon AUS. I ended up adding a new type of Amazon scraping that goes to the home page first, then inputs the search queries one after another in a single browser. This both sped up scraping and got past their anti-bot measures.
- Added a prompt and test file for AI-assisted end-to-end testing, where the AI compares screenshots against the data saved in the DB to check accuracy. Needed to iterate on the prompt and test file a bit. So far this is only done for Amazon batched multi-term searches.
- Also fixed some USA-related proxy config issues.
- First USA data in the database was nice to see.
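
A minimal sketch of the home-page-first Amazon flow mentioned above, not the actual scraper. It assumes a Playwright-style page object (only goto, fill, press, wait_for_load_state and content are used). The function name scrape_terms, the HOME_URL value and the SEARCH_BOX selector are all assumptions for illustration and would need checking against the live site.

```python
from typing import Iterable, List, Tuple

HOME_URL = "https://www.amazon.com/"   # assumed entry point for the US site
SEARCH_BOX = "#twotabsearchtextbox"    # assumed selector for the search input


def scrape_terms(page, terms: Iterable[str]) -> List[Tuple[str, str]]:
    """Open the home page once, then run every search in the same session.

    `page` is any object exposing the Playwright-style methods used below.
    Returns a list of (term, html) pairs, one per search term.
    """
    # Land on the home page first instead of hitting a search URL directly.
    page.goto(HOME_URL)

    results: List[Tuple[str, str]] = []
    for term in terms:
        # Type the query into the search box and submit it, reusing the
        # same browser page (and therefore the same cookies) each time.
        page.fill(SEARCH_BOX, term)
        page.press(SEARCH_BOX, "Enter")
        # Simple wait; a real scraper would likely wait on a results element.
        page.wait_for_load_state("domcontentloaded")
        results.append((term, page.content()))
    return results
```

With Playwright's sync API, the page would come from something like browser.new_page(). Because the function only depends on those five methods, it can also be exercised with a fake page in tests.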

- Found eBay single-page scraping wasn’t working. On investigation, they had moved the JSON object I rely on to a different place in the HTML. Needed some debugging as the JSON is not in an easy-to-access place, but updated the scraper to find it. Luckily the JSON structure is still the same.
- Found some duplication issues with some SSD models. Got AI to write a script to combine some models and a brand. Updated the model specs search prompt to include the models already in the DB when doing a specs search.
- Ran through brand classification again for SSDs and then model classification. Found some SSDs being incorrectly classified due to models missing from the database. E.g. we had Legend 850 Lite but not Legend 850, which are 2 different models, so classification was assigning everything to the Lite version and not queueing a specs search. Created a prompt for AI to parse pasted Perplexity results, which I will use to populate the missing models for this case.
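
To illustrate the Legend 850 vs Legend 850 Lite case, here is a rough sketch of a strict matching guard. It is not how the classifier actually works; it assumes plain token matching on the listing title. The names tokenize and match_model are made up for the example. A model only matches when all of its tokens appear as a contiguous run in the title, the longest match wins, and None means no known model fits, so a specs search should be queued.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_model(title: str, known_models: List[str]) -> Optional[str]:
    """Return the most specific known model found in the title, or None.

    None signals that no known model fits and a specs search is needed.
    """
    title_tokens = tokenize(title)
    best: Optional[str] = None
    best_len = 0
    for model in known_models:
        model_tokens = tokenize(model)
        n = len(model_tokens)
        # The model matches only if its tokens appear as one contiguous run.
        found = any(
            title_tokens[i:i + n] == model_tokens
            for i in range(len(title_tokens) - n + 1)
        )
        # Prefer the longest match, so "Legend 850 Lite" beats "Legend 850".
        if found and n > best_len:
            best, best_len = model, n
    return best
```

With only Legend 850 Lite in the DB, a plain "Legend 850 1TB NVMe SSD" title returns None rather than being pushed onto the Lite model. Once both models exist, each title resolves to its own model.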