Updates July 28
PGrid
- Fixed some inconsistencies between the Postgres integration test setup and the ClickHouse one. The global Prisma client can now still be used, which was causing an issue before.
- Working on a new generic item processing pipeline that any scraping function can use to write snapshots and create or update generic items. I initially tried to build a unified version, but its structure got complicated, so I created a new version in a new file with a simpler structure, with the help of AI.
- Working with AI to create tests for this new generic item processing pipeline, testing small parts at a time; found and fixed some bugs along the way.
- Migrated pgrid BAML to use the external AI browser server for its Perplexity searches.
- Big eBay refactor to be efficient for unclassified items and to handle variant listing pages on eBay. Generic eBay now works as follows:
  - For a category of items:
    - Run eBay searches for all items in that category
    - From the search results, create generic items for listings not in our database and price snapshots for everything
    - If a new generic item was added, queue a single-page crawl for it
    - Queue single-page crawls for all classified items in that category
  - Single-page crawls:
    - If a variant page is found, queue single-page crawls for all variants and ignore the base page
    - Otherwise just save the price snapshot data
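The flow above can be sketched roughly as below. All type and function names here are hypothetical placeholders for illustration, not the real PGrid code:

```typescript
// Illustrative sketch of the generic eBay flow; names and shapes are assumed.

interface SearchResult { listingId: string; title: string; price: number | null }
interface GenericItem { id: string; title: string; classified: boolean }
interface PageResult { listingId: string; variantIds: string[]; price: number | null }

type CrawlQueue = string[]; // listing IDs queued for single-page crawls

function processCategory(
  results: SearchResult[],
  known: Map<string, GenericItem>, // items already in the database, by listingId
): { newItems: GenericItem[]; snapshots: SearchResult[]; queue: CrawlQueue } {
  const newItems: GenericItem[] = [];
  const queue: CrawlQueue = [];

  for (const r of results) {
    const existing = known.get(r.listingId);
    if (!existing) {
      // Unknown listing: create a generic item and queue a single-page crawl.
      newItems.push({ id: r.listingId, title: r.title, classified: false });
      queue.push(r.listingId);
    } else if (existing.classified) {
      // Classified items in the category also get single-page crawls.
      queue.push(r.listingId);
    }
  }

  // Price snapshots are written for everything in the search results.
  return { newItems, snapshots: results, queue };
}

function handleSinglePage(page: PageResult): { queue: CrawlQueue; snapshot: PageResult | null } {
  if (page.variantIds.length > 0) {
    // Variant page: crawl each variant and ignore the base page itself.
    return { queue: page.variantIds, snapshot: null };
  }
  // Plain listing: just save its price snapshot data.
  return { queue: [], snapshot: page };
}
```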
- Added a new pipeline, usable by any scraper, that handles creating generic items, updating item details such as titles and fail dates, adding price snapshots, and handling currency. Includes unit tests.
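A minimal sketch of what that shared pipeline step might look like. The field names, the AUD normalisation, and the fixed rate table are all assumptions for illustration, not the real implementation:

```typescript
// Hypothetical sketch of the shared item-processing step: create-or-update
// the item, then write a currency-normalised price snapshot.

interface ScrapedListing {
  title: string;
  price: number;
  currency: string; // e.g. "AUD", "USD"
  seenAt: Date;
}

interface Item {
  title: string;
  lastSeenAt: Date; // used to derive fail dates when a listing disappears
}

interface Snapshot { priceAud: number; at: Date }

// Hypothetical fixed conversion table; a real pipeline would use live rates.
const TO_AUD: Record<string, number> = { AUD: 1, USD: 1.5 };

function processListing(
  existing: Item | undefined,
  scraped: ScrapedListing,
): { item: Item; snapshot: Snapshot } {
  // Create the item if it's new, otherwise refresh details like the title.
  const item: Item = existing
    ? { ...existing, title: scraped.title, lastSeenAt: scraped.seenAt }
    : { title: scraped.title, lastSeenAt: scraped.seenAt };

  // Normalise currency before writing the price snapshot.
  const rate = TO_AUD[scraped.currency] ?? 1;
  const snapshot: Snapshot = { priceAud: scraped.price * rate, at: scraped.seenAt };

  return { item, snapshot };
}
```

Keeping this as one pure function makes the small unit tests mentioned above straightforward to write.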
- Added a geoblock check for Amazon. There has been a recent issue where Amazon assumes my proxy is not in Australia and shows everything as out of stock. For GPUs I added a check: if every result for a given GPU shows no price, skip saving that data. Still need to find a proper fix.
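The guard is essentially "save the batch only if at least one listing has a price". A sketch under that assumption, with hypothetical names:

```typescript
// Hypothetical guard: if every search result for a GPU comes back without a
// price, treat the batch as geoblocked noise and skip saving it.

interface GpuResult { title: string; price: number | null }

function shouldSaveBatch(results: GpuResult[]): boolean {
  if (results.length === 0) return false; // nothing to save anyway
  // A batch where no listing has a price looks like the geoblock symptom.
  return results.some((r) => r.price !== null);
}
```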
- Started work on direct classification for SSDs, which would run as soon as we scrape new SSD items. It will consist of small, efficient AI tasks: first ignoring items that aren't SSDs, then classifying the brand against existing brands in the database, with the option to queue adding a new brand to the database.
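The two-stage shape described above could look like the sketch below. The AI calls are stubbed here with trivial heuristics so the control flow is runnable; all names are hypothetical:

```typescript
// Sketch of the planned two-stage SSD classification. classifyIsSsd and
// classifyBrand stand in for small AI tasks; the heuristics are placeholders.

type BrandResult =
  | { kind: "known"; brand: string }
  | { kind: "queue-new-brand"; brand: string };

function classifyIsSsd(title: string): boolean {
  // Stand-in for the cheap "is this an SSD at all?" check.
  return /\bssd\b/i.test(title);
}

function classifyBrand(title: string, knownBrands: string[]): BrandResult {
  // Stand-in for brand classification against brands already in the database.
  const hit = knownBrands.find((b) => title.toLowerCase().includes(b.toLowerCase()));
  if (hit) return { kind: "known", brand: hit };
  // Unknown brand: queue it to be added to the database.
  const firstWord = title.split(/\s+/)[0];
  return { kind: "queue-new-brand", brand: firstWord };
}

function classifySsdTitle(title: string, knownBrands: string[]): BrandResult | null {
  if (!classifyIsSsd(title)) return null; // ignore non-SSD items early
  return classifyBrand(title, knownBrands);
}
```

Splitting the work into small tasks like this keeps each AI call cheap and easy to evaluate in isolation.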
- Added a script that lets Claude Code review the outputs of AI model classification. Used this to review how well the AI classifies whether a given title is or isn't an SSD.
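One simple way such a review script could expose outputs is as JSONL, one `(title, verdict)` pair per line, so Claude Code can scan the file directly. This structure is only an illustration, not the actual script:

```typescript
// Sketch: dump classification outputs as JSONL for review; field names assumed.

interface ClassificationOutput { title: string; isSsd: boolean }

function toReviewLines(outputs: ClassificationOutput[]): string {
  return outputs
    .map((o) => JSON.stringify({ title: o.title, model_says_ssd: o.isSsd }))
    .join("\n");
}
```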
- Updated the AI browser server I created, which lets me run Perplexity queries through both an OpenAI-spec API and MCP. It can now be dockerised and run on a remote host. It can also handle Cloudflare checks and do a partially automated login to Perplexity; the user just has to paste in the login code from their email.
- Since the Perplexity scraper still needs updates to select which model to use and to handle more Cloudflare checks during searches, I'm still running the AI browser locally for now.
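Because the server exposes an OpenAI-spec API, a client call is just a standard chat-completions request. A sketch, where the base URL, route, and model identifier are assumptions rather than documented values:

```typescript
// Hypothetical client for the AI browser server's OpenAI-compatible endpoint.

function buildChatRequest(query: string) {
  return {
    model: "perplexity", // assumed model identifier
    messages: [{ role: "user" as const, content: query }],
  };
}

async function perplexitySearch(baseUrl: string, query: string): Promise<string> {
  // POST to the assumed OpenAI-compatible chat completions route.
  const res = await fetch(`${baseUrl}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatRequest(query)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Because the API follows the OpenAI spec, existing OpenAI client libraries pointed at the server's base URL should also work.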