Updates April 7
PGrid
- Implemented handling for pushing updates to ClickHouse when things like failed date or GPU titles change in the generic tables. Needed to learn about the different ClickHouse table engines, how they handle updates, and the performance trade-offs. An initial attempt using a temporary table holding the update data didn't quite work.
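One engine that usually comes up for this problem is ReplacingMergeTree: rows sharing a sorting key are deduplicated during background merges, keeping the row with the highest version column, which gives update-like behaviour without mutations. A minimal sketch of that last-write-wins semantics (the table and column names here are hypothetical, not the real schema):

```typescript
// Hypothetical row shape; `version` plays the role of a
// ReplacingMergeTree version column (e.g. an updated_at timestamp).
interface ItemRow {
  itemId: number;
  title: string;
  failedDate: string | null;
  version: number;
}

// Rough DDL equivalent (names are illustrative only):
//   CREATE TABLE item_details (
//     item_id UInt64, title String,
//     failed_date Nullable(Date), version UInt64
//   ) ENGINE = ReplacingMergeTree(version) ORDER BY item_id;

// Simulate what a background merge does: for each sorting key,
// keep only the row with the highest version.
function collapse(rows: ItemRow[]): ItemRow[] {
  const latest = new Map<number, ItemRow>();
  for (const row of rows) {
    const seen = latest.get(row.itemId);
    if (!seen || row.version > seen.version) latest.set(row.itemId, row);
  }
  return [...latest.values()];
}
```

The catch is that deduplication only happens at merge time, so reads need `FINAL` (or an argMax-style aggregation) to be guaranteed the latest row, which is likely part of the performance considerations mentioned above.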
- Ended up deciding to split the single ClickHouse item snapshot table into an item snapshot table and an item details table. That means every query will need to be rewritten to use joins, but the snapshot table can now be largely append-only and updates to item metadata are easier to sync.
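The two-table shape means each read becomes snapshot-join-details. A rough sketch of that join in plain TypeScript, with the ClickHouse analogue in a comment (all names here are made up for illustration):

```typescript
interface Snapshot { itemId: number; capturedAt: string; price: number; }
interface Details { itemId: number; title: string; failedDate: string | null; }

// Roughly what the rewritten queries do server-side:
//   SELECT s.item_id, s.price, d.title, d.failed_date
//   FROM item_snapshot AS s
//   INNER JOIN item_details AS d ON d.item_id = s.item_id
function joinSnapshots(snapshots: Snapshot[], details: Details[]) {
  const byId = new Map(details.map(d => [d.itemId, d]));
  return snapshots.flatMap(s => {
    const d = byId.get(s.itemId);
    return d ? [{ ...s, title: d.title, failedDate: d.failedDate }] : [];
  });
}
```

The payoff is on the write side: snapshots stay append-only, and a metadata change becomes one updated row in the details table instead of rewriting history across every snapshot.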
- Updated the ClickHouse queries to use the two tables. Had issues with the daily price history not fully matching the data from Postgres; it turned out items with a failed date were being filtered out incorrectly.
- New server with 24 GB RAM and 6 EPYC cores, which will be used for ClickHouse. Connected it to the Nomad cluster.
- New ClickHouse query to get the lowest price for every item in a category. Almost all of the aggregation logic now happens in SQL, where before a lot of it was done in Node.
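The per-category lowest-price query is essentially a GROUP BY with a min, likely paired with something like ClickHouse's argMin to carry along which listing had that price (the exact query is an assumption here). This is the kind of logic Node used to do; a sketch of the equivalent in TypeScript:

```typescript
interface Listing { itemId: number; listingId: number; price: number; }

// SQL analogue (sketch, illustrative names):
//   SELECT item_id,
//          min(price)                AS lowest_price,
//          argMin(listing_id, price) AS lowest_listing
//   FROM item_snapshot
//   GROUP BY item_id
function lowestPerItem(listings: Listing[]): Map<number, Listing> {
  const best = new Map<number, Listing>();
  for (const l of listings) {
    const cur = best.get(l.itemId);
    if (!cur || l.price < cur.price) best.set(l.itemId, l);
  }
  return best;
}
```

Pushing this into SQL means only one row per item crosses the wire instead of every listing, which is where most of the win over the Node-side aggregation comes from.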
- Added a script for integrity checks between the old GPU queries and the new ClickHouse queries. Checks are implemented for the lowest price for a single product, all listings for a single product, and the lowest prices for all items in a category (the main page table).
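The core of an integrity check like this is diffing the two result sets keyed by item: run the old query and the new one, then report anything missing on either side or disagreeing on value. A hedged sketch of that comparison step (the real script's shape and field names may differ):

```typescript
type Row = { key: string; price: number };

// Compare old-path and new-path results; returns a list of
// human-readable discrepancies, empty when the paths agree.
function diffResults(oldRows: Row[], newRows: Row[]): string[] {
  const problems: string[] = [];
  const oldMap = new Map(oldRows.map(r => [r.key, r.price]));
  const newMap = new Map(newRows.map(r => [r.key, r.price]));
  for (const [key, price] of oldMap) {
    if (!newMap.has(key)) problems.push(`missing in new: ${key}`);
    else if (newMap.get(key) !== price)
      problems.push(`mismatch ${key}: old=${price} new=${newMap.get(key)}`);
  }
  for (const key of newMap.keys())
    if (!oldMap.has(key)) problems.push(`missing in old: ${key}`);
  return problems;
}
```

An empty result is the green light for cutting a query over to ClickHouse; the earlier failed-date mismatch is exactly the kind of thing this catches.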
- Improved the performance of the GPU-table-to-generic-table migration functions so they can be run more frequently when the old GPU tables get updated.
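One common way to make a migration cheap enough to re-run frequently is to make it incremental: only pick up rows changed since the last run, tracked with an updated-at watermark. Whether the real migration works this way is an assumption; a sketch of the idea:

```typescript
interface SourceRow { id: number; updatedAt: number; }

// Select only rows the previous run hasn't seen, and return the
// new watermark to persist for the next run.
function incrementalBatch(rows: SourceRow[], watermark: number) {
  const batch = rows.filter(r => r.updatedAt > watermark);
  const next = batch.reduce((m, r) => Math.max(m, r.updatedAt), watermark);
  return { batch, next };
}
```

Re-runs then cost proportional to what changed in the old GPU tables, not to their full size.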
- Deployed ClickHouse on the new server.