Updates Sept 16
PGrid
- Missed another chance to post to OzBargain. This time it wasn't really the fault of any scraper or classifier: the last Mwave scrape was 3 hours earlier and I only scrape every 6 hours. One minor fault was that the card had previously been removed from Mwave's website. When that happens, the URL in the db is marked as failed. I realised it takes 2 scrape job runs to actually get the price of a failed card: 1 run to remove the failed attribute and another to get the price. Fixed that so both happen on the same run. Also upped the Mwave scraping to every 2 hours. It's not that heavy, since it scrapes prices from the GPU listings page, so it's only a few requests.
- I have a price alert Discord job, but because it runs on random workers there wasn't an easy way to record that a message had already been sent, so I was getting duplicate messages each time the job ran. Created a new db table to fix it.
- Fixed handling of end-of-life items from Scorptec.
- Wrote initial generic item data processing pipeline. Started with SSDs from MSY.
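
For the duplicate Discord alerts mentioned above, here is a minimal sketch of how a db table can make "has this been sent?" safe across workers. It is not the actual PGrid schema: the `sent_alerts` table, its columns and the `claim_alert` helper are made-up names, and SQLite is used only so the example is self-contained. The idea is that a unique key lets exactly one worker insert the row, and only that worker sends the message.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    # One row per (listing, price) alert that has already been claimed.
    # Table and column names are hypothetical.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS sent_alerts (
            listing_id  TEXT NOT NULL,
            price_cents INTEGER NOT NULL,
            sent_at     TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            PRIMARY KEY (listing_id, price_cents)
        )
        """
    )


def claim_alert(conn: sqlite3.Connection, listing_id: str, price_cents: int) -> bool:
    """Return True only for the first caller to claim this alert."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO sent_alerts (listing_id, price_cents) VALUES (?, ?)",
            (listing_id, price_cents),
        )
    # rowcount is 1 if the row was inserted, 0 if the key already existed.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    for listing, price in [("mwave-123", 49900), ("mwave-123", 49900), ("mwave-123", 47900)]:
        if claim_alert(conn, listing, price):
            print(f"would send Discord alert for {listing} at {price} cents")
        else:
            print(f"already alerted for {listing} at {price} cents, skipping")
```

Keying on listing plus price means a further price drop on the same listing still triggers a new alert. On Postgres the equivalent statement would be `INSERT ... ON CONFLICT DO NOTHING`.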

- Added support for getting price snapshots from SSD pages on Computer Alliance, Mwave, PLE, Scorptec and Centre Com. All those items still need to be classified though.
- Fucking scraping issues. Centre Com appears to have added some scraping protection using AWS WAF. It fingerprints browsers, and if you make one obvious bot move it fully blocks your IP address. Found this out while debugging over a VPN, after clicking a next page link without scrolling it into view. Most of my servers' IP addresses and my home IP address have been banned.

- Updated the Centre Com scraper to use the GPU listing page to get price data instead of visiting each individual GPU page. This loses the ability to find coupons for Centre Com, but it's a tradeoff I'll accept until my scraping gets harder to detect.
- Signed up to a few residential proxy services and had a few issues with signup, payment or trials not working. Ended up paying $11 for 1GB of residential proxy usage on oxylabs.io.
- eBay listings sometimes incorrectly showed as out of stock because eBay changes the layout of where the buy button shows up. Fixed the selector.
- Someone emailed me using the email address on the about page. I use Cloudflare's service, which lets you receive email on any address on a domain, but it doesn't have any way to send email. Hooked up Gmail with SES SMTP to send emails.

- The email from the user led me to a bug: some unverified cards weren't getting their GPU memory classified, and when moving GPUs from unverified to verified I was silently ignoring cards with missing memory details, even when the card itself was classified correctly. 21 GPUs were being ignored because of this, and I missed a good deal as a result. Manually added memory info to each of them.
- Posted an AMD GPU deal on OzBargain but it didn't get much traction. It wasn't a great deal, but then another deal was posted where the GPU was $30 less than previously and that got much more attention. Maybe Nvidia vs AMD popularity, or maybe just random.
- Expanded scraping to the following categories once SSDs were being scraped OK:
  - internal SSD
  - external storage
  - CPU
  - memory
  - headset
  - monitor
  - keyboard
  - mouse
- Updated the Scorptec browser to bypass Cloudflare just by not loading any scripts. This is different from disabling JavaScript, which they don't allow. Before, I was manually running Scorptec using Puppeteer on my laptop, or sometimes it worked with a product called rebrowser, but now I can just use Playwright like I do with all other sites.
- Added scraping of multiple categories from multiple sources in a distributed way.
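
The "not loading any scripts" trick from the Scorptec item above could look roughly like this with Playwright's Python sync API. This is a sketch under assumptions, not the real scraper: `block_scripts` and `fetch_html` are made-up names, and whether it gets past Cloudflare on any given site is not something this code guarantees. JavaScript stays enabled on the page; only requests whose resource type is `script` are aborted.

```python
from typing import Any

# Resource types to drop. Only external script requests are blocked here.
BLOCKED_RESOURCE_TYPES = {"script"}


def block_scripts(route: Any) -> None:
    """Route handler: abort script requests, let everything else through."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()


def fetch_html(url: str) -> str:
    # Imported here so block_scripts can be exercised without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()  # JavaScript itself is still enabled
        page.route("**/*", block_scripts)  # intercept every request on this page
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()
        return html
```

Because this works at the network level, inline scripts embedded in the HTML are not requests and will still run.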

- Bought 100 Australian datacentre proxies from webshare.io, but it turns out all the IP addresses are basically from one company and the IPs get blocked by AWS WAF. Lucky it was only like $5.
- Ran into free tier limits on my analytics app, Umami. But it's open source so I self-hosted it. Took about an hour to set it up and swap over to the self-hosted instance.
- Initial experimentation with extracting structured data from SSD listing titles. About 50% of listing titles appear to have enough information to extract storage amount, form factor and interface. But it might be better in the long run to figure out the brand and model of each listing and store attributes against that. Found a useful looking Google Sheet that lists a bunch of SSDs with lots of data for each. This fits better with the long term goal of providing a much better interface for finding exactly what you are looking for, with each item having attributes that you can search and filter through. But it's slow to implement.
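
As a rough idea of what the title extraction above could look like, here is a small regex-based sketch. It is not the actual pipeline: the patterns, the `extract_ssd_attributes` name and the sample titles are assumptions for illustration, it only knows a handful of form factors and interfaces, and it takes the first capacity it finds (which could be wrong for titles that also mention cache size).

```python
import re
from typing import Optional, TypedDict


class SsdAttributes(TypedDict):
    capacity_gb: Optional[int]
    form_factor: Optional[str]
    interface: Optional[str]


# e.g. "2TB", "500 GB", "1.92TB"
CAPACITY_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s?(TB|GB)\b", re.IGNORECASE)

# (label, pattern) pairs, checked in order; first hit wins.
FORM_FACTORS = [
    ("M.2", re.compile(r"\bM\.2\b", re.IGNORECASE)),
    ('2.5"', re.compile(r"\b2\.5\s?(?:\"|inch|in\b)", re.IGNORECASE)),
    ("mSATA", re.compile(r"\bmSATA\b", re.IGNORECASE)),
]
INTERFACES = [
    ("NVMe", re.compile(r"\bNVMe\b", re.IGNORECASE)),
    ("PCIe", re.compile(r"\bPCIe\b", re.IGNORECASE)),
    ("SATA", re.compile(r"\bm?SATA\b", re.IGNORECASE)),
]


def _first_match(patterns, title: str) -> Optional[str]:
    for label, pattern in patterns:
        if pattern.search(title):
            return label
    return None


def extract_ssd_attributes(title: str) -> SsdAttributes:
    capacity_gb = None
    m = CAPACITY_RE.search(title)
    if m:
        value, unit = float(m.group(1)), m.group(2).upper()
        # Drives are marketed in decimal units, so 1TB is treated as 1000GB.
        capacity_gb = round(value * 1000) if unit == "TB" else round(value)
    return {
        "capacity_gb": capacity_gb,
        "form_factor": _first_match(FORM_FACTORS, title),
        "interface": _first_match(INTERFACES, title),
    }


if __name__ == "__main__":
    for title in [
        "Samsung 990 Pro 2TB M.2 NVMe PCIe 4.0 SSD",
        'Crucial MX500 500GB 2.5" SATA SSD',
        "Generic SSD",
    ]:
        print(title, "->", extract_ssd_attributes(title))
```

Titles where any field comes back as `None` are the ones that would need the brand and model lookup instead.
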
Rambling Bird
- New project to experiment with voice-to-text AI.
- I wanted to chat with AI by voice while using Cursor. Found a few paid apps but nothing popular built on open source libs.
- Had a chat with a doctor mate who was using a trial AI that listened to consultations and generated a report. He was considering paying $2k a year for that.
- Goal would be to create a basic open source app that can just record voice when a keyboard shortcut is pressed, send the recording to an API like Whisper, and store the output in the clipboard. Can potentially expand to using AI to summarise the output into different formats.
- Had a first attempt at creating the app using Tauri, but I don't know Rust and had issues installing the required audio libs, so gave up quickly.
- Second attempt using Electron. Found it easier to get basic stuff working, like menu bar icons and handling global keyboard shortcuts. Haven't implemented voice recording or sending to OpenAI yet.