Instead of setting up some kind of partnership with our vendors, where they just send us information or provide an API, we scrape their websites.
The old version ran in an hour, on a single thread on one machine. It downloaded PDFs and extracted the values.
The new version is Selenium-based, uses 20 cores and 300GB of memory, and takes all night to run. It does the same thing, but from inside a browser.
As a bonus, the 'web scrapers' were blamed for every performance issue we had for a long time.
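For what it's worth, the old version really was about this simple. A minimal sketch of that single-threaded flow, with hypothetical names and a toy regex extractor standing in for the real PDF parsing:

```python
import re

# Hypothetical stand-in for the real extraction step: pull anything
# that looks like a dollar amount (e.g. "$12.50") out of document text.
def extract_values(text: str) -> list[float]:
    return [float(m) for m in re.findall(r"\$(\d+(?:\.\d+)?)", text)]

# Single thread, one machine: just loop over the downloaded documents
# in order. No browser, no Selenium, no 300GB of memory.
def scrape_all(documents: dict[str, str]) -> dict[str, list[float]]:
    return {name: extract_values(text) for name, text in documents.items()}

docs = {"vendor_a.pdf": "Price: $19.99, Fee: $2.50"}
print(scrape_all(docs))  # {'vendor_a.pdf': [19.99, 2.5]}
```

The entire "legacy" system was more or less a download loop plus a text extractor, which is why it finished in an hour.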