Agreed. I work on a solution for extracting data from random documents (invoices, payslips, you name it). For native PDFs in the wild (not scans), we gave up and now just rasterize them and send the images to OCR software. It's also probabilistic, but way more reliable.