
You can use BerryDB to do this at scale. BerryDB is a JSON-native database that can ingest PDFs, images, etc., and it has a built-in semantic layer (for labeling), so you can build your knowledge database with entities and relationships. This grounds your knowledge in those entities, and accuracy scales very well with a large number of documents.

It provides APIs to extract paragraphs or tables from your PDFs in bulk. You can also do bulk labeling separately (e.g. classification, NER, and other labeling types). Once you have a knowledge database, it creates four indexes on top of your JSON data layer: a DB index for metadata search, a full-text search index, an annotation index, and a vector index, so you can perform any kind of search, including hybrid search.
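To make the hybrid-search idea concrete, here is a minimal Python sketch. It is not BerryDB's API: the documents, field names and functions below are hypothetical. It models only three of the pieces mentioned (a metadata filter, a keyword ranker standing in for full-text search, and a cosine-similarity ranker standing in for a vector index) and merges the two rankings with reciprocal rank fusion (RRF), which is one common fusion method and an assumption here, not necessarily what BerryDB uses.

```python
import math
from typing import Optional

# Hypothetical toy documents (NOT BerryDB's data model): "text" feeds the keyword
# ranker, "vec" feeds the vector ranker, "meta" feeds the metadata filter.
DOCS = {
    "d1": {"text": "invoice total due in march", "vec": [0.9, 0.1, 0.0], "meta": {"year": 2024}},
    "d2": {"text": "contract renewal terms", "vec": [0.8, 0.2, 0.0], "meta": {"year": 2024}},
    "d3": {"text": "meeting notes about march planning", "vec": [0.1, 0.9, 0.3], "meta": {"year": 2023}},
}


def matches(meta: dict, meta_filter: Optional[dict]) -> bool:
    """Metadata filter: every key/value pair in the filter must match exactly."""
    return meta_filter is None or all(meta.get(k) == v for k, v in meta_filter.items())


def keyword_rank(query: str, doc_ids: list) -> list:
    """Rank docs by number of query-term occurrences (a stand-in for full-text search)."""
    terms = query.lower().split()
    scores = {d: sum(DOCS[d]["text"].split().count(t) for t in terms) for d in doc_ids}
    hits = [d for d in doc_ids if scores[d] > 0]  # docs with no term match are dropped
    return sorted(hits, key=lambda d: -scores[d])


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def vector_rank(query_vec: list, doc_ids: list) -> list:
    """Rank docs by cosine similarity to the query vector (a stand-in for a vector index)."""
    return sorted(doc_ids, key=lambda d: -cosine(DOCS[d]["vec"], query_vec))


def rrf(rankings: list, k: int = 60) -> list:
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to a doc's score."""
    scores = {}
    for ranking in rankings:
        for rank, d in enumerate(ranking, start=1):
            scores[d] = scores.get(d, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: -scores[d])


def hybrid_search(query: str, query_vec: list, meta_filter: Optional[dict] = None) -> list:
    # Apply the metadata filter first, then fuse the keyword and vector rankings.
    candidates = [d for d in DOCS if matches(DOCS[d]["meta"], meta_filter)]
    return rrf([keyword_rank(query, candidates), vector_rank(query_vec, candidates)])


print(hybrid_search("march total", [1.0, 0.0, 0.0]))                  # ['d1', 'd3', 'd2']
print(hybrid_search("march total", [1.0, 0.0, 0.0], {"year": 2024}))  # ['d1', 'd2']
```

In the first query, d3 outranks d2 even though d2 is closer in vector space, because d3 also matches a keyword; that is the point of fusing rankings rather than relying on one index. An annotation index could be added as a third ranking passed to `rrf` in the same way.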

Because your data layer is JSON, you have a lot of flexibility to add new snippets of knowledge or new labels and improve accuracy over time.
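As a rough illustration of why a JSON data layer makes labels easy to extend, here is a small Python sketch of one labeled record. The field names (`annotations`, `type`, `span`, `from`, `to`) are hypothetical and are not BerryDB's actual format; the point is only that entities, a new classification label, and a relationship between entities can all be appended to the same record without a schema migration.

```python
import json

# Hypothetical JSON record for one extracted paragraph; field names are illustrative only.
record = {
    "id": "doc-42#p3",
    "text": "Acme Corp signed a lease with Globex in Berlin.",
    "annotations": [
        # NER entities; "span" is [start, end) character offsets into "text".
        {"type": "ner", "label": "ORG", "span": [0, 9]},
        {"type": "ner", "label": "ORG", "span": [30, 36]},
    ],
}

# Later, a new label type is appended; no schema change is needed.
record["annotations"].append({"type": "classification", "label": "real_estate"})

# A relationship between two entities, referenced by their positions in "annotations".
record["annotations"].append({"type": "relation", "label": "LEASE_WITH", "from": 0, "to": 1})

print(json.dumps(record, indent=2))
```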

https://berrydb.io



What’s the pricing? It doesn’t show up for me on mobile.


It doesn't show on desktop either. It's a hash link to no anchor.



