
How do you compare to Onyx? We've used it for some limited use cases, but one of the real challenges - and one I hope to see a lot of innovation on in the space - was permissioning.

I see in another comment that you encourage each user to build their own dataset with their own permissions, but often this breaks for founders. If I have a Super Secret Personnel Planning Google Doc at a founder level, how can I be the one to set up the system for our company, but ensure that only files that I've explicitly shared with the company are ingested? What if a file needs to be made anyone-with-link-can-access for sharing with a strategic partner, but that shouldn't be indexed for the entire company?
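A minimal sketch of the ingestion rule being asked for here, not something either product is confirmed to implement. It assumes each file arrives with a list of permission entries shaped like Google Drive's permission objects (a `type` of `user`, `group`, `domain`, or `anyone`, plus `domain` or `emailAddress`). The function name, the `company_groups` parameter, and the `example.com` values are hypothetical.

```python
from typing import Iterable


def explicitly_shared_with_company(
    permissions: Iterable[dict],
    company_domain: str,
    company_groups: frozenset = frozenset(),
) -> bool:
    """Return True only if a file was deliberately shared company-wide.

    Assumption: `company_groups` holds lowercased email addresses of
    groups that stand for "the whole company" (hypothetical allowlist).
    """
    for perm in permissions:
        kind = perm.get("type")
        # A domain-wide grant to the company's own domain counts.
        if kind == "domain" and perm.get("domain", "").lower() == company_domain.lower():
            return True
        # A grant to an allowlisted company-wide group counts.
        if kind == "group" and perm.get("emailAddress", "").lower() in company_groups:
            return True
        # "anyone" (link sharing) and individual "user" grants never
        # qualify on their own, so a partner link is not indexed.
    return False


founder_only = [{"type": "user", "emailAddress": "founder@example.com", "role": "owner"}]
partner_link = founder_only + [{"type": "anyone", "role": "reader"}]
handbook = founder_only + [{"type": "domain", "domain": "example.com", "role": "reader"}]

for name, perms in [("secret", founder_only), ("partner", partner_link), ("handbook", handbook)]:
    print(name, explicitly_shared_with_company(perms, "example.com"))
```

The design choice is default-deny: a file is skipped unless some grant positively names the company, so the founder's private doc and the anyone-with-link partner doc are both left out of the index.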

Far too much of the world relies on the security-by-obscurity of public-but-unindexed links, and communications that might look public from a metadata perspective but were carefully designed for a very specific group of people who have verbal/mental context about confidentiality expectations. Being able to categorize by likely confidentiality, and allowing an administrator to partition access on a project and sub-project basis based on that, might be crucial for growth.

My recollection is that Onyx had limited support for some security use cases, but very rudimentary. Hoping you can solve this in a thoughtful way!

Onyx links for comparison:

https://www.onyx.app/

https://docs.onyx.app/developers/guides/chat_guide

https://docs.onyx.app/admin/connectors/official/



It’s a good point. It IS hard to map the various “off-market RBACs” onto a unified model, and this is part of the reason we're deferring that and instead handling it with per-user syncs that include the q=“sharedWithMe” parameter.
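For readers unfamiliar with the parameter mentioned above, here is a rough sketch of what the request parameters for such a per-user listing might look like against the Google Drive v3 `files.list` endpoint. This is an assumption about the shape of the call, not Airweave's actual sync code; the helper name and the chosen `fields` are hypothetical, and the call would be made with the syncing user's own OAuth token.

```python
from typing import Optional


def sync_list_params(page_token: Optional[str] = None) -> dict:
    """Build query parameters for listing files shared with the syncing user.

    Hypothetical helper: only constructs the parameters, it does not
    call the Drive API.
    """
    params = {
        # Drive search query: items in the user's "Shared with me" view,
        # excluding anything in the trash.
        "q": "sharedWithMe = true and trashed = false",
        # Request only the fields a sync would plausibly need.
        "fields": "nextPageToken, files(id, name, mimeType, modifiedTime)",
        "pageSize": 100,
    }
    if page_token:
        # Continue a paginated listing from a previous response.
        params["pageToken"] = page_token
    return params


print(sync_list_params())
```

Because the request runs under the individual user's credentials, the source system enforces visibility and the index only ever sees what that user could already open.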

As for intelligently - but probabilistically - determining confidentiality (if I read that correctly), that does sound pretty interesting in scenarios where metadata is simply insufficient. Also tricky. Sounds like you've thought about these problems pretty deeply.


@btown: Biggest difference: Airweave is infra for devs, i.e., connectors, sync, indexing (semantic + keyword), and a retrieval API/MCP designed with LLMs in mind as the consumers. You bring the agent/UI. Onyx is an end-to-end search app that owns the agentic reasoning layers that orchestrate its search. You can think of Airweave as a dev tool that you would use if you were building an agentic application, where Onyx is a good example of one.

On permissioning: we default to per-user syncs that adopt the permissions of the syncing user and mirror source ACLs (e.g., Drive items a user owns or that are sharedWithMe). In practice, founders avoid leaking private docs by either (a) having each user sync their own corpus, or (b) using a centrally-scoped token limited to Shared Drives/team folders and excluding personal “My Drive.” You can also keep separate collections and only expose cross-user search behind your own checks. We’re exploring richer org-level RBAC mapping on a per-customer basis (e.g., mapping Drive/SharePoint groups to index ACLs), but the above works today.
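The "separate collections behind your own checks" option could look roughly like the sketch below. It is an illustration under assumptions, not Airweave's API: the `Hit` type, the `grants` mapping, and the collection names are all hypothetical, and the search itself is assumed to have already returned hits tagged with the collection they came from.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Hit:
    doc_id: str
    collection: str  # hypothetical: one collection per syncing user or shared drive
    score: float


def filter_hits(hits: List[Hit], user: str, grants: Dict[str, Set[str]]) -> List[Hit]:
    """Keep only hits from collections the requesting user may search.

    `grants` maps a user to the collections your application has decided
    they can see. Unknown users get an empty set, so the default is deny.
    """
    allowed = grants.get(user, set())
    return [h for h in hits if h.collection in allowed]


grants = {
    "founder@example.com": {"founder-private", "shared-drive"},
    "engineer@example.com": {"shared-drive"},
}
hits = [
    Hit("personnel-plan", "founder-private", 0.93),
    Hit("eng-handbook", "shared-drive", 0.88),
]

print([h.doc_id for h in filter_hits(hits, "engineer@example.com", grants)])
```

Filtering after retrieval is the simplest place to put the check. A stricter variant would restrict the query to the allowed collections up front, so private documents never influence ranking or result counts.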

@Weves: Thanks, appreciate it!


Don't mean to hijack (one of the Onyx founders here), but the example you described should be doable with Drive service accounts. Admittedly, our permissioning system is only implemented for a handful of connectors like Drive.

Congratulations on the launch Rauf & Lennert! Always great to have more innovation in the open source AI space :D. It looks like Airweave works well with Cursor, something we don't have nailed down yet!



