
mfkhalil · 2025-09-26T20:06:08 1758917168

Hey, appreciate the feedback. Will address all your points.

Regarding Reddit, we have a custom handler for Reddit URLs that uses the Reddit API, and we are billed for usage once we exceed the free limits.

For Terms of Service, you're right, that is definitely an oversight on our part. We just published both our Terms of Service and Privacy Policy on the website.

When it comes to comparing with GPT-5 and Claude, we do believe that our prompting, agent orchestration, and other core parts of the product, such as parallel search results analysis and parallel agents, are improvements on plain GPT-5 and Claude, while also letting Webhound run at much lower cost on significantly smaller models. Our v1, which we built months ago, was essentially the same as what GPT-5 thinking with web search currently does, and we've since made the explicit choice to prioritize data quality, user controllability, and cost efficiency over latency. So while yes, GPT-5 with web search might give faster results and work better for smaller datasets, both we and our users have found Webhound to work better for siloed sources and larger datasets.

Regarding account deletion, that is also a fair point. So far we've had people email us when they want their account deleted, but we will add account deletion ASAP.

Criticism like this helps us continue to hold ourselves to a high standard, so thanks for taking the time to write it up.

mfkhalil · 2025-09-26T16:22:11 1758903731

Could you share the session URL via the feedback form if you still have access to it?

That's really strange. It sounds like Webhound for some reason deleted the schema after extraction ended, so although your data should still be tied to the session, it just isn't being displayed. Definitely not the expected behavior.

mfkhalil · 2025-09-26T09:19:20 1758878360

Accuracy-wise we think it's almost there but probably still a few iterations away from being perfect. It's great at eliminating a lot of the collection time though.

Interestingly, we're working with B2B clients right now where we use Webhound to curate and then act as the "validation" layer ourselves. The agent lets us offer these datasets way cheaper with live updates, but still with human oversight.

bey0nder · 2025-09-26T12:33:32 1758890012

The Indian company I mentioned earlier mainly had exhibitions and events as clients. These clients usually need huge datasets rather than just a few leads, which makes them a good target market for a tool like yours.

mfkhalil · 2025-09-26T09:16:02 1758878162

Thanks for testing it! That's definitely a miss. It sounds like it got confused about what you were looking for and went after board member pages instead of the actual meeting/document sites.

We're working on better query interpretation, but in the meantime you could try being more specific, for example "find BoardDocs or meeting document websites for each district", to guide it better. Also, you can usually figure out how it interpreted your request by looking at the entity criteria: those are all the criteria a piece of data needs to meet to make it into the set.
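To make the entity criteria idea concrete, here is a minimal sketch of how "every criterion must pass" filtering could look. This is not Webhound's actual code: the `Entity` and `Criterion` types, the function names, and the two example criteria are all hypothetical and only illustrate the concept.

```typescript
// Hypothetical types for illustration only; none of these names come from Webhound.
type Entity = Record<string, string>;
type Criterion = { label: string; test: (e: Entity) => boolean };

// An entity makes it into the set only if it meets every criterion.
function meetsAll(entity: Entity, criteria: Criterion[]): boolean {
  return criteria.every((c) => c.test(entity));
}

// Returns the labels of the criteria an entity fails, which helps explain
// why a given row was left out of the set.
function failedCriteria(entity: Entity, criteria: Criterion[]): string[] {
  return criteria.filter((c) => !c.test(entity)).map((c) => c.label);
}

// Example criteria (assumed) for the "meeting document sites" request.
const criteria: Criterion[] = [
  { label: "is a meeting document site", test: (e) => e["kind"] === "meeting-docs" },
  { label: "has a URL", test: (e) => (e["url"] ?? "").length > 0 },
];
```

Reading the criteria this way also shows why a board member page would be excluded once the request is phrased around meeting documents: it fails the first check no matter how complete the rest of the row is.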

mfkhalil · 2025-09-26T09:13:04 1758877984

Did this resolve itself? If not, shoot us an email at team@webhound.ai and we can get it figured out.

mfkhalil · 2025-09-26T09:12:14 1758877934

Thanks, fixed!

mfkhalil · 2025-09-26T09:08:17 1758877697

Fair point. Most of our users have come from referrals/word of mouth, so it hasn't really been an issue for us, but you're probably right that we should have more information on the landing page.

cco · 2025-09-27T08:43:56 1758962636

Oh definitely not trying to make a point, I'm really just curious about how it is working out in case it is something I should update in my recommendations to customers.

Seems like y'all are doing well with it!

mfkhalil · 2025-09-26T09:07:26 1758877646

We have. 2.5 Flash is about as small as we've been able to go while still delivering consistent results.

mfkhalil · 2025-09-25T22:17:12 1758838632

Yep: Next.js frontend, Node.js backend, Gemini 2.5 Flash LLM, Firecrawl for crawling, self-hosted SearXNG for web search, and fly.io for hosting. Beyond that, everything else is built internally; we don't use many frameworks.
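For anyone curious what talking to a self-hosted SearXNG instance looks like from a Node backend, here is a rough sketch of building a query URL for its JSON search endpoint. The `buildSearxngUrl` helper and the base URL are assumptions for illustration, not Webhound's code, and the instance needs the `json` format enabled in its settings for this to return results.

```typescript
// Hypothetical helper: builds a SearXNG search URL that requests JSON output.
// `baseUrl` is wherever your own instance is hosted (assumed here).
function buildSearxngUrl(baseUrl: string, query: string, page: number = 1): string {
  const url = new URL("/search", baseUrl);
  url.searchParams.set("q", query);              // the search terms
  url.searchParams.set("format", "json");        // ask for JSON instead of HTML
  url.searchParams.set("pageno", String(page));  // 1-based result page
  return url.toString();
}

// Example: a URL for the first page of results on a local instance.
const exampleUrl = buildSearxngUrl("http://localhost:8080", "school board meetings");
```

The resulting URL can be passed to `fetch`, and the response body parsed as JSON, to get the result list back.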

rick1290 · 2025-09-25T22:57:27 1758841047

What about for the db? Prisma? Postgres?

mfkhalil · 2025-09-26T09:12:23 1758877943

Supabase

mfkhalil · 2025-09-25T20:10:40 1758831040

Sorry about that. If you tell it to restructure the schema and search plan around MCP as Model Context Protocol, it should work. The agent can get stuck on its initial interpretation sometimes.