
Blind person here.

IMO, this announcement is far less significant than people make it out to be. The feature has been available as a private beta for a good few months, and as a public beta (with a waitlist) for the last few weeks. Most of the blind people I know (including myself) already have access and are pretty familiar with it by now.

I don't think this will replace human volunteers for now, but it's definitely a tool that can augment them. I've used the volunteer side of Be My Eyes quite a few times, but I only resort to that solution when I have no other option. Bothering a random human multiple times a day with my problems really doesn't feel like something I want to do. There are situations when you either don't need 100% certainty or know roughly what to expect and can detect hallucinations yourself. For example, when you have a few boxes that look exactly the same and you know exactly what they contain but not which box is which, Be My AI is a good solution. If it answers your question, that's great; if it hallucinates, you know that your box can only be one of a few things, so you'll probably catch that. Another interesting use case is random pictures shared to a group or Slack channel; it's good enough to let you distinguish between funny memes and screenshots of important announcements that merit further human attention, and perhaps a request for alt text.

This isn't a perfect tool for sure, but it's definitely pretty helpful if you know how to use it right. All these anti-AI sentiments are really unwarranted in this case IMO.

I've written more here https://dragonscave.space/@miki/111018682169530098



>Bothering a random human multiple times a day with my problems really doesn't feel like something I want to do.

Please don't feel like you're bothering us. I've had this app for years and absolutely cherished the few calls I've gotten. I get really bummed if I miss a call.


This! 100%

The people who sign up to help (such as myself) want to help. Honestly, my frustration is that I don’t get asked enough.


Have you ever needed to do something like make an appointment but kept putting it off because you just really didn't want to talk on the phone?

That can happen to anyone. Some blind people are introverts and don't want to talk to random strangers all the time.

Also, while the vast majority of volunteers have the best intentions and try hard to be helpful, you never know what you're going to get. Some are way too chatty, some offer unsolicited advice.


Exactly this.

Calling a human being is much higher-friction than just opening an app; if that weren't the case, we'd all still be calling restaurants instead of ordering on Uber Eats.

I also would prefer not to call volunteers at night. This isn't much of a problem if you live in the US, as the app is segregated by language, not country, so you'll probably find somebody down under. I have an additional complication of having to deal with foreign language content because of where I live, so English often isn't good enough for me.


OpenAI announced from day 1 that GPT-4 is multimodal, so it was mostly waiting for safety censorship and enough GPUs to be available for mass rollout.

This won't entirely replace human volunteers, but these models get rapidly better over time. What you are seeing today is a mere toy compared to the multimodals you'll get in the future.

Currently there's no model trained on videos, due to the large size of video data, but in the future there will be video-capable models, which means they can understand and interpret motion and physics. Put that in a pair of smart glasses, and it can act as live eyes to navigate a busy street. Granted, it will take years to bring the costs down enough to make that viable.


The censorship part is really worrying me here.

We had enough drama with Be My AI refusing to recognize faces (including faces of famous people) as it is. If sighted people have the right to access porn and sexting, why shouldn't we? Who should dictate what content is "appropriate", and what about cultures with different opinions on the subject?


> Who should dictate what content is "appropriate", and what about cultures with different opinions on the subject?

OpenAI should dictate that, because GPT-4 belongs to them. So they decide what kind of service they're interested in offering.

There will be plenty of other powerful LLMs that can be used in the near future. Some will be more restrictive, some will be less. If you want fewer restrictions, you will be able to pick one that offers that for you.


> There will be plenty of other powerful LLMs that can be used in the near future.

Extremely optimistic take. What tends to happen is that you get centralisation, and regulatory capture ensures the largest players dictate what is acceptable, to the detriment of anyone who wishes to do things differently.

I mean, in theory you can go set up your own social network or video sharing site with whatever rules you like, but you should assume government regulators and big tech will attack you if you do so while believing in the principles of free speech, or simply wishing to create a safe space for conspiracy theorists.


>> in the future there will be video-capable models, which means they can understand and interpret motion and physics.

Videos may not suffice. Videos are 2D, with 3D aspects being inferred from that 2D data, which is an issue for autonomous driving based on cameras. A better source of training data would be 3D scans rather than videos, and the best data set would be a combination of video and 3D scanning. Self-driving cars that combine video with radar/laser scanning may one day provide such a data set.

There is talk of a 3D version of Google Street View, one using a pair of cameras to allow true VR viewing. That might also be good training data, as it would capture, in 3D, many street scenes as they unfold.


I’ve actually been fairly impressed with how far monocular depth estimation has come in a few years. I think it should not be relied upon in self-driving cars because they need closer to 100% accuracy and almost no latency, and the SoTA models are currently too slow to run on every frame. Not only that, but it's a life-or-death situation where cutting accuracy to save on BoM costs seems ridiculous to me.

But in a higher-acceptable-latency, lower-risk environment like this, I am actually quite bullish on camera-only methods. Video understanding has come a long way.
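
For anyone curious what "too slow to run every frame" looks like in practice, here's a rough sketch of my own (purely an illustration, nothing to do with Be My Eyes) that times one inference of a small off-the-shelf monocular depth model, MiDaS small via torch.hub. The model and transform names are the published intel-isl entry points, and you'd need torch, torchvision, timm, and opencv-python installed; treat the exact numbers you get as hardware-dependent.

    # Hypothetical illustration: time one frame of a small monocular depth model
    # to get a feel for per-frame cost versus camera frame rates.
    import time
    import numpy as np
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Small MiDaS variant plus its matching input transform (intel-isl/MiDaS hub).
    model = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").to(device).eval()
    transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

    # Stand-in frame; in practice this would come from a camera stream (RGB, HxWx3, uint8).
    frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)

    with torch.no_grad():
        batch = transform(frame).to(device)
        model(batch)                      # warm-up; the first call pays setup cost
        start = time.perf_counter()
        depth = model(batch)              # relative inverse-depth map, shape (1, H, W)
        elapsed = time.perf_counter() - start

    print(f"one frame: {elapsed * 1000:.1f} ms (~{1 / elapsed:.0f} fps upper bound)")

Even the small variant gives a sense of why the larger DPT-style models are hard to run at full camera frame rate on modest hardware, while an assistive, non-real-time use case can tolerate that latency just fine.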


I am a volunteer at Be My Eyes.

Just wanted to say, without telling you what you should feel or do, that I and others I know don't feel at all like you are 'bothering a random human multiple times a day'. I personally feel extremely lucky when I get a 'call' and am able to help.

Not much else regarding your comment, just wanted to let you know that I have had very satisfying 'calls', and I am always looking for the next one and always happy when I get one.



