Offline Voice Assistant on a Microcontroller with 192KB RAM (picovoice.ai)
92 points by kenarsa on Dec 13, 2022 | 31 comments


This is just an advertisement from an account that posts only about this product, in the form of an article demonstrating the free tier of their offering, which only works on that specific board. So it is a demo you are allowed to reproduce yourself at home.

Good luck with your company, it sure is hard to make one grow.

Edit: removed the word misleading before advertisement. And explained what I meant after the word product.


Yeah, I post about the company that I founded, and I am super into it. Which part of this is MISLEADING? The fact that I care, or are you saying Picovoice's tech doesn't work? I made the latter easy to check because you can now go and try it without me in your way. Your comment is misleading.


Man these people are driving me nuts.

When you bootstrap a small business, most likely nobody knows about you. Everyone complains about how awful ads are, so you try to please them and get your product in front of users by doing something useful: write technical articles, provide free/open-source solutions to real problems, etc.

And then someone comes in the comments just to point out what a bad person you are for doing this.

What is one supposed to do? Publish a website and just hope that people will end up there somehow?

Good job in getting this working on the STM32 board! I have mostly given up on voice assistants because of the latency and rigid phrasing required.

Your solution might seem rigid to others too, but if we can use our own phrases and it responds instantly because the model is local, the interaction can finally become as easy as pressing a button without touching it.

You've taken on a tremendous task, I really hope you succeed!


I'd venture a guess that the issue they have is with the fact that the poster ONLY posts about their product and nothing else. I can see someone being annoyed that a community member is only using the platform to hawk their project and not providing any value to the platform otherwise. FWIW, I'm making no value judgement here, just observing the likely friction.

Edit: Just to be clear, the poster _literally_ only ever posts about their product and only comments on their own product posts.


Because you say it's offline, but require an access token and to quote from your FAQ:

> Picovoice engines call home servers to stay active and report the consumption for billing purposes only.

That's not offline.


Have you noted that the board has no connectivity chip [1]? If I had a way to connect to the internet without the required chip, I would have a better story to tell. Your snippet of the FAQ is correct for all other platforms we support aside from microcontrollers ...

[1] https://www.st.com/en/evaluation-tools/stm32f4discovery.html


I have to agree that that is offline :)

I use a picovoice wake-word on my DIY offline Assistant myself (RPI-based) and I was tempted to dig up a STM devkit from somewhere but then I remembered that I probably would want a microphone array for it if it works well and I don't see a good way to integrate that.


With how hard something like this is, we should be looking at the positive side.

Picovoice here solves both the issues of hotword/wake-word detection and intent extraction. This looks like something you could build on top of ARM's [keyword spotting program](https://github.com/ARM-software/ML-KWS-for-MCU) and the wake word services listed in [Rhasspy's docs](https://rhasspy.readthedocs.io/en/latest/wake-word/#raven)

But, to implement something like this from scratch would take a good while.

Also, here are some Automatic Speech Recognition toolkits (which won't run offline on a microcontroller) out there. These are useful to pipe the data into a program that deals with intents (something like [RASA](https://rasa.com)).

(Require Internet)
* [Deepgram](https://deepgram.com) - I believe they build upon OpenAI's Whisper model and have their own custom models too
* Google Cloud / Microsoft Azure / AWS / IBM Watson

(Can be run Offline)
* [OpenAI's Whisper](https://github.com/openai/whisper)
* [nVidia's NEMO](https://github.com/NVIDIA/NeMo)
* [PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)

When you see how complicated the space is and how many ways you can actually shoot yourself in the foot, this post starts to look a wee bit better.


Perhaps I'm missing something, but why does it need an access key if it's offline?


From their pricing FAQ[1]:

  19. Why do Picovoice engines require an internet connection?

  While data is processed offline, locally on-device, Picovoice engines call home servers to stay active and report the consumption for billing purposes only.
[1] https://picovoice.ai/pricing/


So... not offline.


That board doesn't have connectivity! If we had a way to connect to the internet without a connectivity chip I would have had a more exciting post!


If it is fully offline, then why does the microcontroller need a license key? What does it use the key for? How can any monitoring/analytics take place?

I read the article and was interested but after clicking around the site I was thoroughly confused about billing/pricing/metering.


Picovoice runs on almost anything: web browsers, mobile, desktop, single board computers, and microcontrollers. For the platforms that have connectivity (i.e. almost anything aside from microcontrollers), we do call home for license management. This helps us keep the `Free Tier` free for personal users, hackers, and skunkworks projects, while making sure we get paid by enterprise customers with deployments at scale [1]. On a microcontroller like the one in this tutorial, there is NO connectivity option. Hence, in this specific case it is 100% offline with no license management. In other cases voice recognition is still 100% offline, but the call home for license management needs connectivity.

[1] https://picovoice.ai/pricing/


Then please explain this:

> Picovoice engines call home servers to stay active and report the consumption for billing purposes only.


It sounds like he's saying that the key is hashed offline to check for validity, and doesn't actually verify it via server.
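To make that guess concrete, here is a minimal sketch of what a purely local key check could look like. This is not Picovoice's actual mechanism, which is not described anywhere in this thread; the secret, the key format, and the function names `issue_key` and `is_valid_offline` are all hypothetical and exist only for illustration.

```python
import hashlib
import hmac

# Hypothetical secret baked into the firmware (an assumption, not a real value).
VENDOR_SECRET = b"example-secret-not-real"


def issue_key(device_id: str) -> str:
    """Vendor side: build a key as '<device_id>.<hex tag>' (made-up format)."""
    tag = hmac.new(VENDOR_SECRET, device_id.encode(), hashlib.sha256).hexdigest()
    return f"{device_id}.{tag}"


def is_valid_offline(access_key: str) -> bool:
    """Device side: recompute the tag locally and compare, with no network call."""
    device_id, sep, tag = access_key.rpartition(".")
    if not sep:
        # No separator means the key is not in the expected format.
        return False
    expected = hmac.new(VENDOR_SECRET, device_id.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(tag, expected)
```

The obvious weakness of a scheme like this is that the secret has to live on the device, so anyone willing to dump the firmware could forge keys. It only shows that validating a key does not by itself require connectivity.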


Yes, the board does not have a connectivity chip of any sort [1]. Even if one really wants to there is no way to connect to anything from this board.

[1] https://www.st.com/en/evaluation-tools/stm32f4discovery.html


I think that's because the libraries it uses are non-free (I haven't checked though). If I'm correct, that's more like a license key in that regard.


Although interestingly enough, the README for the linked repo (https://github.com/Picovoice/picovoice) states that "The SDK is licensed under Apache 2.0 and available on GitHub to encourage independent benchmarking and integration testing." While source isn't provided and only compiled binaries are provided, that should give you permission to flip some bits to skip a license check. However, they may make you agree to a stricter license to use their online training tools.


I'm sure smart people can find a way to hack this! But check my other comment, the board does NOT have a connectivity module. Don't take it from me. Read ST's product spec [1].

[1] https://www.st.com/en/evaluation-tools/stm32f4discovery.html


Because all the training is made on a proprietary system (and the prices are high), the MCU is just running the model.


Have you noted that there is a `Free Tier` that costs you $0 [1]? You can train using that. For this tutorial, the cost of the board ($20) is all you need to pay. The same goes for personal projects and even small skunkworks projects within companies. Picovoice makes money from large-scale deployments done by device makers.

[1] https://picovoice.ai/pricing/


What's their trick for doing voice recognition on something so small, when something like OpenAI's Whisper requires torch, which is almost a gig in size?


Context, context, and context!! Then deep learning tricks and an efficient implementation [1].

[1] https://picovoice.ai/blog/end-to-end-intent-inference-from-s...


There are Whisper TFLite ports with a model size of about 40 MB, and TFLite itself is about 3 MB. So nowhere near a gig. https://github.com/usefulsensors/openai-whisper


Thank you!


Picovoice technology is really nice! I happily recommend it to clients looking for lightweight ASR.

You might not understand but there is a huge amount of work behind this simple demo.


This is nice! Another use case for my STM32 boards… Definitely interested in the implementation too!


Would love to check out what you build with it :) I've been constantly and pleasantly surprised by what people can build given this tech.


I recently found the MAX78000 MCU, which has some nice features that can be used for recognising trigger words (but also object recognition etc.). I have ordered the feather board to try it, but haven't had the opportunity to test it yet. From a cost point of view, the STMs are certainly cheaper, but the MAX should have a lot more performance for doing tasks besides the word recognition.


"1. Sign up for Picovoice Console."

0. Thanks, but no thanks.

No offense, but I expect an offline service to be really offline from start to finish.



