That’s interesting, particularly since, as far as I can tell, nothing in userland really bothers to make use of its GPU. I’d really like to understand why, since I have a whole bunch of Pis and it seems like their GPUs can’t be used for much of anything (not really much for transcoding nor for AI).
> their GPUs can’t be used for much of anything (not really much for transcoding nor for AI)
It's both funny and sad to me that we're at the point where someone would (perhaps even reasonably) describe using the GPU only for the "G" in its name as not "much of anything".
The Raspberry Pi GPU has one of the better open source GPU drivers as far as SBCs go. It's limited in performance, but it's definitely being used for rendering.
There is a Vulkan API, and they can run some compute; at least the 4 and 5 can: https://github.com/jdonald/vulkan-compute-rpi. No idea if it's worth the bus latency, though. I'd love to know the answer to that.
I'd also love to see the same done on the Zero 2, where the CPU is far less beefy and the trade-off might go a different way. It's an older generation of GPU, though, so the same code won't work.
One (obscure) example I know of is the RTLSDR-Airband[1] project, which uses the GPU to do FFT computation on older, less powerful Pis, through the GPU_FFT library[2].
Seems like it would be an easy target for the government (or really anyone) to DoS, right? Presumably there's no good way for the nation-wide intranet to exclude government actors? I'm just thinking out loud; I'm glad to hear something is being done and I wish the Iranian people the best.
LoRa is fine if you want to send a very short message. It's not useful for much else.
It's also not a prevalent technology compared to the general internet or mobile phones.
Organising resistance with it is the pipe dream of those who play with chips and antennas, but it's not something that's going to happen when crowds and mobs form up in a situation like this. Not least because the hardware is not accessible to your average citizen.
There are real-world examples of non-internet networks being created in authoritarian regimes. One example I've read about is in Cuba [1], but I presume there are others.
Yeah, that makes sense. I'm curious whether there are sneakernet schemes for passing messages between nearby mobile devices? Something that uses existing hardware and is actually used in practice.
I've got to think it's easy to find Starlink receivers--I know they use a directed beam but they must give off a bunch of lateral noise, right? Or does Starlink use the same frequency bands as other common equipment, such that it would be difficult to distinguish Starlink signals from others? If the government was motivated, they could surely start finding these receivers, right?
> What does it mean that only 3B parameters are active at a time?
In a nutshell: LLMs generate tokens one at a time. "Only 3B parameters active at a time" means that for each of those tokens only 3B parameters need to be fetched from memory, instead of all of them (30B).
Then I don't understand why it would matter. Or does it really mean that for each input token only 10% of the total network runs, and then a different 10% for the next token, rather than running all ten 10% chunks for every token? If so, any idea or pointer to how the selection works?
Yes, for each token only, say, 10% of the weights are necessary, so you don't have to fetch the remaining 90% from memory, which makes inference much faster (if you're memory bound; if you're doing single batch inference then you're certainly memory bound).
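To put rough numbers on that (back-of-the-envelope only; the fp16 weights and ~100 GB/s bandwidth figure below are assumptions for illustration, not from this thread):

```python
# Single-batch, memory-bound estimate: token rate is limited by how many
# bytes of weights must be streamed from memory per generated token.
BYTES_PER_PARAM = 2          # assume fp16 weights
BANDWIDTH = 100e9            # assume ~100 GB/s of memory bandwidth

dense_bytes = 30e9 * BYTES_PER_PARAM  # dense model: all 30B params per token
moe_bytes = 3e9 * BYTES_PER_PARAM     # MoE: only ~3B "active" params per token

print(BANDWIDTH / dense_bytes)  # ~1.7 tokens/s
print(BANDWIDTH / moe_bytes)    # ~17 tokens/s
```

Same arithmetic either way, just a 10x difference in how much weight data each token has to pull through the memory bus.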
As to how the selection works - each mixture-of-experts layer in the network has essentially a small subnetwork called a "router" which looks at the input and calculates scores for each expert; then the best-scoring experts are picked and the inputs are only routed to them.
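A minimal sketch of that routing step (NumPy, with made-up sizes; the top-2 choice and dimensions are illustrative, not taken from any particular model):

```python
import numpy as np

def moe_layer(x, router_w, experts, top_k=2):
    """x: (d_model,) token activation; router_w: (n_experts, d_model);
    experts: list of callables, each a small feed-forward net."""
    scores = router_w @ x                 # one score per expert
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                  # softmax over experts
    chosen = np.argsort(probs)[-top_k:]   # keep only the best-scoring experts
    # Only the chosen experts' weights need to be fetched for this token;
    # the rest of the layer's parameters are never touched.
    return sum(probs[i] * experts[i](x) for i in chosen)

# Toy usage: 8 experts, 2 of them active per token.
d = 16
experts = [(lambda W: (lambda x: W @ x))(np.random.randn(d, d)) for _ in range(8)]
out = moe_layer(np.random.randn(d), np.random.randn(8, d), experts)
```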
I asked Gemini about it the other day (I'm dumb and shameless). Apparently it means that the model branches into a bunch of 3B sections in the middle and joins at both ends, totaling 30B parameters. So the computational footprint drops to (bottom "router" parts + 3B + top parts), effectively ~5B or whatever that specific model's "3B" implies, rather than the full 30B.
MoE models still operate on a token-by-token basis, i.e. "pot/at/o" -> "12345/7654/8472". "Experts" are selected on a per-token basis, not per-iteration, so the "expert" naming might be a bit of a misnomer, or marketing.
I’m also curious whether an AI could process the screen feed quickly enough to compete in first-person shooter games. Seems like it would be difficult without extremely high-end hardware for the foreseeable future?
Thank you for educating me. How does OpenCV work from the perspective of recognizing things in an image? Is there some kind of underlying model there that learns what a target looks like or not?
There are already models specifically for things like identifying players in Counter-Strike 2, including which team they're on.
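For a rough, hedged idea of what that looks like mechanically: OpenCV's dnn module can run a pretrained ONNX detector over a captured frame. The file names, input size, and output-parsing note below are placeholders; the real details depend entirely on which detector you use:

```python
import cv2

# Placeholder model: any ONNX object detector exported for OpenCV's dnn module.
net = cv2.dnn.readNetFromONNX("players.onnx")

frame = cv2.imread("screenshot.png")   # or a frame grabbed from screen capture
blob = cv2.dnn.blobFromImage(frame, scalefactor=1/255.0, size=(640, 640), swapRB=True)
net.setInput(blob)
outputs = net.forward()                # raw detections; layout is model-specific

# Post-processing depends on the detector: for a YOLO-style head you'd filter
# rows by confidence, apply non-max suppression, and scale boxes back to the
# original frame size.
print(outputs.shape)
```

So the "model" part the parent asks about is the trained network weights; OpenCV itself is mostly the plumbing (capture, preprocessing, and running the net).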
Someone has even rigged a setup like that to a TENS unit, stimulating the nerves in their arm and hand to move the mouse in the correct direction and fire when the crosshair is over the enemy.
How does this work? Do you give the AI read permissions on your system, or is it just running arbitrary commands? In the latter case, is it prompting you before each one?
Relatedly, I've been seeing some people buying up old domains and squatting on them with AI-generated content. Not even ads, but content that seems like something that might actually show up in a rare Google search query. Not really sure what the play is, or why this is better than advertising the domain for sale (do registrars punish overt squatting these days?).
Iran has protests all the time, as do lots of other countries _including the US_, France, and others, and they don't herald US-led regime change. But if you have evidence for your claim that the US and Israel are planning a war in Iran, that would be major international news.
Smaller than containers seems unlikely since a container doesn't have any kernel at all, while these microvms have to reproduce at least the amount of kernel they would otherwise need (e.g., a networking stack). I'm sure some will be inclined to compare an optimized microvm to an application binary slapped into an Ubuntu container image, but that's obviously apples/oranges.
Faster might be possible without the context switching between kernel and app? And maybe additional opportunities for the compiler to optimize the entire thing (e.g., LTO)?