Although it is priced competitively, at roughly the same price as the Nvidia Jetson Nano ($125), it seems underpowered compared to the Nano. The Nano has 4GB RAM and 128 CUDA cores, can encode/decode 4K at 30/60 FPS, and can handle multiple streams, compared to the 15/15 FPS (?) of the BeagleBone.
Perhaps the Vision Engine is better for computer vision tasks, but having to use the TIDL suite, compared to the Jetson Nano's JetPack with tools we use regularly on bigger GPUs, is going to be a hard compromise to make.
JetPack includes CUDA 10, TensorRT, OpenCV 3.3.1, etc. by default, and PyTorch is available separately for the Jetson Nano. Besides, the community is very active.
I bought the Nvidia Jetson Nano Dev Kit in India as soon as it became available, for ~$125; in the USA it's a bit cheaper at ~$110 (incl. shipping).
Please note that there is also the Nano module, which is a SoM (System on Module) with 16GB eMMC storage, sold for $150. I think this is intended for clusters.
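For what it's worth, getting going on a Nano is about as simple as this sketch, which just checks that the stock JetPack + PyTorch wheel setup sees the GPU (it assumes you've already installed the separate Jetson PyTorch wheel):

```python
# Minimal sanity check on a Jetson Nano: a sketch, assuming a JetPack image
# with CUDA 10 and the separately-installed PyTorch wheel.
import torch

print(torch.__version__)                  # PyTorch version from the Jetson wheel
print(torch.cuda.is_available())          # True if the Maxwell GPU is visible to CUDA
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the Tegra GPU's device name
    x = torch.randn(1024, 1024, device="cuda")
    y = x @ x                             # run a matmul on the 128 CUDA cores
    print(y.sum().item())
```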
Something worth being aware of: the Jetson Nano development kit that's currently available has an early version of the Nano module which isn't pin-for-pin (finger-for-finger?) compatible with the current production version sold separately.
The production Nano module (which you can also now buy) won't work in the dev kit carrier! Similarly, the dev kit Nano module won't work in a carrier compatible with the production Nano module.
Apparently they'll refresh the development kit to match the production Nano module later in the year, but for now it's a big gotcha if you're designing hardware using the dev kit.
Correction: the Jetson Nano has a quad-core ARM Cortex-A57 @ 1.43 GHz. Those are the big cores of most big.LITTLE configurations, with better L1/L2 caches (48KB / 2MB unified in the Nano's case) than A53 cores.
It's not just you. Any trademark expert will tell you that you only have to use the symbol once, in the first or most prominent place where you use the trademark.
Also when you use other companies' trademarks, you should have a notice of who the trademark owner is, which this press release does for Sitara but not for the other trademarks used in the piece.
Search the page for "disclaimer", or just read the whole article - it has a lot of good trademark advice. It also somewhat disagrees with what I said about only using the trademark symbol once - it says to use it the first time and "occasionally thereafter".
So here is an interview with an IP attorney who suggests just using it once:
> A disclaimer is a statement that you include in your application to indicate that you do not claim exclusive rights to an unregistrable portion of your mark. For example, if you sell shirts and your mark includes the generic word "SHIRTS," you could not object to someone else also using the word “SHIRTS” as part of his/her mark. The word is still part of both marks, but no one is claiming exclusive rights in that word, because it is an 'unregistrable' component of an overall mark. (See below for typical examples of unregistrable matter that must be disclaimed.)
> A disclaimer does not physically remove the unregistrable portion from your mark or affect the appearance of your mark or the way you use it. It is merely a statement that the disclaimed words or designs need to be freely available for other businesses to use in marketing comparable goods or services.
I'm speculating, but it's likely due to board size constraints and pricing.
The AM57xx SoCs have 2 different DDR3 memory controllers, called EMIFs. Each one has a 32-bit data width and can support up to 2GB of attached memory. A single x16 (16-bit-wide data path) DDR3 chip in the 512MB (4Gb) size is reasonably cheap to buy today, but doubling that to 1GB (8Gb) makes the price go up something like 4x, since there's just not much volume sold in that size. Only 2 DDR3 chips could fit within the BeagleBone board size, so only 1 EMIF is used, and to keep costs reasonable they used 512MB (4Gb) chips, leading to 1GB of total DDR3.
Related: the C66 DSPs and M4 cores inside the AM57xx can only access the EMIF interfaces over the L3_MAIN core interconnect within the SoC. L3_MAIN is a 32-bit address bus interconnect, so it can address at most 4GB of memory address space. TI's memory maps for the DSPs and M4 cores both start the EMIF addressing at 0x8000_0000, so at most the DSPs and M4 cores can access up to 2GB of DDR3. Addresses below 0x8000_0000 are mostly memory-mapped peripherals, like the GPIOs and PCIe interfaces and such. The Cortex-A15 cores can use an extended addressing mode and access the EMIFs through a different interface; they don't have to use L3_MAIN (they can, but there's errata). So if you connect more than 2GB of DDR3 to an AM57xx, the big Cortex-A15s can access it but nothing else can, which makes it less useful/valuable for a lot of designs.
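A quick back-of-the-envelope on that address window, using only the 32-bit bus and the 0x8000_0000 base mentioned above (nothing TI-specific beyond that):

```python
# Back-of-the-envelope: how much DDR3 fits in the DSP/M4 view of the map,
# given a 32-bit L3_MAIN address space and EMIF addressing starting at 0x8000_0000.
ADDRESS_SPACE = 1 << 32          # 4 GiB total addressable via a 32-bit bus
EMIF_BASE     = 0x8000_0000      # where TI's memory map places DDR3 for the DSPs/M4s

ddr_window = ADDRESS_SPACE - EMIF_BASE
print(ddr_window // (1 << 30), "GiB visible to the DSPs/M4 cores")    # -> 2 GiB

# The BeagleBone AI populates one EMIF with two x16 512MB (4Gb) chips:
chips = 2
chip_size_mib = 512
print(chips * chip_size_mib, "MiB of DDR3 on the board")              # -> 1024 MiB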
If you want a dev board with more DDR3 and AM57xx, the BeagleBoard-x15 is an option although it appears to be hard to buy just the x15 from most distributors right now as they're out of stock. TI still sells the AM572x dev kit with the LCD screen directly (https://www.ti.com/store/ti/en/p/product/?p=TMDSEVM572X), albeit for a bit more than just the x15 used to cost and quite a lot more than the new AI costs.
I think that it probably doesn't make sense to compare this to a general-purpose single board computer. It's apples and oranges.
The BeagleBone AI is aimed at prototyping in industrial automation applications. I've never worked in that area, but I wouldn't be at all surprised to find that large amounts of RAM aren't a priority for industrial controllers. Probably the software tends to be frugal with memory, because a bigger heap means more cache misses, and more cache misses mean worse latency.
A Raspberry Pi, by contrast, is mostly targeted at running a GUI and memory-hungry user applications up to and including Minecraft. It's meant for teaching kids to program and hobby stuff. It doesn't have built-in DSPs and programmable real-time units, because those are for supporting applications that fall far outside its intended purpose of having fun with Python.
That reasoning is just plain wrong. The memory usage has almost nothing to do with the field where it will be used. It will depend on the size of the model being run on the board.
The size of that model will be determined by the number of weights used. Since industrial automation will likely use CV, that means the potential for a lot of weights.
It's been a long time since I've done anything in machine vision, but, at least back in the day, what I was seeing was that, compared to other uses for machine vision, industrial applications tended to stay a lot simpler: Lower-resolution images, black-and-white imaging, support vector machines instead of neural nets (let alone deep learning), all that good stuff. They could get away with it because they are able to much more carefully control the input domain - controlled lighting conditions, consistent orientation of the thing being imaged, all that good stuff. So they don't need 10^9 or more weights' worth of ocean-boiling complexity the way you would in something like self-driving cars or impressing your friends with your Imagenet performance.
And if you can get away with running an SVM on a 1-megapixel black and white image, then your weights will fit in 1GB with an order of magnitude to spare.
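A rough sanity check on that claim, assuming the cheapest case of a plain linear SVM with one float32 weight per pixel:

```python
# Rough size of a linear SVM's weight vector for a 1 MP monochrome image,
# assuming one float32 weight per pixel plus a bias term.
pixels = 1_000_000
bytes_per_weight = 4                       # float32
model_bytes = pixels * bytes_per_weight + bytes_per_weight

print(model_bytes / 2**20, "MiB")          # ~3.8 MiB, i.e. far below 1 GB
```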
Ok, what you said about lower-res images makes sense. Lower variation in images maybe means you could get away with fewer weights / more quantization - you could afford to lose more information in the model. Maybe 1GB can be sufficient then.
There's no reason to use an SVM over a (C)NN nowadays though.
Sure there is. With an SVM, you can pick different kernels to more carefully engineer specific behaviors, what kinds of errors your model is likely to make, etc. You can get a good, stable model on less training data, which is great when your training data is expensive to produce. (A situation that I'm guessing is not at all uncommon in industrial automation.) You get all that crispy crunchy large margin goodness. Stuff like that.
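For illustration, here's roughly what that kernel-picking looks like in scikit-learn; this is just a sketch on synthetic data, not anything specific to the industrial setups being discussed:

```python
# Sketch: comparing a few SVM kernels on a small, synthetic dataset.
# The point is only that the kernel (and C / gamma) are explicit levers you tune.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

for kernel in ("linear", "rbf", "poly"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale")
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{kernel:7s} mean accuracy: {scores.mean():.3f}")
```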
I'd absolutely focus on ANNs if I were an academic researcher, because that's the hot flavor of the month that's going to get your career the attention it needs to bring in funding, jobs, etc. I'd also pick it for Kaggle-type stuff, where there's effectively no real penalty for generalizing poorly. Bonus points if you consume more energy to train your model than Calgary does to stay warm in the winter.
In a business setting, though, I would only default to ANNs if it were holistically the best option for the problem domain. By "holistically" I mean, "there's more to it than chasing F1 scores at all costs." The business considerations that caused Netflix to never try productionizing the prize-winning recommendation engine, for example, are always worth thinking about. Personally, I'm disinclined to look past linear models - not even as far as kernel methods - without strong reason to believe that I'm dealing with a curve that can't be straightened with a simple feature transformation. Complexity is expensive, and needless complexity is a form of technical debt.
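As a small illustration of "straightening a curve with a simple feature transformation" (synthetic data, just a sketch):

```python
# Sketch: a quadratic relationship handled by a linear model after a simple
# feature transformation, rather than reaching for a kernel method or an ANN.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = 2 * x[:, 0] ** 2 - x[:, 0] + rng.normal(scale=0.1, size=200)

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(X_poly, y)
print(model.score(X_poly, y))   # R^2 close to 1: the "curve" is now linear in the features
```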
> You can get a good, stable model on less training data, which is great when your training data is expensive to produce
Huh? SVMs don't perform better than NNs on less training data.
I'm sorry, but the rest of what you said is out of date and wrong. CNNs work better than SVMs for CV tasks. There's no reason to use SVMs anymore for CV, and no one in their right mind does.
That's fair, I'm just a little confused because the BeagleBone AI board isn't even trying to compete with the <$100 boards. I know it's meant for very specific industrial uses, but depending on your models and how you have them set up, 1GB of RAM isn't very much.
If these are meant to be nodes in a larger system I guess that makes a little more sense, but if they're meant to be more autonomous that 1GB is going to be a real limitation for certain applications.
What I saw when I was looking (a few years ago) is that the BeagleBone is oriented towards the industrial market, as a dev board for TI industrial processors. Totally different from the hobbyist market.
I used the BB in previous projects. One thing that definitely stands out for the BB is that it can be used as a product directly, with a case and some certification (EMC, etc.). Nvidia's Nano is more of a development platform.
The BeagleBoard actually predates the RPi. Coming after the Arduino, the BB is arguably the very first board running a 32-bit ARM that is also open source, cheap, and small; however, it's been overshadowed by the RPi in recent years.
Yeah the Beagleboard and more generally the TI AM335x is getting old. Single core, DDR3-800, no PCI-e, no secure boot, and other bizarre limitations. It's nice that they're putting something newer out.
Having the PRUs as supporting microcontrollers can be great where timing is critical. For example, it's possible to decode signals from an AM receiver and pass them to the host, or monitor sensors and have immediate responses (such as triggering a shut-off via GPIO).
There's nothing particularly difficult about wiring a microcontroller up to a single-board computer to do these jobs; I've done exactly that for reading my weather station and heating oil tank level. But it's messy, and I think the cohesion of being able to do it directly on one board is a worthwhile advantage.
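For reference, the "messy" version is usually just a serial link; something along these lines, sketched with pyserial, where the port name and the line format from the MCU are hypothetical:

```python
# Sketch: reading newline-delimited sensor readings from a microcontroller
# attached over USB-serial. The port name and message format are made up.
import serial  # pyserial

with serial.Serial("/dev/ttyUSB0", 115200, timeout=1) as port:
    while True:
        line = port.readline().decode("ascii", errors="replace").strip()
        if not line:
            continue                      # timeout with no data
        # e.g. "temp=21.4,oil_mm=512" from the MCU firmware (hypothetical format)
        print(dict(field.split("=") for field in line.split(",")))
```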
I think if you're a true hobbyist, the Raspberry Pi is the path of least resistance. I've done projects with both the RPI and the Beaglebone Black.
RPi is very aimed at taking some sort of existing project and modifying it, which sounds mean to say but actually lets you do some very powerful things. The average RPi user will never be compiling device tree overlays, screwing around with the bootloader, busting out the datasheet for the processor, etc. You find some project that is kind of like the one you want to do, use their hardware and low-level settings (involving device tree overlays, but so abstracted that you will never encounter that term), and then go to town on the software, probably written in something like Python.
The Beaglebone is more like a dev board for TI's SoC. You will configure every pin exactly as you want it with a device tree overlay. You will be reading the datasheet to figure out exactly how many CPU cycles a certain instruction takes on the PRU. The amount of power you have over IO and hooking them into Linux is infinite. As an example, you can control the clock signal for the entire board through an IO pin. If you are doing realtime processing and want to make something happen in the Beaglebone PRU at the exact same time as some external hardware, you have the power to make that happen. Most people do not need this.
The TLDR is that I would use the RPi for pretty much any "maker" project because it's so easy to get things working, but would use this new Beaglebone for something like a CNC machine. If you're making a CNC machine, you need a microcontroller to stop the motor instantly when it drives into the endstop (even if Linux is currently processing your mouse movement or a network packet), you need a realtime microcontroller to properly move the axes in unison (so you can cut a circle at your exact feedrate), and you need Linux to drive a monitor with the controls on it. This new board puts all the processing power you need on the same board, and has all the kernel hooks you need to communicate between your non-realtime Linux software and the microcontrollers. (It's been a couple years since I've used the PRUs, but they show up as a "remote CPU" under Linux and have APIs for bidirectional message passing, which is potentially more powerful than a serial port interface to an Arduino that does the realtime stuff.)
You can of course just plug an Arduino into a USB port to get the same effect... this is how pretty much every 3D printer ever works. I literally have this exact setup on my 3D printer; an RPi that controls my 3D printer over USB -- the RPi hosts a web interface, the microcontroller handles the realtime motor moves. If you are manufacturing something like this, having everything on one board will probably lower your costs. That is why the Beaglebone exists.
(What this has to do with AI? I don't know. Maybe it's for pick-and-place machines that need realtime motor moves and some computer vision.)
Yeah, I think robotics are a big focus with this board. Using the PRU for hard PWM and motor and encoder tasks is incredible, and it's just gonna be even more interesting with double the PRU resources.
I'm not overly familiar with TI's SOCs post-2010. Anyone out there with a good overview of what the Sitara AM5729 includes besides the bullet points in that piece?
And what about TIDL adoption? I've been working on the Intel/NVIDIA-grade part of the ML scale and have a few ESP32 boards to fiddle with OV2640 cameras, but very little in between except what Broadcom has been doing.
As for tooling, I couldn't say, but as a previous comment stated, the NVIDIA offering at the same price makes it hard for this board to stand out for many. Though I'm sure it has a niche, as most boards do - what that niche is, beyond being already invested in and comfortable with the BeagleBoard environment, is the only upside I'm seeing at first glance.
In my experience, software from hardware companies has been so reliably abysmal that it makes the "enterprise software" us SWEs like to complain about look decent by comparison.
What's so "AI" about it? It doesn't even have a TPU. Kendryte K210 has a fixed point TPU, 400MHz dual core RISC V with FPU, 8 channel audio DSP, FFT and crypto acceleration, and costs $8.90 with wifi and $7.90 without. And the module is the size of a half of a postage stamp. Runs TensorFlow Lite (a subset of ops, but good enough to do practical things).
8MB RAM is more than enough to run a quantized MobileNet, which they demonstrate by preloading object detection on it, out of the box. And the chip is real. I have a couple of boards with it, it works. I guess people just have a hard time believing all the stuff in the spec can be done for less than 10 bucks. 28nm by the way, not a joke. The company got its start in crypto mining, so this is a side gig for them.
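For a sense of scale, here's roughly how you'd confirm a quantized MobileNet really is that small and run it with the TFLite interpreter on a PC; the model file name is a placeholder, and on the K210 itself you'd go through its own converter/toolchain rather than this Python API:

```python
# Sketch: loading a quantized MobileNet with the TFLite interpreter and
# checking its size. "mobilenet_v1_quant.tflite" is a placeholder path.
import os
import numpy as np
import tensorflow as tf

model_path = "mobilenet_v1_quant.tflite"
print(os.path.getsize(model_path) / 2**20, "MiB")   # a v1 quant model is roughly 4 MiB

interpreter = tf.lite.Interpreter(model_path=model_path)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# uint8 input for a fully quantized model, typically 1x224x224x3
frame = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]).shape)
```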
It's much more focused on real-time control (4 PRUs and 2 Cortex-M4s) than GPGPU processing. It's really aimed at industrial vision applications rather than AI, actually.
FWIW, the SGX GPU in it was one of the first to go hog wild on f16.
And if they'd document their ISA, it's pretty amenable to being used for neural networks, way more so than the other mobile GPUs at least. It'd be a cold day in hell before they did that, though, unfortunately.
No. I actually looked at reverse engineering it a few years back. If you pull apart their drivers you can figure it out pretty quickly (there's actually two different ISAs, the main shader cores and a tiny little RISC esque core that marshals work for the shader cores).
But their driver/software complexity is super high to even get a triangle in a buffer or run a compute job. They have a RTOS looking microkernel running on the shader cores, and there's a ton of caching and MMU setup you have to do from the GPU side (not the main app processor). And there's a lot of caching hints and hacks that are hard to work around if you don't know the context (a lot of tables for bug reference numbers and special cased code depending on those)
If anyone from Imagination is listening, the open source community would still love your help in supporting these chips. : ) They're really pretty inside, and the world should know about the good work y'all did!