Hacker News
Introduction to FPGAs (smist08.wordpress.com)
145 points by ingve on Feb 6, 2023 | hide | past | favorite | 67 comments


For anyone interested in FPGAs, I’d recommend one that can use the excellent open source F4PGA toolchain: https://f4pga.org/

I have an orangecrab board with a lattice FPGA, it’s super cool to be able to run a single makefile and build a riscv cpu and buildroot based linux for it.


That's awesome. I ordered an Orangecrab to start learning FPGAs with and it just arrived :)


Exciting!! This was the project I used: https://github.com/litex-hub/linux-on-litex-vexriscv


>The Xilinx development environment is Vivado which supports writing and compiling your HDL along with creating testbenches and running simulations.

I tried getting into FPGA development, played around a bit with simple Verilog implementations, then got a cheap FPGA board and pretty much failed at Vivado. That tool is completely unusable. It makes me think of 90s Visual Studio, where you have to jump through 5 forms and wizards to generate a broken project that wouldn't compile or run on your board.

Are there shell tools for FPGA programming you can just set up with a Makefile or something? Having to use a GUI for stuff like this seems silly to me.


Check out https://github.com/olofk/fusesoc. It gives you a command line build flow that can drive Vivado (along with many other EDA tools, via edalize: https://github.com/olofk/edalize) without having to touch the GUI (you might still want it for programming the board, though FuseSoC can do that too).
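As a rough sketch of what that command line flow looks like (the core name and library URL below are placeholders; check the FuseSoC docs for the current syntax):

```sh
pip install fusesoc
# Add a library of cores, then build one with Vivado as the backend
fusesoc library add mylib https://github.com/example/my-cores
fusesoc run --build --tool=vivado mycompany:myproject:mycore
```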


Vivado can be driven with Tcl commands without ever looking at the GUI. That's how things are automated. It might take a while to understand what's happening in the background though: https://docs.xilinx.com/v/u/2019.2-English/ug835-vivado-tcl-...
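For example, a minimal non-project batch script (file names and part number are placeholders for whatever your design uses) looks roughly like this, run with `vivado -mode batch -source build.tcl`:

```tcl
# Non-project flow: read sources, synthesize, place & route, write bitstream
read_verilog top.v
read_xdc constraints.xdc
synth_design -top top -part xc7a35tcpg236-1
opt_design
place_design
route_design
report_timing_summary -file timing.rpt
write_bitstream -force top.bit
```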

Don’t worry, it’s not for everybody. It looks like other code, but it isn’t your casual code. I’ve been doing it for more than a decade; colleagues from other groups come to me, I teach them, they try it and never come back.


Ironically, Xilinx’s tool chain is probably the best of the large FPGA manufacturers. It does leave a lot to be desired compared to software tool chains.


Yep. For whatever reason, hardware engineers seem to be suckers for punishment and seem to be willing to put up with whatever terrible fork of Eclipse every HW vendor wants to foist on them this year. In fact they seem to demand it.

That said, all the stuff inside the Vivado pkg is scriptable through Tcl.

When I was playing with this stuff... Between that and Fusesoc + some hooks with CMake + Verilog mode in CLion I was able to get a fairly reasonable working environment that wasn't totally awful. Still had to dump out into Vivado here and there to configure various "IP" blocks (and then take the generated Tcl and clean it up and import it into proper source control)


It's possible to run the Xilinx tools using a Makefile. If you don't want to use Xilinx tools at all, there's Yosys, but you'd have to check if it supports the device you're targeting.

There's an example of using a makefile for Vivado here: https://github.com/hdlguy/make_for_vivado
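The gist of such a Makefile is just a rule that shells out to `vivado -mode batch` (all file names here are placeholders):

```makefile
# Hypothetical wrapper around the Vivado batch mode; build.tcl holds the
# actual synth/place/route commands.
SOURCES := top.v constraints.xdc

top.bit: build.tcl $(SOURCES)
	vivado -mode batch -source build.tcl -log build.log

program: top.bit
	vivado -mode batch -source program.tcl

.PHONY: program
```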


Yes and no. You can stitch a toolchain together using these tools: opencircuitdesign.com, yosys, openlane

But if you just want to get up and running with a dev board, use Vivado for Xilinx and Quartus for Altera. Neither is great. For Vivado, use a release that ends in x.4 unless you want to bang your head against the wall for a few hours.

If you don't like the graphical flow for Vivado, you can definitely just write your VHDL or Verilog and compile it without the need for "block diagrams", etc. Vivado calls it the "non-project" flow, I believe, and that's what most of us in industry use.

Most people in industry use the Vendor tools. A few years ago you could get 3rd party tools but they weren't much better. The open source toolchains are coming along but still not for the beginner. The vendors will make their libraries more like foundry PDKs soon though and open source 3rd party tools will hopefully finally take off.


I really like Vivado. Compared with ASIC EDA tools it's amazing. Perhaps it's my EE and ASIC design background that colours my impression of it, but I've found it works really well. I especially like using the elaborated design view for exploring my RTL structure. What specifically did not work for you?


The proprietary tools are painful! I never really used FPGAs outside of school until there were other options. Look at the open source yosys/nextpnr toolchain. Your best bet is to use lattice ice40 or ecp5 FPGAs with it, they’re very well supported.


Yes, yosys + nextpnr for example. But you have to pick an FPGA that is supported.
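For an iCE40 part, the whole open source flow is a handful of commands (top module name, device/package and the pin constraint file are placeholders for your board):

```sh
# Synthesize, place & route, pack and flash with the icestorm toolchain
yosys -p 'synth_ice40 -top top -json top.json' top.v
nextpnr-ice40 --hx8k --package ct256 --json top.json --pcf pins.pcf --asc top.asc
icepack top.asc top.bin
iceprog top.bin
```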


Yosys is nice, but you aren't going to get around using vendor tools for configuring IP, unfortunately. Vendors stopped supporting 3rd party tools a few years ago, primarily because it was so hard to integrate the IP configuration. For instance, the last time I used Synopsys' tool (Synplify) I had to always have ISE or Vivado open in another window to configure IP.

Of course, for many things, you might not need to configure vendor IP. Examples where you most definitely need to are 1) Filters, 2) FFTs, 3) DDR controllers 4) High speed transceivers. In some cases you can configure vendor IP with language templates.

If you don't need these components, then Yosys will be good for you.


I don't understand why FPGA dev tools are so, so incredibly terrible. 100GB+ install size, hour-long compilation for trivial projects, random glitches you can only fix by restarting the IDE, other random glitches that corrupt your project files (hope you keep backups!), clunky slow GUI that randomly crashes to desktop, the requirement that you pay $x,xxx/year to use a $xxx dev board.

It was a fun hobby for a while, but I gave up and left it behind.


The huge installation size must be related to the size and quantity of FPGAs that are supported.

It wouldn't surprise me if those 10-million-LUT FPGAs end up bloating it for the everyday hobbyist.

Some form of modularity would be welcome.


They are modular. At least, last time I checked. You can pick and choose which FPGAs you want to support when you install the vendor's tools.

But even if you remove support for all but one model of FPGA chip, it's still a very large install. And if you expand that to one entire FPGA family (maybe 100 or so chip models), that jumps it up by 50GB easily. So 500MB per FPGA model. And these models are highly similar within the same family, they just have different numbers of LUTs and layouts.

My hopeful theory is that they're doing some kind of per-device precomputation that helps accelerate placing & routing. The theory I actually believe in is that they're just duplicating lots of data unnecessarily, and don't see any value prospect in fixing it.


Yes, Xilinx calls it batch mode. It's much better, but still sucks.


Pretty good but one correction

>To build a CPU, you need a few more elements than basic logic gates, namely memory and a way to synchronize everything to a clock, but this is a good starting point.

You can actually make all of that with logic gates! Two NAND gates, each with an input connected to one of the other's outputs, make the most basic sort of static memory element. Four NANDs and an inverter give you a D latch that lets you synchronize your circuits to the clock signal. You can use fancier techniques that don't correspond to these so well, but these are actually all you need.
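The cross-coupled NAND behaviour is easy to sketch in software. Here's a tiny Python model (purely illustrative) that iterates the two gate equations until they settle:

```python
def nand(a, b):
    return 0 if (a and b) else 1

def sr_latch(s_n, r_n, q=0, q_n=1):
    """Cross-coupled NAND SR latch with active-low set/reset inputs.
    Iterate the two gate equations until the outputs settle."""
    for _ in range(4):  # a few passes is enough to converge
        q, q_n = nand(s_n, q_n), nand(r_n, q)
    return q, q_n

# Pulling S_n low stores a 1; with both inputs back at 1 the bit is held.
q, q_n = sr_latch(0, 1)          # set
q, q_n = sr_latch(1, 1, q, q_n)  # hold
print(q, q_n)                    # -> 1 0
```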


Yep. You can even build the clock oscillator itself with chained NOT (or NAND) gates in a feedback configuration.

It's not a great idea for a production machine but it's much more practical than the software analogy of using a Turing machine for everything.


> 4 NANDs and an inverter give you a D-latch that lets you synchronize your circuits to the clock signal.

A D latch would be enabled all the time the clock signal is high, so it is not suitable for synchronizing your circuits to the clock signal. What you need is for the device to store the input at the instant when the clock signal goes from low to high. (Using the tech terminology: the 4-NAND D-latch has level triggering, while for the clock you want edge triggering.)


You can build an edge-triggered flip flop from just NAND gates. Real-world gates have delay, which means you can generate the enable signal by inverting the clock with a NAND gate and ANDing that delayed inverse with the clock, producing a short pulse on each rising edge.
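That delay trick can be modelled in a few lines of Python (an illustrative sketch with discrete time steps and a one-step gate delay):

```python
def edge_pulse(clk_samples, delay=1):
    """Model clk AND (inverted, delayed clk): a short pulse appears only
    where the clock rises, lasting `delay` time steps."""
    inv = [1 - c for c in clk_samples]
    delayed_inv = [1] * delay + inv[:-delay]  # gate delay on the inverter
    return [c & d for c, d in zip(clk_samples, delayed_inv)]

clk = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
print(edge_pulse(clk))  # -> [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
```

The pulses land exactly on the two rising edges, which is what lets a latch behave as if it were edge triggered.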


Yeah, but you can't make DRAM cells (although you can emulate them with a lot more hardware).


(and you don't need DRAM to make a CPU)


But you can make SRAM; that's the OP's point. You don't need any specific type of memory, you just need something that is capable of storing bits.


I've been learning a lot from the https://zipcpu.com/ blog lately, and I'm just starting to play around with formal verification after reading all of the recommendations for that from the blog.

SpinalHDL has been very nice to use so far instead of Verilog: https://spinalhdl.github.io/SpinalDoc-RTD/master/index.html It even has simulation and formal verification workflows built in. In the simulation you can wiggle the bits on your ports using Scala, so you can code an emulation of any peripheral or the like that you want and have your design use it. (You can also do the same thing using C++ if you use Verilator directly instead.)

You can code a bridge between a serial port in your simulated design and a TCP port, and then write a second program to bridge a real serial port to a TCP port the same way. You can then write tools that connect to the TCP port and then use those same tools against both your simulated design and your design in hardware.
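A minimal sketch of the TCP side in Python (stdlib only; the serial object here stands in for pyserial or a simulation harness, the port number is arbitrary, and it handles a single client):

```python
import socket
import threading

def pump(read, write):
    """Copy bytes from one endpoint to the other until EOF."""
    while True:
        data = read(4096)
        if not data:
            break
        write(data)

def bridge(serial_like, host="127.0.0.1", port=5331):
    """Bridge a serial-port-like object (anything with read/write methods)
    to a single TCP client; traffic is pumped in both directions."""
    srv = socket.create_server((host, port))
    conn, _ = srv.accept()
    threading.Thread(target=pump,
                     args=(conn.recv, serial_like.write), daemon=True).start()
    pump(serial_like.read, conn.sendall)
```

The same tool can then speak to either the simulated design or the real board, since both ends look like a TCP socket.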


Are there ANY tasks for which I could get a significant speed advantage on a cheap FPGA (including I/O with computer to communicate result) compared to just using a CPU? Are there any compared to using CPU+GPU?

Edit: By cheap I mean something like in the article or a bit more expensive, for sure < $1000. By CPU I mean something like an M1. By GPU I mean something like an Nvidia 2080Ti.


An FPGA is useful for high-speed hard-realtime designs with input to output reaction times measured in nanoseconds.

In retrocomputing, for example, they're useful for building accelerators that bolt on much faster CPUs of a different model (like 68060 accelerators for an Amiga), or for implementing a new graphics card with HD resolution, HDMI, and SDRAM controllers, like the ZZ9000: https://shop.mntre.com/products/zz9000-for-amiga-preorder


I did a project for a laser cutting machine a few years ago. Read two pulse trains, do some floating point calculations, and output another pulse train, according to a set of parameters transmitted from another controller and updatable in real time.

Less than a microsecond of latency without even trying and 100% predictability.

If needed, you can place more processes on the chip, all running in parallel with zero interference between them.


Sounds cool! Yes, well-specified real-time tasks seem to be a great fit for FPGAs.


Probably, but FPGAs/reconfigurable logic can't be clocked at CPU speeds and usually have a limited number of floating point units and other "hard" resources. At the higher end, yes definitely, especially packet switching, sniffing and other tasks that don't involve parallel ALU/FPU calculations. At the lower end FPGAs can still shine on applications that have precise/low-jitter timing constraints. For example, sampling an analog signal for frequency analysis, actuating a thing within X nano/microseconds of some event, regulating clock drift with the pulse-per-second GPS signal, etc.


Insofar as an FPGA can have a customized pipeline to deal with an operation without the overhead of your CPU's instruction pipeline? Yes. Most any task can be sped up, but ultimately whether or not an FPGA is right will depend on how much that FPGA costs out the door vs a CPU, and how much you optimized the algorithm for your chosen task. With proper engineering an FPGA can handily beat a CPU in many cases... but it's going to depend on the task, the FPGA in question, and what CPU it's going up against. As with all things, it depends on the particular task and how much you have to spend.


First of all, not everything has a big CPU: there are countless small embedded devices which have e.g. just an FPGA or maybe an FPGA and a microcontroller. Example: if you look into e.g. a MOTU AVB audio interface, you will only find one or two FPGAs (and no microcontroller/"CPU"/…) which handle real time Ethernet, USB, the audio DSP algorithms, the ADCs/DACs, the web server with the UI, and everything else that's in this audio interface.

FPGAs give you the ability to process a lot of data in parallel, with very low latency. As soon as you make use of this, a CPU has a very hard time competing. If you have e.g. radio data from an SDR, you might want to process it on an FPGA → if you look at various SDR modules, you'll find an onboard FPGA to do exactly that. If you want to process video signals with low latency and with low power consumption, again, you might want to do it on an FPGA → look at video interface cards from BlackMagic Design and most of them will have an FPGA on board. If you have complex mathematical models to emulate e.g. some vintage analog audio hardware, which would create some serious CPU load on a PC, you might want to do it on an FPGA instead → this is what some companies like e.g. UAD do with FPGA based accelerators. If you have high-speed interfaces, like e.g. in a network router/switch, you might want to implement the packet processing in an FPGA → this is what some network equipment does, unless it uses an ASIC. There are countless applications for FPGAs.
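To make the "parallel processing" point concrete: in a direct-form FIR filter (the bread and butter of audio/SDR DSP), an FPGA gives every tap its own multiply-accumulate hardware, so one output appears per clock. Here's the arithmetic in Python, computed serially just as an illustration (not code from any product mentioned above):

```python
def fir(samples, taps):
    """Direct-form FIR filter: each output is a dot product of the taps
    with a shift register of past samples. On an FPGA all of these
    multiply-accumulates run in parallel in a single clock cycle."""
    out = []
    delay = [0] * len(taps)
    for s in samples:
        delay = [s] + delay[:-1]  # shift register of past samples
        out.append(sum(t * d for t, d in zip(taps, delay)))
    return out

# 4-tap moving average smoothing an alternating input
print(fir([4, 8, 4, 8, 4, 8], [0.25, 0.25, 0.25, 0.25]))
```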

Sometimes FPGAs are also the prototype "playground" where you can validate your design before you build an ASIC.

And finally, many FPGAs aren't really that expensive, at least if you ignore the crazy price increase from 2020+. The FPGA alone from the Basys-3 development board from the article (an XC7A35T) costs something around $20-25, at least if you don't need the exact same package. Of course that's only the FPGA and you still might want to add some external configuration flash, RAM and connectivity, but that's still very cheap. To give you an idea, a 256MB DDR3 RAM chip costs maybe $3-5. If you want to connect this to your computer, you could use PCIe, which is directly supported by these Artix-7 FPGAs. Of course there are much bigger and more expensive FPGAs available, but I think you can imagine already that you can do a lot with a $1000 hardware budget.


>And finally, many FPGAs aren't really that expensive,

https://www.digikey.de/de/products/detail/lattice-semiconduc...

This price is ridiculous. I also know you can get 640 LUTs for slightly more, but I can't find the link.


The primary benefit of FPGAs is not to be an alternative type of computing hardware to CPUs or GPUs but to do things that neither of them are particularly good at. Mainly producing some output for some input or series of inputs with extremely low latency (like in the single-digit nanoseconds). This is what ASICs are usually used for, but FPGAs come in handy when it doesn’t make sense to spend a million-plus dollars spinning your own ASIC. FPGAs are good for things like building digital transceivers, extremely precise synchronization systems, or industrial/scientific/medical applications where you need to sequence various effectors/sensors/etc. with extremely tight timing.

Since they are just general digital logic machines they can obviously implement a CPU and are great at parallel computation, but in most cases CPUs and GPUs are much better at general computation because they’re specifically designed for that task.


This will really change with your definition of cheap, which CPU/GPU you compare it to, and most importantly the algorithm. A recent example is bitcoin mining, which AFAIK used to be done on FPGAs before moving to ASICs.



I tried getting yosys (and related tools) working on my M1 MacBook so I could play around with an iCEstick FPGA I have, but could never get it working. There were software installation issues that I couldn't get past and don't remember the details of. Has anyone gotten a yosys (and related) setup going on an M1 Mac?


There's a darwin-arm64 asset for https://github.com/YosysHQ/oss-cad-suite-build/releases at least. Installation is just 4 steps (see the readme), and it just worked for me on Windows and Linux.
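The steps are roughly as below; the exact date-stamped filename changes with every release, so the `<release>`/`<date>` parts are placeholders you'd fill in from the releases page:

```sh
curl -LO https://github.com/YosysHQ/oss-cad-suite-build/releases/download/<release>/oss-cad-suite-darwin-arm64-<date>.tgz
tar xzf oss-cad-suite-darwin-arm64-<date>.tgz
source oss-cad-suite/environment   # puts yosys, nextpnr etc. on PATH
yosys --version
```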


I'll check out those release downloads. IIRC, I think I had been installing via homebrew, and that wasn't working out.


Two historic related sites of interest:

https://opencores.org/

https://www.fpga4fun.com/


Note, the included schematic of a DTL (diode-transistor-logic) gate is wrong on many levels, and isn't even something we have used in the last four decades. The CMOS implementation _is_ what we actually use, and for drive strength 1 it's literally just this (which isn't hard to understand): https://commons.wikimedia.org/wiki/File:CMOS_NAND.svg


It's not supposed to be DTL though, it's RTL...


Thanks, my slip up. The rest stands, RTL hasn't been used since before most people here were born.


RTL means something different in FPGA/ASIC-land (Register-Transfer Level.) It simply refers to the practice of low-level 'coding' in an HDL, as opposed to even lower-level schematic or gate-level entry, or higher-level synthesis based on IP core generators or traditional programming languages.

See also https://electronics.stackexchange.com/questions/69022/rtl-vs... .


With only 17,576 different possibilities, TLAs are bound to alias. RTL in electronics can mean both register transfer level and Resistor-Transistor-Logic. If you are old enough, you will have used both.


Exactly. I not only missed out on the original RTL, but I never even had to use a 74xx TTL part number that didn't have at least an 'LS' or an 'F' in the middle.


My Big Plan is to replicate what the UTokyo CPU project asks: write a custom CPU and synthesize it on an FPGA board -> port xv6 -> write a compiler for a subset of C -> write and run some programs


That’s one of the fun university projects people can do. Small groups use a whole term to replicate some historical CPU on an FPGA. Your project sounds like a year of part-time work if writing a report is also included.


Yeah, since I'm taking some courses for the next few years I can work on the project slowly.



Yeah! I can see how fun it is for anyone who has a bit of interest in the inner workings of a computer.

Now that said, I need to pick up the RISC-V softcore introductory course on edX and finish it.


While these things seem fun to play with, what are fun projects to actually do with them? I'd buy one, but I know a few people who have one gathering dust.


Anything that would make a fun ASIC could be turned into an FPGA project. Something like Ben Eater's GPU project, which he did on a breadboard, could be implemented on them, or you could make a toy CPU with RISC-V, or create a softcore version (faster than software simulation, slower than actual hardware) of any number of older processors.

Edit: Link to Ben's GPU project: https://eater.net/vga


I have an FPGA and I've loaded up a sample that outputs a VGA signal[0], but the output is only based on hardwired patterns, not driven by a frame buffer or CPU. I'm not sure where to go next in order to use the memory chip on the dev board (I have the Digilent Arty [1] as well as a MiSTer FPGA setup) or have some sort of frame buffer which I can use with a CPU that is also running in the FPGA. I'm 90% through building Ben Eater's 8-bit CPU on breadboards (just have to add the control logic) and I did the 65C02 project of his just before that, so I'm definitely in the market for learning where to go next with it. My dream is to have a computer that I have complete design control over :)

[0] https://digilent.com/reference/learn/programmable-logic/tuto... specifically

[1] https://digilent.com/reference/programmable-logic/arty/refer...
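For what it's worth, the standard 640x480@60 VGA timing those tutorials generate boils down to two counters plus a few comparators; here's an illustrative Python model using the usual published timing constants (the next step toward a framebuffer is just using the visible (x, y) as a memory address, e.g. `addr = y * 640 + x`):

```python
# 640x480@60 VGA timing at a 25.175 MHz pixel clock.
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK  # 800 clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK  # 525 lines per frame

def pixel(tick):
    """Map a pixel-clock tick to (x, y, visible, hsync, vsync), exactly
    what the two hardware counters and comparators produce."""
    x, y = tick % H_TOTAL, (tick // H_TOTAL) % V_TOTAL
    visible = x < H_VISIBLE and y < V_VISIBLE
    hsync = H_VISIBLE + H_FRONT <= x < H_VISIBLE + H_FRONT + H_SYNC
    vsync = V_VISIBLE + V_FRONT <= y < V_VISIBLE + V_FRONT + V_SYNC
    return x, y, visible, hsync, vsync

print(H_TOTAL * V_TOTAL)  # -> 420000 ticks per frame, ~59.94 Hz at 25.175 MHz
```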


I think a fun project would be to emulate old computers (i.e. i486, Sgis, SUNs)


You can find links to a lot of old computers and game consoles implemented in FPGAs here: https://github.com/MiSTer-devel/Wiki_MiSTer/wiki/Cores


I have a few lying about. My fun project is to implement a USB CCID (smartcard) device to create a PKCS#11-compliant HSM from scratch, assuming your dev environment is not compromised (FYI, the ESA did this on MAX10s for one of their satellites or probes). USB 1.1 and USB 2.0 have free/OSS IP cores.

Personally, I do not see the point in implementing a softcore CPU unless the design absolutely requires it.

Separately, FPGA makers have shifted focus a bit from facilitating hardware design to providing coprocessors/accelerator cards.


Question to the community: At some point, I may want to work towards implementing a small tensor-core (or a vertex/pixel shader as in "GPU implementation of neural networks," Oh&Jung, 2004) for a toy ml project. Do you have some materials to recommend that would teach me how I should/could proceed?

Thanks a lot.


One good approach could be to base the architecture on the TPU v1 from [1]. There are also open-source accelerators you could draw inspiration from, for example [2][3]. If you want to do less work and not hand-code the RTL yourself, you could look into methods for automatically mapping OpenCL to an FPGA accelerator architecture (or use a service like [3], which provides pre-designed architectures for multiple FPGAs).

[1] https://arxiv.org/abs/1704.04760

[2] https://github.com/jofrfu/tinyTPU

[3] https://github.com/tensil-ai/tensil
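To get a feel for what a TPU-style MAC array computes, here's a toy Python model (purely illustrative; a real systolic array also skews operand arrival across the grid, which is omitted here):

```python
def matmul_systolic(A, B):
    """Toy model of a MAC grid: each (i, j) cell holds an accumulator and
    performs one multiply-accumulate per step. After len(A[0]) steps the
    grid holds the full matrix product, one cycle's wavefront at a time."""
    n, k, m = len(A), len(A[0]), len(B[0])
    acc = [[0] * m for _ in range(n)]
    for step in range(k):          # one wavefront per clock cycle
        for i in range(n):
            for j in range(m):     # in hardware, all cells run in parallel
                acc[i][j] += A[i][step] * B[step][j]
    return acc

print(matmul_systolic([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```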


Thanks!


Tensor cores simply multiply matrices. That's relatively easy to do with a compute shader.

Here's an example of such shader in D3D11: https://github.com/Const-me/Whisper/blob/master/ComputeShade...



Nice! I bought a crab recently, wanting to use it.


Are there any algorithmic problems for FPGAs, or algorithms that need implementations?


It's been a while since I worked in this space but ~10 years ago I would have said TCP termination. It is a pretty big ask though. :)

One issue in this area is that the underlying hard logic is limited, and differs from vendor to vendor and product to product. So a nice free IP core might exist for something, say for FFT, but the complete design cannot fit on your chip's LUTs, or it may not have enough buffer blocks or clock multipliers or buses or something to run a given core, or everything "fits" but it has to be run slower to meet timing constraints.

That's not to say it's always like that.

There is probably a need though for various implementations of some algorithms, with different topologies, or for niche environments etc. Just as in embedded development we need various implementations of FFT in floating point, fixed point, using static memory, etc.
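For instance, the core of a radix-2 FFT fits in a few lines; hardware implementations differ mainly in being iterative, pipelined and often fixed-point rather than recursive floating point. An illustrative Python sketch (input length must be a power of two):

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT. Splits into even/odd
    halves, transforms each, then combines with twiddle factors."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]

# A pure tone lands entirely in bin 1 of an 8-point transform
x = [cmath.exp(2j * cmath.pi * k / 8) for k in range(8)]
print(round(abs(fft(x)[1])))  # -> 8
```

Turning this into fixed-point, statically allocated, pipelined variants is exactly the kind of "many implementations of one algorithm" work the community could use.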

And it's not algorithmic really, but peripheral drivers contribute a ton to the community. Being able to e.g. plug a certain e-ink display into a widget without having to write the driver yourself.

opencores.org might be of interest to you (github.com/klyone/opencores-ip).


Thank you



