
IMHO it's most likely a variant of an AMD GPU or APU, because it would take some major differentiation before Nvidia would do a largely custom processor core, even for Microsoft.

Nvidia's Tegra Xavier SoC is under development with the Volta GPU architecture, which includes Tensor Cores specifically intended for deep learning and AI applications.
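
For a rough sense of what a Tensor Core exposes to software, here's a minimal CUDA sketch using the WMMA API that shipped with CUDA 9 for Volta. The kernel name and the single fixed 16x16x16 tile are just for illustration; the point is that one mma_sync issues a whole matrix multiply-accumulate per warp rather than a lane-wise vector op:

    #include <mma.h>
    using namespace nvcuda;

    // One warp computes D = A*B + C on a single 16x16 tile.
    // Inputs are fp16, the accumulator is fp32 (the Volta Tensor Core shape).
    __global__ void tile_mma(const half *a, const half *b, float *d) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

        wmma::fill_fragment(acc, 0.0f);         // start with C = 0
        wmma::load_matrix_sync(a_frag, a, 16);  // leading dimension 16
        wmma::load_matrix_sync(b_frag, b, 16);
        wmma::mma_sync(acc, a_frag, b_frag, acc);  // whole-tile multiply-accumulate
        wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
    }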



That's doubtful. AMD is happy to provide bespoke chips for big customers (like the Xbox), but what you want for inference is hardware that multiplies entire int8 matrices in a dataflow. The silicon providing float32 support in an AMD GPU isn't needed, and AMD's dataflows only extend to vectors rather than matrices. When the matrix you're multiplying is much bigger than the execution hardware you can throw at it, this doesn't make a big difference to how efficiently the adders are used. But using vectors rather than matrices is hugely wasteful of register read ports if you're always doing matrix operations and don't also have to excel at purely vector workloads.
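
As a rough illustration of the vector-style cost (a sketch, assuming a hypothetical kernel and sm_61+ hardware with the __dp4a int8 dot-product intrinsic): each __dp4a below is a vector op that spends register reads on just four int8 products at a time, whereas a matrix dataflow reads each operand from registers once and streams it across a whole array of multiply-adders.

    // Naive int8 GEMM: each thread owns one output element of an M x N result.
    // A is M x K packed row-major, B is stored transposed (N x K) so both
    // operands stream contiguously; k4 = K/4 packed int8x4 words.
    __global__ void gemm_s8(const int *a, const int *b, int *c,
                            int m, int n, int k4) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= m || col >= n) return;

        int acc = 0;
        for (int i = 0; i < k4; ++i)
            // 4-way int8 dot product with int32 accumulate -- one vector step,
            // one fresh pair of register operand reads per iteration.
            acc = __dp4a(a[row * k4 + i], b[col * k4 + i], acc);

        c[row * n + col] = acc;  // int32 accumulation, as inference hardware does
    }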



