The Bayesian Geometry of Transformer Attention (arxiv.org)
4 points by samwillis 22 days ago | 1 comment


Higher-level overview, with links to the other related papers: https://medium.com/@vishalmisra/attention-is-bayesian-infere...



