No, it doesn't shift across 64-bit boundaries. Take a look at the gcc output in your link:
```
movdqa xmm0, XMMWORD PTR [rdi]
movdqa xmm1, xmm0
psrlq xmm0, 10
psrldq xmm1, 8
psllq xmm1, 54
por xmm0, xmm1
movdqa xmm1, XMMWORD PTR .LC0[rip]
```
That's a lot of psr and psl instructions for a "single 128-bit wide shift"...
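
To see why the extra instructions are needed, here is a rough scalar sketch of what that sequence computes. It models the XMM register as two 64-bit lanes; the `u128_lanes` type and the `shr128` name are made up for illustration and don't come from your code or from gcc. `psrlq` shifts each lane on its own, so the bits that fall off the bottom of the high lane have to be recovered separately and OR-ed into the low lane.

```c
#include <stdint.h>

/* Two 64-bit lanes standing in for one XMM register (hypothetical type). */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} u128_lanes;

/* Logical right shift of the whole 128-bit value by n bits.
 * Assumes 0 < n < 64; other counts are not handled in this sketch. */
static u128_lanes shr128(u128_lanes v, unsigned n)
{
    u128_lanes r;

    /* psrlq: each lane is shifted independently, so the low n bits
     * of the high lane are simply dropped at this point. */
    r.lo = v.lo >> n;
    r.hi = v.hi >> n;

    /* psrldq 8 moves the high lane down into the low position,
     * psllq (64 - n) then leaves only the bits that should have
     * crossed the lane boundary, aligned to the top of the low lane. */
    uint64_t carry = v.hi << (64 - n);

    /* por: merge the carried bits into the low lane. */
    r.lo |= carry;
    return r;
}
```

With `n = 10` this matches the `psrlq 10` / `psrldq 8` / `psllq 54` / `por` pattern in the listing. If the 64-bit shift really did cross lanes, only the first shift would be needed.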