The client often has actual knowledge of their design and of the places where they want force applied to find weaknesses, and they're trying to evaluate the results against specific outcomes, not every possible open-ended question you can think up. On top of that, there's a reasonable limit to the time, money, staff, and other resources that can be spent on these kinds of audits. For example, if you're using a cloud provider, you're not going to pay the testers infinite money to compromise GCP over the course of 9 months through an insider operator compromise or some nation-state attack. You're not going to pay them to spend all day finding 0-days in OpenSSL when your business runs a Django app. You're going to set baseline rules like "You need to compromise our account under certain approaches, like social engineering our own employees, or elevating privileges by attacking the software and pivoting."
It's mostly just a matter of having a defined scope. They could of course say "You can only attack this one exact thing" in a way that makes them look good, but that's true of many things.
Defining the threat model is standard in the infosec auditing/pentest world, FWIW.
> If Mullvad dictated how to do things or imposed limits on the reach of the testing, the results are worthless anyway
That's only true if your threat model is "literally every possible thing that could ever happen", which is so broad as to be meaningless and impossible to test anyway.
Computer programmers also do not typically design their programs under the assumption that someone stuffed newspaper between their CPU and heatsink and it caught on fire. They work on the assumption the computer is not on fire.