Since the post only specified polynomial time and placed no constraint on space, this is actually solvable (e.g., it's not too hard to come up with a polynomial-time implementation that uses a number of VMs that grows exponentially with the size of the set).
But the getacoder poster only specified that it had to be done in polynomial time, and didn't clarify whether that had to be total CPU time (i.e., "work") or wall clock time.
Given that underspecification, it's perfectly justifiable to use an exponential amount of resources to achieve the goal.
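To make that concrete, here's a minimal sketch, assuming the posted job was the classic subset-sum problem (the names check_subset and solve are mine, and the loop stands in for spinning up 2^n VMs in parallel):

    def check_subset(rank, items, target):
        # Hypothetical per-VM work: VM number `rank` tests the one subset
        # whose characteristic vector is the binary representation of `rank`.
        # That's O(n) work per VM, so with 2**n VMs running simultaneously
        # the wall-clock time is polynomial even though the total work isn't.
        return sum(x for i, x in enumerate(items) if rank >> i & 1) == target

    def solve(items, target):
        # Sequential stand-in for 2**len(items) parallel VMs.
        for rank in range(2 ** len(items)):
            if check_subset(rank, items, target):
                return [x for i, x in enumerate(items) if rank >> i & 1]
        return None

    # solve([3, 34, 4, 12, 5, 2], 9) -> [4, 5]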
Still, regardless of whether P != NP, it is known that P is contained in NP, which is contained in PSPACE. But these classes are defined relative to a particular computational model; if you change the model, those definitions don't automatically carry over.
In the real world, taking exponential space implies taking exponential time. The fastest you can transmit information is the speed of light, so it will take exponential time to communicate with the furthest pieces of your exponentially large system.
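To put a rough number on that (a back-of-the-envelope sketch, assuming each node needs at least some fixed volume V_0): packing 2^n such nodes into ordinary 3-space forces

    r \ge \left(\frac{3 V_0 \cdot 2^n}{4\pi}\right)^{1/3} = \Theta\!\left(2^{n/3}\right),
    \qquad t \ge \frac{r}{c} = \Omega\!\left(2^{n/3}\right),

so the light-speed latency to the farthest node is still exponential in n.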
Depends on the problem and how you distribute data.
e.g., each node can generate its own subset to check, so you only need to distribute the program and the input. That can be done with a multicast tree of some kind, which brings the distribution cost back down to a logarithmic number of steps.
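One way to realize that multicast tree (a sketch; broadcast_schedule is a made-up name): a binomial tree, where every node that already holds the program and input forwards it to one more node each round, covering P nodes in ceil(log2(P)) rounds rather than P.

    def broadcast_schedule(num_nodes):
        # Binomial-tree broadcast: each round, every informed node sends to
        # one uninformed node, so the informed set doubles and num_nodes are
        # covered in ceil(log2(num_nodes)) rounds instead of num_nodes sends
        # from a single source.
        rounds, informed = [], 1
        while informed < num_nodes:
            rounds.append([(src, src + informed)
                           for src in range(informed)
                           if src + informed < num_nodes])
            informed *= 2
        return rounds

    # broadcast_schedule(8) -> [[(0, 1)], [(0, 2), (1, 3)],
    #                           [(0, 4), (1, 5), (2, 6), (3, 7)]]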
But if you only have a finite amount of hardware available (which is probably the case), each node's share of the subsets to check grows exponentially with the input size.
If your node population increases exponentially with input size, you're still running into the speed-of-light delay problem mentioned above (cube root of exponential is still exponential). If your solution is polynomial time with polynomially many nodes, your space cost is also polynomial.
This could be said about almost all "polynomial time" algorithms, though. The bigger the loop, the larger the counters and indices you'll need (in general), and the more time you'll need to communicate between the pieces.