It's r^2 in the denominator. One way to see it is geometric: the surface area of a sphere is proportional to r^2. So if you think of a fixed total "influence" spreading out evenly in all directions, then at a given distance you divide it by the surface area of the sphere at that distance, which puts r^2 in the denominator.
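A quick numerical sketch of that picture, assuming a fixed total "influence" spread evenly over a sphere. The function name `spread_over_sphere` and the value of `total` are made up for illustration; nothing here is specific to gravity.

```python
import math

def spread_over_sphere(total_influence: float, r: float) -> float:
    """Influence per unit area at distance r, assuming an even spread over a sphere."""
    return total_influence / (4.0 * math.pi * r**2)

total = 1.0  # hypothetical total influence, arbitrary units

for r in (1.0, 2.0, 4.0):
    density = spread_over_sphere(total, r)
    # Multiplying back by the sphere's area recovers the same total at every radius.
    recovered = density * 4.0 * math.pi * r**2
    print(f"r={r}: per-area influence={density:.6f}, total through sphere={recovered:.6f}")
```

Doubling the distance cuts the per-area value by a factor of four, while the total through each sphere stays the same.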
Yes, it's trivial to deduce Newton's law of universal gravitation from the divergence theorem. But to deduce that the divergence theorem applies, you need the Lagrangian and the least action principle.
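For reference, the short step from the divergence theorem to the inverse-square law can be sketched as follows. This assumes the field equation for the gravitational field g (Gauss's law for gravity) and a spherically symmetric mass M; the symbols S_r (sphere of radius r), V_r (the ball it encloses), and rho (mass density) are notation introduced here, not taken from the thread.

```latex
\begin{align*}
\nabla \cdot \mathbf{g} &= -4\pi G \rho
  && \text{(assumed field equation)} \\
\oint_{S_r} \mathbf{g} \cdot d\mathbf{A}
  &= \int_{V_r} \nabla \cdot \mathbf{g} \, dV = -4\pi G M
  && \text{(divergence theorem)} \\
g(r) \, 4\pi r^2 &= -4\pi G M
  && \text{(spherical symmetry, } \mathbf{g} = g(r)\,\hat{\mathbf{r}}\text{)} \\
g(r) &= -\frac{G M}{r^2}, \qquad
\mathbf{F} = m\,\mathbf{g} = -\frac{G M m}{r^2}\,\hat{\mathbf{r}}
\end{align*}
```

The r^2 in the denominator is exactly the sphere's surface area 4 pi r^2 divided out of a flux that does not depend on r.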
Conceptually, the least action principle is the fundamental concept in classical mechanics: everything is derived from it, together with symmetry.
Historically, this was all done backwards. Once you have Newton's law of universal gravitation, it's trivial to prove that the gravitational flux is the same through every surface enclosing the mass. It's also easy, in terms of the mathematics required, to come up with a least action principle from that (albeit not as easy as the inverse deduction). However, if you think of Newton's laws as fundamental, it is not a very natural thing to do. Why would you do it? It took a very deep thinker like Lagrange to see that the least action principle is the truly fundamental thing. He was doing this long before Noether's theorem. He was doing this before the concept of energy was formalized! People knew that a quantity with what we now call units of energy was conserved, and they knew momentum was conserved, but they didn't know what these things were, especially as no forms of energy other than kinetic and potential were known back then. Truly a visionary thinker.