My experience with GCJ has been that raw execution speed almost never matters as long as you have the right algorithm.
In the OP's case, for example, a trivial optimization (the kind you do without even thinking in these competitions) would've brought his execution time down 10000x. Basically, you are asked for the number of "fair and square" numbers in a certain range, over and over again, in 10000 different test cases. He could have taken his solution, pre-generated all those numbers once, and then answered each test case by counting within that pre-generated list, instead of regenerating it for every single test case (all 10^4 of them). It turns out that in the whole span from 1 to 10^14 there are only 39 such numbers, and it would probably have taken his program less than a minute to generate all of them (I'm being generous here, since it ran all 10^4 test cases in 53 min).
In fact, if you read the problem analysis Google published afterwards, you will see that you were expected to make this optimization to solve the problem. The fact that C-small had 100 test cases, C-large-1 had 10^4 and C-large-2 had 100 should've been a strong hint that C-large-1 required some pre-caching across test cases.
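To make the pre-generation idea concrete, here is a rough Python sketch. It is not the OP's code or Google's reference solution; the function names are made up for illustration, and it assumes the usual definition (a palindrome that is the square of a palindrome). It builds the list once by squaring palindromic roots, then answers each query with two binary searches.

```python
from bisect import bisect_left, bisect_right
from math import isqrt


def is_palindrome(n):
    s = str(n)
    return s == s[::-1]


def palindromes_up_to(max_value):
    """Yield every palindrome in [1, max_value] in increasing order."""
    for length in range(1, len(str(max_value)) + 1):
        half_len = (length + 1) // 2
        for half in range(10 ** (half_len - 1), 10 ** half_len):
            s = str(half)
            # Mirror the first half; for odd lengths the middle digit is not repeated.
            p = int(s + s[::-1][length % 2:])
            if p > max_value:
                return
            yield p


def fair_and_square_up_to(limit):
    """Sorted list of palindromes <= limit that are squares of palindromes."""
    roots = palindromes_up_to(isqrt(limit))
    return [r * r for r in roots if is_palindrome(r * r)]


# Generated once, before reading any test case (hypothetical module-level cache).
FAIR_AND_SQUARE = fair_and_square_up_to(10 ** 14)


def count_in_range(a, b):
    """Number of cached fair and square numbers x with a <= x <= b."""
    return bisect_right(FAIR_AND_SQUARE, b) - bisect_left(FAIR_AND_SQUARE, a)
```

With the list cached, each test case costs two binary searches over 39 entries, so the number of test cases stops mattering. Even a plain linear count over the list would be fast enough here.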