
Serious question: what on earth have supercomputers got to do with this?

My impression is supercomputers exist mainly for incredibly intensive large-scale simulation/calculation that can't be subdivided into parts -- e.g. weather or nuclear explosion simulation.

3D map processing feels like the literal opposite of that -- trivially parallelizable, with a much higher ratio of data/IO to calculation.

Are they running out of tasks for Blue Waters to do, or trying to find a high-profile project to justify it politically or something? I really can't imagine for the life of me why you wouldn't just run this in any enterprise-grade cloud.



I used what was then a top-10 system on the Top 500 when I worked at a national laboratory in the early 2000s. An embarrassing number of jobs in its job queue would have run well on much smaller clusters with less expensive hardware. Only once in a while would we run a single job that used more than half of the entire system and achieved decent scaling.

I suspect that the mismatch is worse nowadays. Although software and interconnects have improved, core counts and node counts have gone up even faster.

IMO simulation-guided research would probably have gone faster at the lab if the money for the top-10 system had been spent on a bunch of smaller clusters with less exotic hardware, divvied up according to the actual lines of research scientists were pursuing. But there's prestige and budgetary room for a new Grand Challenge system that may not be there for a bunch of more affordable, less exotic systems. And once in a while somebody does have a job that only runs well on the big machine.

This is also why I don't much worry about China building systems that rank higher on the Top 500 than American systems. Until Chinese research groups start churning out Gordon Bell Prize-winning software to go with the giant systems, they're probably just misallocating even more money than American labs.

EDIT: well that was arrogant and foolish of me to dismiss Chinese HPC. I looked up recent Gordon Bell Prize winners and Chinese researchers won in 2016 and 2017. It looks like they're making good progress in using those really big systems.

https://en.wikipedia.org/wiki/Gordon_Bell_Prize


Creating a 3D model out of 2D images requires computer vision to extract objects from the images and estimate their dimensions (including elevation). This will most likely require an end-to-end deep learning model that needs training, validation, and testing. Given the amount of data it will have to deal with (hundreds of thousands to millions of images), it will need to load (high-dimensional?) images in batches for processing. This can arguably still be done on AWS or Azure (or...) with TensorFlow and HPC, but two things: HPC brings a bit more overhead to the table, and a supercomputer could do better, since none of the current cloud service providers have supercomputers that can compete, especially in terms of CPU performance.


There's no reason it needs a DL model. There's a lot of software that calculates tie points and creates point clouds from pictures, which is almost certainly what they are going to do here. DL to go from orthoimages to a point cloud, if it is a thing at all, is probably still at the feasibility stage.


The steps are all fairly easily parallelizable until you get to a final large scale nonlinear least squares refinement step, and even then there are tricks to make the decomposition tractable. It usually just involves single images or pairs of images with no need for communication between processes until the last bit.
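
To make that concrete, here is a toy sketch in Python (not the actual pipeline; `extract_tie_points`, the camera model, and all the data here are made up for illustration). The per-image work runs with no inter-process communication at all, and only the final joint least-squares refinement couples the parameters, where a sparsity pattern keeps the solve tractable:

```python
import numpy as np
from multiprocessing import Pool
from scipy.optimize import least_squares
from scipy.sparse import lil_matrix

def extract_tie_points(image_id):
    # Stand-in for per-image feature extraction: embarrassingly parallel,
    # no communication with other workers needed.
    rng = np.random.default_rng(image_id)
    return image_id, rng.normal(size=(100, 2))

def residuals(params, cam_idx, measured, n_cameras):
    # Toy "reprojection" residual: each observation depends on only one
    # camera's 2-parameter offset, so the Jacobian is very sparse.
    offsets = params.reshape(n_cameras, 2)
    return (offsets[cam_idx] - measured).ravel()

if __name__ == "__main__":
    n_cameras = 50
    with Pool() as pool:                      # the trivially parallel part
        per_image = pool.map(extract_tie_points, range(n_cameras))

    cam_idx = np.repeat([i for i, _ in per_image], 100)
    measured = np.vstack([pts for _, pts in per_image])

    # Final joint refinement: all parameters in one nonlinear least-squares
    # problem, made tractable by telling the solver which Jacobian entries
    # can be nonzero (the same idea as sparse bundle adjustment).
    n_res = 2 * len(cam_idx)
    sparsity = lil_matrix((n_res, 2 * n_cameras), dtype=int)
    for row, cam in enumerate(cam_idx):
        sparsity[2 * row, 2 * cam] = sparsity[2 * row, 2 * cam + 1] = 1
        sparsity[2 * row + 1, 2 * cam] = sparsity[2 * row + 1, 2 * cam + 1] = 1

    fit = least_squares(residuals, np.zeros(2 * n_cameras),
                        jac_sparsity=sparsity,
                        args=(cam_idx, measured, n_cameras))
    print("refined parameter vector:", fit.x.shape)
```

In a real photogrammetry pipeline the residuals are reprojection errors over camera poses and 3D points rather than these toy offsets, but the shape of the problem is the same: independent per-image work, then one coupled refinement at the end.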


If you look at georeference systems, they may fit a parametric equation with coefficients to a large set of earth data. The geoid in the link below is a refinement over an ellipsoid (think lumpy potato vs. smooth pebble) that gives you a 3D fit with some level of accuracy from a very compact equation, compared to carrying around the raw data. I'm thinking a supercomputer might be pretty useful for redoing the fit as the data is updated, as well as for trying equations with different costs-to-fit on large data sets, or for providing better fits at finer granularity.

https://en.wikipedia.org/wiki/Earth_Gravitational_Model
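
To sketch the idea (synthetic data, NumPy/SciPy only; this is not the actual EGM fitting procedure, which works at far higher degree with far more data): scattered "geoid height" samples get compressed into a small vector of spherical-harmonic coefficients by an ordinary linear least-squares fit.

```python
import numpy as np
from scipy.special import sph_harm

def real_sph_harm(m, l, lon, colat):
    # Real-valued spherical harmonic built from SciPy's complex Y_l^m.
    if m == 0:
        return sph_harm(0, l, lon, colat).real
    if m > 0:
        return np.sqrt(2) * (-1) ** m * sph_harm(m, l, lon, colat).real
    return np.sqrt(2) * (-1) ** m * sph_harm(-m, l, lon, colat).imag

rng = np.random.default_rng(0)
n_obs, lmax = 5000, 4
lon = rng.uniform(0, 2 * np.pi, n_obs)              # longitude (rad)
colat = rng.uniform(0.01, np.pi - 0.01, n_obs)      # colatitude (rad)

# Synthetic "lumpy potato" heights: a couple of low-degree bumps plus noise.
heights = (15 * real_sph_harm(0, 2, lon, colat)
           + 5 * real_sph_harm(3, 4, lon, colat)
           + rng.normal(0, 0.5, n_obs))

# Design matrix: one column per (l, m) coefficient up to degree lmax.
terms = [(l, m) for l in range(lmax + 1) for m in range(-l, l + 1)]
A = np.column_stack([real_sph_harm(m, l, lon, colat) for l, m in terms])

coeffs, *_ = np.linalg.lstsq(A, heights, rcond=None)
print(f"{len(coeffs)} coefficients summarize {n_obs} samples")
```

EGM2008 carries the same idea out to spherical-harmonic degree of roughly 2160, at which point both the number of coefficients (millions) and the fitting problem get genuinely large.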


Good question! This does seem like an embarrassingly parallel problem. Whenever I've used the big HPC centers, the secret sauce has been a fast low latency network interconnect. The fast interconnect is useful for PDE solvers which need a lot of processor-to-processor communication (i.e. for sending data back/forth periodically along the boundaries of the grid cells you're solving for).
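
A minimal sketch of that boundary exchange, assuming mpi4py and an MPI runtime are available (this is a generic 1D diffusion toy, not any particular production solver):

```python
import numpy as np
from mpi4py import MPI

# Run with e.g.: mpirun -n 4 python halo_demo.py  (filename is arbitrary)
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                        # interior cells owned by this rank
u = np.zeros(n_local + 2)             # +2 ghost cells for neighbor data
u[1:-1] = rank                        # arbitrary initial condition
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for step in range(100):
    # Exchange ghost cells with both neighbors: this happens every
    # timestep and is what benefits from a low-latency fabric.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[-1:], source=right)
    comm.Sendrecv(u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
    # Simple explicit diffusion update on this rank's interior cells.
    u[1:-1] += 0.1 * (u[:-2] - 2 * u[1:-1] + u[2:])

if rank == 0:
    print("finished 100 coupled timesteps on", size, "ranks")
```

Every one of the 100 steps does a couple of small sends and receives per rank, and it is the latency of those tiny messages, more than bandwidth, that the HPC interconnect helps with.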


Supercomputers are computers with tens of thousands of cores of consumer-like CPUs, so it is the opposite of your impression: they only work well on massively parallelizable tasks. I don't know the exact details of weather or nuclear explosion simulation, but they have to be parallelized to run on HPC systems. Even if the computation itself is not parallelizable, scientists can still leverage supercomputers by running the simulation with randomized parameters on every node and reaching a consensus result.


There is a difference between parallelizable and embarrassingly parallelizable. The former means that you can get better performance by dividing the work among different processors, while the latter usually implies that the work can be divided into independent units that don't need to communicate with each other.

A supercomputer typically means that those thousands of cores are connected with fast and expensive interconnects so that the cores can communicate with low latency. A large portion of the budget is usually spent on this interconnect. If you have an embarrassingly parallel problem and you run it on a supercomputer then that expensive interconnect is sitting idle - you would get the same performance on AWS or a more standard compute cluster.


Well, your impression is just wrong. Simple as that. No shame in it.

Today, what is called a supercomputer is usually just a cluster (i.e. multiple connected, normal-spec computers). It is normally connected with a high-speed interconnect though (100 Gbit/s and more), which is its most defining capability.

Why are they using this cluster? My speculation: probably because it is available and no longer has much use for real scientific computing (because it is old: https://bluewaters.ncsa.illinois.edu/hardware-summary), and the intelligence agency would rather support academia than feed some commercial entity.


> It is normally connected with a high-speed interconnect though (100 Gbit/s and more), which is its most defining capability.

That's his point. The high-speed interconnect, the defining capability of the modern supercomputer, is unnecessary for the problem they are using a supercomputer to solve.

They could have equally well used BOINC or some other distributed, loosely coupled technology for this.

I agree with you regarding "it's available." It sounds like a press person got ahold of it and stopped paying attention once they saw the word "supercomputer".


It seems like they are using IMAGES of the area and then using visual processing to determine the 3D height map.

That would be intense visual processing: stitching together photos from various angles to work out terrain elevation.

They are not dealing with a 3D point cloud of elevation data.


Sounds as if they plan to use Agisoft PhotoScan to process all the satellite images they can find.


The distinction between "supercomputer" and "enterprise cloud" is nearly meaningless now that a cloud compute instance ranks among the Top 500 fastest supercomputers: https://medium.com/descarteslabs-team/thunder-from-the-cloud...


I don't think it's anywhere near "meaningless". The fact that a team of software engineers led by someone with extensive HPC experience (since the early 90s, multiple awards won, etc.) can do this does not mean it is easy for a typical university or engineering company to do the same.

Will HPC migrate towards the cloud? Maybe yes, but we need several major overhauls to tooling before that is anywhere close to happening.

Just think about how much work it would be today to configure a Packer image that has several MPI libraries, a scheduler like SLURM, various Python versions plus required packages, C and Fortran compilers, BLAS/LAPACK/etc., VCS systems, and integration with some sort of user authentication system, including SSH login to each node and per-user links to the accounting in the scheduling system, and then to have confidence that it will be highly performant for your application on the AWS allocation you have requested. Not many people could pull that off in a reasonable amount of time, if at all.


> calculation that can't be subdivided into parts -- e.g. weather

I'm not sure how weather simulations work, but I've always wondered why they aren't performed cellular-automaton style, bottom-up rather than top-down: at each time step, a given cell's state is computed from the state of its neighbors at t-1. This should be parallelizable. I feel that's how it works in the real world anyway.


That is exactly what weather simulations do. Same for most other kinds of hydrodynamic simulations (simulations of gas or fluid flows).

The thing is: you need the state at t-1 of all your neighbors. Then you can do a small timestep to get from t-1 to time t. And then you need the NEW state of your neighbors. That requires a fast interconnect, which HPC machines have, unlike most clouds or commodity clusters.

In other words, yes it is parallelizable, but not trivially so, because the different grid points are coupled.
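
A toy illustration of that coupling (plain NumPy, nowhere near a real weather model): the new value of every cell is computed from its neighbors at the previous step, so once the grid is split across nodes, the cells along each tile edge have to be fetched from another machine on every single timestep.

```python
import numpy as np

def step(u, dt=0.1):
    # State at t from the state at t-1: each cell looks at its four
    # neighbors (explicit diffusion on a periodic grid).
    neighbors = (np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0) +
                 np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1))
    return u + dt * (neighbors - 4 * u)

u = np.random.default_rng(0).random((256, 256))   # initial field
for _ in range(1000):
    u = step(u)   # every step needs the full neighborhood from t-1
print("mean is conserved:", round(float(u.mean()), 6))
```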


I've seen that Azure has some nodes with InfiniBand connections (the same interconnect that is often used in supercomputers).

I did my PhD in physics simulations (molecular dynamics) and had the same problem there. I tried running these simulations in Google Cloud without any good performance results, due to high latency (compared to HPC). I'm no GC expert though, so it should be possible to improve on what I did.


For a simplified explanation of how weather prediction models work check out: NOVA: Decoding the Weather Machine

Available on Netflix for example:

https://www.netflix.com/title/81121177


If you don't want discontinuities at the seams between tiles, you need communication between nodes processing different tiles.



