SC23 This year marks the thirtieth anniversary of the Top500 ranking of the world's publicly known fastest supercomputers.
In celebration of this fact, and with the annual Supercomputing event underway in Colorado, we thought it would be fun, if a bit silly, to see how cheaply we could match the performance of a top-ten supercomputer from 1993. So, we spun up a few virtual machines in the cloud and compiled the HPLinpack benchmark. Spoiler alert: you probably won't be too surprised by the outcome of this experiment of ours.
At the end of 1993, the fastest supercomputer on record was Fujitsu's Numerical Wind Tunnel, located at Japan's National Aerospace Laboratory. With a whopping 140 CPU cores, the system managed 124 gigaFLOPS of double-precision (FP64) performance.
Today we have systems breaking the exaFLOPS barrier, but in November 1993, all you needed to do to claim a spot among the ten most powerful systems was to manage better than the US CM-5/544's 15.1 gigaFLOPS of FP64 performance. So, the target for our cloud virtual machine to beat was 15 gigaFLOPS.
Before we dig into the results, a few notes. We know we could have achieved much, much higher performance had we opted for a GPU-enabled instance, however those aren't exactly cheap to rent in the cloud, and GPUs didn't really start appearing in Top500 supercomputers until the mid-to-late 2000s. It's also much simpler to get Linpack running on a CPU than on a GPU.
These tests were run for the novelty of it, to mark the thirtieth anniversary, and are by no means scientific or exhaustive.
A $5 cloud VM versus a 30-year-old Top500 super?
But before we could get to testing, we needed to spin up a couple of virtual machines. For this run we opted to run Linpack on Vultr, but this would work just as well in AWS, Google Cloud, Azure, Digital Ocean, or whatever cloud provider you prefer.
To start off, we spun up a $5/mo virtual machine instance with a single shared vCPU, 1GB of RAM, and 25GB of storage. With that out of the way, it was time to compile Linpack.
This is where things can get a little tricky, since there's actually a fair bit of tweaking and optimization that can be done to eke out a few extra FLOPS. However, for the purposes of this test, and in the interest of keeping things as simple as possible, we opted to follow this guide. That documentation was written for Ubuntu 18.04, though we found it worked just fine on 20.04 LTS.
To generate our HPL.dat file, we used this nifty form that automatically generates an optimized configuration for a Linpack run.
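We can't vouch for exactly what that generator does under the hood, but the usual rule of thumb such tools apply is to size the problem to fill most of available RAM. A rough sketch of that arithmetic in Python (the 80 percent memory fraction and the block size of 192 are our assumptions, not necessarily the generator's):

```python
import math

def hpl_problem_size(mem_bytes, mem_fraction=0.8, block_size=192):
    """Estimate an HPL problem size N for a given amount of RAM.

    HPL factorizes an N x N matrix of 8-byte doubles, so generators
    typically pick N so the matrix fills most of memory, then round
    down to a multiple of the block size NB.
    """
    n = math.isqrt(int(mem_bytes * mem_fraction) // 8)
    return (n // block_size) * block_size

# For our 1GB instance this lands just above 10,000
print(hpl_problem_size(1 * 1024**3))  # → 10176
```

Go too big and the run swaps itself to death; too small and the matrix doesn't keep the FPUs busy long enough to show peak throughput.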
We ran the benchmark three times for a few different VM types and selected the best score for each. Here are our findings:
As you can see, our thoroughly unscientific test results showed a single shared vCPU compares quite favorably to November 1993's ten most powerful supers.
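For the curious, plucking the best score out of repeated runs is simple enough to script. HPL prints one result row per trial: the test name followed by N, NB, P, Q, runtime, and Gflops. A sketch (the sample output below is fabricated for illustration):

```python
def best_gflops(hpl_output):
    """Return the highest Gflops figure from HPL's stdout.

    HPL prints one result row per trial, of the form:
    name  N  NB  P  Q  seconds  Gflops
    """
    scores = []
    for line in hpl_output.splitlines():
        fields = line.split()
        if len(fields) == 7 and fields[0].startswith("WR"):
            scores.append(float(fields[6]))
    return max(scores)

# Fabricated sample resembling three single-vCPU runs
sample = """\
WR11C2R4  10176  192  1  1  24.01  2.931e+01
WR11C2R4  10176  192  1  1  22.57  3.121e+01
WR11C2R4  10176  192  1  1  23.45  3.018e+01
"""
print(best_gflops(sample))  # → 31.21
```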
A single CPU thread netted us 31.21 gigaFLOPS of FP64 performance, putting our VM in contention with the number-three ranked supercomputer in 1993, the Minnesota Supercomputer Center's 30.4 gigaFLOPS CM-5/544 Thinking Machines system. Not bad, considering that system had 544 SuperSPARC processors while ours had a single CPU thread, albeit one running at much higher clock speeds, of course.
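Back-of-the-envelope, that figure is plausible for a single modern x86 thread. We don't know what silicon actually backs a shared vCPU, so the 3.0GHz clock below is purely an assumption; the 16 FLOPs per cycle figure is the standard number for an AVX2 core with two FMA units:

```python
def peak_fp64_gflops(cores, ghz, flops_per_cycle):
    """Theoretical FP64 peak: cores x clock (GHz) x FLOPs per cycle."""
    return cores * ghz * flops_per_cycle

# AVX2 with two FMA units: 2 units x 4 FP64 lanes x 2 ops (mul+add) = 16
peak = peak_fp64_gflops(cores=1, ghz=3.0, flops_per_cycle=16)
print(peak)                   # → 48.0
print(f"{31.21 / peak:.0%}")  # → 65%
```

Hitting roughly two-thirds of an assumed theoretical peak is about what you'd expect from a quick, untuned HPL run.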
As you can see from the chart above, an extra $1/mo saw performance jump to 51.85 gigaFLOPS, while stepping up to an $18 "premium" shared CPU instance with two threads got us to 87.46 gigaFLOPS.
However, beating Fujitsu's Numerical Wind Tunnel required stepping up to a four-vCPU VM, from which we squeezed 133 gigaFLOPS of FP64 goodness. Unfortunately, jumping up to four threads wasn't nearly as cheap at $48/mo. At that price, Vultr actually sells fractional GPUs, which we expect would perform comically better, and would be quite a bit more efficient.
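One thing those numbers show is how sub-linearly the shared instances scaled. The comparison is loose, since the tiers differ in more than thread count, and we're assuming the $6 tier was a single thread:

```python
# Best scores from our runs; the one-thread figure for the $6 tier is
# our assumption, the other thread counts come from the text
scores = {1: 51.85, 2: 87.46, 4: 133.0}

# Per-thread throughput falls as thread counts (and contention) rise
per_thread = {t: round(g / t, 2) for t, g in scores.items()}
print(per_thread)  # → {1: 51.85, 2: 43.73, 4: 33.25}
```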
Better options out there
Something we should mention is that these were all shared instances, which usually means they were overprovisioned to some extent.
This can lead to unpredictable performance that could vary from run to run depending on how heavily loaded the host system is in its cloud region.
In our highly unscientific runs we didn't see much variation. We think this is because the cores just weren't that heavily loaded. Running the same test on a dedicated CPU instance rendered near-identical results to our $6/mo instance, but at 5x the cost.
But beyond the novelty of this little experiment, there's not really much point. If you need to get your hands on a bunch of FLOPS on short notice, there are plenty of CPU and GPU instances optimized for this kind of work. They won't be anywhere near as cheap as a $5/mo instance, but most of these are actually billed by the hour, so for real-world workloads the actual cost is going to be determined by how quickly you can get the job done.
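The arithmetic behind that last point is worth spelling out: under hourly billing, a pricier instance that finishes sooner can be the cheaper choice overall. With entirely made-up rates and runtimes:

```python
def job_cost_cents(rate_cents_per_hour, runtime_hours):
    """Total bill for an hourly-metered instance: rate x time used."""
    return rate_cents_per_hour * runtime_hours

# Hypothetical: a slow, cheap box grinding for a day can cost more
# than a fast, pricey one that finishes in an hour
print(job_cost_cents(10, 24))   # → 240 ($2.40)
print(job_cost_cents(150, 1))   # → 150 ($1.50)
```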
And never mind how your smartphone compares to these 30-year-old systems.
In any case, The Register will be on the ground in Denver this week for SC23, where we'll be bringing you the latest insights into the world of high-performance computing and AI. And for more analysis and commentary, don't forget our friends at The Next Platform, who have the conference covered too. ®