Akemi's Anime World


A Supercomputer In Your Pocket

A supercomputing expert recently made some comments about the computing power of an iPad 2 relative to a Cray-2. It was a funny coincidence, because I had just, completely at random, been goofing around with some similar calculations myself.

So, for the heck of it, here are my very rough and mostly unscientific comparisons of classic supercomputers to one pocketable device and one luggable one: A current iPod and my new laptop, a MacBook Pro quad-core 2.2GHz Sandy Bridge i7.

These two devices are arbitrary; I picked an iPod because it’s cheap, popular, and lacks unnecessary (for this comparison) cellular hardware, and the laptop because it’s portable, pretty beefy, and I just bought one.  The numbers will be about the same for an iPhone 4 or any similarly high-end smartphone (or much higher, in those that use an iPad 2 class dual-core CPU), or any relatively high end laptop.

Caveats

One thing I’m ignoring in my attempt to compare stuff you can buy in 2011 to decades-old supercomputers is GPUs. Modern GPUs are incredibly powerful, even in high-end handhelds, but they are also very specialized, so while you can find GFLOPS ratings for many GPUs, and some general-purpose computing tasks can be run on them, you can’t run the Linpack benchmark on most, and it’s hard to make a one-to-one comparison of the computing power of a GPU versus a general-purpose number-cruncher, be it modern CPU or classic supercomputer.

The other caveat is the standard one about synthetic benchmarks across very different systems; you can only really compare some specific task. The Linpack benchmark has been used since the late ’70s to benchmark massive supercomputers, and it’s easy enough to download and run yourself, plus it’s the standard used by the Top500 fastest supercomputer list. Since it also produces results in FLOPS (floating point operations per second), which have been used for a very long time to measure computing power, I’m using it as the measure of power. It’s actually a fairly specialized (and not necessarily very useful) set of math routines, essentially timing how fast a machine can solve a big system of linear equations, but it’s as universal a benchmark as any.
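For the curious, here is a rough sketch of the kind of measurement a Linpack-style number represents: solve a dense system of linear equations, time it, and divide the conventional operation count (2/3·n³ + 2·n²) by the elapsed time. This is not the real Linpack code and not what any of the figures in this post were measured with; the function names `solve_dense` and `linpack_style_mflops` are made up for illustration, and pure Python will score far below what the same hardware manages with compiled, tuned routines.

```python
import random
import time

def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on copies so the caller's data is left untouched.
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the largest entry in column k onto the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        pivot_row = a[k]
        for i in range(k + 1, n):
            factor = a[i][k] / pivot_row[k]
            row = a[i]
            for j in range(k, n):
                row[j] -= factor * pivot_row[j]
            b[i] -= factor * b[k]
    # Back substitution on the now upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def linpack_style_mflops(n=100, seed=0):
    """Time one n-by-n solve; return (MFLOPS, largest residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    # Operation count conventionally credited for an n-by-n solve.
    flops = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    # Largest |a x - b| entry, as a sanity check that the answer is right.
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return flops / elapsed / 1e6, residual

if __name__ == "__main__":
    mflops, residual = linpack_style_mflops()
    print(f"{mflops:.2f} MFLOPS (residual {residual:.2e})")
```

The absolute number this prints says more about the Python interpreter than about the CPU, which is rather the point of the caveat: the same machine can post wildly different FLOPS depending on the software doing the math.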

Vector calculations also complicate things; being more specialized, classic vector computers often had disproportionately high performance at the cost of less versatility. Of course, many CPUs (including all modern ones) also have a vector processing unit (Apple based a whole advertising campaign on the power of the AltiVec vector unit in their G4 CPUs). These usually produce much higher FLOPS ratings for a subset of appropriate tasks, making comparisons somewhat harder.

All that said, off we go.

A 35-year-old Cray In Your Pocket

Bottom line first: a 4th Gen iPod Touch (or an iPhone 4) is significantly more powerful than a Cray-1 from the late ’70s.

iPod Touch equals Cray1

The Cray-1 was sort of the start of the classic supercomputers; it was a big, cool-looking, appliance-sized thing that sold well and debuted in 1976, a couple years before I was born (and just before Apple started selling the Apple I).

Benchmarks

Assuming Wikipedia is correct, the Cray-1 generally performed at about 136 MFLOPS, but could max out at 250 MFLOPS with carefully-tuned code that made use of its vector capabilities.

However, Linpack, which presumably uses more complicated routines, isn’t so kind to the Cray-1’s capabilities.  The Linpack FAQ actually has some figures for the machine (verified in the huge list of Linpack results linked from this page, down at the bottom); with the software available at the time of its release, it could apparently manage about 3.4 MFLOPS. A few years later, with better compilers, it turned in a much more respectable 12 MFLOPS.

As for the iPod Touch, there is, conveniently, a Linpack app in Apple’s App Store; the current (4th Gen) iPod tests at a little over 38 MFLOPS with it (my 3rd-gen Touch manages about 30 MFLOPS). Its ARM CPU (like the Cray) isn’t the strongest in that particular sort of calculation; in more optimized sorts of calculations using the right compiler and routines, it can apparently manage around 260 MFLOPS, which would be similar to what the Cray-1 could do under ideal circumstances.

Cray-1 Hardware Specs

The Cray-1 had a single, hand-wired, 64-bit, 80MHz CPU with vector capabilities, 1M 64-bit words of RAM (8MB in modern terms), one or more external 300MB hard drive units (which could be combined up to 4.8 GB, if I understand the manual correctly), and cost in the range of $5-8 million. It was freon-cooled and used (again, according to the manual) 115kW of electricity before you factored in storage and auxiliary hardware. Upgrades over the next few years offered versions with up to 32MB of RAM and 256MB of solid state storage.


A Cray-1 on display at EPFL in Switzerland (photographed by Rama)

In addition to the main CPU unit (the tower-shaped thing above), which was about 9 feet (2.7m) wide and 6.5 feet (2m) high and weighed 5 and a quarter tons (4700kg) fully equipped, the Cray-1 required two coolant condensing units, a power cabinet, two 150kW generators, and a smaller computer (closer to a modern desktop in size) that served as the user interface. The disk units were separate and rather large, as well.

iPod Touch Hardware Specs

A maxed-out iPod Touch (4th gen) uses a 32-bit, 800MHz A4 CPU (ARM Cortex-A8 class) with floating point and vector units, has 256MB of RAM, 64GB of solid state storage, and costs $400.  The iPhone 4, if you prefer, has the same CPU, 512MB RAM, up to 32GB storage, and costs $700 unsubsidized. The iPod weighs 3.6 oz (101g). The iPhone gets 2.75 hours out of its 5.2 Watt-hour battery running full-bore, meaning it uses a little under 2W (some significant portion of which is the screen backlight, GPU, and wireless hardware); the iPod Touch should be about the same.

Meaning…

Which is to say, in ballpark terms, you can walk out and buy a $400 iPod that has somewhat better performance than a Cray-1 from 35 years ago that would have cost you $8 million.  At 1/20,000th the cost it has 8x more RAM than even the upgraded 32MB Cray (32x more than the original 8MB), 13x more storage, weighs nearly 50,000x less not counting all the accessories, and uses at least 60,000x less electricity.

Nice.
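If you want to check the arithmetic, here is a small sketch that recomputes those ratios from the figures quoted above. The dictionary keys and the `compare` helper are just names picked for this example, and the power figure borrows the iPhone 4 battery numbers on the assumption that the iPod Touch draws about the same.

```python
# Figures as quoted in the post; all of them are rough.
cray1 = {
    "price_usd": 8_000_000,
    "ram_mb": 32,             # later upgraded model; the original had 8 MB
    "storage_gb": 4.8,
    "weight_kg": 4700,
    "power_w": 115_000,
    "linpack_mflops": 12,
}
ipod = {
    "price_usd": 400,
    "ram_mb": 256,
    "storage_gb": 64,
    "weight_kg": 0.101,
    "power_w": 5.2 / 2.75,    # 5.2 Wh battery drained in 2.75 h (iPhone 4 figures)
    "linpack_mflops": 38,
}

# For these keys smaller is better, so the ratio is old / new.
COSTS = ("price_usd", "weight_kg", "power_w")

def compare(old, new):
    """Return how many times better `new` is than `old` for each key."""
    return {
        key: (old[key] / new[key] if key in COSTS else new[key] / old[key])
        for key in old
    }

ratios = compare(cray1, ipod)
for name, value in ratios.items():
    print(f"{name:>15}: {value:,.1f}x")
```

The weight ratio comes out a little under 47,000, so "50,000x" is generous rounding; swapping `ram_mb` to 8 for the original Cray-1 gives the 32x figure instead of 8x.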

A 20-year-old Supercomputer On Your Lap

The Cray-1 was a hand-built bundle of circuit boards and wads of wires, and I’m comparing it to a pocketable media player/gaming doodad (or phone); I’m a lot more interested in a comparison between more modern, refined supercomputers from, say, when I was in high school, and a modern supercomputer-on-your-lap, the high-end laptop I’m typing this on.

The early ’90s were an interesting time for supercomputers, so I like using 20 years ago as the comparison; there was a boom in the construction of computers that used hundreds of off-the-shelf CPUs in parallel instead of fancier custom hardware, and there was a corresponding rapid increase in the speed of such systems. It was, effectively, the start of the “modern,” massively-parallel supercomputer era.

So, my 2011 MacBook Pro 17″ versus 1991, give or take a couple of years.

The bottom line is that the CM-5, a top-tier supercomputer from 1993, is very similar in specs to my laptop, and if you go back an even 20 years to 1991, this thing I use every day would probably have been the fastest computer on earth by a modest margin.

A MacBook Pro equals a Thinking Machines CM5 Supercomputer

Calculations

To start with, I downloaded a copy of Linpack from Intel and ran it myself; I got about 38 GFLOPS. (In benchmarks that make more use of vector math, for reference, I’ve seen as much as 59 GFLOPS. The GPU is theoretically capable of a ridiculous 500-700 GFLOPS, if you believe AMD, but that’s even more specialized sorts of calculations with little to do with everyday, non-3D-graphics computing.)

For comparison, we’ll use the Top500 List’s Rmax values, which are measured real Linpack throughput. In 1990, pre-list, the fastest supercomputer in the world was apparently the NEC SX3/44R, which clocked at 23.2 GFLOPS. It had the same number of cores as my laptop (four), interestingly, although they were 400MHz vector processors, so not quite comparable.

In mid-1993, when the Top500 list debuted, the top public computer was a Thinking Machines CM-5/1024, clocked at 59.7 GFLOPS (the NSA had a somewhat faster CM-5 called Frostburg with 512 better processors and a lot more RAM, but it was classified at the time); the #2 that year, a 544-core version of the same CM-5, managed 30.4 GFLOPS, and 512-core versions took the #3 and #4 spots.
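To put those numbers side by side, here is a quick sketch that divides the laptop's Linpack result by each of the supercomputer figures quoted above. The `speed_ratios` helper and the labels are invented for the example; the GFLOPS values are the ones from this post.

```python
laptop_gflops = 38.0  # Linpack result quoted above for the 2011 MacBook Pro

# Linpack figures quoted in the post, in GFLOPS.
rivals = {
    "NEC SX3/44R (1990)": 23.2,
    "CM-5/1024 (1993, #1)": 59.7,
    "CM-5/544 (1993, #2)": 30.4,
}

def speed_ratios(laptop, machines):
    """Return laptop speed divided by each machine's speed."""
    return {name: laptop / gflops for name, gflops in machines.items()}

ratios = speed_ratios(laptop_gflops, rivals)
for name, r in ratios.items():
    verdict = "laptop ahead" if r > 1 else "laptop behind"
    print(f"{name:<22} {r:4.2f}x  ({verdict})")
```

With these figures the laptop lands about 1.6x ahead of the 1990 NEC and 1.25x ahead of the 544-core CM-5, but at a bit under two-thirds of the 1024-core machine.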

I like the CM-5 as a comparison; it wasn’t necessarily the best supercomputer in absolute terms, and the company was horribly mismanaged, but it was promoted heavily at the time (here’s a promo video) and was more or less the first of the “mass market” (if such a term applies to fewer than a dozen sales) supercomputers. Plus, it was the first system to sit atop the Top500 list.

The CM-5 also looks rather sexy, even earning a cameo in Jurassic Park; it’s the big black series of towers with a bunch of scary-looking flashing red lights running the park (the lights were mostly for show, but did actually show the status of the processors).


The NSA's CM-5 "Frostburg," looking suitably imposing (photo by Austin Mills)

CM-5 Specs

The CM-5 used a variable number (up to 1024 in practice) of Sun SPARC processors; 512 was the most common (not that very many were sold) configuration.

I found plenty of info about the NCSA’s CM-5/512 (ranked #3 in June 1993, retired in 1997), so I’ll use that as the sample system.

It had 512 SuperSPARC I 32 MHz CPUs, 16GB of RAM (cheaper configurations had 8GB), and 140GB of disk storage.

The storage array was composed of 1.2GB 3.5″ SCSI disks with throughput of 2MB/s each, which were effectively in a sort of RAID-0 array; a paper I found (PDF) said that each 56-disk-set had a theoretical throughput of about 110MB/s, but in practice only managed 32MB/s reading and a much better 90MB/s writing, so presumably that 140GB array managed about 75MB/s write, 200MB/s read.

The CPUs alone would have drawn 7kW of power; I can’t find a number anywhere for the whole system, but probably at least 20kW, if not much, much higher; pricing was a weird, scandalous subsidy thing with DARPA, but somewhere over a million bucks. The individual cabinets were about 8 feet (2.5m) tall (this page has some photos with people for scale), and took up more or less a whole room; I couldn’t find any weight figures, but presumably several tons.

2011 MacBook Pro Specs

In comparison, my MacBook Pro has a four-core 2.2GHz CPU, 8GB of RAM (supports up to 16GB, if you’re rich), and a 120GB SSD (which I added; OWC 6G, SandForce 2200-based; about twice the performance of the stock SSD Apple offers) that can transfer data at around 500MB/s in both directions, plus a slower, old-fashioned spinning drive of 750GB that can manage around 90MB/s. The Sandy Bridge i7 CPU draws 45W at 2.2GHz, while the whole computer uses under 85W including screen, GPU, and battery charging. It cost me around $3000 including the high-end SSD; a more stock configuration with a smaller screen (but otherwise identical specs) would cost closer to $2500.

Meaning…

Which, when you put it together, means that my high-end but not-particularly-unusual laptop from 2011 is about 25% faster than a 1993 CM-5/512 (38 GFLOPS against 30 or so), has the same amount of RAM as the lower-end configuration, a similar amount of storage that can transfer data at about twice the speed (or, if you go with stock hardware, 5 times as much at half the speed), and uses at worst 1/200th the power, if not much less than that, for maybe 1/500th the cost.

A very close match to one of the best room-sized computers a few million dollars would buy you in 1993, and you can carry it in a briefcase and run it off of an internal battery for several hours.

Or, alternately, it would have been the fastest (publicly-known) computer on earth 20 years ago. If we compare it to our Cray-1 era of 35 years ago, it might well be faster than every general-purpose computer on the planet combined.
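As a sanity check on the laptop-versus-CM-5 summary, here is a sketch that recomputes the RAM, storage, power, and price ratios from the specs listed earlier. The key names are made up for the example, and the CM-5 power and price are only the rough lower bounds given above ("at least 20kW" and "somewhere over a million bucks"), so treat the last two results as floors rather than measurements.

```python
# All figures are the rough ones quoted in the post.
cm5_512 = {
    "ram_gb": 8,             # lower-end configuration
    "storage_gb": 140,
    "disk_mb_s": 200,        # the higher of the two array estimates
    "power_w": 20_000,       # "at least 20kW", so a lower bound
    "price_usd": 1_000_000,  # "somewhere over a million", also a lower bound
}
macbook = {
    "ram_gb": 8,
    "storage_gb": 120,       # aftermarket SSD
    "disk_mb_s": 500,
    "power_w": 85,
    "price_usd": 3000,
}
stock_drive = {"storage_gb": 750, "disk_mb_s": 90}

summary = {
    "ram": macbook["ram_gb"] / cm5_512["ram_gb"],
    "ssd_capacity": macbook["storage_gb"] / cm5_512["storage_gb"],
    "ssd_speed": macbook["disk_mb_s"] / cm5_512["disk_mb_s"],
    "stock_capacity": stock_drive["storage_gb"] / cm5_512["storage_gb"],
    "stock_speed": stock_drive["disk_mb_s"] / cm5_512["disk_mb_s"],
    "power_saving": cm5_512["power_w"] / macbook["power_w"],
    "price_saving": cm5_512["price_usd"] / macbook["price_usd"],
}

for name, value in summary.items():
    print(f"{name:>15}: {value:,.2f}x")
```

At the $1 million floor the price ratio is about 333x; the "1/500th" figure corresponds to a CM-5 price of around $1.5 million, which is easy to try by changing `price_usd`.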