The Challenges of Exascale

The emerging exascale computing architecture will not simply be 1000× today’s petascale architecture. All proposed exascale computer system designs will share some of the following challenges:

  • Processor architecture is still unknown.
  • System power is the primary constraint for an exascale system: simply scaling up from the power requirements of today’s petaflop computers, an exaflop computer in 2020 would require 200 MW, which is untenable. The target is 20-40 MW in 2020 for 1 exaflop.
  • Memory bandwidth and capacity are not keeping pace with the increase in flops: technology trends work against constant or increasing memory per core. Although the memory per flop may be acceptable to applications, memory per processor will fall dramatically, rendering some of the current scaling approaches useless.
  • Clock frequencies are expected to decrease to conserve power; as a result, the number of processing units on a single chip will have to increase. The exascale architecture will therefore likely be highly concurrent, with billion-way concurrency expected.
  • The cost of data movement, both in energy consumed and in performance, is not expected to improve as much as that of floating point operations, so algorithms need to minimize data movement, not flops.
  • A new programming model will be necessary: even heroic compilers will not be able to hide this level of concurrency from applications.
  • The I/O system will be much harder to manage at all levels (chip to memory, memory to I/O node, I/O node to disk), as I/O bandwidth is unlikely to keep pace with machine speed.
  • Reliability and resiliency will be critical at the scale of billion-way concurrency: “silent errors,” caused by the failure of components and manufacturing variability, will affect the results of computations more drastically on exascale computers than on today’s petascale computers.
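
The power and concurrency figures in the list above follow from simple arithmetic, which the short Python sketch below works through. It is a back-of-envelope illustration only: the function names are made up for this example, the 1 GHz clock rate and the rate of one flop per processing unit per cycle are assumptions rather than figures from any proposed design, and the power budget is treated as if it were spent entirely on computation.

```python
# Back-of-envelope arithmetic for the power and concurrency bullets.
# All helper names here are hypothetical, introduced for illustration.

EXAFLOP = 1e18  # floating point operations per second


def picojoules_per_flop(power_megawatts: float,
                        flops_per_second: float = EXAFLOP) -> float:
    """Average energy available per flop, in picojoules, for a power budget."""
    watts = power_megawatts * 1e6            # 1 W = 1 J/s
    joules_per_flop = watts / flops_per_second
    return joules_per_flop * 1e12            # 1 J = 1e12 pJ


def required_concurrency(flops_per_second: float,
                         clock_hz: float,
                         flops_per_unit_per_cycle: float = 1.0) -> float:
    """Number of units that must work in parallel to sustain the flop rate."""
    return flops_per_second / (clock_hz * flops_per_unit_per_cycle)


if __name__ == "__main__":
    # 200 MW is the naive scale-up; 20-40 MW is the stated target range.
    for megawatts in (200, 40, 20):
        budget = picojoules_per_flop(megawatts)
        print(f"{megawatts:>3} MW -> {budget:.0f} pJ per flop")

    # Assumed 1 GHz clock and one flop per unit per cycle.
    ways = required_concurrency(EXAFLOP, clock_hz=1e9)
    print(f"parallel units needed at 1 GHz: {ways:.0e}")
```

Under these assumptions, a 20 MW machine has about 20 pJ to spend on each flop, including everything needed to fetch its operands, and a 1 GHz clock implies on the order of a billion operations in flight every cycle. Lowering the clock frequency to save power pushes the required concurrency higher still.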

These challenges represent a change in the computing cost model, from expensive flops coupled with almost free data movement, to free flops coupled with expensive data movement.
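
The shift in the cost model can be made concrete with a toy comparison, sketched below in Python. Everything in it is hypothetical: the class names, the two algorithms, and the per-flop and per-byte energy values are placeholders chosen only to show how the preferred algorithm flips when the relative costs flip. They are not measurements of any real machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Toy linear energy model; both coefficients are illustrative only."""
    energy_per_flop_pj: float
    energy_per_byte_pj: float

    def energy_pj(self, flops: float, bytes_moved: float) -> float:
        return (flops * self.energy_per_flop_pj
                + bytes_moved * self.energy_per_byte_pj)


@dataclass(frozen=True)
class Algorithm:
    name: str
    flops: float        # floating point operations performed
    bytes_moved: float  # bytes transferred to and from memory


def cheaper(model: CostModel, a: Algorithm, b: Algorithm) -> Algorithm:
    """Return whichever algorithm consumes less energy under the model."""
    return min((a, b), key=lambda alg: model.energy_pj(alg.flops, alg.bytes_moved))


# Hypothetical cost models: expensive flops with cheap data movement,
# then the reverse.
OLD_MODEL = CostModel(energy_per_flop_pj=100.0, energy_per_byte_pj=1.0)
NEW_MODEL = CostModel(energy_per_flop_pj=1.0, energy_per_byte_pj=100.0)

# Two hypothetical algorithms solving the same problem.
FEWER_FLOPS = Algorithm("fewer flops, more traffic", flops=1e9, bytes_moved=4e9)
LESS_TRAFFIC = Algorithm("more flops, less traffic", flops=4e9, bytes_moved=1e9)

if __name__ == "__main__":
    for label, model in (("old", OLD_MODEL), ("new", NEW_MODEL)):
        best = cheaper(model, FEWER_FLOPS, LESS_TRAFFIC)
        print(f"{label} cost model prefers: {best.name}")
```

With the placeholder values above, the old model favors the algorithm that does the least arithmetic, while the new model favors the one that moves the least data even though it performs four times as many flops.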