
Complex supercomputer upgrade completed

Mike Hawkins

 

You may not be able to tell from looking at the outside or from the size of the electricity bill, but the ECMWF High Performance Computing Facility (HPCF) is now much bigger than it was at the beginning of 2016. After an upgrade that involved replacing almost 1,800 processor blades, the system now has 260,000 processor cores to run work on, equivalent to more than 200 desktop computers for everyone at ECMWF. It also has more than 900 terabytes of memory, enough to store almost 750 years of MP3 music.
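As a rough sanity check on those comparisons, here is a back-of-the-envelope calculation in Python. The staff headcount, the core count of a typical desktop PC and the MP3 bit rate are illustrative assumptions rather than figures from the article.

# Back-of-the-envelope check of the comparisons above (assumed figures are marked).
total_cores = 260_000              # processor cores after the upgrade (from the article)
staff = 300                        # assumed ECMWF headcount
cores_per_desktop = 4              # assumed quad-core desktop PC
desktops_per_person = total_cores / (staff * cores_per_desktop)
print(f"~{desktops_per_person:.0f} desktops' worth of cores per person")   # ~217

memory_bytes = 900e12              # more than 900 terabytes of memory (from the article)
mp3_bit_rate = 320e3               # assumed MP3 bit rate in bits per second
years_of_music = memory_bytes * 8 / mp3_bit_rate / (3600 * 24 * 365.25)
print(f"~{years_of_music:.0f} years of MP3 music")                         # ~713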

Blade swap. Cray engineers moving a trolley load of compute blades. The trolley holds half the number of blades that are in a compute rack.

Cray provide ECMWF with an HPCF under a multi-year, multi-phase service contract. In December 2015, we agreed to extend this contract by two years, until the end of September 2020, bringing it into better alignment with the timescales of the data centre relocation programme. The extension has also provided the Centre with a much more powerful phase two system from 2016 and a new ‘Novel Architecture Platform’ to support the work of the Scalability Programme.

The new contract enabled a full upgrade of both existing Cray clusters, replacing all of the existing compute nodes that used Intel ‘Ivy Bridge’ processors with the latest Intel ‘Broadwell’ processors, along with extra storage and two new cabinets of nodes. Two generations of progress in processor technology have allowed Intel to pack more than seven billion transistors onto one chip. This has increased the core count from 12 to 18 while reducing the power consumption from 130 W to 120 W per chip. The new processors have a new microarchitecture and new instructions, including Advanced Vector Extensions 2 (AVX2), to improve performance on some codes. Since the upgrade was a full swap, we avoided the problems of running a heterogeneous system: different instruction sets would have caused some programs to fail to execute on the older processors or to give different results, and different processor speeds and memory sizes would have made scheduling the workload difficult and inefficient. With a homogeneous system, we can fully optimize the workload to take advantage of the new architecture.
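For a sense of where the headline core count comes from, the sketch below rebuilds it from the figures in this article. The blade count and the 18 cores per chip are taken from the text; the four dual-socket nodes per blade are an assumption about the Cray XC layout, so treat the result as indicative only.

# Indicative reconstruction of the post-upgrade core count (node layout is assumed).
clusters = 2                       # two Cray clusters (from the article)
blades_per_cluster = 904           # compute blades per cluster (from the article)
nodes_per_blade = 4                # assumed Cray XC blade layout
sockets_per_node = 2               # assumed dual-socket compute nodes
cores_per_chip = 18                # 'Broadwell' core count (from the article)

total_cores = (clusters * blades_per_cluster * nodes_per_blade
               * sockets_per_node * cores_per_chip)
print(total_cores)                 # 260,352 - consistent with the ~260,000 cores quoted above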

Minimal disruption

The team planning and executing the upgrade faced a key challenge: unlike in previous upgrades, the new system could not simply be installed alongside an existing system that carried on providing the service as usual. Instead, the team had to take apart the system providing the service and replace it with a new one. Minimising the impact, and especially the time the system was out of service, was vital. This was all the more important because an upgrade of the Integrated Forecasting System to implement a horizontal resolution increase had to take place over the same period. A three-step approach was decided on:

  • Step 1: upgrade part of one cluster to allow large-scale user code testing and to practise the upgrade procedure
  • Step 2: upgrade the remaining part of the cluster
  • Step 3: upgrade all of the second cluster.

This staged approach allowed ECMWF to get access to the new technology for large-scale code testing while retaining operational resilience. It also allowed Cray to practise their install procedure to meet the logistical challenge.

The first step was reported on in the spring 2016 Newsletter. It made 700 nodes available for large-scale testing from the beginning of April until mid-May. The bigger challenge was to swap out the remaining blades. A Cray system is made up of compute nodes. Nodes live on blades and blades live in cabinets. There were 19 cabinets in ECMWF’s original configuration, each with 48 processor blades, giving 912 blades per cluster. Some of these nodes have special functions, such as connecting to the storage or networks, so their blades did not need to be replaced. This still left 904 blades on each machine to be swapped. Since Cray had already replaced 144 blades, only 760 remained to be swapped on the first cluster. In May, Cray unpacked the new blades, installed them in the machine and packed up the old ones for shipment, all in under two days. They then repeated the process a month later, this time with all 904 blades of the second cluster. That was an impressive feat, especially considering that, at more than 30,000 kilogrammes, the weight of the blades to be moved for one system is roughly equivalent to five African elephants. After each upgrade, Cray ran an extensive suite of tests to ‘burn in’ the new hardware before handing the system back. The downtime for each system was kept to just a week.
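The blade bookkeeping in that paragraph can be checked directly. In the sketch below everything comes straight from the figures above; only the per-blade weight is derived rather than quoted.

# Blade arithmetic from the paragraph above.
cabinets = 19
blades_per_cabinet = 48
blades_per_cluster = cabinets * blades_per_cabinet            # 912 blades per cluster
compute_blades = 904                                          # blades to swap (the rest host service nodes)
service_blades = blades_per_cluster - compute_blades          # 8 blades connect to storage and networks
remaining_step_2 = compute_blades - 144                       # 760 blades left after the step-1 swap

total_weight_kg = 30_000                                      # 'more than 30,000 kg' per system (from the article)
kg_per_blade = total_weight_kg / compute_blades               # ~33 kg per blade
kg_per_elephant = total_weight_kg / 5                         # ~6 tonnes, a large African elephant
print(service_blades, remaining_step_2, round(kg_per_blade), round(kg_per_elephant))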

Scalability

Exploiting the potential of new HPC architectures is one of the challenges for the Scalability Programme. ECMWF already has a GPGPU cluster with 68 NVIDIA Tesla K80 GPUs. To complement this resource, on 14 September Cray delivered a standalone machine with 32 nodes of Intel Xeon Phi ‘Knights Landing’ processors. Each node has one processor with 64 cores, 16 gibibytes of high-bandwidth memory and 96 gibibytes of DDR4 memory, which is enough to run the forecast model at its operational configuration.
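In aggregate, the headline figures for the new ‘Knights Landing’ machine follow directly from the per-node numbers quoted above; the short sketch below simply totals them.

# Aggregate resources of the 32-node Xeon Phi system (per-node figures from the article).
nodes = 32
cores_per_node = 64
hbm_gib_per_node = 16              # high-bandwidth (MCDRAM) memory per node
ddr4_gib_per_node = 96             # DDR4 memory per node

print(nodes * cores_per_node)                              # 2,048 cores in total
print(nodes * (hbm_gib_per_node + ddr4_gib_per_node))      # 3,584 GiB of memory in total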

Primary data growth. The evolution of the average daily increase in primary data stored in the Data Handling System clearly shows the impact of the upgraded HPCF.

Extra computational power enables more work to be done, which in turn creates more data to be managed, as the data storage chart shows. To support the growing volume of data, which passed 200 petabytes at the end of 2016, the Data Handling System was enhanced with extra servers and disk cache as part of the upgrade.

The configuration of the Cray systems will remain stable, apart from software upgrades, until they are replaced by new systems in 2020. The process of finding those successor systems has already started: a project is under way to bring together the science requirements of the new Strategy, assessments of future technology and a study of the socio-economic benefits of improved forecasts and services, so that a business case for the next procurement can be presented to Council.