Sunday 7 June 2020

A Systematic Literature Review on High Performance Computing

“I need to do multiple sampling events of multiple simulations. No way can my existing system pull that off in a timely fashion, if at all.”
- David Warner; Research Fisheries Biologist

“We had an 80-node cluster here in Golden several years ago, but when I left for Memphis, nobody wanted to manage it. Now we have an overworked 12-node server.”
- Oliver Boyd; Research Geophysicist

High Performance Computing (HPC) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.

In today’s world, larger and larger amounts of data are constantly being generated. The nature of computing is changing, with an increasing number of data-intensive critical applications: by 2020, 25 billion devices will be connected and will generate over two zettabytes of traffic every year. As a result:

  • Industry and SMEs are increasingly relying on the power of supercomputers to work on innovative solutions, reduce costs and decrease time to market for products and services.

  • Modern scientific discovery requires very high computing power and capability: for example, to accelerate genome sequencing by two orders of magnitude and to help scientists crack diseases such as cancer.

In this work we will compile and provide an overview of the latest developments in the various branches of HPC such as supercomputers, CPU/GPU, distributed computing and more.


Supercomputers

Supercomputers have the highest level of performance and are the largest computers available at any point in time. They were first introduced in the 1960s by Seymour Cray, but the term was first used in the New York World (1929) to describe “new statistical machines with the mental power of 100 skilled mathematicians in solving even highly complex algebraic problems”.

The earliest models, such as the IBM 7030, UNIVAC and SSEC, demonstrated potential to be commercially distributed. Since November 2017, all of the world’s fastest supercomputers have run Linux-based operating systems. The current fastest supercomputer can perform 200,000 trillion calculations per second, i.e. roughly 200 petaflops.

The performance of a supercomputer is generally measured in floating-point operations per second (FLOPS); floating point is an arithmetic representation used for very small and very large real numbers. A variety of architectural developments were employed to increase speed and performance beyond what advances in circuit technology alone could offer.
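As a rough illustration of what these figures mean, the short Python sketch below (all machine parameters are hypothetical) derives a theoretical peak FLOPS rating and converts the “200,000 trillion calculations per second” quoted above into petaflops.

# Back-of-the-envelope FLOPS arithmetic; every figure here is hypothetical.
nodes = 4_000            # compute nodes in the machine
cores_per_node = 48      # CPU cores per node
clock_hz = 2.5e9         # clock cycles per second per core
flops_per_cycle = 16     # e.g. wide vector units doing fused multiply-adds

peak = nodes * cores_per_node * clock_hz * flops_per_cycle
print(f"theoretical peak: {peak:.2e} FLOPS = {peak / 1e15:.1f} petaflops")
# -> about 7.7 petaflops for this hypothetical machine

# The "200,000 trillion calculations per second" figure quoted above:
print(200_000 * 1e12 / 1e15, "petaflops")   # -> 200.0 petaflops

Real machines advertise such theoretical peaks alongside measured LINPACK results, which are always somewhat lower.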

A single supercomputer uses shared memory and tightly coupled processors. We can also refer to a cluster or grid of loosely coupled machines working together to solve a common problem as a “virtual supercomputer”.


Architectural Evolution

Supercomputer (year) – improvements:

CDC 6600 (1961)
  • Increase of 2 orders of magnitude over existing machines, providing a factor of 10
  • Data was prefetched
  • Registers were reserved

IBM 360 (1964)
  • Multiple execution units
  • Instruction scheduling mechanism
  • 3 sets of registers: floating-point, fixed-point and index

Unnamed (1964)
  • Instruction overlap and look-ahead were extended to a separate autonomous unit
  • Floating-point arithmetic done by an autonomous unit
  • Instruction processing and storage access time were decoupled

ILLIAC IV (1966)
  • First massive parallelism
  • 64 separately functioning processing elements

CDC STAR-100 (1974)
  • Introduced vector processing
  • Runs a single stream of data at a rate of 10^8 instructions per second

Cray 1 (1975)
  • Successfully implemented the vector processing design

CDC CYBER 205 (1980)
  • Less latency
  • Separate arithmetic units for vector and scalar instructions
  • Concurrent vector and scalar instructions



Distributed Computing

Distributed computation or processing involves interconnecting multiple independent entities (usually computers or processors), which communicate and work together to solve a common problem.

This includes parallel processing in which a single computer uses more than one CPU to execute programs.

Any process, if divisible, can be performed in parallel. Although the terms parallel and distributed systems overlap, there is a single characteristic that distinguishes them:

  • In parallel computing, the multiple processors have direct access to a shared memory, which forms a common address space, to exchange information between processors. The processors are in close physical proximity and tightly coupled, and usually run the same operating system (homogeneous software and hardware), such as the processors inside a supercomputer.

  • In distributed computing, each processor has its own private memory. Information is exchanged by passing messages between the processors, such as processors in distinct independent devices; a minimal sketch contrasting the two models follows this list.
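To make the contrast concrete, here is a minimal sketch (a hypothetical summation, using only Python’s standard multiprocessing module) of the same computation done once through shared memory and once through message passing; in a real distributed system the message-passing variant would cross machine boundaries.

from multiprocessing import Process, Queue, Value

# Parallel model: workers update a single shared counter (common address space).
def add_shared(total, chunk):
    with total.get_lock():          # synchronize access to the shared memory
        total.value += sum(chunk)

# Distributed model: workers own their data and only exchange messages.
def add_message(queue, chunk):
    queue.put(sum(chunk))           # no shared state; results travel as messages

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]

    total = Value("q", 0)           # shared 64-bit integer
    workers = [Process(target=add_shared, args=(total, c)) for c in chunks]
    for w in workers: w.start()
    for w in workers: w.join()
    print("shared-memory result:", total.value)

    queue = Queue()
    workers = [Process(target=add_message, args=(queue, c)) for c in chunks]
    for w in workers: w.start()
    partials = [queue.get() for _ in chunks]
    for w in workers: w.join()
    print("message-passing result:", sum(partials))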

For computing systems, a distributed system is a collection of semi-autonomous computers that appear to the user as a single computer. Distributed systems are also characterized by:

  • No common physical clock

  • No shared memory – requires information to be exchanged by passing messages between the processors; this can be implemented with message-passing protocols such as remote procedure call (RPC) or its object-oriented analog, remote method invocation (RMI). A small RPC sketch follows this list.

  • Autonomy and heterogeneity – the processors may have different speeds and can be running on different operating systems

  • Geographical separation and resource sharing – by sharing and accessing geographically remote data and resources, the performance/cost ratio is increased, because a task can be partitioned across the various entities in the distributed system.
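As an illustration of the message-passing point above, the sketch below uses Python’s standard xmlrpc modules to implement a tiny remote procedure call; the function name add_measurement and the port number are purely illustrative.

Server side:

from xmlrpc.server import SimpleXMLRPCServer

def add_measurement(a, b):
    # Procedure executed on the server's node; the client never sees this code.
    return a + b

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(add_measurement)
server.serve_forever()

Client side (a separate process, possibly on another machine):

from xmlrpc.client import ServerProxy

proxy = ServerProxy("http://localhost:8000")
print(proxy.add_measurement(2, 3))   # the call travels as a message; only the result returns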

Pros

  • efficiency of processing and reliability

  • better performance overall while serving many users

  • can share data and resources

  • allows for reuse of available services

  • scalability – as the processors are connected by a network, adding more processors doesn’t affect the communication or performance of the system

  • modularity and incremental expandability – different processors may easily be added to the system, or be replaced, without affecting the performance

Cons

  • developing, managing and maintaining the system

  • controlling shared access to data and resources

  • failure of a single unit can compromise the rest of the system

  • heterogeneity

  • security and confidentiality of shared data


  • Distributed Systems Challenges


Granularity - In computing, the relative measure of the amount of processing to the amount of communication within a distributed program is termed granularity.

If the degree of parallelism is fine-grained, relatively few CPU instructions are executed between successive communications (either via shared memory or message passing) and synchronizations with the other processors. This type of parallelism is best suited to memory-sharing, tightly coupled systems, as the latency of communicating with physically remote multiprocessors would degrade overall throughput.

If the degree of parallelism is coarse-grained, the program is split into larger tasks, so each processor carries more computational load and executes most of its portion of the program sequentially. This results in low communication and synchronization overhead, which makes this type best suited to message-passing, loosely coupled systems.
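A minimal sketch of the two granularities (the task sizes are hypothetical, and a local process pool stands in for a parallel machine): the same workload is split first into many tiny tasks and then into a few large ones.

from multiprocessing import Pool

def work(chunk):
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(100_000))

    with Pool(4) as pool:
        # Fine-grained: 1,000 tiny tasks -> more scheduling and communication overhead.
        fine = sum(pool.map(work, [data[i:i + 100] for i in range(0, len(data), 100)]))

        # Coarse-grained: 4 large tasks -> little communication, mostly computation.
        coarse = sum(pool.map(work, [data[i:i + 25_000] for i in range(0, len(data), 25_000)]))

    assert fine == coarse   # same answer, very different communication patterns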

Communication - Involves implementing appropriate mechanisms for communication, such as message-oriented communication (e.g. MPI) versus stream-oriented communication, remote procedure call, and remote method invocation. In tightly coupled, memory-sharing systems, OpenMP is commonly used.
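For the message-oriented case, the sketch below uses mpi4py, a Python binding for MPI (the binding is an assumption here, since the text only names MPI itself); it would be launched with something like mpiexec -n 2 python example.py.

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # The root process sends a task description to its neighbor...
    comm.send({"task": "integrate", "bounds": (0.0, 1.0)}, dest=1, tag=0)
    result = comm.recv(source=1, tag=1)
    print("root received:", result)
elif rank == 1:
    # ...which does the work and sends the result back as another message.
    task = comm.recv(source=0, tag=0)
    comm.send({"done": task["task"]}, dest=0, tag=1)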

Scalability and Modularity – the data and services must be as distributed as possible. Techniques such as replication (having backup servers), cache management and asynchronous processing help to achieve scalability.

Naming, consistency and replication – easy-to-use and robust schemes for names, identifiers and addresses are needed to locate resources in a scalable manner. To provide fast access to data and scalability, maintaining consistency among caches and replicas (backups and checkpoints) is essential.

Data storage and access – schemes for accessing the data in an efficient and scalable manner.

Reliability and fault tolerance – being able to maintain efficient operation despite crashes of any processes or nodes.

Security – access control, secure channels and other cryptographic methods are required to keep any system operational.

Transparency and APIs – for ease of use by non-technical users, an application programming interface (API) and other specialized services are important. An example of an API platform for distributed and parallel computing is NVIDIA’s CUDA.
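As a taste of what a CUDA-style API looks like from a high-level language, the sketch below uses Numba’s CUDA bindings (Numba and the kernel name scale are assumptions, not tools named by the original authors); running it requires a CUDA-capable GPU.

import numpy as np
from numba import cuda

@cuda.jit
def scale(out, x, alpha):
    i = cuda.grid(1)           # global thread index
    if i < x.size:
        out[i] = alpha * x[i]  # each GPU thread handles one array element

x = np.arange(1_000_000, dtype=np.float32)
out = np.zeros_like(x)
threads_per_block = 256
blocks = (x.size + threads_per_block - 1) // threads_per_block
scale[blocks, threads_per_block](out, x, 2.0)   # kernel launch: grid and block sizes
print(out[:4])                                  # -> [0. 2. 4. 6.]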





  • Distributed Systems Architecture Models




Client/Server – a machine offers a service, and multiple clients communicate with the server to use the service.
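A minimal client/server sketch using Python’s standard socket module (the port and messages are illustrative): the server runs on the machine offering the service, and each client runs in its own process or on its own machine.

import socket

HOST, PORT = "localhost", 9000

def run_server():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        conn, _ = srv.accept()          # wait for one client
        with conn:
            request = conn.recv(1024)   # read the client's request
            conn.sendall(b"echo: " + request)

def run_client():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(b"hello server")
        print(cli.recv(1024).decode())

In practice the server would loop over accept() and serve many clients concurrently; the sketch handles a single request to keep the roles clear.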

Peer-to-Peer (P2P) – P2P arose as a paradigm shift from client-server computing. All involved entities exist at the same “peer” level and are equally privileged, without any hierarchy among the processors; this means they all play a symmetric role in the computation.

It is composed of a flexible network of autonomous, self-organizing client machines connected by an overlay network.

This type of organization is especially appealing for business file sharing, content delivery and social networking.



  • Distributed Systems Applications

Many kinds of devices can take part in distributed computing, such as computers, smartphones, tablets and sensors. Some of these technologies have very distinct capabilities, which are not always compatible with typical distributed system architectures. We explore the challenges of some of these applications below.

Mobile systems

These systems typically use wireless communication, based on a shared broadcast medium. Hence, the characteristics of communication are different; other problem variables include transmission range and power, battery conservation, signal processing and interference.

Sensor networks

A sensor is a processor with an interface capable of sensing physical parameters such as temperature, pressure, velocity, noise levels, traffic patterns, etc. The streaming data reported by a sensor network differs from the data reported by computer processes in that the data is external to the computer, which limits the nature of the information available.

Ubiquitous computing

Ubiquitous systems represent a class of computing where the processors are embedded in and pervade the environment, performing functions in the background. Smart workplaces and intelligent homes are examples of these distributed systems; they are characterized by small processors operating collectively in a dynamic network environment.


Organization of Computational Resources


  • Clusters

“Grid computing focuses on providing high performance and high scalability. Cloud computing alternatively focuses on delivering high scalability and low cost. Cloud services therefore aim to provide a far lower performance to price ratio and cannot surpass the performance of individual grid components. This is problematic for message passing workloads which rely heavily on high performance computers and fast interconnects.”

A cluster is composed of many separate computers or servers, which are called nodes.

The nodes in the cluster should be interconnected, preferably by a well-known network technology, for maintenance and cost-control purposes. Replacing a computer that has problems can sometimes stop the entire system, so it is very important to implement a solution that allows nodes to be added or removed without compromising the entire system.

Usually clusters are owned and operated by a single authority, which allows full access to the hardware and enables the users to modify the cluster and the application specifically to achieve optimum performance. This can also be a drawback, as unauthorized access (hacking) only needs to target the sole owner.

This type of solution is suited for tightly coupled systems that use the same hardware (homogeneous).

Clusters require substantial investment and technical maintenance, making this solution less accessible to smaller organizations.


  • Grid Computing

Grid computing applies the resources of many computers in a network to a single problem at the same time – usually to solve problems that require a great number of processing cycles. This type of solution is suited to large-scale, decentralized systems (loosely coupled) and offers dynamism and diversity (heterogeneous).

Grid computing acts in contrast to the notion of a supercomputer, which has multiple processors connected by a local high-speed computer bus.



  • Cloud Computing

Since 2012, the HPC community has made considerable progress towards the “democratization of HPC”: making high performance computing available to an ever larger group of engineers and scientists. One influencing factor is cloud computing, which enables those who cannot afford or do not want to buy their own HPC server to access computing resources in the cloud, on demand, and pay only for what they use.

Forrester’s 2020 predictions for cloud computing indicate that “the public cloud market – cloud apps (software-as-a-service [SaaS]); cloud development and data platforms (platform-as-a-service [PaaS]); and cloud infrastructure (infrastructure-as-a-service [IaaS]) – will reach $411 billion by 2022. In 2020, the combined platform and infrastructure markets will grow another 30%, from 2019 to $132.8 billion.” This demonstrates that cloud computing is attracting investor attention as an interesting market with various growth possibilities.

Microsoft is a lead player in HPC cloud computing, alongside Amazon and Google, according to Gartner’s 2019 Cloud IaaS Magic Quadrant. Microsoft’s Azure is an example of a pay-as-you-go service, which translates into more tailored and target-specific usage of the infrastructure provided to the client.



  • CPU/GPU

“Moore’s Law has caused the pendulum of visual computing to swing away from dedicated hardware and specialized software to general-purpose hardware and software packages”

- Jim Jeffers; Principal Engineer, Engineering Manager, SW Defined Visualization, Intel

The difference between a CPU and a GPU is that the central processing unit is composed of a small number of cores with a complex architecture operating at a high clock frequency, enabling better handling of separate threads of sequential instructions, while the graphics processing unit is composed of a large number of basic cores, enabling the processor to quickly solve highly parallelized problems. As a result, the GPU, unlike the CPU, provides significantly better performance when dealing with highly parallelized problems. Behavioral patterns of these two processor types are also different: for example, while the CPU requires frequent access to memory, the GPU accesses it less frequently but transfers considerably larger amounts of data. A hybrid CPU/GPU architecture allows the advantages of both processor types to be combined, using the CPU for sequential computing and the GPU for parallel computing. In 2014, researchers from INESC-IST-UL benchmarked several GPUs and CPUs (Vestias & Neto, 2014).
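As a hedged sketch of the hybrid idea (NumPy and CuPy are assumptions here, not tools used by the cited benchmark, and the GPU path needs a CUDA-capable GPU): the same matrix multiplication is timed on the CPU and, if CuPy is installed, on the GPU.

import time
import numpy as np

n = 2048
a = np.random.random((n, n)).astype(np.float32)
b = np.random.random((n, n)).astype(np.float32)

t0 = time.perf_counter()
c_cpu = a @ b                                  # CPU: a few complex, high-clock cores
cpu_s = time.perf_counter() - t0

try:
    import cupy as cp                          # GPU: thousands of simple cores
    ga, gb = cp.asarray(a), cp.asarray(b)      # explicit host-to-device transfer
    t0 = time.perf_counter()
    c_gpu = ga @ gb
    cp.cuda.Stream.null.synchronize()          # wait for the asynchronous GPU kernel
    gpu_s = time.perf_counter() - t0
    print(f"CPU {cpu_s:.3f}s vs GPU {gpu_s:.3f}s (first GPU call also pays start-up cost)")
except ImportError:
    print(f"CPU {cpu_s:.3f}s (CuPy not installed, GPU path skipped)")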

Article originally written by Rodrigues C. M. and Cisneiros M.



References


What is High Performance Computing? (n.d.). Retrieved May 30, 2020, from https://www.usgs.gov/core-science-systems/sas/arc/about/what-high-performance-computing


High-Performance Computing (2019). Retrieved May 31, 2020, from https://ec.europa.eu/digital-single-market/en/high-performance-computing


Sukharev, P. V., Vasilyev, N. P., Rovnyagin, M. M., & Durnov, M. A. (2017). Benchmarking of high-performance computing clusters with heterogeneous CPU/GPU architecture. 2017 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus). https://doi.org/10.1109/eiconrus.2017.7910619


Gartner (2019). Magic Quadrant for Cloud Infrastructure as a Service, Worldwide. Retrieved May 31, 2020, from https://www.gartner.com/en/documents/3947472/magic-quadrant-for-cloud-infrastructure-as-a-service-wor


Forrester. (2019). Predictions 2020: Cloud Computing. Retrieved May 31, 2020, from https://www.forrester.com/report/Predictions 2020 Cloud Computing/-/E-RES157593#


Andriole, S. (2019, November 20). Forrester Research Gets Cloud Computing Trends Right. Retrieved May 31, 2020, from https://www.forbes.com/sites/steveandriole/2019/11/20/forrester-research-gets-cloud--computing-trends-right/#9b5d6ee68a26


Vestias, M., & Neto, H. (2014, September). Trends of CPU, GPU and FPGA for high-performance computing. 2014 24th International Conference on Field Programmable Logic and Applications (FPL). https://doi.org/10.1109/fpl.2014.6927483


Supercomputer architecture. (2020). Retrieved May 31, 2020, from https://en.wikipedia.org/wiki/Supercomputer_architecture


Kshemkalyani, A. D., & Singhal, M. (2007). Distributed Computing: Principles, Algorithms and Systems.
