

MASTER OF COMPUTER APPLICATION
SEMESTER-II
PARALLEL AND DISTRIBUTED COMPUTING
21MCA635

CHANDIGARH UNIVERSITY
Institute of Distance and Online Learning

SLM Development Committee
Prof. (Dr.) H.B. Raghvendra, Vice-Chancellor, Chandigarh University, Gharuan, Punjab: Chairperson
Prof. (Dr.) S.S. Sehgal, Registrar
Prof. (Dr.) B. Priestly Shan, Dean of Academic Affairs
Dr. Nitya Prakash, Director – IDOL
Dr. Gurpreet Singh, Associate Director – IDOL

Advisors & Members of CIQA – IDOL
Prof. (Dr.) Bharat Bhushan, Director – IGNOU
Prof. (Dr.) Majulika Srivastava, Director – CIQA, IGNOU

Editorial Committee
Prof. (Dr.) Nilesh Arora, University School of Business
Dr. Ashita Chadha, University Institute of Liberal Arts
Dr. Inderpreet Kaur, University Institute of Teacher Training & Research
Prof. Manish, University Institute of Tourism & Hotel Management
Dr. Manisha Malhotra, University Institute of Computing
Dr. Nitin Pathak, University School of Business

© No part of this publication should be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording and/or otherwise without the prior written permission of the authors and the publisher.

SLM SPECIALLY PREPARED FOR CU IDOL STUDENTS

First Published in 2021

All rights reserved. No part of this book may be reproduced or transmitted, in any form or by any means, without permission in writing from Chandigarh University. Any person who does any unauthorized act in relation to this book may be liable to criminal prosecution and civil claims for damages. This book is meant for educational and learning purposes. The authors of the book have taken all reasonable care to ensure that the contents of the book do not violate any existing copyright or other intellectual property rights of any person in any manner whatsoever. In the event the authors have been unable to track any source and if any copyright has been inadvertently infringed, please notify the publisher in writing for corrective action.

CONTENT

Unit 1 – Introduction to Networks
Unit 2 – Introduction 2
Unit 3 – Pipeline Processing
Unit 4 – Synchronous Parallel Processing 1
Unit 5 – Synchronous Parallel Processing
Unit 6 – Distributed Systems 1
Unit 7 – Distributed Systems 2
Unit 8 – Communication
Unit 9 – Resource and Process Management 1
Unit 10 – Resource and Process Management 2
Unit 11 – Synchronization
Unit 12 – Mutual Exclusion
Unit 13 – Consistency and Replication
Unit 14 – Distributed File Systems 1
Unit 15 – Distributed File Systems 2

UNIT 1 – INTRODUCTION TO NETWORKS

STRUCTURE
1.0 Learning Objectives
1.1 Introduction to Parallel Computing
1.1.1 Parallel Computing Types
1.1.2 Applications of Parallel Computing
1.1.3 Advantages
1.1.4 Drawbacks of Parallel Computing
1.2 Parallel Computing Architecture
1.2.1 Computing with Many Cores
1.2.2 Multiprocessing with Symmetry
1.2.3 Computing on a Large Scale
1.2.4 Trends in Architecture
1.2.5 Parallel Architectures Convergence
1.2.6 Design Issues at the Core
1.2.7 Models for Parallel Computer Architecture
1.3 Architectural Classification Scheme
1.3.1 Flynn's Taxonomy of Classification
1.3.2 Classification by Feng
1.3.3 Handler's Classification
1.4 Performance of Parallel Computing
1.5 Summary
1.6 Keywords
1.7 Learning Activity
1.8 Unit End Questions
1.9 References

1.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
 Explain the basics of parallel computing

 Outline the importance of parallel computing
 Describe the architecture of parallel computing
 List the advantages and disadvantages of parallel computing

1.1 INTRODUCTION TO PARALLEL COMPUTING

Parallel computing is the process of running an application or a calculation on several processors at the same time. It is a style of computer architecture in which big problems are broken down into smaller, roughly comparable pieces that can be handled all at once. Multiple CPUs cooperate through shared memory to complete the task, and the findings are combined at the end. Dividing a huge problem across many processors aids the execution of complex calculations.

By expanding the available compute capacity of systems, parallel computing enables quicker application processing and problem solving. Most supercomputers work on parallel computing principles, and parallel processing is frequently used in operational situations that demand a lot of processing capacity or calculation. The architecture is often housed in a server rack with several processors; the application server divides a computational request into small pieces, which are then handled concurrently on each server.

The first computer software was designed for serial computation, which can only execute one program at a time. Parallel computing, on the other hand, uses many processors to run an application or a computation at the same time. Parallel computing has many advantages, including the ability to save time and expense, offer concurrency, and tackle bigger problems; it also lowers the amount of information that has to be processed at any one point. A real-world analogy is two queues at a ticket counter: if two cashiers issue tickets to two people at the same time, both time and queueing complexity are reduced.

1.1.1 Parallel Computing Types

Three kinds of parallel computing are available from open-source and proprietary parallel computing providers, as described below:

1. Bit-level parallelism: A kind of parallel computing in which the amount of work per instruction is determined by the size of the processor word. It minimizes the number of instructions the processor must execute when operating on large amounts of data, since a wide operation no longer has to be divided into a sequence of narrower instructions. For example, suppose you have an 8-bit CPU and need to perform a 16-bit operation. It must first handle the 8 lower-order bits and then the 8 higher-order bits, so the operation requires two instructions; a 16-bit CPU can execute the same operation with a single instruction.

2. Instruction-level parallelism: Here the processor decides how many instructions to execute simultaneously in a single CPU clock cycle, so a processor may complete more than one instruction per clock cycle phase. In the software approach, instruction-level parallelism relies on static parallelism, in which the compiler decides which instructions can execute at the same time.

3. Task parallelism: A type of parallelism in which a job is broken down into smaller subtasks. Each subtask is then assigned to a processor, and the processors execute the subtasks concurrently. (A short code sketch appears at the end of Section 1.1.3 below.)

1.1.2 Applications of Parallel Computing

Parallel computing has a variety of applications, including the following:
 Databases and data mining, two of the most common parallel computing applications
 Real-time simulation of systems
 Networked video and multimedia technologies
 Science and engineering
 Collaborative work environments
 Augmented reality, advanced graphics, and virtual reality

1.1.3 Advantages of Parallel Computing

The following are some of the benefits of parallel computing:
 In parallel computing, additional resources are used to accomplish the job, reducing time and potentially cost; parallel clusters can also be built from low-cost components.
 Compared to serial computation, parallel computing can solve bigger problems in less time.
 Compared to serial computing, parallel computing is far better suited to simulating, modelling, and comprehending complex, real-world phenomena.
 Non-local compute resources can be exploited when local resources are scarce.
 Many large tasks are difficult or impossible to tackle on a single device; parallel computing aims to relieve these limitations.
 One of the most appealing features of parallel computing is that it lets you do several tasks at once by using multiple computing resources.
 In addition, parallel computing makes better use of hardware, since in serial computing processing power goes to waste.
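The advantage of doing several tasks at once is easy to demonstrate. Below is a minimal sketch of task parallelism using only Python's standard library; the subtask function and inputs are illustrative, not taken from this text. Each subtask is handed to a separate worker process, mirroring the decomposition described in Section 1.1.1.

```python
# Task parallelism: independent subtasks executed concurrently on several cores.
from concurrent.futures import ProcessPoolExecutor

def subtask(n: int) -> int:
    """An illustrative CPU-bound subtask: sum of squares up to n."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    inputs = [100_000, 200_000, 300_000, 400_000]   # one input per subtask
    with ProcessPoolExecutor() as pool:             # one worker per core by default
        results = list(pool.map(subtask, inputs))   # subtasks run in parallel
    print(results)
```

Note that the synchronization and process-creation overhead mentioned under the drawbacks below is real: for very small inputs, the cost of spawning workers can exceed the time saved.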

1.1.4 Drawbacks of Parallel Computing

Parallel computing has a number of drawbacks, including the following:
 It relies on parallel architecture, which may be challenging to implement.
 In parallel computing, clusters require improved cooling technologies.
 It requires managed algorithms that can be handled by a parallel mechanism.
 Multi-core designs consume a lot of electricity.
 A parallel computing system needs low coupling and high cohesion, which is difficult to achieve.
 Parallel code is hard to write correctly, even for the most technically competent and experienced programmers.
 While parallel computing lets you tackle computationally and data-intensive problems with many processors, it may occasionally interfere with the system's integration and some of its control methods, producing suboptimal results.
 The additional cost of synchronization, thread creation, data transfers, and other factors may be very high; in certain cases it may even outweigh the benefits of parallelization.
 Furthermore, parallel programs often require code tuning for a specific target architecture in order to perform well.

1.2 PARALLEL COMPUTING ARCHITECTURE

Parallel computer architecture is classified by the level at which the hardware supports parallelism. The various types of parallel computer architecture are as follows:

Figure 1.1 Parallel Architecture

1.2.1 Computing with Many Cores

A multi-core processor is an integrated circuit with two or more separate processing cores that may execute program instructions at the same time. Cores may be VLIW, superscalar, multithreaded, or vector architectures, and they can be implemented on a single chip die or on multiple dies in one package. Multi-core architectures are divided into two categories: heterogeneous, which includes dissimilar cores, and homogeneous, which includes only identical cores.

1.2.2 Multiprocessing with Symmetry

In symmetric multiprocessing, a single operating system manages a multiprocessor computer architecture with two or more homogeneous, independent processors, treating all processors equally. Each processor may work on any job without worrying about whether the data needed for that job is in memory, and processors can be linked via on-chip mesh networks. A dedicated cache memory is also included with every CPU.

1.2.3 Computing on a Large Scale

The elements of a distributed system are usually spread among many networked computers, which communicate through HTTP, RPC-like message queues, and connectors to coordinate their activities. Distributed systems are characterized by component concurrency and independent component failure. Peer-to-peer, client-server, n-tier, and three-tier architectures are the most common distributed programming architectures. Because there is a lot of overlap between parallel and distributed computing, the terms are often used interchangeably.

Several computers can also be used at the same time to run a single list of instructions in parallel. Grid computing is another method in which a number of dispersed computer systems work together on one problem, running at the same time and communicating over the Internet. Data locality and data communication are also central to parallel computing. Parallel computer architecture is the technique of organizing all of these resources to maximize speed and programmability within the limits set by technology and cost at any particular moment. By using an increasing number of processors, parallel computer architecture adds a new dimension to the development of computer systems: in principle, the performance obtained by using a large number of processors is superior to the performance of a single node at any given moment.

1.2.4 Trends in Architecture

Technology development determines what is possible; architecture transforms the technology's promise into performance and effectiveness. Parallelism and locality are two techniques for improving performance by using more resources and more transistors. These two approaches, however, compete for the same resources. When several operations are performed in parallel, the number of cycles required to run the program decreases. Resources, however, are required to support each of the concurrent operations, and resources are also required to allocate local storage. The best results are achieved by an intermediate plan of action that uses resources to exploit a degree of parallelism and a degree of locality.

The development of computer architecture may be split into four generations, each defined by a fundamental technology:
 Vacuum tubes
 Transistors
 Integrated circuits
 VLSI

The period until 1985 was dominated by growth in bit-level parallelism: 4-bit microprocessors were followed by 8-bit, 16-bit, and wider designs. The width of the data path was doubled to reduce the number of cycles required to execute a full 32-bit operation; 64-bit operations were added later. The mid-80s to mid-90s were dominated by the rise of instruction-level parallelism. The RISC approach showed that it was simple to pipeline the stages of instruction processing so that, on average, an instruction is executed in nearly every cycle, and advances in compiler technology increased the efficiency of instruction pipelines. Microprocessor-based computers in the mid-1980s consisted of:

 An integer processing unit
 A floating-point unit
 A cache controller
 SRAMs for the cache data
 Tag storage

As chip capacity grew, all of these components were integrated onto a single chip. Integer arithmetic, floating-point operations, memory operations, and branch operations were implemented as separate functional units on the same chip. In addition to pipelining individual instructions, such a processor fetches several instructions at a time and dispatches them in parallel to different functional units wherever possible. This kind of instruction-level parallelism is called superscalar execution.

1.2.5 Parallel Architectures Convergence

Several different architectures have been designed for parallel machines. In this section we discuss various parallel computer architectures and how they converge.

Architecture of Communication: Parallel architecture combines traditional computer architecture principles with communication architecture. Computer architecture defines key abstractions (such as the user-system boundary and the hardware-software boundary) and organizational structure, while communication architecture covers the fundamental communication and synchronization operations, as well as their organizational structure.

Figure 1.2 Physical Communication Medium

The top layer is the programming model. Programming models are used to create applications. Models for parallel programming include:
 Shared address space
 Message passing
 Data parallel processing

Shared address space programming is like using a bulletin board: you communicate with one or more people by posting information at a central place that is shared by everyone, and individual activity is coordinated by noting who is doing which task. Message passing is like receiving information from a particular sender through a phone call or a letter.

Figure 1.3 Shared Memory

Data parallel programming is a structured form of cooperation: several participants work on different parts of a data set at the same time and exchange information globally.

Shared memory: Shared memory multiprocessors are one of the most significant classes of parallel machines. They improve throughput on multiprogramming workloads and support parallel programs. In this case, the computer system allows a set of processors and a set of I/O controllers to access a collection of memory modules through a hardware interconnect. Memory capacity can be expanded by adding memory modules, and I/O capacity by adding devices to an I/O controller or by adding more I/O controllers.

Processing capability can be improved by waiting for a faster processor to become available or by adding more processors. All of the resources are organized around a central memory bus. Through the bus access mechanism, any processor can access any memory location in the system. Because all processors are equidistant from all memory locations, the access time or latency of every processor to a memory location is the same. This is called a symmetric multiprocessor.

Architecture for Message Passing: Parallel computers with a message passing architecture are also a popular choice. Here, communication between CPUs is established through explicit I/O operations: rather than going through the memory system, communication is integrated at the I/O level. In a message passing architecture, user communication is carried out through operating system or library calls that perform many lower-level tasks, including the actual communication. As a consequence, there is a gap between the programming model and the actual hardware communication operations.

The most common user communication operations in a message passing system are send and receive. Send specifies a local data buffer and a remote processor that is to receive it. Receive specifies a sending process and a local data buffer in which to store the transmitted data. In the send operation, an identifier or tag is attached to the message, and the receive operation specifies a matching rule, such as a particular tag from a specific processor, or any tag from any processor. The combination of a send and a matching receive completes a memory-to-memory copy. Each end supplies its local data address, and the pair forms a pair-wise synchronization event.

Convergence: Advances in hardware and software have blurred the distinction between the shared memory and message passing camps. Both message passing and shared address spaces are programming paradigms that provide a transparent framework for sharing, synchronization, and communication; the fundamental machine structures, meanwhile, have converged toward one common organization.

Parallel Data Processing: Processor arrays, data parallel architecture, and single-instruction-multiple-data machines are all terms for the same kind of parallel computer. The primary characteristic of the programming model is that operations can be carried out in parallel on each element of a large, uniform data structure. Data parallel programs are usually implemented by viewing the local address space of a group of processes, one per processor, as forming an explicit global address space. Because all of the processors communicate with one another and there is a global view of all operations, either a shared address space or message passing may be used.
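The send and receive operations described above can be sketched with Python's multiprocessing primitives. This illustrates the programming model only; a real message passing system such as MPI provides the equivalent operations (e.g., MPI_Send and MPI_Recv).

```python
# A minimal send/receive pair: two processes connected by a message channel.
from multiprocessing import Process, Pipe

def worker(conn):
    data = conn.recv()          # receive blocks until a message is available
    conn.send(sum(data))        # send the result back; send completes immediately
    conn.close()

if __name__ == "__main__":
    parent_end, child_end = Pipe()   # the channel between the two processes
    p = Process(target=worker, args=(child_end,))
    p.start()
    parent_end.send([1, 2, 3, 4])    # send: local buffer -> remote process
    print(parent_end.recv())         # receive: blocks until the reply arrives -> 10
    p.join()
```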

1.2.6 Design Issues at the Core

Developing a programming model alone will not improve the computer's efficiency, and neither will expanding the hardware alone. Advances in computer design, however, can make a significant difference in computer performance. We can better grasp the design problem by focusing on how programs use a machine and which fundamental technologies are available. In this section we discuss the communication abstraction and the programming model's fundamental requirements.

Abstraction in Communication: The communication abstraction is the primary interface between the programming model and the system implementation. It is similar to an instruction set in that it provides a platform on which the same program can run correctly across many implementations. Operations at this level must be simple.

Model Requirements for Programming: In a parallel program, one or more threads operate on data. A parallel programming model specifies what data the threads may name, what operations they may perform on the named data, and in what order the operations are performed. A parallel program must also coordinate the activity of its threads to ensure that the dependencies between the programs are enforced.

1.2.7 Models for Parallel Computer Architecture

Parallel processing has emerged as a useful technique in contemporary computers to satisfy the demand for higher speed, lower cost, and more accurate results in real-world applications. Concurrent events are common in today's computers because of multiprogramming, multiprocessing, and multicomputing, and modern computers carry powerful and comprehensive software packages. To evaluate the evolution of computer performance, we must first understand the basic evolution of hardware and software.

Computer Development Milestones: A computer's development has two major stages: mechanical and electromechanical parts came first, and modern computers emerged after the introduction of electronic components. In electronic computers, the operating parts of mechanical computers were replaced by high-mobility electrons, and electric signals, which travel at almost the speed of light, replaced mechanical gears and levers for data transmission.

Elements of Modern Computers: A contemporary computer system comprises hardware, instruction sets, application programs, system software, and user interface. Computing problems fall into three categories: numerical computation, logical reasoning, and transaction processing. Some complex problems may require all three processing modes.

Computer Architecture Evolution: Over the last four decades, computer architecture has seen dramatic developments, from the Von Neumann architecture to multicomputers and multiprocessors.

Computer System Performance: A computer system's performance is determined by both machine capability and program behavior. Machine capability can be improved with better hardware technology, sophisticated architectural features, and effective resource management. Program behavior, however, is unpredictable, because it depends on the application and on run-time conditions.

Multiprocessors and multicomputers are two terms that are often used interchangeably. In this section we look at these two kinds of parallel computers:
 Multiprocessors
 Multicomputers

Shared Memory Multicomputers: The three most popular shared memory multiprocessor models are:
 Uniform Memory Access (UMA): In this architecture, the physical memory is shared uniformly by all processors, and all processors have equal access time to all memory words. Each processor may have a private cache, and the same rule applies to the peripheral devices. A symmetric multiprocessor is one in which all of the processors have equal access to all of the peripheral devices; an asymmetric multiprocessor is one in which only one or a few processors can access the peripheral devices.
 Non-Uniform Memory Access (NUMA): In a NUMA multiprocessor, the access time varies with the location of the memory word. The shared memory is physically distributed among all the processors as local memories, and the collection of all local memories forms a global address space accessible by all processors.
 Cache-Only Memory Architecture (COMA): The COMA model is a special case of the NUMA model; here, all the distributed main memories are converted to cache memories.

Distributed Memory Multicomputers: A distributed memory multicomputer system consists of multiple computers, called nodes, interconnected by a message passing network. Each node acts as an autonomous computer with a processor, local memory, and, in some cases, I/O devices. All the local memories are private and accessible only to the local processors. This is why such machines are called NORMA (no-remote-memory-access) machines.

1.3 ARCHITECTURAL CLASSIFICATION SCHEME

1.3.1 Flynn's Taxonomy of Classification

Flynn's taxonomy classifies multi-processor computer architectures along two independent dimensions: instruction stream and data stream. Each of these dimensions can be in only one of two states: single or multiple. According to Flynn, there are four possible categories, as shown in the matrix below:

Figure 1.4 Classification of Flynn

SISD (Single Instruction, Single Data)
 A serial (non-parallel) computer.
 Single Instruction: during each clock cycle, the CPU acts on only one instruction stream.
 Single Data: during each clock cycle, only one data stream is used as input.
 Deterministic execution.
 This is the most traditional kind of computer.
 Examples: older mainframes, minicomputers, workstations, and single-processor/core PCs.

Figure 1.5 Example of SISD

SIMD (Single Instruction, Multiple Data)
 A type of parallel computer.
 Single Instruction: all processing units execute the same instruction at any given clock cycle.
 Multiple Data: each processing unit can operate on a different data element.
 Especially well suited to specialized problems with a high degree of regularity, such as graphics/image processing.
 Synchronous (lockstep) and deterministic execution.
 Two varieties: processor arrays and vector pipelines.
 Examples:
 Processor arrays: Thinking Machines CM-2, MasPar MP-1 & MP-2, ILLIAC IV
 Vector pipelines: IBM 9000; Cray X-MP, Y-MP and C90; Fujitsu VP; NEC SX-2; Hitachi S820; ETA10
 Most contemporary computers, especially those with graphics processing units (GPUs), employ SIMD instructions and execution units.
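The data parallel style that SIMD hardware supports can be sketched with NumPy (a third-party library, assumed here purely for illustration): one operation is written once and applied to every element of an array, and NumPy's compiled loops can in turn exploit the CPU's SIMD units.

```python
# Data parallel style: one operation expressed over many data elements.
import numpy as np

a = np.arange(1_000_000, dtype=np.float32)
b = np.arange(1_000_000, dtype=np.float32)

# Elementwise multiply-add written as a single array operation. A scalar
# (SISD) version would loop a million times; here the same operation is
# applied across all elements, matching the SIMD execution model.
c = 2.0 * a + b
```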

Figure 1.6 SIMD

Figure 1.7 Example of SIMD

MIMD (Multiple Instruction, Multiple Data)
 A type of parallel computer.
 Multiple Instruction: every processor may be executing a different instruction stream.
 Multiple Data: every processor may be working with a different data stream.
 Execution may be deterministic or non-deterministic, synchronous or asynchronous.
 Currently the most common type of parallel computer; most contemporary supercomputers fall into this category.

 Examples: most current supercomputers, networked parallel computer clusters and "grids", multi-processor SMP systems, and multi-core PCs.
 Note that many MIMD designs also include SIMD execution sub-components.

Figure 1.8 MIMD

Figure 1.9 Example of MIMD

MISD (Multiple Instruction, Single Data)
 A type of parallel computer.
 Multiple Instruction: each processing unit operates on the data independently, via its own instruction stream.
 Single Data: a single data stream is fed into multiple processing units.
 There have been few (if any) actual examples of this class of parallel computer.

 Some examples of possible applications include:
 multiple frequency filters operating on a single signal stream
 multiple cryptography algorithms attempting to crack a single coded message

Figure 1.10 MISD

Figure 1.11 Example of MISD

1.3.2 Classification by Feng

 Tse-yun Feng proposed using the degree of parallelism to classify various computer architectures.
 The maximum number of binary digits that a computer system can process within a unit of time is called its maximum degree of parallelism P.
 A bit slice is a string of bits, one from each of the words at the same vertical bit position.

Under the above classification:
 Word Serial and Bit Serial (WSBS)
 Word Parallel and Bit Serial (WPBS)
 Word Serial and Bit Parallel (WSBP)
 Word Parallel and Bit Parallel (WPBP)

WSBS has been called bit-serial processing, because one bit is processed at a time. WPBS has been called bit-slice processing, because an m-bit slice is processed at a time. WSBP, found in most existing computers, has been called word-slice processing, because one word of n bits is processed at a time. WPBP is known as fully parallel processing, in which an array of n x m bits is processed at one time.

1.3.3 Handler's Classification

 Wolfgang Handler proposed a classification scheme for identifying the degrees of parallelism and pipelining built into the hardware structure of a computer system. He considers three subsystem levels:
 Processor Control Unit (PCU)
 Arithmetic Logic Unit (ALU)
 Bit-Level Circuit (BLC)
 Each PCU corresponds to one processor or one CPU. Each ALU corresponds to a processing element (PE). The BLC corresponds to the combinational logic circuitry needed for 1-bit operations in the ALU.

1.4 PERFORMANCE OF PARALLEL COMPUTING

There is a difference between parallel computing and cloud computing. Cloud computing is a broad term for the on-demand, pay-as-you-go delivery of scalable services, such as databases, data storage, networking, servers, and software, over the Internet. Cloud computing services may be public or private, are fully managed by the provider, and allow remote access to data, work, and applications from any device with an Internet connection. The three most popular service types are Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Cloud computing is a paradigm shift in software development that broadens access to parallel computing through vast clusters of virtual machines, letting ordinary consumers and smaller organisations take advantage of parallel processing and memory options once reserved for large corporations.

Parallel processing vs. parallel computing: Parallel processing is a mode of computation in which separate parts of a larger, complex job are split up and executed on several CPUs at the same time, reducing processing time. Using massively parallel software, computer scientists divide and assign each part to a separate processor, then reassemble and read the data after each processor has solved its own piece. The work is carried out either over a communication network or on a computer with two or more processors. Parallel processing and parallel computing are frequently used interchangeably, since they occur in tandem; strictly speaking, however, parallel processing refers to the number of cores and CPUs running in parallel in a computer, while parallel computing refers to the way software behaves to optimize for that condition.

When it comes to computing, there are two modes: sequential and parallel. Sequential computing, also known as serial computation, uses a single processor to run a program that is broken down into a sequence of discrete instructions, each executed one after the other with no overlap. Software has historically been written this way, which is a simpler approach but one constrained by the processor's speed and its ability to execute each series of instructions. Whereas sequential data structures are used on single-processor computers, concurrent data structures are used in parallel computing settings.

Measuring performance in sequential programming is far less complex and important than benchmarking in parallel computing, since it usually just entails identifying system bottlenecks. Benchmarks in parallel computing can be obtained with benchmarking and performance-test automation frameworks, which use a range of measurement methods, such as statistical treatment and multiple repetitions.
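A minimal sketch of that kind of repeated measurement, using only Python's standard library: it times a serial map against a process-pool map and keeps the best of several runs. The kernel and sizes are illustrative, and on small workloads the parallel version may lose because of process start-up overhead.

```python
# Benchmark serial vs. parallel execution with repeated measurements.
import time
from concurrent.futures import ProcessPoolExecutor

def work(n: int) -> int:
    return sum(i * i for i in range(n))   # an illustrative CPU-bound kernel

def timeit_once(fn):
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

def best_of(runs, fn):
    """Repeat a measurement and keep the minimum, a common statistical treatment."""
    return min(timeit_once(fn) for _ in range(runs))

if __name__ == "__main__":
    inputs = [300_000] * 8
    serial = best_of(3, lambda: [work(n) for n in inputs])
    with ProcessPoolExecutor() as pool:
        parallel = best_of(3, lambda: list(pool.map(work, inputs)))
    print(f"serial {serial:.3f}s, parallel {parallel:.3f}s, "
          f"speedup {serial / parallel:.2f}x")
```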

Use cases of parallel computing in data science, machine learning, and artificial intelligence benefit from the ability to bypass this bottleneck by moving data through the memory hierarchy. Parallel computing is essentially the polar opposite of sequential computing: while it is more complicated and more expensive up front, the benefit of solving problems faster often exceeds the cost of the parallel computing hardware.

1.5 SUMMARY

 Parallel computing is the act of breaking down complex tasks into small, independent, and frequently related parts that can be executed concurrently by several processors communicating through shared memory, with the results merged as part of an overall algorithm. Parallel computing's main purpose is to increase the available computing capacity for faster application processing and problem solving.

 Parallel computing infrastructure is generally housed within a single data centre, where several processors are installed in a server rack; the application server distributes computation requests in small chunks, which are then executed simultaneously on each server.

 Parallel applications come in three types: fine-grained parallelism, in which subtasks communicate many times per second; coarse-grained parallelism, in which subtasks do not communicate many times per second; and embarrassing parallelism, in which subtasks rarely or never communicate. Mapping is a technique for solving embarrassingly parallel problems by applying a single operation to every item of a sequence without requiring communication between the subtasks.

 Parallel computing became popular and evolved in the twenty-first century after processor frequency scaling hit a power wall. Because scaling a processor's frequency is no longer feasible beyond a certain point, programmers and manufacturers began developing parallel system software and producing energy-efficient processors with multiple cores to address power consumption and overheating of central processing units.

 With the increased use of multicore CPUs and GPUs, parallel computing is becoming ever more important. GPUs and CPUs work together to increase data throughput and the number of simultaneous calculations within an application. Thanks to parallelism, a GPU can finish more work in a given period of time than a CPU.

1.6 KEYWORDS

 Abstract data type (ADT): A data type defined by its set of permissible values and the operations that can be performed on those values, where the values and operations are specified independently of how the values are represented or how the operations are implemented. In a programming language that natively supports ADTs, the interface of the type exposes the operations, while the implementation is hidden and can (in theory) be changed without affecting clients that use the type. A basic example of an ADT is a stack, defined by its operations, usually push and pop; many different internal representations are possible.

 Address space: The amount of memory that a process or processor can access. Depending on the context, this may refer to either physical or virtual memory.

 API (Application Programming Interface): An API defines the calling conventions and other information required for one software module (usually an application program) to use the services provided by another software module. MPI is an API for parallel programming. The term is also used to describe the notation a programmer uses to invoke a particular capability in a program; the OpenMP specification, for example, is referred to as an API. An important feature of an API is that any application written to it can be recompiled to run on any system that supports that API.

 Cluster: Any group of computers that are connected and used as a parallel computer, or to build a redundant system with higher availability. In principle, the computers in a cluster can be used as stand-alone machines, because they are not specialised for cluster computing; in other words, the cluster's components, both the computers and the networks that connect them, are not custom-built for use in the cluster. Examples include networks of workstations connected by Ethernet and rack-mounted workstations dedicated to parallel computing.

1.7 LEARNING ACTIVITY

1. List the three main software components that may fail when a client process invokes a method in a server object, giving an example of a failure in each case. Suggest how the components can be made to tolerate one another's failures.

2. Use the World Wide Web as an example to illustrate the concept of resource sharing, client and server. What are the advantages and disadvantages of HTML, URLs and HTTP as core technologies for information browsing? Are any of these technologies suitable as a basis for client-server computing in general?

1.8 UNIT END QUESTIONS

A. Descriptive Questions

Short Questions
1. Define parallel computing.
2. List the types of parallel computing.
3. State the advantages of parallel computing.
4. What is multiprocessing with symmetry?
5. What are the trends in architecture?

Long Questions
1. Explain the applications of parallel computing.
2. Describe the drawbacks of parallel computing.
3. Explain the architecture of parallel computing.
4. Explain parallel architectures convergence.
5. State the models for parallel computer architecture.

B. Multiple Choice Questions

1. Which problems can be handled by recursive decomposition? _____________
a. greedy method
b. backtracking
c. divide and conquer problem
d. branch and bound

2. A type of parallelism that uses microarchitectural techniques ______________
a. instructional
b. bit level
c. bit based
d. increasing

3. A term for simultaneous access to a resource, physical or logical ___________

a. Multiprogramming
b. Concurrency
c. Multitasking
d. Threads

4. Writing parallel programs is referred to as ___________
a. Parallel development
b. Parallel processes
c. Parallel computation
d. Parallel programming

5. Which of the following is an example of data decomposition? _____________
a. merge sort
b. quick sort
c. matrix multiplication
d. 15 puzzles

Answers
1-c, 2-a, 3-b, 4-d, 5-c

1.9 REFERENCES

Reference Books
 George Coulouris, Jean Dollimore, Tim Kindberg, "Distributed Systems: Concepts and Design", 4th edition, Addison Wesley/Pearson Education.
 Pradeep K. Sinha, "Distributed Operating Systems: Concepts and Design", IEEE Computer Society Press.

Textbook References
 M.R. Bhujade, "Parallel Computing", 2nd edition, New Age International Publishers, 2009.
 Andrew S. Tanenbaum and Maarten Van Steen, "Distributed Systems: Principles and Paradigms", 2nd edition, Pearson Education, Inc., 2007, ISBN: 0-13-239227-5.

Websites
 https://www.geeksforgeeks.org/introduction-to-parallel
 https://hpc.llnl.gov/training/tutorials/introduction-parallel-computing-tutorial
 https://www.javatpoint.com/what-is-parallel-computing

UNIT 2 - INTRODUCTION 2

STRUCTURE
2.0 Learning Objectives
2.1 Metrics of Performance
2.1.1 Hash Function Size and Numbers of Bloom Filter
2.1.2 Importance of Metrics in Software Testing
2.1.3 Metrics Used in Software Testing
2.1.4 Manual Test Metrics
2.1.5 Life Cycle of Test Metrics
2.2 A Model for Parallel Programming
2.2.1 Channels and Tasks
2.2.2 Other Programming Models
2.3 Introduction to Parallel Algorithms
2.4 Analysis of Parallel Algorithms
2.5 Summary
2.6 Keywords
2.7 Learning Activity
2.8 Unit End Questions
2.9 References

2.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:
 Understand the metrics of performance
 Explain parallel algorithms
 Outline the computational model
 Describe the analysis of parallel algorithms

2.1 METRICS OF PERFORMANCE

Bloom filters trade off three performance metrics: computation and execution time (which grows with the number k of hash functions), filter size (the number m of bits), and probability of error (the false positive rate). For a filter of m bits holding n inserted elements and using k hash functions, the false positive rate is approximately

p ≈ (1 − e^(−kn/m))^k

A Bloom filter (BF) accepts a small error tolerance in exchange for lookup speed and space economy. The Bloom filter returns one of two values, true or false, so its output can be classified into one of four categories: true positive, false positive, true negative, and false negative. Both false positives and false negatives impose overhead on a system. The Bloom filter uses a bit array to hold information about the stored elements. If the Bloom filter answers positive while the element is not actually stored, this is a false positive; a false negative is defined symmetrically, the filter answering negative while the element is stored. The Bloom filter is therefore a probabilistic data structure.

2.1.1 Hash Function Size and Numbers of Bloom Filter

If the Bloom filter's bit array is too small, most of the bits eventually become '1', and the filter will return a 'false positive' for almost every queried value. Choosing the size of a Bloom filter is therefore a critical decision: a bigger filter produces fewer false positives, while a smaller one produces more. We can thus deduce that the size of the Bloom filter is determined entirely by the tolerable false positive error rate.

Another critical element is the number of hash functions we employ. The more hash functions you use, the slower the Bloom filter is and the faster it fills up; with too few, however, we may suffer a high false positive rate. With the formula above we can estimate the false positive rate p from the filter size m, the total number of hash functions k, and the number of items inserted n.

Software testing metrics were quantitative measures used to assess the quality, performance, & progress of the software testing process. This allows us to collect accurate information on the software testing process & improve its efficiency. Developers will be able to create proactive and accurate choices for future testing processes as a result of this. In software testing metrics, what is a metric? A metric is a measurement of how well a system and its components maintain a certain characteristic. A metric isn't defined just for the purpose of documentation by testers. It has a wider range of applications in software testing. Developers, for example, may use a metric to estimate how time required to create software. It may also be used to calculate the number of new features, tweaks, and other changes made to the programme. 2.1.2 The importance of metrics in software testing As previously said, test metrics are critical in determining the software's quality and performance. Developers may use appropriate software testing metrics to:  Determine what kinds of enhancements are needed to produce a high-quality, defect- free software product.  Make informed choices regarding the testing stages that follow, such as scheduling future projects and calculating the total cost of those initiatives.  Evaluate the existing technology or procedure to see whether it requires any further changes. 2.1.3 Metrics used in software testing Software testing metrics may be divided into three categories. Metrics for the Process The features & execution of the project are defined by process metrics. These qualities are critical to the Software Development Life Cycle's process's improvement and maintenance (SDLC) Metrics for Products A product's durability, size, performance, design, & complexity are all defined by product metrics. Developers may improve the quality of their software development by using these qualities. Metrics for the Project Overall general quality of such a project is determined by project metrics. It is used to estimate a project's resource and deliverables, as well as to assess costs, productivity, and faults. 30 CU IDOL SELF LEARNING MATERIAL (SLM)

It is critical to determine the appropriate testing metrics for process. There are a few things to think about.  Prior to creating the metrics, carefully choose your target audiences.  Define the objective for which the measurements were created.  Prepare metrics based on the project's unique needs.  Calculate the financial benefit of each measure.  Match the metrics to a project lifestyle phase for the best results. Manual and automated software testing are two types of software testing. Manual testing is carried out in a step-by-step manner by QA analysts. Automated testing, on the other hand, uses the test automation frameworks, tool, & software to run tests. Both human and automated testing have advantages and disadvantages. Manual testing is really a time-consuming procedure, but it enables testers to deal with more complicated situations. The most important benefits of the automated testing are how it allows testers to perform more testing in much less time while also covering a large number of variations that would be virtually difficult to compute manually. 2.1.4 Manual test metrics: what are they and how do they work? There are two kinds of manual test metrics. Metrics to Start with Analysts gather data throughout the creation and execution of test cases to provide base metrics. By creating a project status report, these metrics are sent to testing leads and project managers. Calculated measurements are used to quantify it.  The total number of test instances  The number of actual test cases completed  Metrics that have been calculated Data of base metrics is used to create calculated metrics. The test lead collects this information and transforms it into more useful information for monitoring project progress at the module, tester, and other levels. It's an important component of the SDLC since it allows developers to make critical software enhancements. The kinds of metrics that are often utilized both developers and testers are listed below.  Defect metrics: This statistic helps engineers understand the many elements of software quality, such as functioning, efficiency, installation stability, usability & compatibility, and so on.  Faults discovery rate: this metric is used to determine all the pattern of defects over a certain period of time. 31 CU IDOL SELF LEARNING MATERIAL (SLM)

 The severity of the fault allows the developer to see how the issue will affect the software's quality.  Defect cause it's utilized to figure out what's causing the problem.  Test Coverage identifies the number of test cases allocated to the software. This metric guarantees that the testing is completed completely. It also helps in the verification of code flow and the testing of functions.  Problem fixing time is a measurement of how long it takes to repair a defect.  Test case efficiency indicates how effective test cases are in detecting problems.  Schedule adherence: Its main goal is to determine the time gap between the intended and actual execution times of a schedule. 2.1.5 Life cycle of test metrics There are four phases throughout the life cycle in test metrics.  Analysis – During this step, developers discover and specify the necessary metrics.  Communicate - Developers must convey the significance of metrics towards the stakeholder and the testing team after they have been discovered.  Data quantification and verification are part of the evaluation step. The data must then be used by testers to determine the metric's value.  Report - After the assessment is completed, the development team must write a report that includes a comprehensive description of the findings. The report is then disseminated to stakeholders and other relevant individuals. After carefully reviewing the material, the stakeholders provide comments. Different metrics have their own set of rules for analysis. As a result, it's critical to choose the appropriate measurements for the programme. Utilizing software testing metrics to monitor and repair problems early is a productive and effective method to do so. 2.2 A MODEL FOR PARALLEL PROGRAMMING Von Neumann's machine model presupposes a processor that can execute instructions in a sequential order. In addition to different arithmetic operations, an instruction may specify an address of the datum to all be read and written in memory, as well as the location of next instruction to be performed. While writing machine language may be used to programme a computer in term of this fundamental model, this approach is too complicated for most applications since we need to keep tracking the millions of the memory locations & arrange its execution of thousands of the machine instructions. As a result, modular design methods are used, in which large programmes are built from basic components and components are organized using higher-level abstractions like data structures, iteration loops, & procedures. 32 CU IDOL SELF LEARNING MATERIAL (SLM)

Procedures, for example, make it simpler to take advantage of modularity by enabling objects to be handled without regard for their underlying structure. High-level languages like Fortran, Pascal, C, & Ada, for example, enable ideas described in such abstractions to be automatically converted into executable code. Parallel programming presents additional layers of complexity: not only the number of the instructions executed rise, but we'd also have to explicitly manage each execution with multiple processors as well as coordinate large numbers of interprocessor interactions if we were to programme at the lowest level. As a result, abstraction & modularity are now at least as essential in concurrent programming as they are in sequential programming. In fact, in addition of concurrency, scalability, and locality, we will highlight modularity as a fourth basic criterion for parallel software. Parallel data model  Model of the task graph  Workforce pooling model  Slave master model  The producer-consumer model, often known as the pipeline model  Hybrid design 2.2.1 Channels And Tasks Figure 2.1 a Basic Parallel Programming Model A basic parallel programming paradigm is shown in Figure 2.1. The diagram depicts both the current status of the computation as well as a comprehensive representation of a particular job. A computation is made up of a series of tasks (shown by circles) that are linked together via channels (arrows). A task encompasses a programme and local memory, as well as defining a collection of ports that serve as the task's interface to the outside world. A channel 33 CU IDOL SELF LEARNING MATERIAL (SLM)

is just a message queue from which a sender may put messages & where a receiver will remove messages, with messages not accessible blocking the channel. The issue from which abstractions are acceptable and helpful in such a parallel programming paradigm is the next topic we'll look at. Mechanisms that allow for explicit discussion of concurrency and locality, as well as the creation of scalable and modular systems, are clearly required. Abstractions that are easy to work with and fit with architectural model, the multicomputer, are also required. While there are many abstractions that might be explored for such a purpose, two in particular fit the bill: task and channel. The four fundamental task activities are shown in Figure 1.8. A task may send the message and receive a message, generate other tasks (which are suspended till they finish), and terminate in adding to read and write in local memory. Figure 2.2 The four basic task actions. In addition to reading and writing local memory, a task can send a message, receive a message, create new tasks (suspending until they terminate), and terminate. 1. One or more jobs make up a parallel computation. The tasks run in parallel. During the execution of the programme, the number of activities may change. 2. A task is a logical unit that contains a sequential programme and local memory. (It was a virtualized von Neumann machine, in effect.) Its interface to its surroundings is also defined by a collection of imports and outports. 34 CU IDOL SELF LEARNING MATERIAL (SLM)

3. A task may transmit messages on its own outports & receive messages on its own imports, create other tasks, and terminate itself, in addition to learning to write its local memory. 4. A transmit operation is asynchronous, meaning it finishes right away. A synchronous receive operation causes the task's execution to pause till a message is received. 5. Message queues known as channels may link outport/import pairs. Channels may be established and removed, and messages can include references to channels (ports), allowing connection to change on the fly. 6. Tasks may be mapped onto physical processors in a variety of ways; the mapping used has no effect on a program's semantics. Multiple jobs, in particular, may be mapped to the single processor. The task abstraction offers a framework for discussing locality: data stored in such local memory of task is 'near,' whereas data stored elsewhere is' remote.' The channel abstraction is a method for signalling that a job's computation needs data from another task to continue. Example2.1 Consider the construction of a bridge. Consider the following real-life scenario. A bridge will be built using girders manufactured at a foundry. These two operations are coordinated via the use of trucks to carry girders first from foundry to a bridge construction site. Both foundry and bridge are represented as tasks, as well as the stream of trucks is represented as a channel, in Figure 1.9(a). This method allows for simultaneous bridge assembly and girder construction with no intentional co - ordination: a foundry crew loads girders onto trucks as they will be made, and the assembly team attaches girders to a bridge as they arrive. Figure 2.3 Two solutions to the bridge construction problem 35 CU IDOL SELF LEARNING MATERIAL (SLM)

Two approaches to the bridge construction problem. Both the foundry and the bridge assembly site are represented as separate tasks. The first uses a single channel on which girders produced by the foundry are delivered at the same rate as they are made. If the foundry produces girders faster than the bridge consumes them, girders pile up at the construction site. To prevent overflow, the second approach uses an additional channel to send flow-control signals from the bridge to the foundry.

A drawback of the first scheme is that the foundry may be able to manufacture girders far faster than the assembly crew can use them. To prevent the bridge site from ever being overwhelmed, the assembly crew can instead explicitly request more girders when supplies run low. This refined approach is depicted in Figure 2.3(b), with the stream of requests represented as a second channel. The second channel can also be used to shut down the flow of girders when the bridge is complete.

Performance, mapping independence, modularity, and determinism are other characteristics of the task/channel programming model that we examine next.

Performance. Procedures and data structures are useful sequential programming abstractions because they can be mapped simply and efficiently to the von Neumann computer. The task and the channel map just as directly to the multicomputer. A task is typically a piece of code that can be executed sequentially on a single processor. If two tasks that share a channel are mapped to different processors, the channel connection is implemented as interprocessor communication; if they are mapped to the same processor, a more efficient mechanism can be used.

Mapping independence. Because tasks interact using the same mechanism (channels) regardless of task location, the result computed by a program does not depend on where tasks execute. Algorithms can therefore be designed and implemented without concern for the number of processors on which they will run; in fact, algorithms are frequently designed to create many more tasks than processors. This is a straightforward way of achieving scalability: as the number of processors increases, the number of tasks per processor decreases, but the algorithm itself remains the same. Creating more tasks than processors can also serve to mask communication delays, by providing other computation that can be performed while communication to access remote data is in progress.

Modularity. In modular program design, different components of a program are developed separately, as independent modules, and then combined to form a complete program. Interactions between modules are restricted to well-defined interfaces. Consequently, module implementations can be changed without modifying other components, and the properties of a program can be deduced from its module specifications and the code that connects them. When applied successfully, modular design reduces program complexity and facilitates code reuse.
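To make the task/channel abstraction more concrete, the following sketch emulates the first bridge-construction solution of Figure 2.3(a), using Python threads to stand in for tasks and a queue.Queue as the channel. This is only an illustrative analogy of ours, not part of the model's definition; the names foundry and bridge are our own.

import threading
import queue

def foundry(channel, n_girders):
    # Producer task: sends one message per girder, then a sentinel.
    for girder in range(n_girders):
        channel.put(girder)        # send is asynchronous: put() returns immediately
    channel.put(None)              # sentinel marks the end of the girder stream

def bridge(channel):
    # Consumer task: a receive blocks until a message is available.
    while True:
        girder = channel.get()     # synchronous receive
        if girder is None:
            break
        print("assembling girder", girder)

channel = queue.Queue()            # the channel: a message queue
tasks = [threading.Thread(target=foundry, args=(channel, 5)),
         threading.Thread(target=bridge, args=(channel,))]
for t in tasks:
    t.start()
for t in tasks:
    t.join()

Note how the determinism property discussed below falls out naturally: the bridge task simply blocks whenever no girder is available.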

Figure 2.4 The task as building block

The task as a building block: (a) The foundry and bridge tasks are building blocks with complementary interfaces. (b) Hence, the two tasks can be plugged together to form a complete program. (c) Tasks are interchangeable: another task with a compatible interface can be substituted to obtain a different program.

The task is a natural building block for modular design. As illustrated in Figure 2.4, a task encapsulates both data and the code that manipulates those data; the ports on which it sends and receives messages constitute its interface. The advantages of modular design described in the preceding paragraph are therefore directly available in the task/channel model.

The task/channel model and the object-oriented programming paradigm have much in common. Tasks, like objects, encapsulate data and the code that manipulates those data. Distinguishing features of the task/channel model are its concurrency, its use of channels rather than method calls to specify interactions, and its lack of support for inheritance.

Determinism. An algorithm or program is deterministic if execution with a particular input always yields the same result; it is nondeterministic if repeated executions with the same input can produce different results. Although nondeterminism has its uses and must be supported, a parallel programming model that makes it easy to write deterministic algorithms is highly desirable. Deterministic programs are generally easier to understand.

In addition, when checking a program for correctness, only one execution sequence of a parallel program needs to be considered, rather than all possible execution sequences.

The 'arms-length' interactions of the task/channel model make determinism relatively easy to guarantee. As we shall see in Part II when we discuss programming tools, it suffices to verify that each channel has a single sender and a single receiver, and that a task receiving on a channel blocks until the receive operation is complete. These conditions can be relaxed when nondeterministic interactions are required.

In the bridge construction example, determinism means that the bridge will be constructed in the same way regardless of the rate at which the foundry produces girders and the assembly crew assembles them. If the assembly crew runs ahead of the foundry, it will block until girders are delivered; rather than attempting to continue construction with half-completed girders, it simply suspends its activities until more girders are available. Likewise, if the foundry produces girders faster than the assembly crew can use them, the girders simply accumulate until they are needed. Determinism would be guaranteed even if several bridges were constructed simultaneously: as long as girders destined for different bridges travel on distinct channels, they cannot be mixed up.

2.2.2 Other Programming Models

The task/channel model will often be used to describe algorithms in the following chapters. This model, however, is far from the only one that can be used to represent parallel computation. Many other models have been proposed, differing in their flexibility, task interaction mechanisms, and task granularities, as well as in their support for locality, scalability, and modularity. We review a few alternatives here.

Message passing. Message passing is probably the most widely used parallel programming model today. Like task/channel programs, message-passing programs create multiple tasks, each encapsulating local data. Each task is identified by a unique name, and tasks interact by sending messages to and receiving messages from named tasks. In this respect, message passing is essentially the task/channel model with a different data transfer mechanism: instead of sending a message on 'channel ch,' we send a message to 'task 17.' Later, where we examine the Message Passing Interface, we discuss the message-passing model in more detail. There we show why the specification of channels is a useful discipline even when building message-passing programs, because it forces us to think about a parallel program's communication topology.

The message-passing model does not preclude the dynamic creation of tasks, the execution of multiple tasks per processor, or the execution of different programs by different tasks. In practice, however, most message-passing systems create a fixed number of identical tasks at program startup and do not allow tasks to be created or destroyed during program execution. Because each task runs the same program but operates on different data, such systems are said to implement the single program multiple data (SPMD) model.
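As a hedged illustration of the message-passing style, here is a minimal SPMD sketch written with mpi4py, a Python binding of the Message Passing Interface (the choice of library is our assumption; the text does not prescribe one). Every process executes the same program, behavior is selected by the task's rank, and messages are addressed to named tasks rather than sent on channels.

# Run with, for example: mpiexec -n 2 python spmd_demo.py
from mpi4py import MPI             # assumes the mpi4py package is installed

comm = MPI.COMM_WORLD
rank = comm.Get_rank()             # every copy of this same program gets a rank

if rank == 0:
    comm.send("girder", dest=1)    # message is addressed to "task 1", not to a channel
elif rank == 1:
    data = comm.recv(source=0)     # blocks until the message from task 0 arrives
    print("task 1 received:", data)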

The SPMD model is adequate for a broad range of parallel programming problems, although it hinders the development of some parallel algorithms, as described in later chapters.

Data parallelism. Another common parallel programming model, data parallelism, exploits the concurrency that arises from applying the same operation to multiple elements of a data structure, for example, 'add 2 to all elements of this array' or 'increase the salary of all employees with 5 years of service.' A data-parallel program consists of a sequence of such operations. Because each operation on each data element can be thought of as an independent task, the natural granularity of a data-parallel computation is small, and the concept of 'locality' does not arise naturally. Data-parallel compilers therefore often require the programmer to specify how data are to be distributed over processors, in other words, how data are to be partitioned into tasks. The compiler can then translate the data-parallel program into an SPMD formulation and generate the communication code automatically.

Shared memory. In the shared-memory programming model, tasks share a common address space, which they read and write asynchronously. Various mechanisms such as locks and semaphores may be used to control access to the shared memory. An advantage of this model, from the programmer's point of view, is that there is no notion of data ownership, so there is no need to specify explicitly how data are communicated from producers to consumers. This can simplify program development. On the other hand, understanding and maintaining locality becomes more difficult, which is an important consideration (as noted earlier) on most shared-memory architectures. Writing deterministic programs can also be more challenging.
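The use of a lock to control access to shared memory can be sketched as follows; this is a minimal illustration of ours using Python threads, not a prescribed implementation.

import threading

counter = 0                        # data in the shared address space
lock = threading.Lock()            # the lock controls access to the shared data

def worker(n):
    global counter
    for _ in range(n):
        with lock:                 # without the lock, concurrent updates can be lost
            counter += 1

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                     # 400000 with the lock; unpredictable without it

The sketch also hints at the nondeterminism mentioned above: remove the lock and repeated runs may print different totals.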

2.3 INTRODUCTION TO PARALLEL ALGORITHMS

An algorithm is a sequence of steps that takes input from the user and, after some processing, produces an output. A parallel algorithm is one that can execute several instructions simultaneously on different processing devices and then combine all the individual outputs to produce the final result.

Concurrent Processing

The widespread availability of computers, along with the growth of the Internet, has changed the way we store and process data. We live in an era where data are readily available. Every day we deal with massive amounts of data that require complex computation in a short amount of time. Sometimes we need to fetch data from related events that occur simultaneously. This is where concurrent processing is needed, to divide a complex task and process its parts on multiple systems so as to produce the output in good time.

Concurrent processing is essential when dealing with large volumes of complex data. Examples include large database access, aircraft testing, astronomical calculations, atomic and nuclear physics, biomedical analysis, economic planning, image processing, robotics, weather prediction, and web-based services.

What is parallelism?

Parallelism is the technique of executing several sets of instructions at the same time. It reduces total computation time. Parallelism can be implemented by using parallel computers, i.e., computers with many processors. Parallel computers require parallel algorithms, programming languages, compilers, and operating systems that support multitasking. In this unit, we discuss only parallel algorithms. Before we proceed further, let us first discuss algorithms and their types.

What is an algorithm?

An algorithm is a sequence of instructions followed to solve a problem. While designing an algorithm, we must consider the architecture of the computer on which the algorithm will be executed. Based on their architecture, there are two kinds of computers:

 Sequential computers
 Parallel computers

Depending on the architecture of the computer, there are two kinds of algorithms:

 Sequential algorithm: an algorithm in which consecutive steps of instructions are executed in sequential order to solve the problem.
 Parallel algorithm: the problem is divided into sub-problems, which are executed in parallel to produce individual outputs; these outputs are then combined to produce the final desired result.

It is not easy to break a large problem into smaller chunks. Sub-problems may have data dependencies among them, so the processors must communicate with one another to solve the problem. It has been found that the time the processors need to communicate with one another can exceed the actual processing time. Therefore, proper CPU utilization should be considered while designing a parallel algorithm, in order to obtain an efficient method; a simple illustration is sketched below.
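The following sketch (our own illustration, using Python's standard multiprocessing module) contrasts the two kinds of algorithm on a toy problem, summing an array: sequentially, and in parallel by dividing the problem into sub-problems whose partial results are then combined.

from multiprocessing import Pool

def partial_sum(chunk):
    # Sub-problem: each worker sums its own slice of the data.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))

    # Sequential algorithm: one processor executes every step in order.
    sequential_total = sum(data)

    # Parallel algorithm: divide, solve sub-problems concurrently, combine.
    n_workers = 4
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        parallel_total = sum(pool.map(partial_sum, chunks))

    assert parallel_total == sequential_total

For a problem this small, the cost of starting the workers and shipping the chunks to them typically exceeds the computation saved, which is precisely the communication-versus-processing trade-off noted above.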

Computational Model

Both sequential and parallel computers operate on a set (stream) of instructions called an algorithm. This set of instructions (the algorithm) tells the computer what to do at each step. Based on their instruction and data streams, computers can be divided into four types:

 Single Instruction stream, Single Data stream (SISD) computers
 Single Instruction stream, Multiple Data stream (SIMD) computers
 Multiple Instruction stream, Single Data stream (MISD) computers
 Multiple Instruction stream, Multiple Data stream (MIMD) computers

SISD Computers

SISD computers consist of one control unit, one processing unit, and one memory unit.

Figure 2.5 SISD Model

In this type of computer, the processor receives a single stream of instructions from the control unit and operates on a single stream of data from the memory unit. At each step of the computation, the processor receives one instruction from the control unit and operates on a single piece of data from the memory unit.

SIMD Computers (Single Instruction, Multiple Data)

SIMD computers have a single control unit, multiple processing units, and a shared memory or interconnection network.

Figure 2.6 SIMD Model

All processing units receive their instructions from the single control unit. At each step of the computation, all the processors receive the same instruction from the control unit but operate on different sets of data from the memory unit.

Each processing unit has a local memory unit of its own, in which both data and instructions are stored. Processors in SIMD computers must communicate with one another; this is done through the shared memory or the interconnection network.

While some of the processors execute a set of instructions, the remaining processors wait for their next set of instructions. Instructions from the control unit determine which processors will be active (executing instructions) and which will be inactive (waiting).
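As a loose software analogy for SIMD operation (our illustration, not a description of the hardware), a vectorized NumPy expression applies one instruction across many data elements at once:

import numpy as np

a = np.arange(8)     # eight data elements, one per notional processing unit
b = a + 2            # a single instruction ("add 2") applied to every element
print(b)             # [2 3 4 5 6 7 8 9]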

MISD Computers

As the name implies, MISD computers have multiple control units, multiple processing units, and one common memory unit.

Figure 2.7 MISD Model

Here, each processor has its own control unit, but they all share a common memory unit. All the processors receive instructions individually from their own control units and operate on a single stream of data according to the instructions they have received. All the processors are active at the same time.

MIMD Computers

MIMD computers have multiple control units and multiple processing units, together with a shared memory or an interconnection network.

Figure 2.8 MIMD Model

In this configuration, each processor has its own control unit, local memory unit, and arithmetic and logic unit. The processors receive different sets of instructions from their respective control units and operate on different sets of data.

 MIMD computers that share a common memory are known as multiprocessors, while those that use an interconnection network are known as multicomputers.
 Multicomputers are classified into two kinds depending on the physical distance between the processors:
 When all the processors are very close to one another, the machine is called a multicomputer.
 When the processors are located far apart from one another, the system is called a distributed system.

2.4 ANALYSIS OF A PARALLEL ALGORITHM

Analysis of an algorithm helps us decide whether or not the algorithm is useful. In general, an algorithm is analyzed based on how long it takes to execute (time complexity) and how much space it requires (space complexity). Since advanced memory devices are now available at reasonable cost, storage space is no longer a major concern, and space complexity is therefore given less weight.

Parallel algorithms are designed to improve a computer's computation speed. When evaluating a parallel algorithm, we normally examine the following factors:

 Time complexity (execution time),
 The total number of processors used, and

 Total cost.

Time Complexity

The primary motivation for developing parallel algorithms is to reduce an algorithm's computation time. Determining an algorithm's execution time is therefore critical in assessing its efficiency. Execution time is measured as the time the algorithm takes to solve a problem, calculated from the moment the algorithm starts to the moment it ends. If the processors do not all begin or complete execution at the same time, the total execution time is measured from the moment the first processor starts to the moment the last processor finishes.

An algorithm's time complexity can be divided into three types:

 Worst-case complexity: when the time required by the algorithm for a given input is maximum.
 Average-case complexity: when the time required by the algorithm for a given input is average.
 Best-case complexity: when the time required by the algorithm for a given input is minimum.

Asymptotic Analysis

The complexity or efficiency of an algorithm is the number of steps it executes to obtain the desired result. In theoretical analysis, asymptotic analysis is used to determine the complexity of an algorithm: the complexity function of the algorithm is computed for large input lengths. Note that 'asymptotic' describes a condition in which a line approaches a curve ever more closely but does not intersect it; the line and the curve are then said to be asymptotic to each other.

Asymptotic notation is the simplest way to express the fastest and slowest possible execution times of an algorithm, using upper and lower bounds on its speed. We use the following notations for this:

 Big O notation
 Omega notation
 Theta notation

Big O Notation

Big O notation is used in mathematics to represent the asymptotic behavior of functions. It provides a compact and accurate way to describe the behavior of a function for large inputs, and it expresses the upper bound of an algorithm's execution time, i.e., the longest time the algorithm may take to complete its execution. We write

f(n) = O(g(n))

if there exist positive constants c and n0 such that f(n) ≤ c * g(n) for all n ≥ n0. For example, 3n^2 + 2n = O(n^2), since 3n^2 + 2n ≤ 5n^2 for all n ≥ 1.

Omega Notation

Omega notation expresses the lower bound of an algorithm's execution time. We write

f(n) = Ω(g(n))

if there exist positive constants c and n0 such that f(n) ≥ c * g(n) for all n ≥ n0.

Theta Notation

Theta notation expresses both the lower and the upper bound of an algorithm's execution time. We write

f(n) = Θ(g(n))

if there exist positive constants c1, c2, and n0 such that c1 * g(n) ≤ f(n) ≤ c2 * g(n) for all n ≥ n0.

Speedup of an Algorithm

The performance of a parallel algorithm is determined by calculating its speedup. Speedup is defined as the worst-case execution time of the fastest known sequential algorithm for a given problem divided by the worst-case execution time of the parallel algorithm.

The number of processors used is also critical when evaluating the effectiveness of a parallel algorithm, since it determines the cost of purchasing, maintaining, and operating the computers. The more processors an algorithm uses to solve a problem, the more expensive the obtained solution becomes.
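A small worked example, with purely illustrative figures of our own, shows how speedup and cost are computed:

# Illustrative figures only: t_seq is the worst-case time of the fastest known
# sequential algorithm; t_par is the worst-case time of the parallel algorithm.
t_seq = 8.0                  # seconds
t_par = 2.5                  # seconds, using p processors
p = 4

speedup = t_seq / t_par      # 3.2
cost = p * t_par             # 10.0 processor-seconds, versus t_seq = 8.0
print(speedup, cost)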

2.5 SUMMARY

 Figures and data that depict an organization's activities, abilities, and general quality are referred to as performance metrics. Because these metrics help guide and assess an organization's success, it is critical that organizations choose their key performance indicators and focus on those areas.
 Productivity, profit margin, scope, and cost are just a few examples of performance indicators that can be used to see whether a company's targets and goals are being fulfilled. A firm is divided into numerous areas, each with its own set of key performance indicators.
 Metrics provide objective performance indicators, allowing you to 'manage by fact.' Evaluating your employees' performance is not about how many hours they work or how busy they are; it is about the outcomes they achieve.
 Parallel computing, in its most basic form, is the use of multiple compute resources to solve a computational problem at the same time: a problem is divided into distinct parts that can be executed concurrently, and each part is further subdivided into a series of instructions.
 Parallel processing systems are designed to speed up program execution by breaking the program into several fragments and processing them all at the same time. Parallel computing is an evolution of serial computing in which jobs are broken down into separate parts that can be executed concurrently.

2.6 KEYWORDS

 Parallel file system. A file system that is visible to any processor in the system and can be read and written by multiple UEs simultaneously. Although a parallel file system appears to the computer system as a single file system, it is physically distributed among a number of disks. To be effective, the aggregate throughput for read and write must be scalable.
 Peer-to-peer computing. A distributed computing model in which each node has equal standing among the collection of nodes.
 Process migration. Changing the processor responsible for running a process during execution.
 Shared memory. A term applied to both hardware and software indicating the presence of a memory region that is shared between system components.
 Single Program Multiple Data. This is the most common way to organize a parallel program, especially on MIMD computers.
 Symmetric multiprocessor. A shared-memory computer in which every processor is functionally identical and has 'equal-time' access to every memory address. In other words, both memory addresses and OS services are equally available to every processor.

2.7 LEARNING ACTIVITY

1. Use the World Wide Web as an example to illustrate the concepts of resource sharing, client, and server. What are the advantages and disadvantages of HTML, URLs, and HTTP as core technologies for information browsing? Are any of these technologies suitable as a basis for client-server computing in general?

2. A service is implemented by several servers. Explain why resources might be transferred between them. Would it be satisfactory for clients to multicast all requests to the group of servers as a way of achieving mobility transparency for clients?

2.8 UNIT END QUESTIONS

A. Descriptive Questions

Short questions
1. Define MIMD.
2. Define MISD.
3. What is the influence of the arrival rate on performance?
4. List the life cycle of test metrics.
5. What is a sequential algorithm?

Long questions
1. Describe the analysis of a parallel algorithm.
2. Explain SISD computers and draw the SISD model.
3. Explain SIMD computers.
4. Explain the types of algorithms.
5. What is a parallel algorithm?

B. Multiple Choice Questions

1. _____ helps engineers understand many elements of software quality.
a. processor
b. defect metrics
c. Resistance
d. Secondary memory

2. Multiprocessors and vector processors are other terms for _____
a. multiware
b. middleware

c. array processors
d. supply

3. Test coverage identifies the number of _____
a. Test cases
b. processes
c. defects
d. Work stations

4. Manual testing is a _____ procedure.
a. Time consuming
b. easy
c. complex
d. software interrupt

5. A ____ is a measurement of how well a system and its components maintain a certain characteristic.
a. program
b. TCP
c. metric
d. Provision

6. Bloom filters may be traded off on three _____
a. Performance metrics
b. Movement of data
c. Shifting data
d. Dynamic data

Answers
1-b, 2-c, 3-a, 4-a, 5-c, 6-a

2.9 REFERENCES

Reference Books
 George Coulouris, Jean Dollimore, Tim Kindberg, “Distributed Systems: Concepts and Design” (4th Edition), Addison Wesley/Pearson Education.
 Pradeep K. Sinha, “Distributed Operating Systems: Concepts and Design”, IEEE Computer Society Press.

Textbook References
 M.R. Bhujade, “Parallel Computing”, 2nd Edition, New Age International Publishers, 2009.
 Andrew S. Tanenbaum and Maarten Van Steen, “Distributed Systems: Principles and Paradigms”, 2nd Edition, Pearson Education, Inc., 2007, ISBN: 0-13-239227-5.

Websites:
 https://www.geeksforgeeks.org/introduction-to-parallel
 https://hpc.llnl.gov/training/tutorials/introduction-parallel-computing-tutorial
 https://www.javatpoint.com/what-is-parallel-computing

UNIT 3 - PIPELINE PROCESSING

STRUCTURE
3.0 Learning Objectives
3.1 Introduction to Pipelining
3.1.1 Pipeline for Arithmetic
3.1.2 Pipeline of Instructions
3.1.3 Pipeline Disputes
3.1.4 Pipelining Benefits
3.1.5 Pipelining Drawbacks
3.2 Performance of Pipeline Architecture
3.2.1 Introduction
3.2.2 Background
3.2.3 Details of the Experiment
3.3 Pipeline for Arithmetic
3.4 Pipeline of Instructions
3.4.1 The Impact of the Arrival Time on Productivity
3.5 Summary
3.6 Keywords
3.7 Learning Activity
3.8 Unit End Questions
3.9 References

3.0 LEARNING OBJECTIVES

After studying this unit, you will be able to:

 Explain the basics of pipelining
 Outline the pipeline instructions
 Describe the performance of pipeline structure
 Explain the procedure of pipelining

3.1 INTRODUCTION TO PIPELINING

