Figure 3.11: Cache Mapping

Before proceeding further, it is important to note the following points:
 Main memory is divided into equal-sized partitions called blocks or frames.
 Cache memory is divided into partitions of the same size as the blocks, called lines.
 During cache mapping, a block of main memory is simply copied into the cache; the block is not moved out of the main memory.

Cache Mapping Techniques

Cache mapping is performed using the following three techniques:
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
1. Direct Mapping

In direct mapping, a particular block of main memory can map only to a particular line of the cache. The line number of the cache to which a particular block can map is given by:

Cache line number = (Main Memory Block Address) Modulo (Number of lines in Cache)

Example: Consider a cache memory divided into 'n' lines. Then block 'j' of main memory can map only to line number (j mod n) of the cache.

Figure 3.12: Direct Mapping

Need of Replacement Algorithm

In direct mapping, there is no need of any replacement algorithm. This is because a main memory block can map only to one particular line of the cache. Thus, the new incoming block always replaces the existing block (if any) in that particular line.
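As a minimal sketch of the mapping formula above (the 8-line cache size is a hypothetical example value, not taken from the text), the line index is just a modulo computation:

    #include <stdio.h>

    #define CACHE_LINES 8   /* hypothetical cache with 8 lines */

    /* Direct mapping: block j of main memory maps to exactly one cache line. */
    unsigned direct_mapped_line(unsigned block_address)
    {
        return block_address % CACHE_LINES;   /* j mod n */
    }

    int main(void)
    {
        /* Blocks 3, 11 and 19 all compete for the same line, since
           3 mod 8 = 11 mod 8 = 19 mod 8 = 3. */
        for (unsigned j = 3; j < 20; j += 8)
            printf("block %u -> line %u\n", j, direct_mapped_line(j));
        return 0;
    }

Note how distinct blocks collide on the same line; this collision behaviour is exactly why no replacement choice exists in direct mapping.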
Division of Physical Address

In direct mapping, the physical address is divided as:

Tag | Line Number | Block/Line Offset

Division of Physical Address in Direct Mapping

2. Fully Associative Mapping

In fully associative mapping, a block of main memory can map to any line of the cache that is freely available at that moment. This makes fully associative mapping more flexible than direct mapping.

Example: Consider the following scenario.

Figure 3.13: Fully Associative Mapping
Here,
 All the lines of the cache are freely available, so any block of main memory can map to any line of the cache.
 Had all the cache lines been occupied, one of the existing blocks would have to be replaced.

Need of Replacement Algorithm

In fully associative mapping, a replacement algorithm is required. The replacement algorithm suggests which block is to be replaced when all the cache lines are occupied. Thus, a replacement algorithm such as FCFS or LRU is employed.

Division of Physical Address

In fully associative mapping, the physical address is divided as:

Block Number/Tag | Block/Line Offset

Division of Physical Address in Fully Associative Mapping

3. K-way Set Associative Mapping

In k-way set associative mapping,
 Cache lines are grouped into sets, where each set contains k lines.
 A particular block of main memory can map to only one particular set of the cache.
 However, within that set, the memory block can map to any cache line that is freely available.

The set of the cache to which a particular block of the main memory can map is given by:
Cache set number = (Main Memory Block Address) Modulo (Number of sets in Cache)

Example: Consider the following example of 2-way set associative mapping.

Figure 3.14: 2-way Set Associative Mapping

Here,
 k = 2 means that each set contains two cache lines.
 Since the cache contains 6 lines, the number of sets in the cache = 6/2 = 3 sets.
 Block 'j' of main memory can map only to set number (j mod 3) of the cache.
 Within that set, block 'j' can map to any cache line that is freely available at that moment.
 If all the cache lines of that set are occupied, then one of the existing blocks has to be replaced.
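A small sketch of this set-indexed lookup in C, with the sizes chosen to match the example above (6 lines, k = 2, so 3 sets; the tag derivation is the usual block-address/number-of-sets split, assumed here for illustration):

    #include <stdbool.h>

    #define K          2                 /* lines per set (2-way)  */
    #define NUM_LINES  6
    #define NUM_SETS   (NUM_LINES / K)   /* 6 / 2 = 3 sets         */

    struct line { bool valid; unsigned tag; };
    struct line cache[NUM_SETS][K];      /* cache organized as sets of k lines */

    /* Look up a main-memory block: its set is fixed by (block mod sets),
       but within the set it may sit in any of the k lines. */
    bool lookup(unsigned block_address)
    {
        unsigned set = block_address % NUM_SETS;
        unsigned tag = block_address / NUM_SETS;
        for (int way = 0; way < K; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return true;   /* hit */
        return false;          /* miss: a replacement policy picks a victim way */
    }

The loop over the k "ways" is the fully associative part; the modulo that picks the set is the direct-mapped part.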
Need of Replacement Algorithm

Set associative mapping is a combination of direct mapping and fully associative mapping: it uses fully associative mapping within each set. Thus, set associative mapping requires a replacement algorithm.

Division of Physical Address

In set associative mapping, the physical address is divided as:

Tag | Set Number | Block/Line Offset

Division of Physical Address in K-way Set Associative Mapping

Special Cases
 If k = 1, then k-way set associative mapping becomes direct mapping, i.e., 1-way Set Associative Mapping ≡ Direct Mapping.
 If k = total number of lines in the cache, then k-way set associative mapping becomes fully associative mapping.

To recap, the different cache mapping techniques are:
1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping

In this section, we will discuss direct mapping in detail.
Direct Mapping

In direct mapping, a particular block of main memory can map to only one particular line of the cache. The line number of the cache to which a particular block can map is given by:

Cache line number = (Main Memory Block Address) Modulo (Number of lines in Cache)

Division of Physical Address

In direct mapping, the physical address is divided as:

Tag | Line Number | Block/Line Offset

Division of Physical Address in Direct Mapping

Direct Mapped Cache

A direct mapped cache employs the direct cache mapping technique. The following steps explain the working of a direct mapped cache. After the CPU generates a memory request:
 The line number field of the address is used to access the particular line of the cache.
 The tag field of the CPU address is then compared with the tag of that line.
 If the two tags match, a cache hit occurs and the desired word is found in the cache.
 If the two tags do not match, a cache miss occurs.
 In case of a cache miss, the required word has to be brought from the main memory. It is then stored in the cache together with the new tag, replacing the previous one.
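The steps above can be sketched in C as follows. The sizes (16 lines, 16-byte blocks) are hypothetical example parameters; the address is split into tag, line-number and offset fields exactly as in the division shown above:

    #include <stdbool.h>

    #define LINE_BITS    4                  /* 16 cache lines     */
    #define OFFSET_BITS  4                  /* 16 bytes per block */
    #define NUM_LINES    (1u << LINE_BITS)

    struct cache_line { bool valid; unsigned tag; };
    struct cache_line cache[NUM_LINES];

    /* Returns true on a cache hit; on a miss, installs the new tag
       (the actual fetch of the block from main memory is elided). */
    bool access_direct_mapped(unsigned address)
    {
        unsigned line = (address >> OFFSET_BITS) & (NUM_LINES - 1); /* line number field */
        unsigned tag  =  address >> (OFFSET_BITS + LINE_BITS);      /* tag field         */

        if (cache[line].valid && cache[line].tag == tag)
            return true;                    /* hit: word is in the cache */

        cache[line].valid = true;           /* miss: bring block from memory... */
        cache[line].tag   = tag;            /* ...and store it with the new tag */
        return false;
    }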
Implementation

The following diagram shows the implementation of a direct mapped cache (for simplicity, the diagram does not show all the lines of the multiplexers):

Figure 3.15: Direct Mapping

The steps involved are as follows:

Step-01: Each multiplexer reads the line number from the generated physical address using its select lines in parallel. To read a line number of L bits, the number of select lines each multiplexer must have = L.

Step-02: After reading the line number, each multiplexer goes to the corresponding line in the cache memory using its input lines in parallel. Number of input lines each multiplexer must have = Number of lines in the cache memory.
Step-03: Each multiplexer outputs the tag bit it has selected from that line to the comparator using its output line. Number of output lines of each multiplexer = 1.

It is important to understand that a multiplexer can output only a single bit on its output line. So, to output the complete tag to the comparator:

Number of multiplexers required = Number of bits in the tag

Each multiplexer is configured to read the tag bit at a specific position. For example, the 1st multiplexer is configured to output the first bit of the tag, the 2nd multiplexer the second bit, the 3rd multiplexer the third bit, and so on. Thus, each multiplexer selects from the selected line the tag bit for which it has been configured and places it on its output line; the complete tag as a whole is sent to the comparator in parallel.

Step-04: The comparator compares the tag coming from the multiplexers with the tag of the generated address. Only one comparator is required for the comparison, where:

Size of comparator = Number of bits in the tag

If the two tags match, a cache hit occurs; otherwise a cache miss occurs.
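As a small worked instance of these counts (the cache geometry is a hypothetical example, not taken from the text), consider a direct mapped cache with 1024 lines and a 10-bit tag:

    #include <stdio.h>

    int main(void)
    {
        int line_bits = 10;   /* 2^10 = 1024 cache lines, so L = 10 select lines per mux */
        int tag_bits  = 10;   /* assumed tag width for this example                      */

        printf("multiplexers required : %d (one per tag bit)\n", tag_bits);
        printf("size of each mux      : %d x 1\n", 1 << line_bits);
        printf("select lines per mux  : %d\n", line_bits);
        printf("comparators required  : 1, %d bits wide\n", tag_bits);
        return 0;
    }

Running it prints the figures the formulas above give for this geometry: 10 multiplexers of size 1024 × 1, each with 10 select lines, and a single 10-bit comparator.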
Hit Latency

The time taken to find out whether the required word is present in the cache memory or not is called the hit latency. For a direct mapped cache:

Hit latency = Multiplexer latency + Comparator latency

Important Results

Following are a few important results for a direct mapped cache:
 Block j of main memory can map only to line number (j mod number of lines in cache) of the cache.
 Number of multiplexers required = Number of bits in the tag
 Size of each multiplexer = Number of lines in cache x 1
 Number of comparators required = 1
 Size of comparator = Number of bits in the tag
 Hit latency = Multiplexer latency + Comparator latency

Concurrent Processing

The easy availability of computers along with the growth of the Internet has changed the way we store and process data. We are living in a day and age where data is available in abundance. Every day we deal with huge volumes of data that require complex computing, and that too in quick time. Sometimes we need to fetch data from similar or interrelated events that occur simultaneously. This is where we require concurrent processing, which can divide a complex task and process it on multiple systems to produce the output in quick time.

Concurrent processing is essential where the task involves processing a huge bulk of complex data. Examples include accessing large databases, aircraft testing, astronomical calculations, atomic and nuclear physics, biomedical analysis, economic planning, image processing, robotics, weather forecasting, web-based services, etc.
What is Parallelism?

Parallelism is the process of executing several sets of instructions simultaneously. It reduces the total computational time. Parallelism can be implemented by using parallel computers, i.e., computers with many processors. Parallel computers require parallel algorithms, programming languages, compilers and operating systems that support multitasking. In this section, we will discuss only parallel algorithms. Before moving further, let us first discuss algorithms and their types.

What is an Algorithm?

An algorithm is a sequence of instructions followed to solve a problem. While designing an algorithm, we should consider the architecture of the computer on which the algorithm will be executed. As per the architecture, there are two types of computers:
 Sequential Computer
 Parallel Computer

Depending on the architecture of computers, we have two types of algorithms:
 Sequential Algorithm − An algorithm in which consecutive steps of instructions are executed in chronological order to solve a problem.
 Parallel Algorithm − The problem is divided into sub-problems that are executed in parallel to get individual outputs. Later on, these individual outputs are combined together to get the final desired output.

It is not easy to divide a large problem into sub-problems. Sub-problems may have data dependencies among them, so the processors have to communicate with each other to solve the problem. It has been found that the time the processors need for communicating with each other can exceed the actual processing time. So, while designing a parallel algorithm, proper CPU utilization should be considered to get an efficient algorithm. To design an algorithm properly, we must have a clear idea of the basic model of computation in a parallel computer.
Model of Computation

Both sequential and parallel computers operate on a set (stream) of instructions called an algorithm. This set of instructions (the algorithm) instructs the computer about what it has to do at each step. Depending on the instruction stream and data stream, computers can be classified into four categories:
 Single Instruction stream, Single Data stream (SISD) computers
 Single Instruction stream, Multiple Data stream (SIMD) computers
 Multiple Instruction stream, Single Data stream (MISD) computers
 Multiple Instruction stream, Multiple Data stream (MIMD) computers

SISD Computers

SISD computers contain one control unit, one processing unit, and one memory unit. In this type of computer, the processor receives a single stream of instructions from the control unit and operates on a single stream of data from the memory unit. During computation, at each step, the processor receives one instruction from the control unit and operates on a single datum received from the memory unit.

SIMD Computers

SIMD computers contain one control unit, multiple processing units, and shared memory or an interconnection network.
Figure 3.16: SIMD Computers

Here, a single control unit sends instructions to all processing units. During computation, at each step, all the processors receive a single set of instructions from the control unit and operate on different sets of data from the memory unit. Each of the processing units has its own local memory unit to store both data and instructions.

In SIMD computers, processors need to communicate among themselves. This is done through shared memory or through the interconnection network. While some of the processors execute a set of instructions, the remaining processors wait for their next set of instructions. Instructions from the control unit decide which processors will be active (execute instructions) and which will be inactive (wait for the next instruction).

MISD Computers

As the name suggests, MISD computers contain multiple control units, multiple processing units, and one common memory unit.
Figure 3.17: MISD Computers

Here, each processor has its own control unit and they share a common memory unit. All the processors get instructions individually from their own control units, and they operate on a single stream of data as per the instructions they have received from their respective control units. These processors operate simultaneously.

MIMD Computers

MIMD computers have multiple control units, multiple processing units, and a shared memory or interconnection network.
Figure 3.18: MIMD Computers

Here, each processor has its own control unit, local memory unit, and arithmetic and logic unit. The processors receive different sets of instructions from their respective control units and operate on different sets of data.

Note

An MIMD computer that shares a common memory is known as a multiprocessor, while one that uses an interconnection network is known as a multicomputer. Based on the physical distance of the processors, multicomputers are of two types:
(i) Multicomputer − When all the processors are very close to one another (e.g., in the same room).
(ii) Distributed system − When all the processors are far away from one another (e.g., in different cities).

Parallel Algorithm Analysis

Analysis of an algorithm helps us determine whether the algorithm is useful or not. Generally, an algorithm is analyzed based on its execution time (time complexity) and the amount of space it requires (space complexity). Since sophisticated memory devices are available at reasonable cost, storage space is no longer an issue; hence, space complexity is not given much importance.
Parallel algorithms are designed to improve the computation speed of a computer. For analyzing a parallel algorithm, we normally consider the following parameters:
 Time complexity (execution time),
 Total number of processors used, and
 Total cost.

Time Complexity

The main reason behind developing parallel algorithms was to reduce the computation time of an algorithm. Thus, evaluating the execution time of an algorithm is extremely important in analyzing its efficiency. Execution time is measured on the basis of the time taken by the algorithm to solve a problem. The total execution time is calculated from the moment the algorithm starts executing to the moment it stops. If all the processors do not start or end execution at the same time, then the total execution time of the algorithm runs from the moment the first processor starts its execution to the moment the last processor stops its execution.

The time complexity of an algorithm can be classified into three categories:
 Worst-case complexity − When the amount of time required by an algorithm for a given input is maximum.
 Average-case complexity − When the amount of time required by an algorithm for a given input is average.
 Best-case complexity − When the amount of time required by an algorithm for a given input is minimum.

Asymptotic Analysis

The complexity or efficiency of an algorithm is the number of steps executed by the algorithm to obtain the desired output. Asymptotic analysis is done to calculate the complexity of an algorithm in its theoretical analysis. In asymptotic analysis, a large input length is used to calculate the complexity function of the algorithm.

Note − "Asymptotic" describes a condition where a line tends to meet a curve but they never intersect; the line and the curve are then said to be asymptotic to each other.
Asymptotic notation is the easiest way to describe the fastest and slowest possible execution times for an algorithm, using upper and lower bounds on speed. For this, we use the following notations:
 Big O notation
 Omega notation
 Theta notation

Big O Notation

In mathematics, Big O notation is used to represent the asymptotic characteristics of functions. It represents the behavior of a function for large inputs in a simple and accurate way. It is a method of representing the upper bound of an algorithm's execution time, i.e., the longest amount of time the algorithm could take to complete its execution.

f(n) = O(g(n)) if there exist positive constants c and n0 such that f(n) ≤ c × g(n) for all n ≥ n0.

Omega Notation

Omega notation is a method of representing the lower bound of an algorithm's execution time.

f(n) = Ω(g(n)) if there exist positive constants c and n0 such that f(n) ≥ c × g(n) for all n ≥ n0.

Theta Notation

Theta notation is a method of representing both the lower bound and the upper bound of an algorithm's execution time.

f(n) = θ(g(n)) if there exist positive constants c1, c2, and n0 such that c1 × g(n) ≤ f(n) ≤ c2 × g(n) for all n ≥ n0.
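As a short worked example of the Big O definition (the function chosen here is purely illustrative): let f(n) = 3n^2 + 5n and g(n) = n^2. Choose c = 4 and n0 = 5. Since 5n ≤ n^2 for all n ≥ 5, we have

    f(n) = 3n^2 + 5n ≤ 3n^2 + n^2 = 4n^2 = c × g(n) for all n ≥ n0

Hence f(n) = O(n^2): the quadratic term dominates for large inputs, and the linear term is absorbed into the constant c.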
Speedup of an Algorithm

The performance of a parallel algorithm is determined by calculating its speedup. Speedup is defined as the ratio of the worst-case execution time of the fastest known sequential algorithm for a particular problem to the worst-case execution time of the parallel algorithm:

Speedup = (Worst-case execution time of the fastest known sequential algorithm for the problem) / (Worst-case execution time of the parallel algorithm)

Number of Processors Used

The number of processors used is an important factor in analyzing the efficiency of a parallel algorithm, because the cost to buy, maintain, and run the computers must be accounted for. The larger the number of processors an algorithm uses to solve a problem, the more costly the obtained result becomes.

Total Cost

The total cost of a parallel algorithm is the product of its time complexity and the number of processors used:

Total Cost = Time complexity × Number of processors used

Therefore, the efficiency of a parallel algorithm is:

Efficiency = (Worst-case execution time of the sequential algorithm) / (Total cost of the parallel algorithm)
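A brief worked example with illustrative numbers (not taken from the text): suppose the fastest known sequential algorithm takes Ts = 100 time units and a parallel version takes Tp = 25 time units on p = 8 processors. Then:

    Speedup    = Ts / Tp       = 100 / 25  = 4
    Total Cost = p × Tp        = 8 × 25    = 200
    Efficiency = Ts / (p × Tp) = 100 / 200 = 0.5

That is, the parallel algorithm runs four times faster, but each processor is usefully busy only half the time.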
Parallel Algorithm – Models

The model of a parallel algorithm is developed by considering a strategy for dividing the data and the processing method, and by applying a suitable strategy to reduce interactions. In this chapter, we will discuss the following parallel algorithm models:
 Data parallel model
 Task graph model
 Work pool model
 Master slave model
 Producer consumer or pipeline model
 Hybrid model

Data Parallel

In the data parallel model, tasks are assigned to processes and each task performs similar types of operations on different data. Data parallelism is a consequence of a single operation being applied to multiple data items. The data-parallel model can be applied to shared-address spaces and message-passing paradigms. In the data-parallel model, interaction overheads can be reduced by selecting a locality-preserving decomposition, by using optimized collective interaction routines, or by overlapping computation and interaction.

The primary characteristic of data-parallel problems is that the intensity of data parallelism increases with the size of the problem, which in turn makes it possible to use more processes to solve larger problems.

Example − Dense matrix multiplication (a brief sketch follows the figure below).

Figure 3.19: Data Parallel
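A minimal sketch of the dense matrix multiplication example in the data parallel style (the matrix size and the use of OpenMP are assumptions for illustration): every (i, j) entry is computed by the same operation on a different slice of the data, so the rows can be divided among threads or processes.

    #include <omp.h>

    #define N 512   /* hypothetical matrix dimension */

    /* C = A * B: the same inner-product operation is applied to
       different data items, the essence of the data parallel model. */
    void matmul(const double A[N][N], const double B[N][N], double C[N][N])
    {
        #pragma omp parallel for      /* rows are divided among the threads */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                double sum = 0.0;
                for (int k = 0; k < N; k++)
                    sum += A[i][k] * B[k][j];
                C[i][j] = sum;
            }
    }

Compiled with an OpenMP-capable compiler (e.g., gcc -fopenmp), the outer loop's iterations run concurrently; as the problem size N grows, so does the available parallelism, matching the characteristic noted above.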
The Parallel Random Access Machine (PRAM) is the model considered for most parallel algorithms. Here, multiple processors are attached to a single block of memory. A PRAM model contains:
 A set of processors of similar type.
 A common memory unit shared by all the processors. Processors can communicate among themselves through the shared memory only.
 A memory access unit (MAU), which connects the processors with the single shared memory.

Figure 3.20: Parallel Random Access Machines

Here, n processors can perform independent operations on n data items in a particular unit of time. This may result in simultaneous access to the same memory location by different processors. To solve this problem, the following constraints have been enforced on the PRAM model:
 Exclusive Read Exclusive Write (EREW) − No two processors are allowed to read from or write to the same memory location at the same time.
 Exclusive Read Concurrent Write (ERCW) − No two processors are allowed to read from the same memory location at the same time, but they are allowed to write to the same memory location at the same time.
 Concurrent Read Exclusive Write (CREW) − All the processors are allowed to read from the same memory location at the same time, but are not allowed to write to the same memory location at the same time.
 Concurrent Read Concurrent Write (CRCW) − All the processors are allowed to read from or write to the same memory location at the same time.

There are many methods to implement the PRAM model, but the most prominent ones are:
 Shared memory model
 Message passing model
 Data parallel model

Shared Memory Model

Shared memory emphasizes control parallelism more than data parallelism. In the shared memory model, multiple processes execute on different processors independently, but they share a common memory space. Due to any processor activity, if there is any change in any memory location, it is visible to the rest of the processors.

As multiple processors access the same memory location, it may happen that at any particular point of time more than one processor is accessing the same memory location. Suppose one is reading that location and another is writing to it; this may create confusion. To avoid this, some control mechanism, like a lock or semaphore, is implemented to ensure mutual exclusion.

Figure 3.21: Shared Memory Model
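A minimal sketch of such a lock-based control mechanism, using POSIX threads (the shared counter and thread count are hypothetical example choices): the mutex ensures that only one thread updates the shared location at a time.

    #include <pthread.h>
    #include <stdio.h>

    long shared_counter = 0;                  /* shared memory location */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    void *worker(void *arg)
    {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);        /* enter critical section */
            shared_counter++;                 /* safe read-modify-write */
            pthread_mutex_unlock(&lock);      /* leave critical section */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
        printf("counter = %ld\n", shared_counter);   /* always 400000 */
        return 0;
    }

Compiled with -pthread, this always prints 400000; removing the lock/unlock pair makes the increments race and the result unpredictable, which is exactly the "confusion" the text warns about.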
Shared memory programming has been implemented in the following:

Thread libraries − A thread library allows multiple threads of control to run concurrently in the same memory space. It provides an interface that supports multithreading through a library of subroutines, containing subroutines for:
– Creating and destroying threads
– Scheduling execution of threads
– Passing data and messages between threads
– Saving and restoring thread contexts

Examples of thread libraries include Solaris threads for Solaris, POSIX threads as implemented in Linux, Win32 threads available in Windows NT and Windows 2000, and Java threads as part of the standard Java Development Kit (JDK).

Distributed Shared Memory (DSM) Systems − DSM systems create an abstraction of shared memory on a loosely coupled architecture in order to implement shared memory programming without hardware support. They implement standard libraries and use the advanced user-level memory management features present in modern operating systems. Examples include TreadMarks, Munin, IVY, Shasta, Brazos, and Cashmere.

Program Annotation Packages − These are implemented on architectures having uniform memory access characteristics. The most notable example of a program annotation package is OpenMP. OpenMP implements functional parallelism and mainly focuses on the parallelization of loops.

The concept of shared memory provides low-level control of a shared memory system, but it tends to be tedious and error-prone. It is more applicable for system programming than application programming.

Merits of Shared Memory Programming
 A global address space gives a user-friendly programming approach to memory.
 Due to the closeness of memory to the CPU, data sharing among processes is fast and uniform.
 There is no need to specify distinctly the communication of data among processes.
 Process-communication overhead is negligible.
 It is very easy to learn.

Demerits of Shared Memory Programming
 It is not portable.
 Managing data locality is very difficult.

Message Passing Model

Message passing is the most commonly used parallel programming approach in distributed memory systems. Here, the programmer has to determine the parallelism. In this model, all the processors have their own local memory unit and they exchange data through a communication network.

Figure 3.22: Message Passing Model

Processors use message-passing libraries for communication among themselves. Along with the data being sent, the message contains the following components:
 The address of the processor from which the message is being sent;
 The starting address of the memory location of the data in the sending processor;
 The data type of the data being sent;
 The data size of the data being sent;
 The address of the processor to which the message is being sent;
 The starting address of the memory location for the data in the receiving processor.
Processors can communicate with each other by any of the following methods:
 Point-to-Point Communication
 Collective Communication
 Message Passing Interface

Point-to-Point Communication

Point-to-point communication is the simplest form of message passing. Here, a message can be sent from the sending processor to a receiving processor by either of the following transfer modes:
 Synchronous mode − The next message is sent only after receiving a confirmation that the previous message has been delivered, to maintain the sequence of messages.
 Asynchronous mode − To send the next message, receipt of the confirmation of the delivery of the previous message is not required.

Collective Communication

Collective communication involves more than two processors for message passing. The following modes allow collective communication:
 Barrier − Barrier mode is possible if all the processors included in the communication run a particular block (known as the barrier block) for message passing.
 Broadcast − Broadcasting is of two types:
– One-to-all − Here, one processor with a single operation sends the same message to all other processors.
– All-to-all − Here, all processors send messages to all other processors.

Messages broadcast may be of three types:
 Personalized − Unique messages are sent to all other destination processors.
 Non-personalized − All the destination processors receive the same message.
 Reduction − In reduction broadcasting, one processor of the group collects all the messages from all the other processors in the group and combines them into a single message which all the other processors in the group can access.

Merits of Message Passing
 Provides low-level control of parallelism;
 It is portable;
 Less error prone;
 Less overhead in parallel synchronization and data distribution.

Demerits of Message Passing
 As compared to parallel shared-memory code, message-passing code generally needs more software overhead.

Message Passing Libraries

There are many message-passing libraries. Here, we will discuss two of the most-used ones:
 Message Passing Interface (MPI)
 Parallel Virtual Machine (PVM)

Message Passing Interface (MPI)

MPI is a universal standard for providing communication among all the concurrent processes in a distributed memory system. Most of the commonly used parallel computing platforms provide at least one implementation of the message passing interface. It has been implemented as a collection of predefined functions called a library, and it can be called from languages such as C, C++, Fortran, etc. MPI implementations are both fast and portable as compared to other message passing libraries.
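A minimal MPI sketch in C illustrating both a point-to-point transfer and a reduction (the message value is arbitrary; run with at least two processes, e.g., mpirun -np 4 ./a.out):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Point-to-point: rank 0 sends one integer to rank 1. */
        if (rank == 0) {
            int msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int msg;
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", msg);
        }

        /* Reduction: every rank contributes its rank number; rank 0 gets the sum. */
        int sum = 0;
        MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of ranks = %d\n", sum);

        MPI_Finalize();
        return 0;
    }

The MPI_Send/MPI_Recv pair corresponds to the point-to-point mode described earlier, and MPI_Reduce is a library implementation of the reduction broadcast described above.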
Merits of the Message Passing Interface
 Runs on both shared memory and distributed memory architectures;
 Each processor has its own local variables;
 As compared to large shared memory computers, distributed memory computers are less expensive.

Demerits of the Message Passing Interface
 More programming changes are required for a parallel algorithm;
 It can sometimes be difficult to debug; and
 Performance is limited by the communication network between the nodes.

Parallel Virtual Machine (PVM)

PVM is a portable message passing system, designed to connect separate heterogeneous host machines to form a single virtual machine: a single manageable parallel computing resource. Large computational problems like superconductivity studies, molecular dynamics simulations, and matrix algorithms can be solved more cost-effectively by using the memory and the aggregate power of many computers. PVM manages all message routing, data conversion, and task scheduling in a network of incompatible computer architectures.

Features of PVM
 Very easy to install and configure;
 Multiple users can use PVM at the same time;
 One user can execute multiple applications;
 It is a small package;
 Supports C, C++, and Fortran;
 For a given run of a PVM program, users can select the group of machines;
 It is a message-passing model with process-based computation;
 Supports heterogeneous architectures.
3.8 Summary

Parallel processing is a method in computing of running two or more processors (CPUs) to handle separate parts of an overall task. Breaking up different parts of a task among multiple processors helps reduce the amount of time needed to run a program. Any system that has more than one CPU can perform parallel processing, as can the multi-core processors commonly found in computers today.

Multi-core processors are IC chips that contain two or more processors, for better performance, reduced power consumption and more efficient processing of multiple tasks. These multi-core set-ups are similar to having multiple, separate processors installed in the same computer. Most computers have between two and four cores, though this can increase up to 12 cores. Parallel processing is commonly used to perform complex tasks and computations. Data scientists commonly make use of parallel processing for compute- and data-intensive tasks.

How Parallel Processing Works

Typically a computer scientist divides a complex task into multiple parts with a software tool and assigns each part to a processor; each processor then solves its part, and the data is reassembled by a software tool to read the solution or execute the task.

Typically each processor operates normally and performs operations in parallel as instructed, pulling data from the computer's memory. Processors also rely on software to communicate with each other so they can stay in sync concerning changes in data values. Assuming all the processors remain in sync with one another, at the end of a task the software fits all the data pieces together. Computers without multiple processors can still be used in parallel processing if they are networked together to form a cluster.

Types of Parallel Processing

There are multiple types of parallel processing; two of the most commonly used types are SIMD and MIMD. SIMD, or single instruction multiple data, is a form of parallel processing in which a computer has two or more processors that follow the same instruction set while each processor handles different data. SIMD is typically used to analyze large data sets that are based on the same specified benchmarks.
MIMD, or multiple instruction multiple data, is another common form of parallel processing, in which a computer has two or more processors and gets data from separate data streams. Another, less used, type of parallel processing is MISD, or multiple instruction single data, where each processor uses a different algorithm with the same input data.

Difference between Serial and Parallel Processing

Where parallel processing can complete multiple tasks using two or more processors, serial processing (also called sequential processing) completes only one task at a time using one processor. If a computer needs to complete multiple assigned tasks, it completes one task at a time. Likewise, if a computer using serial processing needs to complete a complex task, it takes longer compared to a parallel processor.

An algorithm is a sequence of steps that takes inputs from the user and, after some computation, produces an output. A parallel algorithm is an algorithm that can execute several instructions simultaneously on different processing devices and then combine all the individual outputs to produce the final result.

Parallelism is the process of executing several sets of instructions simultaneously. It reduces the total computational time. Parallelism can be implemented by using parallel computers, i.e., computers with many processors. Parallel computers require parallel algorithms, programming languages, compilers and operating systems that support multitasking.

What is an Algorithm?

An algorithm is a sequence of instructions followed to solve a problem. While designing an algorithm, we should consider the architecture of the computer on which the algorithm will be executed. As per the architecture, there are two types of computers: Sequential Computer and Parallel Computer. Depending on the architecture of computers, we have two types of algorithms:

Sequential Algorithm − An algorithm in which consecutive steps of instructions are executed in chronological order to solve a problem.

Parallel Algorithm − The problem is divided into sub-problems that are executed in parallel to get individual outputs. Later on, these individual outputs are combined together to get the final desired output.
It is not easy to divide a large problem into sub-problems. Sub-problems may have data dependencies among them, so the processors have to communicate with each other to solve the problem. It has been found that the time the processors need for communicating with each other can exceed the actual processing time. So, while designing a parallel algorithm, proper CPU utilization should be considered to get an efficient algorithm. To design an algorithm properly, we must have a clear idea of the basic model of computation in a parallel computer.

3.9 Key Words/Abbreviations

 Single Instruction stream, Single Data stream (SISD) computers.
 Single Instruction stream, Multiple Data stream (SIMD) computers.
 Multiple Instruction stream, Single Data stream (MISD) computers.
 Multiple Instruction stream, Multiple Data stream (MIMD) computers.

3.10 Learning Activity

1. What is parallel processing? Give an example.
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------

2. What is parallel processing in an OS?
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------

3. What are serial and parallel processing?
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
3.11 Unit End Questions (MCQ and Descriptive)

A. Descriptive Type Questions

1. What is a parallel algorithm?
2. What is concurrent processing?
3. What is parallelism?
4. What is an algorithm?
5. What is a model of computation?
6. What is data-parallel computation?
7. What is task-parallel computation?
8. What is task latency?
9. What is speedup?
10. What is parallel efficiency?
11. What is SIMD?

B. Multiple Choice/Objective Type Questions

1. Which of the following architectures is/are not suitable for realizing SIMD?
(a) Vector Processor (b) Array Processor
(c) Von Neumann (d) All of the above

2. The CPU checks for an interrupt signal during the
(a) start of the last machine cycle (b) last T-state of the instruction cycle
(c) first T-state of the interrupt cycle (d) fetch cycle

3. In a vectored interrupt,
(a) the branch address is assigned to a fixed location in memory
(b) the interrupting source supplies the branch information to the processor through an interrupt vector
(c) the branch address is obtained from a register in the processor
(d) none of the above

4. An instruction pipeline can be implemented by means of a
(a) LIFO buffer (b) FIFO buffer
(c) stack (d) none of the above

5. A microprogram sequencer
(a) generates the address of the next microinstruction to be executed.
(b) generates the control signals to execute a microinstruction.
(c) sequentially averages all microinstructions in the control memory.
(d) enables the efficient handling of a microprogram subroutine.

6. The status bit is also called the
(a) binary bit (b) flag bit
(c) signed bit (d) unsigned bit

7. If the value V(x) of the target operand is contained in the address field itself, the addressing mode is
(a) immediate (b) direct
(c) indirect (d) implied

8. The instructions which copy information from one location to another, either in the processor's internal register set or in the external main memory, are called
(a) data transfer instructions (b) program control instructions
(c) input-output instructions (d) logical instructions

9. The mode in which the content of the program counter is added to the address part of the instruction in order to obtain the effective address is called
(a) relative addressing mode (b) index addressing mode
(c) register mode (d) implied mode
10. Which of the following interrupts is non-maskable?
(a) INTR (b) RST 7.5
(c) RST 6.5 (d) TRAP

11. On-chip memory which is local to every multithreaded Single Instruction Multiple Data (SIMD) processor is called
(a) local memory (b) global memory
(c) flash memory (d) stack

12. A processor that is assigned a thread block and executes its code is usually called a
(a) multithreaded DIMS processor (b) multithreaded SIMD processor
(c) multithreaded queue (d) multithreaded stack

13. A code, known as a Grid, which runs on a GPU consists of a set of
(a) 32 threads (b) unit blocks
(c) 32 blocks (d) thread blocks

14. A machine object that the hardware creates, manages, schedules, and executes is a thread of
(a) DIMS instructions (b) DMM instructions
(c) SIMD instructions (d) SIM instructions

Answers
1. (c), 2. (b), 3. (b), 4. (b), 5. (a), 6. (b), 7. (b), 8. (a), 9. (a), 10. (d), 11. (a), 12. (b), 13. (d), 14. (c).

3.12 References

References for this unit have been given at the end of the book.
UNIT 4 INTRODUCTION TO DISTRIBUTED SYSTEMS

Structure:
4.0 Learning Objectives
4.1 Introduction
4.2 Definition, Goals and Issues
4.3 Distributed System Models
4.4 Hardware Concepts in Distributed Operating Systems
4.5 The Software Concept of Distributed Systems
4.6 Middleware
4.7 Models of Middleware
4.8 Services Offered by Middleware
4.9 Client-Server Model
4.10 Summary
4.11 Key Words/Abbreviations
4.12 Learning Activity
4.13 Unit End Questions (MCQ and Descriptive)
4.14 References
4.0 Learning Objectives

After studying this unit, you will be able to:
 Define distributed systems
 Describe types of distributed systems
 Elaborate on distributed system models
 Discuss the issues and goals of distributed systems
 Discuss the related hardware and software concepts
 Explain models of middleware
 Describe the client-server model

4.1 Introduction

A distributed system is a collection of autonomous computers that are connected by a communication network and communicate with each other by passing messages. The different processors have their own local memory and use a distribution middleware. Together they share different resources and capabilities to provide users with a single, integrated, coherent network.

Distributed computing is the field of computer science that studies distributed systems, and a computer program that runs in a distributed system is called a distributed program. A distributed system requires concurrent components, a communication network and a synchronization mechanism. A distributed system allows resource sharing, including software, by systems connected to the network.

Examples of distributed systems/applications of distributed computing:
 Intranets, the Internet, the WWW, email.
 Telecommunication networks: telephone networks and cellular networks.
 Networks of branch office computers: information systems to handle automatic processing of orders.
 Real-time process control: aircraft control systems.
 Electronic banking, airline reservation systems, sensor networks, and mobile and pervasive computing systems.

Benefits and Challenges of Distributed Systems

There are three reasons that teams generally decide to implement distributed systems:
 Horizontal Scalability: Since computing happens independently on each node, it is easy and generally inexpensive to add additional nodes and functionality as necessary.
 Reliability: Most distributed systems are fault-tolerant, as they can be made up of hundreds of nodes that work together. The system generally does not experience any disruption if a single machine fails.
 Performance: Distributed systems are extremely efficient because workloads can be broken up and sent to multiple machines.

However, distributed systems are not without challenges. The complex architectural design, construction, and debugging processes required to create an effective distributed system can be overwhelming. Three more challenges you may encounter include:
 Scheduling: A distributed system has to decide which jobs need to run, when they should run, and where they should run. Schedulers ultimately have limitations, leading to underutilized hardware and unpredictable runtimes.
 Latency: The more widely your system is distributed, the more latency you can experience with communications. This often leads to teams making tradeoffs between availability, consistency, and latency.
 Observability: Gathering, processing, presenting, and monitoring hardware usage metrics for large clusters is a significant challenge.

How a Distributed System Works

Hardware and software architectures are used to maintain a distributed system. Everything must be interconnected: CPUs via the network and processes via the communication system.
4.2 Definition, Goals and Issues

Design Goals

Distributed systems have four main goals:

1. Resource sharing. Whether storage facilities, data files, services, or networks, you may want to share these resources among applications. Why? It is simple economics: it is clearly cheaper to have one high-end reliable storage facility shared among multiple applications than to buy and maintain storage for each separately.

2. Abstraction. To hide the fact that processes and resources are distributed across multiple computers, possibly even geographically dispersed. In other words, as just discussed above, processes and resources are abstracted from the user.

3. Openness. An open distributed system is essentially a system built from components that can be easily used by, or integrated into, other systems. Adhering to standardized interface rules, any arbitrary process (e.g., from a different manufacturer) with that interface can talk to a process with the same interface. Interface specifications should be complete (everything needed for an implementation is specified) and neutral (they do not prescribe what an implementation should look like). Completeness and neutrality are key for interoperability and portability. Interoperability means that two implementations from different manufacturers can work together. Portability characterizes the extent to which an application developed for system A will work on system B without modification. Additionally, distributed systems should be extensible, allowing teams to easily add new components or replace existing ones without affecting the components staying in place. Flexibility is achieved by organizing the system as a collection of relatively small and easily replaceable or adaptable components.

4. Scalability. Scalability is needed when there is a spike of users that requires more resources. A good example is the increase in viewership Netflix experiences every Friday evening. Scaling out means dynamically adding more resources (e.g., increasing network capacity to allow for more video streaming) and scaling back once consumption has normalized.
Tanenbaum defines a distributed system as a "collection of independent computers that appear to the users of the system as a single computer". There are two essential points in this definition:
 Independent: Architecturally, the machines are capable of operating independently.
 Single computer: The software enables this set of connected machines to appear as a single computer to the users of the system. This is known as the single system image and is a major goal in designing distributed systems that are easy to maintain and operate.

The figure below shows a simple distributed system with a number of applications running on different operating systems, where the middleware takes responsibility for the heterogeneity of the communications.

Figure 4.1: Distributed System

A distributed system contains multiple nodes that are physically separate but linked together by a network. All the nodes in this system communicate with each other and handle processes in tandem. Each of these nodes contains a small part of the distributed operating system software.
A diagram to better explain the distributed system is:

Figure 4.2: Distributed Operating System

The main reasons why we need a distributed system are the following:
 Economics: a collection of microprocessors offers a better price/performance ratio than mainframes.
 Speed: a distributed system may have more total computing power than a mainframe, with enhanced performance through load distribution.
 Inherent distribution: some applications are inherently distributed, e.g., a supermarket chain.
 Reliability: if one machine crashes, the system as a whole can still survive, giving higher availability and improved reliability.
 Incremental growth: computing power can be added in small increments, giving modular expandability.
 Data sharing: allow many users access to a common database.
 Resource sharing: share expensive peripherals like color printers.
 Communication: enhance human-to-human communication, e.g., email, chat.
 Flexibility: spread the workload over the available machines.
 Mobility: access the system, data or resources from any place or device.

To be truly reliable, a distributed system must meet the following goals:
 Fault-Tolerant: It can recover from component failures without performing incorrect actions.
 Highly Available: It can restore operations, permitting it to resume providing services even when some components have failed.
 Recoverable: Failed components can restart themselves and rejoin the system after the cause of failure has been repaired.
 Consistent: The system can coordinate actions by multiple components, often in the presence of concurrency and failure. This underlies the ability of a distributed system to act like a non-distributed system.
 Scalable: It can operate correctly even as some aspect of the system is scaled to a larger size.
 Predictable Performance: The ability to provide desired responsiveness in a timely manner.
 Secure: The system authenticates access to data and services.

Our definition of distributed systems raises the following significant issues:
 Concurrency: Concurrent program execution is the norm. I can do my work on my computer while you do your work on yours, sharing resources such as web pages or files when necessary. The capacity of the system to handle shared resources can be increased by adding more resources (for example, computers) to the network.
 No global clock: When programs need to cooperate, they coordinate their actions by exchanging messages. Close coordination often depends on a shared idea of the time at which the programs' actions occur. But it turns out that there are limits to the accuracy with which the computers in a network can synchronize their clocks: there is no single global notion of the correct time. This is a direct consequence of the fact that the only communication is by sending messages through a network.
 Independent failures: All computer systems can fail, and it is the responsibility of system designers to plan for the consequences of possible failures.

Common examples of distributed systems include:
 Network file systems, network printers, etc.
 ATMs (cash machines)
 Distributed databases
 Network computing
 Global positioning systems
 Retail point-of-sale terminals
 Air-traffic control
 Enterprise computing
 WWW

Types of Distributed Systems

The nodes in distributed systems can be arranged in the form of client/server systems or peer to peer systems. Details about these are as follows:

Client/Server Systems

In client/server systems, the client requests a resource and the server provides that resource. A server may serve multiple clients at the same time, while a client is in contact with only one server. Both the client and the server usually communicate via a computer network, and so they are a part of distributed systems. (A minimal socket-level sketch of this request-reply pattern is given after the next paragraph.)

Peer to Peer Systems

Peer to peer systems contain nodes that are equal participants in data sharing. All tasks are equally divided between all the nodes. The nodes interact with each other as required and share resources. This is done with the help of a network.
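To make the client/server request-reply pattern concrete, here is a minimal sketch of a TCP server in C using the POSIX sockets API (the port number and the one-line reply are arbitrary illustration choices, and error handling is elided for brevity):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);     /* create listening socket */

        struct sockaddr_in addr = {0};
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(9000);            /* arbitrary example port */

        bind(srv, (struct sockaddr *)&addr, sizeof addr);
        listen(srv, 8);

        for (;;) {                                     /* serve one client at a time */
            int client = accept(srv, NULL, NULL);      /* wait for a client request  */
            char req[256];
            read(client, req, sizeof req);             /* 1. receive the request     */
            const char *reply = "resource contents\n";
            write(client, reply, strlen(reply));       /* 2. carry it out and reply  */
            close(client);
        }
    }

Once the server is running, any client, even a generic tool such as nc localhost 9000, can connect, send a request, and read the reply, which is the request-reply exchange the client/server description above refers to.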
Advantages of Distributed Systems

Some advantages of distributed systems are as follows:
 All the nodes in the distributed system are connected to each other, so nodes can easily share data with other nodes.
 More nodes can easily be added to the distributed system, i.e., it can be scaled as required.
 Failure of one node does not lead to the failure of the entire distributed system; the other nodes can still communicate with each other.
 Resources like printers can be shared by multiple nodes rather than being restricted to just one.

Disadvantages of Distributed Systems

Some disadvantages of distributed systems are as follows:
 It is difficult to provide adequate security in distributed systems because the nodes as well as the connections need to be secured.
 Some messages and data can be lost in the network while moving from one node to another.
 The database connected to a distributed system is quite complicated and difficult to handle compared to a single-user system.
 Overloading may occur in the network if all the nodes of the distributed system try to send data at once.

4.3 Distributed System Models

In a distributed architecture, components are hosted on different platforms, and several components can cooperate with one another over a communication network in order to achieve a specific objective or goal. In this architecture, information processing is not confined to a single machine; rather, it is distributed over several independent computers.
A distributed system can be demonstrated by the client-server architecture, which forms the base for multi-tier architectures; alternatives are broker architectures such as CORBA, and the Service-Oriented Architecture (SOA). There are several technology frameworks that support distributed architectures, including .NET, J2EE, CORBA, .NET Web services, AXIS Java Web services, and Globus Grid services.

Middleware is an infrastructure that appropriately supports the development and execution of distributed applications. It provides a buffer between the applications and the network. It sits in the middle of the system and manages or supports the different components of a distributed system. Examples are transaction processing monitors, data converters and communication controllers.

Figure 4.3: Middleware as an Infrastructure for Distributed System

The basis of a distributed architecture is its transparency, reliability, and availability.

Advantages
 Resource sharing − Sharing of hardware and software resources.
 Openness − Flexibility of using hardware and software of different vendors.
 Concurrency − Concurrent processing to enhance performance.
 Scalability − Increased throughput by adding new resources.
 Fault tolerance − The ability to continue in operation after a fault has occurred.

Disadvantages
 Complexity − They are more complex than centralized systems.
 Security − More susceptible to external attack.
 Manageability − More effort required for system management.
 Unpredictability − Unpredictable responses depending on the system organization and network load.

Table 4.1: Centralized System vs. Distributed System

Criteria      | Centralized System | Distributed System
Economics     | Low                | High
Availability  | Low                | High
Complexity    | Low                | High
Consistency   | Simple             | High
Scalability   | Poor               | Good
Technology    | Homogeneous        | Heterogeneous
Security      | High               | Low

Client-Server Architecture

The client-server architecture is the most common distributed system architecture. It decomposes the system into two major subsystems or logical processes:
 Client − This is the first process, which issues a request to the second process, i.e., the server.
 Server − This is the second process, which receives the request, carries it out, and sends a reply to the client.

In this architecture, the application is modelled as a set of services that are provided by servers and a set of clients that use these services. The servers need not know about clients, but
the clients must know the identity of the servers, and the mapping of processors to processes is not necessarily 1:1.

Figure 4.4: 2-tier Client-Server Architecture

Client-server architecture can be classified into two models based on the functionality of the client.

Thin-client Model

In the thin-client model, all the application processing and data management is carried out by the server. The client is simply responsible for running the presentation software. This model is used when legacy systems are migrated to client-server architectures, in which the legacy system acts as a server in its own right with a graphical interface implemented on the client. A major disadvantage is that it places a heavy processing load on both the server and the network.

Thick/Fat-client Model

In the thick-client model, the server is only in charge of data management. The software on the client implements the application logic and the interactions with the system user.
This model is most appropriate for new client-server systems where the capabilities of the client system are known in advance. It is more complex than the thin-client model, especially for management: new versions of the application have to be installed on all clients.
Figure 4.5: Thick/Fat-Client Model
Advantages
Separation of responsibilities, such as user interface presentation and business logic processing.
Reusability of server components and potential for concurrency.
Simplifies the design and development of distributed applications.
Makes it easy to migrate or integrate existing applications into a distributed environment.
Makes effective use of resources when a large number of clients are accessing a high-performance server.
Disadvantages
Lack of a heterogeneous infrastructure to deal with requirement changes.
Security complications.
Limited server availability and reliability.
Limited testability and scalability.
Fat clients combine presentation and business logic on the client.
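As a minimal sketch of the request/reply interaction described above (assuming plain TCP sockets; the class name and port are invented, not taken from the text), a server process might look like this:

// Minimal request/reply server sketch using plain TCP sockets.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(5000)) {   // assumed port
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    // The server receives the request, carries it out,
                    // and sends a reply to the client.
                    String request = in.readLine();
                    out.println("reply: " + request);
                }
            }
        }
    }
}

A client would connect with new Socket("server-host", 5000), write one line, and read the reply. Note how this mirrors the text: the client must know the server's identity (host and port), while the server needs no prior knowledge of its clients.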
Multi-Tier Architecture (n-Tier Architecture)
Multi-tier architecture is a client-server architecture in which functions such as presentation, application processing, and data management are physically separated. By separating an application into tiers, developers gain the option of changing or extending a specific layer instead of reworking the entire application. It provides a model by which developers can create flexible and reusable applications.
Figure 4.6: Multi-Tier Architecture (n-Tier Architecture)
The most common form of multi-tier architecture is the three-tier architecture. A three-tier architecture is typically composed of a presentation tier, an application tier, and a data storage tier, each of which may execute on a separate processor.
Presentation Tier
The presentation tier is the topmost level of the application, which users access directly, such as a web page or an operating system GUI (graphical user interface). The primary function of this layer is to translate tasks and results into something the user can understand. It communicates with the other tiers, delivering results to the browser/client and passing requests on to the rest of the network.
Application Tier (Business Logic, Logic Tier, or Middle Tier)
The application tier coordinates the application: it processes commands, makes logical decisions and evaluations, and performs calculations. It controls the application's functionality by
performing detailed processing. It also moves and processes data between the two surrounding tiers.
Data Tier
In this tier, information is stored in and retrieved from the database or file system. The information is then passed back to the application tier for processing, and eventually back to the user. This tier includes the data persistence mechanisms (database servers, file shares, etc.) and provides an API (application programming interface) to the application tier with methods for managing the stored data.
Figure 4.7: 3-Tier Architecture
Advantages
Better performance than a thin-client approach, and simpler management than a thick-client approach.
Enhanced reusability and scalability − as demand increases, extra servers can be added.
Multi-threading support and reduced network traffic.
Maintainability and flexibility.
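The separation of tiers can be made concrete with a short sketch. The following is illustrative only (all class and method names are hypothetical): each tier is a separate Java type, so one tier can be swapped, for example replacing the in-memory data tier with a real database server, without reworking the others:

// Illustrative three-tier separation; names are invented for this sketch.
import java.util.List;

interface DataTier {                           // data storage tier contract
    List<String> loadOrders(String customer);
}

class InMemoryDataTier implements DataTier {   // stand-in for a database server
    public List<String> loadOrders(String customer) {
        return List.of(customer + ":order-1", customer + ":order-2");
    }
}

class ApplicationTier {                        // business logic / middle tier
    private final DataTier data;
    ApplicationTier(DataTier data) { this.data = data; }

    // Processes a command and makes a decision using the data tier.
    int countOrders(String customer) {
        return data.loadOrders(customer).size();
    }
}

class PresentationTier {                       // topmost tier the user sees
    public static void main(String[] args) {
        ApplicationTier logic = new ApplicationTier(new InMemoryDataTier());
        // Translates the result into something the user can understand.
        System.out.println("You have " + logic.countOrders("alice") + " orders.");
    }
}

Because the presentation tier depends only on ApplicationTier, and ApplicationTier only on the DataTier interface, each layer can evolve or be redeployed independently, which is the flexibility the text attributes to n-tier designs.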
Disadvantages
Unsatisfactory testability due to the lack of testing tools.
More critical server reliability and availability.
Broker Architectural Style
The broker architectural style is a middleware architecture used in distributed computing to coordinate and enable communication between registered servers and clients. Here, object communication takes place through a middleware system called an object request broker (a software bus). The client and the server do not interact with each other directly; instead, each has a direct connection to its own proxy, which communicates with the mediating broker. A server provides services by registering and publishing its interfaces with the broker, and clients request services from the broker statically or dynamically by look-up. CORBA (Common Object Request Broker Architecture) is a good implementation example of the broker architecture.
Components of Broker Architectural Style
The components of the broker architectural style are discussed under the following heads.
Broker
The broker is responsible for coordinating communication, such as forwarding and dispatching results and exceptions. It can be either an invocation-oriented service or a document- or message-oriented broker to which clients send messages. It is responsible for brokering service requests, locating a proper server, transmitting requests, and sending responses back to clients. It retains the servers' registration information, including their functionality and services as well as their location. It provides APIs for clients to issue requests, for servers to respond, for registering or unregistering server components, for transferring messages, and for locating servers.
Stub
Stubs are generated at static compile time and then deployed to the client side, where they are used as proxies for the client. The client-side proxy acts as a mediator between the client and the broker and provides additional transparency: to the client, a remote object appears like a local one. The proxy hides the IPC (inter-process communication) at the protocol level and performs marshalling of parameter values and unmarshalling of results from the server.
Skeleton
The skeleton is generated by compiling the service interface and is then deployed to the server side, where it is used as a proxy for the server. The server-side proxy encapsulates low-level, system-specific networking functions and provides high-level APIs to mediate between the server and the broker. It receives requests, unpacks them, unmarshals the method arguments, calls the suitable service, and marshals the result before sending it back to the client.
Bridge
A bridge can connect two different networks based on different communication protocols. It mediates between different brokers, including DCOM, .NET Remoting, and Java CORBA brokers. Bridges are optional components that hide the implementation details when two brokers interoperate, taking requests and parameters in one format and translating them into another.
Figure 4.8: Components of Broker Model
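The interaction among broker, stub, and server-side service can be sketched compactly. The following toy example (all names are invented, and marshalling and networking are omitted) shows a broker that keeps registration information and a stub that makes a remote-looking call appear local:

// Compact broker-pattern sketch; real brokers add IPC and marshalling.
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Function;

class Broker {
    // Registration information: service name -> server implementation.
    private final Map<String, Function<String, String>> registry = new HashMap<>();

    void register(String service, Function<String, String> server) {
        registry.put(service, server);
    }

    // Locates a proper server, forwards the request, returns the response.
    String dispatch(String service, String request) {
        Function<String, String> server = registry.get(service);
        if (server == null) throw new NoSuchElementException(service);
        return server.apply(request);
    }
}

class Stub {
    private final Broker broker;
    Stub(Broker broker) { this.broker = broker; }

    // The client calls what looks like a local method; the stub hides
    // the communication with the broker (no real IPC in this toy).
    String echo(String text) { return broker.dispatch("echo", text); }
}

class BrokerDemo {
    public static void main(String[] args) {
        Broker broker = new Broker();
        broker.register("echo", req -> "echo: " + req);  // skeleton-side service
        System.out.println(new Stub(broker).echo("hello"));
    }
}

The client never sees the Broker's registry or the server function; it only calls the stub, which is precisely the transparency the stub component is meant to provide.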
Broker Implementation in CORBA
CORBA is an international standard for an object request broker − middleware that manages communication among distributed objects − defined by the OMG (Object Management Group).
Figure 4.9: CORBA Architecture
Service-Oriented Architecture (SOA)
A service is a component of business functionality that is well-defined, self-contained, independent, published, and available for use via a standard programming interface. The connections between services are conducted by common, universal message-oriented protocols, such as the SOAP Web service protocol, which can deliver requests and responses between loosely coupled services.
Service-oriented architecture is a client-server design supporting a business-driven IT approach, in which an application consists of software services and software service consumers (also known as clients or service requesters).
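As a sketch of the SOA idea rather than of any particular protocol (the SOAP plumbing is omitted and all names here are hypothetical), the following shows a self-contained service published behind a standard interface, and a consumer that depends only on that interface:

// SOA idea in miniature: consumer depends only on the published contract.
interface CurrencyService {                       // the published interface
    double convert(String from, String to, double amount);
}

class FixedRateCurrencyService implements CurrencyService {
    public double convert(String from, String to, double amount) {
        return amount * 1.1;   // dummy rate; a real service would look this up
    }
}

class ServiceConsumer {
    public static void main(String[] args) {
        // The consumer could equally be handed a remote proxy (e.g., a SOAP
        // or REST client) here; only the interface contract matters to it.
        CurrencyService service = new FixedRateCurrencyService();
        System.out.println(service.convert("USD", "EUR", 100.0));
    }
}

Because the consumer is written against CurrencyService alone, the provider can be swapped for a remote implementation reached over a message-oriented protocol without changing the consumer, which is the loose coupling SOA aims for.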