5. Which of the following algorithms is NOT a mutual exclusion algorithm?
a. Lins algorithm
b. Chang and Roberts' algorithm
c. Maekawa's algorithm
d. Rinsel algorithm

Answers
1-a, 2-b, 3-d, 4-a, 5-b

12.9 REFERENCES

Reference books
George Coulouris, Jean Dollimore, Tim Kindberg, "Distributed Systems: Concepts and Design" (4th Edition), Addison Wesley/Pearson Education.
Pradeep K Sinha, "Distributed Operating Systems: Concepts and Design", IEEE Computer Society Press.

Text Book References
M.R. Bhujade, "Parallel Computing", 2nd edition, New Age International Publishers, 2009.
Andrew S. Tanenbaum and Maarten Van Steen, "Distributed Systems: Principles and Paradigms", 2nd edition, Pearson Education, Inc., 2007, ISBN: 0-13-239227-5.

Websites:
https://www.techopedia.com/definition/10062/wireless-communications
https://www.computernetworkingnotes.com/
https://www.guru99.com
UNIT 13 - CONSISTENCY AND REPLICATION

STRUCTURE
13.0 Learning Objectives
13.1 Introduction
13.2 Reasons for Replication
13.3 Replication as Scaling Technique
13.4 Data Centric Consistency Models
13.5 Client-centric Consistency Models
13.6 Replica Management
13.7 Summary
13.8 Keywords
13.9 Learning Activity
13.10 Unit End Questions
13.11 References

13.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
Explain the reasons for replication
Outline the objectives of consistency models
Describe replica management
Differentiate pull and push protocols

13.1 INTRODUCTION
Data replication is a key topic in distributed systems. In most cases, data is replicated to improve reliability or performance. Keeping copies consistent is one of the most difficult challenges: informally, when one copy is modified, the other copies must be updated as well; otherwise the replicas will no longer be identical. In this chapter, we look at exactly what consistency of replicated data means, as well as the various ways of achieving it.

We begin with a general overview of why replication is important and how it relates to scalability, and then move on to what consistency actually means. A common assumption in consistency models is that several processes will access shared data at the same time.
In such settings, consistency can be defined in terms of what processes may expect when reading and updating the shared data, knowing that others are doing so as well. Consistency models for shared data are often hard to implement efficiently in large-scale distributed systems. Furthermore, in many cases simpler models, which are also easier to implement, can be used. One specific class, client-centric consistency models, concentrates on consistency from the perspective of a single (possibly mobile) client; these models are examined in a separate section.

13.2 REASONS FOR REPLICATION
Data is replicated for two reasons: reliability and performance. First, data is replicated to increase the reliability of a system. If a file system has been replicated, it may be possible to continue working after one replica crashes by simply switching to one of the other replicas. Keeping multiple copies also gives better protection against corrupted data. Consider a file that is replicated three times, with every write operation performed on each copy: we can then safeguard against a single failing write operation by taking the value returned by at least two copies as the correct one.

The other reason for replicating data is performance. Replication for performance is important when a distributed system needs to scale in terms of numbers or in terms of geographical area. Scaling in numbers occurs, for example, when an increasing number of processes needs to access data managed by a single server; in that case, performance can be improved by replicating the server and subsequently dividing the work. Scaling with respect to the size of a geographical area may also require replication. The basic idea is that placing a copy of the data close to the process using it reduces the time needed to access the data, so the perceived performance of that process improves. This example also illustrates how difficult it can be to assess the performance benefits of replication: although a client process may perceive better performance, more network bandwidth may now be consumed keeping all the replicas up to date.

If replication helps to improve reliability and performance, who could be against it? Unfortunately, there is a price to pay. The problem with replication is that having multiple copies may lead to consistency problems: whenever a copy is modified, it becomes different from the rest. Consequently, modifications have to be carried out on all copies to maintain consistency, and exactly when and where those modifications need to be carried out determines the cost of replication.

To understand the issue better, consider improving access times to Web pages. If no special measures are taken, fetching a page from a remote Web server may take several seconds. To improve performance, Web browsers often store a copy of a previously fetched page locally (i.e., they cache the page). If the user requests the page again, the browser automatically returns the local copy; as far as the user is concerned, access time is excellent.
However, a user who insists on always having the most recent version of a page may be in for some bad luck: if the page has been modified in the meantime, the modifications will not have been propagated to the cached copies, which are then out of date. One way to avoid returning a stale version is to forbid the browser from keeping local copies in the first place, effectively leaving replication entirely to the server. However, this may still lead to slow access times if no replica is placed near the user. Another solution is to let the Web server invalidate or update each cached copy, but this requires the server to keep track of all caches and send messages to them, which may in turn degrade the overall performance of the server. We return to the performance versus scalability trade-off below.

13.3 REPLICATION AS SCALING TECHNIQUE
Replication and caching for performance are widely applied as scaling techniques. Scalability problems often appear as performance problems, and placing copies of data close to the processes that use them can improve performance through a reduction of access time and so alleviate scalability problems. A possible trade-off that has to be considered, however, is that keeping copies up to date may require more network bandwidth. Consider a process P that accesses a local replica N times per second, while the replica itself is updated M times per second, and assume that an update completely refreshes the previous version of the local replica. If N << M, that is, if the access-to-update ratio is very low, many updated versions of the local replica will never be accessed by P, making the network communication for those versions useless. In that case it may have been better not to install a local replica so close to P, or to apply a different update strategy; a small illustration follows below, and we return to these issues later.

A more serious problem, however, is that keeping multiple copies consistent may itself be subject to serious scalability problems. Intuitively, a collection of copies is consistent when the copies are always the same: no matter which copy a read operation is performed on, it returns the same result. Consequently, when an update operation is performed on one copy, the update has to be propagated to all copies before a subsequent operation takes place, no matter on which copy that operation is initiated or performed. This form of consistency is sometimes referred to informally (and imprecisely) as tight consistency, provided by what is also called synchronous replication. (In the next section we define consistency precisely and introduce a range of consistency models.) The key idea is that an update is performed at all copies as a single atomic operation, or transaction. Unfortunately, implementing atomicity involving a large number of replicas that may be widely dispersed across a large-scale network is inherently difficult when operations must also be completed quickly.
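To make the access-to-update trade-off above concrete, the following minimal Python sketch estimates how much of the update traffic to a local replica is wasted. The even spacing of reads and updates, the function names, and the 50% threshold are all assumptions made purely for illustration; this is not part of any real replication protocol.

```python
# Toy estimate of how useful a local replica is, given the access-to-update
# ratio discussed above. All numbers and the decision threshold are
# illustrative assumptions, not part of any real system.

def wasted_update_fraction(reads_per_sec: float, updates_per_sec: float) -> float:
    """Rough fraction of propagated updates that are overwritten before the
    local process ever reads them (assumes reads and updates are evenly
    spread over time)."""
    if updates_per_sec == 0:
        return 0.0
    # At most one read can "use" each update; the remaining updates are
    # replaced by newer versions without ever being accessed.
    useful = min(reads_per_sec, updates_per_sec)
    return 1.0 - useful / updates_per_sec

def worth_keeping_local_replica(reads_per_sec, updates_per_sec, max_waste=0.5):
    """Heuristic: keep the replica only if at most half of the update
    traffic is wasted (threshold chosen arbitrarily for illustration)."""
    return wasted_update_fraction(reads_per_sec, updates_per_sec) <= max_waste

if __name__ == "__main__":
    # N << M: almost every propagated version is never read.
    print(wasted_update_fraction(reads_per_sec=2, updates_per_sec=100))   # ~0.98
    print(worth_keeping_local_replica(2, 100))                            # False
    # N >> M: nearly every update is read at least once.
    print(worth_keeping_local_replica(100, 2))                            # True
```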
The difficulty comes from the fact that we need to synchronise all replicas. In practice, this means that all replicas first need to reach agreement on when exactly a local update is to be performed. For example, replicas may need to decide on a global ordering of operations using Lamport timestamps, or let a coordinator assign such an order. Global synchronisation simply takes a great deal of time, especially when replicas are spread across a wide-area network. We are now faced with a dilemma. On the one hand, scalability problems can be alleviated by applying replication and caching, leading to improved performance; on the other hand, keeping all copies consistent generally requires global synchronisation, which is inherently slow. The cure may be worse than the disease.

13.4 DATA CENTRIC CONSISTENCY MODELS
Traditionally, consistency has been discussed in the context of read and write operations on shared data, accessible through (distributed) shared memory, a (distributed) shared database, or a (distributed) file system. In this section, we refer to such systems collectively as data stores. A data store may be physically distributed across multiple machines. In particular, each process that can access data from the store is assumed to have a local (or nearby) copy of the entire store available, and write operations are propagated to the other copies. A data operation is classified as a write operation when it changes the data, and is otherwise classified as a read operation.

Figure 13.1 Distributed Storage

A consistency model is essentially a contract between the data store and the processes: it states that if the processes agree to obey certain rules, the store promises to work correctly. Normally, a process that performs a read operation on a data item expects the operation to return a value showing the result of the last write operation on that data. In the absence of a global clock, however, it is impossible to tell which write operation is the last one. As an alternative, other definitions must be provided, leading to a range of consistency models. Each model effectively restricts the values that can be returned by a read operation on a data item.
As is to be expected, the models with major restrictions are easy to use, for example when developing applications, whereas those with only minor restrictions are sometimes difficult. The trade-off is, of course, that the easy-to-use models do not perform nearly as well as the more difficult ones.

Continuous Consistency
Applications that can express the semantics of their data numerically can use numerical deviations to measure inconsistency. An obvious example is the replication of records containing stock market prices. In this case, an application may specify that the difference between two copies should not exceed $0.02, which is an absolute numerical deviation. Alternatively, a relative numerical deviation could be specified, stating that two copies should not differ by more than 0.5 percent. In both cases, if a stock goes up (and one of the replicas is updated immediately) while the difference stays within the specified numerical deviation, the replicas would still be considered mutually consistent. Numerical deviation can also be understood as the number of updates that have been applied to a given replica but have not yet been seen by the others. For example, a Web cache may not yet have seen a batch of operations carried out by a Web server; in this case, the associated deviation in the value is also referred to as its weight.

Finally, there are classes of applications in which the ordering of updates is allowed to differ between copies, as long as the differences remain bounded. One way of looking at these updates is that they are applied tentatively to a local copy, awaiting global agreement from all replicas; as a consequence, some updates may need to be rolled back and applied in a different order before becoming permanent. Ordering deviations are harder to grasp intuitively than the other two consistency metrics, so a small illustration of the numerical case is included below.
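As a small illustration of the numerical bounds in the stock-price example, here is a toy Python check. Treating the absolute and relative bounds as alternatives (a replica is acceptable if either holds), as well as the function name and default values, are assumptions made for the sake of the example.

```python
# A minimal sketch of checking continuous (numerical) consistency bounds
# between two replicas of a stock price. The bounds ($0.02 absolute,
# 0.5% relative) come from the example above; everything else is assumed.

def within_numerical_deviation(primary: float, replica: float,
                               abs_bound: float = 0.02,
                               rel_bound: float = 0.005) -> bool:
    """Replicas are considered mutually consistent if they differ by at most
    abs_bound dollars OR by at most rel_bound of the primary value."""
    diff = abs(primary - replica)
    return diff <= abs_bound or diff <= rel_bound * abs(primary)

if __name__ == "__main__":
    print(within_numerical_deviation(100.00, 100.01))  # True: within $0.02
    print(within_numerical_deviation(100.00, 100.40))  # True: within 0.5 percent
    print(within_numerical_deviation(100.00, 101.00))  # False: both bounds exceeded
```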
13.5 CLIENT-CENTRIC CONSISTENCY MODELS
The consistency models of the previous section aim at providing a systemwide consistent view of a data store. An important assumption is that multiple processes may be updating the data store at the same time, and that consistency must be maintained in the face of such concurrency. In the case of object-based entry consistency, for example, the data store guarantees that when an object is called, the calling process is given a copy of the object that reflects all the changes made to it so far, possibly by other processes. During the call there is also a guarantee that no other process will interfere, that is, the calling process is given mutually exclusive access. Being able to handle concurrent operations on shared data while maintaining sequential consistency is fundamental to distributed systems, but for performance reasons sequential consistency may be guaranteed only when processes use synchronisation mechanisms such as transactions or locks.

In this section we look at a special class of distributed data stores. The data stores we consider are characterised by the lack of simultaneous updates or, when such updates do occur, by the assumption that they can easily be resolved; most operations involve reading data. These data stores offer a very weak consistency model, called eventual consistency. It turns out that by introducing special client-centric consistency models, many inconsistencies can be hidden in a relatively cheap way.

Eventual Consistency
To what extent processes actually operate concurrently, and to what extent consistency needs to be guaranteed, varies from one application to another. There are many examples in which concurrency appears only in a restricted form. In many database systems, for example, most processes hardly ever perform update operations; they mostly read data from the database, and update operations are performed by only one or very few processes. The question then is how fast updates should be made available to processes that only read.

Consider, for example, the Domain Name System (DNS), a worldwide naming system. The DNS name space is partitioned into domains, where each domain is assigned to a naming authority, which acts as the owner of that domain. Only that authority is allowed to update its part of the name space. Consequently, conflicts resulting from two operations that both want to update the same data (i.e., write-write conflicts) never occur. The only situation that needs to be handled is a read-write conflict, in which one process wants to update a data item while another is concurrently attempting to read it. It turns out that it is often acceptable to propagate an update in a lazy fashion, meaning that a reading process will see an update only after some time has passed since the update took place.

Another example is the World Wide Web. Web pages are almost always updated by a single authority, such as a webmaster or the actual owner of the page, so in most cases there are no write-write conflicts to resolve. On the other hand, to improve performance, browsers and Web proxies are often configured to keep a fetched page in a local cache and return that page upon the next request. An important aspect of both kinds of Web cache is that they may return out-of-date pages; in other words, the cached page that is returned to the requesting client is an older version than the one available at the Web server. As it turns out, many users find this inconsistency acceptable (to a certain degree).
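The sketch below is a hedged, user-level illustration of such lazy propagation: a single writer (standing in for a naming authority or Web server) logs updates, and a read-only replica sees them only when it synchronises. The class and method names are invented for this example and do not correspond to a real DNS or Web-cache implementation.

```python
# A small sketch of lazy (eventual-consistency) propagation, in the spirit
# of the DNS and Web examples above: one writer, read-only replicas, and
# updates that reach the replicas only after some delay.

class Primary:
    def __init__(self):
        self.data = {}
        self.log = []            # ordered list of (key, value) updates

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

class LazyReplica:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0         # how much of the primary's log has been seen

    def read(self, key):
        # May return a stale value: no synchronisation on the read path.
        return self.data.get(key)

    def sync(self):
        # Pull and apply any updates not yet seen (lazy propagation).
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

if __name__ == "__main__":
    authority = Primary()
    cache = LazyReplica(authority)
    authority.write("www.example.org", "10.0.0.1")
    print(cache.read("www.example.org"))   # None: stale view before syncing
    cache.sync()                            # propagation happens "eventually"
    print(cache.read("www.example.org"))   # 10.0.0.1
```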
Figure 13.2 Mobile Users

The mobile user accesses the database by transparently connecting to one of the replicas; in other words, the application running on the user's portable computer is unaware of which replica it is actually operating on. Assume the user performs several update operations and then disconnects. Later, the user connects to the database again, perhaps after moving to a different location or using a different access device. At that point, as shown in the figure, the user may be connected to a different replica than before. However, if the previous updates have not yet been propagated, the user will notice inconsistent behaviour: he would expect to see all the changes made earlier, but instead it appears as if nothing at all has changed.

This is a typical scenario for eventually-consistent data stores, and it arises from the fact that users may sometimes operate on different replicas. The problem can be alleviated by introducing client-centric consistency. In essence, client-centric consistency provides guarantees for a single client concerning the consistency of its own accesses to a data store; no guarantees are given concerning concurrent accesses by different clients. A sketch of such a per-client (session) guarantee follows below.
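The following Python sketch illustrates, under simplifying assumptions, one way a client-side session could provide such a guarantee: the client remembers the last update it has written or seen and brings a lagging replica up to date before reading from it. The names, the version-counter scheme, and the simplistic catch-up step are all invented for illustration.

```python
# A hedged sketch of a client-centric session guarantee for the mobile-user
# scenario above: the client refuses to read from a replica that has not yet
# caught up with its own earlier writes.

class Replica:
    def __init__(self):
        self.data = {}
        self.version = 0          # number of updates applied so far

    def apply(self, updates):
        for key, value in updates:
            self.data[key] = value
        self.version += len(updates)

class Session:
    """Tracks what this client has seen, independently of which replica it uses."""
    def __init__(self):
        self.pending = []         # this client's updates, possibly not everywhere yet
        self.last_seen = 0

    def write(self, replica, key, value):
        replica.apply([(key, value)])
        self.pending.append((key, value))
        self.last_seen = replica.version

    def read(self, replica, key):
        # Read-your-writes / monotonic reads: bring the replica up to date
        # with this client's session state before reading from it.
        if replica.version < self.last_seen:
            replica.apply(self.pending)       # simplistic catch-up
            self.last_seen = replica.version
        return replica.data.get(key)

if __name__ == "__main__":
    r1, r2 = Replica(), Replica()
    s = Session()
    s.write(r1, "calendar", "meeting at 10")
    print(s.read(r2, "calendar"))   # "meeting at 10", even on a different replica
```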
13.6 REPLICA MANAGEMENT
A key decision for any distributed system that supports replication is where, when and by whom replicas should be placed, as well as which mechanisms to use for keeping the replicas consistent. The placement problem itself should be split into two subproblems: replica-server placement and content placement. The distinction is subtle but important, and the two issues are often confused. Replica-server placement is concerned with finding the best locations to place a server that can host (part of) a data store. Content placement deals with finding the best servers for placing content; note that this often means that we are looking for the optimal placement of a single data item. Obviously, replica servers have to be placed first before content can be placed. In the following sections we look at these two placement problems, followed by a discussion of the basic mechanisms for managing replicated content.

Replica-Server Placement
Replica-server placement is not an intensively studied problem, for the simple reason that it is often more of a management and commercial issue than an optimisation problem. Nonetheless, analysing client and network properties can help in making better decisions. There are various ways to compute the best placement of replica servers, but all boil down to an optimisation problem in which the best K out of N locations need to be selected (K < N). These problems are known to be computationally hard and can in practice be solved only by heuristics. Qiu et al. (2001) take the distance between clients and locations as their starting point, where distance can be measured in terms of latency or bandwidth. Their solution selects one server at a time: given that k servers have already been placed (meaning that there are N - k candidate locations left), the next server is chosen such that the average distance between that server and its clients is minimal. A small sketch of this greedy heuristic follows.
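The sketch below implements this greedy heuristic in a few lines of Python. The distance matrix is invented sample data, and measuring distance as a single latency number per client-site pair is a simplifying assumption.

```python
# A sketch of the greedy placement heuristic described above (in the style
# of Qiu et al.): pick K of N candidate locations one at a time, each time
# choosing the location that most reduces the average distance between a
# client and its nearest chosen server.

def greedy_placement(dist, k):
    """dist[c][s] = distance (e.g. latency in ms) from client c to candidate site s."""
    n_clients, n_sites = len(dist), len(dist[0])
    chosen = []
    for _ in range(k):
        best_site, best_cost = None, float("inf")
        for s in range(n_sites):
            if s in chosen:
                continue
            trial = chosen + [s]
            # Average distance from each client to its closest chosen server.
            cost = sum(min(dist[c][t] for t in trial)
                       for c in range(n_clients)) / n_clients
            if cost < best_cost:
                best_site, best_cost = s, cost
        chosen.append(best_site)
    return chosen

if __name__ == "__main__":
    # 4 clients, 3 candidate server sites (latencies in ms, made up).
    dist = [[10, 50, 90],
            [20, 40, 80],
            [90, 60, 10],
            [80, 50, 20]]
    print(greedy_placement(dist, k=2))   # e.g. [0, 2]
```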
Content Replication and Placement
Let us now move away from server placement and concentrate on content placement. When it comes to content replication and placement, three different types of replicas can be distinguished, logically organised as shown in Figure 13.3. The original set of replicas that constitute a distributed data store are known as permanent replicas; in many cases, the number of permanent replicas is small. Consider, for example, a Web site. Distribution of a Web site generally comes in one of two forms. The first kind of distribution replicates the files that constitute the site across a limited number of servers at a single location; whenever a request comes in, it is forwarded to one of the servers, for instance using a round-robin strategy.

Figure 13.3 Replica Management

The second form of distributed Web site is what is called mirroring. In this case, a Web site is copied to a limited number of servers, called mirror sites, which are geographically spread across the Internet. Clients typically choose one of the various mirror sites from a list offered to them. Mirrored Web sites, like cluster-based Web sites, have a limited number of replicas that are more or less statically configured. Similar static organisations appear in distributed databases: a database can be distributed and replicated across a cluster of servers in a shared-nothing architecture, emphasising that neither discs nor main memory are shared by processors. Alternatively, a database can be distributed and possibly replicated across a number of geographically dispersed sites; this is the design used in most federated databases.

Content Distribution
Replica management also deals with the propagation of (updated) content to the relevant replica servers. There are various trade-offs to make, which we discuss next.

Operations versus State
An important design issue is what is actually to be propagated. Basically, there are three possibilities:
1. Propagate only a notification of an update.
2. Transfer data from one copy to another.
3. Propagate the update operation to other copies.

Propagating a notification is what invalidation protocols do. In an invalidation protocol, other copies are informed that an update has taken place and that the data they hold are no longer valid. The invalidation may specify which part of the data store has been modified, in which case only part of a copy is actually invalidated. The important point is that no more than a notification is propagated. Whenever an operation on an invalidated copy is requested, that copy generally needs to be updated first, depending on the consistency model that is to be supported. The main advantage of invalidation protocols is that they use very little network bandwidth: the only information that needs to be transferred is a specification of which data are no longer valid. Such protocols work best when there are many update operations compared to read operations, that is, when the read-to-write ratio is relatively small.

Consider, for example, a data store in which updates are propagated by sending the modified data to all replicas. If the size of the modified data is large, and updates occur frequently compared to read operations, we may have the situation that two updates occur one after another without any read operation being performed in between. Propagating the first update to all replicas is then effectively useless, as it will be overwritten by the second update. It would have been more efficient to send a notification that the data had been modified instead.
The second alternative is to transfer the modified data among replicas, which is useful when the read-to-write ratio is relatively high. In that case, the probability that an update will be effective is high, in the sense that the modified data will be read before the next update takes place. Instead of propagating the modified data themselves, the changes can also be logged and only those logs transferred to save bandwidth. In addition, transfers are often aggregated, meaning that multiple modifications are packed into a single message, thus reducing communication overhead.

The third approach is not to transfer any data modifications at all, but rather to tell each replica which update operation it should perform (sending only the parameter values that the operation needs). This approach, also known as active replication, assumes that each replica is represented by a process capable of "actively" keeping its associated data up to date by performing operations (Schneider, 1990). The main benefit of active replication is that, if the parameters associated with an operation are small, updates can often be propagated at minimal bandwidth cost. Moreover, the operations can be of arbitrary complexity, which may allow further improvements in keeping replicas consistent. On the other hand, more processing power may be required by each replica, especially when the operations are relatively complex. The short sketch below contrasts these three propagation strategies.
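The compact Python sketch below contrasts the three options for a single replicated counter: sending only an invalidation, transferring the new state, and shipping the operation itself (active replication). The class layout and method names are illustrative only.

```python
# A compact sketch of the three propagation options above for a replicated
# counter: invalidation, state transfer, and operation shipping.

class Replica:
    def __init__(self, origin=None):
        self.value = 0
        self.valid = True
        self.origin = origin      # where to fetch fresh state after invalidation

    def read(self):
        if not self.valid:                      # option 1: notification only;
            self.value = self.origin.value      # fetch on demand, then revalidate
            self.valid = True
        return self.value

    def invalidate(self):
        self.valid = False                      # option 1: tiny message

    def set_state(self, new_value):
        self.value = new_value                  # option 2: transfer the data

    def apply(self, op, *args):
        op(self, *args)                         # option 3: ship the operation

def add(replica, amount):                       # an update operation
    replica.value += amount

if __name__ == "__main__":
    primary = Replica()
    cache = Replica(origin=primary)

    add(primary, 5); cache.invalidate()              # invalidation protocol
    print(cache.read())                              # 5, fetched lazily

    add(primary, 5); cache.set_state(primary.value)  # state transfer
    print(cache.read())                              # 10

    cache.apply(add, 7); add(primary, 7)             # active replication
    print(cache.read())                              # 17
```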
Pull versus Push Protocols
Another design issue is whether updates are pushed or pulled. In a push-based approach, also referred to as server-based protocols, updates are propagated to other replicas without those replicas even asking for them. Push-based approaches are often used between permanent and server-initiated replicas, but they can also be used to push updates to client caches. Server-based protocols are applied when replicas need to maintain a relatively high degree of consistency; in other words, the replicas need to be kept identical. Because permanent and server-initiated replicas, as well as large shared caches, are often shared by many clients which, in turn, mainly perform read operations, the read-to-update ratio at each replica is relatively high. In these cases, push-based protocols are efficient, since every pushed update can be expected to be useful to one or more readers. In addition, push-based protocols make consistent data immediately available when asked for.

In contrast, in a pull-based approach, a server or client requests another server to send it whatever updates it has at that moment. Pull-based protocols, also called client-based protocols, are often used by client caches. A common strategy applied to Web caches, for example, is to check whether cached data items are still up to date. When a cache receives a request for items that are still locally available, it first checks with the original Web server whether those data items have been modified since they were cached. In the case of a modification, the modified data are first transferred to the cache and then returned to the requesting client; if no modifications took place, the cached data are returned. In other words, the client polls the server to see whether an update is needed. A pull-based approach is efficient when the read-to-update ratio is relatively low, which is often the case with single-client (nonshared) caches. However, even when a cache is shared by many clients, a pull-based approach may still prove efficient when the cached data items are only rarely used.

13.7 SUMMARY
The technique of storing data at several sites or nodes is known as data replication. It is beneficial in terms of increasing data availability. It basically involves copying data from a database from one server to another so that all users share the same information. As a result, users of a distributed database can access the data relevant to their tasks without interfering with the work of others. Data replication entails the continuous copying of transactions so that the replica is always up to date and synchronised with the source. With replication, data is available at multiple locations, although a particular relation may have to reside at only one of them. Full replication, in which the whole database is stored at every site, is possible; partial replication, in which some frequently used fragments of the database are replicated while others are not, is also possible.

Data replication, the process of storing the same data in several locations, improves data availability and accessibility as well as system resilience and reliability. It is frequently used for disaster recovery, ensuring that an accurate backup exists at all times in case of a disaster, hardware failure, or data breach. A replica can also speed up data access, especially in organisations with many locations: users in Asia or Europe may experience latency when reading data held in North American data centres, and placing a copy of the data closer to the user can improve access times and balance the network load.

A consistency model is a contract between a (distributed) data store and its processes, in which the data store specifies exactly what the results of read and write operations are in the presence of concurrency: the processes agree to obey certain rules, while the store guarantees correct operation. The degree of consistency that has to be maintained for shared data is thus captured by the consistency model.

13.8 KEYWORDS
Platform - The lowest layer of hardware and software layers, typically hardware plus operating system.
Replication - A method of increasing the reliability of systems based on multiple copies of data or devices.
Router - The basic building block of an internet. A router is a computer that attaches to two or more networks and forwards packets according to information found in its routing table. Routers in the Internet run the IP protocol.
RPC (Remote Procedure Call) - Procedure calls between different processes.
Distributed system - A system of networked computers which communicate and coordinate their actions only by passing messages.

13.9 LEARNING ACTIVITY
1. Analyse what will happen if memory is accessed in a uniform way.
2. Suppose your institute has 50 computer systems in a lab and you are asked to set up a local network for those systems. What kind of parallel architecture will you follow?

13.10 UNIT END QUESTIONS
A. Descriptive Questions
Short Questions
1. Define data replication.
2. List any two reasons for replication.
3. How will you compare scaling strategies with replication?
4. Define eventual consistency.
5. What is continuous consistency?
Long Questions
1. Describe operations versus state in content distribution.
2. Explain pull and push protocols.
3. Explain the process of replica management.
4. Specify the reasons for replication in detail.
5. Describe the various consistency levels.
B. Multiple Choice Questions
1. Data replication is done for two reasons: reliability and ______
a. Care
b. Security
c. Performance
d. Resistance

2. A ________ model is usually a contract between the data store and the processes.
a. Distributed system
b. File replication
c. Consistency
d. Google functions

3. Distributed systems require the ability to handle ____ operations.
a. Concurrent
b. Administrator
c. Complex
d. Workstation

4. ____ is a global name system.
a. Replica
b. Random
c. TCP
d. DNS

5. The database is accessed by the mobile user by transparently connecting to one of the _____.
a. Remote procedure calls
b. TCP
c. Replicas
d. Data stores
Answers
1-c, 2-c, 3-a, 4-d, 5-c

13.11 REFERENCES

Reference books
George Coulouris, Jean Dollimore, Tim Kindberg, "Distributed Systems: Concepts and Design" (4th Edition), Addison Wesley/Pearson Education.
Pradeep K Sinha, "Distributed Operating Systems: Concepts and Design", IEEE Computer Society Press.

Text Book References
M.R. Bhujade, "Parallel Computing", 2nd edition, New Age International Publishers, 2009.
Andrew S. Tanenbaum and Maarten Van Steen, "Distributed Systems: Principles and Paradigms", 2nd edition, Pearson Education, Inc., 2007, ISBN: 0-13-239227-5.

Websites:
https://www.techopedia.com/definition/10062/wireless-communications
https://www.computernetworkingnotes.com/
https://www.guru99.com
UNIT 14 - DISTRIBUTED FILE SYSTEMS 1

STRUCTURE
14.0 Learning Objectives
14.1 Introduction to Distributed File Systems
14.2 Characteristics of File Systems
14.3 Distributed File System Requirements
14.4 Good Features of Distributed File System
14.5 File Models
14.6 File Accessing Models
14.7 File-Caching Schemes
14.8 Summary
14.9 Keywords
14.10 Learning Activity
14.11 Unit End Questions
14.12 References

14.0 LEARNING OBJECTIVES
After studying this unit, you will be able to:
Learn the characteristics of file systems
Explain the good features of a distributed file system
Differentiate file accessing and file caching schemes
List the characteristics of file accessing models

14.1 INTRODUCTION TO DISTRIBUTED FILE SYSTEM
Another design consideration is whether changes should be pushed or pulled. In a push-based technique, often known as server-based protocols, updates are propagated to other replicas without those replicas even asking for them. Push-based techniques are frequently used to communicate between permanent and server-initiated replicas, but they can also be used to update client caches. Server-based protocols are applied when replicas must maintain a relatively high level of consistency; to put it another way, the copies must be kept identical.
Because permanent and server-initiated replicas, as well as large shared caches, are frequently shared by numerous clients who, in turn, mostly perform read operations, a high level of consistency is required, and each replica's read-to-update ratio is relatively high. Push-based protocols are efficient in these situations, because each pushed update can be expected to be useful to one or more readers. Furthermore, push-based protocols provide consistent data as soon as it is requested.

In a pull-based method, on the other hand, a server or client requests that another server deliver it any changes it has at the time. Client caches frequently employ pull-based protocols, sometimes known as client-based protocols. Checking whether cached data items are still up to date is a popular approach used with Web caches, for example. When a cache receives a request for items that are still available locally, it checks with the originating Web server to see if the data items have changed since they were cached. If an update has occurred, the changed data is first transmitted to the cache and then returned to the requesting client; if no changes were made, the cached data is returned. To put it another way, the client polls its server to see if an update is required. A pull-based strategy is effective when the read-to-update ratio is low, which is frequently the case with single-client (nonshared) caches. Even when a cache is shared by multiple clients, a pull-based strategy may still be efficient when the cached data items are only rarely used.

14.2 CHARACTERISTICS OF FILE SYSTEMS
The organisation, storage, retrieval, naming, sharing, and protection of information are all handled by file systems. They provide a programming interface that defines the file abstraction, removing the need for programmers to worry about storage allocation and layout details. Files are stored on disks or other non-volatile storage media.

Files contain both data and attributes. The data consist of a sequence of data items (typically 8-bit bytes) that can be read and written at any point in the sequence. The attributes are held as a single record containing information such as the length of the file, timestamps, file type, owner's identity, and access control lists. The figure shows an example of an attribute record structure; the file system manages the shaded attributes, which are not normally updatable by user programs.

File systems are designed to store and manage large numbers of files, with facilities for creating, naming, and deleting files. The naming of files is supported by the use of directories. A directory is a file, often of a special type, that provides a mapping from text names to internal file identifiers. Directories may include the names of other directories, which leads to the familiar hierarchic file-naming scheme and the multi-part pathnames for files used in UNIX and other operating systems. File systems also take responsibility for the control of access to files, restricting access according to users' authorisations and the type of access requested (reading, updating, executing and so on).

The term metadata refers to all of the extra information stored by a file system that is needed for the management of files. It includes file attributes, directories, and all the other persistent
data used by the file system. An example layered module structure for the implementation of a non-distributed file system in a conventional operating system is shown in the figure; each layer depends only on the layers below it. A distributed file system provides all of the components listed above, as well as additional components to deal with client-server communication and with distributed file naming and location.

File system operations
The figure summarises the main operations on files that are available to applications in UNIX systems. These are the system calls implemented by the kernel; application programmers usually access them through procedure libraries such as the C Standard Input/Output Library or the Java file classes. The primitives are listed here to give an indication of the operations that file services are expected to support, and for comparison with the file service interfaces discussed later.

The UNIX operations are based on a programming model in which some file state information is stored by the file system for each running program. This consists of a list of currently open files, each with a read-write pointer giving the position within the file at which the next read or write operation will be applied. The file system is responsible for enforcing access control for files. In a local file system such as UNIX, it does so when each file is opened, checking the rights granted to the user's identity in the access control list against the mode of access requested in the open system call. If the rights match the mode, the file is opened and the mode is recorded in the file state information. A toy model of this per-open-file state is sketched below.
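Below is a toy, user-level Python model of that state: an open-file table recording, for each open file, the granted access mode and the current read-write pointer, with the access-control check performed once at open time. The ACL format, the error handling, and the class name are invented for illustration and do not correspond to any particular operating system.

```python
# A toy model of per-open-file state: an open-file table mapping descriptors
# to (file name, access mode, read-write pointer), with the ACL checked only
# at open() time, as described in the text.

class SimpleFileSystem:
    def __init__(self):
        self.files = {}        # name -> bytes
        self.acl = {}          # name -> {user: set of modes, e.g. {"r", "w"}}
        self.open_table = {}   # fd -> [name, mode, position]
        self.next_fd = 3

    def create(self, name, owner):
        self.files[name] = b""
        self.acl[name] = {owner: {"r", "w"}}

    def open(self, name, user, mode):
        # Rights are checked against the ACL only here, at open time.
        if mode not in self.acl.get(name, {}).get(user, set()):
            raise PermissionError(f"{user} may not open {name} for '{mode}'")
        fd = self.next_fd
        self.open_table[fd] = [name, mode, 0]
        self.next_fd += 1
        return fd

    def read(self, fd, nbytes):
        name, mode, pos = self.open_table[fd]
        data = self.files[name][pos:pos + nbytes]
        self.open_table[fd][2] = pos + len(data)   # advance the pointer
        return data

    def write(self, fd, data):
        name, mode, pos = self.open_table[fd]
        if mode != "w":
            raise PermissionError("file not opened for writing")
        content = self.files[name]
        self.files[name] = content[:pos] + data + content[pos + len(data):]
        self.open_table[fd][2] = pos + len(data)

if __name__ == "__main__":
    fs = SimpleFileSystem()
    fs.create("notes.txt", owner="alice")
    fd = fs.open("notes.txt", "alice", "w")
    fs.write(fd, b"hello")
    fd2 = fs.open("notes.txt", "alice", "r")
    print(fs.read(fd2, 5))                      # b'hello'
    try:
        fs.open("notes.txt", "bob", "r")        # fails the access-control check
    except PermissionError as err:
        print(err)
```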
14.3 DISTRIBUTED FILE SYSTEM REQUIREMENTS
Many of the requirements and potential pitfalls in the design of distributed services were first observed in early research on distributed file systems. Initially, they offered access transparency and location transparency; performance, scalability, concurrency control, fault tolerance, and security requirements emerged and were met in subsequent phases of development. We discuss these and related requirements in the following subsections.

Transparency
The file service is usually the most heavily used service in an intranet, so its functionality and performance are critical. The design of the file service should support many of the transparency requirements for distributed systems. The design must balance the flexibility and scalability that derive from transparency against software complexity and performance. The following forms of transparency are partially or wholly addressed by current file services:

Access transparency: Client programs should be unaware of the distribution of files. A single set of operations is provided for access to local and remote files; programs written to operate on local files are able to access remote files without modification.

Location and mobility transparency: Client programs should see a uniform file name space. User programs see the same name space wherever they are executed, and files or groups of files may be relocated without changing their pathnames. When files are moved, neither client programs nor the system administration tables on client nodes need to be changed. This enables file mobility: files, or more commonly sets or volumes of files, can be moved either manually by system administrators or automatically.

Performance transparency: Client programs should continue to perform satisfactorily while the load on the service varies within a specified range.

Scaling transparency: The service can be expanded by incremental growth to deal with a wide range of loads and network sizes.

Concurrent file updates: Changes to a file made by one client should not interfere with the operation of other clients simultaneously accessing or changing the same file. This is the well-known concurrency control problem. The need for concurrency control for access to shared data is recognised in many applications, and techniques for implementing it are well known, although they are costly. Most current file services follow modern UNIX standards in providing advisory or mandatory file- or record-level locking.

File replication: In a file service that supports replication, a file may be represented by several copies of its contents at different locations. This has two advantages: it allows multiple servers to share the load of providing a service to clients accessing the same set of files, thereby increasing the scalability of the service, and it improves fault tolerance by allowing clients to locate another server that holds a copy of the file if the first one fails. Few file services support replication in full, but most do support local caching of files or portions of files, which is a limited form of replication. A description of the Coda replicated file service is included in the data replication section.

Hardware and operating system heterogeneity: The service interfaces should be designed so that client and server software can be implemented for a variety of operating systems and computers. This is an important aspect of openness.

Fault tolerance: The central role of the file service in distributed systems makes it essential that the service continues to operate in the face of client and server failures. Fortunately, a moderately fault-tolerant design is straightforward for simple servers. To cope with transient communication failures, the design can be based on at-most-once invocation semantics, or it can use the simpler at-least-once semantics with a server protocol designed in terms of idempotent operations, ensuring that duplicated requests do not result in invalid updates to files. The servers can be stateless, so that if they fail, they can be restarted and the service restored without any need to recover previous state. Tolerance of disconnection or server failures requires file replication, which is more difficult to achieve.
Consistency: Conventional file systems such as that provided by UNIX offer one-copy update semantics. This refers to a model for concurrent access to files in which all processes accessing or updating a given file see the same contents that they would see if only a single copy of the file existed. When files are replicated or cached at different sites, there is an inevitable delay in the propagation of modifications made at one site to all of the other sites that hold copies, and this may result in some deviation from one-copy semantics.

14.4 GOOD FEATURES OF DISTRIBUTED FILE SYSTEM
There are two main purposes for using files:
1. Permanent storage of information on a secondary storage medium.
2. Sharing of information between applications.

A file system is a subsystem of the operating system that performs file management activities such as organisation, storing, retrieval, naming, sharing, and protection of files. A well-designed distributed file system provides the following services:

1. Storage service: Allocation and management of space on a secondary storage device, thus providing a logical view of the storage system.
2. True file service: Includes file-sharing semantics, file caching, file replication, concurrency control, multiple-copy update protocols, and so on.
3. Name/Directory service: Responsible for directory-related activities such as creation and deletion of directories, adding a new file to a directory, deleting a file from a directory, changing the name of a file, and moving a file from one directory to another.

A good distributed file system should have the following features:

1. Transparency:
a. Structure transparency: Clients should not know the number or locations of file servers and storage devices. Note that multiple file servers are used for performance, scalability, and reliability.
b. Access transparency: Both local and remote files should be accessible in the same way.
c. Naming transparency: The name of a file should give no hint as to where the file is located, and the name must not change when the file moves from one node to another.
d. Replication transparency: Clients need not be aware of the existence or locations of additional copies of a file.
2. User mobility: Users should be able to work from different nodes at different times, rather than being forced to work at one specific node.
3. Performance: Performance of the file system is usually measured as the average amount of time needed to satisfy client requests.
4. Simplicity and ease of use: The user interface to the file system should be simple, and the number of commands should be as small as possible.
5. Scalability: A good distributed file system should be able to grow easily as the number of nodes and users in the system increases.
6. High availability: A distributed file system should continue to function even when some of its components fail, such as a link, a node, or a storage device. A highly available and scalable distributed file system should have multiple independent file servers controlling multiple independent storage devices.
7. High reliability: The probability of loss of stored data should be minimised. The system should automatically generate backup copies of critical files.
8. Security: Users should be confident about the privacy of their data.
9. Heterogeneity: Shared data should be accessible from a variety of platforms (e.g., UNIX workstations, Wintel platforms, etc.).

14.5 FILE MODELS
A distributed file system (DFS) allows users of physically distributed computers to share data and storage resources by using a common file system. A typical configuration for a DFS is a collection of workstations and mainframes connected by a local area network (LAN), with the DFS implemented as part of the operating system of each of the connected machines. The design of such systems stresses the distributed structure and the decentralisation of both data and control, and the concept of distributed operation is fundamental to a fault-tolerant and scalable DFS design. Alternative sharing semantics and methods for providing access to remote files also have to be considered. UNIX-based systems such as UNIX United, Locus, Sprite, Sun's Network File System, and ITC's Andrew illustrate these concepts and the variety of possible implementations and design choices. Examination of these systems makes it clear that a good distributed file system design requires a departure from simply extending centralised file systems over a communication network.

The key idea behind the Virtual File System (VFS) is to introduce a common file model capable of representing all supported filesystems. This model closely mirrors the file model provided by the traditional Unix filesystem. This is not surprising, because Linux wants to run its native filesystem with as little
overhead as possible. Each specific filesystem implementation, on the other hand, must translate its physical organisation into the VFS's common file model. For instance, in the common file model each directory is regarded as a file containing a list of files and other directories. However, several non-Unix disk-based filesystems use a File Allocation Table (FAT) to store the position of each item in the directory tree; in these filesystems, directories are not files. To adhere to the VFS's common file model, Linux implementations of such FAT-based filesystems must be able to construct on the fly, when needed, the files corresponding to directories. Such files exist only as objects in kernel memory.

In other words, the Linux kernel cannot hardcode a particular function to handle an operation such as read() or ioctl(). Instead, it must use a pointer for each operation; the pointer is made to point to the proper function for the particular filesystem being accessed. Let us look at how a read() issued on a file in an MS-DOS filesystem is turned by the kernel into a call specific to that filesystem. As with any other system call, when the application calls read(), the kernel invokes the corresponding sys_read() service routine. As we shall see later in this chapter, the file is represented in kernel memory by a file data structure. This data structure has a field called f_op that contains pointers to functions specific to MS-DOS files, including a function that reads a file. sys_read() finds the pointer to that function and invokes it; thus, the application's read() is turned into the indirect call file->f_op->read(...). Similarly, a write() operation on a file in an Ext2 filesystem triggers the execution of the Ext2 write function associated with the output file. In short, the kernel is responsible for assigning the right set of pointers to the file variable associated with each open file, and then for invoking, through the f_op field, the function specific to the filesystem involved.

The common file model can be viewed as object-oriented, where an object is a software construct that defines both a data structure and the methods that operate on it. For reasons of efficiency, Linux is not coded in an object-oriented language such as C++. Objects are therefore implemented as plain data structures, with some fields pointing to functions that correspond to the object's methods. The common file model consists of the following object types:

The superblock object: stores information about a mounted filesystem. For disk-based filesystems, this object usually corresponds to a filesystem control block stored on disk.
The inode object: stores general information about a specific file. For disk-based filesystems, this object usually corresponds to a file control block stored on disk. Each inode object has an inode number, which uniquely identifies the file within the filesystem.

The file object: stores information about the interaction between an open file and a process. This information exists in kernel memory only while a process has the file open.

The dentry object: stores information about the linking of a directory entry to the corresponding file. Each disk-based filesystem stores this information on disk in its own particular way.

The figure illustrates, with a simple example, how processes interact with files. Three different processes have opened the same file, two of them using the same hard link. In this case, each of the three processes needs its own file object, while only two dentry objects are required, one for each hard link. Both dentry objects refer to the same inode object, which identifies the superblock object and, together with it, the common disk file.

Figure 14.1 Interaction of file system

In addition to providing a common interface to several filesystem implementations, the VFS plays an important role in system performance. The dentry cache, which keeps the most recently used dentry objects, speeds up the translation from a file pathname to the inode of the last pathname component. Generally speaking, a disk cache is a software mechanism that allows the kernel to keep in RAM certain information that is normally stored on a disk, so that further accesses to that data can be satisfied quickly without requiring a slow disk access. Besides the dentry cache, Linux uses other disk caches, such as the buffer cache and the page cache.
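To make the dispatch idea above concrete without writing kernel code, the following user-level Python sketch mimics the role of the f_op table: a generic read entry point calls through a per-filesystem table of operations. All names here are illustrative; this is not the actual kernel interface.

```python
# A user-level sketch of the dispatch idea behind the VFS common file model:
# generic code calls through a per-filesystem table of operations (the
# analogue of the f_op field), so the same entry point works for any
# filesystem. Purely illustrative; not the real kernel API.

class FileOperations:
    """Analogue of a per-filesystem operation table."""
    def __init__(self, read):
        self.read = read

def ext2_read(file, nbytes):
    return f"ext2: {nbytes} bytes from {file['name']}"

def msdos_read(file, nbytes):
    return f"msdos: {nbytes} bytes from {file['name']} (via FAT)"

EXT2_FOP = FileOperations(read=ext2_read)
MSDOS_FOP = FileOperations(read=msdos_read)

def sys_read(file, nbytes):
    # Generic layer: no filesystem-specific code, just an indirect call
    # through the operation table attached to the open file.
    return file["f_op"].read(file, nbytes)

if __name__ == "__main__":
    f1 = {"name": "/mnt/ext2/a.txt", "f_op": EXT2_FOP}
    f2 = {"name": "/mnt/dos/B.TXT", "f_op": MSDOS_FOP}
    print(sys_read(f1, 128))   # dispatched to ext2_read
    print(sys_read(f2, 128))   # dispatched to msdos_read
```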
Notice how a disk cache differs from a hardware cache or a memory cache, neither of which has anything to do with disks or other devices. A hardware cache is a fast static RAM that speeds up requests directed to the slower dynamic RAM; a memory cache is a software mechanism introduced to bypass the Kernel Memory Allocator.

14.6 FILE ACCESSING MODELS
When a file is used, the information it contains must be accessed and read into computer memory, and there are several ways to do this. Some systems provide only one access method for files; others, such as IBM's, support many access methods, and choosing the right one for a particular application is a major design problem. The three main ways of accessing a file in a computer system are sequential access, direct access, and the index sequential method.

The simplest access method is sequential access. Information in the file is processed in order, one record after the other. This is by far the most common mode of access; editors and compilers, for example, usually access files in this fashion. Reads and writes make up the bulk of the operations on a file. A read operation (read next) reads the next portion of the file and automatically advances a file pointer, which tracks the I/O location. Similarly, a write operation (write next) appends to the end of the file and advances the pointer to the end of the newly written data. Under sequential access, then, data is accessed one record after another: a read advances the pointer, while a write allocates space and moves the pointer to the end of the file. Such a method is reasonable for tape.

Another method is direct access, also known as relative access. Here a file is made up of fixed-length logical records that allow programs to read and write records rapidly in no particular order. Direct access is based on the disk model of a file, since disks permit random access to any file block. For direct access, the file is viewed as a numbered sequence of blocks or records: we may read block 14, then read block 59, and then write block 17. There are no restrictions on the order of reading and writing for a direct-access file. The block number provided by the user to the operating system is normally a relative block number; the first relative block of the file is 0, the next is 1, and so on.

The final method, the index sequential method, is built on top of the sequential access method. An index is created for the file; the index, like the index in the back of a book, contains pointers to the various blocks. To locate a record in the file, we first search the index and then use the pointer to access the file directly and find the desired record. Points to remember: this method is based on sequential access, and it uses an index to manage the pointer. A short sketch of sequential versus direct access follows.
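Here is a short, runnable sketch of sequential versus direct (relative) access using fixed-length records and the standard os module. The record size, file name, and contents are made up for the example.

```python
# A runnable sketch of sequential versus direct (relative) access using
# fixed-length records, as described above.

import os

RECORD_SIZE = 16
path = "records.dat"

# Build a small file of fixed-length records: record i holds "record-i".
with open(path, "wb") as f:
    for i in range(10):
        f.write(f"record-{i}".encode().ljust(RECORD_SIZE, b"."))

fd = os.open(path, os.O_RDONLY)

# Sequential access: each read advances the implicit read-write pointer.
print(os.read(fd, RECORD_SIZE))          # record 0
print(os.read(fd, RECORD_SIZE))          # record 1

# Direct (relative) access: compute the offset of relative block n and seek.
def read_record(fd, n):
    os.lseek(fd, n * RECORD_SIZE, os.SEEK_SET)
    return os.read(fd, RECORD_SIZE)

print(read_record(fd, 7))                # jump straight to record 7
print(read_record(fd, 3))                # then back to record 3

os.close(fd)
os.remove(path)                          # clean up the demo file
```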
Models for Accessing Files
These depend on the method used for accessing remote files and on the unit of data access.

1. Accessing remote files
When a client requests access to a remote file, the distributed file system may use one of the following models to service the request:

a. Remote service model: The processing of the client's request is performed at the server's node. The client's request for file access is therefore delivered to the server as a message across the network, the server performs the access, and the result is sent back to the client. The number of messages exchanged and the overhead per message need to be kept low.

b. Data-caching model: This model attempts to reduce the network traffic of the previous model by caching the data obtained from the server node, taking advantage of the locality found in file accesses. A replacement policy such as LRU is used to keep the cache size bounded. While this model reduces network traffic, it has to deal with the cache-consistency problem during writes, because the locally cached copy of the data must be updated, along with the original file at the server node and copies held in any other caches.

Advantage of the data-caching model over the remote service model: Because it reduces network traffic, network contention, and file-server contention, the data-caching model offers increased performance and greater system scalability. Caching is therefore used in practically all distributed file systems. NFS, for example, uses the remote service model but adds caching for better performance.

2. Unit of Data Transfer
In file systems that use the data-caching model, an important design issue is the unit of data transfer: the fraction of a file that is transferred to and from clients as a result of a single read or write operation.

File-level transfer model
In this model, the whole file is moved when file data is to be transferred.
Advantages: The file needs to be transferred only once in response to a client request, which is more efficient than transferring it page by page, where more network protocol overhead is incurred. It reduces server load and network traffic, since the server is accessed only once, and this scales better. It is also immune to server and network failures once the entire file is cached at the client site.
Disadvantages: It requires sufficient storage space on the client machine. The approach fails for very large files, especially when the client runs on a diskless workstation, and it is inefficient to move an entire file when only a small portion of it is needed.

Block-level transfer model
File data transfers take place in units of file blocks. A file block is a contiguous portion of a file of fixed length (which can also be equal to the virtual-memory page size).
Advantages: It does not require client nodes to have large storage space, and it eliminates the need to copy an entire file when only a small portion of the data is needed.
Disadvantages: When an entire file is to be accessed, multiple server requests are needed, resulting in more network traffic and more network protocol overhead. NFS uses the block-level transfer model.

Byte-level transfer model
The unit of data transfer is the byte. This model provides maximum flexibility, because it allows storage and retrieval of an arbitrary amount of a file, specified by an offset within the file and a length. The drawback is that cache management is harder, because of the variable-length data for different access requests.

Record-level transfer model
This model is used with structured files; the unit of data transfer is the record.

File-Sharing Semantics
A shared file may be accessed by multiple users at the same time. An important design issue for any file system is to define clearly when modifications of file data made by one user are observable by other users.
UNIX semantics: This enforces an absolute time ordering on all operations and ensures that every read operation on a file sees the effects of all previous write operations performed on that file.

14.7 FILE-CACHING SCHEMES
In computing, a cache is a high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster
14.7 FILE-CACHING SCHEMES
A cache is a high-speed data storage layer in computing that saves a portion of data, often transient in nature, so that subsequent requests for that data can be served faster than by accessing the data's primary storage location. Caching allows you to quickly reuse data that has been previously retrieved or computed. The data in a cache is usually kept in rapid-access hardware such as RAM (random-access memory), but a cache can also be used in conjunction with a software component. The basic goal of a cache is to improve data retrieval performance by eliminating the need to contact the slower storage layer behind it. A cache, which trades capacity for speed, typically stores a subset of the data transiently, as opposed to databases, which normally store complete and durable data.
RAM and In-Memory Engines: Because RAM and in-memory engines support high request rates and IOPS (input/output operations per second), caching improves data retrieval performance and lowers cost at scale. Supporting the same scale with traditional databases and disk-based systems would require additional resources. These extra resources raise the cost, yet they still fall short of an in-memory cache's low-latency performance.
Caches can be used in a variety of layers, including operating systems, networking layers such as Content Delivery Networks (CDNs) and DNS, web applications, and databases. Many read-heavy application workloads, such as Q&A portals, gaming, media sharing, and social networking, benefit from caching to reduce latency and increase IOPS. Database queries, computationally complex calculations, API requests/responses, and web artefacts such as HTML, JavaScript, and image files are all examples of cached data. An in-memory data layer operating as a cache benefits compute-intensive applications that manipulate data sets, such as recommendation engines or high-performance computing simulations. In these applications, very large data sets must be accessed in real time across clusters of servers that can span hundreds of nodes. Manipulating this data in a disk-based repository is a significant constraint for many applications because of the speed of the underlying hardware.
Design Patterns: A specialised caching layer in a distributed computing environment allows systems and applications to run independently from the cache, with their own lifecycles, without influencing the cache. The cache acts as a central layer that may be accessed from a variety of systems, each with its own lifespan and topology. This is especially true in a system where application units can be scaled in and out dynamically. If the cache is on the same node as the application or systems that use it, scaling may have an impact on the cache's integrity. Furthermore, local caches only help the local application that consumes the data. In a distributed caching environment, data can span numerous cache servers and be kept in a central location for the benefit of all data consumers.
Caching Best Practices: It is critical to establish the validity of the data being cached before building a cache layer. A successful cache has a high hit rate, indicating that the data was available when it was fetched. When the data fetched isn't in the cache, it's called a cache miss. TTL (Time to Live) controls can be used to expire data in a timely manner. Another factor to consider is whether the cache environment needs to be highly
277 CU IDOL SELF LEARNING MATERIAL (SLM)
available, which In-Memory engines like Redis can provide. In some circumstances, rather than caching data from a primary location, an In-Memory layer might be employed as a standalone data storage layer. To assess whether or not the data resident in the In-Memory engine is sufficient, specify an appropriate RTO (Recovery Time Objective—the time it takes to recover from such an outage) and RPO (Recovery Point Objective—the final point or transaction recorded in the recovery). Most RTO and RPO needs can be met by combining the design methods and characteristics of several In-Memory engines. Caching frequently-read files will benefit servers with file-read-intensive workloads. To achieve decent speed, the programme usually relies on CMS's FSREAD cache & minidisk caching, however both features have their limitations. The reusable server kernel provides a file caching strategy based on VM Data Spaces to circumvent these limitations and expand the caching facilities available to the server writer. 1 A file cache is nothing more than a data area whose contents — files — are managed by the server kernel. The number and amount of file caches created by the server author is up to him; he has access to both APIs and operator commands for creating & deleting file caches. The server programme requests that files be cached in these data spaces solely through APIs; in response to a server's requests, the server kernel gets to read files using standard CMS file APIs and stores them in data spaces, removing them only when they become stale or even when data space storage becomes limited. When space is limited, the server kernel eliminates files in an LRU (least recently used) order. The server application is oblivious to such removal. To boost I/O efficiency, a centralised time-sharing system has included file caching. The file is cached in main memory to allow for repeated access to the same information without requiring a disc transfer. It also aids scalability and stability in distributed file systems. For a distributed file caching system, there are a few options. i. The location of the cache The cache location is the location where even the cached data is kept. When the original file is on the server disc, there are three possible cache locations. a. Server's main memory - if no caching is employed, the file is usually moved from the server's disc to the server's memory first. Following that, the file is transferred to the client's main memory. b. Client's disc - the second alternative is to use the client's disc as a cache. Keeping files cached at the client's location has several advantages: it reduces network access costs, improves dependability, is unaffected by data loss in a crash, and has more memory than the main memory cache. Disadvantages include the inability to handle diskless workstations and the high cost of disc access. 278 CU IDOL SELF LEARNING MATERIAL (SLM)
c. Client's main memory — the cache can also be stored in the client's main memory. Both the network and disc access costs are eliminated with this option.
ii. Propagation of Modifications
When the cache is located on the client side of a distributed file system, a file may be cached on numerous nodes. In such a system, if any node modifies a cached file, all of the nodes need to remain consistent, so the modification has to be propagated to all nodes. As a result, the distributed file system follows a predefined modification propagation scheme. There are two different schemes (a small sketch contrasting the two schemes is given at the end of this section):
a. Write-through scheme — when a cache item is modified, the new value is sent to the server instantly to update the master copy of the file. This minimises the risk of losing updates in the event of a client crash. However, it has poor write performance.
b. Delayed-write scheme — when a cache entry is modified, the change is kept in the client's cache, and the accumulated modifications are sent to the server together at a later time. This improves write performance and reduces network traffic, at the cost of a larger window in which updates can be lost. Specific variants include write on cache ejection, periodic write, and write on close.
iii. Validation strategies for caches
It is now evident that several copies of cached files exist if they are retained on multiple client nodes. When the cache is modified on the client side, modification propagation specifies when the master copy at the server should be updated. Cache validation specifies when the other copies at the various nodes should be updated. There are two approaches:
a. Client-Initiated Approach — in this method, the client asks the server whether it still has the same copy as the master copy. The check can be made in three ways: before each access; periodically, at regular intervals; or, using session semantics, whenever a client opens the file.
b. Server-Initiated Approach — because client-initiated approaches cause traffic congestion, server-initiated approaches can be employed instead. In this method the client informs the file server of the mode in which it is opening a file (reading, writing, or both). The file usage is then monitored by the server. When a client saves a file, a notification is sent to the server, along with any changes made to the file, and the server updates its master copy when it receives the notification.
279 CU IDOL SELF LEARNING MATERIAL (SLM)
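Here is the promised sketch contrasting the write-through and delayed-write propagation schemes. The send_to_server function, the class names, and the flush-on-close trigger are illustrative assumptions, not part of any specific distributed file system.

def send_to_server(path, data):
    # Placeholder for the RPC that updates the master copy at the server.
    print(f"server updated: {path} ({len(data)} bytes)")

class WriteThroughCache:
    """Every modification is propagated to the server immediately."""
    def __init__(self):
        self.local = {}
    def write(self, path, data):
        self.local[path] = data
        send_to_server(path, data)        # reliable, but one RPC per write

class DelayedWriteCache:
    """Modifications are batched and flushed later (here: on close)."""
    def __init__(self):
        self.local = {}
        self.dirty = set()                # paths modified since the last flush
    def write(self, path, data):
        self.local[path] = data
        self.dirty.add(path)              # no network traffic yet
    def close(self, path):
        if path in self.dirty:            # write-on-close variant
            send_to_server(path, self.local[path])
            self.dirty.discard(path)

c = DelayedWriteCache()
c.write("/usr/staff/ann/report.txt", b"draft 1")
c.write("/usr/staff/ann/report.txt", b"draft 2")   # overwrites locally, still only one RPC later
c.close("/usr/staff/ann/report.txt")               # single update sent to the server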
14.8 SUMMARY
A file system (commonly abbreviated to fs) governs how data is stored and retrieved in computers. Data deposited in a storage medium without a file system would be one huge body of data with no way of knowing where one piece of data ends and the next begins. The data is easily extracted and identified by splitting it into pieces and giving each one a name. Each group of data is referred to as a "file", by analogy with a paper-based data management system. A "file system" is the structure and the logic rules used to organise groups of data and their names.
There are numerous types of file systems. Each one has its own structure and logic, as well as characteristics such as speed, adaptability, security, and size. Some file systems were created with specific applications in mind. The ISO 9660 file system, for example, is designed exclusively for optical discs.
File systems can be used on a wide range of storage devices that employ various types of media. Hard disc drives are still important storage devices in 2019 and are expected to be so for the foreseeable future. SSDs, magnetic tapes, and optical discs are among the other types of media employed. The computer's main memory (random-access memory, RAM) is used in some circumstances, such as with tmpfs, to establish a temporary file system for short-term use. Some file systems are used on local data storage devices, while others employ a network protocol to facilitate file access (for example, NFS, SMB, or 9P clients). Some file systems are "virtual", which means that the presented "files" (called virtual files) are computed on demand (as with procfs and sysfs) or are just a mapping into a different file system that serves as a backing store.
The file system controls who has access to both the content and metadata of files. It is in charge of organising storage space; dependability, efficiency, and tuning for the physical storage medium are all significant design issues.
When unused space or single files are not contiguous, file system fragmentation develops. Files are created, changed, and removed as a file system is used. The file system allots space for data when a file is created. Some file systems allow or require you to specify an initial space allocation followed by incremental space allocations as the file grows. As files are erased, the space that they were assigned becomes available for other files to use. As a result, you end up with a mix of used and unused spaces of varied sizes. This is referred to as free space fragmentation. When a file is created and there isn't enough contiguous space to allocate it all at once, the space must be allocated in chunks. When a file is updated to become larger, it may exceed the space initially allocated to it, necessitating additional allocation elsewhere and fragmenting the file.
280 CU IDOL SELF LEARNING MATERIAL (SLM)
Disk quotas are used by system administrators in some operating systems to control the amount of disc space allocated.
14.9 KEYWORDS
Asynchronous communication - A type of communication in which the send operation is non-blocking and receive can be blocking (more common) or non-blocking.
Dependability - The set of requirements placed on a computer system which ensures its correctness, security and fault-tolerance.
Intranet - A portion of a network that is separately administered and has a boundary that can be configured to enforce local security policies.
IP (Internet Protocol) - The protocol that defines both the format of packets used on a TCP/IP Internet and the mechanism for routing a packet to its destination.
Connection-oriented communication service - A communication service, e.g., TCP, based on establishing a data stream connection to ensure reliable in-sequence delivery of data.
14.10 LEARNING ACTIVITY
1. A search engine is a web server that responds to client requests to search in its stored indexes and (concurrently) runs several web crawler tasks to build and update the indexes. What are the requirements for synchronization between these concurrent activities?
2. Consider a hypothetical car hire company and sketch out a three-tier solution to the provision of their underlying distributed car hire service. Use this to illustrate the benefits and drawbacks of a three-tier solution, considering issues such as performance, scalability, dealing with failure and maintaining the software over time.
14.11 UNIT END QUESTIONS
A. Descriptive Questions
Short questions
1. Define push-based technique.
281 CU IDOL SELF LEARNING MATERIAL (SLM)
2. List any two characteristics of file systems. 3. What is meta data? 4. What are the three methods for gaining access to a file in computer system? 5. What is FAT? Long Questions 1. Explain the operations of file system 2. Describe the requirements of file system 3. Explain about file caching schemes. 4. Describe about file models. 5. Explain the characteristics of a distributed file system. B. Multiple Choice Questions 1. In distributed file system, file name does not reveal the file’s ___________ a. Local name b. Physical storage location c. Migration d. Mutex file 2. Which one of the following hides the location where in the network the file is stored? a. hidden distributed file system b. escaped distribution file system c. transparent distributed file system d. Remote access 3. In a distributed file system, when a file’s physical storage location changes ___________ a. file name need not to be changed b. file name needs to be changed c. file’s host name needs to be changed d. file’s local name needs to be changed 282 CU IDOL SELF LEARNING MATERIAL (SLM)
4. A file that, once created, cannot be changed is called ___________
a. mutex file b. mutable file c. Immutable file d. None of these
5. _______ is not possible in a distributed file system.
a. Migration b. Client interface c. Remote access d. Strategy
Answers
1-b, 2-c, 3-a, 4-c, 5-a
14.12 REFERENCES
Reference books
George Coulouris, Jean Dollimore, Tim Kindberg, “Distributed Systems: Concepts and Design” (4th Edition), Addison Wesley/Pearson Education.
Pradeep K Sinha, “Distributed Operating Systems: Concepts and Design”, IEEE Computer Society Press.
Text Book References
M.R. Bhujade, “Parallel Computing”, 2nd edition, New Age International Publishers, 2009.
Andrew S. Tanenbaum and Maarten Van Steen, “Distributed Systems: Principles and Paradigms”, 2nd edition, Pearson Education, Inc., 2007, ISBN: 0-13-239227-5.
Websites:
https://www.techopedia.com/definition/10062/wireless-communications
https://www.computernetworkingnotes.com/
https://www.guru99.com
283 CU IDOL SELF LEARNING MATERIAL (SLM)
UNIT 15-DISTRIBUTED FILE SYSTEMS 2 STRUCTURE 15.0Learning Objectives 15.1Introduction to Distributed File Systems 15.2File Replication 15.3Network File System 15.4Andrew File System 15.5Hadoop Distributed File System and Map Reduce 15.6Summary 15.7Keywords 15.8Learning Activity 15.9Unit end questions 15.10 References 15.0 LEARNING OBJECTIVES After studying this unit, you will be able to: Explain the basics of file replication Outline the importance of file system in a distributed manner Describe the Hadoop architecture List the features of map reduce 15.1 INTRODUCTION TO DISTRIBUTED FILE SYSTEMS Given the importance of data sharing in distributed systems, it's no surprise that distributed file systems are the foundation for so many distributed applications. Distributed file programs allow numerous processes to share data securely and reliably over extended periods of time. As a result, they've become the foundation for distributed applications and systems. NFS uses a file system paradigm that is nearly identical to that of UNIX-based computers. Files are viewed as uninterpreted byte sequences. They are grouped hierarchically into a naming graph, with nodes representing folders and files. NFS, like any other UNIX file system, provides both hard and symbolic links. Files are named, but they are accessible via a UNIX-like file handle, which we will go over in detail later. To put it another way, a client must first seek up a file's name in a naming service and retrieve the accompanying file handle 284 CU IDOL SELF LEARNING MATERIAL (SLM)
before accessing it. In addition, each file has a number of attributes, the values of which can be looked up and altered.
The general file operations provided by NFS versions 3 and 4 are shown in the diagram. The create operation is used to create a file, although it has somewhat different meanings in NFSv3 and NFSv4. In version 3 the operation is used to create regular files; separate operations are used to create special files. Hard links are created via the link procedure. Symbolic links are created using symlink. Subdirectories are created with mkdir. The mknod operation is used to generate special files such as device files, sockets, and named pipes. In NFSv4 the situation is completely different, as create is now used to create nonregular files such as symbolic links, directories, or special files. Hard links are still formed with a separate link operation, but regular files are created with the open operation, which is new to NFS and represents a significant departure from the previous approach to file handling. Up until version 4, NFS was designed to allow stateless file servers.
15.2 FILE REPLICATION
In distributed systems, replication is essential for high availability and fault tolerance. With the trend toward mobile computing and, as a result, disconnected operation, high availability is becoming more important. For services supplied in safety-critical and other vital systems, fault tolerance is a constant concern. The first section of this chapter looks at systems that apply a single operation at a time to a group of replicated objects. It starts with an architectural component description and a system model of services that use replication. We explain group membership management as a component of group communication, which is especially crucial for fault-tolerant applications. After that, the chapter discusses how to achieve fault tolerance. It explains the correctness criteria of linearizability and sequential consistency before delving into two methods: passive (primary-backup) replication, wherein clients communicate with a single replica, and active replication, wherein clients communicate with all replicas via multicast. Three systems providing highly available services are studied as case studies. In the gossip and Bayou architectures, updates are propagated lazily across replicas of shared data. To guarantee consistency in Bayou, the approach of operational transformation is applied. Coda, a highly available file service, is another example.
Replication is a service-enhancement strategy. The following are some of the reasons for replication:
Enhancement of performance: Data caching at clients and servers is a well-known performance optimization technique. To avoid the latency of obtaining resources from the originating server, browsers and proxy servers cache copies of web resources. Furthermore, data is sometimes transparently copied between multiple source servers in the same domain.
285 CU IDOL SELF LEARNING MATERIAL (SLM)
By linking all of the server IP addresses to the site's DNS name, such as www.aWebSite.org, the burden is distributed throughout the servers. In a round-robin approach, a DNS search of www.aWebSite.org returns one of the multiple servers' IP addresses. For more complicated services dependent on data duplicated across thousands of computers, more sophisticated load-balancing solutions are required. Dilley et al. [2002], for example, describe the Akamai content distribution network's DNS name resolution strategy. Immutable data replication is simple: it improves performance at a low cost to the system. Overheads are incurred in the form of procedures designed to ensure that clients receive up- to-date data while replicating changing data, such as that found on the Web. As a result, replication's effectiveness as a performance-enhancing strategy is limited. Increased service availability: Users want services to be available at all times. That is, the percentage of time a service is available with acceptable response times should be close to 100%. Apart from delays caused by pessimistic concurrency control conflicts (due to data locking), server failures, network partitions, and disconnected operation (communication disconnections that are often unplanned and are a side effect of user mobility) are also issues that affect high availability. The second reason that works against high availability is network partitions (see Section 15.1) and disconnected operation. As they move around, mobile users may choose to unplug their computers or become mistakenly separated from a wireless network. A person using a laptop aboard a train, for example, may not have access to networking (wireless networking may be interrupted, or they may have no such capability). To be able to work in these situations – also known as disconnected working or disconnected operation – the user would often prepare by moving frequently used data from their usual surroundings to the laptop, such as the contents of a shared diary. However, there is often a cost to being available during a period of disconnection: when the user reads or changes the diary, they risk reading material that has been changed in the interim by someone else. They might, for example, schedule an appointment for a time window that has already been taken. Disconnected working is only possible if the user (or the programme acting on the user's behalf) is able to deal with stale data and resolve any conflicts that emerge afterwards. Fault tolerance: Data that is widely available does not always imply that it is completely accurate. It could be out of date, or two users on opposite sides of a network partition could make conflicting updates that need to be reconciled. A fault-tolerant service, on the other hand, ensures that, despite a given number and type of faults, it always behaves correctly. The accuracy refers to the timeliness of data provided to the customer as well as the impact of the client's operations on the data. Correctness can also refer to the timeliness of a service's replies, as in the case of an air traffic control system, where accurate data is required on short timescales. 286 CU IDOL SELF LEARNING MATERIAL (SLM)
Fault tolerance can be achieved using the same basic strategy used for high availability: duplicating data and functionality between systems. If up to f of f + 1 servers crash, at least one remains to continue providing the service. And if up to f servers can fail in a byzantine manner, a group of 2f + 1 servers can in principle provide a correct service by having the correct servers outvote the failed servers (which may supply spurious values). However, fault tolerance is more nuanced than this basic statement suggests. The system must accurately handle the coordination of its components in order to retain correctness guarantees in the event of failures, which can happen at any time.
15.3 NETWORK FILE SYSTEM (NFS)
The architecture of Sun NFS is built around a virtual file system. It is based on the abstract model described in the previous section. The NFS protocol, a collection of remote procedure calls that allows clients to perform operations on a remote file store, is supported by all NFS implementations. The NFS protocol is operating-system agnostic, but it was designed for use in UNIX networks, and we will go over the UNIX implementation of the protocol (version 3).
Each machine that functions as an NFS server has the NFS server module installed in its kernel. The client module translates requests for files in a remote file system into NFS protocol operations, which are then transmitted to the NFS server module on the machine that holds the relevant file system. Remote procedure calls are used to communicate between the NFS client and server components, using the RPC mechanism developed by Sun for use with NFS. The NFS protocol is compatible with both UDP and TCP, and it can be configured to use either. Clients can bind to services in a specific host by name using the port mapper service. Any process can send requests to an NFS server over the RPC interface; provided the requests are valid and include valid user credentials, they will be processed. The submission of signed user credentials, as well as data encryption for privacy and integrity, can be required as an optional security feature.
NFS enables access transparency: user applications can issue file operations for both local and remote files without distinction. Other distributed file systems that accept UNIX system calls may be present, and if so, they can be integrated in the same way. A virtual file system (VFS) module has been added to the UNIX kernel to distinguish between local and remote files and to convert between NFS's UNIX-independent file identifiers and the internal file identifiers used by UNIX and other file systems. VFS also keeps track of the filesystems that are currently available both locally and remotely, and it routes each request to the correct local system module (the UNIX file system, the NFS client module or the service module for another file system).
File handles are the file identifiers used in NFS. A file handle is opaque to clients, but it contains whatever information the server needs to distinguish one file from another. The file handle
287 CU IDOL SELF LEARNING MATERIAL (SLM)
is derived from the file's i-node number in UNIX versions of NFS by adding two extra fields, as shown below (the i-node number of a UNIX file is a number that serves to identify and locate the file within the file system in which the file is stored). The UNIX mountable filesystem is used by NFS as the unit of file grouping mentioned in the previous section. The filesystem identification field is a one-of-a-kind number assigned to each filesystem at the time of its creation (and in the UNIX implementation is stored in the superblock of the file system). Because i-node numbers are reused in the traditional UNIX file system when a file is removed, the i-node generation number is required. A generation number is saved with each file in the VFS extensions to the UNIX file system, and it is incremented each time the i-node number is utilised (for example, in a UNIX create system call). When a client mounts a remote file system, it gets the first file handle. In the results of lookup, create, and mkdir operations (see Figure 12.9), file handles are passed between server to client, and in the parameter lists of all server actions, file handles are passed from client to server. Each mounted file system has one VFS structure and each open file has one v-node in the virtual file system layer. A VFS structure connects a remote file system to a local directory it's mounted on. A flag in the v-node indicates whether a file was local or remote. The v-node holds a reference to the entry of the local file if the file is local (an i-node in a UNIX implementation). If the file is remote, it contains the remote file's file handle. • The NFS client module fulfils the role of the client module within our architectural paradigm, providing an interface that may be used by traditional application programmes. It, however, differs from our model client module in that it accurately emulates the semantics of ordinary UNIX file system primitives and is integrated with the UNIX kernel. It is built into the kernel rather than being distributed as a library for use by client processes so that: • User programmes can use UNIX system calls to access files without recompiling or reloading; • All user-level processes are served by a single client module, which has a shared cache of recently utilized blocks (described below); the encryption key used it to authenticate user IDs is provided to the server (see below) can be kept in the kernel, preventing user-level clients from impersonating it. Each client machine's virtual file system works in tandem with the NFS client module. It works in the same way as a standard UNIX file system, moving blocks of files from and to the server and caching them in local memory wherever possible. It makes use of the buffer cache as the local input/output system. However, because multiple clients on different host machines can access the very same remote file at the same time, a new and serious cache consistency issue occurs. 288 CU IDOL SELF LEARNING MATERIAL (SLM)
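As a concrete illustration of the file handle structure described above (filesystem identifier, i-node number, and i-node generation number), here is a minimal sketch. The class and field names are illustrative assumptions; real NFS implementations define the handle layout internally and clients simply treat it as opaque.

from dataclasses import dataclass

@dataclass(frozen=True)
class NFSFileHandle:
    """Opaque-to-the-client identifier built by the server (illustrative layout)."""
    filesystem_id: int   # unique number assigned to the filesystem when it is created
    inode_number: int    # identifies the file within that filesystem
    generation: int      # incremented each time the i-node number is reused

# The server hands out handles; the client only stores and returns them.
handle = NFSFileHandle(filesystem_id=7, inode_number=123456, generation=2)

def same_file(h1: NFSFileHandle, h2: NFSFileHandle) -> bool:
    # Two handles refer to the same file only if all three fields match;
    # a reused i-node number with a newer generation is a different file.
    return h1 == h2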
Authentication and access control • Unlike a traditional UNIX file system, an NFS server is stateless, which means it does not keep files open on behalf of its clients. As a result, on each request, the server must check the user's identity against the file's access permission attributes to see whether the user is authorised to access the file in the manner requested. Clients must transmit user authentication information (for example, the traditional UNIX 16-bit user ID and group ID) with each request, which is then checked against the access permission in the file attributes. These extra arguments are not shown in our NFS protocol overview because they are provided automatically by the RPC system.
This access-control method has a security flaw in its most basic form. On each host, an NFS server provides a standard RPC interface on a well-known port. Any process can act as a client, requesting access to or updates of a file from the server. The client can alter the RPC calls to include the user ID of any user, allowing the client to impersonate that user without the user's knowledge or permission. The addition of an option within the RPC protocol for DES encryption of the user's authentication information has closed this security hole. Kerberos has since been integrated with Sun NFS to provide a more robust and complete answer to the challenges of user authentication and security; we will go over this in more detail below.
A simplified representation of the RPC interface offered by NFS version 3 servers is defined in RFC 1813 [Callaghan et al. 1995]. The read, write, getattr, and setattr NFS file access operations are nearly identical to the Read, Write, GetAttributes, and SetAttributes operations specified for our flat file service model. The lookup procedure, as well as the majority of the other directory operations, is comparable to those in the directory service architecture. The file and directory operations are combined into a single service; both creating a file and inserting its name in a directory are handled by a single create operation, which accepts the new file's text name and the target directory's file handle as parameters. Create, remove, rename, link, symlink, readlink, mkdir, rmdir, readdir, and statfs are the other NFS operations on directories. With the exception of readdir, which provides a representation-independent mechanism for reading the contents of directories, and statfs, which provides status information about remote file systems, they are similar to their UNIX counterparts.
Mount service • A separate mount service process runs at user level on each NFS server computer and allows clients to mount subtrees of remote filesystems. A file with the well-known name /etc/exports on each server contains the names of local filesystems that are available for remote mounting. Each filesystem name has an access list that specifies which hosts are allowed to mount the filesystem.
289 CU IDOL SELF LEARNING MATERIAL (SLM)
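The sketch below illustrates the idea of an exports table with per-filesystem access lists, as described above. The directory names, host names, and helper function are hypothetical, and the format is deliberately simplified rather than the exact /etc/exports syntax.

# Hypothetical, simplified exports table: exported directory -> hosts allowed to mount it.
# (The real /etc/exports file has its own syntax and per-host options.)
EXPORTS = {
    "/export/people": {"client1.example.edu", "client2.example.edu"},
    "/nfs/users":     {"client1.example.edu"},
}

def may_mount(host: str, remote_dir: str) -> bool:
    """Return True if 'host' appears on the access list for 'remote_dir'."""
    allowed = EXPORTS.get(remote_dir)
    return allowed is not None and host in allowed

# The mount service would perform a check like this before returning a file handle
# for the requested directory to the client.
print(may_mount("client2.example.edu", "/export/people"))   # True
print(may_mount("client2.example.edu", "/nfs/users"))        # False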
Figure 15.1 Local and Remote File Systems
Clients request the mounting of a remote filesystem using a modified form of the UNIX mount command, specifying the remote host's name, the pathname of a directory in the remote filesystem, and the local name with which it is to be mounted. Any subtree of the required remote filesystem can be named as the remote directory, allowing clients to mount any part of the remote filesystem. The modified mount command communicates with the mount service process on the remote host using a mount protocol. This is an RPC protocol that includes an operation that takes a directory pathname and returns the file handle of the specified directory if the client has access permission for the relevant filesystem. The server's location (IP address and port number) and the file handle for the remote directory are passed on to the VFS layer and the NFS client. The nodes persons and users in the filesystems on Server 1 and Server 2 are mounted over the nodes students and staff in the client's local file store. This means that programmes running on the client can use pathnames such as /usr/students/jon and /usr/staff/ann to access files on Server 1 and Server 2.
15.4 ANDREW FILE SYSTEM
290 CU IDOL SELF LEARNING MATERIAL (SLM)
AFS, like NFS, gives UNIX programmes on workstations transparent access to remote shared files. AFS files are accessed using standard UNIX file primitives, allowing existing UNIX programmes to access AFS files without the need to recompile or modify them. NFS and AFS are compatible. AFS servers store 'local' UNIX files, but the servers' filing system is NFS-based, so files are referred to by NFS-style file handles rather than i-node numbers, and they may be accessed remotely over NFS.
In terms of design and implementation, AFS is very different from NFS. The main reason for the differences is that scalability was identified as the most critical design goal. Other distributed file systems are not designed to handle large numbers of active users as well as AFS is. The caching of whole files on client nodes is the crucial method for improving scalability. AFS has two distinct design features:
Whole-file serving: AFS servers send the whole contents of directories and files to client computers (files greater than 64 kbytes are delivered in 64-kbyte chunks in AFS-3).
Caching of entire files: Once a copy of a file or a chunk has been transmitted to a client computer, it is cached on the local disc. Several hundred of the computer's most recently used files are stored in the cache. The cache is persistent, surviving client computer reboots. Whenever possible, local copies of files are used to satisfy clients' open requests instead of remote copies.
The operation proceeds as follows (a sketch of this open/close flow is given at the end of this section):
1. When a user process on a client computer performs an open system call for a file in the shared file space, the server hosting the file is located and a request for a copy of the file is issued.
2. The copy is saved in the client computer's local UNIX file system. The copy is then opened, and the client receives the resulting UNIX file descriptor.
3. Processes on the client computer apply subsequent read, write, and other operations on the file to the local copy.
4. If the local copy has been modified, its contents are sent back to the server when the client process performs a close system call. The file's contents and timestamps are updated by the server. The copy on the client's local disc is kept in case a user-level process on the same workstation requires it again.
We will go through AFS's actual performance later, but based on the design characteristics given above, we can make several general observations and predictions now:
• Locally cached copies of shared files that are infrequently updated (such as those containing the code of UNIX commands and libraries) and files that are normally accessed by only a single user (such as most of the files in a user's home directory and its subtree) are likely to remain valid for long periods – in the first case because they are not updated, and in the second case because if they are, they are likely to be overwritten. The vast majority of file accesses are made to these types of files.
291 CU IDOL SELF LEARNING MATERIAL (SLM)
On each workstation, a significant percentage of the disc space – say, 100 megabytes – can be dedicated to the local cache. This is usually adequate for setting up a workable collection of files for a single user. The availability of sufficient cache storage for the construction of a working set guarantees that files used often on a specific workstation are generally stored in the cache until they are required again. The design technique is based on various assumptions regarding the average and maximum file size in UNIX systems, as well as the locality of file references. Observations of typical UNIX workloads at academic as well as other environments [Satyanarayanan 1981, Ousterhout et al. 1985, Floyd 1986] led to these assumptions. The following are the most relevant findings: - Files are small; the majority are less than 10 kilobytes in size. - File read operations are far more common than write operations (about six times more common). - Sequential access is the most prevalent, while random access is uncommon. - The majority of files are written and read by a single user. When a file is shared, it is common for only one user to make changes to it. - File references are made in bursts. If a file has recently been referred, it is very likely that it will be cited again in the near future. These findings were utilised to inform the analysis and design of AFS, rather than to limit the functionality available to users. The file types indicated in the first step above operate best with AFS. One form of file that does not fall into any of these categories is databases, which are often shared by many users and updated frequently. The designers of AFS explicitly excluded database storage from their design goals, claiming that the constraints imposed by different naming structures (that is, content-based access) and the need for fine-grained data access, concurrency control, and atomicity of updates make designing a distributed database system that is also a distributed file system difficult. They suggest that the provision of infrastructure for distributed databases should be dealt with separately. 292 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 15.2 Andrew File System The example above depicts how AFS works, but it leaves many concerns concerning its implementation unsolved. Among the most significant are the following: • How does AFS obtain control when a client issues an open or shut system call referring to either a file in the shared file space? Where is the requested file held on the server? In workstations, how much space is set aside for cached files? When files are updated by several clients, how does AFS ensure that the cached copies of files are up-to-date? Below are the answers to these questions. AFS is implemented as 2 software components known as Vice and Venus, which run as UNIX processes. The distribution of Vice & Venus processes is depicted in Figure. Venus is a user-level UNIX program that operates in each client computer & corresponds to a client module in our abstract model, while Vice is the name given to the server programme that runs as a user-level UNIX program in each server computer. Local or shared files are available to user processes operating on workstations. Local files are treated like any other UNIX file. They're kept on a workstation's hard drive and are only accessible to local user programmes. Shared files are kept on servers, and copies are cached on workstations' local discs. Figure depicts the name space that user processes see. It's a standard UNIX directory structure, with a subtree (named cmu) containing all the shared files. The division of the file name space in local and shared files results in a loss of location transparency, but users other than system administrators are unlikely to notice. 293 CU IDOL SELF LEARNING MATERIAL (SLM)
Only temporary files (/tmp) & processes required for workstation initialization are stored locally. Other common UNIX files (such those found in /bin, /lib, and so on) are implemented via symbolic links across local directories to shared space files. Users can access their files from every workstation because their folders are in the shared space. Each workstation and server's UNIX kernel is a customised version of BSD UNIX. The changes are intended to intercept open, close, and other file system calls that refer to files in the common name space then pass them to the client computer's Venus process. For performance considerations, another kernel patch is added, which will be discussed later. One of the file partitions on each workstation's local drive is used as a cache, storing cached copies of shared space files. When a new file is acquired from a server, Venus manages the cache, eliminating the least recently used files to free up space if the partition is full. Once a workable collection of the current user's files and commonly used system files has been cached, the workstation cache is usually large enough to handle several hundred average- sized files, rendering the workstation completely independent of the Vice servers. Figure 15.3 File Name Space In the following ways, AFS is similar to the abstract file service model described: The Vice servers provide a flat file service, while the Venus processes in the workstations provide the hierarchic directory structure needed by UNIX user programmes. A 96-bit file identification (fid), similar to a UFID, is assigned to each file and directory inside the shared file space. The Venus operations convert client-issued pathnames to fids. 294 CU IDOL SELF LEARNING MATERIAL (SLM)
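To tie the pieces of this section together, here is the sketch promised earlier: a minimal, single-machine illustration of the Venus-style open/close flow (fetch the whole file on open if it is not cached, operate on the local copy, write the whole file back on close if it was modified). The class and function names are illustrative assumptions and do not correspond to the real AFS code.

import os, tempfile

class ViceServerStub:
    """Stand-in for a Vice file server holding the master copies."""
    def __init__(self):
        self.files = {"/cmu/shared/notes.txt": b"original contents"}
    def fetch(self, path):
        return self.files[path]
    def store(self, path, data):
        self.files[path] = data

class VenusStub:
    """Stand-in for the Venus client process: whole-file caching on local disc."""
    def __init__(self, server, cache_dir):
        self.server = server
        self.cache_dir = cache_dir
        self.modified = set()

    def _local_path(self, path):
        return os.path.join(self.cache_dir, path.strip("/").replace("/", "_"))

    def open(self, path):
        local = self._local_path(path)
        if not os.path.exists(local):                 # step 1: fetch the whole file on a miss
            with open(local, "wb") as f:
                f.write(self.server.fetch(path))
        return local                                  # step 2: the caller works on the local copy

    def write(self, path, data):
        with open(self._local_path(path), "wb") as f: # step 3: operations hit the local copy
            f.write(data)
        self.modified.add(path)

    def close(self, path):
        if path in self.modified:                     # step 4: write back only if modified
            with open(self._local_path(path), "rb") as f:
                self.server.store(path, f.read())
            self.modified.discard(path)

server = ViceServerStub()
venus = VenusStub(server, tempfile.mkdtemp())
venus.open("/cmu/shared/notes.txt")
venus.write("/cmu/shared/notes.txt", b"edited locally")
venus.close("/cmu/shared/notes.txt")                  # master copy updated on close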
15.5 HADOOP DISTRIBUTED FILE SYSTEM AND MAP REDUCE
Hadoop is a framework for distributing the processing of large data volumes among clusters of computers using simple programming models. In simple terms, Hadoop is a platform for handling 'Big Data'. Hadoop was created by Doug Cutting, with Mike Cafarella as co-creator of the idea. It is built to scale from a single server to thousands of machines, each offering its own computation and storage capabilities. Hadoop is free and open-source software. The Hadoop Distributed File System (HDFS) is the storage component of Apache Hadoop, while the processing component is based on the MapReduce programming style. Within a cluster, Hadoop divides files into large blocks and distributes them among nodes. The packaged code is then transferred to the nodes, which process the data in parallel.
MapReduce is a programming approach for processing and producing massive data collections on clusters of computers. Google was the first to introduce it. MapReduce is a large-scale parallelization approach based on the map() and reduce() functions of functional programming. A MapReduce programme is divided into three stages (a small word-count sketch of these stages is given below):
Map: the job of a mapper is to process input data; each node applies the map function to its local data.
Shuffle: data is redistributed among the nodes based on the output keys (the map function generates the output keys).
Reduce: each group of output data, one group per key, is now processed in parallel.
HDFS is a file storage system for very large files, as the name implies. It has a number of distinguishing advantages, such as scalability and its distributed nature, that make it ideal for Big Data operations. HDFS stores data on commodity hardware and can run on large clusters with the ability to stream data for real-time processing. Though HDFS is comparable to other distributed file systems, it has certain characteristics and advantages that make it so widely used in Big Data Hadoop projects. Here are a few of the most crucial:
Working with very huge data sets
HDFS is designed to scale; this is one of its major assets. HDFS succeeds where many other distributed file systems struggle. It can store and retrieve petabytes of data on demand, making it ideal for Big Data applications. It is able to store massive amounts of data by distributing it among hundreds, if not thousands, of cheap and easily available commodity machines. HDFS provides very high aggregate data bandwidth.
295 CU IDOL SELF LEARNING MATERIAL (SLM)
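Here is the promised word-count sketch. It is a plain, single-process Python illustration of the map, shuffle, and reduce stages, not the Hadoop Java API; the function names and the sample input are assumptions made for the example.

from collections import defaultdict

def map_phase(line):
    # Map: emit a (key, value) pair for every word in an input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(mapped_pairs):
    # Shuffle: group all values by their output key.
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the values for one key (here, sum the counts).
    return key, sum(values)

lines = ["big data needs big storage", "storage and processing of big data"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(counts)   # e.g. {'big': 3, 'data': 2, 'storage': 2, ...}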
Using low-cost commodity hardware to store data
HDFS was designed from the ground up to handle Big Data workloads. Cost overruns are one of the most significant challenges when working with Big Data: if not properly managed, the hardware and infrastructure can cost millions of dollars. HDFS helps in this situation since it can run on low-cost commodity hardware. Hadoop can be deployed on standard personal computers, and HDFS works well in this environment. All of this translates to significant cost savings and the ability to scale at will.
Ability to write only once and read multiple times
HDFS files can be written once and read as many times as necessary. The primary assumption is that once a file has been written, it will not be overwritten, allowing it to be retrieved several times without difficulty. This contributes directly to HDFS's overall performance and resolves the issue of data coherency.
Providing real-time data access
When working with Big Data applications, it is critical to retrieve data in a streaming fashion. This is something HDFS accomplishes well because of its capacity to enable streaming data access. Rather than providing minimal latency in accessing a single file, the focus is on giving fast throughput to vast amounts of data; the emphasis is on streaming the data and retrieving it at high rates rather than on how it is stored.
Exceptional throughput
High throughput is one of HDFS's distinguishing features and is essential for Big Data applications. HDFS accomplishes this through a number of features and capabilities. The task is broken down into smaller chunks and split among several systems. As a result, the various components work individually and in parallel to complete the task at hand. Because data is read in parallel, the processing time is dramatically reduced, resulting in high throughput regardless of the size of the data files.
Moving computation rather than moving data
This is a distinctive feature of the Hadoop Distributed File System: it brings data processing closer to the data source rather than sending data over the network. When dealing with large amounts of data, moving it becomes particularly difficult, resulting in overburdened networks and slower data processing. HDFS solves this problem by allowing application code to run close to the data storage location for speedier computation.
Data replication and fault tolerance
Because the data is kept on low-cost commodity hardware, there is a trade-off, which manifests itself in more frequent failures of nodes or commodity hardware. However, HDFS circumvents this issue by storing data on at least three nodes. To be resilient in the event of node failure, two of these nodes are on the same rack whereas the third node is on a different rack.
296 CU IDOL SELF LEARNING MATERIAL (SLM)
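The following minimal sketch illustrates the replica placement rule just described (three copies, two on one rack and the third on a different rack). It is a simplified illustration with made-up node and rack names; the real HDFS placement policy considers further factors, such as the writer's location and available space.

import random

# Hypothetical cluster layout: node name -> rack name.
CLUSTER = {
    "node1": "rack-A", "node2": "rack-A",
    "node3": "rack-B", "node4": "rack-B",
    "node5": "rack-C",
}

def place_replicas(cluster, replication=3):
    """Pick nodes for one block: two replicas on one rack, one on another rack."""
    by_rack = {}
    for node, rack in cluster.items():
        by_rack.setdefault(rack, []).append(node)
    racks_with_two = [r for r, nodes in by_rack.items() if len(nodes) >= 2]
    first_rack = random.choice(racks_with_two)
    replicas = random.sample(by_rack[first_rack], 2)          # two copies on the same rack
    other_rack = random.choice([r for r in by_rack if r != first_rack])
    replicas.append(random.choice(by_rack[other_rack]))       # third copy on a different rack
    return replicas[:replication]

print(place_replicas(CLUSTER))   # e.g. ['node3', 'node4', 'node5']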
Because of the unique design of HDFS file storage, it provides a robust fault recovery mechanism, easy data replication, increased scalability, and data that is accessible to the entire system.
An in-depth look at Apache MapReduce
Hadoop clusters for Big Data applications use MapReduce as their data processing engine. The Mapper and the Reducer are the two functions that make up the basic architecture of a MapReduce programme. These two MapReduce pillars guarantee that any developer can write programmes to process data stored in a distributed file system such as HDFS. The Mapper is in charge of organising the input data into groups and passing it on to the Reducer. The Reducer then aggregates the Mapper output to remove redundancy; in a word-count job, for example, it keeps track of how many times each word was received from the Mappers. MapReduce has a number of advantages when processing data stored this way.
Hadoop applications use the Hadoop Distributed File System (HDFS) as their primary data storage system. HDFS is a distributed file system that uses a Name Node and Data Node architecture to allow high-performance data access across highly scalable Hadoop clusters. Hadoop is an open-source distributed processing system for big data applications that controls data processing and storage. HDFS is an important component of the Hadoop ecosystem. It provides a secure platform for managing large data sets and supporting big data analytics applications.
What is HDFS and how does it work?
HDFS allows data to be transferred quickly between compute nodes. It was initially tightly tied to MapReduce, a data processing framework that filters and divides work among cluster nodes, then organises and condenses the findings into a cohesive answer to a query. Similarly, when HDFS receives data, it divides it into individual blocks and distributes them throughout the cluster's nodes. Data is written to the server once, then read and reused several times with HDFS.
A primary Name Node in HDFS keeps track of where file data in the cluster is stored. On a commodity hardware cluster, HDFS also has several Data Nodes, typically one per node. In the data centre, the Data Nodes are usually grouped together in the same rack. For storage, data is broken down into individual blocks and distributed among the numerous Data Nodes. Additionally, blocks are copied between nodes, allowing for extremely efficient parallel processing. The Name Node knows which Data Node holds which blocks and where in the machine cluster the Data Nodes are located (a small sketch of this bookkeeping is given below). The Name Node also controls file access, including reads, writes, creates, and deletes, as well as data block replication between Data Nodes.
297 CU IDOL SELF LEARNING MATERIAL (SLM)
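The sketch below illustrates the bookkeeping just described: a Name-Node-like structure that records which Data Nodes hold which blocks and, anticipating the failure handling described in the following paragraphs, re-registers the blocks of a failed node on surviving nodes. The names and data are hypothetical; this is not the actual HDFS implementation.

class NameNodeStub:
    """Toy bookkeeping of block locations, in the spirit of the HDFS Name Node."""
    def __init__(self):
        self.block_locations = {}     # block id -> set of Data Nodes holding a replica

    def register_replica(self, block_id, datanode):
        self.block_locations.setdefault(block_id, set()).add(datanode)

    def locate(self, block_id):
        return self.block_locations.get(block_id, set())

    def handle_datanode_failure(self, failed, healthy_nodes, replication=3):
        """Remove the failed node and schedule new replicas on healthy nodes."""
        for block_id, nodes in self.block_locations.items():
            nodes.discard(failed)
            for candidate in healthy_nodes:
                if len(nodes) >= replication:
                    break
                if candidate not in nodes:
                    nodes.add(candidate)          # stand-in for copying the block over

nn = NameNodeStub()
for node in ("node1", "node3", "node5"):
    nn.register_replica("blk_001", node)
nn.handle_datanode_failure("node3", healthy_nodes=["node1", "node2", "node4", "node5"])
print(nn.locate("blk_001"))   # replica count restored to three without node3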
The Data Nodes and the Name Node together make up the HDFS cluster. As a result, the cluster may dynamically adjust to changing server capacity requirements in real time by adding or removing nodes as needed. The Name Node and the Data Nodes are in constant communication to determine whether the Data Nodes need to execute specific tasks. As a result, the Name Node is always aware of each Data Node's status. If the Name Node notices that one of the Data Nodes isn't functioning properly, it can reassign that Data Node's responsibility to another node that holds the same data block. Data Nodes can also interact with one another, allowing them to cooperate during routine file operations. Furthermore, HDFS is built to be extremely fault-tolerant. Each piece of data is replicated, or copied, numerous times by the file system, which then distributes the copies to various nodes, with at least one copy placed on a different server rack from the others.
Name Nodes and Data Nodes are the two components of the HDFS architecture. A primary/secondary architecture is used by HDFS. The Name Node of an HDFS cluster is the main server that handles the file system namespace and regulates client file access. The Name Node, being the central component of a Hadoop Distributed File System, maintains and controls the file system namespace and grants appropriate access permissions to clients. The Data Nodes in the system are in charge of the storage attached to the nodes they run on.
HDFS exposes a file system namespace and allows user data to be stored in files. A file is divided into one or more blocks, each of which is stored on a set of Data Nodes. The Name Node is responsible for file system namespace actions such as file and directory opening, closing, and renaming. The Name Node is also in charge of the block-to-Data-Node mapping. The Data Nodes handle read and write requests from the file system's clients. When the Name Node instructs them to, they also handle block creation, deletion, and replication.
Traditional hierarchical file organisation is supported by HDFS. An application or a user can make directories and then store files within them. A user can create, remove, rename, or move files from one directory to another in the file system namespace hierarchy, just as in most other file systems. Any modification to the file system namespace or its characteristics is recorded by the Name Node. The number of replicas of a file that HDFS should keep can be specified by an application. The replication factor of a file, which is stored in the Name Node, is the number of copies of that file.
HDFS's features
298 CU IDOL SELF LEARNING MATERIAL (SLM)
Fault tolerance and dependability are two important factors to consider. Fault tolerance and reliability are ensured by HDFS' capacity to replicate file blocks and store them across nodes in a large cluster. Availability is high. Data is available even if the Name Node or a Data Node fails, as previously noted, due to replication across notes. Scalability. Because HDFS stores data on multiple nodes in the cluster, a cluster can scale to hundreds of nodes as demand grows. Exceptional throughput. Data can be handled in parallel on a cluster of nodes since HDFS stores data in a distributed fashion. This, together with data proximity (see the following bullet), reduces processing time and allows for high throughput. Locality of data. Instead of having the data relocate to where the computing unit is, HDFS computes on the Data Nodes where the data sits. This strategy reduces network congestion and increases overall performance by shortening the distance between data and the computing process. What are the advantages of HDFS? The following are five major benefits of using HDFS: Efficiency in terms of cost. The Data Nodes that store the data use low-cost off-the-shelf hardware to save money on storage. There is also no licencing fee because HDFS is open source. Storage of large data sets. HDFS stores data in a variety of sizes and formats, including structured and unstructured data, ranging from gigabytes to petabytes. Recovery time after a hardware failure is short. HDFS is designed to detect and recover from errors on its own. Portability. HDFS can be used on any hardware platform and is compatible with a variety of operating systems, including Windows, Linux, and Mac OS X. Data access in real time. HDFS is designed for high data speed, making it ideal for accessing live data. Use cases and examples for HDFS Yahoo implemented the Hadoop Distributed File System as part of its online ad placement & search engine requirements. Yahoo, like other web-based businesses, juggled a number of apps that were accessible by an expanding number of users who were generating an increasing amount of data. eBay, Facebook, LinkedIn, and Twitter are just a few of the organisations that have embraced HDFS to power big data analytics to meet Yahoo's needs. 299 CU IDOL SELF LEARNING MATERIAL (SLM)
HDFS has applications that go beyond ad serving and search engine requirements. The New York Times used it for large-scale image conversions, Media6Degrees used it for log processing and machine learning, LiveBet used it for log storage and odds analysis, Joost used it for session analysis, and Fox Audience Network used it for log analysis and data mining. Many open source data lakes use HDFS as their backend.
HDFS is used by enterprises in a variety of industries to manage large data sets, including:
Electric utilities. The power sector uses phasor measurement units (PMUs) to monitor the health of smart grids throughout their transmission networks. At selected transmission stations, these high-speed sensors measure current and voltage by amplitude and phase. These organisations use PMU data to detect failures in network segments and adjust the grid accordingly. They might, for example, switch to a backup power source or adjust the load. PMU networks generate millions of data points per second; thus, power companies may benefit from low-cost, high-availability file systems like HDFS.
Marketing. Marketers that want to run targeted marketing initiatives need to know a lot about their target audiences. CRM systems, direct mail responses, point-of-sale systems, Facebook, and Twitter are all good places for marketers to collect this information. An HDFS cluster is a cost-effective place to put data before processing it, because much of that data is unstructured.
Oil and gas companies. Oil and gas firms deal with a wide range of data formats, including videos, 3D earth models, and machine sensor data, as well as very big data sets. An HDFS cluster can serve as a viable platform for the required big data analytics.
Research. Because data analysis is an important element of research, HDFS clusters are a cost-effective way to store, process, and analyse massive amounts of data.
Data replication in HDFS
The HDFS design includes data replication, which ensures that data is available even if a node or piece of hardware fails. The data is separated into blocks and replicated over multiple nodes in the cluster, as previously indicated. As a result, if one node fails, the user can still access the data stored on other machines. HDFS checks and maintains the replication level at regular intervals.
15.6 SUMMARY
To duplicate something means to make a copy or several copies of it. In terms of file replication, this means that you have a source or primary file of which you want to produce a single copy or several copies. As a result, replication control should be handled automatically and transparently to the user. On a replicated file, a read operation is performed by reading any copy of the file, and a write operation is performed by writing to all available copies of the file (a small sketch of this read-any/write-all rule is given below).
300 CU IDOL SELF LEARNING MATERIAL (SLM)
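As a closing illustration of the read-any/write-all rule stated in the summary, here is a minimal sketch. The replica names and the class are hypothetical; real systems add concurrency control and failure handling on top of this basic idea.

import random

class ReplicatedFile:
    """Read-any / write-all replication of a single file's contents."""
    def __init__(self, replicas):
        self.copies = {name: b"" for name in replicas}   # replica name -> file contents

    def read(self):
        # A read may be served by any single copy.
        replica = random.choice(list(self.copies))
        return self.copies[replica]

    def write(self, data):
        # A write must be applied to all available copies to keep them identical.
        for replica in self.copies:
            self.copies[replica] = data

f = ReplicatedFile(["serverA", "serverB", "serverC"])
f.write(b"version 1")
print(f.read())   # b'version 1', regardless of which replica served the read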