▪ read_item(T, X): This log entry records that transaction T reads the value of database item X.
▪ write_item(T, X, old_value, new_value): This log entry records that transaction T changes the value of database item X from old_value to new_value. The old value is sometimes known as the before-image of X, and the new value as the after-image of X.
▪ commit(T): This log entry records that transaction T has completed all accesses to the database successfully and that its effect can be committed (recorded permanently) to the database.
▪ abort(T): This records that transaction T has been aborted.
▪ checkpoint: A checkpoint is a mechanism by which all previous log records are moved from volatile storage and recorded permanently on disk. A checkpoint marks a point before which the DBMS was in a consistent state and all transactions had been committed.

A transaction T reaches its commit point when all of its operations that access the database have been executed successfully, i.e., the transaction has reached the point at which it will not abort (terminate without completing). Once committed, the transaction's effect is permanently recorded in the database. Commitment always involves writing a commit entry to the log and writing the log to disk. At the time of a system crash, the log is searched backwards for all transactions T that have written a start_transaction(T) entry but have not yet written a commit(T) entry; these transactions may have to be rolled back to undo their effect on the database during the recovery process.

Undoing – If a transaction crashes, the recovery manager may undo the transaction, i.e., reverse its operations. This involves examining the log for each write_item(T, X, old_value, new_value) entry of the transaction and setting the value of item X in the database back to old_value.

There are two major techniques for recovery from non-catastrophic transaction failures: deferred updates and immediate updates.

Deferred update – This technique does not physically update the database on disk until a transaction has reached its commit point. Before the commit, all transaction updates are recorded in the local transaction workspace. If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed. It may, however, be necessary to REDO the effects of the operations recorded in the log, because they may not yet have been written to the database. Hence, deferred update is also known as the NO-UNDO/REDO algorithm.

Immediate update – In this technique, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations are recorded in the log on disk before they are applied to the database, so recovery is still possible. If a transaction fails before reaching its commit point, the effect of its operations must be undone, i.e., the transaction must be rolled back; hence we require both UNDO and REDO. This technique is known as the UNDO/REDO algorithm.
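To make the two recovery passes concrete, here is a minimal sketch of log-based recovery for the immediate-update (UNDO/REDO) case in Python. It is an illustration only, not the algorithm of any particular DBMS; the tuple-based log format and the dictionary standing in for the database are assumptions made for the example.

```python
# A minimal sketch of log-based recovery (immediate update, UNDO/REDO).
# The log is a list of tuples; 'db' stands in for the database on disk.

def recover(log, db):
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # UNDO: scan the log backwards, restoring the before-image of every
    # write made by a transaction that never reached its commit point.
    for rec in reversed(log):
        if rec[0] == "write_item" and rec[1] not in committed:
            _, t, x, old_value, new_value = rec
            db[x] = old_value

    # REDO: scan the log forwards, reapplying the after-image of every
    # write made by a committed transaction (it may not have hit disk).
    for rec in log:
        if rec[0] == "write_item" and rec[1] in committed:
            _, t, x, old_value, new_value = rec
            db[x] = new_value
    return db

# T1 committed before the crash; T2 did not, so its write is undone.
log = [
    ("start_transaction", "T1"),
    ("write_item", "T1", "A", 100, 150),
    ("commit", "T1"),
    ("start_transaction", "T2"),
    ("write_item", "T2", "B", 20, 5),
]
print(recover(log, {"A": 100, "B": 5}))  # {'A': 150, 'B': 20}
```

Under the deferred-update (NO-UNDO/REDO) scheme, the UNDO pass would be unnecessary, because uncommitted writes never reach the database in the first place.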

Caching/Buffering – In this technique, one or more disk pages that include the data items to be updated are cached into main-memory buffers and updated in memory before being written back to disk. A collection of in-memory buffers, called the DBMS cache, is kept under the control of the DBMS for holding these pages. A directory is used to keep track of which database items are in the buffers. A dirty bit is associated with each buffer; it is 0 if the buffer has not been modified and 1 if it has.

Shadow paging – This technique provides atomicity and durability. A directory with n entries is constructed, where the ith entry points to the ith database page on disk. When a transaction begins executing, the current directory is copied into a shadow directory. When a page is to be modified, a shadow page is allocated, the changes are made to it, and when the transaction is ready to become durable, all references to the original page are updated to refer to the new replacement page.

Some of the backup techniques are as follows:

Full database backup – The full database, including the data and the database meta-information needed to restore the whole database (such as full-text catalogues), is backed up on a predefined schedule.

Differential backup – This stores only the data changes that have occurred since the last full database backup. When the same data has changed many times since the last full backup, a differential backup stores only the most recent version of the changed data. Restoring from it therefore requires restoring the full database backup first.

Transaction log backup – This backs up all events that have occurred in the database, i.e., a record of every statement executed. It is a backup of the transaction log entries and contains every transaction that has happened to the database. Through it, the database can be recovered to a specific point in time. It is even possible to take a backup of the transaction log after the data files are destroyed, so that not even a single committed transaction is lost.
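The copy-on-write idea behind shadow paging fits in a few lines of code. The sketch below is a simplified illustration, assuming pages are plain Python values and directories are lists; a real DBMS works with fixed-size disk pages and must switch directories atomically.

```python
# A simplified sketch of shadow paging. 'pages' simulates the pool of
# database pages on disk; a directory maps page numbers to pages.

pages = {0: "page-A", 1: "page-B", 2: "page-C"}
next_free = 3                    # next unused slot in the page pool

current_dir = [0, 1, 2]          # ith entry points to the ith database page
shadow_dir = list(current_dir)   # copied when the transaction starts

def write_page(i, new_contents):
    """Copy-on-write: never overwrite a page the shadow directory points to."""
    global next_free
    pages[next_free] = new_contents
    current_dir[i] = next_free   # redirect only the current directory
    next_free += 1

write_page(1, "page-B-modified")

# To commit, atomically install the current directory as the new shadow
# directory; to abort, simply discard current_dir and keep shadow_dir --
# the original page is untouched, so no UNDO is ever needed.
shadow_dir = list(current_dir)
print([pages[p] for p in shadow_dir])  # ['page-A', 'page-B-modified', 'page-C']
```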

15.2 DATABASE SECURITY

Database security encompasses a range of security controls designed to protect the database management system (DBMS). The types of database security measures your business should use include protecting the underlying infrastructure that houses the database (such as the network and servers), securely configuring the DBMS, and controlling access to the data itself.

Database security controls
Database security encompasses multiple controls, including system hardening, access control, DBMS configuration, and security monitoring. These different security controls help prevent security protocols from being circumvented.

System hardening and monitoring
The underlying architecture provides additional access to the DBMS. It is vital that all systems are patched consistently, hardened using known security configuration standards, and monitored for access, including insider threats.

DBMS configuration
It is critical that the DBMS be properly configured and hardened to take advantage of security features and to limit privileged access that could cause a misconfiguration of expected security settings. Monitoring the DBMS configuration and ensuring proper change control processes help ensure that the configuration stays consistent.

Authentication
Database security measures include authentication, the process of verifying whether a user's credentials match those stored in your database, and permitting only authenticated users access to your data, networks, and database platform.

Access
A primary outcome of database security is the effective limitation of access to your data. Access controls authenticate legitimate users and applications, limiting what they can access in your database. Access includes designing and granting appropriate user attributes and roles, and limiting administrative privileges.

Database auditing
Monitoring (or auditing) actions as part of a database security protocol delivers centralized oversight of your database. Auditing helps to detect, deter, and reduce the overall impact of unauthorized access to your DBMS.

Backups
A data backup, as part of your database security protocol, makes a copy of your data and stores it on a separate system. This backup allows you to recover data lost through hardware failure, data corruption, theft, hacking, or natural disaster.

Encryption
Database security can include the secure management of encryption keys, protection of the encryption system, management of secure off-site encryption backups, and access restriction protocols.

Application security
Database and application security framework measures can help protect against common known attacker exploits, such as SQL injection, that can circumvent access controls; a short example of the injection problem is sketched below.
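To illustrate the SQL injection risk just mentioned, here is a minimal sketch using Python's built-in sqlite3 module. The users table and the injection payload are hypothetical, invented for the example; the point is the parameter placeholder in the second query, which keeps attacker-controlled input from rewriting the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

user_input = "x' OR '1'='1"  # a classic injection payload

# UNSAFE: string concatenation lets the input become part of the SQL,
# so the WHERE clause is always true and every row is returned.
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'").fetchall()

# SAFE: a parameterized query treats the input as a plain value.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()

print(unsafe)  # [('alice', 'admin')] -- the injection succeeded
print(safe)    # [] -- no user is literally named "x' OR '1'='1"
```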

Why is database security important?
Safeguarding the data your company collects and manages is of the utmost importance. Database security can guard against a compromise of your database, which can lead to financial loss, reputational damage, loss of consumer confidence, brand erosion, and non-compliance with government and industry regulations. Database security safeguards defend against a myriad of security threats and can help protect your enterprise from:

▪ Deployment failure
▪ Excessive privileges
▪ Privilege abuse
▪ Platform vulnerabilities
▪ Unmanaged sensitive data
▪ Backup data exposure
▪ Weak authentication
▪ Database injection attacks

15.3 INTRODUCTION TO DISTRIBUTED DATA PROCESSING

Data processing means ingesting massive amounts of data into the system from several different sources, such as IoT devices, social platforms, satellites, wireless networks and software logs, and running business logic/algorithms on the data to extract meaningful information from it. Running algorithms on the data and extracting information from it is also known as data analytics. Data analytics helps businesses use the information extracted from raw, unstructured and semi-structured data at terabyte and petabyte scale to create better products, understand what their customers want, understand their usage patterns, and subsequently evolve their service or product.

Distributed data processing means diverting this massive amount of data to several different nodes running in a cluster for processing. All the nodes execute their allotted tasks in parallel, working in conjunction with each other, connected by a network. The entire set-up is scalable and highly available.

Figure 15.1 Distributed Data Processing

Processing data in a distributed environment helps accomplish a task in significantly less time than running it on a centralized data processing system, simply because the task is shared by many resources/machines and executed in parallel instead of being run sequentially from a queue. Since the data is processed in less time, it is cost-effective for businesses and helps them move fast. Running a workload in a distributed environment also makes it more scalable, elastic and available. There is no single point of failure. The workload can be scaled both horizontally and vertically. Data is made redundant and replicated across the cluster to avoid any sort of data loss.

Working of distributed data processing: In a distributed data processing system, a massive amount of data flows into the system from several different sources. This process of data inflow is known as data ingestion. Once the data streams in, different layers in the system architecture break the entire processing down into several parts.

Figure 15.2 Distributed Data Processing Layers

Data Collection & Preparation Layer: This layer takes care of collecting data from different external sources and preparing it to be processed by the system. When the data streams in, it has no standard structure; it is raw, unstructured or semi-structured in nature. It may be a blob of text, audio, video, an image, tax return forms, insurance forms, medical bills, etc. The task of the data preparation layer is to convert the data into a consistent, standard format and to classify it as per the business logic to be processed by the system. The layer is intelligent enough to achieve all this without any sort of human intervention; a small normalization sketch follows below.
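As a minimal illustration of what the preparation step does, the sketch below normalizes records that arrive in different shapes into one standard format. The field names and the classification rule are assumptions made for the example.

```python
# A minimal sketch of a data preparation step: records arrive from
# different sources in different shapes and leave in one standard shape.

def prepare(record: dict) -> dict:
    text = str(record.get("text", "")).strip().lower()
    return {
        # tolerate either "user_id" or "id" in the raw record
        "user_id": str(record.get("user_id") or record.get("id", "")),
        # normalized free text so downstream logic sees one format
        "text": text,
        # classify per a (hypothetical) business rule
        "category": "medical" if "bill" in text else "general",
    }

raw = [
    {"id": 42, "text": "  Hospital BILL for March "},
    {"user_id": "u7", "text": "New satellite image batch"},
]
for r in raw:
    print(prepare(r))
# {'user_id': '42', 'text': 'hospital bill for march', 'category': 'medical'}
# {'user_id': 'u7', 'text': 'new satellite image batch', 'category': 'general'}
```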

Data Security Layer: Moving data is vulnerable to security breaches. The role of the data security layer is to ensure that data in transit stays secure, by watching over it throughout and applying security protocols and encryption.

Data Storage Layer: Once the data streams in, it has to be persisted. There are different approaches to doing this. If analytics is run on streaming data in real time, in-memory distributed caches are used to store and manage the data. If, on the other hand, the data is processed in a traditional way, such as batch processing, distributed databases built for handling big data are used to store it.

Data Processing Layer: This layer contains the logic that is responsible for processing the data. It runs business logic on the data to extract meaningful information from it. Machine learning and predictive, descriptive and decision modelling are primarily used for this.

Data Visualization Layer: All the information extracted is sent to the data visualization layer, which typically contains browser-based dashboards that display the information in the form of graphs, charts, infographics, etc.

Types of Distributed Data Processing
There are primarily two types: batch processing and real-time streaming data processing.

Batch Processing
Batch processing is the traditional data processing technique in which chunks of data are streamed in batches and processed. The processing is either scheduled for a certain time of day, happens at regular intervals, or is ad hoc, but it is not real-time. A sketch of how batches are processed in parallel appears after the next paragraph.

Real-time Streaming Data Processing
In this type of data processing, data is processed in real time as it streams in. Analytics is run on the data to extract insights from it. A good use case is getting insights from sports data: as the game goes on, the data ingested from social media and other sources is analysed in real time to figure out viewers' sentiments, player statistics, predictions, etc.
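To show why distributing work speeds up batch processing, here is a minimal single-machine sketch that splits a workload into chunks and processes them in parallel with Python's multiprocessing pool. The word-count task and the number of workers are assumptions made for the example; in a real cluster, the workers would be separate nodes connected by a network.

```python
# A minimal sketch of parallel batch processing: split the workload
# into chunks, process the chunks in parallel, merge the partial results.

from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    """Process one batch: count word occurrences in a list of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

if __name__ == "__main__":
    lines = ["big data", "big cluster", "data data"] * 1000
    size = len(lines) // 4  # split the workload into 4 chunks
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]

    with Pool(processes=4) as pool:
        partials = pool.map(count_words, chunks)  # executed in parallel

    print(sum(partials, Counter()))
    # Counter({'data': 3000, 'big': 2000, 'cluster': 1000})
```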

Pros
1. Distributed data processing facilitates scalability, high availability, fault tolerance, replication and redundancy, which are typically not available in centralized data processing systems.
2. Parallel distribution of work facilitates faster execution. Enforcing security, authentication and authorization workflows also becomes easier, as the system is more loosely coupled.

Cons
1. Setting up and working with a distributed system is complex, which is to be expected with so many nodes working in conjunction with each other while maintaining a consistent shared state.
2. The management of distributed systems is complex. Since the machines are distributed, the system entails additional network latency, which engineering teams have to deal with. Strong consistency of data is hard to maintain when everything is so distributed.

15.4 SUMMARY

• Recovery techniques are heavily dependent upon the existence of a special file known as the system log. It contains information about the start and end of each transaction and any updates which occur during the transaction. The log keeps track of all transaction operations that affect the values of database items. This information is needed to recover from transaction failures.
• Data Collection & Preparation Layer: This layer takes care of collecting data from different external sources and preparing it to be processed by the system.
• Data Security Layer: Moving data is vulnerable to security breaches. The role of the data security layer is to ensure that data in transit stays secure, by watching over it throughout and applying security protocols and encryption.
• Data Storage Layer: Once the data streams in, it has to be persisted. There are different approaches to doing this.
• Data Processing Layer: This layer contains the logic that is responsible for processing the data.
• Data Visualization Layer: All the information extracted is sent to the data visualization layer, which typically contains browser-based dashboards that display the information in the form of graphs, charts, infographics, etc.

15.5 KEYWORDS

▪ Commit(T): This log entry records that transaction T has completed all accesses to the database successfully and that its effect can be committed (recorded permanently) to the database.
▪ Abort(T): This records that transaction T has been aborted.
▪ Checkpoint: A checkpoint is a mechanism by which all previous log records are moved from volatile storage and recorded permanently on disk.

15.6 LEARNING ACTIVITY

Blood Donation Management
You can create a database project for a blood donation clinic. Start by adding donor names and assigning them unique IDs, along with relevant details such as blood type, medical report, and contact number. Similarly, add patient names with unique IDs, details of their medical conditions, and blood types. After you have created a database of patients and donors, you can work on a database for the blood bank itself: add the name of the blood bank, its staff details, operating hours, and address. You can also add a recovery module. A possible starting schema is sketched below.

___________________________________________________________________________
___________________________________________________________________________
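As one possible starting point for this activity, here is a minimal schema sketch using Python's built-in sqlite3 module. The table and column names are suggestions only; extend them with the staff, operating-hours and recovery details the activity asks for.

```python
import sqlite3

# A minimal starting schema for the blood donation activity.
conn = sqlite3.connect("blood_donation.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS donor (
    donor_id       INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    blood_type     TEXT NOT NULL,   -- e.g. 'O+', 'AB-'
    medical_report TEXT,
    contact_number TEXT
);
CREATE TABLE IF NOT EXISTS patient (
    patient_id        INTEGER PRIMARY KEY,
    name              TEXT NOT NULL,
    blood_type        TEXT NOT NULL,
    medical_condition TEXT
);
CREATE TABLE IF NOT EXISTS blood_bank (
    bank_id         INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    address         TEXT,
    operating_hours TEXT
);
""")
conn.execute(
    "INSERT INTO donor (name, blood_type, contact_number) VALUES (?, ?, ?)",
    ("A. Donor", "O+", "9999999999"))
conn.commit()
print(conn.execute("SELECT name, blood_type FROM donor").fetchall())
```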

15.7 UNIT END QUESTIONS

A. Descriptive Questions
Short Questions
1. What is meant by database recovery?
2. Define commit, abort and checkpoint.
3. What is meant by shadow paging?
4. What is the data security layer?
5. Write short notes on the data visualization layer.

Long Questions
1. Explain distributed data processing in detail.
2. Discuss database recovery.
3. Explain database recovery techniques in detail.
4. What is meant by database recovery? Explain the process of database recovery.
5. List and briefly explain the types of distributed data processing.

B. Multiple Choice Questions
1. In the ___________ scheme, a transaction that wants to update the database first creates a complete copy of the database.
a. Shadow copy
b. Shadow paging
c. Update log records
d. All of these

2. The ____________ scheme uses a page table containing pointers to all pages; the page table itself and all updated pages are copied to a new location.
a. Shadow copy
b. Shadow paging
c. Update log records
d. All of these

3. ____________ using a log record sets the data item specified in the log record to the old value.
a. Deferred modification
b. Late modification
c. Immediate modification
d. Undo

4. Which of the following terms refers to the correctness and completeness of the data in a database?
a. Data security
b. Data constraint
c. Data independence
d. Data integrity

5. A distributed database is which of the following?
a. A single logical database that is spread across multiple locations and interconnected by a network
b. A loose collection of files spread across multiple locations and interconnected by a network
c. A single logical database that is limited to one location
d. A loose collection of files limited to one location

Answers
1. (a) 2. (b) 3. (d) 4. (d) 5. (a)

15.8 REFERENCES

Text Books:
• T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi.
• T2 C.J. Date, An Introduction to Database Systems, Pearson Education, New Delhi.
• T3 C.J. Date and H. Darwen, A Guide to the SQL Standard, Addison-Wesley, Reading.

Reference Books:
• R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition.
• R2 Ivan Bayross, SQL, PL/SQL, BPB Publications.

