▪ read_item(T, X): This log entry records that transaction T reads the value of database item X.
▪ write_item(T, X, old_value, new_value): This log entry records that transaction T changes the value of the database item X from old_value to new_value. The old value is sometimes known as the before image of X, and the new value is known as the after image of X.
▪ commit(T): This log entry records that transaction T has completed all accesses to the database successfully and its effect can be committed (recorded permanently) to the database.
▪ abort(T): This records that transaction T has been aborted.
▪ checkpoint: A checkpoint is a mechanism by which all the previous logs are removed from the system and stored permanently on disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.

A transaction T reaches its commit point when all its operations that access the database have been executed successfully, i.e. the transaction has reached the point at which it will not abort (terminate without completing). Once committed, the transaction is permanently recorded in the database. Commitment always involves writing a commit entry to the log and writing the log to disk. After a system crash, the log is searched backward for all transactions T that have written a start_transaction(T) entry but have not yet written a commit(T) entry; these transactions may have to be rolled back to undo their effect on the database during the recovery process.

Undoing – If a transaction crashes, the recovery manager may undo the transaction, i.e. reverse its operations. This involves examining the log for each write_item(T, X, old_value, new_value) entry of the transaction and setting the value of item X in the database back to old_value. There are two major techniques for recovery from non-catastrophic transaction failures: deferred updates and immediate updates.

Deferred update – This technique does not physically update the database on disk until a transaction has reached its commit point. Before reaching the commit point, all transaction updates are recorded in the local transaction workspace. If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed. It may be necessary to REDO the effect of the operations that are recorded in the local transaction workspace, because their effect may not yet have been written to the database. Hence, deferred update is also known as the NO-UNDO/REDO algorithm.

Immediate update – In the immediate update technique, the database may be updated by some operations of a transaction before the transaction reaches its commit point. However, these operations are recorded in the log on disk before they are applied to the database, so recovery is still possible. If a transaction fails before reaching its commit point, the effect of its operations must be undone, i.e. the transaction must be rolled back; hence both UNDO and REDO are required. This technique is known as the UNDO/REDO algorithm.
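The undo and redo ideas above can be made concrete with a small sketch. The following Python code is purely illustrative: the log is modelled as an in-memory list of tuples and the database as a dictionary, both invented for the example and not corresponding to any real DBMS structures.

```python
# Minimal sketch of log-based recovery (undo/redo), for illustration only.
# A log record is a tuple: ("start", T), ("write", T, X, old_value, new_value),
# ("commit", T) or ("abort", T).

def recover(log, database):
    """Redo committed transactions, then undo the uncommitted ones."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    # REDO phase: re-apply the after images of committed transactions in log order.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, T, X, old_value, new_value = rec
            database[X] = new_value

    # UNDO phase: scan the log backward and restore the before images of
    # transactions that never reached their commit point.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, T, X, old_value, new_value = rec
            database[X] = old_value
    return database

# T1 committed, T2 did not, so T2's change to item B is rolled back.
log = [("start", "T1"), ("write", "T1", "A", 10, 20), ("commit", "T1"),
       ("start", "T2"), ("write", "T2", "B", 5, 7)]
print(recover(log, {"A": 10, "B": 7}))   # {'A': 20, 'B': 5}
```

A deferred-update (NO-UNDO/REDO) recovery would keep only the REDO phase, since uncommitted transactions never touch the database in that scheme.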
Caching/Buffering – In this technique, one or more disk pages that include the data items to be updated are cached into main memory buffers and then updated in memory before being written back to disk. A collection of in-memory buffers, called the DBMS cache, is kept under the control of the DBMS for holding these buffers. A directory is used to keep track of which database items are in the buffers. A dirty bit is associated with each buffer; it is 0 if the buffer has not been modified and 1 if it has.

Shadow paging – This technique provides atomicity and durability. A directory with n entries is constructed, where the ith entry points to the ith database page on disk. When a transaction begins executing, the current directory is copied into a shadow directory. When a page is to be modified, a new page is allocated in which the changes are made, and when the transaction is ready to become durable, all directory entries that refer to the original page are updated to refer to the new replacement page.

Some of the backup techniques are as follows:

Full database backup – In this technique, the full database, including the data and the database meta information needed to restore the whole database (such as full-text catalogues), is backed up at predefined intervals.

Differential backup – It stores only the data changes that have occurred since the last full database backup. When the same data has changed many times since the last full database backup, a differential backup stores only the most recent version of the changed data. To restore from it, we first need to restore the full database backup.

Transaction log backup – In this technique, all events that have occurred in the database, i.e. a record of every single statement executed, are backed up. It is a backup of the transaction log entries and contains all transactions that have happened to the database. Through this, the database can be recovered to a specific point in time. It is even possible to take a backup from the transaction log when the data files have been destroyed, so that not even a single committed transaction is lost.
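The way the three backup types fit together can be shown with a small restore sketch. The code below is illustrative only: the backup records and the function are hypothetical and are not the API of any particular DBMS. It shows the usual restore order, i.e. the latest full backup first, then the latest differential backup, then the transaction log backups replayed up to the desired point in time.

```python
from datetime import datetime

# Hypothetical backup records: (type, taken_at, payload).
def restore_plan(backups, target_time):
    """Return the restore order: full -> latest differential -> log replay."""
    fulls = [b for b in backups if b[0] == "full" and b[1] <= target_time]
    last_full = max(fulls, key=lambda b: b[1])            # most recent usable full backup

    diffs = [b for b in backups
             if b[0] == "differential" and last_full[1] <= b[1] <= target_time]
    last_diff = max(diffs, key=lambda b: b[1]) if diffs else None

    base_time = last_diff[1] if last_diff else last_full[1]
    logs = sorted(b for b in backups
                  if b[0] == "log" and base_time <= b[1] <= target_time)

    return [last_full] + ([last_diff] if last_diff else []) + logs

backups = [("full", datetime(2021, 5, 1), "..."),
           ("differential", datetime(2021, 5, 3), "..."),
           ("log", datetime(2021, 5, 3, 12, 0), "..."),
           ("log", datetime(2021, 5, 4, 0, 0), "...")]
for kind, taken_at, _ in restore_plan(backups, datetime(2021, 5, 4)):
    print(kind, taken_at)
```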
15.2 DATABASE SECURITY

Database security encompasses a range of security controls designed to protect the Database Management System (DBMS). The types of database security measures your business should use include protecting the underlying infrastructure that houses the database (such as the network and servers), securely configuring the DBMS, and controlling access to the data itself.

Database security controls
Database security encompasses multiple controls, including system hardening, access control, DBMS configuration, and security monitoring. Together, these controls help prevent security protocols from being circumvented.

System hardening and monitoring
The underlying architecture provides additional access to the DBMS. It is vital that all systems are patched consistently, hardened using known security configuration standards, and monitored for access, including insider threats.

DBMS configuration
It is critical that the DBMS be properly configured and hardened to take advantage of security features and to limit privileged access that may cause a misconfiguration of expected security settings. Monitoring the DBMS configuration and ensuring proper change control processes help ensure that the configuration stays consistent.

Authentication
Database security measures include authentication, the process of verifying whether a user's credentials match those stored in your database, and permitting only authenticated users access to your data, networks, and database platform.

Access
A primary outcome of database security is the effective limitation of access to your data. Access controls authenticate legitimate users and applications, limiting what they can access in your database. Access includes designing and granting appropriate user attributes and roles and limiting administrative privileges.

Database auditing
Monitoring (or auditing) actions as part of a database security protocol delivers centralized oversight of your database. Auditing helps to detect, deter, and reduce the overall impact of unauthorized access to your DBMS.

Backups
A data backup, as part of your database security protocol, makes a copy of your data and stores it on a separate system. This backup allows you to recover data lost through hardware failure, data corruption, theft, hacking, or natural disasters.

Encryption
Database security can include the secure management of encryption keys, protection of the encryption system, management of a secure, off-site encryption backup, and access restriction protocols.

Application security
Database and application security framework measures can help protect against common known attacker exploits that can circumvent access controls, including SQL injection.
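As a small illustration of the SQL injection point above, the sketch below uses Python's standard sqlite3 module purely as an example driver; the table, column names and payload are invented for the demonstration. Building the SQL text by string concatenation lets attacker-controlled input change the meaning of the query, while a parameterized query keeps that input as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

user_input = "anything' OR '1'='1"   # a classic injection payload

# Vulnerable: the input is concatenated into the SQL text, altering the query logic.
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '" + user_input + "'").fetchall()

# Safer: a parameterized query treats the input purely as a value to compare against.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()

print(unsafe)  # returns the 'alice' row even though no such name exists
print(safe)    # returns no rows
```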
Why is database security important?
Safeguarding the data your company collects and manages is of utmost importance. Database security can guard against a compromise of your database, which can lead to financial loss, reputational damage, loss of consumer confidence, brand erosion, and non-compliance with government and industry regulations.
Database security safeguards defend against a myriad of security threats and can help protect your enterprise from:

▪ Deployment failure
▪ Excessive privileges
▪ Privilege abuse
▪ Platform vulnerabilities
▪ Unmanaged sensitive data
▪ Backup data exposure
▪ Weak authentication
▪ Database injection attacks

15.3 INTRODUCTION TO DISTRIBUTED DATA PROCESSING

Data processing is the ingestion of massive amounts of data into the system from several different sources, such as IoT devices, social platforms, satellites, wireless networks and software logs, and the running of business logic/algorithms on that data to extract meaningful information from it. Running algorithms on the data and extracting information from it is also known as data analytics.

Data analytics helps businesses use the information extracted from raw, unstructured and semi-structured data at terabyte and petabyte scale to create better products, understand what their customers want, understand their usage patterns, and subsequently evolve their service or product.

Distributed data processing means distributing a massive amount of data across several different nodes running in a cluster for processing. All the nodes execute the tasks allotted to them in parallel, working in conjunction with each other over a network. The entire set-up is scalable and highly available.

Figure 15.1 Distributed Data Processing
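To make the idea of work being shared by many nodes concrete, the following sketch uses a Python multiprocessing pool as a stand-in for a cluster (in a real distributed system the workers would run on separate machines connected by a network); the data set and the word-count logic are invented for the illustration.

```python
from multiprocessing import Pool
from collections import Counter

def process_chunk(chunk):
    """Business logic run by each 'node': count words in its share of the data."""
    counts = Counter()
    for record in chunk:
        counts.update(record.split())
    return counts

if __name__ == "__main__":
    data = ["the cat sat", "the dog ran", "the cat ran"] * 1000   # toy data set
    chunks = [data[i::4] for i in range(4)]                       # split the data across 4 workers

    with Pool(processes=4) as pool:
        partial_results = pool.map(process_chunk, chunks)         # chunks processed in parallel

    total = sum(partial_results, Counter())                       # merge the partial results
    print(total.most_common(3))
```

Splitting the data, processing the chunks in parallel, and merging the partial results is the same pattern a distributed cluster follows, only across machines rather than processes.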
Processing data in a distributed environment helps accomplish a task in significantly less time than a centralized data processing system, simply because the task is shared by many resources/machines and executed in parallel instead of being run one step at a time from a queue. Since the data is processed in less time, it is cost-effective for businesses and helps them move fast. Running a workload in a distributed environment also makes it more scalable, elastic and available. There is no single point of failure. The workload can be scaled both horizontally and vertically. Data is made redundant and replicated across the cluster to avoid any sort of data loss.

Working of Distributed Data Processing:
In a distributed data processing system, a massive amount of data flows into the system from several different sources. This process of data flow is known as data ingestion. Once the data streams in, different layers in the system architecture break the entire processing down into several different parts.

Figure 15.2 Distributed Data Processing Layers

Data Collection & Preparation Layer: This layer takes care of collecting data from different external sources and preparing it to be processed by the system. When the data streams in, it has no standard structure; it is raw, unstructured or semi-structured in nature. It may be a blob of text, audio, video, images, tax return forms, insurance forms, medical bills, etc. The task of the data preparation layer is to convert the data into a consistent, standard format and to classify it as per the business logic to be processed by the system. The layer is intelligent enough to achieve all this without any sort of human intervention.
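A toy sketch of what this preparation step might look like follows; the record formats, field names and classification rule are invented for the illustration, and a real preparation layer would be far more elaborate.

```python
import json
from datetime import datetime, timezone

def prepare(raw_record):
    """Convert a raw record of unknown shape into one consistent, standard format."""
    if isinstance(raw_record, bytes):                 # e.g. a blob pulled off a message queue
        raw_record = raw_record.decode("utf-8", errors="replace")
    try:
        fields = json.loads(raw_record)               # semi-structured input (JSON)
        kind = "semi-structured"
    except json.JSONDecodeError:
        fields = {"text": raw_record}                 # fall back to treating it as plain text
        kind = "unstructured"
    return {
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "kind": kind,                                 # simple classification for later routing
        "payload": fields,
    }

print(prepare(b'{"device": "sensor-7", "temp": 21.5}'))
print(prepare("free text from a social platform"))
```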
Data Security Layer: Moving data is vulnerable to security breaches. The role of the data security layer is to ensure that the data is secure in transit, by watching over it throughout and applying security protocols and encryption.

Data Storage Layer: Once the data streams in, it has to be persisted. There are different approaches to doing this. If analytics is run on streaming data in real time, in-memory distributed caches are used to store and manage the data. On the contrary, if the data is being processed in a traditional way, such as batch processing, distributed databases built for handling big data are used to store it.

Data Processing Layer: This is the layer that does the real work; it is responsible for processing the data. The layer runs business logic on the data to extract meaningful information from it. Machine learning and predictive, descriptive and decision modelling are primarily used for this.

Data Visualization Layer
All the information extracted is sent to the data visualization layer, which typically contains browser-based dashboards that display the information in the form of graphs, charts, infographics, etc.

Types of Distributed Data Processing
There are primarily two types: batch processing and real-time streaming data processing. A small sketch contrasting the two follows the descriptions below.

Batch Processing
Batch processing is the traditional data processing technique where chunks of data are streamed in batches and processed. The processing is either scheduled for a certain time of day, happens at regular intervals, or is run at an arbitrary time, but it is not real-time.

Real-time Streaming Data Processing
In this type of data processing, data is processed in real time as it streams in. Analytics is run on the data to get insights from it. A good use case of this is getting insights from sports data: as the game goes on, the data ingested from social media and other sources is analysed in real time to figure out viewers' sentiment, player stats, predictions, etc.
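The sketch below contrasts the two styles in plain Python, assuming a made-up stream of score events; batch processing accumulates the whole batch before computing, while streaming processing updates the result as each event arrives.

```python
import time

events = [("team_a", 1), ("team_b", 2), ("team_a", 3)]   # toy score events

# Batch processing: wait until the whole batch is available, then run the analytics once.
def batch_totals(batch):
    totals = {}
    for team, points in batch:
        totals[team] = totals.get(team, 0) + points
    return totals

print("batch result:", batch_totals(events))

# Real-time streaming: update the running totals as each event streams in.
def stream(source):
    for event in source:
        time.sleep(0.1)          # simulate events arriving over time
        yield event

running = {}
for team, points in stream(events):
    running[team] = running.get(team, 0) + points
    print("streaming update:", running)
```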
Pros
1. Distributed data processing facilitates scalability, high availability, fault tolerance, replication and redundancy, which are typically not available in centralized data processing systems.
2. Parallel distribution of work facilitates faster execution. Enforcing security, authentication and authorization workflows also becomes easier as the system is more loosely coupled.

Cons
1. Setting up and working with a distributed system is complex. That is to be expected, with so many nodes working in conjunction with each other while maintaining a consistent shared state.
2. The management of distributed systems is complex. Since the machines are distributed, it entails additional network latency, which engineering teams have to deal with. Strong consistency of data is hard to maintain when everything is so distributed.

15.4 SUMMARY

• Recovery techniques are heavily dependent upon the existence of a special file known as a system log. It contains information about the start and end of each transaction and any updates which occur during the transaction. The log keeps track of all transaction operations that affect the values of database items. This information is needed to recover from transaction failure.
• Data Collection & Preparation Layer: This layer takes care of collecting data from different external sources and preparing it to be processed by the system.
• Data Security Layer: Moving data is vulnerable to security breaches. The role of the data security layer is to ensure that the data is secure in transit, by watching over it throughout and applying security protocols and encryption.
• Data Storage Layer: Once the data streams in, it has to be persisted. There are different approaches to doing this.
• Data Processing Layer: This is the layer that contains the processing logic; it is responsible for processing the data.
• Data Visualization Layer: All the information extracted is sent to the data visualization layer, which typically contains browser-based dashboards that display the information in the form of graphs, charts, infographics, etc.

15.5 KEYWORDS

▪ Commit (T): This log entry records that transaction T has completed all accesses to the database successfully and its effect can be committed (recorded permanently) to the database.
▪ Abort (T): This records that transaction T has been aborted.
▪ Checkpoint: A checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently on disk.
15.6 LEARNING ACTIVITY

Blood Donation Management
You can create a database project for a blood donation clinic. Start by adding donor names and assigning them unique IDs. Add their details and relevant information such as blood type, medical report and contact number. Similarly, add patient names with unique IDs, details of their medical conditions, and blood types.
After you have created a database of patients and donors, you can work on a database for the blood bank. There, you will have to add the name of the blood bank, its staff details, operating hours and address.
You can also add a recovery module.

___________________________________________________________________________
___________________________________________________________________________

15.7 UNIT END QUESTIONS

A. Descriptive Questions
Short Questions
1. What is meant by database recovery?
2. Define commit, abort and checkpoint.
3. What is meant by shadow paging?
4. Define the data security layer.
5. Write short notes on the Data Visualization Layer.

Long Questions
1. Explain distributed data processing in detail.
2. Discuss database recovery.
3. Explain database recovery techniques in detail.
4. What is meant by database recovery? Explain the process of database recovery.
5. List and briefly explain the types of distributed data processing.

B. Multiple Choice Questions
1. In the ___________ scheme, a transaction that wants to update the database first creates a complete copy of the database.
   a. Shadow copy
   b. Shadow paging
   c. Update log records
   d. All of these
2. The ____________ scheme uses a page table containing pointers to all pages; the page table itself and all updated pages are copied to a new location.
   a. Shadow copy
   b. Shadow paging
   c. Update log records
   d. All of these
3. ____________ using a log record sets the data item specified in the log record to the old value.
   a. Deferred modification
   b. Late modification
   c. Immediate modification
   d. Undo
4. Which of the following terms refers to the correctness and completeness of the data in a database?
   a. Data security
   b. Data constraint
   c. Data independence
   d. Data integrity
5. A distributed database is which of the following?
   a. A single logical database that is spread across multiple locations and is interconnected by a network
   b. A loose collection of files that is spread across multiple locations and is interconnected by a network
   c. A single logical database that is limited to one location
   d. A loose collection of files that is limited to one location

Answers
1. (a)  2. (b)  3. (d)  4. (d)  5. (a)

15.8 REFERENCES

Text Books:
• T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi.
• T2 C.J. Date, An Introduction to Database Systems, Pearson Education, New Delhi.
• T3 C.J. Date and H. Darwen, A Guide to the SQL Standard, Addison-Wesley Publications.
Reference Books:
• R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition.
• R2 Ivan Bayross, SQL, PL/SQL, BPB Publications.
                                
                                