Important Announcement
PubHTML5 Scheduled Server Maintenance on (GMT) Sunday, June 26th, 2:00 am - 8:00 am.
PubHTML5 site will be inoperative during the times indicated!

Home Explore CU-BCA-SEM-III-Database Management System- Second Draft-converted

CU-BCA-SEM-III-Database Management System- Second Draft-converted

Published by Teamlease Edtech Ltd (Amita Chitroda), 2021-04-29 05:40:06

Description: CU-BCA-SEM-III-Database Management System- Second Draft-converted

Search

Read the Text Version

Figure 11.5: Example for EER-to-relational mapping • option 8B: multiple relations-subclass relations only • create a relation L i for each subclass Si, with the attributes Attrs(Li) = {attributes of Si} {k, a1, …, an} and PK(Li) = k • only works for a specialization whose subclasses are total 151 CU IDOL SELF LEARNING MATERIAL (SLM)

Figure 11.6: Example for EER-to-relational mapping • option 8C: single relation with one type attribute • create a single relation L with attributes Attrs(L) = {k, a1, …, an} {attributes of S1} … {attributes of Sm} {t} and PK(L) = k • the attribute t is called a type attribute that indicates the subclass to which each tuple belongs • works only for a specialization whose subclasses are disjoint • option 8D: single relation with multiple type attributes • Create a single relation schema L with attributes Attrs(L) = {k, a1, …, an} U…U{attributes of S1} U… U{attributes of Sm} U{t1, …, tm} and PK(L) = k • each ti is a Boolean type attribute indicating whether a tuple belongs to subclass Si • works for a specialization whose subclasses are overlapping 152 CU IDOL SELF LEARNING MATERIAL (SLM)

Figure 11.7: Example for EER-to-relational mapping Mapping of shared subclasses • shared subclass: a subclass of several superclasses, indicating multiple inheritance • apply any of the options in step 8 to a shared subclass Figure 11.8: Example for mapping of shared subclasses 153 Mapping of categories Category? CU IDOL SELF LEARNING MATERIAL (SLM)

A subclass of the union of two or more superclasses that can have different keys because they can be of different entity types Step 9: mapping of categories • mapping a category whose defining superclasses have different keys • specify a new key attribute, called a surrogate key • include the surrogate key attribute as foreign key in each relation • corresponding to a superclass of the category • e.g., OWNER category • mapping a category whose superclasses have the same key • no need for a surrogate key • e.g., REGISTERED_VEHICLE Example for EER category mapping 154 CU IDOL SELF LEARNING MATERIAL (SLM)

Figure 11.9: Example for EER category mapping Discussion and Summary of Mapping for ER Model Constructs One of the main points to note in a relational schema, in contrast to an ER schema, is that relationship types are not represented explicitly; instead, they are represented by having two attributes A and B, one a primary key and the other a foreign key (over the same domain) included in two relations S and T. Two tuples in S and T are related when they have the same value for A and B. By using the EQUIJOIN operation (or NATURAL JOIN if the two join attributes have the same name) over S.A and T.B, we can combine all pairs of related tuples from S and T and materialize the relationship. When a binary 1:1 or 1: N relationship type is involved, a single join operation is usually needed. For a binary M: N relationship type, two 155 CU IDOL SELF LEARNING MATERIAL (SLM)

join operations are needed, whereas for n-ary relationship types, n joins are needed to fully materialize the relationship instances. Table 11.1 Correspondence between ER and Relational Models For example, to form a relation that includes the employee name, project name, and hours that the employee works on each project, we need to connect each EMPLOYEE tuple to the related PROJECT tuples via the WORKS_ON relation in Figure 11.2. Hence, we must apply the EQUIJOIN operation to the EMPLOYEE and WORKS_ON relations with the join condition Ssn = Essn, and then apply another EQUIJOIN operation to the resulting relation and the PROJECT relation with join condition Pno = Pnumber. In general, when multiple relationships need to be traversed, numerous join operations must be specified. A relational database user must always be aware of the foreign key attributes in order to use them correctly in combining related tuples from two or more relations. This is sometimes considered to be a drawback of the relational data model, because the foreign key/primary key correspondences are not always obvious upon inspection of relational schemas. If an EQUIJOIN is performed among attributes of two relations that do not represent a foreign key/primary key relationship, the result can often be meaningless and may lead to spurious data. For example, the reader can try joining the PROJECT and DEPT_LOCATIONS relations on the condition Dlocation = Plocation and examine the result In the relational schema we create a separate relation for each multivalued attribute. For a particular entity with a set of values for the multivalued attribute, the key attribute value of the entity is repeated once for each value of the multivalued attribute in a separate tuple because the basic relational model does not allow multiple values (a list, or a set of values) for an attribute in a single tuple. For example, because department 5 has three locations, three 156 CU IDOL SELF LEARNING MATERIAL (SLM)

tuples exist in the DEPT_LOCATIONS relation; each tuple specifies one of the locations.In our example, we apply EQUIJOIN to DEPT_LOCATIONS and DEPARTMENT on the Dnumber attribute to get the values of all locations along with other DEPARTMENT attributes. In the resulting relation, the values of the other DEPARTMENT attributes are repeated in separate tuples for every location that a department has. The basic relational algebra does not have a NEST or COMPRESS operation that would produce a set of tuples of the form {<‘1’, ‘Houston’>, <‘4’, ‘Stafford’>, <‘5’, {‘Bellaire’, ‘Sugarland’, ‘Houston’}>} from the DEPT_LOCATIONS relation. This is a serious drawback of the basic normalized or flat version of the relational model. The object data model and object-relational systems do allow multivalued attributes. 11.2 SUMMARY ▪ We showed how a conceptual schema design in the ER model can be mapped to a relational database schema. ▪ An algorithm for ER-to-relational mapping was given and illustrated by examples from the COMPANY database. ▪ summarized the correspondences between the ER and relational model constructs ▪ and constraints. Next, we added additional steps to the algorithm in Mapping EER Model Constructs to Relations for mapping the constructs from the EER model into the relational model. ▪ Similar algorithms are incorporated into graphical database design tools to create a relational schema from a conceptual schema design automatically. 11.3 KEYWORDS • ER Model: Entity Type • Relational Model: Entity Relation • EER: Extended Entity-Relationship (EE-R) Model 11.4 LEARNING ACTIVITY 1. Consider the ER design for the CONFERENCE_REVIEW database that was modelled using a tool like ERwin or Rational Rose. Using the SQL schema generation feature of the modeling tool, generate the SQL schema for an any database. ___________________________________________________________________________ ____________________________________________________________________ 157 CU IDOL SELF LEARNING MATERIAL (SLM)

2. Consider the EER design for the GRADE_BOOK database that was modelled using a tool like ERwin or Rational Rose. Using the SQL schema generation feature of the modeling tool, generate the SQL schema for an Oracle database. ___________________________________________________________________________ ____________________________________________________________________ 11.5 UNIT END QUESTIONS A. Descriptive Questions. Short Questions 1. What is meant by merged relation approach? 2. Define EQUIJOIN? 3. Write short notes on Natural Join. 4. What is binary relationship? 5. Define 1: N ary relation? Long Questions 1. Is it possible to successfully map a binary M: N relationship type without requiring a new relation? Why or why not? 2. Consider the below EER diagram for a car dealer. Map the EER schema into a set of relations. For the VEHICLE to CAR/TRUCK/SUV generalization show the relational schema design under each of those options. 3. Discuss the options for mapping EER model constructs to relations. 4. Discuss the correspondences between the ER model constructs and the relational model constructs. Show how each ER model construct can be mapped to the relational model and discuss any alternative mappings. 5. Explain in detail about ER-to-Relational Mapping Algorithm. B. Multiple choice Questions 158 CU IDOL SELF LEARNING MATERIAL (SLM)

1. Which of the following gives a logical structure of the database graphically? a. Entity-relationship diagram b. Entity diagram c. Database diagram d. Architectural representation 2. The entity relationship set is represented in E-R diagram as________. a. Double diamonds b. Undivided rectangles c. Dashed lines d. Diamond 3. The most common type of relationship encountered in data modelling is __________ relationship? a. Unary b. Binary c. Ternary d. Associative 4. Which is not a basic construct of an E-R model? a. Relationships b. Entity types c. Identifiers d. Attributes 5. _________ specifies the number of instances of one entity that can or must be associated with each instance of another? a. Cardinality constraint b. Entity instance c. Associative entity d. Multi-valued attribute Answer 1.(a) 2. (d) 3. (b) 4. (d) 5. (a) 11.6 REFERENCES Text Books: • T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi. 159 CU IDOL SELF LEARNING MATERIAL (SLM)

• T2 C.J. Date, An Introduction to Database Systems Pearson Education, New Delhi. • T3 Data, C. and Darwen, H, Reading, A Guide to the SQL Standard, Addison-Wesley Publications, New Delhi. Reference Books: • R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition. • R2 Ivan Bayross, SQL / PL/SQL, BPB Publications. 160 CU IDOL SELF LEARNING MATERIAL (SLM)

UNIT 12: DATA NORMALIZATION 1 Structure 12.0 Learning objective 12.1 Functional Dependencies 12.2 Summary 12.3 Keywords 12.4 Learning Activity 12.5 Unit End Questions 12.6 References 12.0 LEARNING OBJECTIVE After studying this unit, student will be able to ▪ Explain the purpose of normalization. ▪ Describe types of functional dependencies and its uses. 12.1 FUNCTIONAL DEPENDENCIES Functional Dependence (FD) is a constraint in a Database Management System that specifies the relationship of one attribute to another (DBMS). Functional Dependency aids in the preservation of storage data integrity. It is critical for distinguishing between good and poor database architecture. A functional dependency is denoted by an arrow \"→\". The functional dependency of X on Y is represented by X → Y. Let's understand Functional Dependency in DBMS with example. Example: Table 12.1: Employee Table Employee number Employee Name Salary City 1 Dana 50000 San Francisco 2 Francis 38000 London 3 Andrew 25000 Tokyo In this example, if we know the value of Employee number, we can obtain Employee Name, city, salary, etc. By this, we can say that the city, Employee Name, and salary are functionally depended on Employee number. Closures of a set of functional dependencies 161 CU IDOL SELF LEARNING MATERIAL (SLM)

A Closure is a set of FDs is a set of all possible FDs that can be derived from a given set of FDs. It is also referred as a Complete set of FDs. If F is used to donate the set of FDs for relation R, then a closure of a set of FDs implied by F is denoted by F+. Let's consider the set F of functional dependencies given below: F = {A -> B, B -> C, C -> D} from F, it is possible to derive following dependencies. A -> A ...By using Rule-4, Self-Determination. A -> B ...Already given in F.A -> C ...By using rule-3, Transitivity. A -> D ...By using rule-3, Transitivity. Now, by applying Rule-6 Union, it is possible to derive A+ -> ABCD and it can be denoted using A -> ABCD. All such type of FDs derived from each FD of F form a closure of F. Steps to determine F+example: ▪ Determine each set of attributes X that appears as a left hand side of some FD in F. ▪ Determine the set X+ of all attributes that are dependent on X, as given in above example. ▪ In other words, X+ represents a set of attributes that are functionally determined by X based on F. And, X+ is called the Closure of X under F. ▪ All such sets of X+, in combine, Form a closure of F. Rules of Functional Dependencies Below are the Three most important rules for Functional Dependency in Database: ▪ Reflexive rule –. If X is a set of attributes and Y is_subset_of X, then X holds a value of Y. ▪ Augmentation rule: When x -> y holds, and c is attribute set, then ac -> bc also holds. That is adding attributes which do not change the basic dependencies. ▪ Transitivity rule: This rule is very much similar to the transitive rule in algebra if x -> y holds and y -> z holds, then x -> z also holds. X -> y is called as functionally that determines y. Types of Functional Dependencies in DBMS 1. Multivalued Dependency 2. Trivial Functional Dependency 3. Non-Trivial Functional Dependency 4. Transitive Dependency 1. Multivalued Dependency in DBMS Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes in a single table. A multivalued dependency is a complete constraint between two sets of attributes in a relation. It requires that certain tuples be present in a relation. Example: Table 12.2: Car Table 162 CU IDOL SELF LEARNING MATERIAL (SLM)

Car_model Maf_year Color H001 2017 Metallic H001 2017 Green H005 2018 Metallic H005 2018 H010 2015 Blue H033 2012 Metallic Gray In this example, maf_year and color are independent of each other but dependent on car_model. In this example, these two columns are said to be multivalue dependent on car_model. This dependence can be represented like this: car_model -> maf_year car_model-> colour Trivial Functional Dependency in DBMS The Trivial dependency is a set of attributes which are called a trivial if the set of attributes are included in that attribute. So, X -> Y is a trivial functional dependency if Y is a subset of X. Table 12.3: Trivial Functional Dependency Relation Emp_id Emp_name AS555 Harry AS811 George AS999 Kevin Consider this table of with two columns Emp_id and Emp_name. {Emp_id, Emp_name} -> Emp_id is a trivial functional dependency as Emp_id is a subset of {Emp_id,Emp_name}. Non-Trivial Functional Dependency in DBMS Functional dependency which also known as a nontrivial dependency occurs when A->B holds true where B is not a subset of A. In a relationship, if attribute B is not a subset of attribute A, then it is considered as a non-trivial dependency. 163 CU IDOL SELF LEARNING MATERIAL (SLM)

Table 12.4: Non-Trivial Functional Dependency Relation Company CEO Age Microsoft Satya Nadella 51 Google Sundar Pichai 46 57 Apple Tim Cook Example: (Company} -> {CEO} (if we know the Company, we know the CEO name) But CEO is not a subset of Company, and hence it's non-trivial functional dependency. Transitive Dependency in DBMS A Transitive Dependency is a type of functional dependency which happens when t is indirectly formed by two functional dependencies. Let's understand with the following Transitive Dependency Example. Example: Table 12.5: Transitive Functional Dependency Relation Company CEO Age Microsoft Satya Nadella 51 Google Sundar Pichai 46 Alibaba Jack Ma 54 {Company} -> {CEO} (if we know the compay, we know its CEO's name) {CEO} -> {Age} If we know the CEO, we know the Age Therefore according to the rule of rule of transitive dependency: {Company} -> {Age} should hold, that makes sense because if we know the company name, we can know his age. Advantages of Functional Dependency ▪ Functional Dependency avoids data redundancy. Therefore, same data do not repeat at multiple locations in that database ▪ It helps you to maintain the quality of data in the database ▪ It helps you to defined meanings and constraints of databases 164 CU IDOL SELF LEARNING MATERIAL (SLM)

▪ It helps you to identify bad designs ▪ It helps you to find the facts regarding the database design 12.2 SUMMARY ▪ Functional Dependency is when one attribute determines another attribute in a DBMS system. ▪ Axiom, Decomposition, Dependent, Determinant, Union are key terms for functional dependency ▪ Four types of functional dependency are 1) Multivalued 2) Trivial 3) Non-trivial 4) Transitive ▪ Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes in a single table ▪ The Trivial dependency occurs when a set of attributes which are called a trivial if the set of attributes are included in that attribute ▪ Nontrivial dependency occurs when A->B holds true where B is not a subset of A ▪ A transitive is a type of functional dependency which happens when it is indirectly formed by two functional dependencies ▪ Transitive dependency can only occur in a relation of three or more attributes. 12.3 KEYWORDS • Dependent: It is displayed on the right side of the functional dependency diagram. • Union: It suggests that if two tables are separate, and the PK is the same • Determinant: It is displayed on the left side of the functional dependency Diagram. • Axiom: Axioms is a set of inference rules used to infer all the functional dependencies on a relational database. • Decomposition: It is a rule that suggests if you have a table that appears to contain two entities which are determined by the same primary key then you should consider breaking them up into two different tables. 12.4 LEARNING ACTIVITY You can build a solution that saves student records for an educational institution. Handling student records is no easy feat. You need to keep their name, subjects, fees, any provision of concession, and their academic progress. A DBMS-based solution will allow the client to save a lot of time and effort. Your design goal should be to have separate files for each student where the data will store information about the student. You can start by adding the following sections: ▪ Student’s Name ▪ Fees 165 CU IDOL SELF LEARNING MATERIAL (SLM)

▪ Subjects (or Stream) ▪ Grades (or Marks) ▪ Concessions (or Scholarship) ▪ Additional Information You can write all the dependency in this project and mention the relation ___________________________________________________________________________ ____________________________________________________________________ 12.5 UNIT END QUESTIONS A. Descriptive Questions Short Questions 1. What is functional dependency? 2. Define multivalued dependency? 3. What is meant by transitive function dependency? 4. List out the types of functional dependency? 5. What are the advantages of functional dependency? Long Questions 1. What is functional dependency? Explain FD and multivalued functional dependency with example. 2. Explain various types of Functional dependency. 3. List various functional dependency. Explain any one with example. 4. Discuss transitive dependency with suitable example. 5. Explain the following: a. Full functional dependency b. Partial functional dependency c. Multivalue dependency d. Transitive dependency B. Multiple choice Questions 1. A functional dependency f on R is _______ by a set of functional dependencies F on r if every instance of r(R) that satisfies f also satisfies F. a. Logically Defined b. Logically Derived c. Logically implied d. None of these 166 CU IDOL SELF LEARNING MATERIAL (SLM)

2. If a functional dependency is reflexive, B is a subset of A and A is the set of attributes, then a. B→A holds b. A→B holds c. AB→C holds d. None of these 3. If A→B, A→ C then which of the following is true? a. A→BC b. A→B c. A→C d. All of these 4. If B is an attribute and A→B, Then B is said to be ________ by a. a. Logically implied b. Functionally implied c. Logically determined d. Functionally determined 5. If F is a set of functional dependencies, then the closure of F is denoted by? a. F* b. Fo c. F+ d. F Answer 1.(c) 2. (b) 3. (d) 4. (d) 5. (c) 12.6 REFERENCES Text Books: • T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi. • T2 C.J. Date, An Introduction to Database Systems Pearson Education, New Delhi. • T3 Data, C. and Darwen, H, Reading, A Guide to the SQL Standard, Addison-Wesley Publications, New Delhi. Reference Books: • R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition. • R2 Ivan Bayross, SQL / PL/SQL, BPB Publications. 167 CU IDOL SELF LEARNING MATERIAL (SLM)

UNIT 13: DATA NORMALIZATION 2 Structure Learning objective 13.0 First Normal Form 13.1 Second Normal Form 13.2 Third Normal Form 13.3 Summary 13.4 Keywords 13.5 Learning Activity 13.6 Unit End Questions 13.7 References 13.8 13.0 LEARNING OBJECTIVE After studying this unit, student will be able to ▪ Describe database normalization. ▪ Elaborate the different types of normalization methods (1NF,2NF and 3NF) ▪ Explain the database design by normalization. 13.1 FIRST NORMAL FORM Normalization is a method of organizing the data in the database which helps you to avoid data redundancy, insertion, update & deletion anomaly. It is a process of analysing the relation schemas based on their different functional dependencies and primary key. Normalization is inherent to relational database theory. It may have the effect of duplicating the same data within the database which may result in the creation of additional tables. First Normal Form (1NF): The first normal form expects you to follow a few simple rules while designing your database, and they are: Rule 1: Single Valued Attributes Each column of your table should be single valued which means they should not contain multiple values. We will explain this with help of an example later, let's see the other rules for now. Rule 2: Attribute Domain should not change This is more of a \"Common Sense\" rule. In each column the values stored must be of the same kind or type. Rule 3: Unique name for Attributes/Columns This rule expects that each column in a table should have a unique name. This is to avoid confusion at the time of retrieving data or performing any other operation on the stored data. 168 CU IDOL SELF LEARNING MATERIAL (SLM)

If one or more columns have same name, then the DBMS system will be left confused. Rule 4: Order doesn't matters This rule says that the order in which you store the data in your table doesn't matter. Table 13.1: Student Table Table 13.2: Student Table with First Normal Form Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE. Its decomposition into 1NF has been shown in table 2. 13.2 SECOND NORMAL FORM Second Normal Form (2NF): For a table to be in the Second Normal Form, 1. It should be in the First Normal form. 2. And, it should not have Partial Dependency. To be in second normal form, a relation must be in first normal form and relation must not contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no non-prime attribute (attributes which are not part of any candidate key) is dependent on any proper subset of any candidate key of the table. Dependency: Let's take an example of a Student table with columns student_id, name, reg_no (registration number), branch and address (student's home address). Table 13.3: Student Table student_id name reg_no branch Address In this table, student_id is the primary key and will be unique for every row, hence we can use student_id to fetch any row of data from this table Even for a case, where student names are same, if we know the student_id we can easily fetch the correct record. Table 13.4: Student Table Fetch data 169 CU IDOL SELF LEARNING MATERIAL (SLM)

student_id name reg_no branch Address 10 Akon 07-WY CSE Kerala 11 Akon 08-WY IT Gujarat Hence, we can say a Primary Key for a table is the column or a group of columns (composite key) which can uniquely identify each record in the table. I can ask from branch name of student with student_id 10, and I can get it. Similarly, if I ask for name of student with student_id 10 or 11, I will get it. So all I need is student_id and every other column depends on it, or can be fetched using it. This is Dependency and we also call it Functional Dependency. Partial Dependency – If the proper subset of candidate key determines non-prime attribute, it is called partial dependency. Table 13.5: Student Table with Partial Dependency STUD_NO COURSE_NO COURSE_FEE 1 C1 1000 2 C2 1500 1 C4 2000 4 C3 1000 4 C1 1000 2 C5 2000 Here, COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO; COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO; Hence, COURSE_FEE would be a non-prime attribute, as it does not belong to the one only candidate key {STUD_NO, COURSE_NO}; But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of the candidate key, which is a partial dependency and so this relation is not in 2NF. 170 CU IDOL SELF LEARNING MATERIAL (SLM)

To convert the above relation to 2NF, we need to split the table into two tables such as: Table 1: STUD_NO, COURSE_NO Table 2: COURSE_NO, COURSE_FEE Table 13.6: Table 1 STUD_NO COURSE_NO 1 C1 2 C2 1 C4 4 C3 4 C1 Table 13.7: Table 2 COURSE_NO COURSE_FEE C1 C2 1000 C3 1500 C4 1000 C5 2000 2000 NOTE: 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are 100 students taking C1 course, we don’t need to store its Fee as 1000 for all the 100 records, instead once we can store it in the second table as the course fee for C1 is 1000. Example 2 – Consider following functional dependencies in relation R (A, B, C, D) AB -> C [A and B together determine C] BC -> D [B and C together determine D] In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any proper subset of AB doesn’t determine any non-prime attribute. 13.3 THIRD NORMAL FORM A table is said to be in the Third Normal Form when, 171 CU IDOL SELF LEARNING MATERIAL (SLM)

1. It is in the Second Normal form. 2. And, it doesn't have Transitive Dependency. A relation is in third normal form, if there is no transitive dependency for non-prime attributes as well as it is in second normal form. A relation is in 3NF if at least one of the following condition holds in every non-trivial function dependency X –> Y X is a super key. Y is a prime attribute (each element of Y is part of some candidate key). Table 13.8: Student Table with Third Normal Form Transitive dependency – If A->B and B->C are two FDs then A->C is called transitive dependency. Example – In relation STUDENT given in Table, FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE} Candidate Key: {STUD_NO} For this relation in table 13.8, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal form. To convert it in third normal form, we will decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY_STUD_AGE) as: STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE) STATE_COUNTRY (STATE, COUNTRY) Example 2 – Consider relation R (A, B, C, D, E) A -> BC, CD -> E, B -> D, E -> A All possible candidate keys in above relation are {A, E, CD, BC} All attribute is on right sides of all functional dependencies are prime. Converting from 2NF to 3NF: • Identify the primary key in the 2NF relation. • Identify functional dependencies in the relation. 172 CU IDOL SELF LEARNING MATERIAL (SLM)

• If transitive dependencies exist on the primary key remove them by placing them in a new relation along with a copy of their dominant. Property_Owner to 3NF Relations Table 13.9 Property_owner Relation Property_No P_Address Rent Owner_No Owner_Name Transitive Dependency: Customer_No+Property_No)->Owner_No Owner_No ->OName Property_for_Rent Table 13.10 Property Table Property_No P_Address Rent Owner_No C045 PG34 Nerul,Navi 450 C093 Mumbai C093 PG78 Nerul,NaviMumbai 500 PG36 Kalyan,Thane 350 Table 13.11 Owner Table Owner_No Owner_Name C045 Sanjay More C093 Mahavir Jain Process of Decomposition Customer_ Rental1NF Property_O Customer Rental wner 2NF Property_for_ Owner 3NF Rent Fig.13.1: Process of Decomposition 173 CU IDOL SELF LEARNING MATERIAL (SLM)

Boyce and Codd Normal Form (BCNF): Boyce and Codd Normal Form is a higher version of the Third Normal form. This form deals with certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF. For a table to be in BCNF, following conditions must be satisfied: 1. R must be in 3rd Normal Form and, for each functional dependency (X → Y), X should be a super Key. 2. A relation R is in BCNF if R is in Third Normal Form and for every FD, LHS is super key. A relation is in BCNF if in every non-trivial functional dependency X –> Y, X is a super key. Example – Find the highest normal form of a relation R(A,B,C,D,E) with FD set as {BC->D, AC->BE, B->E} ▪ Step 1. As we can see, (AC)+ = {A, C, B, E, D} but none of its subset can determine all attribute of relation, So AC will be candidate key. A or C can’t be derived from any other attribute of the relation, so there will be only 1 candidate key {AC}. ▪ Step 2. Prime attributes are those attribute which are part of candidate key {A, C} in this example and others will be non-prime {B, D, E} in this example. ▪ Step 3. The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or composite attribute. The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper subset of candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E is in 2nd normal form (B is not a proper subset of candidate key AC). The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a prime attribute) and in B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd normal for, either LHS of an FD should be super key or RHS should be prime attribute. So the highest normal form of relation will be 2nd Normal form. Example 2 –For example consider relation R (A, B, C) A -> BC, B -> A and B both are super keys so above relation is in BCNF. Table 13.12 EMPLOYEE Table EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO 174 CU IDOL SELF LEARNING MATERIAL (SLM)

264 India Designing D394 283 264 India Testing D394 300 364 UK Stores D283 232 364 UK Developing D283 549 In the above table Functional dependencies are as follows: EMP_ID → EMP_COUNTRY EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO} Candidate key: {EMP-ID, EMP-DEPT} The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys. To convert the given table into BCNF, we decompose it into three tables: Table 13.13: EMP_COUNTRY Table EMP_ID EMP_COUNTRY 264 India 264 India Table 13.14 EMP_DEPT Table EMP_DEPT DEPT_TYPE EMP_DEPT_NO Designing Testing D394 283 Stores D394 300 D283 232 Developing 549 D283 Table 13.15 EMP_DEPT_MAPPING Table EMP_ID EMP_DEPT D394 283 175 CU IDOL SELF LEARNING MATERIAL (SLM)

D394 300 D283 232 D283 549 Functional dependencies: EMP_ID → EMP_COUNTRY EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO} Candidate keys: For the first table: EMP_ID For the second table: EMP_DEPT For the third table: {EMP_ID, EMP_DEPT} 13.4 SUMMARY ▪ Database designing is critical to the successful implementation of a database management system that meets the data requirements of an enterprise system. ▪ Normalization Process in DBMS helps produce database systems that are cost-effective and have better security models. ▪ Functional dependencies are a very important component of the normalize data process. ▪ Most database systems are normalized database up to the third normal forms. ▪ Primary Key: ▪ A primary key cannot be NULL ▪ A primary key value must be unique ▪ The primary key values should rarely be changed ▪ The primary key must be given a value when a new record is inserted. ▪ A composite key is a primary key composed of multiple columns used to identify a record uniquely. ▪ A foreign key helps connect table and references a primary key. 13.5 KEYWORDS • A primary: is a single column value used to identify a database record uniquely. • A composite key: is a primary key composed of multiple columns used to identify a record uniquely • 1NF: A relation is in 1NF if it contains an atomic value. • 2NF: A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the primary key. • 3NF: A relation will be in 3NF if it is in 2NF and no transition dependency exists. 176 CU IDOL SELF LEARNING MATERIAL (SLM)

13.6 LEARNING ACTIVITY Consider the following Relational Database. Doctor (dno, dname, dcity) Patient (opdno, pat_name, addr,disease) The relation between patient and Doctor is many to many. Create a RBD in 3NF and solve any five of the following. 1. Insert a row in Doctor Table. 2. Find names of patient who are treated by ‘Dr. Deshpande. 3. Display names of doctors who live in ‘Pune’ city. 4. Count number of patients suffering from ‘Cancer’. 5. Add ‘Discharge Date’ Column to patient table. 6. Display total no. of patients treated by each doctor. ___________________________________________________________________________ ____________________________________________________________________ 13.7 UNIT END QUESTIONS A. Descriptive Questions Short Questions 1. Define normalization? 2. What is the first normal form? 3. Define 2NF? 4. What is meant by 3NF? 5. Define the procedures to verify that given relational schema R is in 2NF or NOT, If NOT then Convert it to 2NF? Long Questions 1. Given a relation R (A, B, C, D) and Functional Dependency set FD = {AB → CD, B → C}, determine whether the given R is in 2NF? If not convert it into 2 NF. 2. What are the three steps in normalizing data? 3. Given a relation R (P, Q, R, S, T) and Functional Dependency set FD = {PQ → R, S → T}, determine whether the given R is in 2NF? If not convert it into 2 NF. 4. Implement the Hospital Patient Database Management System using Normalization. 5. Explain the following terms: a. 1 NF b. 2 NF c. 3 NF 177 CU IDOL SELF LEARNING MATERIAL (SLM)

B. Multiple choice Questions 1. In the __________ normal form, a composite attribute is converted to individual attributes. a. First b. Second c. Third d. Fourth 2. A table on the many side of a one to many or many to many relationship must: a. Be in Second Normal Form (2NF) b. Be in Third Normal Form (3NF) c. Have a single attribute key d. Have a composite key 3. Tables in second normal form (2NF): a. Eliminate all hidden dependencies b. Eliminate the possibility of insertion anomalies c. Have a composite key d. Have all non-key fields depend on the whole primary key 4. Which forms simplifies and ensures that there are minimal data aggregates and repetitive groups: a. 1NF b. 2NF c. 3NF d. All of these 5. Which forms are based on the concept of functional dependency: a. 1NF b. 2NF c. 3NF d. 4NF Answer 1. (a) 2. (d) 3. (a) 4. (c) 5. (c) 13.8 REFERENCES Text Books: 178 CU IDOL SELF LEARNING MATERIAL (SLM)

• T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi. • T2 C.J. Date, An Introduction to Database Systems Pearson Education, New Delhi. • T3 Data, C. and Darwen, H, Reading, A Guide to the SQL Standard, Addison-Wesley Publications, New Delhi. Reference Books: • R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition. • R2 Ivan Bayross, SQL / PL/SQL, BPB Publications. 179 CU IDOL SELF LEARNING MATERIAL (SLM)

UNIT 14: CONCURRENCY CONTROL 1 Structure 14.0 Learning objective 14.1 Transaction Processing 14.2 Locking Techniques and Associated 14.3 Database Recovery 14.4 Security and Authorization 14.5 Summary 14.6 Keywords 14.7 Learning Activity 14.8 Unit End Questions 14.9 References 14.0 LEARNING OBJECTIVE After studying this unit students will be able to • State the database transactions and their properties. • Define concurrency control and how it helps in maintaining the database’s integrity. • Explain the concepts in Database Recovery. • Explain the Security and Authorization methods. 14.1 TRANSACTION PROCESSING A database transaction is a logical processing unit in a database management system that contains one or more database access operations. All types of database access operation which are held between the beginning and end transaction statements are considered as a single logical transaction in DBMS. During the transaction the database is inconsistent. Only once the database is committed the state is changed from one consistent state to another. Database Transactions ▪ A transaction is a program unit whose execution may or may not change the contents of a database. ▪ The transaction concept in DBMS is executed as a single unit. ▪ If the database operations do not update the database but only retrieve data, this type of transaction is called a read-only transaction. ▪ A successful transaction can change the database from one CONSISTENT STATE to another ▪ DBMS transactions must be atomic, consistent, isolated and durable 180 CU IDOL SELF LEARNING MATERIAL (SLM)

▪ If the database were in an inconsistent state before a transaction, it would remain in the inconsistent state after the transaction. States of Transactions Figure 14.1: Database Transactions Active • The active state is the first state of every transaction. In this state, the transaction is being executed. • For example: Insertion or deletion or updating a record is done here. But all the records are still not saved to the database. Partially Committed • In the partially committed state, a transaction executes its final operation, but the data is still not saved to the database. • In the total mark calculation example, a final display of the total marks step is executed in this state. Failed • If any of the checks made by the database recovery system fails, then the transaction is said to be in the failed state. • In the example of total mark calculation, if the database is not able to fire a query to fetch the marks, then the transaction will fail to execute. Aborted • If any of the checks fail and the transaction has reached a failed state then the database recovery system will make sure that the database is in its previous consistent state. If not then it will abort or roll back the transaction to bring the database into a consistent state. • If the transaction fails in the middle of the transaction then before executing the transaction, all the executed transactions are rolled back to its consistent state. • After aborting the transaction, the database recovery module will select one of the two operations: 1. Re-start the transaction 181 CU IDOL SELF LEARNING MATERIAL (SLM)

2. Kill the transaction Committed If a transaction executes all its operations successfully, it is said to be committed. All its effects are now permanently established on the database system 1. Once a transaction states execution, it becomes active. It can issue READ or WRITE operation. 2. Once the READ and WRITE operations complete, the transactions becomes partially committed state. 3. Next, some recovery protocols need to ensure that a system failure will not result in an inability to record changes in the transaction permanently. If this check is a success, the transaction commits and enters into the committed state. 4. If the check is a failure, the transaction goes to the Failed state. 5. If the transaction is aborted while it's in the active state, it goes to the failed state. The transaction should be rolled back to undo the effect of its write operations on the database. 6. The terminated state refers to the transaction leaving the system. ACID Properties: ACID Properties are used for maintaining the integrity of the database during transaction processing. The ACID in DBMS stands for Atomicity, Consistency, Isolation, and Durability. Atomicity: A transaction is a single unit of operation. You either execute it entirely or do not execute it at all. There cannot be partial execution. Consistency: Once the transaction is executed, it should move from one consistent state to another. Isolation: Transaction should be executed in isolation from other transactions (no Locks). During concurrent transaction execution, intermediate transaction results from simultaneously executed transactions should not be made available to each other. (Level 0,1,2,3) Durability: · After successful completion of a transaction, the changes in the database should persist. Even in the case of system failures. ACID Property in DBMS with an example: Transaction 1: Begin X=X+50, Y = Y-50 END Transaction 2: Begin X=1.1*X, Y=1.1*Y END Transaction 1 is transferring $50 from account X to account Y. Transaction 2 is crediting each account with a 10% interest payment. 182 CU IDOL SELF LEARNING MATERIAL (SLM)

If both transactions are submitted together, there is no guarantee that Transaction 1 will execute before Transaction 2 or vice versa. Irrespective of the order, the result must be as if the transactions take place serially one after the other. 14.2 LOCKING TECHNIQUES AND ASSOCIATED Concurrency Control in Database Management System is a procedure of managing simultaneous operations without conflicting with each other. It ensures that Database transactions are performed concurrently and accurately to produce correct results without violating data integrity of the respective Database. Concurrent access is quite easy if all users are just reading data. There is no way they can interfere with one another. Though for any practical Database, it would have a mix of READ and WRITE operations and hence the concurrency is a challenge. DBMS Concurrency Control is used to address such conflicts, which mostly occur with a multi-user system. Therefore, Concurrency Control is the most important element for proper functioning of a Database Management System where two or more database transactions are executed simultaneously, which require access to the same data. Problem 1: Lost Update Problems (W - W Conflict) The problem occurs when two different database transactions perform the read/write operations on the same database items in an interleaved manner (i.e., concurrent execution) that makes the values of the items incorrect hence making the database inconsistent. For example: Consider the below diagram where two transactions TX and TY, are performed on the same account A where the balance of account A is $300. Figure 14.2: Lost Update Problem 183 CU IDOL SELF LEARNING MATERIAL (SLM)

▪ At time t1, transaction TX reads the value of account A, i.e., $300 (only read). ▪ At time t2, transaction TX deducts $50 from account A that becomes $250 (only deducted and not updated/write). ▪ Alternately, at time t3, transaction TY reads the value of account A that will be $300 only because TX didn't update the value yet. ▪ At time t4, transaction TY adds $100 to account A that becomes $400 (only added but not updated/write). ▪ At time t6, transaction TX writes the value of account A that will be updated as $250 only, as TY didn't update the value yet. ▪ Similarly, at time t7, transaction TY writes the values of account A, so it will write as done at time t4 that will be $400. It means the value written by TX is lost, i.e., $250 is lost. Dirty Read Problems (W-R Conflict) The dirty read problem occurs when one transaction updates an item of the database, and somehow the transaction fails, and before the data gets rollback, the updated database item is accessed by another transaction. There comes the Read-Write Conflict between both transactions. For example: Consider two transactions TX and TY in the below diagram performing read/write operations on account A where the available balance in account A is $300: Figure 14.3: Dirty Read Problem ▪ At time t1, transaction TX reads the value of account A, i.e., $300. ▪ At time t2, transaction TX adds $50 to account A that becomes $350. ▪ At time t3, transaction TX writes the updated value in account A, i.e., $350. ▪ Then at time t4, transaction TY reads account A that will be read as $350. ▪ Then at time t5, transaction TX rollbacks due to server problem, and the value changes back to $300 (as initially). 184 CU IDOL SELF LEARNING MATERIAL (SLM)

▪ But the value for account A remains $350 for transaction TY as committed, which is the dirty read and therefore known as the Dirty Read Problem. Unrepeatable Read Problem (W-R Conflict) Also known as Inconsistent Retrievals Problem that occurs when in a transaction, two different values are read for the same database item. For example: Consider two transactions, TX and TY, performing the read/write operations on account A, having an available balance = $300. The diagram is shown below: Figure 14.4: Unrepeatable Read Problem ▪ At time t1, transaction TX reads the value from account A, i.e., $300. ▪ At time t2, transaction TY reads the value from account A, i.e., $300. ▪ At time t3, transaction TY updates the value of account A by adding $100 to the available balance, and then it becomes $400. ▪ At time t4, transaction TY writes the updated value, i.e., $400. ▪ After that, at time t5, transaction TX reads the available value of account A, and that will be read as $400. ▪ It means that within the same transaction TX, it reads two different values of account A, i.e., $ 300 initially, and after updation made by transaction TY, it reads $400. It is an unrepeatable read and is therefore known as the Unrepeatable read problem. Potential problems of Concurrency: Here, are some issues which you will likely to face while using the DBMS Concurrency Control method: 1. Lost Updates occur when multiple transactions select the same row and update the row based on the value selected 2. Uncommitted dependency issues occur when the second transaction selects a row that is updated by another transaction (dirty read) 185 CU IDOL SELF LEARNING MATERIAL (SLM)

3. Non-Repeatable Read occurs when a second transaction is trying to access the same row several times and reads different data each time. 4. The incorrect Summary issue occurs when one transaction takes summary over the value of all the instances of a repeated data-item, and second transaction update few instances of that specific data-item. In that situation, the resulting summary does not reflect a correct result. Reasons for using the Concurrency control method is DBMS: 1. To apply Isolation through mutual exclusion between conflicting transactions 2. To resolve read-write and write-write conflict issues 3. To preserve database consistency through constantly preserving execution obstructions 4. The system needs to control the interaction among the concurrent transactions. This control is achieved using concurrent-control schemes. 5. Concurrency control helps to ensure serializability Concurrency Control Protocols: Different concurrency control protocols offer different benefits between the amount of concurrency they allow and the amount of overhead that they impose. Following are the Concurrency Control techniques in DBMS: 1. Lock-Based Protocols 2. Two-Phase Locking Protocol 3. Timestamp-Based Protocols 4. Validation-Based Protocols 1.Lock-based Protocols: Lock Based Protocols in DBMS is a mechanism in which a transaction cannot Read or Write the data until it acquires an appropriate lock. Lock based protocols help to eliminate the concurrency problem in DBMS for simultaneous transactions by locking or isolating a particular transaction to a single user. A lock is a data variable which is associated with a data item. This lock signifies that operations that can be performed on the data item. Locks in DBMS help synchronize access to the database items by concurrent transactions. All lock requests are made to the concurrency-control manager. Transactions proceed only once the lock request is granted. Binary Locks: A Binary lock on a data item can either locked or unlocked states. Shared/exclusive: 186 CU IDOL SELF LEARNING MATERIAL (SLM)

This type of locking mechanism separates the locks in DBMS based on their uses. If a lock is acquired on a data item to perform a write operation, it is called an exclusive lock. Shared Lock (S): A shared lock is also called a Read-only lock. With the shared lock, the data item can be shared between transactions. This is because you will never have permission to update data on the data item. For example, consider a case where two transactions are reading the account balance of a person. The database will let them read by placing a shared lock. However, if another transaction wants to update that account's balance, shared lock prevent it until the reading process is over. Exclusive Lock (X): With the Exclusive Lock, a data item can be read as well as written. This is exclusive and can't be held concurrently on the same data item. X-lock is requested using lock-x instruction. Transactions may unlock the data item after finishing the 'write' operation. For example, when a transaction needs to update the account balance of a person. You can allows this transaction by placing X lock on it. Therefore, when the second transaction wants to read or write, exclusive lock prevents this operation. 1.Simplistic Lock Protocol: This type of lock-based protocols allows transactions to obtain a lock on every object before beginning operation. Transactions may unlock the data item after finishing the 'write' operation. 2.Pre-claiming Locking: ▪ Pre-claiming Lock Protocols evaluate the transaction to list all the data items on which they need locks. ▪ Before initiating an execution of the transaction, it requests DBMS for all the lock on all those data items. ▪ If all the locks are granted then this protocol allows the transaction to begin. When the transaction is completed then it releases all the lock. ▪ If all the locks are not granted then this protocol allows the transaction to rolls back and waits until all the locks are granted. 187 CU IDOL SELF LEARNING MATERIAL (SLM)

Figure 14.5: Lock Protocols Starvation: Starvation is the situation when a transaction needs to wait for an indefinite period to acquire a lock. Following are the reasons for Starvation: 1. When the waiting scheme for locked items is not properly managed 2. In the case of resource leak 3. The same transaction is selected as a victim repeatedly Deadlock: Deadlock refers to a specific situation where two or more processes are waiting for each other to release a resource or more than two processes are waiting for the resource in a circular chain. 2.Two-Phase Locking Protocol: Two-Phase Locking Protocol is also known as 2PL protocol is a method of concurrency control in DBMS that ensures serializability by applying a lock to the transaction data which blocks other transactions to access the same data simultaneously. Two-Phase Locking protocol helps to eliminate the concurrency problem in DBMS. This locking protocol divides the execution phase of a transaction into three different parts. 1. In the first phase, when the transaction begins to execute, it requires permission for the locks it needs. 2. The second part is where the transaction obtains all the locks. When a transaction releases its first lock, the third phase starts. 3. In this third phase, the transaction cannot demand any new locks. Instead, it only releases the acquired locks. 188 CU IDOL SELF LEARNING MATERIAL (SLM)

Figure 14.6: Two-Phase Locking Protocol The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in two steps: Growing Phase: In this phase transaction may obtain locks but may not release any locks. Shrinking Phase: In this phase, a transaction may release locks but not obtain any new lock The 2PL protocol indeed offers serializability. However, it does not ensure that deadlocks do not happen. 3.Strict Two-Phase Locking Method Strict-Two phase locking system is almost similar to 2PL. The only difference is that Strict- 2PL never releases a lock after using it. It holds all the locks until the commit point and releases all the locks at one go when the process is over. Centralized 2PL: In Centralized 2 PL, a single site is responsible for the lock management process. It has only one lock manager for the entire DBMS. Primary copy 2PL: Primary copy 2PL mechanism, many lock managers are distributed to different sites. After that, a particular lock manager is responsible for managing the lock for a set of data items. When the primary copy has been updated, the change is propagated to the slaves. Distributed 2PL: In this kind of two-phase locking mechanism, Lock managers are distributed to all sites. They are responsible for managing locks for data at that site. If no data is replicated, it is equivalent to primary copy 2PL. Communication costs of Distributed 2PL are quite higher than primary copy 2PL Timestamp-based Protocols 189 CU IDOL SELF LEARNING MATERIAL (SLM)

Timestamp based Protocol in DBMS is an algorithm which uses the System Time or Logical Counter as a timestamp to serialize the execution of concurrent transactions. The Timestamp- based protocol ensures that every conflicting read and write operations are executed in a timestamp order. The older transaction is always given priority in this method. It uses system time to determine the time stamp of the transaction. This is the most commonly used concurrency protocol. Lock-based protocols help you to manage the order between the conflicting transactions when they will execute. Timestamp-based protocols manage conflicts as soon as an operation is created. Example: Suppose there are there transactions T1, T2, and T3. T1 has entered the system at time 0010 T2 has entered the system at 0020 T3 has entered the system at 0030 Priority will be given to transaction T1, then transaction T2 and lastly Transaction T3. Advantages: Schedules are serializable just like 2PL protocols No waiting for the transaction, which eliminates the possibility of deadlocks! Disadvantages: Starvation is possible if the same transaction is restarted and continually aborted Validation Based Protocol Validation based Protocol in DBMS also known as Optimistic Concurrency Control Technique is a method to avoid concurrency in transactions. In this protocol, the local copies of the transaction data are updated rather than the data itself, which results in less interference while execution of the transaction. The Validation based Protocol is performed in the following three phases: 1. Read Phase 2. Validation Phase 3. Write Phase 4. Read Phase In the Read Phase, the data values from the database can be read by a transaction but the write operation or updates are only applied to the local data copies, not the actual database. Validation Phase: In the Validation Phase, the data is checked to ensure that there is no violation of serializability while applying the transaction updates to the database. 190 CU IDOL SELF LEARNING MATERIAL (SLM)

Write Phase: In the Write Phase, the updates are applied to the database if the validation is successful, else; the updates are not applied, and the transaction is rolled back. 14.3 DATABASE RECOVERY Database Recovery in DBMS And Its Techniques There can be any case in a database system like any computer system when database failure happens. So data stored in the database should be available all the time whenever it is needed. So Database recovery means recovering the data when it gets deleted, hacked or damaged accidentally. Atomicity is a must whether is transaction is over or not it should reflect in the database permanently or it should not affect the database at all. So database recovery and database recovery techniques are a must in DBMS. Crash recovery: DBMS may be an extremely complicated system with many transactions being executed each second. The sturdiness and hardiness of software rely upon its complicated design and its underlying hardware and system package. If it fails or crashes amid transactions, it’s expected that the system would follow some style of rule or techniques to recover lost knowledge. Classification of failure: ▪ Transaction failure ▪ System crash ▪ Disk failure Figure 14.7: Classification of Failure Transaction Failure: A transaction needs to abort once it fails to execute or once it reaches to any further extent from wherever it can’t go to any extent further. This is often known as transaction failure wherever solely many transactions or processes are hurt. The reasons for transaction failure are: ▪ Logical errors ▪ System errors Logical errors: Where a transaction cannot complete as a result of its code error or an internal error condition. 191 CU IDOL SELF LEARNING MATERIAL (SLM)

System errors: Wherever the information system itself terminates an energetic transaction as a result of the DBMS isn’t able to execute it, or it’s to prevent due to some system condition. to Illustrate, just in case of situation or resource inconvenience, the system aborts an active transaction. System crash: There are issues − external to the system − that will cause the system to prevent abruptly and cause the system to crash. For instance, interruptions in power supply might cause the failure of underlying hardware or software package failure. Examples might include OS errors. Disk failure: In the early days of technology evolution, it had been a typical drawback wherever hard-disk drives or storage drives accustomed to failing oftentimes. Disk failures include the formation of dangerous sectors, unreachability to the disk, disk crash or other failures, that destroys all or a section of disk storage. System crash: There are problems − external to the system − that may cause the system to stop abruptly and cause the system to crash. For example, interruptions in power supply may cause the failure of underlying hardware or software failure. Examples may include operating system errors. Disk Failure: ▪ In early days of technology evolution, it was a common problem where hard-disk drives or storage drives used to fail frequently. ▪ Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any other failure, which destroys all or a part of disk storage. 14.4 SECURITY AND AUTHORIZATION Security: DB2 database and functions can be managed by two different modes of security controls 1. Authentication 2. Authorization Authentication: Authentication is the process of confirming that a user logs in only following the rights to perform the activities he is authorized to perform. User authentication can be performed at the operating system level or database level itself. By using authentication tools for biometrics such as retina and figure prints are in use to keep the database from hackers or malicious users. The database security can be managed from outside the db2 database system. Here are some types of security authentication process: ▪ Based on Operating System authentications. 192 CU IDOL SELF LEARNING MATERIAL (SLM)

▪ Lightweight Directory Access Protocol (LDAP) For DB2, the security service is a part of the operating system as a separate product. For Authentication, it requires two different credentials, those are userid or username, and password. ▪ Vector allows DBMS level authentication in addition to the other supported methods (which include operating system authentication, installation passwords, and Kerberos authentication). The DBMS authentication feature removes the need to add an operating system user every time a new user needs to access a database. ▪ A user who is appropriately defined in the database can access the database using a valid Vector user name and password. The user does not have to be defined at the operating system level or in a global directory. ▪ DBMS authentication must be enabled for the DBMS Server on which the database resides. It is on by default (dbms_authentication=on in config.dat). Such authentication can be enabled at the server level only, not at the database level. ▪ Database administrators can configure DBMS authentication for each user by using new WITH options on the CREATE USER and ALTER USER statements or by using Actian Director or accessdb. ▪ A user can be defined on the CREATE USER statement as either WITH DBMS_AUTHENTICATION='REQUIRED' or WITH DBMS_AUTHENTICATION='OPTIONAL' (the default). ▪ A user who is defined WITH DBMS_AUTHENTICATION='REQUIRED' must connect to the database using his DBMS user name and password. All other connection attempts will fail. Such a user cannot connect to a server configured as dbms_authentication = no. ▪ System administration users must be defined as DBMS_AUTHENTICATION = 'OPTIONAL.' All users with the \"security\" privilege, including the installation owner, are forced to be DBMS_AUTHENTICATION='OPTIONAL'. ▪ If a user is created or altered with DBMS_AUTHENTICATION='REQUIRED', the user must also have a DBMS password or an error is issued. A user defined with no DBMS password can connect to dbms_authentication enabled servers only through a local connection, installation password, or Kerberos authentication. ▪ Users can set and modify their own DBMS passwords if they have the CHANGE_PASSWORD privilege (the default). DBMS passwords are encrypted on disk and when passed over the network. ▪ DBMS authentication is backward compatible. An older remote client will work with new remote dbms_authentication enabled servers assuming that the DBMS password (defined by the CREATE USER statement and stored in iiuser catalogue) is compatible with the vnode password. ▪ When an existing installation is upgraded, existing user definitions are modified to DBMS_AUTHENTICATION='OPTIONAL'. 193 CU IDOL SELF LEARNING MATERIAL (SLM)

Authorization: You can access the DB2 Database and its functionality within the DB2 database system, which is managed by the DB2 Database manager. Authorization is a process managed by the DB2 Database manager. The manager obtains information about the current authenticated user, that indicates which database operation the user can perform or access. Primary permission: Grants the authorization ID directly. Secondary permission: Grants to the groups and roles if the user is a member Public permission: Grants to all users publicly. Context-sensitive permission: Grants to the trusted context role. Authorization can be given to users based on the categories below: ▪ System-level authorization ▪ System administrator [SYSADM] ▪ System Control [SYSCTRL] ▪ System maintenance [SYSMAINT] ▪ System monitor [SYSMON] Authorities provide of control over instance-level functionality. Authority provide to group privileges, to control maintenance and authority operations. For instance, database and database objects • Database-level authorization • Security Administrator [SECADM] • Database Administrator [DBADM] Access Control [ACCESSCTRL] Data access [DATAACCESS] SQL administrator. [SQLADM] Workload management administrator [WLMADM] Explain [EXPLAIN] Authorities provide controls within the database. Other authorities for database include with LDAD and CONNECT. Object-Level Authorization: Object-Level authorization involves verifying privileges when an operation is performed on an object. Content-based Authorization: User can have read and write access to individual rows and columns on a particular table using Label-based access Control [LBAC]. DB2 tables and configuration files are used to record the permissions associated with authorization names. When a user tries to access the data, the recorded permissions verify the following permissions: ▪ Authorization name of the user ▪ Which group belongs to the user ▪ Which roles are granted directly to the user or indirectly to a group ▪ Permissions acquired through a trusted context. ▪ While working with the SQL statements, the DB2 authorization model considers the combination of the following permissions: 194 CU IDOL SELF LEARNING MATERIAL (SLM)

▪ Permissions granted to the primary authorization ID associated with the SQL statements. ▪ Secondary authorization IDs associated with the SQL statements. ▪ Granted to PUBLIC ▪ Granted to the trusted context role. Instance level authorities System administration authority (SYSADM) It is highest level administrative authority at the instance-level. Users with SYSADM authority can execute some databases and database manager commands within the instance. Users with SYSADM authority can perform the following operations: ▪ Upgrade a Database ▪ Restore a Database ▪ Update Database manager configuration file. System control authority (SYSCTRL) It is the highest level in System control authority. It provides to perform maintenance and utility operations against the database manager instance and its databases. These operations can affect system resources, but they do not allow direct access to data in the database. Users with SYSCTRL authority can perform the following actions: ▪ Updating the database, Node, or Distributed Connect Service (DCS) directory ▪ Forcing users off the system-level ▪ Creating or Dropping a database-level ▪ Creating, altering, or dropping a table space ▪ Using any table space ▪ Restoring Database System maintenance authority (SYSMAINT) It is a second level of system control authority. It provides to perform maintenance and utility operations against the database manager instance and its databases. These operations affect the system resources without allowing direct access to data in the database. This authority is designed for users to maintain databases within a database manager instance that contains sensitive data. Only Users with SYSMAINT or higher level system authorities can perform the following tasks: ▪ Taking backup ▪ Restoring the backup ▪ Roll forward recovery ▪ Starting or stopping instance ▪ Restoring tablespaces ▪ Executing db2trc command ▪ Taking system monitor snapshots in case of an Instance level user or a database level user. 195 CU IDOL SELF LEARNING MATERIAL (SLM)

A user with SYSMAINT can perform the following tasks: ▪ Query the state of a tablespace ▪ Updating log history files ▪ Reorganizing of tables ▪ Using RUNSTATS (Collection catalogue statistics) System monitors authority (SYSMON) With this authority, the user can monitor or take snapshots of database manager instance or its database. SYSMON authority enables the user to run the following tasks: ▪ GET DATABASE MANAGER MONITOR SWITCHES ▪ GET MONITOR SWITCHES ▪ GET SNAPSHOT ▪ LIST • LIST ACTIVE DATABASES • LIST APPLICATIONS • LIST DATABASE PARTITION GROUPS • LIST DCS APPLICATIONS • LIST PACKAGES • LIST TABLES • LIST TABLESPACE CONTAINERS • LIST TABLESPACES • LIST UTITLITIES ▪ RESET MONITOR ▪ UPDATE MONITOR SWITCHES Database authorities Each database authority holds the authorization ID to perform some action on the database. These database authorities are different from privileges. Here is the list of some database authorities: ACCESSCTRL: allows to grant and revoke all object privileges and database authorities. BINDADD: Allows to create a new package in the database. CONNECT: Allows to connect to the database. CREATETAB: Allows to create new tables in the database. CREATE_EXTERNAL_ROUTINE: Allows to create a procedure to be used by applications and the users of the databases. DATAACCESS: Allows to access data stored in the database tables. DBADM: Act as a database administrator. It gives all other database authorities except ACCESSCTRL, DATAACCESS, and SECADM. EXPLAIN: Allows to explain query plans without requiring them to hold the privileges to access the data in the tables. IMPLICIT_SCHEMA: Allows a user to create a schema implicitly by creating an object using a CREATE statement. 196 CU IDOL SELF LEARNING MATERIAL (SLM)

LOAD: Allows to load data into table. QUIESCE_CONNECT: Allows to access the database while it is quiesce (temporarily disabled). SECADM: Allows to act as a security administrator for the database. SQLADM: Allows to monitor and tune SQL statements. WLMADM: Allows to act as a workload administrator 14.5 SUMMARY • Concurrency control is the procedure in DBMS for managing simultaneous operations without conflicting with each other. • Lost Updates, dirty read, Non-Repeatable Read, and Incorrect Summary Issue are problems faced due to lack of concurrency control. • Lock-Based, Two-Phase, Timestamp-Based, Validation-Based are types of Concurrency handling protocols • The lock could be Shared (S) or Exclusive (X) • Two-Phase locking protocol which is also known as a 2PL protocol needs transaction should acquire a lock after it releases one of its locks. It has 2 phases growing and shrinking. • The timestamp-based algorithm uses a timestamp to serialize the execution of concurrent transactions. The protocol uses the System Time or Logical Count as a Timestamp. 14.6 KEYWORDS • SYSADM: System administrator [SYSADM] • SYSCTRL: System Control [SYSCTRL] • SYSMAINT: System maintenance [SYSMAINT] • SYSMON: System monitor [SYSMON] • LDAP: Lightweight Directory Access Protocol (LDAP) 14.7 LEARNING ACTIVITY Blood Donation Management You can create a database project for a blood donation clinic. You should start by adding donor names and assign them unique IDs. Add their details and relevant information such as blood type, medical report, and contact number. Similarly, add patient names with unique IDs, details on their medical conditions, and blood types. You can add the Recovery module also. 197 CU IDOL SELF LEARNING MATERIAL (SLM)

___________________________________________________________________________ ____________________________________________________________________ 14.8 UNIT END QUESTIONS A. Descriptive Questions Short Questions 1. What is Concurrency Control? 2. What is a System Failure? 3. What is Lock? Types of lock in DBMS. 4. What are the various states of a transaction concept in DBMS? 5. What is the difference between deadlock and starvation? Long Questions 1. Explain in detail are the Concurrency Control techniques in DBMS. 2. Write a short note on the Two-phase locking protocol. What are its advantages and disadvantages? 3. What are the different types of Database security? Briefly explain it. 4. Explain various types of database recovery. 5. Describe database security. B. Multiple choice Questions 1. If a transaction has obtained a __________ lock, it can read but cannot write on the item a. Shared mode b. Exclusive mode c. Read-only mode d. Write-only mode 2. If a transaction has obtained a ________ lock, it can both read and write on the item a. Shared mode b. Exclusive mode c. Read-only mode d. Write-only mode 3. A transaction can proceed only after the concurrency control manager ________ the lock to the transaction a. Grants b. Requests 198 CU IDOL SELF LEARNING MATERIAL (SLM)

c. Allocates d. None of these 4. The protocol that indicates when a transaction may lock and unlock each of the data items is called as __________ a. Locking protocol b. Unlocking protocol c. Granting protocol d. Conflict protocol 5. The database administrator who authorizes all the new users, modifies the database and takes grants privilege is a. Super user b. Administrator c. Operator of operating system d. All of these Answers 1.(a) 2. (b) 3. (c) 4. (a) 5. (d) 14.9 REFERENCES Text Books: • T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi. • T2 C.J. Date, An Introduction to Database Systems Pearson Education, New Delhi. • T3 Data, C. and Darwen, H, Reading, A Guide to the SQL Standard, Addison-Wesley Publications, New Delhi. Reference Books: • R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition. • R2 Ivan Bayross, SQL / PL/SQL, BPB Publications. 199 CU IDOL SELF LEARNING MATERIAL (SLM)

UNIT 15: CONCURRENCY CONTROL 2 Structure 15.0 Learning objective 15.1 Recovery Techniques 15.2 Database Security 15.3 Introduction to Distributed Data Processing 15.4 Summary 15.5 Keywords 15.6 Learning Activity 15.7 Unit End Questions 15.8 References 15.0 LEARNING OBJECTIVE In this unit, students will be able to: ▪ Analyse the concepts of Recovery Techniques. ▪ Describe the types of data security measures. ▪ Elaborate the types and layes in Distributed Data Processing. 15.1 RECOVERY TECHNIQUES Database systems, like any other computer system, are subject to failures but the data stored in it must be available as and when required. When a database fails it must possess the facilities for fast recovery. It must also have atomicity i.e. either transaction are completed successfully and committed (the effect is recorded permanently in the database) or the transaction should not affect the database. There are both automatic and non-automatic ways for both, backing up of data and recovery from any failure situations. The techniques used to recover the lost data due to system crash, transaction errors, viruses, catastrophic failure, incorrect commands execution etc. are database recovery techniques. So to prevent data loss recovery techniques based on deferred update and immediate update or backing up data can be used. Recovery techniques are heavily dependent upon the existence of a special file known as a system log. It contains information about the start and end of each transaction and any updates which occur in the transaction. The log keeps track of all transaction operations that affect the values of database items. This information is needed to recover from transaction failure. ▪ The log is kept on disk start_transaction(T): This log entry records that transaction T starts the execution. 200 CU IDOL SELF LEARNING MATERIAL (SLM)


Like this book? You can publish your book online for free in a few minutes!
Create your own flipbook