Figure 4.24: Student and Person Key Representation Types of key: Figure 4.25: Types of Key Primary key ▪ It is the first key which is used to identify one and only one instance of an entity uniquely. An entity can contain multiple keys as we saw in PERSON table. The key which is most suitable from those lists become a primary key. ▪ In the EMPLOYEE table, ID can be primary key since it is unique for each employee. In the EMPLOYEE table, we can even select License_Number and Passport_Number as primary key since they are also unique. ▪ For each entity, selection of the primary key is based on requirement and developers. 51 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 4.26: Primary Key Representation Candidate key ▪ A candidate key is an attribute or set of an attribute which can uniquely identify a tuple. ▪ The remaining attributes except for primary key are considered as a candidate key. The candidate keys are as strong as the primary key. For example: In the EMPLOYEE table, id is best suited for the primary key. Rest of the attributes like SSN, Passport_Number, and License_Number, etc. are considered as a candidate key. Figure 4.27: Candidate Key Representation Super Key Super key is a set of an attribute which can uniquely identify a tuple. Super key is a superset of a candidate key. 52 CU IDOL SELF LEARNING MATERIAL (SLM)
For example: In the above EMPLOYEE table, for (EMPLOEE_ID, EMPLOYEE_NAME) the name of two employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this combination can also be a key. The super key would be EMPLOYEE-ID, (EMPLOYEE_ID, EMPLOYEE-NAME), etc. Foreign key ▪ Foreign keys are the column of the table which is used to point to the primary key of another table. ▪ In a company, every employee works in a specific department, and employee and department are two different entities. So we can't store the information of the department in the employee table. That's why we link these two tables through the primary key of one table. ▪ We add the primary key of the DEPARTMENT table, Department_Id as a new attribute in the EMPLOYEE table. ▪ Now in the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related. Figure 4.28: Super Key Representation 4.4 REDUCTION OF E-R DIAGRAM INTO TABLES E-R diagram is used for database modelling. Largely this is used for modelling the relational database of RDBMS. RDBMS is a collection of tables or relations. E-R diagram gives information about entity sets, attributes and relationship sets. The model also gives information of 'key attribute'. 53 CU IDOL SELF LEARNING MATERIAL (SLM)
The relations are also named as relation schemes. It is easy to derive the relations from model and then these relations are converted into the tables during physical design phase of database design. Converting an E-R diagram into relational database is called transformation. The relations obtaining from the E- R diagram is 'conceptual' only. These conceptual relations are the base for the relational database design. This approach goes from bottom-to-top. The second approach is 'normalization' and it is top-to-bottom approach. Transformation of E-R model in to a relational database includes 1. Each entity set is transformed in to a relation or relation scheme. The name of the relation will be same as the label indicated in the rectangle. 2. Each relationship set is transformed into a relation/relation scheme. The name of the relation will same as the label indicated in the diamond. 3. The attributes represented by ovals are converted into the column names of corresponding relations taking the same name indicated in the oval. 4. The pk attributes of the participating entity sets are converted into the column names of respective relations of relationship set, with the same name indicated in the oval. Such columns refer to the base relations. So, they are foreign keys. 5. The relationship attribute sets represented using ovals are transformed into columns of respective relations with the same name indicated in the oval. 6. The primary key attributes of all the relations are underlined. 7. Normally the columns that consist of primary key are placed in the beginning of the relation and other non-key columns may be in any order. The syntax of a relation is as follows: Figure 4.29: Key Attribute Representation 54 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 4.30: Reduce the ER Diagram From the E-R diagram, • 'Student' and 'Course' are the entity sets and is reduced in to two relations. • 'Enroll' is the relationship set and is reduced to the third relation. • The attributes 'Sid', 'SName', 'Dob', 'Place' and 'Phone' will be reduced to the columns of relation 'Student'. The column name 'Sid' is primary key of the entity set 'Student', so it is shown as underlined. • 'CId', 'CName' and 'CDurn' are the attributes will be reduced to the columns of relation 'Course'. The column name 'CId' is primary key of the entity set 'Course', so it will be underlined • 'Fee' and 'Doe' are the attributes and will be reduced to the columns names of 'Enroll' relation, in addition to 'Sid' of 'Student' and 'CId' of 'Course'. The composition of column names 'Sid’ and 'CId' will become the primary key attribute of the 'Enroll' relation. The collection of relations is illustrated as follows: Student (Sid, SName, Place, Phone, Dob) Course (CId, CName, CDurn) Enroll (Sid, CId, Fee, Doe) The above collection of relations forms a relational database. The three tables derived with the help of these relations are depicted as follows Table 4.1: Student and Course 55 CU IDOL SELF LEARNING MATERIAL (SLM)
Table 4.2: Enroll From the above tables we can understand that the values for 'Sid' column in 'Enroll' table are reflected from the 'Student' table. A student who is not in the 'Student' table never exists in the 'Enroll' table. This constraint is called 'foreign key' constraint. In the same way the value of 'CId' exists in 'Enroll' if and only if it exists in the 'Course' table. With reference to Access the 'Student' and 'Course’ are base tables whereas 'Enroll' is derived table. We can also understand that the cardinality, many-to-many. Three students are enrolled for the course ' 101' and two students are enrolled for the course ‘104'. Example: Reduce the following the E- R diagram into tables: 56 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 4.31: ER Diagram From the E-R diagram • The entity sets 'Song', 'Participant' and 'Judge' will be reduced to three relations. • The relationship sets 'Performs' and 'Judgment' reduced to the fourth and fifth relations. • The attributes 'Sgld', 'STitle', 'MusDir', 'Singers', 'Film' and 'Lyrics' will be reduced as the columns of relation 'Song'. The attribute 'Sgld' will be the primary key. • The attributes 'PId', 'PName', 'PPlace' and 'Age' will be reduced to the columns of relation 'Participant'. The column name 'PId' is primary key. • The attributes Jid', 'JName' and 'Exp' will be reduced as the columns of relation 'Judge'. The column name 'JId' is the primary key in the relation. • The attributes 'Sgld' and 'PId' will be reduced to the column names of relation 'Performs'. The column names 'Sgld' and 'PId' become the primary key attribute of the relation ‘Performs'. • The attributes 'Grade' and 'Rem' will be reduced to the columns names of relation 'Judgment in addition to 'PId' of 'Participant' and 'JId' of 'Judge'. The column names 'PId' and 'JId' become the primary key attribute of the relation 'Judgment'. So, the set of relations is as follows: Song (Sgld, STitle, MusDir, Film, Lyrics, Singers) Participant (PId, PName, PPIace, Age) Judge (JId, JName, Exp) Performs (Sgld, PId) Judgement (JId, PId, Grade, Rem) The above-mentioned collection of relations forms a relational database. Five tables can be derived or defined with the help of these. The five tables are depicted as follows: 57 CU IDOL SELF LEARNING MATERIAL (SLM)
Table 4.3: Song Table 4.4: Participant, Judge, Performs and Judgement 4.5 SUMMARY 58 ▪ ER Model in DBMS stands for an Entity-Relationship model ▪ The ER model is a high-level data model diagram ▪ ER diagrams are a visual tool which is helpful to represent the ER model ▪ ER diagrams in DBMS are blueprint of a database CU IDOL SELF LEARNING MATERIAL (SLM)
▪ Entity relationship diagram DBMS displays the relationships of entity set stored in a database ▪ ER diagrams help you to define terms related to entity relationship modeling ▪ ER Model in DBMS is based on three basic concepts: Entities, Attributes & Relationships ▪ An entity can be place, person, object, event or a concept, which stores data in the database (DBMS) ▪ Relationship is nothing but an association among two or more entities ▪ A weak entity is a type of entity which doesn't have its key attribute ▪ It is a single-valued property of either an entity-type or a relationship-type ▪ It helps you to defines the numerical attributes of the relationship between two entities or entity sets ▪ ER- Diagram DBMS is a visual representation of data that describe how data is related to each other ▪ While Drawing ER diagrams in DBMS, you need to make sure all your entities and relationships are properly labelled. 4.6 KEYWORDS ▪ Super Key - A super key is a group of single or multiple keys which identifies rows in a table. ▪ Primary Key - is a column or group of columns in a table that uniquely identify every row in that table. ▪ Candidate Key - is a set of attributes that uniquely identify tuples in a table. Candidate Key is a super key with no repeated attributes. ▪ Alternate Key - is a column or group of columns in a table that uniquely identify every row in that table. ▪ Foreign Key - is a column that creates a relationship between two tables. The purpose of Foreign keys is to maintain data integrity and allow navigation between two different instances of an entity. ▪ Compound Key - has two or more attributes that allow you to uniquely recognize a specific record. It is possible that each column may not be unique by itself within the database. ▪ Composite Key - An artificial key which aims to uniquely identify each record is called a surrogate key. These kinds of key are unique because they are created when you don't have any natural primary key. ▪ Surrogate Key - An artificial key which aims to uniquely identify each record is called a surrogate key. These kinds of key are unique because they are created when you don't have any natural primary key. ▪ Relationship: A relationship describes how entities interact. 59 CU IDOL SELF LEARNING MATERIAL (SLM)
4.7 LEARNING ACTIVITY The railway network of our country is one of the most complex public establishments. You can design a database solution for this network and make the management of the same more natural. Your system should have the following pieces of information: ▪ Train IDs with names ▪ Schedules of the trains The train schedules should have information on the stations from where the train starts and by when it reaches the destination. It should also include information on which stations it passes through during its journey. To keep things simple, you can assume that every train completes its journey within a day, and they run daily. However, you’ll also need to store information on the sequence of the stations a train passes through. For example, if a train starts from Delhi and goes to Kolkata through Lucknow, then you’ll need to add the arrival and departure times of the train for all these stations. Keeping the stations in sequence will allow easy management of trains and their data. Till here, the project is rather easy. You can make it more challenging by adding the passenger information of every train such as its coaches, seat numbers, types of coaches, passenger names, and so on. 4.8 UNIT END QUESTIONS A. Descriptive Questions Short Questions 1. What are the different types of keys in the database? 2. Write short notes on candidate key and super key. 3. What is ER model? 4. Define weak entity and strong entity? 5. What are the components of ER diagram? Long Questions 1. Explain in details about ER model. 2. Explain Mapping constraints with suitable example. 3. Discuss the candidate key, primary key, super key, composite key and alternate key in detail. 4. Construct an E-R diagram for banking enterprise. 5. Explain in detail the reduction of an E-R diagram into tables. B. Multiple choice Questions 60 CU IDOL SELF LEARNING MATERIAL (SLM)
1. An ________ is a set of entities of the same type that share the same properties, or attributes. a. Entity set b. Attribute set c. Relation set d. Entity model 2. Entity is a _________. a. Object of relation b. Present working model c. Thing in real world d. Model of relation 3. Which of the following can be a multivalued attribute? a. Phone_number b. Name c. Date_of_birth d. All of these 4. Which of the following is a single-valued attribute? a. Register_number b. Address c. SUBJECT_TAKEN d. Reference 5. _____________ express the number of entities to which another entity can be associated via a relationship set. a. Mapping Cardinality b. Relational Cardinality c. Participation Constraints d. None of these Answer 1. (a) 2. (c) 3. (a) 4. (a) 5. (a) 4.9 REFERENCES Text Books: 61 CU IDOL SELF LEARNING MATERIAL (SLM)
• T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi. • T2 C.J. Date, An Introduction to Database Systems Pearson Education, New Delhi. • T3 Data, C. and Darwen, H, Reading, A Guide to the SQL Standard, Addison-Wesley Publications, New Delhi. Reference Books: • R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition. • R2 Ivan Bayross, SQL / PL/SQL, BPB Publications. 62 CU IDOL SELF LEARNING MATERIAL (SLM)
UNIT 5: ROLES AND STRUCTURAL CONSTRAINTS: Structure 5.0 Learning objective 5.1 Roles and Structural Constraints 5.2 Weak Entities 5.3 Enhanced E-R and Object Modelling 5.4 Sub Classes, Super Classes 5.5 Inheritance, Specialization and Generalization 5.6 Summary 5.7 Keywords 5.8 Learning Activity 5.9 Unit End Questions 5.10 References 5.0 LEARNING OBJECTIVE Upon successful completion of this unit, student will be able to: ▪ Design ER Model using ER diagram. ▪ To define the terms of subclass and superclass. ▪ To analyse the concepts of Inheritance, Specialization and Generalization 5.1 ROLES AND STRUCTURAL CONSTRAINTS To understand Structural Constraints, we must take a look at Cardinality Ratios and Participation Constraints. Cardinality Ratios of relationships: The entities are denoted by rectangle and relationships by diamond. Figure 5.1: Relationship There are numbers (represented by M and N) written above the lines which connect relationships and entities. These are called cardinality ratios. These represent the maximum number of entities that can be associated with each other through relationship, R. Types of Cardinality: There can be 4 types of cardinality – 63 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 5.2: Mapping cardinality One-to-one (1:1) –When one entity in each entity set takes part at most once in the relationship, the cardinality is one-to-one. One-to-many (1: N) –If entities in the first entity set take part in the relationship set at most once and entities in the second entity set take part many times (at least twice), the cardinality is said to be one-to-many. Many-to-one (N:1) –If entities in the first entity set take part in the relationship set many times (at least twice), while entities in the second entity set take part at most once, the cardinality is said to be many-to-one. Many-to-many (N: N) –The cardinality is said to be many to many if entities in both the entity sets take part many times (at least twice) in the relationship set. Participation Constraints: Participation Constraints tell us that participation in a relationship can either be total or partial. 64 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 5.3: Partial and Total Participation When each entity in an entity set participates in a relation, it is called Total Participation. However, when all entities in the given entity set do not participate in a relation, it is called Partial Participation. Structural Constraints Structural Constraints are also called Structural properties of a database management system (DBMS). Cardinality Ratios and Participation Constraints are taken together are called Structural Constraints. The name constraints refer to the fact that such limitations must be imposed on the data, for the DBMS system to be consistent with the requirements. Figure 5.4: Structural Constraints The Structural constraints are represented by Min-Max notation. This is a pair of numbers (m, n) that appear on the connecting line between the entities and their relationships. The minimum number of times an entity can appear in a relation is represented by m whereas, the maximum time it is available is denoted by n. If m is 0 it signifies that the entity is participating in the relation partially, whereas, if m is either greater than or equal to 1, it denotes total participation of the entity. 65 CU IDOL SELF LEARNING MATERIAL (SLM)
5.2 WEAK ENTITIES Weak Entity: An entity type should have a key attribute which uniquely identifies each entity in the entity set, but there exists some entity type for which key attribute can’t be defined. These are called Weak Entity type. The entity sets which do not have sufficient attributes to form a primary key are known as weak entity sets and the entity sets which have a primary key are known as strong entity sets. As the weak entities do not have any primary key, they cannot be identified on their own, so they depend on some other entity (known as owner entity). The weak entities have total participation constraint (existence dependency) in its identifying relationship with owner identity. Weak entity types have partial keys. Partial Keys are set of attributes with the help of which the tuples of the weak entities can be distinguished and identified. Weak entity always has total participation but Strong entity may not have total participation. Weak entity is depending on strong entity to ensure the existence of weak entity. Like strong entity, weak entity does not have any primary key, it has partial discriminator key. Weak entity is represented by double rectangle. The relation between one strong and one weak entity is represented by double diamond. Figure 5.5: Weak Entity (Customer and Loan Relationship) Weak entities are represented with double rectangular box in the ER Diagram and the identifying relationships are represented with double diamond. Partial Key attributes are represented with dotted lines. 66 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 5.6: Weak Entity (Employee and Departments Relationship) Example-1: In the below ER Diagram, ‘Payment’ is the weak entity. ‘Loan Payment’ is the identifying relationship and ‘Payment Number’ is the partial key. Primary Key of the Loan along with the partial key would be used to identify the records. Figure 5.7: Weak Entity (Officer and Payment Relationship) Example-2: The existence of rooms is entirely dependent on the existence of a hotel. So, room can be seen as the weak entity of the hotel. Example-3: The bank account of a particular bank has no existence if the bank doesn’t exist anymore. Example-4: A company may store the information of dependants (Parents, Children, Spouse) of an Employee. But the dependents don’t have existence without the employee. So Dependent will be weak entity type and Employee will be Identifying Entity type for Dependant. Other examples: Strong entity | Weak entity Order | Order Item Employee | Dependent Class | Section Host | Logins Weak Entity in SQL: 67 CU IDOL SELF LEARNING MATERIAL (SLM)
Imagine the Employee table with the following columns: EmployeeID, EmpName, EmpDept, ... The Managers table would be like: ManagerID, EmployeeID(foreign-key), ManagerName, ... Now, each Manager is an Employee, thus if at all there is a Manager in the Manager table, there would be the same entry in Employee table. The \"is a\" relationship is maintained: Each manager is an Employee but each Employee is not a Manager The query would be something like: CREATE TABLE Employee ( EmployeeID int NOT NULL, LastName varchar(255), FirstName varchar(255), Address varchar(255), City varchar(255), PRIMARY KEY (EmployeeID) ) CREATE TABLE Managers ( ManagerID int NOT NULL, EmployeeID int NOT NULL, .. ... FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID) ) 5.3 ENHANCED E-R AND OBJECT MODELLING Enhanced entity-relationship diagrams are advanced database diagrams very similar to regular ER diagrams which represent requirements and complexities of complex databases. It is a diagrammatic technique for displaying the Sub Class and Super Class; Specialization and Generalization; Union or Category; Aggregation etc. 68 CU IDOL SELF LEARNING MATERIAL (SLM)
Union Relationship of one super or sub class with more than one super class. Figure 5.8: Union Representation Owner is the subset of two super class: Vehicle and House. Example: ▪ Set of Library Members is UNION of Faculty, Student, and Staff. A union relationship indicates either type; for example, a library member is either Faculty or Staff or Student. ▪ Below are two examples show how UNION can be depicted in ERD – Vehicle Owner is UNION of PERSON and Company, and RTO Registered Vehicle is UNION of Car and Truck. 69 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 5.9: UNION of Car and Truck and UNION of PERSON and Company. consider an example in above figure 5.9 Vehicle is super-class of CAR and Truck; this is very much the correct example of the subclass as well but here use it differently we are saying RTO Registered vehicle is UNION of Car and Vehicle, they do not inherit any attribute of Vehicle, attributes of car and truck are altogether independent set, where is in sub-classing situation car and truck would be inheriting the attribute of vehicle class. Aggregation: Represents relationship between a whole object and its component. Consider a ternary relationship Works_On between Employee, Branch and Manager. Now the best way to model this situation is to use aggregation, So, the relationship-set, Works_On is a higher-level entity-set. Such an entity-set is treated in the same manner as any other entity-set. We can create a binary relationship, Manager, between Works_On and Manager to represent who manages what tasks. Figure 5.10: Aggregation Representation 5.4 SUB CLASSES, SUPER CLASSES Subclasses A subclass is a class derived from the superclass. It inherits the properties of the superclass and also contains attributes of its own. An example is: 70 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 5.11: Subclass and superclass Representation Car, Truck and Motorcycle are all subclasses of the superclass Vehicle. They all inherit common attributes from vehicle such as speed, colour etc. while they have different attributes also i.e., Number of wheels in Car is 4 while in Motorcycle is 2. Superclasses A superclass is the class from which many subclasses can be created. The subclasses inherit the characteristics of a superclass. The superclass is also known as the parent class or base class. In the above example, Vehicle is the Superclass and its subclasses are Car, Truck and Motorcycle. 5.5 INHERITANCE, SPECIALIZATION AND GENERALIZATION Inheritance: We use all the above features of ER-Model in order to create classes of objects in object- oriented programming. The details of entities are generally hidden from the user; this process known as abstraction. Inheritance is an important feature of Generalization and Specialization. It allows lower-level entities to inherit the attributes of higher-level entities. 71 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 5.12: Inheritance Representation For example, the attributes of a Person class such as name, age, and gender can be inherited by lower-level entities such as Student or Teacher. Generalization As mentioned above, the process of generalizing entities, where the generalized entities contain the properties of all the generalized entities, is called generalization. In generalization, a number of entities are brought together into one generalized entity based on their similar characteristics. For example, pigeon, house sparrow, crow and dove can all be generalized as Birds. Figure 5.13: Generalization Representation Specialization Specialization is the opposite of generalization. In specialization, a group of entities is divided into sub-groups based on their characteristics. Take a group ‘Person’ for example. A person has name, date of birth, gender, etc. These properties are common in all persons, human beings. But in a company, persons can be identified as employee, employer, customer, or vendor, based on what role they play in the company. 72 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 5.14: Specialization Representation Similarly, in a school database, persons can be specialized as teacher, student, or a staff, based on what role they play in school as entities. 5.6 SUMMARY ▪ Cardinality Ratios and Participation Constraints are taken together are called Structural Constraints ▪ A strong entity set is an entity that contains sufficient attributes to uniquely identify all its entities ▪ A weak entity is an entity set that does not have sufficient attributes for Unique Identification of its records ▪ Union: Relationship of one super or sub class with more than one super class ▪ Aggregation: Represents the relationship between a whole object and its component. ▪ A superclass is the class from which many subclasses can be created. The subclasses inherit the characteristics of a superclass. The superclass is also known as the parent class or base class ▪ Subclasses ▪ A subclass is a class derived from the superclass. It inherits the properties of the superclass and also contains attributes of its own. ▪ Inheritance is an important feature of Generalization and Specialization. It allows lower-level entities to inherit the attributes of higher-level entities. ▪ In generalization, many entities are brought together into one generalized entity based on their similar characteristics. 5.7 KEYWORDS ▪ Union: Relationship of one super or sub class with more than one super class 73 CU IDOL SELF LEARNING MATERIAL (SLM)
▪ Aggregation: Represents the relationship between a whole object and its component. ▪ Subclasses: A subclass is a class derived from the superclass. It inherits the properties of the superclass and also contains attributes of its own ▪ Superclasses: The subclasses inherit the characteristics of a superclass. The superclass is also known as the parent class or base class ▪ Generalization: The process of generalizing entities, where the generalized entities contain the properties of all the generalized entities, is called generalization ▪ Specialization: a group of entities is divided into sub-groups based on their characteristics. ▪ Inheritance: It allows lower-level entities to inherit the attributes of higher-level entities. 5.8 LEARNING ACTIVITY The railway network of our country is one of the most complex public establishments. You can design a database solution for this network and make the management of the same more natural. Your system should have the following pieces of information: ▪ Station names ▪ Tracks that connect those stations (to keep things simple, you can assume that only one track runs between two stations) The train schedules should have information on the stations from where the train starts and by when it reaches the destination. It should also include information on which stations it passes through during its journey. To keep things simple, you can assume that every train completes its journey within a day, and they run daily. However, you’ll also need to store information on the sequence of the stations a train passes through. For example, if a train starts from Delhi and goes to Kolkata through Lucknow, then you’ll need to add the arrival and departure times of the train for all these stations. Keeping the stations in sequence will allow easy management of trains and their data. Till here, the project is rather easy. You can make it more challenging by adding the passenger information of every train such as its coaches, seat numbers, types of coaches, passenger names, and so on. ___________________________________________________________________________ ____________________________________________________________________ 5.9 UNIT END QUESTIONS A. Descriptive Questions Short Questions 74 CU IDOL SELF LEARNING MATERIAL (SLM)
1. Define weak entity? 2. Write short notes on cardinality. 3. What is generalization? 4. Define inheritance? 5. What are superclass and subclass? Long Questions 1. Explain generalization, specialization and aggregation with example. 2. What is cardinality? Explain cardinality with example. 3. Explain the various mapping cardinalities and total participation. 4. Explain in detail about superclass and subclass. 5. Draw and explain union and aggregation. B. Multiple choice Questions 1. An entity set that does not have sufficient attributes to form a primary key is termed a __________. a. Strong entity set b. Variant set c. Weak entity set d. Variable set 2. For a weak entity set to be meaningful, it must be associated with another entity set, called the________________. a. Identifying set b. Owner set c. Neighbour set d. Strong entity set 3. The entity type from which the subgroups can be made is classified as___________. a. Super class b. Subclass c. Qualified class d. Non-qualified class 4. The hierarchy in which each subclass participates in one subclass relationship is classified as a. Specialization hierarchy b. Generalization hierarchy c. Jointness hierarchy d. Disjoint hierarchy 75 CU IDOL SELF LEARNING MATERIAL (SLM)
5. In the process of specialization, the subclasses are attached by a. Lines b. Dotted lines c. Red lines d. Red dotted lines Answer 1. (c) 2. (a) 3. (b) 4. (a) 5. (a) 5.10 REFERENCES Text Books: • T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi. • T2 C.J. Date, An Introduction to Database Systems Pearson Education, New Delhi. • T3 Data, C. and Darwen, H, Reading, A Guide to the SQL Standard, Addison-Wesley Publications, New Delhi. Reference Books: • R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition. • R2 Ivan Bayross, SQL / PL/SQL, BPB Publications. 76 CU IDOL SELF LEARNING MATERIAL (SLM)
UNIT 6: DATA MODELS 2 Structure 6.0 Learning objective 6.1 Network Model 6.2 Hierarchical Models 6.3 Summary 6.4 Keywords 6.5 Learning Activity 6.6 Unit End Questions 6.7 References 6.0 LEARNING OBJECTIVE After studying this unit, student will be able to: ▪ Learn about data modeling and their importance. ▪ Illustrate the concept of Network Model and Hierarchical Models. 6.1 NETWORK MODEL Network Model: This is an extension of the Hierarchical model. In this model data is organised more like a graph, and are allowed to have more than one parent node. A network database consists of a collection of records connected to one another through links. A record is in many respects similar to an entity in the E-R model. Each record is a collection of fields (attributes), each of which contains only one data value. A link is an association between precisely two records. Thus, a link can be viewed as a restricted (binary) form of relationship in the sense of the E-R model. Figure 6.1: Sample Database 77 Figure 6.2: ER Diagram CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 6.3: Network Diagram If the relationship depositor is many to many with a descriptive attribute (for example, access date), then the transformation algorithm is similar to the one described. The only difference is that the new record type Rlink now contains the field access date. In this database model data is more related as more relationships are established in this database model. Also, as the data is more related, hence accessing the data is also easier and fast. This database model was used to map many-to-many data relationships. Figure 6.4: Network Model Representation Characteristics of the network model There are many characteristics of the network model, some of these characteristics are mentioned below. 1. The network model is better than a hierarchical model. 2. Supports many to many relationships. 3. Many parents can have many children. 4. Many children can have many parents 5. Entities are represented as a connected network with each other. 6. One child entity can have more than one parent entity. 7. Represented as a network and one child can have more than one parent. This model represents a complex structure. 8. Entities can have multiple parent entities and lead to a complex structure. 9. Not very flexible to reorganize the model. 10. High performance 78 CU IDOL SELF LEARNING MATERIAL (SLM)
11. Relationships among databases are done by programmers by using 3GL programs. 12. Query facility is not available in the network model. The network model has the following major features 1. It can represent redundancy in data more efficiently than that in the hierarchical model. 2. There can be more than one path from a previous node to successor node/s. 3. The operations of the network model are maintained by indexing structure of linked list (circular) where a program maintains a current position and navigates from one record to another by following the relationships in which the record participates. 4. Records can also be located by supplying key values. Advantages of the network model ▪ It is fast data access with a network model. ▪ The network model allows creating more complex and stronger queries as compared to the database with a hierarchical database model. A user can execute a variety of database queries when selecting the network model. Disadvantages of a network model ▪ The network model is a very complex database model, so the user must be very familiar with the overall structure of the database. ▪ Updating inside this database is a quite difficult and boring task. We need the help of the application programs that is being used to navigate the data. 6.2 HIERARCHICAL MODELS Hierarchical Model: This database model organises data into a tree-like-structure, with a single root, to which all the other data is linked. The hierarchy starts from the Root data, and expands like a tree, adding child nodes to the parent nodes. In this model, a child node will only have a single parent node. This model efficiently describes many real-world relationships like index of a book, recipes etc. In hierarchical model, data is organised into tree-like structure with one one-to-many relationship between two different types of data, for example, one department can have many courses, many professors and off-course many students. 79 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 6.5: Hierarchical Model Representation A hierarchical model represents the data in a tree-like structure in which there is a single parent for each record. To maintain order there is a sort field which keeps sibling nodes into a recorded manner. These types of models are designed basically for the early mainframe database management systems, like the Information Management System (IMS) by IBM. This model structure allows the one-to-one and a one-to-many relationship between two/ various types of data. This structure is very helpful in describing many relationships in the real world; table of contents, any nested and sorted information. The hierarchical structure is used as the physical order of records in storage. One can access the records by navigating down through the data structure using pointers which are combined with sequential accessing. Therefore, the hierarchical structure is not suitable for certain database operations when a full path is not also included for each record. Data in this type of database is structured hierarchically and is typically developed as an inverted tree. The \"root\" in the structure is a single table in the database and other tables act as the branches flowing from the root. The diagram below shows a typical hierarchical database structure. 80 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 6.6: Hierarchical database structure In the above diagram, an agent books several entertainers, and each entertainer, in return has his/her own schedule. It is the duty of an agent to maintain several clients whose entertainment needs are to be met. A client books engagement through the agent and makes payments to the agent for his services. A relationship in this database model is represented by the term parent/child. A parent table can be linked with one or more child tables in this type of relationship, but a single child table can be linked with only one parent table. The tables are explicitly linked via a pointer/index or by the physical arrangement of the records within the tables. A user can access the data by starting at the root table and working down through the tree to the target data. the user must be familiar with the structure of the database to access the data without any complexity. Hierarchical model consists of following: 1. It contains nodes which are connected by branches. 2. Topmost node is called root node. 3. If there are multiple nodes appear at top level, then these can be called as root segments. 4. Each node has exactly one parent. 5. One parent may have many children. Limitations of the hierarchical model: 1. Hierarchical model is Complex. 2. One parent per child is allowed in hierarchical model. 81 CU IDOL SELF LEARNING MATERIAL (SLM)
3. Data must be organized in a hierarchical fashion and it is done without compromising the information. 4. There is a Lack of structural independence in hierarchical model. 5. Navigation system is complex in in hierarchical model. 6. In Hierarchical model, Data is independent. 7. Hierarchical model does not support Many too many relationships. Table 6.1: Difference between two models Difference Hierarchical Data Model Network Data Model Basic There is a parent to child type Pointers or links are used to M:M Relationship between records. show a Relationship between Relationship Hierarchical Database Model records. Data Inconsistency does not support M:M relationships. Network Database Model Traversing Possible during the data supports M:M relationships. updation and deletion. Structure No Data inconsistency. Traversing of data is complex. Node can be accessed from Hierarchical Database Model parent to child and similarly supports tree like structure. from child to parent. This makes the Data traversing very easy. Network Database Model supports the graph like structure. 6.3 SUMMARY ▪ The network model was created to represent complex data relationships more effectively than the hierarchical model, to improve database performance, and to impose a database standard. In the network model, the user perceives the network database as a collection of records in 1:M relationships. However, unlike the hierarchical model, the network model allows a record to have more than one parent. ▪ A hierarchical model represents the data in a tree-like structure in which there is a single parent for each record. To maintain order there is a sort field which keeps sibling nodes into a recorded manner. These types of models are designed basically for the early mainframe database management systems, like the Information Management System (IMS) by IBM. 82 CU IDOL SELF LEARNING MATERIAL (SLM)
6.4 KEYWORDS • Network Model: Pointers or links are used to show a Relationship between records. • Hierarchical Model: There is a parent to child type Relationship between records. 6.5 LEARNING ACTIVITY Colleges have multiple departments where every department offers many courses. These departments have a head (HOD) and various instructors. Even though there are many instructors, one instructor can only work in one department. As you can see the organisational structure of a college is quite complicated and requires a lot of effort to manage. you can build a solution to tackle this problem. It would store all this information about the college and its departments. However, the information we’ve discussed above isn’t sufficient for a college. We need to mention the courses as well. The system should allow easy access. Your developed DBMS-based solution would allow a college to save a lot of time and resources; moreover, the user could see all the college information from one place and modify it accordingly. ___________________________________________________________________________ ____________________________________________________________________ 6.6 UNIT END QUESTIONS A. Descriptive Questions Short Questions 1. Define network model? 2. Write short notes on the hierarchical model. 3. List the advantage of the Network model. 4. What is the characteristic of the hierarchical model? 5. What is the limitation of the network model? Long Questions 1. Briefly explain the types of data models. 2. Explain Hierarchical model with an example? 3. Explain the merit and demerits of hierarchical model. 4. Draw a neat diagram of a network model and explain it. 5. Explain the merit and demerits of Relational model. B. Multiple choice Questions 83 CU IDOL SELF LEARNING MATERIAL (SLM)
1. Which data model creates parent-child relationships between data elements and enables each child to have more than one parent? a. Network b. Hierarchical c. Relational d. Object 2. Which data model creates parent-child relationships between data elements and restricts each child to have just one parent? a. Relational b. Hierarchical c. Network d. Object 3. The older logical database model that organizes data in a tree-like structure is? a. Network b. Relational c. Hierarchical d. Object 4. The hierarchical database is very efficient when? a. Handling little amounts of transactions b. Handling many transactions c. Handling large amounts of data d. both a and c 5. Which of the following represents a collection of concepts that are used to describe the structure of a database? a. Data model b. Data structure c. Data type d. Data warehouse Answer 1. (a) 2. (b) 3. (c) 4. (d) 5. (a) 6.7 REFERENCES Text Books: • T1 R. Elmasri and S.B. Navathe, Fundamentals of Database Systems, Pearson Education, New Delhi. • T2 C.J. Date, An Introduction to Database Systems Pearson Education, New Delhi. 84 CU IDOL SELF LEARNING MATERIAL (SLM)
• T3 Data, C. and Darwen, H, Reading, A Guide to the SQL Standard, Addison-Wesley Publications, New Delhi. Reference Books: • R1 A. Silberschatz, H.F. Korth and S. Sudarshan, Database System Concepts, McGraw-Hill, International Edition. • R2 Ivan Bayross, SQL / PL/SQL, BPB Publications. 85 CU IDOL SELF LEARNING MATERIAL (SLM)
UNIT 7: FILE ORGANIZATION 1 Structure 7.0 Learning objective 7.1 Indexed Sequential Access Files 7.2 Implementation Using B & B+ Trees 7.3 Summary 7.4 Keywords 7.5 Learning Activity 7.6 Unit End Questions 7.7 References 7.0 LEARNING OBJECTIVES After studying this unit, student will be able to ▪ Discuss to Organize files using different methods like Indexed Sequential Access Files ▪ Explain that by using file organization, the records should be read/retrieved/accessed as fast as possible. ▪ Evaluate that any user can easily and quickly perform the operations such as insert, update, and delete on the records present in the database. 7.1 INDEXED SEQUENTIAL ACCESS FILES ISAM method is an advanced sequential file organization. In this method, records are stored in the file using the primary key. An index value is generated for each primary key and mapped with the record. This index contains the address of the record in the file. If any record has to be retrieved based on its index value, then the address of the data block is fetched, and the record is retrieved from the memory. ▪ Indexed sequential access file combines both sequential file and direct access file organization. ▪ In indexed sequential access file, records are stored randomly on a direct access device such as magnetic disk by a primary key. ▪ This file has multiple keys. These keys can be alphanumeric in which the records are ordered is called primary key. ▪ The data can be access either sequentially or randomly using the index. The index is stored in a file and read into memory when the file is opened. 86 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 7.1: Indexed Sequential Access Files Characteristic of ISAF: 1. Indexed sequential access file combines both sequential file and direct access file organization. 2. In an indexed sequential access file, records are stored randomly on a direct access device such as a magnetic disk by a primary key. 3. This file has multiple keys. These keys can be alphanumeric in which the records are ordered is called the primary key. 4. The data can be access either sequentially or randomly using the index. The index is stored in a file and read into memory when the file is opened. Advantage: 1. In this method, each record has the address of its data block, searching a record is a huge database is quick and easy. 2. This method supports range retrieval and partial retrieval of records. Since the index is based on the primary key values, we can retrieve the data for the given range of value. In the same way, the partial value can also be easily searched, i.e., the student name starting with 'JA' can be easily searched. Disadvantage: 1. This method requires extra space in the disk to store the index value. 2. When the new records are inserted, then these files have to be reconstructed to maintain the sequence. 3. When the record is deleted, then the space used by it needs to be released. Otherwise, the performance of the database will slow down. 87 CU IDOL SELF LEARNING MATERIAL (SLM)
7.2 IMPLEMENTATION USING B & B+ TREES B Tree: B Tree is a specialized m-way tree that can be widely used for disk access. A B-Tree of order m can have at most m-1 keys and m children. One of the main reasons of using B tree is its capability to store large number of keys in a single node and large key values by keeping the height of the tree relatively small. A B tree of order m contains all the properties of an M way tree. In addition, it contains the following properties. ▪ Every node in a B-Tree contains at most m children. ▪ Every node in a B-Tree except the root node and the leaf node contain at least m/2 children. ▪ The root nodes must have at least 2 nodes. ▪ All leaf nodes must be at the same level. ▪ It is not necessary that, all the nodes contain the same number of children but, each node must have m/2 number of nodes. Figure 7.2: B tree of order 4 While performing some operations on B Tree, any property of B Tree may violate such as number of minimum children a node can have. To maintain the properties of B Tree, the tree may split or join. Operations Searching: Searching in B Trees is similar to that in Binary search tree. For example, if we search for an item 49 in the following B Tree. The process will something like following: 1. Compare item 49 with root node 78. since 49 < 78 hence, move to its left sub-tree. 2. Since, 40<49<56, traverse right sub-tree of 40. 3. 49>45, move to right. Compare 49. 4. match found, return. Searching in a B tree depends upon the height of the tree. The search algorithm takes O (log n) time to search any element in a B tree. 88 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 7.3: Searching Operation Inserting Insertions are done at the leaf node level. The following algorithm needs to be followed in order to insert an item into B Tree. 1. Traverse the B Tree in order to find the appropriate leaf node at which the node can be inserted. 2. If the leaf node contain less than m-1 keys then insert the element in the increasing order. 3. Else, if the leaf node contains m-1 keys, then follow the following steps. a. Insert the new element in the increasing order of elements. b. Split the node into the two nodes at the median. c. Push the median element up to its parent node. d. If the parent node also contains m-1 number of keys, then split it too by following the same steps. Example: Insert the node 8 into the B Tree of order 5 shown in the following image. Figure 7.4: B Tree Creation Figure 7.5: After Insertion The node, now contain 5 keys which is greater than (5 -1 = 4) keys. Therefore, split the node from the median i.e., 8 and push it up to its parent node shown as follows. 89 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 7.6: Split the Key Deletion: Deletion is also performed at the leaf nodes. The node which is to be deleted can either be a leaf node or an internal node. Following algorithm needs to be followed in order to delete a node from a B tree. 1. Locate the leaf node. 2. If there are more than m/2 keys in the leaf node then delete the desired key from the node. 3. If the leaf node doesn't contain m/2 keys then complete the keys by taking the element from eight or left sibling. a. If the left sibling contains more than m/2 elements then push its largest element up to its parent and move the intervening element down to the node where the key is deleted. b. If the right sibling contains more than m/2 elements then push its smallest element up to the parent and move intervening element down to the node where the key is deleted. 4. If neither of the sibling contain more than m/2 elements then create a new leaf node by joining two leaf nodes and the intervening element of the parent node. 5. If parent is left with less than m/2 nodes then, apply the above process on the parent too. If the node which is to be deleted is an internal node, then replace the node with its in-order successor or predecessor. Since, successor or predecessor will always be on the leaf node hence, the process will be similar as the node is being deleted from the leaf node. Example: Delete the node 53 from the B Tree of order 5 shown in the following figure. Figure 7.7: Before Deletion 53 is present in the right child of element 49. Delete it. 90 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 7.8: Delete the Node 53 Now, 57 is the only element which is left in the node, the minimum number of elements that must be present in a B tree of order 5, is 2. it is less than that, the elements in its left and right sub-tree are also not sufficient therefore, merge it with the left sibling and intervening element of parent i.e., 49. Figure 7.9: Final B Tree After Deletion Application of B tree: ▪ B tree is used to index the data and provides fast access to the actual data stored on the disks since, the access to value stored in a large database that is stored on a disk is a very time consuming process. ▪ Searching an un-indexed and unsorted database containing n key values needs O(n) running time in worst case. However, if we use B Tree to index this database, it will be searched in O (log n) time in worst case. B+ Tree: B+ tree has one root, any number of intermediary nodes and a leaf node. Here all leaf nodes will have the actual records stored. Intermediary nodes will have only pointers to the leaf nodes; it not has any data. Any node will have only two leaves. This is the basis of any B+ tree. Consider the STUDENT table below. This can be stored in a B+ tree structure as shown below. We can observe here that it divides the records into two and splits into the left node and right node. The left node will have all the values less than or equal to the root node and the right node will have values greater than the root node. The intermediary nodes at level 2 will have only the pointers to the leaf nodes. The values shown in the intermediary nodes are only the pointers to the next level. All the leaf nodes will have the actual records in sorted order. 91 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 7.10: Representation of B+ Tree If we have to search for any record, they are all found at the leaf node. Hence searching any record will take the same time because of the equidistance of the leaf nodes. Also, they are all sorted. Hence searching a record is like a sequential search and does not take much time. Suppose a B+ tree has an order of n (it is the number of branches – above tree structure has 5 branches altogether, hence order is 5), and then it can have n/2 to an intermediary nodes and n/2 to n-1 leaf nodes. In our example above, n= 5 i.e.; it has 5 branches from the root. Then it can have intermediary nodes ranging from 3 to 5. And it can have leaf nodes from 3 to 4. Figure 7.11: B+ Tree Node Representation The main goal of the B+ tree is: Sorted Intermediary and leaf nodes: Since it is a balanced tree, all nodes should be sorted. Fast traversal and Quick Search: One should be able to traverse through the nodes very fast. That means, if we have to search for any particular record, we should be able to pass through the intermediary node very easily. This is achieved by sorting the pointers at intermediary nodes and the records in the leaf nodes. 92 CU IDOL SELF LEARNING MATERIAL (SLM)
Any record should be fetched very quickly. This is made by maintaining the balance in the tree and keeping all the nodes at the same distance. No overflow pages: The B+ tree allows all the intermediary and leaf nodes to be partially filled, it will have some percentage defined while designing a B+ tree. This percentage up to which nodes are filled is called the fill factor. If a node reaches the fill factor limit, then it is called an overflow page. If a node is too empty then it is called underflow. In our example above, an intermediary node with 108 is an underflow. And leaf nodes are not partially filled, hence it is an overflow. In the ideal B+ tree, it should not have overflow or underflow except the root node. Searching a record in B+ Tree Suppose we want to search 65 in the below B+ tree structure. First, we will fetch for the intermediary node which will direct to the leaf node that can contain a record for 65. So we find branch between 50 and 75 nodes in the intermediary node. Then we will be redirected to the third leaf node at the end. Here DBMS will perform a sequential search to find 65. Suppose, instead of 65, we have to search for 60. What will happen in this case? We will not be able to find in the leaf node. No insertions/update/delete is allowed during the search in B+ tree. Figure 7.12: B+ Tree Search Insertion in B+ tree Suppose we have to insert a record 60 in the below structure. It will go to 3rd leaf node after 55. Since it is a balanced tree and that leaf node is already full, we cannot insert the record there. But it should be inserted there without affecting the fill factor, balance and order. So the only option here is to split the leaf node. But how do we split the nodes? 93 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 7.13: B+ Tree Before Insertion The 3rd leaf node should have values (50, 55, 60, 65, 70) and its current root node is 50. We will split the leaf node in the middle so that its balance is not altered. So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes. If these two has to be leaf nodes, the intermediary node cannot branch from 50. It should have 60 added to it and then we can have pointers to new leaf node. Figure 7.14: B+ Tree After Insertion This is how we insert a new entry when there is an overflow. In a normal scenario, it is simple to find the node where it fits and place it in that leaf node. Delete in B+ tree Suppose we have to delete 60 from the above example. What will happen in this case? We have to remove 60 from the 4th leaf node as well as from the intermediary node too. If we remove it from the intermediary node, the tree will not satisfy B+ tree rules. So we need to modify it to have a balanced tree. After deleting 60 from the above B+ tree and re-arranging nodes, it will appear as below. Figure 7.15: B+ Tree Before Insertion Suppose we have to delete 15 from the above tree. We will traverse to the 1st leaf node and simply delete 15 from that node. There is no need for any re-arrangement as the tree is balanced and 15 do not appear in the intermediary node. 94 CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 7.16: B+ Tree After Insertion B+ Tree Extensions As the number of records grows in the database, the intermediary and leaf nodes need to be split and spread widely to keep the balance of the tree. These are called B+ tree extensions. As it spreads out widely, the searching of records becomes faster. The main goal of creating a B+ tree is the faster traversal of records. As the branches spread out, it requires less I/O on disk to get the record. Record that needs to be fetched are fetched in logarithmic fraction of time. Suppose we have K search key values – that is the pointers in the intermediary node for n nodes. Then we can fetch any record in the b+ tree in log (n/2) (K). Suppose each node takes 40bytes to store an index and each disk block is of 40Kbytes. That means we can have 100 nodes (n). Say we have 1million search key values – that means we have 1 million intermediary pointers. Then we can access log 50 (1000000) = 4 nodes are accessed in one go. Hence this costs only 4milliseconds to fetch any node in the tree. Now we can guess the advantage of extending the B+ tree into more intermediary nodes. As intermediary nodes spread out more and more, it is more efficient in fetching the records in B+ tree. Figure 7.17: B+ Tree Extensions B+ Tree index files The above concept of the B+ tree is used to store the records in the secondary memory. If the records are stored using this concept, then those files are called B+ tree index files. Since this tree is balanced and sorted, all the nodes will be at the same distance and only the leaf node has the actual value, which makes searching for any record easy and quick in B+ tree index files. Even insertion/deletion in the B+ tree does not take much time. Hence B+ tree forms an efficient method to store the records. 95 CU IDOL SELF LEARNING MATERIAL (SLM)
Searching, inserting and deleting a record is done in the same way we have seen above. Since it is a balanced tree, it searches for the position of the records in the file, and then it fetches/inserts /deletes the records. In case it finds that the tree will be unbalanced because of insert/delete/update, it does the proper re-arrangement of nodes so that definition of the B+ tree is not changed. Below is a simple example of how student details are stored in B+ tree index files. Figure 7.18: B+ Tree Index File Suppose we have a new student, Bryan. Where will he fit in the file? He will fit in the 1st leaf node. Since this leaf node is not full, we can easily add him in the node. Figure 7.19: B+ Tree Index File Insertion But what happens if we want to insert another student Ben into this file? Some re- arrangement to the nodes is needed to maintain the balance of the file. Figure 7.20: B+ Tree Index Re-Arrangement The same thing happens when we perform delete too. Benefits of B+ Tree index files ▪ As the file grows in the database, the performance remains the same. It does not degrade like in ISAM. This is because all the records are maintained at the leaf node and all the 96 CU IDOL SELF LEARNING MATERIAL (SLM)
nodes are at equidistance from the root. In addition, if there is any overflow, it automatically re-organizes the structure. ▪ Even though insertion and deletion are little complicated, it can be done in fraction of seconds. ▪ Leaf node allows only partial/ half filled, since records are larger than pointers. B Tree Index Files B tree index file is similar to B+ tree index files, but it uses binary search concepts. In this method, each root will branch to only two nodes and each intermediary node will also have the data. And leaf node will have the lowest level of data. However, in this method also, records will be sorted. Since all intermediary nodes also have records, it reduces the traversing till leaf node for the data. A simple B tree can be represented as below: Figure 7.21: B Tree Representation See the difference between this tree structure and B+ tree for the same example above. Here there is no repetition or pointers till leaf node. All the records are stored in all the nodes. If we need to insert any record, it will be done as B+ tree index files, but it will make sure that each node will branch only to two nodes. If there is not enough space in any of the node, it will split the node and store the records. Example of Simple Insert Figure 7.22: B Tree Insertion 97 Example of splitting the nodes while inserting CU IDOL SELF LEARNING MATERIAL (SLM)
Figure 7.23: B Tree Splitting after the insertion Table 7.1: B+ Tree vs. B Tree B Tree B + Tree Search keys can be repeated. Search keys cannot be redundant. Data is only saved on the leaf nodes. Both leaf nodes and internal nodes can store data Data stored on the leaf node makes the search Searching is slow due to data stored on Leaf more accurate and faster. and internal nodes. Deletion is not difficult as an element is only Deletion of elements is a complicated and removed from a leaf node. time-consuming process. Linked leaf nodes make the search efficient and You cannot link leaf nodes. quick. 7.3 SUMMARY ▪ A file is a collection of correlated information which is recorded on secondary or non- volatile storage like magnetic disks, optical disks, and tapes. ▪ Files are stored on disk or other storage and do not disappear when a user logs off. ▪ A File Structure needs to be predefined format in such a way that an operating system understands it. ▪ File type refers to the ability of the operating system to differentiate different types of files like text files, binary, and source files. ▪ Create find space on disk and make an entry in the directory. ▪ Indexed Sequential Access method is based on simple sequential access ▪ In the Sequential Access method records are accessed in a certain pre-defined sequence ▪ B Tree is a self-balancing data structure for better search, insertion, and deletion of data from the disk. 98 CU IDOL SELF LEARNING MATERIAL (SLM)
▪ B Tree is regulated by the degree specified ▪ B Tree keys and nodes are arranged in ascending order. ▪ The search operation of B Tree is the simplest one, which always starts from the root and starts checking if the target key is greater or lesser than the node value. ▪ The insert operation of B Tree is rather detailed, which first finds an appropriate position of insertion for the target key, inserts it, evaluates the validity of B Tree against different cases, and then restructure the B Tree nodes accordingly. ▪ The delete operation of B Tree first searches for the target key to be deleted, deletes it, evaluates the validity based on several cases like minimum and maximum keys of the target node, siblings, and parent. ▪ We can easily retrieve complete data or partial data because going through the linked tree structure makes it efficient. ▪ The B+ tree structure grows and shrinks with an increase/decrease in the number of stored records. ▪ Storage of data on the leaf nodes and subsequent branching of internal nodes shortens the tree height, which reduces the disk input and output operations, ultimately consuming much less space on the storage devices. 7.4 KEYWORDS ▪ ISAM: Indexed Sequential Access Files method is an advanced sequential file organization. ▪ B Tree is a self-balancing data structure for better search, insertion, and deletion of data from the disk. ▪ Search Operation: Search Operation always starts from the root and starts checking if the target key is greater or lesser than the node value. ▪ Insert Operation: insert operation finds an appropriate position of insertion for the target key, inserts it, evaluates the validity of the B Tree against different cases, and then restructure the B Tree nodes accordingly. ▪ Delete Operation: Delete operation of B Tree first searches for the target key to be deleted, deletes it, evaluates the validity based on several cases like minimum and maximum keys of the target node, siblings, and parent. ▪ B+ Tree is a self-balancing data structure for executing accurate and faster searching, inserting and deleting procedures on data 7.5 LEARNING ACTIVITY Colleges have multiple departments where every department offers many courses. These departments have a head (HOD) and various instructors. Even though there are many 99 CU IDOL SELF LEARNING MATERIAL (SLM)
instructors, one instructor can only work in one department. you know the organisational structure of a college is quite complicated and requires a lot of effort to manage. A course can have only one instructor, but an instructor can have multiple classes. You’d need to add this information to the database system as well. You can make this project more advanced by adding the course enrollment information. The system should allow easy access. Your developed DBMS-based solution would allow a college to save a lot of time and resources; moreover, the user could see all the college information from one place and modify it accordingly. ___________________________________________________________________________ ____________________________________________________________________ 7.6 UNIT END QUESTIONS A. Descriptive Questions. Short Questions 1. Define B Tree? 2. What is a B+ Tree? 3. What is an indexed sequential access file? 4. List out pros and cons of B tree. 5. What are the merits and demerits of the B+ tree? Long Questions 1. Draw and explain a B tree. 2. Explain in detail B tree and B+ tree. 3. What is ISAM and explain it? 4. With a neat sketch of a B+ tree, explain any two operations. 5. Define and explain the following terms of the B tree. a. Insertion b. Deletion B. Multiple choice Questions 1. Which of the following is true? a. B + tree allows only the rapid random access b. B + tree allows only the rapid sequential access c. B + tree allows rapid random access as well as rapid sequential access d. B + tree allows rapid random access and slower sequential access 100 CU IDOL SELF LEARNING MATERIAL (SLM)
Search
Read the Text Version
- 1
- 2
- 3
- 4
- 5
- 6
- 7
- 8
- 9
- 10
- 11
- 12
- 13
- 14
- 15
- 16
- 17
- 18
- 19
- 20
- 21
- 22
- 23
- 24
- 25
- 26
- 27
- 28
- 29
- 30
- 31
- 32
- 33
- 34
- 35
- 36
- 37
- 38
- 39
- 40
- 41
- 42
- 43
- 44
- 45
- 46
- 47
- 48
- 49
- 50
- 51
- 52
- 53
- 54
- 55
- 56
- 57
- 58
- 59
- 60
- 61
- 62
- 63
- 64
- 65
- 66
- 67
- 68
- 69
- 70
- 71
- 72
- 73
- 74
- 75
- 76
- 77
- 78
- 79
- 80
- 81
- 82
- 83
- 84
- 85
- 86
- 87
- 88
- 89
- 90
- 91
- 92
- 93
- 94
- 95
- 96
- 97
- 98
- 99
- 100
- 101
- 102
- 103
- 104
- 105
- 106
- 107
- 108
- 109
- 110
- 111
- 112
- 113
- 114
- 115
- 116
- 117
- 118
- 119
- 120
- 121
- 122
- 123
- 124
- 125
- 126
- 127
- 128
- 129
- 130
- 131
- 132
- 133
- 134
- 135
- 136
- 137
- 138
- 139
- 140
- 141
- 142
- 143
- 144
- 145
- 146
- 147
- 148
- 149
- 150
- 151
- 152
- 153
- 154
- 155
- 156
- 157
- 158
- 159
- 160
- 161
- 162
- 163
- 164
- 165
- 166
- 167
- 168
- 169
- 170
- 171
- 172
- 173
- 174
- 175
- 176
- 177
- 178
- 179
- 180
- 181
- 182
- 183
- 184
- 185
- 186
- 187
- 188
- 189
- 190
- 191
- 192
- 193
- 194
- 195
- 196
- 197
- 198
- 199
- 200
- 201
- 202
- 203
- 204
- 205
- 206
- 207
- 208
- 209
- 210