

MCA642 MCA-SEM-IV-Advanced Data Base management system-26.10.2020-converted

Published by Teamlease Edtech Ltd (Amita Chitroda), 2021-04-20 17:00:58




MASTER OF COMPUTER APPLICATIONS
SEMESTER-IV
ADVANCED DATABASE MANAGEMENT SYSTEM
MCA642
CU IDOL SELF LEARNING MATERIAL (SLM)

CHANDIGARH UNIVERSITY
Institute of Distance and Online Learning
Course Development Committee
Prof. (Dr.) R.S. Bawa, Pro Chancellor, Chandigarh University, Gharuan, Punjab

Advisors
Prof. (Dr.) Bharat Bhushan, Director – IGNOU
Prof. (Dr.) Majulika Srivastava, Director – CIQA, IGNOU

Programme Coordinators & Editing Team
Master of Business Administration (MBA) – Coordinator: Dr. Rupali Arora
Bachelor of Business Administration (BBA) – Coordinator: Dr. Simran Jewandah
Master of Computer Applications (MCA) – Coordinator: Dr. Raju Kumar
Bachelor of Computer Applications (BCA) – Coordinator: Dr. Manisha Malhotra
Master of Commerce (M.Com.) – Coordinator: Dr. Aman Jindal
Bachelor of Commerce (B.Com.) – Coordinator: Dr. Minakshi Garg
Master of Arts (Psychology) – Coordinator: Dr. Samerjeet Kaur
Bachelor of Science (Travel & Tourism Management) – Coordinator: Dr. Shikha Sharma
Master of Arts (English) – Coordinator: Dr. Ashita Chadha
Bachelor of Arts (General) – Coordinator: Ms. Neeraj Gohlan

Academic and Administrative Management
Prof. (Dr.) R. M. Bhagat, Executive Director – Sciences
Prof. (Dr.) S.S. Sehgal, Registrar
Prof. (Dr.) Manaswini Acharya, Executive Director – Liberal Arts
Prof. (Dr.) Gurpreet Singh, Director – IDOL

© No part of this publication should be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording and/or otherwise without the prior written permission of the authors and the publisher.

SLM SPECIALLY PREPARED FOR CU IDOL STUDENTS
Printed and Published by: TeamLease Edtech Limited, www.teamleaseedtech.com
CONTACT NO: 01133002345
For: CHANDIGARH UNIVERSITY, Institute of Distance and Online Learning

First Published in 2020. All rights reserved. No part of this book may be reproduced or transmitted, in any form or by any means, without permission in writing from Chandigarh University. Any person who does any unauthorized act in relation to this book may be liable to criminal prosecution and civil claims for damages. This book is meant for educational and learning purposes. The authors of the book have taken all reasonable care to ensure that the contents of the book do not violate any existing copyright or other intellectual property rights of any person in any manner whatsoever. In the event the authors have been unable to track any source and if any copyright has been inadvertently infringed, please notify the publisher in writing for corrective action.

CONTENT
UNIT 1: DBMS Development Concept-1
UNIT 2: DBMS Development Concept-2
UNIT 3: Distributed Database Management System-1
UNIT 4: Distributed Database Management System-2
UNIT 5: Object Oriented Databases
UNIT 6: Enhanced Data Models for Advanced Applications-1
UNIT 7: Enhanced Data Models for Advanced Applications-2
UNIT 8: Database Security and Authorization
UNIT 9: Access Control-1
UNIT 10: Access Control-2

UNIT 1: DBMS DEVELOPMENT CONCEPT-1

Structure
Learning Objectives
Introduction
Paradigm Shift from File System to DBMS
DBMS 3-tier Architecture
Phases of Database Design
Relational Database Management System
Introduction to DBLC
Database Development Lifecycle
Degree of Data
Relationship of Higher Degree
Abstraction
Data Abstraction
Three Levels of Abstraction
Summary
Key Words/Abbreviations
Learning Activity
Unit End Questions (MCQ and Descriptive)
References

LEARNING OBJECTIVES
After studying this unit, you will be able to:
• Explain the introduction to databases
• State the development of databases

• List the stages of the Database Development Life Cycle
• Discuss database features like data abstraction

INTRODUCTION

A database is a collection of inter-related data which helps in efficient retrieval, insertion and deletion of data, and organizes the data in the form of tables, views, schemas, reports, etc. For example, a university database organizes the data about students, faculty, admin staff, etc., which helps in efficient retrieval, insertion and deletion of data.

A database management system (DBMS) is a software interface that sits between a database and a user, or between a database and an application (such as an inventory system). Different DBMSs support different database models and different:
• DDLs (Data Definition Language)
• DMLs (Data Manipulation Language)

A typical DBMS creates, manipulates, modifies, and maintains a database.

DDL is short for Data Definition Language, which deals with database schemas and descriptions of how the data should reside in the database:
CREATE: create a database and its objects (table, index, view, stored procedure, function, trigger)
ALTER: alter the structure of the existing database
DROP: delete objects from the database
TRUNCATE: remove all records from a table, including all space allocated for the records
COMMENT: add comments to the data dictionary
RENAME: rename an object

DML is short for Data Manipulation Language, which deals with data manipulation and includes the most common SQL statements such as SELECT, INSERT, UPDATE and DELETE; it is used to store, modify, retrieve, delete and update data in a database:
SELECT: retrieve data from a database

INSERT: insert data into a table
UPDATE: update existing data within a table
DELETE: delete records from a database table
MERGE: UPSERT operation (insert or update)
CALL: call a PL/SQL or Java subprogram
EXPLAIN PLAN: interpretation of the data access path
LOCK TABLE: concurrency control

The software used to manage a database is called a Database Management System (DBMS). For example, MySQL and Oracle are popular commercial DBMSs used in different applications. A DBMS allows users to perform the following tasks:

Data Definition: It helps in the creation, modification and removal of definitions that define the organization of data in the database.
Data Updation: It helps in the insertion, modification and deletion of the actual data in the database.
Data Retrieval: It helps in the retrieval of data from the database, which can be used by applications for various purposes.
User Administration: It helps in registering and monitoring users, enforcing data security, monitoring performance, maintaining data integrity, dealing with concurrency control and recovering information corrupted by unexpected failure.

PARADIGM SHIFT FROM FILE SYSTEM TO DBMS

A file system manages data using files on a hard disk. Users are allowed to create, delete, and update files according to their requirements. Consider the example of a file-based University Management System. Data of students is available to their respective departments, academics section, result section, accounts section, hostel office, etc. Some of the data is common to all sections, like roll number, name, father's name, address and phone number of students, but some data is available to a particular section only, like the hostel allotment number, which is a part of the hostel office. Let us discuss the issues with this system:

Redundancy of data: Data is said to be redundant if the same data is copied in many places. If a student wants to change a phone number, it has to be updated in various sections.
Similarly, old records must be deleted from all sections representing that student.
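The DDL and DML statements introduced above can be exercised with any relational engine. The sketch below uses Python's built-in sqlite3 module against a throwaway in-memory database; note that SQLite supports only a subset of the commands listed (it has no TRUNCATE or LOCK TABLE), and the student table here is invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()

# DDL: CREATE defines a schema object; ALTER changes its structure
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, phone TEXT)")
cur.execute("ALTER TABLE student ADD COLUMN address TEXT")

# DML: INSERT, UPDATE and SELECT manipulate the data held in the table
cur.execute("INSERT INTO student (roll_no, name, phone) VALUES (1, 'Asha', '9876500000')")
cur.execute("UPDATE student SET phone = '9876511111' WHERE roll_no = 1")
row = cur.execute("SELECT name, phone FROM student WHERE roll_no = 1").fetchone()
print(row)   # ('Asha', '9876511111')

# DDL: DROP removes the table and its data entirely
cur.execute("DROP TABLE student")
conn.close()
```

The same statements, with minor dialect differences, work in MySQL or Oracle.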

Inconsistency of data: Data is said to be inconsistent if multiple copies of the same data do not match each other. If the phone number differs between the accounts section and the academics section, it is inconsistent. Inconsistency may be due to typing errors or to not updating all copies of the same data.

Difficult data access: A user should know the exact location of a file to access data, so the process is very cumbersome and tedious. Consider how difficult it would be to search for a student's hostel allotment number among 10,000 unsorted student records.

Unauthorized access: A file system may lead to unauthorized access to data. If a student gets access to a file containing his marks, he can change it in an unauthorized way.

No concurrent access: The access of the same data by multiple users at the same time is known as concurrency. A file system does not allow concurrency, as data can be accessed by only one user at a time.

No backup and recovery: A file system does not incorporate any backup and recovery of data if a file is lost or corrupted.

1.2.1 DBMS 3-tier Architecture

The DBMS 3-tier architecture divides the complete system into three inter-related but independent modules, as shown below:

Figure 1.1 DBMS 3-tier Architecture

Physical Level: At the physical level, the information about the location of database objects in the data store is kept. Various users of the DBMS are unaware of the locations of these objects. In simple terms, the physical level of a database describes how the data is stored in secondary storage devices like disks and tapes, and also gives insights into additional storage details.

Conceptual Level: At the conceptual level, data is represented in the form of various database tables. For example, a STUDENT database may contain STUDENT and COURSE tables which will be visible to users, but users are unaware of their storage. Also referred to as the logical schema, it describes what kind of data is to be stored in the database.

External Level: An external level specifies a view of the data in terms of conceptual level tables. Each external level view is used to cater to the needs of a particular category of users. For example, the FACULTY of a university is interested in looking at the course details of students, while STUDENTS are interested in looking at all details related to academics, accounts, courses and hostels. So, different views can be generated for different users. The main focus of the external level is data abstraction.

PHASES OF DATABASE DESIGN

Database design for a real-world application starts from capturing the requirements and proceeds to physical implementation using DBMS software, and consists of the following steps:

Figure 1.2 Phases of Database Design

Conceptual Design: The requirements of the database are captured using a high-level conceptual data model. For example, the ER model is used for the conceptual design of the database.

Logical Design: Logical design represents data in the form of the relational model. The ER diagram produced in the conceptual design phase is used to convert the data into the relational model.

Physical Design: In physical design, data in the relational model is implemented using a commercial DBMS like Oracle or DB2.

RELATIONAL DATABASE MANAGEMENT SYSTEM

A relational database management system (RDBMS) supports the relational model. It performs the same basic functions as those listed for a DBMS. SQL is the data definition and data manipulation language that is packaged with an RDBMS.

Figure 1.3 RDBMS

AN INTRODUCTION TO DBLC

The database too must be subject to constant evaluation; this cycle of evaluation is known as the database life cycle (DBLC). A database manages data and information. Here we distinguish raw data from processed data, i.e., information. A database is a repository of business data (transaction data and reference data), which is the result of carefully designed database constructs. We design the database as a part of the information system design. Usually, the information system design addresses the aspects of data collection, validation and storage of the data. The information system then retrieves the data for processing and for the transformation of data into useful information. Database development is a process of designing and implementing a database. The primary objective of database design is to create complete, normalized,

non-redundant, and fully integrated conceptual, logical, and physical database models. Database design is a process. It consists of several stages: database study, design, implementation, testing, evaluation, maintenance, and evolution.

1.5.1 Database development lifecycle

In this section we propose a new lifecycle model that is adapted from the traditional lifecycles and which is enhanced for database system development. This new model satisfies properties such as scope restriction, progressive enhancement, incremental planning and pre-defined structure.

In most traditional life cycles, the first phase is the project planning phase. Although it is a good idea to plan the project from its inception, it is also true that, unless the problem, its requirements, and its constraints are well understood, it is very difficult to lay out a realistic timeline for the entire project. For this reason, we propose that this initial phase be limited to planning, not for the entire project, but for the collection of requirements definitions and information about the organization. In other words, we need a plan on how we are going to proceed to identify the problem as a whole, its scope, constraints, and overall functionality. The resulting document is generally the project plan document.

The next phase of this model, the requirements definition and organizational data collection phase, should have as its ultimate goal to provide a complete set of requirements, from the user point of view, for the database system under consideration. This phase, by its very nature, requires a high degree of interaction with people at all levels of the organization, from top management to entry-level clerical workers. Essential activities of this phase are: direct examination of the organizational documents, their dataflow through the organization, and the overall operation of the latter.
Additional information can be collected by means of interviews, questionnaires, and in situ inspection of personnel activities at all organizational levels. This phase should also produce a preliminary document of the present needs and future expansion as currently perceived by all users. Figure 1.4 shows the deliverables for this phase, namely, the software requirements specification (SRS) and the requirements traceability matrix (RTM). These deliverables serve as the input to the next phase, the requirements analysis phase.

Figure 1.4 Requirements definition phase

After the previous phase has been completed, it is necessary to analyze the data to consider issues of extreme importance such as feasibility, cost, scope and boundaries, performance, security, portability requirements, maintenance, and the design model of the expected system. This analysis of the requirements and organizational data helps to identify potential problems and constraints that could arise during the development phases.

Once the aforementioned requirements and issues have been thoroughly analyzed, it is necessary to envision a timeline for future work. During this timeline planning phase it is necessary to update the project plan document initially created, thus addressing the issue of incremental planning. As was indicated earlier, incremental planning is missing in some of the traditional lifecycle models. It is the opinion of the authors that incremental planning is an essential property which needs to be satisfied throughout the entire lifecycle.

The next two phases of this proposed model comprise the database design phase and the application design phase. The former consists of creating a conceptual design, selecting a database model, and producing a logical and physical design of the system, as shown in Figure 1.5. The database design phase requires understanding of both the operational and business requirements of the organization.

Figure 1.5 Deliverables for the design phase

The purpose of the conceptual design step of the design phase is to create a high-level overview of the database using, for example, an entity-relationship model [Vanslyke, 2009]. The next step is to choose a database model suitable for the system in consideration [Rob and Coronel, 1997]. The conceptual design then needs to be converted into a logical design. To achieve this conversion, the logical design uses as its input the conceptual design document (CDD). The logical design serves as a communication tool that describes the logical functioning and the system structure to the users [Dave, 2010]. The logical design provides a more detailed view of the database than that of the conceptual design. The last step in the database design phase is to convert the logical design into a physical design. The deliverable resulting from this last conversion is the physical design document (PDD). The physical design emphasizes the internal aspects of the database, e.g. the operations and processes to carry out the necessary tasks [Dave, 2010]. Figure 1.5 shows the deliverables for the database design phase, namely, the conceptual design document (CDD), the logical design document (LDD), the physical design document (PDD) and the updated RTM. The physical design documents are later used in the database implementation and loading phase.

During the design phase it is important to interact with the users. As a result of this process the requirements may change. It is imperative then that any change to the requirements be reflected in the RTM and any other relevant document. In doing so, we address the issue of progressive enhancement. We need to be mindful that this interaction process is crucial, but we also need to be aware not to fall into an unending cycle of changes that may alter the initial scope of the system.
While the database is being designed, the application design phase is carried out in parallel. The application design documents should be discussed with the user, and changes should be made to the RTM if needed. The design phase is followed by the database implementation and loading phase. The database is implemented using the physical design documents developed earlier during the design phase. The database implementation and loading phase includes steps such as the following: creating database tables, populating the tables, building constraints, and querying the data.
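The steps of the implementation and loading phase just listed can be sketched with Python's built-in sqlite3 module; the schema below (a student table with NOT NULL and CHECK constraints) is invented purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Creating database tables, with constraints built into the schema
cur.execute("""CREATE TABLE student (
                   roll_no   INTEGER PRIMARY KEY,
                   name      TEXT NOT NULL,
                   hostel_no INTEGER CHECK (hostel_no > 0))""")

# Populating the tables
cur.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [(1, "Ravi", 3), (2, "Meena", 5)])
conn.commit()

# The constraints now guard the data: this row violates the CHECK rule
try:
    cur.execute("INSERT INTO student VALUES (3, 'Kiran', -1)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

# Querying the data
rows = cur.execute("SELECT name FROM student WHERE hostel_no = 5").fetchall()
print(rows)   # [('Meena',)]
```

Building the constraints at table-creation time means the DBMS, not the application, enforces data validity from the moment loading begins.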


Next follows the application implementation phase. The application design documents from the application design phase serve as an input to this phase. The database is then integrated with the application(s) in the next phase, i.e. the integration and testing phase, where the integrated system is tested. Finally we have the installation/evaluation phase. Here the user tries out the product and appraises its functionality and performance. After the system has been accepted by the user and is operational, the maintenance phase begins. This maintenance phase will continue until the product has reached the end of its useful life, that is, until it no longer meets the new requirements of the user. At this point the whole process of developing a new system starts anew.

DEGREE OF DATA

Data Independence

Data independence means that a change of data at one level should not affect another level. Two types of data independence are present in this architecture:

Physical Data Independence: Any change in the physical location of tables and indexes should not affect the conceptual level or the external view of data. This data independence is easy to achieve and is implemented by most DBMSs.

Conceptual Data Independence: The data at the conceptual level schema and the external level schema must be independent. This means a change in the conceptual schema should not affect the external schema; e.g., adding or deleting attributes of a table should not affect the user's view of the table. But this type of independence is difficult to achieve compared to physical data independence, because changes in the conceptual schema are reflected in the user's view.

Number of Degrees: The number of degrees in a DBMS generally refers to the number of tables that are linked to each other using primary keys and foreign keys.
For example, if you have two tables, named StudentDetails and Parents, in the database, and these two tables are linked together using a primary key and foreign key, then we can say that there is a relationship of degree 2. Similarly, if you have three tables linked with each other, say Student, Parent and Course, then there is a relationship of degree 3.

Similarly, if 4 tables are linked to each other, then that database has a relationship of degree 4. A relationship can be in one of the cardinal forms: one-to-one, one-to-many or many-to-many.

1.6.1 Relationship of higher degree

The degree of a relationship can be defined as the number of occurrences in one entity that are associated with the number of occurrences in another entity. There are three degrees of relationship:

One-to-one (1:1)
One-to-many (1:M)
Many-to-many (M:N)

One-to-one: In a one-to-one relationship, one occurrence of an entity relates to only one occurrence in another entity. A one-to-one relationship rarely exists in practice. For example, if an employee is allocated a company car, then that car can only be driven by that employee. Therefore, employee and company car have a one-to-one relationship.

One-to-many: In a one-to-many relationship, one occurrence in an entity relates to many occurrences in another entity. For example, an employee works in one department, but a department has many employees. Therefore, department and employee have a one-to-many relationship.

Many-to-many: In a many-to-many relationship, many occurrences in an entity relate to many occurrences in another entity. Unlike a one-to-one relationship, a many-to-many relationship cannot be implemented directly with a single foreign key; in practice it is resolved into two one-to-many relationships through a linking table. For example, at the same time, an employee can work on several projects, and a project has a team of many employees. Therefore, employee and project have a many-to-many relationship.
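These relationship degrees map directly onto table structures. In the sketch below (Python's sqlite3 module; the employee/department/project schema is invented for illustration), the one-to-many relationship is a single foreign key, while the many-to-many relationship is resolved through a linking table, assignment, holding one row per employee/project pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked

# One-to-many: a department has many employees; each employee works in one
cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE employee (
                   emp_id  INTEGER PRIMARY KEY,
                   name    TEXT,
                   dept_id INTEGER REFERENCES department(dept_id))""")

# Many-to-many: employees work on many projects and vice versa, so the
# linking table holds a composite primary key of two foreign keys
cur.execute("CREATE TABLE project (proj_id INTEGER PRIMARY KEY, title TEXT)")
cur.execute("""CREATE TABLE assignment (
                   emp_id  INTEGER REFERENCES employee(emp_id),
                   proj_id INTEGER REFERENCES project(proj_id),
                   PRIMARY KEY (emp_id, proj_id))""")

cur.execute("INSERT INTO department VALUES (1, 'Research')")
cur.execute("INSERT INTO employee VALUES (10, 'Asha', 1)")
cur.execute("INSERT INTO project VALUES (100, 'Catalogue')")
cur.execute("INSERT INTO assignment VALUES (10, 100)")

# Join across the relationship to list who works on which project
result = cur.execute("""SELECT e.name, p.title
                        FROM employee e
                        JOIN assignment a ON a.emp_id = e.emp_id
                        JOIN project p ON p.proj_id = a.proj_id""").fetchall()
print(result)   # [('Asha', 'Catalogue')]
```

Counting the tables linked by keys here gives the degree discussed above: employee-department is degree 2, and the full schema links 4 tables.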

ABSTRACTION

Abstraction is one of the main features of database systems. Hiding irrelevant details from the user and providing an abstract view of data helps in easy and efficient user-database interaction. Earlier in this unit we discussed the three levels of DBMS architecture; the top level of that architecture is the "view level". The view level provides the "view of data" to the users and hides irrelevant details such as data relationships, the database schema, constraints, security, etc. from the user. To fully understand the view of data, you must have a basic knowledge of data abstraction and of instances and schemas.

Data abstraction

Database systems are made up of complex data structures. To ease user interaction with the database, the developers hide internal irrelevant details from users. This process of hiding irrelevant details from the user is called data abstraction.

Figure 1.7 Levels of Abstraction

We have three levels of abstraction:

Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in the database. You can get the complex data structure details at this level.

Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data is stored in the database.

View level: This is the highest level of data abstraction. It describes the user's interaction with the database system.

Example: Let's say we are storing customer information in a customer table. At the physical level these records can be described as blocks of storage (bytes, gigabytes, terabytes, etc.) in memory. These details are often hidden from the programmers. At the logical level these records can be described as fields and attributes along with their data types, and their relationships among each other can be logically implemented. Programmers generally work at this level because they are aware of such things about database systems. At the view level, users just interact with the system with the help of a GUI and enter the details on the screen; they are not aware of how the data is stored or what data is stored, as such details are hidden from them.

SUMMARY

• A complete and correct database system is difficult to create if the SDLC does not take into account the intrinsic characteristics of the system to be developed, and the SDLC itself does not accommodate properties like scope restriction, progressive enhancement, incremental planning and pre-defined structure.

• As indicated before, traditional SDLCs lack at least one of the aforementioned properties, making them not entirely suitable for the development of DBMSs, particularly when the demands on the DBMS are unpredictable. One of the main characteristics of the new proposed model is that it emphasizes activities that go back and forth between phases, allowing either the incorporation of new requirements, if needed, or the correction of incomplete or misunderstood requirements. The idea is to allow for a system that is more accommodating of the realities of developing a DBMS.

• A database management system (DBMS) is a software package designed to define, manipulate, retrieve and manage data in a database. A DBMS generally manipulates the data itself, the data format, field names, record structure and file structure.
It also defines rules to validate and manipulate this data.

• Database management systems are set up on specific data handling concepts, as the practice of administering a database evolves. The earliest databases only handled individual single pieces of specially formatted data. Today's more evolved systems can handle different kinds of less formatted data and tie them together in more elaborate ways.

• Over time, the models for database management systems have changed considerably. This is a key part of understanding how various DBMS options work.

• The earliest types of database management systems consisted mainly of hierarchy and network models.

• The hierarchy model is one where each node or component has a child/parent relationship with one other node or component.

• In the network model, the difference is that a single component can have multiple relationships – think of this as a single node being able to "multicast" connections.

• However, over time, these models became overtaken by something called a relational database. In the relational database model, individual components have attributes that are linked to their identities through a database table design. The rows and columns of an individual database table include those identities and attributes in such a way that traditional structured query language, or SQL, can be used to pull various kinds of information on these relational models.

KEY WORDS/ABBREVIATIONS

• Database Management System: DBMS is the software that manages all aspects of working with a database, from storing and securing the data to enforcing data integrity rules, to providing forms for data entry and manipulation.

• Attribute: A database attribute is a characteristic of a database entity. An attribute is a column in a database table, which itself is known as an entity.

• Constraints: A database constraint is a set of rules that define valid data.

• Query: A database query is usually written in SQL and can be either a select query or an action query. A select query requests data from a database; an action query changes, updates, or adds data.

• Schema: A database schema is the design of tables, columns, relations, and constraints that make up a logically distinct section of a database.

• Entity: An entity is a table in a database.
It is described using an Entity-Relationship Diagram, which is a type of graphic that shows the relationships among database tables.

• Functional Dependency: A functional dependency constraint helps to ensure data validity, and exists when one attribute determines the value of another, described as A -> B, which means that the value of A determines the value of B, or that B is functionally dependent on A.

• Index: An index is a data structure that helps speed database queries for large datasets. Database developers create an index on particular columns in a table.
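To see the index entry from the glossary in action, SQLite's EXPLAIN QUERY PLAN reports whether a query must scan the whole table or can use an index. The orders table and index name below are invented for illustration, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(i, "cust%d" % (i % 100), float(i)) for i in range(1000)])

# Without an index, a lookup by customer must examine every row (a full scan)
plan_before = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'cust7'").fetchall()
print(plan_before)

# An index on the column named in the WHERE clause changes the access path
cur.execute("CREATE INDEX idx_orders_customer ON orders(customer)")
plan_after = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'cust7'").fetchall()
print(plan_after)
```

The second plan mentions idx_orders_customer, showing that the optimizer now searches the index instead of scanning all 1000 rows.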

LEARNING ACTIVITY

1. Create a database and define the various stages of the database development life cycle.
2. Create a table named student, and describe data abstraction and the degree of data for the database and the table.

UNIT END QUESTIONS (MCQ AND DESCRIPTIVE)

A. Descriptive Type Questions
1. Elaborate the advantages of a DBMS over traditional file-based systems.
2. Compare DDL, DML and DCL in SQL.
3. Illustrate the need for a DBMS.
4. Explain the main tasks performed by a DBA.
5. Enlist the various relationships of a database.
6. Describe the database life cycle.

B. Multiple Choice Questions
1. A DBMS manages the interaction between __________ and the database.
a) Users
b) Clients
c) End Users
d) Stake Holders

2. Which of the following is not involved in a DBMS?
a) End Users
b) Data
c) Application Request

d) HTML

3. The restrictions placed on the data are called:
a) Relation
b) Attribute
c) Parameter
d) Constraint

4. What are the different views to present a table?
a) Datasheet View
b) Design View
c) Pivot Table View
d) All of these

5. In which stage does one gather and list all the necessary fields for the database design project?
a) Data Definition
b) Data Refinement
c) Data Refinement
d) Table

6. Which of the following enables us to view data from a table based on a specific criterion?
a) Form
b) Query
c) Macro

d) Report

7. In a one-to-many relationship, the table on the 'many' side is called:
a) Parent
b) Child
c) Sister
d) Data Refinement

Answers: 1. c, 2. d, 3. d, 4. d, 5. a, 6. b, 7. b

REFERENCES

• Elmasri R., Navathe S.B. (2015). Fundamentals of Database Systems. New Delhi: Pearson Education.
• Date C.J. (2004). An Introduction to Database Systems. 7th Edition. New Delhi: Pearson Education.
• Bipin Desai (2012). Introduction to Database Management System. New Delhi: Galgotia Pub.
• "The Software Development Life Cycle for small to medium database application." Shell Method™ Process Repository. Digital Publications LLC.
• Dave (2010). "Database design steps / How to develop a database: Data Models & the Database Development Life Cycle." Knol, Aug 15, 2010. Available from: http://knol.google.com/k/dave/database-design-steps-how-to-developa/2pr18mcjtayt9/11.
• Mata-Toledo R.A., Adams E., and Norton M. (2000). "Strategies for Determining a Design View: A Preamble to DBMS Modeling." Presented at the 16th Annual Eastern Small College Computing Conference, October 27-28, 2000, 29-39.
• Pfleeger S.L., and Atlee J.M. (2010). Software Engineering. NJ: Pearson Higher Education.
• Pressman R.S. (2007). Software Engineering: A Practitioner's Approach. NJ: Pearson Education.

UNIT 2: DBMS DEVELOPMENT CONCEPT-2 Structure Learning Objectives Introduction Traditional Development Life Cycle Phases of Database Development Life Cycle Summary Key Words/Abbreviations Learning Activity Unit End Questions (MCQ and Descriptive) References LEARNING OBJECTIVES After studying this unit, you will be able to: • Explain Traditional Development Life Cycle • State Phases of Database Development Life Cycle INTRODUCTION This unit provides an overview of the manner in which databases are designed and how such design relates to the wider manner in which information systems are developed. Database development life cycle A database is usually a fundamental component of the information system, especially in business-oriented systems. Thus database design is part of system development. Unit shows how database design is involved in the system development lifecycle. The phases (Database Design, Database Implementation) are the phases that you concentrate on in the Database Design. The other phases are briefly described. They are part of the contents of the Systems Analysis and Design courses, for example. There are various methods of how the different phases of information system design, analysis and implementation can be done. Here the main tasks or goals are described but no method is introduced. 23 CU IDOL SELF LEARNING MATERIAL (SLM)

The SDLC provides an overall systems framework within which the more specific database development takes place. The first step of the SDLC, Planning, evaluates the current system and examines alternative approaches. At this stage it might be decided that the current system is adequate, or that a new or enhanced system based around a database approach is advisable. Within the SDLC, database development follows a life cycle of its own, known as the Database Life Cycle (DBLC).

TRADITIONAL LIFECYCLE MODELS

This section discusses the traditional lifecycle models and shows that at least one of the properties required for database system development (scope restriction, progressive enhancement, incremental planning and pre-defined structure) is missing from each of these lifecycles. For this reason, these lifecycle models are not completely suitable for developing database systems.

Waterfall model: This is the most common of all software models [Pressman, 2007]. The phases in the waterfall cycle are: project planning, requirements definition, design, development, testing, and installation and acceptance. Each of these phases receives an input and produces an output (that serves as the input for the next phase) in the form of deliverables. On the other hand, the waterfall model lacks the progressive enhancement and incremental planning properties. In this model, the requirements are finalized early in the cycle; in consequence, it is difficult to introduce new requirements or features at later phases of the development process [Shell Method™ Process Repository, 2005]. The waterfall model, which was derived from the “hardware world”, views software development from a manufacturing perspective in which items are produced once and reproduced many times [Pfleeger and Atlee, 2010]. A software development process does not work this way, because the software evolves as the details of the problem are understood and discussed with the end user.
The waterfall model has a documentation driven approach which, from the user’s point of view, is considered one of its main weaknesses. The system specifications, which are finalized early in the lifecycle, may be written in a non-familiar style or in a formal language that may be difficult for the end user to understand [Schach, 2008]. Generally, the end user agrees to these specifications without having a clear understanding of what the final product will be like. This leads to misunderstood or missing requirements in the software requirements specifications (SRS). For this reason, in general, the user has to wait until the installation phase is complete to see the overall functionality of the system. It should be obvious then that the lack of incremental planning in this model makes it difficult to use when developing a database system particularly when the latter supports, for instance, a data 24 CU IDOL SELF LEARNING MATERIAL (SLM)

mining or data warehouse operations where the “impromptu” demands imposed on the system vary frequently or cannot be easily anticipated. Project Planning Requirements Design Development Testing . Installation & Maintenance Figure.2.1. Waterfall model [Pressman, 2007] Prototype model: In this life cycle model, the developers create a prototype of the application based on a limited version of the user requirements [Pfleeger and Atlee, 2010]. The prototype consists mainly of a “hallow graphics” which shows some basic and simple functionality. However, this may create a problem because the user may view the prototype as it were the final product overlooking some of the requirements specified in the SRS which may not be met fully by this “final product” [Pfleeger and Atlee, 2010]. The prototype model limits the pre-defined structure property of a lifecycle. When a prototype is designed, the developer uses minimal code to show some requirements. During this process no integration with other tools is shown. This leads to uncertainty about the final product. The prototype may have to be re-designed in order to provide a finalized product and thus it may not look the same as the one shown to the user initially. 25 CU IDOL SELF LEARNING MATERIAL (SLM)

Proto Typing Initial Requirements Design Customer Evaluation Review and Update Customer Satisfied Maintain Test Development Figure.2.2 Prototype model [Pfleeger and Atlee, 2010] This lifecycle model does support the progressive enhancement property. In the prototype model, the requirements are finalized early in lifecycle. The iterations are focused on design, prototyping, customer evaluation and review phases. This model lacks the incremental planning property as there is no planning after the initial planning phase. Spiral model: This model is a combination of the prototyping and waterfall model [Pfleeger and Atlee, 2010]. Starting with the requirements and a development plan, the system prototypes and the risks involved in their developments are analyzed through an iterative process. During each iteration alternative prototypes are considered based upon the documented constraints and risks of the previous iteration [Pfleeger and Atlee, 2010]. With each subsequent prototype the risks or constraints are minimized or eliminated. After an operational prototype has been finalized (with minimal or no risks), the detailed design document is created. To control risk, the spiral model combines the development activities with a risk management process [Pfleeger and Atlee, 2010]. This latter process requires expertise in the area of risk evaluation which makes the activities that need to be carried out very complex and difficult. The risk evaluation process imposes the consideration of constraints such as cost, time and effort for the entire project. The pre-defined structure property for this lifecycle model, in terms of the number of activities, is so complex that it raises the problem of controllability and efficiency during development of the system. 26 CU IDOL SELF LEARNING MATERIAL (SLM)

Figure 2.3. Spiral model [Schach, 2008] The progressive enhancement property is not accommodated in this lifecycle model because, even though, the system is evolving with each phase, no new features can be added to the SRS due to the fact that the requirements have been finalized in an earlier phase. However, notice that the incremental planning property is still missing from this lifecycle. The initial iterations are focused on alternatives and risks involved in the prototype selected. However, none of these iterations focus on updating the SRS by discussing it with the end user. As a result of this the requirements may not be updated; this may lead to having missing or misunderstood requirements. Due to its iterative nature this model may work well for developing requirements that are well understood from the beginning of the project. However, it is not a good model for developing database systems where new requirements may arise during the later phases of the project. The spiral model also assumes that software is developed in discrete phases; for this reason it does not satisfy the property of incremental planning [Schach, 2008]. Rapid application development model (RAD): The basic approach of this model is to let the user try the application before it is finally delivered. The users provide feedback based upon their hands-on experience with the system. The foremost problem with this model is that it is very easy to get caught in an unending and uncontrollable cycle of enhancements. This will lead to violations of the progressive enhancement and scope restriction property. 27 CU IDOL SELF LEARNING MATERIAL (SLM)

As the name of this model implies a prototype is created and installed as soon as possible at the user’s site for their review. This model lacks the predefined structure because, in general, the rapid prototype phase is completed without strictly adhering to the guideline documents and the processes already defined to complete this phase [Schach, 2008]. As shown the incremental planning property of a lifecycle is missing in this model too. After the prototype is completed and evaluated by the end user the requirements may or may not change. If there are no changes in the requirements, then development of the system will continue as initially envisioned. However, if significant requirement changes are necessary, then it is imperative that a timeline for the remaining of the project be established but this is not generally done [Schach, 2008]. Changed requirements Rapid Prototype Analysis Design Implementation Post-delivery Retirement Figure.2.4. RAD Model 28 CU IDOL SELF LEARNING MATERIAL (SLM)

PHASES OF DATABASE DEVELOPMENT LIFE CYCLE The Database Life Cycle (DBLC) contains six phases, as shown in the following Figure: database initial study, database design, implementation and loading, testing and evaluation, operation, and maintenance and evolution. Figure 2.5: The Database Life Cycle 1. The Database Initial Study: In the Database initial study, the designer must examine the current system’s operation within the company and determine how and why the current system fails. The overall purpose of the database initial study is to: • Analyze the company situation. • Define problems and constraints. • Define objectives. • Define scope and boundaries. a. Analyze the Company Situation: The company situation describes the general conditions in which a company operates, its organizational structure, and its mission. To analyze the company situation, the database 29 CU IDOL SELF LEARNING MATERIAL (SLM)

designer must discover what the company’s operational components are, how they function, and how they interact. b. Define Problems and Constraints: The designer has both formal and informal sources of information. The process of defining problems might initially appear to be unstructured. Company end users are often unable to describe precisely the larger scope of company operations or to identify the real problems encountered during company operations. c. Define Objectives: A proposed database system must be designed to help solve at least the major problems identified during the problem discovery process. In any case, the database designer must begin to address the following questions: • What is the proposed system’s initial objective? • Will the system interface with other existing or future systems in the company? • Will the system share the data with other systems or users? d. Define Scope and Boundaries: The designer must recognize the existence of two sets of limits: scope and boundaries. The system’s scope defines the extent of the design according to operational requirements. Will the database design encompass the entire organization, one or more departments within the organization, or one or more functions of a single department? Knowing the scope helps in defining the required data structures, the type and number of entities, the physical size of the database, and so on. The proposed system is also subject to limits known as boundaries, which are external to the system. Boundaries are also imposed by existing hardware and software. 2. Database Design: The second phase focuses on the design of the database model that will support company operations and objectives. This is arguably the most critical DBLC phase: making sure that the final product meets user and system requirements. 
As you examine the procedures required to complete the design phase in the DBLC, remember these points: • The process of database design is loosely related to the analysis and design of a larger system. The data component is only one element of a larger information system. • The systems analysts or systems programmers are in charge of designing the other system 30 CU IDOL SELF LEARNING MATERIAL (SLM)

components. Their activities create the procedures that will help transform the data within the database into useful information. Using relational theory for a formal design There will be occasions when it is necessary to prove formally that a database satisfies given requirements. Using relational theory can allow a relational representation of a conceptual data model to be analysed rigorously. This stage, which is usually omitted in all but the most exacting development environments (such as safety-critical systems), involves using the formal properties of the relational theory to mathematically prove properties of the conceptual data model that would then be realised in the database design. Figure summarises the iterative (repeated) steps involved in database design, based on the overview given. Its main purpose is to distinguish the general issue of what tables should be used from the detailed definition of the constituent parts of each table – these tables are considered one at a time, although they are not independent of each other. Each iteration that involves a revision of the tables would lead to a new design; collectively they are usually referred to as second-cut designs, even if the process iterates for more than a single loop. 31 CU IDOL SELF LEARNING MATERIAL (SLM)

Figure 2.6: Design Methodology 3. Implementation and Loading: The output of the database design phase is a series of instructions detailing the creation of tables, attributes, domains, views, indexes, security constraints, and storage and performance guidelines. In this phase, you actually implement all these design specifications. a. Install the DBMS: This step is required only when a new dedicated instance of the DBMS is necessary for the system. The DBMS may be installed on a new server or it may be installed on existing servers. One current trend is called virtualization. Virtualization is a technique that creates logical representations of computing resources that are independent of the underlying physical computing resources. b. Create the Database(s): In most modern relational DBMSs a new database implementation requires the creation of special storage-related constructs to house the end-user tables. The constructs usually include the storage group (or file groups), the table spaces, and the tables. c. Load or Convert the Data: After the database has been created, the data must be loaded into the database tables. Typically, the data will have to be migrated from the prior version of the system. Often, data to be included in the system must be aggregated from multiple sources. Data may have to be imported from other relational databases, non-relational databases, flat files, legacy systems, or even manual paper-and-pencil systems Consequently, implementation can involve additional flexing of the design to overcome any software or hardware limitations. Realising the design So far we have been concerned only with the specification of a logical schema. We now need our database to be created according to the definitions we have produced. 
For an implementation with a relational DBMS, this will involve the use of SQL to create tables and constraints that satisfy the logical schema description and the choice of appropriate storage schema (if the DBMS permits that level of control). 32 CU IDOL SELF LEARNING MATERIAL (SLM)

One way to achieve this is to write the appropriate SQL DDL statements into a file that can be executed by a DBMS so that there is an independent record, a text file, of the SQL statements defining the database. Another method is to work interactively using a database tool like Sybase Central (or Microsoft Access), where the forms provided for defining tables help avoid the need to remember the syntactic detail of the SQL language. While this may seem to make it easier to realise a database, it can lead to maintenance problems. In this case, there can be a problem keeping track of exactly how tables are defined and the ability to make changes to them, so it is not recommended for large development projects. Whatever mechanism is used to implement the logical schema, the result is that a database, with tables and constraints, is defined but will contain no data for the user processes. Populating the database After a database has been created, there are two ways of populating the tables – either from existing data, or through the use of the user applications developed for the database. For some tables, there may be existing data from another database or data files. For example, in establishing a database for a hospital you would expect that there are already some records of all the staff that have to be included in the database. Data might also be bought in from an outside agency (address lists are frequently bought in from external companies) or produced during a large data entry task (converting hard-copy manual records into computer files can be done by a data entry agency). In such situations the simplest approach to populate the database is to use the import and export facilities found in the DBMS. Facilities to import and export data in various standard formats are usually available (these functions are also known in some systems as loading and unloading data). Importing enables a file of data to be copied directly into a table. 
When data are held in a file format that is not appropriate for using the import function then it is necessary to prepare an application program that reads in the old data, transforms them as necessary and then inserts them into the database using SQL code specifically produced for that purpose. The transfer of large quantities of existing data into a database is referred to as a bulk load. Bulk loading of data may involve very large quantities of data being loaded, one table at a time so you may find that there are DBMS facilities to postpone constraint checking until the end of the bulk loading. Supporting users and user processes Use of a database involves user processes (either application programs or database tools) which must be developed outside of the database development. In terms of the three-schema architecture we now need to address the development of the external schema. This will define the data accessible to each user process or group of user processes. In reality, most DBMSs, and SQL itself, do not have many facilities to support the explicit definition of the external schema. However, by using built-in queries and procedures, and with appropriate security 33 CU IDOL SELF LEARNING MATERIAL (SLM)

management, it is possible to ensure access to data by a user process is limited to a tailored subset of the entire database content. In addition to ensuring that appropriate data access for each user process is achieved, the database developer needs to address other user-related issues. Examples of such issues include: reporting of error conditions, constraint enforcement, automated processing using triggers, grouping of activities into transactions, defining database procedures and functions and data security (in addition to the general database and user process access control). Supporting data management strategies Most of the development we've covered so far in this course has focused on meeting specific user requirements – that is, ensuring the right data are constrained correctly and made available to the right user processes. However, other questions must also be addressed in order to support a data management strategy: How frequently should data be backed-up? What auditing mechanisms are required? Which users will be permitted to perform which functions? Which database tools and user processes will be available to users to access data? What level of legal reporting is required? And so on. The data administrator will be involved in setting policy, but the implementer needs to ensure that the right data are being accessed in the right ways by the right users, with appropriate security, record keeping and reporting taking place. Efficiency: the interaction between design and implementation When using the three-schema architecture we would like to separate the logical schema, that is, the description of the tables in the database, from the storage schema required for its efficient implementation. This separation represents an ideal that is rarely found in a commercial DBMS. This is most evident when we need to take account of efficiency. 
When DBMSs lack the ability to separate these concerns it forces efficiency issues to be considered during the database design (by choosing efficient table structures) rather than leaving such decisions until the implementation stage. An initial design for a logical schema may be produced, but its efficiency can only be evaluated fully during implementation. If the resulting implemented database is not efficient enough to meet the processing requirements, it is necessary to return to the database design and consider how the logical schema may be changed to be more efficient. If separation of logical and storage schema is possible, and if another storage schema can efficiently implement the logical design, then the logical design may not need revision. For this reason, some database design methods refer to two separate design stages: logical database design and physical database design. However, this physical stage is mainly concerned with producing a revised logical schema – that is, specifying SQL tables – so this terminology can be confusing. The tables of a logical schema may differ from the 34 CU IDOL SELF LEARNING MATERIAL (SLM)

relational representation of the conceptual data model because of concerns with efficiency. To make design decisions that address specific efficiency concerns requires detailed knowledge of the specific DBMS facilities and hardware systems that will host the database. 4. Testing and Evaluation: In the design phase, decisions were made to ensure integrity, security, performance, and recoverability of the database. During implementation and loading, these plans were put into place. In testing and evaluation, the DBA tests and fine-tunes the database to ensure that it performs as expected. This phase occurs in conjunction with applications programming. a. Test the Database: During this step, the DBA tests the database to ensure that it maintains the integrity and security of the data. Data integrity is enforced by the DBMS through the proper use of primary and foreign key rules. In database testing you must check Physical security allows, Password security, Access rights, Data encryption etc. b. Fine-Tune the Database: Although database performance can be difficult to evaluate because there are no standards for database performance measures, it is typically one of the most important factors in database implementation. Different systems will place different performance requirements on the database. Many factors can impact the database’s performance on various tasks. Environmental factors, such as the hardware and software environment in which the database exists, can have a significant impact on database performance. Evaluate the Database and Its Application Programs: As the database and application programs are created and tested, the system must also be evaluated from a more holistic approach. Testing and evaluation of the individual components should culminate in a variety of broader system tests to ensure that all of the components interact properly to meet the needs of the users. 
To ensure that the data contained in the database are protected against loss, backup and recovery plans are tested. 5. Operation Once the database has passed the evaluation stage, it is considered to be operational. At that point, the database, its management, its users, and its application programs constitute a complete information system. The beginning of the operational phase invariably starts the process of system evolution. 35 CU IDOL SELF LEARNING MATERIAL (SLM)

6. Maintenance and Evolution The database administrator must be prepared to perform routine maintenance activities within the database. Maintenance can take three main forms: • Operational maintenance, where the performance of the database is monitored. If it falls below some acceptable standard, then reorganisation of the database, usually in the form of adjustments to the storage schema, takes place to ensure that performance is maintained at an acceptable level. • Porting and implementation maintenance, in which the DBMS, the user processes, the underlying computer system or some other aspect undergoes changes that require the database implementation to be revised. Of course, the extreme form of porting occurs when the database is to be entirely replaced – in which case the entire development life cycle is usually followed using the existing database and its limitations as the starting point for requirements analysis. Adjustments to the storage schema are almost inevitable as new data storage capabilities become available. This can involve both restructuring and reorganisation, depending on the scope of the changes taking place. • Requirements change, where the original requirement specification changes, usually because databases are frequently used for purposes for which they were not originally designed. This involves restructuring and typically involves a ‘mini life cycle’ related to the development of changes to meet the new requirements. If we follow the three-schema architecture approach we would hope to minimise the impact of change and build systems which are easily maintained. SUMMARY • Database development is just one part of the much wider field of software engineering, the process of developing and maintaining software. A core aspect of software engineering is the subdivision of the development process into a series of phases, or steps, each of which focuses on one aspect of the development. 
The software product moves through this life cycle (sometimes repeatedly as it is refined or redeveloped) until it is finally retired from use. Ideally, each phase in the life cycle can be checked for correctness before moving on to the next phase. However, software engineering is a very rich discipline with many different methods for the subdivision of the development process and a detailed exploration of the many different ways. • Databases exist because of the need to change data into information. Data are the raw and unprocessed facts. Information is obtained by processing the data into something useful. For example, millions of names and telephone numbers in a phone book are 36 CU IDOL SELF LEARNING MATERIAL (SLM)

data. Information is the telephone number of the fire department when your house is burning down. • This section discusses the traditional lifecycle models and shows that, at least one of the properties required for database system development (scope restriction, progressive enhancement, incremental planning and pre-defined structure), is missing from each of these lifecycles. For this reason, these life cycle models are not completely suitable for developing database systems. • Before the use of computer, a manual file system was used to maintain the records and files. All the data was stored in files and it makes it easy to find any information. But this system was good only for small organizations having small number of items. • Traditional data systems, such as relational databases and data warehouses, have been the primary way businesses and organizations have stored and analyzed their data for the past 30 to 40 years. Although other data stores and technologies exist, the major percentage of business data can be found in these traditional systems. Traditional systems are designed from the ground up to work with data that has primarily been structured data. Characteristics of structured data include the following: • Clearly defined fields organized in records. Records are usually stored in tables. Fields have names, and relationships are defined between different fields. • Schema-on-write that requires data be validated against a schema before it can be written to disk. A significant amount of requirements analysis, design, and effort up front can be involved in putting the data in clearly defined structured formats. This can increase the time before business value can be realized from the data. • A design to get data from the disk and load the data into memory to be processed by applications. This is an extremely inefficient architecture when processing large volumes of data this way. The data is extremely large and the programs are small. 
The big component must move to the small component for processing. KEY WORDS/ABBREVIATIONS • Acceptance testing: Testing of software outside the development organisation and usually at the client site. • Adaptive maintenance: Takes place when a program’s environment changes, for example a different operating system. • Corrective maintenance: Correction of previously undetected errors during development that is now apparent after installation of the software on the client site. 37 CU IDOL SELF LEARNING MATERIAL (SLM)

• Maintenance: The upkeep of a program by repair and modification. • Methodology: A technique involving various notations that enables the design of software to be implemented. • Perfective maintenance: Takes place when a system has to be enhanced in some way e.g. program run faster. • Portable: The ability of a program to run on different machine architectures with different operating systems. • Problem oriented: The focus is on the problem and how it is to be solved rather than on the hardware on which the program will run. • Problem specification: A document outline of what is to be solved in terms of programming a solution to a given problem. • Requirements specification: A document describing what the system must be able to do in order to meet user requirements. • Robust: A program is robust if it can cope with problems that come from outside and are not of its own making. LEARNING ACTIVITY 1. Draw a creative study to differentiate between traditional development cycle and DBLC. 2. Do the comparative study on various methods of Implementation in system development. UNIT END QUESTIONS (MCQ AND DESCRIPTIVE) 38 A. Descriptive Type Questions 1. Differentiate between Waterfall and Prototype Model. 2. Explain how data is collected in requirement gathering stage. 3. Discuss various testing process. 4. How analysis is the most important phase of DBLC. 5. Explain the various phases of Database design. 6. Explain different phases of DBLC. B. Multiple Choice Questions 1. The primary tool used in structured design is a: CU IDOL SELF LEARNING MATERIAL (SLM)

a) Module b) structure chart c) data-flow diagram d) program flowchart 2. System consists of programs, data files and documentation a) Conceptual b) Logical c) Physical d) None of the above 3. The main ingredient of the report documenting the is the cost benefit analysis. a) System Analysis b) Feasibility Study c) System Analyst d) System Design 4. The organized process or set of steps that needs to be followed to develop an information system is known as a) analytical cycle b) design cycle c) program specification d) system development life cycle 5. What are structured analysis tools? 4. d 5. d a) Data Flow Diagrams b) Data Dictionary c) Decision Trees d) All of these Answer 1.b 2.c 3.b 39 CU IDOL SELF LEARNING MATERIAL (SLM)

REFERENCES

• Elmasri, R., Navathe, S.B. (2015). Fundamentals of Database Systems. New Delhi: Pearson Education.

• Date, C.J. (2004). An Introduction to Database Systems. 7th Edition. New Delhi: Pearson Education.

• Desai, Bipin (2012). Introduction to Database Management System. New Delhi: Galgotia Publications.

• "The Software Development Life Cycle for Small to Medium Database Applications." Shell Method™ Process Repository. Digital Publications LLC.

• Elliott, Geoffrey & Strachan, Josh (2004). Global Business Information Technology. p. 87.

• US Department of Justice (2003). Information Resources Management, Chapter 1: Introduction.

• Everatt, G.D.; McLeod Jr., R. (2007). "Chapter 2: The Software Development Life Cycle". Software Testing: Testing Across the Entire Software Development Life Cycle. John Wiley & Sons. pp. 29–58. ISBN 9780470146347.

• Unhelkar, B. (2016). The Art of Agile Practice: A Composite Approach for Projects and Organizations. CRC Press. pp. 56–59. ISBN 9781439851197.

• Land, S.K.; Smith, D.B.; Walz, J.W. (2012). Practical Support for Lean Six Sigma Software Process Definition: Using IEEE Software Engineering Standards. John Wiley & Sons. pp. 341–343. ISBN 9780470289952.

• Kay, Russell (May 14, 2002). "QuickStudy: System Development Life Cycle". ComputerWorld.

• Pressman, R.S. (2007). Software Engineering: A Practitioner's Approach. NJ: Pearson Education.


UNIT 3: DISTRIBUTED DATABASE MANAGEMENT SYSTEM 1

Structure

Learning Objectives
Introduction
Architecture of Distributed Database Management
Components of Distributed Database Management
Types of Distributed Database
Transparency Features
Summary
Key Words/Abbreviations
Learning Activity
Unit End Questions (MCQ and Descriptive)
References

LEARNING OBJECTIVES

This unit introduces the concepts of distributed databases. It discusses the architecture, framework, features, functions and types of Distributed Database Management Systems, with examples and diagrams to make them easy to understand.

After studying this unit, you will be able to:

• Describe the architecture of a Distributed Database Management System
• Explain the components of a Distributed Database Management System
• Distinguish the types of Distributed Database Management Systems

INTRODUCTION

A distributed database is a collection of multiple interconnected databases, spread physically across various locations, that communicate via a computer network.
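The idea of several physically separate databases behaving as one logical database can be sketched with two in-memory SQLite stores behind a single facade. The class, table, and site names below are invented for illustration only; a real DDBMS is far more involved:

```python
import sqlite3

class SimpleDistributedDB:
    """Toy facade: two physical sites, one logical 'customers' table.

    Rows are horizontally fragmented by region; the user queries the
    facade as if all data lived in a single database.
    """

    def __init__(self):
        # Each site is an independent DBMS instance (here: in-memory SQLite).
        self.sites = {"delhi": sqlite3.connect(":memory:"),
                      "mumbai": sqlite3.connect(":memory:")}
        for conn in self.sites.values():
            conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")

    def insert(self, cid, name, region):
        # Fragmentation rule: a row is stored at the site for its region.
        self.sites[region].execute(
            "INSERT INTO customers VALUES (?, ?, ?)", (cid, name, region))

    def select_all(self):
        # The facade hides the distribution: it unions results from every site.
        rows = []
        for conn in self.sites.values():
            rows.extend(conn.execute("SELECT id, name FROM customers"))
        return sorted(rows)

db = SimpleDistributedDB()
db.insert(1, "Asha", "delhi")
db.insert(2, "Ravi", "mumbai")
print(db.select_all())  # both rows, regardless of which site stores them
```

The caller never learns which site holds which row — a first taste of the distribution transparency discussed later in this unit.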

Features of a Distributed Database:

• Databases in the collection are logically interrelated with each other. Often they represent a single logical database.

• Data is physically stored across multiple sites. Data at each site can be managed by a DBMS independent of the other sites.

• The processors at the sites are connected via a network. They do not have any multiprocessor configuration.

• A distributed database is not a loosely connected file system.

• A distributed database incorporates transaction processing, but it is not synonymous with a transaction processing system.

Distributed Database Management System

A distributed database management system (DDBMS) is a centralized software system that manages a distributed database as if it were all stored in a single location.

Features of a Distributed Database Management System:

• It is used to create, retrieve, update and delete distributed databases.

• It synchronizes the database periodically and provides access mechanisms by virtue of which the distribution becomes transparent to the users.

• It ensures that data modified at any site is universally updated.

• It is used in application areas where large volumes of data are processed and accessed by numerous users simultaneously.

• It is designed for heterogeneous database platforms.

• It maintains the confidentiality and integrity of the databases.

Factors Encouraging DDBMS

The following factors encourage moving over to a DDBMS:

• Distributed Nature of Organizational Units − Most present-day organizations are subdivided into multiple units that are physically distributed over the globe. Each unit requires its own set of local data. Thus, the overall database of the organization becomes distributed.

• Need for Sharing of Data − The multiple organizational units often need to communicate with each other and share their data and resources. This demands common or replicated databases that are used in a synchronized manner.

• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online Analytical Processing (OLAP) work upon diversified systems which may have common data. Distributed database systems aid both kinds of processing by providing synchronized data.

• Database Recovery − One of the common techniques used in a DDBMS is replication of data across different sites. Replication automatically helps in data recovery if the database at any site is damaged: users can access data from other sites while the damaged site is being reconstructed. Thus, a database failure may go almost unnoticed by users.

• Support for Multiple Application Software − Most organizations use a variety of application software, each with its specific database support. A DDBMS provides uniform functionality for using the same data across different platforms.

Advantages of Distributed Databases

Following are the advantages of distributed databases over centralized databases.

Modular Development − If the system needs to be expanded to new locations or new units, in centralized database systems the action requires substantial effort and disruption of existing functioning. In distributed databases, however, the work simply requires adding new computers and local data at the new site and connecting them to the distributed system, with no interruption to current functions.

More Reliable − In case of database failure, the total system of a centralized database comes to a halt. In distributed systems, however, when a component fails, the system continues to function, possibly at reduced performance. Hence a DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from local data itself, providing a faster response. In centralized systems, on the other hand, all queries must pass through the central computer for processing, which increases the response time.

Lower Communication Cost − In distributed database systems, if data is located locally where it is mostly used, then the communication costs of data manipulation can be minimized. This is not feasible in centralized systems.
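The reliability and response advantages rest on the same mechanism: a read is served from a nearby copy, and another copy takes over when that copy's site fails. A minimal sketch, with hypothetical site names and data:

```python
# Toy replicated read (site names and data are made up for illustration).
# Every data item has a primary site and a replica site; a read tries the
# primary first and silently falls back to the replica when the primary
# is down, so the query still succeeds.

SITES = {
    "site_a": {"emp:1": "Asha"},              # primary copy of emp:1
    "site_b": {"emp:1": "Asha"},              # replica copy of emp:1
}
UP = {"site_a": True, "site_b": True}         # simulated site availability
PLACEMENT = {"emp:1": ("site_a", "site_b")}   # item -> (primary, replica)

def read(key):
    for site in PLACEMENT[key]:
        if UP[site]:                          # consult only reachable sites
            return SITES[site][key]
    raise RuntimeError("all replicas of %s are down" % key)

print(read("emp:1"))       # normally served by the primary site
UP["site_a"] = False       # simulate failure of the primary
print(read("emp:1"))       # still answered, now from the replica
```

Writes, of course, would have to reach every copy — which is exactly the replica-consistency problem taken up under the design issues below.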

Disadvantages of Distributed Databases

Following are some of the disadvantages associated with distributed databases.

• Need for complex and expensive software − A DDBMS demands complex and often expensive software to provide data transparency and coordination across the several sites.

• Processing overhead − Even simple operations may require a large number of communications and additional calculations to provide uniformity of data across the sites.

• Data integrity − The need to update data at multiple sites poses problems of data integrity.

• Overheads for improper data distribution − The responsiveness of queries depends largely on proper data distribution; improper data distribution often leads to very slow responses to user requests.

Distributed Database Design Issues

Figure 3.1 Distributed database design issues

1. Distributed Database Design

There are two basic alternatives for placing data: partitioned (or non-replicated) and replicated.

2. Distributed Directory Management

A directory contains information (such as descriptions and locations) about the data items in the database.

3. Distributed Query Processing

Query processing deals with designing algorithms that analyze queries and convert them into a series of data manipulation operations.

4. Distributed Concurrency Control

Concurrency control involves the synchronization of accesses to the distributed database, such that the integrity of the database is maintained.

5. Distributed Deadlock Management

The competition among users for access to a set of resources (data, in this case) can result in a deadlock if the synchronization mechanism is based on locking. The well-known alternatives of prevention, avoidance, and detection/recovery also apply to DDBSs.

6. Reliability of Distributed DBMS

We mentioned earlier that one of the potential advantages of distributed systems is improved reliability and availability. This, however, is not a feature that comes automatically. It is important that mechanisms be provided to ensure the consistency of the database as well as to detect failures and recover from them. The implication for DDBSs is that when a failure occurs and various sites become either inoperable or inaccessible, the databases at the operational sites remain consistent and up to date.

7. Replication

If the distributed database is (partially or fully) replicated, it is necessary to implement protocols that ensure the consistency of the replicas, i.e., that copies of the same data item have the same value. Replication can be either eager or lazy.

DISTRIBUTED DBMS ARCHITECTURE

ANSI/SPARC Architecture

Figure 3.2 ANSI/SPARC Architecture

There are three views of data: the external view, which is that of the end user (who might be a programmer); the internal view, that of the system or machine; and the conceptual view, that of the enterprise.

At the lowest level of the architecture is the internal view, which deals with the physical definition and organization of data.

The conceptual view represents the data and the relationships among data without considering the requirements of individual applications or the restrictions of the physical storage media.

Architectural Models for Distributed DBMSs

Figure 3.3 Architectural Models for Distributed DBMSs
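The three ANSI/SPARC levels can be made concrete in SQL: base tables define the conceptual schema, a view gives one external schema, and the internal level stays hidden inside the storage engine. A small sketch using SQLite (the table, view, and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the enterprise-wide schema (a base table).
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000), (2, 'Ravi', 60000)")

# External level: what one class of users sees -- a view that hides salary.
conn.execute("CREATE VIEW emp_public AS SELECT id, name FROM employee")

# The internal level (pages, indexes, file layout) is managed by SQLite
# itself and never appears in either schema.
print(conn.execute("SELECT * FROM emp_public").fetchall())  # no salary column
```

Users of `emp_public` are insulated from changes to both the base table's extra columns and the physical storage — the data independence the three-level architecture is designed to provide.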

Autonomy, in this context, refers to the distribution of control, not of data. It indicates the degree to which individual DBMSs can operate independently.

1. Design autonomy: Individual DBMSs are free to use the data models and transaction management techniques that they prefer.

2. Communication autonomy: Each of the individual DBMSs is free to make its own decision as to what type of information it wants to provide to the other DBMSs or to the software that controls their global execution.

3. Execution autonomy: Each DBMS can execute the transactions that are submitted to it in any way that it wants to.

Distribution

Distribution refers to where the data and the DBMS functionality reside; the two common alternatives are peer-to-peer (P2P) distribution and client/server distribution.

Heterogeneity

Heterogeneity may occur in various forms in distributed systems, ranging from hardware heterogeneity and differences in networking protocols to variations in data managers. The important ones from the perspective of this unit relate to data models, query languages, and transaction management protocols.
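Distributed deadlock management, listed among the design issues earlier in this unit, is commonly handled by detection: build a wait-for graph, where an edge T1 → T2 means "transaction T1 is waiting for a lock held by T2", and test it for cycles. A minimal, DBMS-independent sketch:

```python
# Deadlock detection on a wait-for graph: nodes are transactions, and an
# edge T1 -> T2 means "T1 is waiting for a lock held by T2".  A cycle in
# this graph means the transactions involved are deadlocked.

def has_deadlock(wait_for):
    """Return True if the wait-for graph contains a cycle (DFS with colours)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in wait_for}

    def visit(t):
        colour[t] = GRAY                          # t is on the current DFS path
        for u in wait_for.get(t, ()):
            if colour.get(u, WHITE) == GRAY:      # back edge -> cycle found
                return True
            if colour.get(u, WHITE) == WHITE and visit(u):
                return True
        colour[t] = BLACK                         # fully explored, no cycle via t
        return False

    return any(colour[t] == WHITE and visit(t) for t in list(colour))

# T1 waits for T2 and T2 waits for T1: the classic two-transaction deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True
print(has_deadlock({"T1": ["T2"], "T2": []}))       # False
```

In a distributed setting the extra difficulty is that the edges are scattered across sites, so the graph must be assembled (or approximated) by exchanging messages before a cycle test like this one can run.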

