Terse Notes — Fundamentals of Database Systems Part 2
Database System Concepts and Architecture
Ch 2 — Book Credit to Elmasri Ramez And Navathe Shamkant
Book: Fundamentals Of Database System, 7th Edition by Elmasri Ramez And Navathe Shamkant
Chapter 2 — Database System Concepts and Architecture
“Data abstraction generally refers to the suppression of details of data organization and storage, and the highlighting of the essential features for an improved understanding of data.” (Pp. 32)
A data model is a collection of concepts that can be used to describe the structure of a database.
Structure of the database includes data types, relationships, constraints that apply to the data and also a set of basic operations for specifying retrievals and updates on the database.
Three-Schema Architecture is an architecture to separate the user applications from the physical database.
- Level one — internal level has an internal schema that describes the physical storage structure of the database.
- Level two — the conceptual level has a conceptual schema that describes the structure of the whole database for a community of users. (Usually with a representational data model)
- Level three — external level or view level has a number of external schemas or user views. These views describe the part of the database that a particular user group is interested in.
Review Questions
2.1. Define the following terms: data model, database schema, database state, internal schema, conceptual schema, external schema, data independence, DDL, DML, SDL, VDL, query language, host language, data sublanguage, database utility, catalog, client/server architecture, three-tier architecture, and n-tier architecture.
A data model is a collection of concepts that can be used to describe the structure of a database, providing means for data abstraction.
A database schema is the description of a database structure, defining data types, relationships, and constraints. It is specified during design and does not change frequently.
The database state refers to the actual data stored in the database at a particular moment, changing with every data insertion, deletion, or modification.
The internal schema describes the physical storage structure of the database using a physical data model, providing complete details of data storage and access paths.
The conceptual schema describes the structure of the whole database for a community of users using a representational data model, hiding physical storage details.
The external schema or user view describes the part of the database that a particular user group is interested in, hiding the rest.
Data independence refers to the ability to change the schema at one level without affecting higher levels. It has two types:
- Logical data independence allows changing the conceptual schema without affecting external schemas or applications.
- Physical data independence allows changing the internal schema without affecting the conceptual schema.
The DDL (Data Definition Language) is used by DBAs and designers to define conceptual and internal schemas.
The DML (Data Manipulation Language) is used by users to query and update the database.
The SDL (Storage Definition Language) specifies the internal schema, while the VDL (View Definition Language) specifies user views and mappings to the conceptual schema.
A query language is used for querying and updating the database, while a host language is a programming language used to write application programs that access the database.
A data sublanguage is an extension of a host language to support database access.
A database utility provides tools for database administration and management tasks like backup, recovery, user account management, etc.
The catalog stores the database description (schema) to make the database self-describing.
In a client/server architecture, functionality is distributed between client and server modules. A two-tier architecture has a client for user interaction and a server for data storage/access. A three-tier architecture adds an application server layer in between.
An n-tier architecture (n > 3) further separates the application functionality into multiple tiers or layers for better modularity and scalability.
2.2. Discuss the main categories of data models. What are the basic differences among the relational model, the object model, and the XML model?
The main categories of data models are:
- Conceptual data models like the Entity-Relationship model provide high-level concepts close to how users perceive data, using entities, attributes, and relationships.
- Representational/Implementation data models like the relational, network, and hierarchical models can be directly implemented and provide concepts understandable to end-users but closer to data organization.
- Physical data models describe how data is stored as files, representing details like record formats and access paths, targeted at computer specialists.
The relational model represents data as tables (relations) with rows (tuples) and columns (attributes). The object model represents data as objects with attributes and methods. The XML model represents data as hierarchical tree structures.
2.3. What is the difference between a database schema and a database state?
A database schema is the logical design and structure of the database, while the database state refers to the actual data values stored at a given point in time
2.4. Describe the three-schema architecture. Why do we need mappings among schema levels? How do different schema definition languages support this architecture?
The three-schema architecture separates the user’s external view, the conceptual schema for the community of users, and the internal storage schema.
Mappings among schema levels are needed to transform requests and results between levels, as each level abstracts away different details.
The DDL defines conceptual and external schemas, the SDL defines the internal schema, and the VDL defines user views and mappings to the conceptual schema.
2.5. What is the difference between logical data independence and physical data independence? Which one is harder to achieve? Why?
Logical data independence allows changing the conceptual schema without affecting external schemas or applications. Physical data independence allows changing the internal schema without affecting the conceptual schema.
Physical data independence is harder to achieve as it requires changing the low-level data storage and access details without impacting the higher-level logical model.
2.6. What is the difference between procedural and nonprocedural DMLs?
A nonprocedural DML allows specifying complex database operations concisely at a high level. A procedural DML must be embedded in a programming language and processes data record-by-record using constructs like loops.
2.7. Discuss the different types of user-friendly interfaces and the types of users who typically use each.
The different types of user-friendly interfaces and the typical users for each are:
Form-based interfaces: These provide a form or template for naive or parametric users like bank tellers to perform routine data entry or retrieval operations by filling out fields.
GUI interfaces: Graphical user interfaces with menus, icons, and dialog boxes are commonly used by casual or naive users like clerks or managers who need to access portions of the database occasionally.
Interfaces for Browsing: These allow naive intermittent users to browse or query the database in a simple manner, like a website visitor browsing a product catalog.
Parametric interfaces: These provide a small set of commands or function keys for parametric users like bank tellers to perform repetitive database transactions efficiently with minimal keystrokes.
Command interfaces: Sophisticated users like database administrators use command-based interfaces to perform complex database operations and administration tasks.
2.8. With what other computer system software does a DBMS interact?
A DBMS interacts with the following other computer system software components:
- Operating System: For disk read/write scheduling and buffer management.
- Communications Software: To allow remote users to access the database over networks.
- Other DBMSs: In distributed databases, to communicate across multiple sites.
2.9. What is the difference between the two-tier and three-tier client/server architectures?
In a two-tier client/server architecture, the client contains the user interface and application programs, while the server contains the DBMS that processes SQL queries and transactions.
In a three-tier architecture, an application server layer is added between the client and database server. The application server runs application logic, enforces business rules, improves security by authenticating clients, and acts as an intermediary between the client UI and database server.
2.10. Discuss some types of database utilities and tools and their functions.
Some common types of database utilities and their functions are:
- Backup and Recovery: For backing up the database and restoring from backups after a failure.
- Database Reorganization: For reorganizing storage structures like indexes to improve performance.
- Performance Monitoring: For monitoring and tuning database performance.
- Space Management: For managing disk space usage and storage allocation.
- Security Management: For managing user accounts, access privileges, encryption etc.
- Data Integrity Checking: For validating constraints and checking database consistency.
2.11. What is the additional functionality incorporated in n-tier architecture (n . 3)?
In an n-tier architecture (n > 3), the application functionality is further divided into multiple layers or tiers beyond the typical three tiers.
This provides benefits like:
- Each tier can run on the most suitable hardware/OS platform
- Tiers can be developed, deployed and scaled independently
- Improved modularity by separating different application concerns
Typically, the business logic layer is split into multiple tiers in an n-tier architecture for better separation of concerns.
2.12. Think of different users for the database shown in Figure 1.2. What types of applications would each user need? To which user category would each belong, and what type of interface would each need?
Different users and their potential applications/interfaces could be:
- Students: Students would need to view course offerings, prerequisites, register for courses, and check their grades. They would likely be categorized as naive/casual users and would need a user-friendly GUI interface or a mobile app.
- Professors/Instructors: They would need to view course details, student rosters, enter/update grades, and potentially update course information. As parametric users performing routine tasks, they may need a form-based interface or a tailored application.
- Department Staff: They would manage course offerings, instructor assignments, classroom allocation, etc. As sophisticated users, they may need a command-based interface to perform complex queries and updates.
- University Administrators: They would need to analyze enrollment data, plan course schedules, allocate resources, etc. As sophisticated users doing complex analysis, they may need query tools or reporting interfaces.
2.13. Choose a database application with which you are familiar. Design a schema and show a sample database for that application, using the notation of Figures 1.2 and 2.1. What types of additional information and constraints would you like to represent in the schema? Think of several users of your database, and design a view for each.
For example, a database for a video streaming service like Netflix:
CUSTOMER(Customer_ID, Name, Email, Payment_Method, Subscription_Plan)
MOVIE(Movie_ID, Title, Genre, Director, Release_Year, Rating, Duration)
VIEWING_HISTORY(Customer_ID, Movie_ID, Watched_Date, Percentage_Watched)
Additional constraints:
- Customer_ID is the primary key for CUSTOMER
- Movie_ID is the primary key for MOVIE
- Composite key (Customer_ID, Movie_ID) for VIEWING_HISTORY
- Payment_Method can be ‘Credit Card’ or ‘PayPal’
- Subscription_Plan can be ‘Basic’, ‘Standard’, or ‘Premium’
- Rating must be between 1 and 5
User views:
- Customer view: CUSTOMER, VIEWING_HISTORY for their Customer_ID
- Content manager view: MOVIE, aggregate VIEWING_HISTORY data
- Billing staff view: CUSTOMER payment/subscription details
2.14. If you were designing a Web-based system to make airline reservations and sell airline tickets, which DBMS architecture would you choose from Section 2.5? Why? Why would the other architectures not be a good choice?
For a web-based airline reservation and ticketing system, a three-tier or n-tier client/server architecture would be most suitable:
- The client tier would be a web browser or mobile app for users to search flights, make reservations, purchase tickets, etc.
- The application/business logic tier (potentially multiple tiers) would handle core functionality like flight search, seat reservations, pricing, payments, etc.
- The database tier would store all flight schedules, passenger records, reservations, and other data.
A multi-tier architecture provides benefits like scalability, separating concerns across tiers, easier maintenance, and improved security compared to a two-tier architecture.
A two-tier architecture would not be ideal as it lacks the separation of the application logic layer. This could make the system inefficient, less scalable and harder to maintain as complexity grows.
2.15. Consider Figure 2.1. In addition to constraints relating the values of columns in one table to columns in another table, there are also constraints that impose restrictions on values in a column or a combination of columns within a table. One such constraint dictates that a column or a group of columns must be unique across all rows in the table. For example, in the STUDENT table, the Student_number column must be unique (to prevent two different students from having the same Student_number). Identify the column or the group of columns in the other tables that must be unique across all rows in the table.
Based on Figure 2.1, the columns or column groups that must be unique across all rows in each table are:
- STUDENT: Student_number
- COURSE: Course_number
- SECTION: Combination of Course_number, Semester, Year, Section_identifier
- PREREQUISITE: Combination of Course_number, Prerequisite_number
- GRADE_REPORT: Combination of Student_number, Section_identifier
The uniqueness constraints ensure that each student has a unique ID, each course has a unique number, each section is uniquely identified by the course, semester, year and section details, each prerequisite relationship is unique, and each grade record is unique for a student-section combination.