Terse Notes — Fundamentals of Database Systems Part 8
The Relational Algebra and Relational Calculus
Book Credit to Elmasri Ramez And Navathe Shamkant
Book: Fundamentals Of Database System, 7th Edition by Elmasri Ramez And Navathe Shamkant
Ch 8 — The Relational Algebra and Relational Calculus
Relational algebra — set of operations for the formal relational model; includes both set operations from mathematical set theory and also other operations for relational databases specifically like SELECT, PROJECT, JOIN
Relational calculus — provides higher level declarative language for specifying relational queries; there is no order of operations to say how to retrieve the query result.
SELECT — unary because it involves only one relation; choose a subset of tuples from a relation that satisfies the select condition. it’s a horizontal partition. It applies to each tuple individually. Like it goes through each tuple one by one. It is commutative so it can be applied in any order.
SELECT is denoted by σ
In SQL the SELECT condition is usually in the WHERE clause.
the SELECT operation is denoted by
σ<selection condition>(R)
PROJECT — this operation is unary too; it is like a vertical partition because we are only selecting columns or attributes.
PROJECT results only in what is specified in the attribute list and in the same order as they appear in the attribute list.
PROJECT is π
The general form of the PROJECT
operation is
π<attribute list>(R)
PROJECT will remove any duplicate tuples; called duplicate elimination. Does not have commutativity.
PROJECT is usually in the SELECT clause of a SQL query.
Relational Algebra Operations from Set Theory
These are standard mathematical operations on sets, including:
UNION is denoted by ∪
INTERSECTION is denoted by ∩
SET DIFFERENCE (MINUS) is denoted by -
■ UNION: The result of this operation, denoted by R ∪ S, is a relation that
includes all tuples that are either in R or in S or in both R and S. Duplicate
tuples are eliminated.
■ INTERSECTION: The result of this operation, denoted by R ∩ S, is a relation
that includes all tuples that are in both R and S.
■ SET DIFFERENCE (or MINUS): The result of this operation, denoted by R — S,
is a relation that includes all tuples that are in R but not in S.
(pp. 247)
Binary operations are applied to two sets of tuples. So they have to be compatible by having the same degree n (or same attributes) and the same domain.
UNION and INTERSECTION are commutative. The are also associative.
MINUS is not commutative.
CARTESIAN PRODUCT (CROSS PRODUCT or CROSS JOIN) is denoted by X
It is a binary set operation but the relations do not have to be union compatible. It produces a new element by combining every element from one relation with every member of the other relation. In other words, finds all possible combinations of the two sets.
In SQL, this is achieved with a CROSS JOIN in joined tables or if there are two tables in the FROM clause.
Binary Relational Operations:
JOIN — we use the join operation to combine related tuples from two relations that satisfy the join condition.
The general form of a JOIN operation on two relations5 R(A1, A2, … , An) and
S(B1, B2, … , Bm) is:
R <join condition>S
JOIN is denoted by two triangles on its side pointing to each other
JOIN is different than CARTESIAN PRODUCT because it only grabs tuples satisfying the join condition. Whereas CARTESIAN PRODUCT grabs all combinations.
THETA JOIN — a JOIN operation with a general join condition like = < > <= >= not equal.
EQUIJOIN — a JOIN operation that only uses the equal comparison operator.
NATURAL JOIN — it is denoted by *. It is the same as an EQUIJOIN but the two join attributes must be named the same in both relations. This is called the join attribute. But only one join attribute is kept.
DIVISION is denoted by divide by symbol;
It is between two relations and finds the tuples in one relation that are associated with all the tuples in another relation.
Query tree — a tree data structure that represents relational algebra expression.
Additional Relational Operations
Generalized Projection — allows functions of attributes to be included in the attribute list.
Aggregate functions and groups
- Aggregate functions on collections fo values from the database like SUM, AVERAGE, MAX, MIN
Recursive Closure — an operation applied to a recursive relationship between tuples of the same type.
OUTER JOIN
- Keep all tuples in R or all those in S or all those in both relations in the result of the JOIN regardless of whether or not they have matching tuples in the other relation.
- RIGHT or LEFT applies
OUTER UNION
- An operation takes the union of tuples from two relations that have some common attributes but are not union type compatible.
- Same as a FULL OUTER JOIN on the common attributes.
The Tuple Relational Calculus
Tuple relational calculus uses tuples. Its variables range over tuples only.
It is a nonprocedural language because it specifies what is to be retrieved (rather than how).
A simple tuple relational calculus query is of the form:
{t | COND(t)}
The Domain Relational Calculus
Differs from tuple relational calculus in the types of variables.
Domain relational calculus uses single values from domains of attributes.
Review Questions
8.1. List the operations of relational algebra and the purpose of each.
The main operations of relational algebra are:
- SELECT (σ): Selects tuples that satisfy a given predicate from a relation.
- PROJECT (π): Projects a relation over specified attributes (columns) to create a new relation with only those attributes.
- UNION (∪): Combines all tuples from two union-compatible relations into a single set, eliminating duplicates.
- INTERSECTION (∩): Produces a relation containing only tuples that are present in both input relations.
- DIFFERENCE (-): Produces a relation containing tuples from the first relation that are not present in the second relation.
- CARTESIAN PRODUCT (×): Combines every tuple from the first relation with every tuple from the second relation, resulting in a relation with the concatenated attributes from both relations.
- JOIN (⨝): Combines tuples from two relations based on a join condition, including INNER JOIN, OUTER JOIN (LEFT, RIGHT, FULL), NATURAL JOIN, etc.
- DIVISION (÷): Produces a relation containing tuples from the first relation that are associated with every tuple from the second relation.
8.2. What is union compatibility? Why do the UNION, INTERSECTION, and DIFFERENCE operations require that the relations on which they are applied be union compatible?
Two relations are union compatible if they have the same number of attributes, and the corresponding attributes in each relation are of the same domain (data type).
The UNION, INTERSECTION, and DIFFERENCE operations require union compatibility because they operate on tuples from both relations. If the relations are not union compatible, the tuples cannot be meaningfully combined or compared since their attributes would have different domains or a different number of attributes.
8.3. Discuss some types of queries for which renaming of attributes is necessary in order to specify the query unambiguously.
Renaming of attributes is necessary when:
- Joining relations with common attribute names: To avoid ambiguity when joining relations with common attribute names, one or both of the common attributes need to be renamed.
- Performing operations on relations with duplicate attribute names: If a relation has duplicate attribute names, renaming is required to distinguish between them.
- Creating views or temporary relations: When creating views or intermediate temporary relations, renaming may be needed to provide meaningful names for the attributes.
- Performing operations involving arithmetic expressions: When performing arithmetic operations on attributes, renaming may be required to assign a new name to the resulting attribute.
8.4. Discuss the various types of inner join operations. Why is theta join required?
The main types of inner join operations are:
- NATURAL JOIN: Combines tuples from two relations based on equal values in their common attribute(s), projecting out the common attribute(s) from the result.
- EQUI-JOIN: Combines tuples from two relations based on a specified equality condition on one or more attributes.
- THETA-JOIN: A generalized form of equi-join that combines tuples from two relations based on a specified condition involving attributes from both relations, not necessarily an equality condition.
The theta-join is required because there are cases where the join condition is not just an equality condition, but involves other comparison operators (<, >, <=, >=, !=) or more complex conditions. The theta-join provides the flexibility to specify arbitrary join conditions, making it more general than the equi-join or natural join.
8.5. What role does the concept of foreign key play when specifying the most common types of meaningful join operations?
Foreign keys play a crucial role in specifying meaningful join operations between relations. A foreign key in one relation references the primary key of another relation, establishing a relationship between the two relations.
When joining two relations based on their foreign key and primary key attributes, the join operation combines tuples from the two relations that are related through the foreign key reference. This ensures that only tuples with matching values in the join attributes (foreign key and primary key) are combined, resulting in a meaningful join that preserves the relationships between the data.
For example, when joining the EMPLOYEE and DEPARTMENT relations based on the Dno (foreign key in EMPLOYEE) and Dnumber (primary key in DEPARTMENT) attributes, the join operation combines each employee tuple with the corresponding department tuple based on the matching department numbers.
8.6. What is the FUNCTION operation? For what is it used?
The FUNCTION operation, also known as the aggregate function or summarization operation, is used to perform calculations or aggregations on groups of tuples in a relation.
It is used to compute summary values such as COUNT, SUM, AVG, MAX, MIN, etc., over groups of tuples that share certain attribute values. The groups are typically formed by applying the GROUP BY clause in SQL, which partitions the tuples based on the specified grouping attributes.
For example, the FUNCTION operation can be used to calculate the average salary for each department, the total sales for each product category, or the count of employees in each department.
8.7. How are the OUTER JOIN operations different from the INNER JOIN operations? How is the OUTER UNION operation different from UNION?
The main difference between OUTER JOIN and INNER JOIN operations is how they handle tuples that do not have matching values in the joined relations.
- INNER JOIN: Includes only tuples from both relations that satisfy the join condition, excluding tuples that do not have a match in the other relation.
- OUTER JOIN: In addition to the tuples included in the INNER JOIN, OUTER JOIN operations also include tuples from one or both relations that do not have a match in the other relation, padding the missing values with NULL.
- LEFT OUTER JOIN: Includes all tuples from the left relation, and tuples from the right relation that match, padding with NULL for non-matching tuples from the left.
- RIGHT OUTER JOIN: Includes all tuples from the right relation, and tuples from the left relation that match, padding with NULL for non-matching tuples from the right.
- FULL OUTER JOIN: Includes all tuples from both relations, padding with NULL for non-matching tuples on either side.
The OUTER UNION operation is different from the regular UNION operation in that it allows combining relations that are not entirely union-compatible (i.e., they have some common attributes but also some attributes that are not common). The OUTER UNION operation includes all tuples from both relations, padding with NULL for attributes that are not present in one of the relations.
8.8. In what sense does relational calculus differ from relational algebra, and in what sense are they similar?
Relational calculus and relational algebra are two formal languages for specifying queries in the relational model, but they differ in their approach and level of abstraction.
The main differences are:
- Procedural vs. Declarative: Relational algebra is a procedural language that specifies how to retrieve the query result by defining a sequence of operations. Relational calculus is a declarative language that specifies what information the result should contain, without specifying the procedure to obtain it.
- Level of Abstraction: Relational algebra operates at a lower level of abstraction, manipulating relations directly through a set of operations. Relational calculus operates at a higher level of abstraction, using logical statements and variables to describe the desired result.
- Mathematical Foundation: Relational calculus has a firm basis in mathematical logic and predicate calculus, while relational algebra is based on set theory and algebraic operations.
The similarities between relational calculus and relational algebra are:
- Relational Model: Both are formal languages for querying and manipulating data in the relational model.
- Expressive Power: Both have equivalent expressive power, meaning they can express the same set of queries, although one may be more concise or convenient for certain types of queries.
- Basis for Query Languages: Both have influenced the development of practical query languages like SQL, which incorporates constructs from both relational algebra and relational calculus.
8.9. How does tuple relational calculus differ from domain relational calculus?
Tuple relational calculus and domain relational calculus are two variations of relational calculus, differing in the way they represent and manipulate data:
- Variables and Ranges:
- In tuple relational calculus, variables range over tuples (rows) of a relation.
- In domain relational calculus, variables range over the domains (values) of attributes.
- Quantification:
- In tuple relational calculus, quantifiers (∃, ∀) are applied to tuple variables, quantifying over tuples.
- In domain relational calculus, quantifiers are applied to domain variables, quantifying over attribute values.
- Representation of Tuples:
- In tuple relational calculus, a tuple is represented as a single variable.
- In domain relational calculus, a tuple is represented by a collection of domain variables, one for each attribute.
- Formulas and Expressions:
- In tuple relational calculus, formulas and expressions involve tuple variables and comparisons between tuples.
- In domain relational calculus, formulas and expressions involve domain variables and comparisons between attribute values.
- Basis for Query Languages:
- SQL is primarily based on tuple relational calculus, with some constructs inspired by domain relational calculus.
- Query-By-Example (QBE) is a graphical query language based on domain relational calculus.
While tuple relational calculus and domain relational calculus have different representations and approaches, they have equivalent expressive power and can express the same set of queries in the relational model.
8.10. Discuss the meanings of the existential quantifier (∃) and the universal quantifier (∀).
The existential quantifier (∃) and the universal quantifier (∀) are logical quantifiers used in predicate logic and relational calculus.
- Existential Quantifier (∃):
- The existential quantifier (∃) is used to express the existence of at least one element or tuple that satisfies a given condition or formula.
- In relational calculus, an expression like ∃t ∈ R (P(t)) means “there exists at least one tuple t in relation R such that the predicate P(t) is true.”
- It is used to specify queries that retrieve tuples satisfying a certain condition.
- Universal Quantifier (∀):
- The universal quantifier (∀) is used to express that a condition or formula holds for all elements or tuples in a given set or relation.
- In relational calculus, an expression like ∀t ∈ R (P(t)) means “for all tuples t in relation R, the predicate P(t) is true.”
- It is used to specify queries that check if a condition holds for all tuples in a relation, or to express constraints or integrity rules.
The existential quantifier is used more commonly in query formulation, as most queries aim to retrieve tuples that satisfy certain conditions. The universal quantifier is often used in constraint specifications or to express queries that check if all tuples in a relation satisfy a given condition.
8.11. Define the following terms with respect to the tuple calculus: tuple variable, range relation, atom, formula, and expression.
In the context of tuple relational calculus:
- Tuple Variable: A variable that ranges over tuples (rows) of a relation. It represents an entire tuple as a single entity.
- Range Relation: The relation over which a tuple variable ranges. It specifies the set of tuples that the variable can take values from.
- Atom: An atomic formula or condition involving tuple variables and constants, connected by comparison operators (=, <, >, ≤, ≥, ≠) or other predicates.
- Formula: A well-formed logical expression constructed from atoms, logical connectives (AND, OR, NOT), and quantifiers (∃, ∀) applied to tuple variables.
- Expression: A tuple relational calculus expression consists of a formula that defines the desired tuples, along with the specification of the range relation(s) and tuple variable(s) used in the formula.
For example, in the expression ∃t ∈ EMPLOYEE (t.Dno = 5 AND t.Salary > 50000):
- t is a tuple variable
- EMPLOYEE is the range relation
- t.Dno = 5 and t.Salary > 50000 are atoms
- t.Dno = 5 AND t.Salary > 50000 is a formula
- The entire expression is a tuple relational calculus expression
8.12. Define the following terms with respect to the domain calculus: domain variable, range relation, atom, formula, and expression.
In the context of domain relational calculus:
- Domain Variable: A variable that ranges over the domain (set of possible values) of an attribute in a relation.
- Range Relation: The relation over which the domain variables range. It specifies the set of tuples that the variables can take values from.
- Atom: An atomic formula or condition involving domain variables and constants, connected by comparison operators (=, <, >, ≤, ≥, ≠) or other predicates.
- Formula: A well-formed logical expression constructed from atoms, logical connectives (AND, OR, NOT), and quantifiers (∃, ∀) applied to domain variables.
- Expression: A domain relational calculus expression consists of a formula that defines the desired tuples, along with the specification of the range relation(s) and domain variable(s) used in the formula.
For example, in the expression ∃d1, d2, d3 ∈ EMPLOYEE (d1 = 5 AND d2 > 50000 AND d3 = ‘Clerk’):
- d1, d2, and d3 are domain variables
- EMPLOYEE is the range relation
- d1 = 5, d2 > 50000, and d3 = ‘Clerk’ are atoms
- d1 = 5 AND d2 > 50000 AND d3 = ‘Clerk’ is a formula
- The entire expression is a domain relational calculus expression
8.13. What is meant by a safe expression in relational calculus?
A safe expression in relational calculus is an expression that guarantees a finite result set and avoids potential infinite loops or undefined values during evaluation.
An expression is considered safe if it satisfies the following conditions:
- Range Variable Safety: Every tuple variable or domain variable in the expression must be bound to a range relation. In other words, each variable must be quantified and associated with a specific relation from which it takes values.
- No Unbound Variables: There should be no free or unbound variables in the expression, meaning that all variables must be quantified (either existentially or universally).
- No Nested Subqueries: The expression should not contain nested subqueries or correlated subqueries, as these can potentially lead to infinite loops or undefined results.
- No Arithmetic Operations on Attributes: The expression should not involve arithmetic operations on attribute values, as this can lead to undefined results if the attribute values are null or non-numeric.
Safe expressions ensure that the evaluation of the expression terminates and produces a well-defined result set, avoiding potential issues like infinite loops, null values propagating through arithmetic operations, or undefined results due to unbound variables or nested subqueries.
8.14. When is a query language called relationally complete?
A query language is called relationally complete if it has the capability to express any query that can be expressed using the relational algebra or relational calculus.
In other words, a relationally complete query language can express all queries that can be formulated within the relational model, without any limitations or restrictions on the types of queries that can be expressed.
For a query language to be considered relationally complete, it must be able to express the following types of queries:
- Selection: Retrieving tuples that satisfy a given condition or predicate.
- Projection: Retrieving specific attributes (columns) from a relation.
- Join: Combining tuples from two or more relations based on a join condition.
- Set Operations: Performing set operations like union, intersection, and difference on relations.
- Aggregation: Performing aggregate functions like sum, count, average, etc., on groups of tuples.
- Nested Queries: Expressing queries that involve subqueries or correlated queries.
- Complex Conditions: Expressing queries with complex conditions involving logical operators, quantifiers, and comparisons.
If a query language can express all these types of queries without any limitations, it is considered relationally complete. SQL, the standard query language for relational databases, is considered relationally complete, as it can express any query that can be form