UMBC CMSC 461 Spring '99

Lecture 26
Integrity Constraints

Chapter 6, Database Systems Concepts, by Silberschatz, et al, 1997

Portions reproduced with permission

Integrity constraints provide a means of ensuring that changes made to the database by authorized users do not result in the loss of data consistency.

Two types of constraints discussed so far this semester have been:

Key declarations: This is using certain keys to form a candidate key for a given entity set. Thus no two entities in the set can be inserted or updated so that they have the same value on the candidate key.

Form of a relationship: This refers to the cardinality (many-to-many, one-to-many, and one-to-one).

Domain Constraints

There is a domain of possible values associated with every attribute. Domain constraints are the most elementary form for integrity constraint. It is possible for several attributes to have the same domain, while others will be distinctly different.

The principle behind attribute domains is similar to that behind typing of variables in programming languages. The benefits of strong typing are even more important when remembering that the users of the system will not be computer experts. However, there are only a small number of domains built into the DBMS and must be extended when developing a database application.

The check clause in SQL-92 that will do that extension:

create domain hourly-wage numeric(5,2) constraint wage-value test check( value >= 4.00)

In this example, the hourly-wage is declared to be a decimal number with a total of five digits, two of which are placed after the decimal point and has a range of 4.00 to 999.99. The check clause can be used for things like value not null, or value in ("Checking", "Savings").

Referential Integrity

We may wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. An example is the relation takes(student-nr, course-nr). We want to make sure that there is a record in the student(student-nr, address, city, state) with the same student number. We also want to make sure that there is a record in the courses(course-nr, course-title, credits) with a matching course-nr.

Basic Concepts

When doing a natural join on two relations, there can not be any tuple in either relation that does not have a corresponding tuple in the second relation. If there is, it is called a dangling tuple. In our example, this is unacceptable, but this is not always true. When it is unacceptable, we want to make sure that it does not happen by using referential integrity constraints or subset dependencies.

Formally stated, let r₁(R₁) and r₂(R₂) be relations with primary keys K₁ and K₂ respectively. We say that a subset a of R₂ is a foreign key referencing K₁ in relation r₁ if it is required that, for every t₂ in r₂, there must be a tuple t₁ in r₁ such that t₁[K₁] = t₂[a ]

Database Modification

Based on the above, we must make sure that when inserting a record into table 2, there must already be a record in table 1 that have a candidate key that matches the value being inserted into table 2.

For deletions, we can not delete a record from table 1, if the record to be deleted is being pointed to by a record in table 2.

Updates have two facets.

The value for the foreign key in table 2 can not be changed from a valid value to a value that is not in table.
The value of the key in table 1 can not be changed if there are records in table 2 to point to the record in table 1.

Referential Integrity in SQL

When the table is being created we can specify primary, candidate and foreign keys.

primary key clause includes a list of the attributes that form the primary key.

unique clause includes a list of the attributes that constitute a candidate key.

foreign key clause includes a list of the attributes that constitute the foreign key and the name of the relation referenced by the foreign key.

An example is:

    Create table customer
    (
            customer-name     char(20) not null,
            customer-street   char(30),
            customer-city     char(30),
            primary key (customer-name)
    )

    Create table branch
    (
            branch-name     char(15) not null,
            branch-city     char(30),
            assets          integer,
            primary key (branch-name)
     )

     Create table account
    (
            account-number     char(10) not null,
            branch-name        char(15),
            balance            integer,
            primary key (account-number),
            foreign key (branch-name) references branch
            check (balance >= 0 )
    )

    Create table depositor
    (
            customer-name     char(20) not null,
            account-number    char(10) not null,,
            primary key (customer-name, account-number)
            foreign key (customer-name) references customer
            foreign key (account-number) references account
    )

We can use the following form, if the foreign key is a single column:

            Branch-name       char(15) references branch

When a referential integrity constraint is violated, the normal procedure is to reject the action that caused the violation. However, on the foreign key clause, we can add a couple of additional specifications:

on delete cascade : This would delete the records that depended on the key.

on update cascade: This would update all the records that depended on the key.

Additionally, there are some more rules to help us.

All attributes of a primary key are implicitly declared to be not null.
Attributes of a unique declaration are allowed to be null.
Attributes of foreign keys are allowed to be null, provided they have not otherwise been declared to be nonnull.

It is best, however, to ensure that all columns of unique and foreign key are declared to be nonnull.

Assertions

An assertion is a predicate expressing a condition that we wish the database to always satisfy.

The sum of all loan amounts for each branch must be less than the sum of all balances at the branch.
Each load has at least one customer who maintains an account with a minimum balance of $1000.000

The SQL-92 standard provides the following form for the assertion declaration:

create assertion <assertion-name> check <predicate>

The two previous constraints would be:

  create assertion sum-constraint check
      (not exists (select * from branch
           where ( select sum(amount) from loan
                   where loan.branch-name = branch.branch-name)
              >= (select sum(amount) from account
                    where loan.branch-name = branch.branch-name)))

  create assertion balance check
      (not exists (select * from  loan
           where not exists (select * 
                             from   borrow, depositor, account
                             where  loan.loan-number = borrower.loan
                             and borrower.customer-name = 
                                    depositor.customer-name
                              and depositor.account-number = 
                                    account.account-number
                              and account.balance = 1000)))

Triggers

A trigger is a statement that is executed automatically by the system as a side effect of a modification of the database. To design a trigger mechanism, we must meet two requirements:

Specify the condition under which the trigger is to be executed.
Specify the actions to be taken when the trigger executees.

Triggers are useful mechanisms for alerting humans, or for performing certain tasks automatically when certain conditions are met.

The SQL-92 standard does not include triggers.

Dependencies

Functional dependencies (FD) are are type of constraint that is based on keys. A superkey is defined as in the relational schema R , where:

a subset K of R is a subkey of R if, in any legal relation r(R), for all pairs, t₁and t₂ in tuple r such that t₁is not equal to t₂ then t₁[K]is not equal to t₂[K].

Or, no two rows (tuples) have the same value in the attribute(s) K, which is the key. Now, if there are two attributes (or sets of attributes) A and B that are legal in the relation schema R, we can have a functional dependency where

A implies B

for all pairs of tuples such that t₁[A]is equal to t₂[A] and t₁[B]is equal to t₂[B]. This allows us to state that K is a superkey of R if K implies R. For example, in a relation that has names and social security numbers, whenever your Social Security number is the student ID, the name in that tuple can only contain your name. That is because your name is not unique, but your Social Security is. If I go to the Social Security Administration and search their database for the name "Gary Burt", the results is a large number of people. If I search of the social security number "123-45-6789", the result is one and only one person.

Another example is in the loan information that we looked at before:

Loan-info-schema = (branch-name, loan-name, customer-name, amount)

it can be shown that the loan-number implies both the amount and the branch-name. It does not imply the customer-name because there may be more than one person listed on the load, such as a husband and wife, or parent and child (when the parent co-signs the loan).

Functional dependencies:

specify a set of constraints on a legal relation.
test relations to see if they are legal.

Some relations are said to be trivial when they are satisfied by all relations:.

A implies A

A implies B and B implies A.

Closure of a Set of Functional Dependencies

It is not enough to look at a single FD. All FDs must be considered in a relation! Given the schema R = ( A, B, C, G, H, I) and the FDs of:

A implies B
A implies C
CG implies H
CG implies I
B implies H

We can show that A implies H because A implies B which implies H.

The notional for a FD is F. The notation of F⁺ is the set of all FDs logically implied by F. There is a set of rules, called Armstrong's axioms, that we can use to to compute closure.

Reflexivity rule: If A is a set of attributes, and B is a set of attributes that are completely contained in A, the A implies B.

Augmentation rule: If A implies B, and C is a set of attributes, then if A implies B, then AC implies BC.

Transitivity rule: If A implies B and B implies C, then A implies C.

These can be simplified if we also use:

Union rule: If A implies B and A implies C, the A implies AC.

Decomposition rule: If A implies BC then A implies B and A implies C.

Pseudotransitivity rule: If A implies B and CB implies D, then AC implies D.

Using mathematical principles, we can not test a set of attributes to see if they are a legal superkey.

Pitfalls in Relational-Database Design

Obviously, we can have good and bad designs. Among the undesirable design items are:

Repetition of information
Inability to represent certain information

The relation lending with the schema is an example of a bad design:

Lending-Schema=(branch-name, branch-city, assets, cutomer-name, loan-number, amount)

branch-name	branch-city	assets	customer-name	loan-number	amount
Downtown	Brooklyn	9000000	Jones	L-17	1000
Redwood	Palo Alto	2100000	Smith	L-23	2000
Perryridge	Horseneck	1700000	Hayes	L-15	1500
Downtown	Brooklyn	9000000	Jackson	L-14	1500
Mianus	Horseneck	400000	Jones	L-93	500
Round Hill	Horseneck	8000000	Turner	L-11	900
Pownal	Bennington	300000	Williams	L-29	1200
North Town	Rye	3700000	Hayes	L-16	1300
Downtown	Brooklyn	9000000	Johnson	L-23	2000
Perryridge	Horseneck	1700000	Glenn	L-25	2500
Brighton	Brooklyn	7100000	Brooks	L-10	2200

Looking at the Downtown and Perryridge, when a new loan is added, the branch-city and assets must be repeated. That makes updating the table more difficult, because the update must guarantee that all tuples are updated. Additional problems come from having two people take out one loan (L-23). More complexity is involved when Jones took out a loan at a second branch (maybe one near home and the other near work.) Notice that there is no way to represent information on a branch unless there is a loan.

Decomposition

The obvious solution is that we should decompose this relation. As an alternative design, we can use the Decomposition rule: If A implies BC then A implies B and A implies C.

This gives us the schemas:

branch-customer-schema = (branch-name, branch-city, assets, customer-name)

customer-loan-schema = (customer-name, loan-number, amount)

branch-name	branch-city	assets	customer-name
Downtown	Brooklyn	9000000	Jones
Redwood	Palo Alto	2100000	Smith
Perryridge	Horseneck	1700000	Hayes
Downtown	Brooklyn	9000000	Jackson
Mianus	Horseneck	400000	Jones
Round Hill	Horseneck	8000000	Turner
Pownal	Bennington	300000	Williams
North Town	Rye	3700000	Hayes
Downtown	Brooklyn	9000000	Johnson
Perryridge	Horseneck	1700000	Glenn
Brighton	Brooklyn	7100000	Brooks

customer-name	loan-number	amount
Jones	L-17	1000
Smith	L-23	2000
Hayes	L-15	1500
Jackson	L-14	1500
Jones	L-93	500
Turner	L-11	900
Williams	L-29	1200
Hayes	L-16	1300
Johnson	L-23	2000
Glenn	L-25	2500
Brooklyn	L-10	2200

Then when we need to get back to the original table, we can do a natural join on the two relations branch-customer and customer-loan.

Evaluating this design, how does it compare to the first version?

Looking at the Downtown and Perryridge, when a new loan is added, the branch-city and assets must be repeated. Problem still exists.
Problems come from having two people take out one loan (L-23). Problem still exists.
More complexity is involved when Jones took out a loan at a second branch. Problem still exists.
Notice that there is no way to represent information on a branch unless there is a loan. Problem still exists.

Worse, there is a new problem! When we do the natural join, we get back four additional tuples that did not exists in the original table:

(Downtown, Brooklyn, 9000000, Jones, L-93, 500)
(Perryridge, Horseneck, 1700000, Hayes, L-16, 1300)
(Mianus, Horseneck, 400000, Jones, L-17, 1000)
(North Town, Rye, 3700000, Hayes, L-15, 1500)

We are no long able to represent in the database information about which customers are borrows from which branch. This is called a lossy decomposition or lossy-join decomposition. A decomposition that is not a lossy-decomposition is a lossless-join decomposition. Lossless-joins are a requirement for good design and this causes constraints on the set of possible relations. We say that a relation is legal if it satisfies all rules, or constraints, imposed.

The proper way to decomposition this example so that we can have a lossless-join is to use three relations.

branch-schema = (branch-name, assets, branch-city)

loan-schema = (branch-name, loan-number, amount)

borrower-schema = (customer-name, loan-number)

Normalization Using Functional Dependencies

Using FDs, it is possible to define several normal forms to help develop good database designs. The two that we will example are Boyce-Codd Normal Form (BCNF) and Third Normal Form (3NF). The requirements for good decomposition are

Lossless-Join Decomposition
Dependency Preservation
Lack of Repetition of Information

We've discussed the lossless decomposition. Dependency preservation specifies that the design insure that when an update is made to the database, that it does not create an illegal relation. In regard to the repetition of information, it is necessary to include the key of another table, so that the joins can be properly formed. That is the only information that should be in both tables!

Boyce-Codd Normal Form

A relation schema R is said to be in BCNF with respect to a set F of FDs, if for all FDs in F⁺ of the form A implies B, where A is a subset of R and B is a subset of R and at least one of the following rules is true:

A implies B is a trivial FD (B is a subset of A)
A is a superkey for schema R

Without doing the mathematically proofs, it can be shown that the BCNF results in dependency preservation.

Third Normal Form

A relation schema R is in 3NF with respect to a set F of FDs if, for all FDs in the F⁺ of the form A implies B, where A is a subset of R and B is a subset of R and at least one of the following rules is true:

A implies B is a trivial FD
A is a superkey for schema R
Each attribute in the result of the expression B-A is contained in a candidate key for R

Lecture 26 Integrity Constraints