UMBC CMSC 461 Spring '99 

CSEE | 461 | 461 S'99 | lectures | news | help 


Lecture 26
Integrity Constraints

Chapter 6, Database Systems Concepts, by Silberschatz, et al, 1997

Portions reproduced with permission

Integrity constraints provide a means of ensuring that changes made to the database by authorized users do not result in the loss of data consistency.

Two types of constraints discussed so far this semester have been:

 Domain Constraints

There is a domain of possible values associated with every attribute. Domain constraints are the most elementary form for integrity constraint. It is possible for several attributes to have the same domain, while others will be distinctly different.

The principle behind attribute domains is similar to that behind typing of variables in programming languages. The benefits of strong typing are even more important when remembering that the users of the system will not be computer experts. However, there are only a small number of domains built into the DBMS and must be extended when developing a database application.

The check clause in SQL-92 that will do that extension:

create domain hourly-wage numeric(5,2) constraint wage-value test check( value >= 4.00)

In this example, the hourly-wage is declared to be a decimal number with a total of five digits, two of which are placed after the decimal point and has a range of 4.00 to 999.99. The check clause can be used for things like value not null, or value in ("Checking", "Savings").

Referential Integrity

We may wish to ensure that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation. An example is the relation takes(student-nr, course-nr). We want to make sure that there is a record in the student(student-nr, address, city, state) with the same student number. We also want to make sure that there is a record in the courses(course-nr, course-title, credits) with a matching course-nr.

Basic Concepts

When doing a natural join on two relations, there can not be any tuple in either relation that does not have a corresponding tuple in the second relation. If there is, it is called a dangling tuple. In our example, this is unacceptable, but this is not always true. When it is unacceptable, we want to make sure that it does not happen by using referential integrity constraints or subset dependencies.

Formally stated, let r1(R1) and r2(R2) be relations with primary keys K1 and K2 respectively. We say that a subset a of R2 is a foreign key referencing K1 in relation r1 if it is required that, for every t2 in r2, there must be a tuple t1 in r1 such that t1[K1] = t2[a ]

Database Modification

Based on the above, we must make sure that when inserting a record into table 2, there must already be a record in table 1 that have a candidate key that matches the value being inserted into table 2.

For deletions, we can not delete a record from table 1, if the record to be deleted is being pointed to by a record in table 2.

Updates have two facets.

Referential Integrity in SQL

When the table is being created we can specify primary, candidate and foreign keys.

An example is:

    Create table customer
    (
            customer-name     char(20) not null,
            customer-street   char(30),
            customer-city     char(30),
            primary key (customer-name)
    )

    Create table branch
    (
            branch-name     char(15) not null,
            branch-city     char(30),
            assets          integer,
            primary key (branch-name)
     )

     Create table account
    (
            account-number     char(10) not null,
            branch-name        char(15),
            balance            integer,
            primary key (account-number),
            foreign key (branch-name) references branch
            check (balance >= 0 )
    )

    Create table depositor
    (
            customer-name     char(20) not null,
            account-number    char(10) not null,,
            primary key (customer-name, account-number)
            foreign key (customer-name) references customer
            foreign key (account-number) references account
    )

We can use the following form, if the foreign key is a single column:

            Branch-name       char(15) references branch

When a referential integrity constraint is violated, the normal procedure is to reject the action that caused the violation. However, on the foreign key clause, we can add a couple of additional specifications:

Additionally, there are some more rules to help us.

It is best, however, to ensure that all columns of unique and foreign key are declared to be nonnull.

Assertions

An assertion is a predicate expressing a condition that we wish the database to always satisfy.

The SQL-92 standard provides the following form for the assertion declaration:

create assertion <assertion-name> check <predicate>

The two previous constraints would be:

  create assertion sum-constraint check
      (not exists (select * from branch
           where ( select sum(amount) from loan
                   where loan.branch-name = branch.branch-name)
              >= (select sum(amount) from account
                    where loan.branch-name = branch.branch-name)))

  create assertion balance check
      (not exists (select * from  loan
           where not exists (select * 
                             from   borrow, depositor, account
                             where  loan.loan-number = borrower.loan
                             and borrower.customer-name = 
                                    depositor.customer-name
                              and depositor.account-number = 
                                    account.account-number
                              and account.balance = 1000)))

  

Triggers

 A trigger is a statement that is executed automatically by the system as a side effect of a modification of the database. To design a trigger mechanism, we must meet two requirements:

Triggers are useful mechanisms for alerting humans, or for performing certain tasks automatically when certain conditions are met.

The SQL-92 standard does not include triggers.

Dependencies

Functional dependencies (FD) are are type of constraint that is based on keys.  A superkey is defined as in the relational schema R , where:

a subset K of R is a subkey of R if, in any legal relation r(R), for all pairs, t1 and t2 in tuple r such that t1 is not equal to t2 then  t1[K] is not equal to t2[K].

Or, no two rows (tuples) have the same value in the attribute(s) K, which is the key. Now, if there are two attributes (or sets of attributes) A and B that are legal in the relation schema R, we can have a functional dependency where

A implies B

for all pairs of tuples such that t1[A] is equal to t2[A] and  t1[B] is equal to t2[B].  This allows us to state that K is a superkey of R if K implies R.  For example, in a relation that has names and social security numbers, whenever your Social Security number is the student ID, the name in that tuple can only contain your name.  That is because your name is not unique, but your Social Security is.  If I go to the Social Security Administration and search their database for the name "Gary Burt", the results is a large number of people.  If I search of the social security number "123-45-6789", the result is one and only one person.

Another example is in the loan information that we looked at before:

Loan-info-schema = (branch-name, loan-name, customer-name, amount)

it can be shown that the loan-number implies both the amount and the branch-name.  It does not imply the customer-name because there may be more than one person listed on the load, such as a husband and wife, or parent and child (when the parent co-signs the loan).

Functional dependencies:

Some relations are said to be trivial when they are satisfied by all relations:.

Closure of a Set of Functional Dependencies

It is not enough to look at a single FD.  All FDs must be considered in a relation!  Given the schema R =  ( A, B, C, G, H, I) and the FDs of:

A implies B
A implies C
CG implies H
CG implies I
B implies H

We can show that A implies H because A implies B which implies H.

The notional for a FD is F.  The notation of F+ is the set of all FDs logically implied by F.  There is a set of rules, called Armstrong's axioms, that we can use to to compute closure.

These can be simplified if we also use:

Using mathematical principles, we can not test a set of attributes to see if they are a legal superkey.

Pitfalls in Relational-Database Design

Obviously, we can have good and bad designs.  Among the undesirable design items are:

 The relation lending with the schema is an example of a bad design:

Lending-Schema=(branch-name, branch-city, assets, cutomer-name, loan-number, amount)

 

branch-name

branch-city

assets

customer-name

loan-number

amount

Downtown

Brooklyn

9000000

Jones

L-17

1000

Redwood

Palo Alto

2100000

Smith

L-23

2000

Perryridge

Horseneck

1700000

Hayes

L-15

1500

Downtown

Brooklyn

9000000

Jackson

L-14

1500

Mianus

Horseneck

400000

Jones

L-93

500

Round Hill

Horseneck

8000000

Turner

L-11

900

Pownal

Bennington

300000

Williams

L-29

1200

North Town

Rye

3700000

Hayes

L-16

1300

Downtown

Brooklyn

9000000

Johnson

L-23

2000

Perryridge

Horseneck

1700000

Glenn

L-25

2500

Brighton

Brooklyn

7100000

Brooks

L-10

2200

Looking at the Downtown and Perryridge, when a new loan is added, the branch-city and assets must be repeated.  That makes updating the table more difficult, because the update must guarantee that all tuples are updated.  Additional problems come from having two people take out one loan (L-23).  More complexity is involved when Jones took out a loan at a second branch (maybe one near home and the other near work.)  Notice that  there is no way to represent information on a branch unless there is a loan.

Decomposition

The obvious solution is that we should decompose this relation.  As an alternative design, we can use the Decomposition rule: If A implies BC then A implies B and A implies C.

This gives us the schemas:

 

branch-name

branch-city

assets

customer-name

Downtown

Brooklyn

9000000

Jones

Redwood

Palo Alto

2100000

Smith

Perryridge

Horseneck

1700000

Hayes

Downtown

Brooklyn

9000000

Jackson

Mianus

Horseneck

400000

Jones

Round Hill

Horseneck

8000000

Turner

Pownal

Bennington

300000

Williams

North Town

Rye

3700000

Hayes

Downtown

Brooklyn

9000000

Johnson

Perryridge

Horseneck

1700000

Glenn

Brighton

Brooklyn

7100000

Brooks

 
 

customer-name

loan-number

amount

Jones

L-17

1000

Smith

L-23

2000

Hayes

L-15

1500

Jackson

L-14

1500

Jones

L-93

500

Turner

L-11

900

Williams

L-29

1200

Hayes

L-16

1300

Johnson

L-23

2000

Glenn

L-25

2500

Brooklyn

L-10

2200
     

Then when we need to get back to the original table, we can do a natural join on the two relations branch-customer and customer-loan.

Evaluating this design, how does it compare to the first version?

Worse, there is a new problem!  When we do the natural join, we get back four additional tuples that did not exists in the original table:

We are no long able to represent in the database information about which customers are borrows from which branch.  This is called a lossy decomposition or lossy-join decomposition.  A decomposition that is not a lossy-decomposition is a lossless-join decomposition. Lossless-joins are a requirement for good design  and this causes constraints on the set of possible relations.  We say that a relation is legal if it satisfies all rules, or constraints, imposed.

The proper way to decomposition this example so that we can have a lossless-join is to use three relations.

Normalization Using Functional Dependencies

Using FDs, it is possible to define several normal forms to help develop good database designs. The two that we will example are Boyce-Codd Normal Form (BCNF) and Third Normal Form (3NF).  The requirements for good decomposition are

We've discussed the lossless decomposition.  Dependency preservation specifies that the design insure that when an update is made to the database, that it does not create an illegal relation. In regard to the repetition of information, it is necessary to include the key of another table, so that the joins can be properly formed.  That is the only information that should be in both tables!

Boyce-Codd Normal Form

A relation schema R is said to be in BCNF with respect to a set F of FDs, if for all FDs in F+ of the form A implies B, where A is a subset of R and B is a subset of R and at least one of the following rules is true:

Without doing the mathematically proofs, it can be shown that the BCNF results in dependency preservation.

Third Normal Form

A relation schema R is in 3NF with respect to a set F of FDs if, for all FDs in the F+ of the form A implies B, where A is a subset of R and B is a subset of R and at least one of the following rules is true:


CSEE | 461 | 461 S'99 | lectures | news | help