Skip to content

Relational Database Management System (RDBMS)

The relational database model was introduced by E.F. Codd in 1970 of IBM's research laboratory in the paper titled "A Relational Model of Data for Large Shared Data Banks"

  • It is based on the concept of tables, where data is stored in rows and columns
  • Rather than using pointers to navigate between related entities, redundant data is used to link related records in different tables
  • A few related tables form a relational database
  • The data can be accessed or reassembled in many different ways without having to reorganize the database tables

Relational Algebra allows us to retrieve data efficiently

Relationships: tell you how much of the data from foreign key field can be seen in the primary key column of the table that data is related to and vice-versa

  • Cardinality constraints

Types of Relationships

  1. One-to-One Relationship: A relationship where each record in one table is related to only one record in another table

    Example: A person has only one passport

  2. One-to-Many Relationship: A relationship where each record in one table is related to one or more records in another table

    • One value from a column under a table can be found many times in the column in the other table

    Example: A customer can have multiple orders

  3. Many-to-Many Relationship: A relationship where each record in one table is related to one or more records in another table, and vice versa

    Example: A student can enrol in multiple courses, and a course can have multiple students

Key Concepts

  • Table: A table is a collection of data organized into rows and columns

    • Modelled after a real-world entity
  • Column: A column is a set of data values of a particular type

    • Some attribute of the entity
  • Row: A row is a single record in a table

    • A single instance of the entity
  • Cell: A cell is a single data value in a table

    • A single value for a single instance
    • The intersection of a row and a column
  • Field: A field is a single piece of data in a table

  • Record: A record is a complete set of fields in a table

  • Primary Key: A primary key is a unique identifier for each row in a table

  • Foreign Key: A foreign key is a column that references a primary key in another table

  • Index: An index is a data structure that improves the speed of data retrieval operations on a database table

  • View: A view is a virtual table that is based on the result of a SELECT query

  • Transaction: A transaction is a single unit of work that is performed on a database

  • Foreign key

  • Referential integrity

  • One-to-many relationship

  • One-to-one relationship

  • Many-to-many relationship

  • Data type

  • Null value

  • Default value

  • Auto increment column

  • Relational database

  • Table

  • Column

  • Row

  • Cell

  • Primary key

  • Composite primary key

  • Non-primary key (unique key)

  • Index

Transactional

Nature of Transactions:

  • Usually Transactions are used to change and modify data
  • However, it is perfectly normal to have a read only transaction
  • Example, you want to generate a report and you want to get consistent snapshot based at the time of transaction

Example:

sql
-- Transaction

-- ACCOUNT_ID BALANCE
--    1         $1000
--    2         $500

-- Send $100 From Account 1 to Account 2

BEGIN TX1

-- Check if ACCOUNT 1 contains the amount
SELECT BALANCE FROM ACCOUNT WHERE ID = 1

-- If BALANCE > 100 then send amount
UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ID = 1
UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE ID = 2

COMMIT TX1

ACID Properties

ACID is a set of properties that guarantee that database transactions are processed reliably

  1. Atomicity
  2. Isolation
  3. Consistency
  4. Durability

Atomicity

  • All queries in a transaction must succeed
  • If one query fails, all prior successful queries in the transaction should rollback
  • If the database went down prior to a commit of a transaction, all the successful queries in the transactions should rollback

Example:

sql
-- Transaction

-- ACCOUNT_ID BALANCE
--    1         $1000
--    2         $500

-- Send $100 From Account 1 to Account 2

BEGIN TX1

-- Check if ACCOUNT 1 contains the amount
SELECT BALANCE FROM ACCOUNT WHERE ID = 1

-- If BALANCE > 100 then send amount
UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ID = 1

-- Database crashed, last update failed

-- ACCOUNT_ID BALANCE
--    1         $900
--    2         $500
  • After we restarted the machine the first account has been debited but the other account has not been credited
  • This is really bad as we just lost data, and the information is inconsistent
  • An atomic transaction is a transaction that will rollback all queries if one or more queries failed
  • The database should clean this up after restart

Isolation

  • Can my in-flight transaction see changes made by other transactions?

Read phenomena: The ANSI/ISO standard SQL 92 refers to three different read phenomena when a transaction retrieves data that another transaction might have updated

Example:

idnameage
1Alice20
2Bob25
  1. Dirty reads: A dirty read (aka uncommitted dependency) occurs when a transaction retrieves a row that has been updated by another transaction that is not yet committed

    sql
    -- Transaction 1
    BEGIN;
    SELECT age FROM users WHERE id = 1;
    -- retrieves 20
    
    -- Transaction 2
    BEGIN;
    UPDATE users SET age = 21 WHERE id = 1;
    -- no commit here
    
    -- Transaction 1
    SELECT age FROM users WHERE id = 1;
    -- READ UNCOMMITTED retrieves 21 (dirty read)
    -- READ COMMITTED retrieves 20 (dirty read has been avoided)
    -- REPEATABLE READ retrieves 20 (dirty read has been avoided)
    -- SERIALIZABLE retrieves 20 (dirty read has been avoided)
    COMMIT;
    
    -- Transaction 2
    ROLLBACK;
  2. Non-repeatable reads: A non-repeatable read occurs when a transaction retrieves a row twice and that row is updated by another transaction that is committed in between

    sql
    -- Transaction 1
    BEGIN;
    SELECT age FROM users WHERE id = 1;
    -- retrieves 20
    
    -- Transaction 2
    BEGIN;
    UPDATE users SET age = 21 WHERE id = 1;
    COMMIT;
    
    -- Transaction 1
    SELECT age FROM users WHERE id = 1;
    -- READ UNCOMMITTED retrieves 21 (non-repeatable read)
    -- READ COMMITTED retrieves 21 (non-repeatable read)
    -- REPEATABLE READ retrieves 20 (non-repeatable read has been avoided)
    -- SERIALIZABLE retrieves 20 (non-repeatable read has been avoided)
    COMMIT;
  3. Phantom reads: A phantom read occurs when a transaction retrieves a set of rows twice and new rows are inserted into or removed from that set by another transaction that is committed in between

    sql
    -- Transaction 1
    BEGIN;
    SELECT name FROM users WHERE age > 17;
    
    -- Transaction 2
    BEGIN;
    INSERT INTO users VALUES (3, 'Carol', 26);
    COMMIT;
    
    -- Transaction 1
    SELECT name FROM users WHERE age > 17;
    -- READ UNCOMMITTED retrieves Alice, Bob and Carol (phantom read)
    -- READ COMMITTED retrieves Alice, Bob and Carol (phantom read)
    -- REPEATABLE READ retrieves Alice, Bob and Carol (phantom read)
    -- SERIALIZABLE retrieves Alice and Bob (phantom read has been avoided)
    COMMIT;
  • Lost updates:

    sql
    -- Transaction 1
    UPDATE users SET age = age + 10 WHERE id = 1;
    
    -- Transaction 2
    UPDATE users SET age = age + 50 WHERE id = 1;
    COMMIT;
    
    -- Transaction 1
    SELECT age FROM users WHERE id = 1;
    -- retrieves 70
    -- expected 30
    -- UPDATE in transaction was lost
  • Isolation Levels for in-flight transactions:

    • Read uncommitted: No Isolation, any change from the outside is visible to the transaction, committed or not

    • Read committed: Each query in a transaction only sees committed changes by other transactions

    • Repeatable Read: The transaction will make sure that when a query reads a row, that row will remain unchanged while its running

    • Snapshot: Each query in a transaction only sees changes that have been committed up to the start of the transaction. It's like a snapshot version of the database at that moment

    • Serializable: Transactions are run as if they serialized one after the other

    • Each DBMS implements Isolation level differently

  • Isolation levels vs read phenomena:

Isolation levelDirty readNon-repeatable readPhantom read
Serializablenonono
Repeatable readnonoyes
Read committednoyesyes
Read uncommittedyesyesyes

Database Implementation of Isolation:

  • Each DBMS implements Isolation level differently

  • Pessimistic: Row level locks, table locks, page locks to avoid lost updates

  • Optimistic: No locks, just track if things changed and fail the transaction if so

  • Repeatable read "locks" the rows it reads but it could be expensive if you read a lot of rows, postgres implements RR as snapshot. That is why you don't get phantom reads with postgres in repeatable read

  • Serializable are usually implemented with optimistic concurrency control, you can implement it pessimistically with SELECT FOR UPDATE

Consistency

Consistency in Data:

  • Defined by the user
  • Referential integrity (foreign keys)
  • Atomicity
  • Isolation

Example:

  • pictures table
ID (PK)BLOBLIKES
1xx2
2xx1
  • picture_likes table
USER (PK)PICTURE_ID (PK)(FK)
Jon1
Edmond1
Jon2

Spot inconsistency in this data:

  • pictures table
ID (PK)BLOBLIKES
1xx5
2xx1
  • picture_likes table
USER (PK)PICTURE_ID (PK)(FK)
Jon1
Edmond1
Jon2
Edmond4

Consistency in reads:

  • If a transaction committed a change will a new transaction immediately see the change?
  • Affects the system as a whole
  • Relational and NoSQL databases suffer from this
  • Eventual consistency

Durability

  • Once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors

Database Design / Data Modelling

  1. It's an ongoing process where basic Database is designed:

    • Understand business data
    • Create a logical design
      • Tables, Indexes, Columns (ER Diagram)
  2. Normalization is done:

    • Eliminate / Reduce:
      • Data Redundancy
      • Data Anomalies
      • Data Inconsistency
De-normalized DatabaseNormalized Database
Less JoinsMore Joins
More StorageLess Storage
High Data RedundancyLow Data Redundancy
  • Database designer: plot the entire database system on a canvas using a visualization tool

  • Entity-Relationship (ER) diagram

  • Relational Schema: an existing idea of how the database must be organized

    • Represents the concept database administrators must implement

    • Depict how a database is organized

    • Blueprints, or a plan for a database

    • Database Schema

      • Table name
      • Primary Key: (column name will be underlined)
      • Foreign Key: (add (FK) in after the column name)
      • Other fields
  • Database Management = database design + creation + manipulation

  • Database Administrator: maintenance of database

Normalization

Schema

Database

Main goal: organize huge amounts of data that can be quickly retrieved

  • Entity: the smallest unit that can contain a meaningful set of data

    • An object we want to model & store information about
  • Data inside a column is known as data value

  • Data values in a row make up a record (or row)(horizontal entity), a record is each entry that exists in a table

  • Field (or column)(vertical entity)(entity instance): a column in a table containing specific information about every record in the table

Table

Tables are data organized in columns and rows

  • An entity or database object

A column (or a set of columns) whose value exists and is unique for every record in a table is called a primary key

  • Each table can have 1 and only 1 primary key
  • Cannot contain NULL values
  • Primary key may be composed of a set of columns
  • Primary keys are the unique identifiers of a table
  • Not all tables will have a primary key

Foreign key: identifies the relationships between tables, not the tables themselves

Unique Key: to specify that given field shouldn't have duplicate data

ValuesPrimary KeyUnique Key
NULL valuesnoyes
Number of keys10, 1 or more
Multiple columnsyesyes

View

Index

Index is a lookup table for specific columns

  • Indexes are expensive

Reference