Introduction to Databases and SQL

Reference

Selecting Data

  • A relational database stores information in tables, each of which has a fixed set of columns and a variable number of records.
  • A database manager is a program that manipulates information stored in a database.
  • We write queries in a specialized language called SQL to extract information from databases.
  • SQL is case-insensitive.

Sorting and Removing Duplicates

  • The records in a database table are not intrinsically ordered: if we want to display them in some order, we must specify that explicitly.
  • The values in a database are not guaranteed to be unique: if we want to eliminate duplicates, we must specify that explicitly as well.

Filtering

  • Use where to filter records according to Boolean conditions.
  • Filtering is done on whole records, so conditions can use fields that are not actually displayed.

Calculating New Values

  • SQL can perform calculations using the values in a record as part of a query.

Missing Data

  • Databases use null to represent missing information.
  • Any arithmetic or Boolean operation involving null produces null as a result.
  • The only operators that can safely be used with null are is null and is not null.

Aggregation

  • An aggregation function combines many values to produce a single new value.
  • Aggregation functions ignore null values.
  • Aggregation happens after filtering.

Combining Data

  • Every fact should be represented in a database exactly once.
  • A join produces all combinations of records from one table with records from another.
  • A primary key is a field (or set of fields) whose values uniquely identify the records in a table.
  • A foreign key is a field (or set of fields) in one table whose values are a primary key in another table.
  • We can eliminate meaningless combinations of records by matching primary keys and foreign keys between tables.
  • Keys should be atomic values to make joins simpler and more efficient.

Data Hygiene

  • Every value in a database should be atomic.
  • Every record in a database should have a primary key.
  • Database entries should not contain redundant information.

Creating and Modifying Data

  • Database tables are created using queries that specify their names and the names and properties of their fields.
  • Records can be inserted, updated, or deleted using queries.
  • It is simpler and safer to modify data when every record has a unique primary key.

Programming with Databases

  • We usually write database applications in a general-purpose language, and embed SQL queries in it.
  • To connect to a database, a program must use a library specific to that database manager.
  • A program may open one or more connections to a single database, and have one or more cursors active in each.
  • Programs can read query results in batches or all at once.

Glossary

aggregation function
A function that combines multiple values to produce a single new value (e.g. sum, mean, median).
atomic
Describes a value not divisible into parts that one might want to work with separately. For example, if one wanted to work with first and last names separately, the values "Ada" and "Lovelace" would be atomic, but the value "Ada Lovelace" would not.
cascading delete
An SQL constraint requiring that if a given record is deleted, all records referencing it (via foreign key) in other tables must also be deleted.
case insensitive
Treating text as if upper and lower case characters were the same. See also: case sensitive.
case sensitive
Treating upper and lower case characters as different. See also: case insensitive.
cross product
A pairing of all elements of one set with all elements of another.
comma-separated values (CSV)
A common textual representation for tables in which the values in each row are separated by commas.
cross product
A pairing of all elements of one set with all elements of another.
cursor
A pointer into a database that keeps track of outstanding operations.
database manager
A program that manages a database, such as SQLite.
fields
A set of data values of a particular type, one for each record in a table.
filter
To select only the records that meet certain conditions.
foreign key
One or more values in a database table that identify a records in another table.
prepared statement
A template for an SQL query in which some values can be filled in.
primary key
One or more fields in a database table whose values are guaranteed to be unique for each record, i.e., whose values uniquely identify the entry.
query
A database operation that reads values but does not modify anything. Queries are expressed in a special-purpose language called SQL.
record
A set of related values making up a single entry in a database table, typically shown as a row. See also: field.
referential integrity
The internal consistency of values in a database. If an entry in one table contains a foreign key, but the corresponding records don't exist, referential integrity has been violated.
relational database
A collection of data organized into tables.
sentinel value
A value in a collection that has a special meaning, such as 999 to mean "age unknown".
SQL
A special-purpose language for describing operations on relational databases.
SQL injection attack
An attack on a program in which the user's input contains malicious SQL statements. If this text is copied directly into an SQL statement, it will be executed in the database.
table
A set of data in a relational database organized into a set of records, each having the same named fields.
wildcard
A character used in pattern matching. In SQL's like operator, the wildcard "%" matches zero or more characters, so that %able% matches "fixable" and "tablets".