forked from swcarpentry/sql-novice-survey
-
Notifications
You must be signed in to change notification settings - Fork 0
/
09-create.html
156 lines (156 loc) · 12.6 KB
/
09-create.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Databases and SQL</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">Databases and SQL</h1></a>
<h2 class="subtitle">Creating and Modifying Data</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>Write statements that creates tables.</li>
<li>Write statements to insert, modify, and delete records.</li>
</ul>
</div>
</section>
<p>So far we have only looked at how to get information out of a database, both because that is more frequent than adding information, and because most other operations only make sense once queries are understood. If we want to create and modify data, we need to know two other sets of commands.</p>
<p>The first pair are <a href="https://www.sqlite.org/lang_createtable.html"><code>CREATE TABLE</code></a> and <a href="https://www.sqlite.org/lang_droptable.html"><code>DROP TABLE</code></a>. While they are written as two words, they are actually single commands. The first one creates a new table; its arguments are the names and types of the table’s columns. For example, the following statements create the four tables in our survey database:</p>
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">CREATE</span> <span class="kw">TABLE</span> Person(ident TEXT, personal TEXT, family TEXT);
<span class="kw">CREATE</span> <span class="kw">TABLE</span> Site(name TEXT, lat <span class="dt">REAL</span>, <span class="dt">long</span> <span class="dt">REAL</span>);
<span class="kw">CREATE</span> <span class="kw">TABLE</span> Visited(ident <span class="dt">INTEGER</span>, site TEXT, dated TEXT);
<span class="kw">CREATE</span> <span class="kw">TABLE</span> Survey(taken <span class="dt">INTEGER</span>, person TEXT, quant <span class="dt">REAL</span>, reading <span class="dt">REAL</span>);</code></pre>
<p>We can get rid of one of our tables using:</p>
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">DROP</span> <span class="kw">TABLE</span> Survey;</code></pre>
<p>Be very careful when doing this: most databases have some support for undoing changes, but it’s better not to have to rely on it.</p>
<p>Different database systems support different data types for table columns, but most provide the following:</p>
<table>
<thead>
<tr class="header">
<th align="left">data type</th>
<th align="left">use</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left">INTEGER</td>
<td align="left">a signed integer</td>
</tr>
<tr class="even">
<td align="left">REAL</td>
<td align="left">a floating point number</td>
</tr>
<tr class="odd">
<td align="left">TEXT</td>
<td align="left">a character string</td>
</tr>
<tr class="even">
<td align="left">BLOB</td>
<td align="left">a “binary large object”, such as an image</td>
</tr>
</tbody>
</table>
<p>Most databases also support Booleans and date/time values; SQLite uses the integers 0 and 1 for the former, and represents the latter as discussed <a href="#a:dates">earlier</a>. An increasing number of databases also support geographic data types, such as latitude and longitude. Keeping track of what particular systems do or do not offer, and what names they give different data types, is an unending portability headache.</p>
<p>When we create a table, we can specify several kinds of constraints on its columns. For example, a better definition for the <code>Survey</code> table would be:</p>
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">CREATE</span> <span class="kw">TABLE</span> Survey(
taken <span class="dt">integer</span> <span class="kw">not</span> <span class="kw">null</span>, <span class="co">-- where reading taken</span>
person text, <span class="co">-- may not know who took it</span>
quant <span class="dt">real</span> <span class="kw">not</span> <span class="kw">null</span>, <span class="co">-- the quantity measured</span>
reading <span class="dt">real</span> <span class="kw">not</span> <span class="kw">null</span>, <span class="co">-- the actual reading</span>
<span class="kw">primary</span> <span class="kw">key</span>(taken, quant),
<span class="kw">foreign</span> <span class="kw">key</span>(taken) <span class="kw">references</span> Visited(ident),
<span class="kw">foreign</span> <span class="kw">key</span>(person) <span class="kw">references</span> Person(ident)
);</code></pre>
<p>Once again, exactly what constraints are available and what they’re called depends on which database manager we are using.</p>
<p>Once tables have been created, we can add, change, and remove records using our other set of commands, <code>INSERT</code>, <code>UPDATE</code>, and <code>DELETE</code>.</p>
<p>The simplest form of <code>INSERT</code> statement lists values in order:</p>
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">INSERT</span> <span class="kw">INTO</span> Site <span class="kw">values</span>(<span class="st">'DR-1'</span>, -<span class="fl">49.85</span>, -<span class="fl">128.57</span>);
<span class="kw">INSERT</span> <span class="kw">INTO</span> Site <span class="kw">values</span>(<span class="st">'DR-3'</span>, -<span class="fl">47.15</span>, -<span class="fl">126.72</span>);
<span class="kw">INSERT</span> <span class="kw">INTO</span> Site <span class="kw">values</span>(<span class="st">'MSK-4'</span>, -<span class="fl">48.87</span>, -<span class="fl">123.40</span>);</code></pre>
<p>We can also insert values into one table directly from another:</p>
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">CREATE</span> <span class="kw">TABLE</span> JustLatLong(lat text, <span class="dt">long</span> text);
<span class="kw">INSERT</span> <span class="kw">INTO</span> JustLatLong <span class="kw">SELECT</span> lat, <span class="dt">long</span> <span class="kw">FROM</span> site;</code></pre>
<p>Modifying existing records is done using the <code>UPDATE</code> statement. To do this we tell the database which table we want to update, what we want to change the values to for any or all of the fields, and under what conditions we should update the values.</p>
<p>For example, if we made a mistake when entering the lat and long values of the last <code>INSERT</code> statement above:</p>
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">UPDATE</span> Site <span class="kw">SET</span> lat=-<span class="fl">47.87</span>, long=-<span class="fl">122.40</span> <span class="kw">WHERE</span> name=<span class="st">'MSK-4'</span>;</code></pre>
<p>Be careful to not forget the <code>where</code> clause or the update statement will modify <em>all</em> of the records in the database.</p>
<p>Deleting records can be a bit trickier, because we have to ensure that the database remains internally consistent. If all we care about is a single table, we can use the <code>DELETE</code> command with a <code>WHERE</code> clause that matches the records we want to discard. For example, once we realize that Frank Danforth didn’t take any measurements, we can remove him from the <code>Person</code> table like this:</p>
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">DELETE</span> <span class="kw">FROM</span> Person <span class="kw">WHERE</span> ident = <span class="st">'danforth'</span>;</code></pre>
<p>But what if we removed Anderson Lake instead? Our <code>Survey</code> table would still contain seven records of measurements he’d taken, but that’s never supposed to happen: <code>Survey.person</code> is a foreign key into the <code>Person</code> table, and all our queries assume there will be a row in the latter matching every value in the former.</p>
<p>This problem is called <a href="reference.html#referential-integrity">referential integrity</a>: we need to ensure that all references between tables can always be resolved correctly. One way to do this is to delete all the records that use <code>'lake'</code> as a foreign key before deleting the record that uses it as a primary key. If our database manager supports it, we can automate this using <a href="reference.html#cascading-delete">cascading delete</a>. However, this technique is outside the scope of this chapter.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pushpin"></span>Hybrid Storage Models</h2>
</div>
<div class="panel-body">
<p>Many applications use a hybrid storage model instead of putting everything into a database: the actual data (such as astronomical images) is stored in files, while the database stores the files’ names, their modification dates, the region of the sky they cover, their spectral characteristics, and so on. This is also how most music player software is built: the database inside the application keeps track of the MP3 files, but the files themselves live on disk.</p>
</div>
</aside>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pencil"></span>Replacing NULL</h2>
</div>
<div class="panel-body">
<p>Write an SQL statement to replace all uses of <code>null</code> in <code>Survey.person</code> with the string <code>'unknown'</code>.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pencil"></span>Generating Insert Statements</h2>
</div>
<div class="panel-body">
<p>One of our colleagues has sent us a <a href="reference.html#comma-separated-values">CSV</a> file containing temperature readings by Robert Olmstead, which is formatted like this:</p>
<pre class="output"><code>Taken,Temp
619,-21.5
622,-15.5</code></pre>
<p>Write a small Python program that reads this file in and prints out the SQL <code>INSERT</code> statements needed to add these records to the survey database. Note: you will need to add an entry for Olmstead to the <code>Person</code> table. If you are testing your program repeatedly, you may want to investigate SQL’s <code>INSERT or REPLACE</code> command.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2><span class="glyphicon glyphicon-pencil"></span>Backing Up with SQL</h2>
</div>
<div class="panel-body">
<p>SQLite has several administrative commands that aren’t part of the SQL standard. One of them is <code>.dump</code>, which prints the SQL commands needed to re-create the database. Another is <code>.load</code>, which reads a file created by <code>.dump</code> and restores the database. A colleague of yours thinks that storing dump files (which are text) in version control is a good way to track and manage changes to the database. What are the pros and cons of this approach? (Hint: records aren’t stored in any particular order.)</p>
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/sql-novice-survey">Source</a>
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>