forked from swcarpentry/sql-novice-survey
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Generating and committing HTML pages
Showing
21 changed files
with
3,720 additions
and
101 deletions.
There are no files selected for viewing
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,396 @@ | ||
<!DOCTYPE html> | ||
<html> | ||
<head> | ||
<meta charset="utf-8"> | ||
<meta name="generator" content="pandoc"> | ||
<title>Software Carpentry: Databases and SQL</title> | ||
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" /> | ||
<link rel="stylesheet" href="css/bootstrap/bootstrap.css" /> | ||
<meta name="viewport" content="width=device-width, initial-scale=1.0" /> | ||
<link rel="stylesheet" href="css/bootstrap/bootstrap-responsive.css" /> | ||
<link rel="stylesheet" type="text/css" href="css/swc.css" /> | ||
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/> | ||
<meta charset="UTF-8" /> | ||
<!-- HTML5 shim, for IE6-8 support of HTML5 elements --> | ||
<!--[if lt IE 9]> | ||
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script> | ||
<![endif]--> | ||
</head> | ||
<body> | ||
<div class="container"> | ||
<div class="banner"> | ||
<a href="http://software-carpentry.org" title="Software Carpentry"> | ||
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" /> | ||
</a> | ||
</div> | ||
<div class="row-fluid"> | ||
<div class="span10 offset1"> | ||
<h1 class="title">Databases and SQL</h1> | ||
<h2 class="subtitle">Sorting and Removing Duplicates</h2> | ||
<div id="learning-objectives" class="objectives"> | ||
<h2>Learning Objectives</h2> | ||
<ul> | ||
<li>Write queries that display results in a particular order.</li> | ||
<li>Write queries that eliminate duplicate values from data.</li> | ||
</ul> | ||
</div> | ||
<p>Data is often redundant, so queries often return redundant information. For example, if we select the quantitites that have been measured from the <code>survey</code> table, we get this:</p> | ||
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">select</span> quant <span class="kw">from</span> Survey;</code></pre> | ||
<h2 id="quant">quant</h2> | ||
<p>rad<br />sal<br />rad<br />sal<br />rad<br />sal<br />temp<br />rad<br />sal<br />temp<br />rad<br />temp<br />sal<br />rad<br />sal<br />temp<br />sal<br />rad<br />sal<br />sal<br />rad</p> | ||
<p>We can eliminate the redundant output to make the result more readable by adding the <code>distinct</code> keyword to our query:</p> | ||
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">select</span> <span class="kw">distinct</span> quant <span class="kw">from</span> Survey;</code></pre> | ||
<h2 id="quant-1">quant</h2> | ||
<p>rad<br />sal<br />temp</p> | ||
<p>If we select more than one column --- for example, both the survey site ID and the quantity measured --- then the distinct pairs of values are returned:</p> | ||
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">select</span> <span class="kw">distinct</span> taken, quant <span class="kw">from</span> Survey;</code></pre> | ||
<table> | ||
<thead> | ||
<tr class="header"> | ||
<th align="left">taken</th> | ||
<th align="left">quant</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr class="odd"> | ||
<td align="left">619</td> | ||
<td align="left">rad</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">619</td> | ||
<td align="left">sal</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">622</td> | ||
<td align="left">rad</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">622</td> | ||
<td align="left">sal</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">734</td> | ||
<td align="left">rad</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">734</td> | ||
<td align="left">sal</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">734</td> | ||
<td align="left">temp</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">735</td> | ||
<td align="left">rad</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">735</td> | ||
<td align="left">sal</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">735</td> | ||
<td align="left">temp</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">751</td> | ||
<td align="left">rad</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">751</td> | ||
<td align="left">temp</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">751</td> | ||
<td align="left">sal</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">752</td> | ||
<td align="left">rad</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">752</td> | ||
<td align="left">sal</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">752</td> | ||
<td align="left">temp</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">837</td> | ||
<td align="left">rad</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">837</td> | ||
<td align="left">sal</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">844</td> | ||
<td align="left">rad</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
<p>Notice in both cases that duplicates are removed even if they didn't appear to be adjacent in the database. Again, it's important to remember that rows aren't actually ordered: they're just displayed that way.</p> | ||
<p>As we mentioned earlier, database records are not stored in any particular order. This means that query results aren't necessarily sorted, and even if they are, we often want to sort them in a different way, e.g., by the name of the project instead of by the name of the scientist. We can do this in SQL by adding an <code>order by</code> clause to our query:</p> | ||
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">select</span> * <span class="kw">from</span> Person <span class="kw">order</span> <span class="kw">by</span> ident;</code></pre> | ||
<table> | ||
<thead> | ||
<tr class="header"> | ||
<th align="left">ident</th> | ||
<th align="left">personal</th> | ||
<th align="left">family</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr class="odd"> | ||
<td align="left">danforth</td> | ||
<td align="left">Frank</td> | ||
<td align="left">Danforth</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">dyer</td> | ||
<td align="left">William</td> | ||
<td align="left">Dyer</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">lake</td> | ||
<td align="left">Anderson</td> | ||
<td align="left">Lake</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">pb</td> | ||
<td align="left">Frank</td> | ||
<td align="left">Pabodie</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">roe</td> | ||
<td align="left">Valentina</td> | ||
<td align="left">Roerich</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
<p>By default, results are sorted in ascending order (i.e., from least to greatest). We can sort in the opposite order using <code>desc</code> (for "descending"):</p> | ||
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">select</span> * <span class="kw">from</span> person <span class="kw">order</span> <span class="kw">by</span> ident <span class="kw">desc</span>;</code></pre> | ||
<table> | ||
<thead> | ||
<tr class="header"> | ||
<th align="left">ident</th> | ||
<th align="left">personal</th> | ||
<th align="left">family</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr class="odd"> | ||
<td align="left">roe</td> | ||
<td align="left">Valentina</td> | ||
<td align="left">Roerich</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">pb</td> | ||
<td align="left">Frank</td> | ||
<td align="left">Pabodie</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">lake</td> | ||
<td align="left">Anderson</td> | ||
<td align="left">Lake</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">dyer</td> | ||
<td align="left">William</td> | ||
<td align="left">Dyer</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">danforth</td> | ||
<td align="left">Frank</td> | ||
<td align="left">Danforth</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
<p>(And if we want to make it clear that we're sorting in ascending order, we can use <code>asc</code> instead of <code>desc</code>.)</p> | ||
<p>We can also sort on several fields at once. For example, this query sorts results first in ascending order by <code>taken</code>, and then in descending order by <code>person</code> within each group of equal <code>taken</code> values:</p> | ||
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">select</span> taken, person <span class="kw">from</span> Survey <span class="kw">order</span> <span class="kw">by</span> taken <span class="kw">asc</span>, person <span class="kw">desc</span>;</code></pre> | ||
<table> | ||
<thead> | ||
<tr class="header"> | ||
<th align="left">taken</th> | ||
<th align="left">person</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr class="odd"> | ||
<td align="left">619</td> | ||
<td align="left">dyer</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">619</td> | ||
<td align="left">dyer</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">622</td> | ||
<td align="left">dyer</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">622</td> | ||
<td align="left">dyer</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">734</td> | ||
<td align="left">pb</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">734</td> | ||
<td align="left">pb</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">734</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">735</td> | ||
<td align="left">pb</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">735</td> | ||
<td align="left">---</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">735</td> | ||
<td align="left">---</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">751</td> | ||
<td align="left">pb</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">751</td> | ||
<td align="left">pb</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">751</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">752</td> | ||
<td align="left">roe</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">752</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">752</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">752</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">837</td> | ||
<td align="left">roe</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">837</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">837</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">844</td> | ||
<td align="left">roe</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
<p>This is easier to understand if we also remove duplicates:</p> | ||
<pre class="sourceCode sql"><code class="sourceCode sql"><span class="kw">select</span> <span class="kw">distinct</span> taken, person <span class="kw">from</span> Survey <span class="kw">order</span> <span class="kw">by</span> taken <span class="kw">asc</span>, person <span class="kw">desc</span>;</code></pre> | ||
<table> | ||
<thead> | ||
<tr class="header"> | ||
<th align="left">taken</th> | ||
<th align="left">person</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr class="odd"> | ||
<td align="left">619</td> | ||
<td align="left">dyer</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">622</td> | ||
<td align="left">dyer</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">734</td> | ||
<td align="left">pb</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">734</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">735</td> | ||
<td align="left">pb</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">735</td> | ||
<td align="left">---</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">751</td> | ||
<td align="left">pb</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">751</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">752</td> | ||
<td align="left">roe</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">752</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">837</td> | ||
<td align="left">roe</td> | ||
</tr> | ||
<tr class="even"> | ||
<td align="left">837</td> | ||
<td align="left">lake</td> | ||
</tr> | ||
<tr class="odd"> | ||
<td align="left">844</td> | ||
<td align="left">roe</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
<div id="fixme" class="challenge"> | ||
<h2>FIXME</h2> | ||
<p>Write a query that selects distinct dates from the <code>Site</code> table.</p> | ||
</div> | ||
<div id="fixme-1" class="challenge"> | ||
<h2>FIXME</h2> | ||
<p>Write a query that returns the distinct dates in the <code>Visited</code> table.</p> | ||
</div> | ||
<div id="fixme-2" class="challenge"> | ||
<h2>FIXME</h2> | ||
<p>Write a query that displays the full names of the scientists in the <code>Person</code> table, ordered by family name.</p> | ||
</div> | ||
</div> | ||
</div> | ||
<div class="footer"> | ||
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Email</a> | ||
<a class="label swc-blue-bg" href="http://github.com/swcarpentry">GitHub</a> | ||
<a class="label swc-blue-bg" href="LICENSE.html">License</a> | ||
<a class="bugreport label swc-blue-bg" href="mailto:admin@software-carpentry.org?subject=bug%20in%20{{page.path}}">Bug Report</a> | ||
</div> | ||
</div> | ||
<!-- Javascript placed at the end of the document so the pages load faster --> | ||
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script> | ||
<script src="http://software-carpentry.org/v5/js/bootstrap/bootstrap.min.js"></script> | ||
</body> | ||
</html> |
Oops, something went wrong.