What are the four different types of results that can be returned from a subquery?

What are the four different types of results that can be returned from a subquery?

This reading covers two important topics: computing summary values on multiple rows by grouping the rows, and nesting queries inside of queries. Of course, the two can be used in combination as well.

We'll also look at a few other topics.

  • The LIKE Operator
  • Distinct
  • GROUP BY
  • Uniformity Across a Group
  • Aggregate Functions
  • HAVING
  • WHERE versus HAVING
  • Actors that have at least 2 actor credits:
  • Movies directed by people older than 50:
  • Subquery Use and Syntax
  • Correlated Sub-queries
  • Subqueries that Return Tables
  • Subquery versus Join
  • Negating / Complementing Queries
  • Summary

The LIKE Operator¶

We can test for equality easily:

select * from person where name = 'George Clooney';

But looking for everyone named "George" might be trickier. However, MySQL has both full-blown regular expressions (which we might discuss later in the course) and simple wildcard characters. This is triggered by using the LIKE operator instead of =:

select * from person where name LIKE 'George%'; select * from move where title LIKE '%Potter%';

The % character is a wildcard: it matches zero or more of any character. Thus, the first expression matches any string starting with 'George'. The second expression matches any string that contains 'Potter': it starts with any string of characters, has Potter next, and concludes with any string of characters.

Distinct¶

Before we get into grouping, let's look at a related notion: finding the distinct values of something. For example, there are over a thousand movies in our database. We could ask for the release year of each and get over a thousand results (years). Obviously, there would be lots of repeats. For example, as of this writing, there are 8 movies in our database released in 1983. There are even more repeats of more recent years.

select `release` from movie;

Suppose we just want to know the different release years in the database, without repetitions. SQL provides an easy way to do that, using the distinct keyword:

select distinct `release` from movie;

The first query (above) returns results with repetitions. The second only reports each different value once.

Note: Unfortunately, the word release is a reserved word in MySQL versions after 3.x, so in order to use it as a normal column name, we have to put it in backquotes like that.

Groups

So far, we've used the SELECT statement to retrieve rows. In this section, we'll discuss grouping the rows, and returning just one representative per group.

Imagine a database of sales data:

regiondatesales
south 2019-09-09 $11,000
south 2019-10-01 $9,000
east 2019-09-09 $7,000
north 2019-09-13 $15,000
...

We can easily imagine asking questions like:

  • what was the average sales for each region?
  • what was the total sales for each month?

The idea is if we just have four regions (north, south, east and west) the answer to the first question would be a table with one row per region, and the average of the sales column, regardless of how many rows are averaged. Similarly, the answer to the second question might be a table with 12 rows (one for each month) and the total sales, summed over all rows falling in that month.

GROUP BY¶

The MySQL GROUP BY clause of the SELECT statement gives us this ability. It groups all the selected rows by the distinct values in the grouping condition (like the four values for 'region' or the 12 values for the month), and then aggregating the underlying rows of the group to yield an aggregate value. For example, the first question above might be answered with a query like this:

SELECT region, avg(sales) FROM sales_data GROUP BY region;

That says to report the region and to average the sales column for all the rows in each group, grouping by all the distinct values of the 'region' column. Even if the table has a million rows in it, there will be only four rows in the result, with the database doing all the arithmetic for us.

Here's the solution to the other question:

SELECT month(date), sum(sales) FROM sales_data GROUP BY month(date);

Here, we are using a date function to extract the month number from each date, and grouping by all the distinct values of that.

Note that we can combine the GROUP BY clause with other clauses. It comes after the WHERE clause. So, if we are only interested in average sales in December, by region:

SELECT region, avg(sales) FROM sales_data WHERE month(date) = 12 GROUP BY region

Suppose we are interested in how many sales there were. We can also count the rows in the group:

SELECT region, avg(sales), count(sales) FROM sales_data WHERE month(date) = 12 GROUP BY region

Uniformity Across a Group¶

Imagine the following query, which counts the number of recent movies in our database, grouped by year.

select `release`, count(*) from movie where `release` > 2010 group by `release`;

The output is something like this:

release count(*) 2011 40 2012 41 2013 47 2014 38 2015 45 2016 40 2017 25 2018 31 2019 26 2020 7

So, the database has 40 movies released in 2011, and 41 released in 2012 and so forth. This is a perfectly sensible and useful query.

Let's add a column to it:

select title, `release`, count(*) from movie where `release` > 2010 group by `release`;

This query asks, in addition to the release year and how many are in the group, for the title of that movie. Hunh? What movie?

How can the group of 40 movies from 2011 have a title?

This query makes no sense. Indeed, many versions of MySQL forbid this kind of query; you'll get a run-time error if you try it. Ours is not configured that way. Instead, it returns a value of an arbitrary member of the group (possibly the first). So we get output like this:

title release count(*) Jane Eyre 2011 40 Lincoln 2012 41 The Palace 2013 47 Interstellar 2014 38 Jurassic World 2015 45 Dreamland 2016 40 Wonder Woman 2017 25

The fact that you are allowed to do this doesn't mean you should.

When grouping, the columns should values that are uniform (the same) for every member of the group.

Aggregate Functions¶

So far, we've computed properties of a group with functions like count() and avg(). These are called aggregate functions, since they work on a group or aggregate of rows.

The most popular are:

  • count()
  • sum()
  • avg()
  • min()
  • max()

There are many others. You can learn more about aggregate functions

HAVING¶

The WHERE clause lets us select or filter rows but sometimes we want to filter groups. To do that, we can use a HAVING clause.

For example, if we are only interested in sales averages by month only if there have been at least 10 sales that month, we could do this:

SELECT month(date),sum(sales) FROM sales_data GROUP BY month(date) HAVING count(sales)>=10;

In general, the syntax is:

SELECT cols FROM tables WHERE boolean GROUP BY something HAVING boolean on group

WHERE versus HAVING¶

A lot of people get confused about WHERE versus HAVING, because they are both filters and we put in boolean conditionals. However, remember that a group has properties that individual rows don't: properties like count, sum, average and so forth. A single row of our sales_data table doesn't have a count (or it's vacuously 1). The HAVING clause is intended to filter groups and so it comes after the GROUP BY clause.

Sub-Queries

The SQL standard defines a syntax for nested queries or subqueries. Indeed, that's where the term "Structured Query Language" comes from: so far, our language has been describable by a regular expression, but now we can put queries inside queries, so we'll need context-free grammars to describe the language.

Of course, we don't really care what class of formal language SQL falls into. We care about expressivity. There are queries we can't easily express without this ability. Try the following:

Actors that have at least 2 actor credits:¶

Take a minute to try to wrap your mind around this:

-- actors who have at least two credits use wmdb; select nm, name from person where nm in ( select nm from credit group by nm having count(*) >= 2);

The inner query (parenthesized SELECT statement) returns the NM of people who have two or more acting credits. The outer query returns the nm and name of anyone whose NM is returned by the inner query. We report both NM and name in case there are two actors with the same name who both have more than two credits.

Movies directed by people older than 50:¶

Here's a complex nested query:

-- Movies directed by people over 50 use wmdb; select title from movie where director in ( select nm from person where birthdate is not null and birthdate < date_sub(current_date(), INTERVAL 50 YEAR));

This is scary looking, but let's take it one part at a time. The inner query finds the NM of all people over 50. (The date_sub function takes today's date and subtracts 50 years. People whose birthdate is before that date are older than 50.) Think of it as a list of NM values. George Clooney is older than 50 and if he directed a movie, his NM of 123 would be in that list.

The outer query just finds movies whose director ID is in the list of NM values returned by the inner query.

Note the use of the date functions here, so this query continue to work over time. You can look them up at MySQL Reference Manual, 12.5 Date and Time Functions.

Note also the removal of the people with a NULL birthdate.

Subquery Use and Syntax¶

The basic rules for subqueries are these. We'll see examples afterward.

  • If the subquery returns a scalar (a single value), you can use it wherever the scalar can be.
  • If the subquery returns a column of data, you can look through that column using IN or NOT IN
  • If the subquery returns a table of data, you must give an alias (temporary name) for that data and then use it like any table.

Returning a Scalar (single) Value

The following uses a subquery to look up Peter Jackson's NM (a scalar value) and then uses that NM in the main query.

use wmdb; select * from movie where director = (select nm from person where name='Peter Jackson');

However, if the subquery returns more than one value, you can't successfully compare a single value with multiple values, so you get no results. As an example, we look up the NM value of people with names starting with "John" and, since that returns multiple values, we get no results.

use wmdb; select * from movie where director = (select nm from person where name like 'John%');

(Indeed, the first query wouldn't work if there were more than one person named Peter Jackson in our database.)

Returning a Single Column

Here we return the TT values of every row in the credit table. Since all of those entries are acting credits, those TT values are movies with at least one actor (that the database knows about). Therefore, the outer query gives all data about movies that have at least one known actor.

use wmdb; select * from movie where tt in (select tt from credit);

What's cool about that is that we can easily invert the logic, in order to list movies without actors. (Maybe the movie is early in production and hasn't hired anyone yet, or maybe the database is incomplete.)

use wmdb; select * from movie where tt not in (select tt from credit);

EXISTS and NOT EXISTS

Subqueries allow you to use a new operator in the WHERE clause: EXISTS and NOT EXISTS. For example, here's how to find out people who don't have movie credits. The inner subquery finds acting credits for the person in the outer query. The condition asks that no such acting credits exist. Thus, the person doesn't have acting credits. (Maybe they are a director.)

use wmdb; -- people who are not actors (have no acting credits) select name from person where not exists (select * from credit where credit.nm = person.nm);

Remember the EXISTS and NOT EXISTS operators don't care what is returned by the query; they just care about whether zero rows are returned.

The inner query in the last example is an important variation on our examples. Most of our examples of subqueries have stood alone and can be tested by themselves. Like this:

SELECT tt FROM credit;

But the subquery in the last section uses a value from the outer query:

SELECT tt FROM credit WHERE credit.nm = person.nm

That subquery can't be tested by itself, since it refers to a value from the outer query. This is called a correlated subquery. They are perfectly kosher, but just a little harder to test.

Subqueries that Return Tables¶

If the subquery returns more than one column, you can use it in the FROM clause as if it were a table. In MySQL, they must have a temporary name or alias, which we do with the AS keyword. Here, we've used T1 as the temporary name of the result of the subquery.

-- titles of movies added by scott (id of 1) -- along with the name of the director use wmdb; select name,title from person, (select title,director from movie where addedby=1) as t1 where nm=t1.director;

Subquery versus Join¶

Subqueries can do many things that Joins can do, and vice versa. In fact, MySQL didn't add subqueries to its capabilities until version 4 (as I recall). I started using it back in version 3, which is how I ended up with the problem with release becoming a reserved word.

Consider trying to find:

Actors who were directed by Clint Eastwood

Here's a solution using JOIN:

-- actors who have been directed by Clint Eastwood, -- implemented as a join use wmdb; select distinct p1.name from person as p1,credit,movie,person as p2 where p1.nm=credit.nm and credit.tt=movie.tt and movie.director=p2.nm and p2.name = 'Clint Eastwood';

We used DISTINCT here, because if someone was directed by Clint more than once, we only want to return them once. (For example, if we changed the query to talk about Peter Jackson movies, Orlando Bloom was in all three of the Lord of the Rings movies, but we only want to list him once.)

Here's a solution using subqueries:

-- actors who have been directed by Clint Eastwood, -- implemented using subqueries use wmdb; select name from person where nm in ( select nm from credit where tt in ( select tt from movie where director = ( select nm from person where name = 'Clint Eastwood')));

Here, we don't need the DISTINCT keyword, because the outer query is just referring to the Person table, where a person only occurs once. If they were directed by Clint more than once, their NM would appear multiple times in the results of the subquery, but that wouldn't affect the number of times they were printed by the outer query. (Again, think of Orlando Bloom in Peter Jackson movies.)

Which kind of query should you choose? You should choose the query that is clearest and easiest to understand and get right. Let the database worry about which is more efficient.

However, there can be other considerations. For example, if we want to know the titles of the Clint Eastwood movies the actors were in, the version with JOIN can easily do that, since the Movie table is part of the outer query. The version with subqueries can be adapted to do that as well (changing some of the subqueries to return more information), but the adaptation process is a little harder.

Negating / Complementing Queries¶

One advantage of subqueries is that it's usually easier to find the complement of a set using subqueries. So, to find actors who were not directed by Clint Eastwood (that is, no movie they every acted in was directed by Clint), you can negate the subquery that returns the list of Clint Eastwood movies. It's harder to negate the query using join.

Summary¶

  • rows can be grouped by column values using a GROUP BY clause
  • all values must be uniform over the group
  • groups can be filtered using a HAVING clause
  • subqueries allow SQL queries to be nested
  • The inner query can return
    • a single value (a scalar)
    • a column, searched by IN or NOT IN
    • several columns, used as a temporary table, with an alias

What are the four different types of results that a subquery can returned explain what each type is and when it is used?

Subqueries can return different types of information: A scalar subquery returns a single value. A column subquery returns a single column of one or more values. A row subquery returns a single row of one or more values.

What are the different types of a subquery?

Types of Subqueries.
Single Row Sub Query: Sub query which returns single row output. ... .
Multiple row sub query: Sub query returning multiple row output. ... .
Correlated Sub Query: Correlated subqueries depend on data provided by the outer query..

What is a subquery and what are the different types of subqueries?

Type of Subqueries Single row subquery : Returns zero or one row. Multiple row subquery : Returns one or more rows. Multiple column subqueries : Returns one or more columns. Correlated subqueries : Reference one or more columns in the outer SQL statement.

Which of the following are the two types of subquery?

There are two types of subquery namely, Correlated and Non-Correlated. Correlated subquery: These are queries which select the data from a table referenced in the outer query.