Developing By Me

Project Development and Design

Subqueries

What are Subqueries?

  Subqueries are queries within a query. This sounds odd however a query can be used to return a subset of data that can be used by the parent query for a further breakdown. There are many different reasons and ways to handle subqueries. It is important to understand how and where they can be used but it is also necessary to understand that there are times that a subquery is not necessary. It all depends on how complicated the query is and if two dissimilar queries are necessary to connect data. Subqueries are normally placed in either the FROM or the WHERE clauses depending on requirement and usage. Using a query in the FROM clause means that SQL will treat it like it was a table. Using it in the WHERE clause it is treated as an array and must be used with the IN operator. The following is an example of each:

 

SQL Syntax
SELECT t1.* FROM table1 AS t1                JOIN table2 AS t2WHERE t1.field1 < 100

                AND t2.field1 < 100

                AND t2.field2 = t1.field2

 

SELECT t1.* FROM table1 AS t1JOIN (                SELECT * FROM table2 WHERE field1 < 100

                ) AS sub2 ON t1.field2 = sub2.field2

WHERE t1.field1 < 100

 

SELECT * FROM table1WHERE table1.field1 < 100                AND table1.field2 IN (

                                SELECT field2 FROM table2 WHERE field1 < 100

                )

 

 

  All three queries do exactly the same thing, with a few differences. Because of the way they are written all three queries have the exact same result set. The difference is how they are processed by the system. SQL does not execute top to bottom like it is read. In fact SQL starts with the FROM then handles all JOINs followed by the WHERE and finally returns the SELECT. Because of this order of operations each of the queries is handled differently on the system.

  The first query is the format that has been discussed previously. In this method SQL will link table1 and table2 directly through the JOIN statement. Then filter the results in the WHERE clause and return table1’s values. This is the strait forward approach and is the most commonly used.

  The second is also commonly used since it allows the DBA to use a query subset which will then be used as a table or a view, which is a predefined subquery. However, unlike the first query the subquery is executed first then it is joined to table1. Finally the WHERE clause is executed and the results are returned per the SELECT clause.

The third is not as common however it can be quite useful in specific applications. In this case all of the normal JOINs are handled and then during the WHERE clause the subquery is executed and the result is used to filter the results prior to returning the result set in the SELECT clause.

 

How are they used?

  It is important to understand what SQL does when information is joined to understand the usage of subqueries. For example if there are 10 tables each with 10 fields and all of the tables are joined together the database would have 100 fields that it needs to keep track of for each row. In this case there are now 100 fields in each row but there could be hundreds of rows by the time the 10th table is added to the first row from the first table, this is because of the way tables are associated. In order to make the dataset more manageable the DBA will use a subquery or multiple subqueries in order to limit the associations.

  The example of a transaction history is a case that a single record could have a large amount of records in the return result. If the standard transaction has 10 entries in the history table but the account that it belongs to has 10 transactions that means that by traditional joining each account would have 100 rows before the data is filtered by the WHERE clause. The usage of subqueries minimizes this because some of the data could be filtered in advance and then the smaller result is returned to the parent for further manipulation. An example of this in the traditional joining might be:

 

SELECT *
FROM users AS u
JOIN transactions AS t ON u.id = t.user_id
JOIN history AS h ON t.id = h.transaction_id
WHERE u.id = @user_id
AND h.note LIKE ‘FINAL%’

 

  A normal system might have 1,000 user accounts and if each account actually has 100 transactions with the standard 10 history entries per transaction then this query would have then the table would have a million rows before it is filtered. But it is also important to know that each row has 30 fields because all of the tables have 10 each. That means that the unfiltered result would have 30 million fields that will need to be filtered by the where clause. This is a lot of data to filter out and can slow down the system. However by using a subquery it is possible to organize the dataset into smaller chunks of information that is easier to handle. The following code does this.

 

SELECT *
FROM users AS u
JOIN transactions AS t ON u.id = t.user_id
JOIN (
SELECT *
FROM history AS h2
WHERE h2.note LIKE ‘FINAL%’
) AS h ON t.id = h.transaction_id
WHERE u.id = @user_id

 

  With the movement of the history table into the subquery the size of the data set is greatly reduced. Originally there were 1 million rows before the where clause. However since the history table only has a single note that is marked with the ‘FINAL’ string the table will not join every entry but only the ones from the original request which means that only 100 thousand rows would be returned and filtered by the where clause. However, this can be improved even more by adjusting the query just a bit more. Sometimes the query may get larger but the potential results get smaller when handling this process. With a simple two or three table example the potential is not that high for requiring this optimization however when combining dozens of tables that could have thousands of results the usage of subqueries becomes more important.

  One final way to handle subqueries for similar results is to have multiple subqueries inset into a larger parent query. In the above example there are only three tables. However, it is possible that there is one request that needs to go to the histories table but another that has to go into the transactions. In this case the history is for the full account not only transactions. Because both tables could be quite large it is possible that the data set can grow very large quickly. However, by breaking the tables into smaller groups it is possible to retrieve the requested information quickly through multiple subqueries. The following is an example of this:

 

SELECT *
FROM (
SELECT u.id AS user_id, *
FROM users AS u
JOIN transactions AS t ON u.id = t.user_id
WHERE u.id = @user_id
) AS sub1
JOIN (
SELECT h2.*
FROM users AS u2
JOIN history AS h2 ON u2.id = h2.user_id
WHERE u.id = @user_id
AND h2.note LIKE ‘FINAL%’
) AS sub2 ON sub1.user_id = sub2.user_id

 

  The simplified example breaks down the complex interactions into smaller blocks. For only three tables the usage of multiple subqueries is not very useful however if the DBA has to combine a dozen tables in several different ways the data result could be very large. However, using this method like the other subquery methods it is simplified and the individual datasets are much smaller and easier for the database to work with.

 

%d bloggers like this: