TRAG R&A - PS Query: Subqueries

By Dede Young posted 11-16-2021 12:47 PM


The TRAG’s Reporting and Analytics workgroup recently published two short articles on subqueries.  This article will dig a bit deeper into the subquery, with several examples.  The examples use student records data, but the underlying principles may be applied to other areas.

Maximum Value

Subqueries can be used as a comparison value for specific criteria.  For example, the need may be to pull in a maximum effective-dated record where more than one may exist.  A subquery can be used to perform an aggregate on that record’s effective date.  The following is an example of a subquery to select the results from the most recently run academic requirements report.

Create a new query and select the record SAA_ADB_REPORT and select EMPLID as a field to appear in the query (query cannot be saved without a field).

Add criteria on the field RPT_DATE by clicking the funnel icon to the right of the field.

The Condition Type should remain as ‘equal to’, and the Expression 2 Type should be Subquery. 

Once Subquery is selected, the option to ‘Define/Edit Subquery’ appears.  Click this link.  “Working on select…Subquery for A.RPT_DATE” appears at the top of the Records tab, indicating that this is a subquery.  Once a subquery or union is created, the ‘Subquery/Union Navigation’ link appears at the top-right. 

Select the record SAA_ADB_REPORT.

Only one field can be selected in a subquery.  Since the intention of this subquery is to select the most recently run academic requirements report, and the subquery is for RPT_DATE, the field RPT_DATE should be selected from the Query tab.

On the Fields tab, Edit the RPT_DATE field and select the Max aggregate.

Add criteria, joining the subquery back to the top level query, to sure that the most recently run academic requirements report will be identified for the desired report type (TSCRPT_TYPE), correct student (EMPLID), and at the correct career level (SAA_CAREER_RPT).

Be sure to identify the TSCRPT_TYPE at the top level and save the query.  Navigate back to the top level of the query by clicking the Subquery/Union Navigation link.

Exclusionary Selection

Subqueries can be used to exclude students with specific values on their record who may meet the criteria at the top level.  Typically, subqueries would be used in this situation if the value being excluded exists as a data point in which students can have multiple values.  Examples of this include student groups, service indicators, and class instruction modes. 

The following is an example of a subquery used in a query intended to identify students who are taking only online classes.  Since students are often enrolled in more than one class, and each class can have a different instruction mode, simply identifying students at the top level of the query based on INSTRUCTION_MODE = ‘W’ (or the appropriate ‘online’ instruction mode) does not guarantee that the students are only taking online classes.  A subquery can be used to exclude classes taken with any other instruction mode.

The top level criteria identifies students enrolled in a prompted term, in which the instruction mode is equal to ‘W’, which is this institution’s 100% Online code.

Criteria is added on EMPLID, Condition Type ‘not in list’, Expression 2 type Subquery.

This subquery is an independent query and does not connect back to the top level.  It is built using the student enrollment tables, specifically STDNT_ENRL and CLASS_TBL.

Since the goal of this subquery is to create a list of Empl IDs, the field displayed is EMPLID from STDNT_ENRL.  No aggregation is done on this field, it is used to create the ‘list’.

Criteria is added to identify student enrollments in the prompted term in which the instruction mode is not equal to ‘W’, or the institution’s 100% Online instruction mode value.

The end result of adding this subquery is to exclude, from those students identified at the top level as being enrolled in at least one online class, students who are enrolled in at least one class that is not online, thus generating a list of students who are only enrolled in online classes.

Minimum and Maximum Values

Subqueries, as has been discussed, can be used to identify minimum and maximum values of a field.  One particular situation in which this can be useful is to identify, in one field, students who have more than one value of a field.  The following example shows subqueries used to identify the minimum and maximum ethnic group codes.  The query then utilizes an expression to identify when the minimum and maximum values are not the same, so as to identify that the student has ‘Two or More’ ethnicities.  This is often used for aggregated reports that academic departments must submit for reaccreditation purposes.

The top level query is written to identify the specific cohort of students-in this example, graduate college of education majors.  One instance of an ethnicity record is added (ex. SCC_ETH_GRP_VW), as well as criteria on the field ETHNIC_GRP_CD, equal to a subquery.

The subquery is edited/created, selecting the same ethnicity record, SCC_ETH_GRP_VW in this example.

The selected field in this subquery is ETHNIC_GRP_CD, as it is the field for which the subquery is being created.  The field is edited and the MIN aggregate is selected.

Criteria is added for EMPLID and REG_REGION, joining the subquery instance of SCC_ETH_GRP_VW back to the top level instance.

Navigate back to the top level to add the second subquery.  Another instance of SCC_ETH_GRP_VW is added at the top level and subquery criteria is added using the field ETHNIC_GRP_CD from the second top level instance.

ETHNIC_GRP_CD is added as the subquery field, edited, with the MAX aggregate selected.

Just as in the MIN subquery, criteria is added joining this subquery instance of SCC_ETH_GRP_VW back to the second top level instance of the same record.  Navigate back to the top level and save the query.

In adding the description fields (DESCR50) from both top level instances of SCC_ETH_GRP_VW, it is apparent that many, if not most, students have only one ethnicity. 

However, there are students with multiple ethnicities, as is seen above in the first entry.  A case-logic expression is added to identify when the MAX ETHNIC_GRP_CD is not equal to the MIN ETHNIC_GRP_CD. 

If the ethnic group codes are different, a value of ‘Two or More’ will display.  If they are the same, the description field from the MIN record will display.

The minimum and maximum ethnicity descriptions can be removed from the query, as they were only used for verification purposes.

As has been evidenced by these examples, subqueries provide a fairly simple (?) way to refine query results without having to rely on connected queries or some other manner of getting at the data needed.




11-22-2021 10:02 AM

Jesse, I know it, and thanks for pointing that out!  I try to avoid unions at all costs!

11-22-2021 09:51 AM

Great article.

One major shortcoming of PSQuery that's worth pointing out: a subquery that is under a top-level query applies ONLY to the results of that top-level query. Unions are top-level queries (existing in parallel with the "main" query), so if you need to apply exclusion logic to the entire resultset, you will basically have to build the exclusion logic under each union, too. This comes with all sorts of potential pitfalls, and of course it introduces repetitive code into the query that can quickly make maintenance difficult as you add more unions. So be prudent when going this route.