Sometimes when displaying data in an application you want the amount of data (rows, individual data points etc.) to be limited for the app to stay responsive and to allow the user to more easily scan and browse the data set.
In SQL this is commonly done using a LIMIT statement:
1 | SELECT * FROM table LIMIT 100 |
Now, what if you want to not only display the data but also inform the user in case the total number of rows in the table exceeds the limit set by your application? You could of course select a COUNT of the table rows and display a message if COUNT(*) > 100:
1 | SELECT COUNT(*) FROM table |
However, this means you have to execute 2 queries instead of 1 which depending on your application architecture could involve additional overhead such as additional HTTP calls in a single-page application architecture.
A more efficient solution is using a sentinel to determine if the total number of data points is larger than a given number:
1 | SELECT * FROM table LIMIT 101 |
Using 101 instead of 100 as the LIMIT argument in this case allows you to check if there are further rows in addition to the ones selected. If the size of the result set returned by this query is larger than 100 you know that the table contains additional rows without having to execute an additional COUNT query. In terms of data structures, such an element that serves no other purpose but to determine if a list iteration can terminate is called a sentinel node.
Another useful feature when dealing with large datasets is PostgreSQL’s window function. When using aggregate functions such as SUM or COUNT in SQL you have to use a GROUP BY statement if you want to mix those aggregate columns with normal columns in your result set:
1 | SELECT id, name, location, SUM(amount) FROM items GROUP BY id, name, location |
Sometimes this is desired, for example if you want summate over the amounts by item. However, sometimes you want the total sum over all items or you might not want to group by a certain field (for instance ‘location’ in the above example). In that case you’d have to resort to executing 2 queries again: One that yields the individual items and one that gives you the total amount.
Another, more elegant solution in such a use case is using a window function:
1 | SELECT id, name, location, SUM(amount) OVER () FROM items |
Windowing allows you to take values from a subset of table rows and use those in calculations. The excellent PostgreSQL documentation explains it like this:
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.
By using OVER () with no arguments you create a window over the whole result set and use that to calculate the overall amount. In addition to that OVER () also takes expressions like PARTITION BY depname ORDER BY salary DESC that allow you to group (PARTITION BY) and order the calculation results.
Windowing also allows you to accumulate values that take previous rows into account. The following query for instance yields a result set where accumulated_amount increases for each subsequent row by that row’s amount, which effectively creates a series of the sequence of amounts in the item table:
1 | SELECT name, SUM(amount) OVER (ORDER BY amount DESC) AS accumulated_amount FROM itmes |
Along those lines, pivoting in PostgreSQL also is a useful approach (commonly known from Microsoft Excel) for aggregating and partitioning values by doing a matrix transposition and turn rows into columns and vice versa.