Analyzing Large Datasets With SQL

Home » Blog » Software » Enterprise Software » Analyzing Large Datasets With SQL

As part of a web application I’ve created for a client, I’ve been working on efficient analysis of somewhat large datasets for some time now. This analysis often involves complexing grouping, clustering and value aggregation. SQL and PostgreSQL aggregate functions in particular come in quite handy when dealing with that kind of challenge.

While RDBMS and SQL certainly are useful already just for keeping and retrieving data, i.e. for running your usual CRUD operation, those tools really shine when it comes to efficiently gaining insight from raw data in a scalable manner. Sure, Microsoft Excel or Google Sheets also provide features for analyzing data and particularly for graphically displaying the results visually using diagrams and plots. In fact, that used to be their raison d’être. However, scaling an Excel spreadsheet to more than a few thousand rows – let alone millions – isn’t exactly recommendable or even possible anymore once the dataset grows large enough.

Dealing with large datasets in a scalable manner – that’s what relational database systems were invented for in the first place so it’s only natural to use them for large scale data analysis and data mining, too. Apart from quite common and well-known aggregation functions such as COUNT, SUM or AVG, PostgreSQL also provides more sophisticated aggregation functions for calculating statistical measures, for example:

In addition to that, Matt DeLand – co-founder and data scientist at Wagon – recently summarized techniques for summarizing data with SQL. In this article he covers aspects such as:

If you work on analyzing somewhat large datasets his article – and his company Wagon – or very much worth checking out.

If you’d like to know more about data analysis with SQL (or other subjects related to software development / data analysis / data mining), please shoot me an email.

Leave a Comment

* Checkbox GDPR is required

*

I agree

By continuing to browse the site you agree to our use of cookies. Privacy Policy

Privacy Preference Center

Strictly necessary

These cookies are necessary for the site to function.

PHPSESSID: Preserves user session state across page requests.

__cfduid: Used by the content network, Cloudflare, to identify trusted web traffic.

PHPSESSID
__cfduid

Preferences

Remembers the user's submitted data when a comment is submitted in a blog post. The purpose is to aut o-populate form fields for subsequent comments, in order to save time for the user.

wfvt_#

Statistics

Statistic cookies help us to understand how visitors interact with our websites by collecting and reporting information anonymously.

_ga: Registers a unique ID that is used to generate statistical data on how the visitor uses the website.

_gat: Used by Google Analytics to throttle request rate.

_gid: Registers a unique ID that is used to generate statistical data on how the visitor uses the website.

collect: Used to send data to Google Analytics about the visitor's device and behaviour. Tracks the visitor across d evices and marketing channels.

_ga,_gat,_gid
collect

Security

We use Wordfence to secure our website against hacking attempts: https://www.wordfence.com/

wordfence_verifiedHuman

Close your account?

Your account will be closed and all data will be permanently deleted and cannot be recovered. Are you sure?