{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidarwqwjlx2dvvpw7ck3br7t36kb3hmboajrauofgvfg7vxrjhjci",
"uri": "at://did:plc:wszrgoqdwy3i2dfeub2mt3wf/app.bsky.feed.post/3lckggdalmvk2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreibinvuxrazudsk2qwoejmua34vse6ls6vlabtwgcupxdsgtw5vfvy"
},
"mimeType": "image/png",
"size": 5495
},
"description": "A collection of SQLite snippets I've picked up recently to improve my queries.",
"path": "/posts/2024/12/05/sql-tidbits/",
"publishedAt": "2024-12-05T09:57:16.000Z",
"site": "https://www.jvt.me",
"tags": [
"blogumentation",
"sqlite",
"dependency-management-data",
"in a separate post",
"`sqlc`"
],
"textContent": "I've recently spent some time working towards presenting some of the findings from dependency-management-data at work, which has required writing some fun queries, and then using Google Sheets to better visualise that data.\n\nNot only have I taken advantage of some fairly straightforward queries, written off-the-cuff, or relied upon some pre-written queries in the DMD codebase, I've also found myself needing to branch out and write some much more complex SQL, as I didn't want to add any logic into the Google Sheet and have that only as a presentation layer, which could then be consumed by a Google Slides presentation.\n\nIn particular, because I haven't looked at presenting this data directly via Google Sheets visualisations, I ended up searching for things like `sqlite buckets` or `how to make rows columns vice versa`, to mixed luck.\n\nInstead of this being a spatter of separate posts, I thought I'd collect them here for lower discoverability, but higher density.\n\n## Splitting data into buckets\n\nThis is a big one that took me _many_ attempts to find an article / StackOverflow post that would explain quite right for what I wanted to do.\n\nFor instance, let's say we have the following data (limited for brevity):\n\nplatform| organisation| repo| package_name| version| current_version| package_manager| package_file_path| dep_types| level| advisory_type| description| supported_until| eol_from\n---|---|---|---|---|---|---|---|---|---|---|---|---|---\ngithub| stretchr| testify| go| 1.11| 1.23.3| gomod| _codegen/go.mod| [\"golang\"]| ERROR| UNMAINTAINED| go 1.11 has been End-of-Life for 1897 days| | 2019-09-03\ngithub| wiremock| wiremock| gradle| 4.5.1| 4.5.1| gradle-wrapper| perf-test/gradle/wrapper/gradle-wrapper.properties| []| ERROR| UNMAINTAINED| gradle 4 has been End-of-Life for 2178 days| 2018-11-26| 2018-11-26\ngithub| wiremock| wiremock-resilience-examples| gradle| 4.5.1| 4.5.1| gradle-wrapper| gradle/wrapper/gradle-wrapper.properties| []| ERROR| UNMAINTAINED| gradle 4 has been End-of-Life for 2178 days| 2018-11-26| 2018-11-26\ngitlab| gitlab-org| gitlab| node| 4.2.2| 4.2.2| gitlabci| lib/gitlab/ci/templates/Pages/Metalsmith.gitlab-ci.yml| [\"image\"]| ERROR| UNMAINTAINED| nodejs 4 has been End-of-Life for 2388 days| 2017-04-01| 2018-04-30\n\nIf we wanted to get a high-level view of what packages (via the `package_name`) were in the state of being `UNMAINTAINED` or `DEPRECATED`, how would we do that, so we can i.e. get the following output?\n\npackage_name| bucket| count(*)\n---|---|---\ngo| unmaintained| 1\ngradle| unmaintained| 2\nnode| deprecated| 2\nnode| unmaintained| 10\npython| deprecated| 1\nrails| unmaintained| 2\nruby| unmaintained| 1\n\nTo do this, we can use a `case` / `when` statement to group each type of data into a named bucket, like so:\n\n\n select\n package_name,\n (\n \tcase\n \twhen abs(cast ((julianday(eol_from) - julianday('now')) as integer)) > (365 * 5)\n \t\tthen 'unmaintained'\n \twhen abs(cast ((julianday(supported_until) - julianday('now')) as integer)) > (365 * 5)\n \t\tthen 'deprecated'\n \telse\n \t\t'THIS SHOULD NOT BE HIT'\n \tend\n ) as bucket,\n count(*)\n from advisories\n where\n (\n abs(cast ((julianday(eol_from) - julianday('now')) as integer)) > (365 * 5)\n )\n or\n (\n abs(cast ((julianday(supported_until) - julianday('now')) as integer)) > (365 * 5)\n )\n group by package_name, bucket\n order by package_name, bucket\n\n\nThis is _very_ useful when trying to then graph the resulting data, and I've ended up using this bucketing quite a few times after finding out how to do it 🙌🏼\n\n## `ORDER BY` doesn't need to reference a column\n\nSomething interesting is the fact that it's possible to use an `ORDER BY` with an arbitrary statement, for instance:\n\n\n select\n visibility,\n count(*)\n from\n repository_metadata\n group by\n visibility\n order by\n (\n case\n visibility\n when 'PUBLIC' then 0\n when 'INTERNAL' then 1\n when 'PRIVATE' then 2\n else 3\n end\n )\n\n\nPreviously, I had been creating a separate column called `ord` and then was using this, but this is much simpler.\n\n## Querying JSON\n\nI've written about this more in a separate post but one thing to note that I've recently been doing is relying on the shorthand \"arrow functions\" that are inbuilt to SQLite, and are familiar if you're coming from Postgres.\n\nFor instance, let's say we have the following:\n\n\n select\n id,\n json_extract(data.json, '$.name') name\n from\n data\n\n\nWe can actually further simplify this with an arrow operator, `->>`:\n\n\n select\n id,\n - json_extract(data.json, '$.name') name\n + data.json ->> '$.name' name\n from\n data\n\n\n## Concatenating strings\n\nSomething I've done a bit before, but for posterity:\n\n\n -- ...\n datasource || ': ' || update_type as label,\n -- ^^ this concatenates between multiple strings\n -- ...\n\n\nVery useful when trying to add useful labels for graphs.\n\n## Providing default values for `NULL`s\n\nIn the case that we're retrieving a nullable column from database, but want to provide a default value, we can use `coalesce`.\n\nFor instance:\n\n\n select\n coalesce(current_version, version) as ver\n from\n renovate\n\n\nOr:\n\n\n select\n coalesce(metadata, '{}') as m\n -- ...\n\n\n## Date calculations\n\nSomething I've been doing since very early on with DMD is calculating \"days between\" dates, i.e. to given an indication of \"you've been running Node 12.x for 900 days past its End-of-Life date\".\n\nFor instance, if we want to take a column like:\n\nsupported_until\n---\n2024-10-22\n\nAnd work out how many days it's been since that date, we could write a query using `julianday`, such as:\n\n\n select\n supported_until,\n -- it's more useful to show the absolute (aka always positive) value to humans\n abs(cast(julianday(supported_until) - julianday('now') as integer)) as since\n from\n advisories\n where\n supported_until is not null\n and\n -- this number will be negative if it's in the past\n cast(julianday(supported_until) - julianday('now') as integer) < 0\n\n\n## Using subqueries as a way to produce column-focussed data (for presenting in Google Sheets)\n\nWithin the DMD codebase, I'm using the excellent `sqlc` for my database queries.\n\nOne thing it doesn't really support - which is fair - is heavily nested queries, including those with subqueries, so I don't generally lean on them as it's hard to use them with `sqlc`.\n\nHowever, when preparing this presentation, I found that as I _wasn't_ tied to `sqlc` while preparing the data, I could write the most cursed and nested subqueries I wanted 😈\n\nThis allowed me to think about how to handle getting the right columns in the resulting output, at the cost of it generally being a less efficient query, but working for what I needed.\n\nFor instance, the below query:\n\n\n select\n (\n \tselect\n \tcount(*)\n \tfrom\n \t(\n \t\tselect\n \t\tdistinct package_manager, package_name\n \t\tfrom\n \t\trenovate\n \t)\n ) as num_deps,\n (\n \tselect\n \tcount(*)\n \tfrom advisories\n \twhere advisory_type = 'DEPRECATED'\n ) as total_deprecated,\n (\n \tselect\n \tcount(*)\n \tfrom advisories\n \twhere advisory_type = 'UNMAINTAINED'\n ) as total_unmaintained\n\n\nThis _isn't_ the most complex example I could share, but out of an abundance of shame I won't go into it too much 🫣",
"title": "Some useful SQL(ite) tips I've learned",
"updatedAt": "2026-02-11T20:22:30.000Z"
}