Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreidt4amdvjw2t5bjiguoeygv2u7kizpbccvnr7not35xp7suxsg4xa",
    "uri": "at://did:plc:wszrgoqdwy3i2dfeub2mt3wf/app.bsky.feed.post/3mfbzmz3ykje2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreiajjkxtsfv62kcb4ee3qcwzx4uvmw6umwrb6w5mzfa3l5jy4luqzq"
    },
    "mimeType": "image/png",
    "size": 78437
  },
  "description": "How we built our own interface on top of GitHub Discussions to improve triage for Renovate's Open Source community.",
  "path": "/posts/2026/02/20/renovate-discussions-data/",
  "publishedAt": "2026-02-20T11:30:01.000Z",
  "site": "https://www.jvt.me",
  "tags": [
    "renovate",
    "open-source",
    "recently wrote about",
    "the maintainers",
    "as noted",
    "why we use Discussions",
    "taking my own advice",
    "reach for",
    "my local workflow with SQLite",
    "Datasette",
    "sql-studio",
    "`sqlc`",
    "writing a tool",
    "written about loving Evidence for data visualisation",
    "write me a Neovim plugin",
    "Evidence",
    "perfect",
    "self-described \"cautious skeptic\" of AI",
    "`crush`"
  ],
  "textContent": "As I recently wrote about, we use GitHub Discussions for triaging user requests in the Renovate project.\n\nThis process works really well for the maintainers (the team who have final say on new features, and whom have merge rights) and contributors (people who regularly contribute to Renovate's features, fix bugs, improve documentation and help answer user questions), and as noted, Renovate optimises for the people who do the work, rather than for what the users would necessarily want (although we take that into account).\n\nI'd recommend reading why we use Discussions if you want to understand more about our reasoning for Discussions instead of Issues, before going further.\n\nOne of the things that we find a little lacking is that the Discussions UI on GitHub is missing many features and views that would make it more straightforward to monitor our user requests and suggestions.\n\nSome of these gaps are specific to us as a project and how our process works, but some seem like \"low hanging fruit\" that would be great for GitHub to implement on their side, but Discussions doesn't seem super invested in recently.\n\nFor instance, Discussions have a native means to vote on posts, wehre users can anonymously show interest in i.e. getting a feature implemented, or to say that they're also affected by a bug. You'd think that because votes are natively implemented in Discussions, being able to ask \"what is the Discussion with the most votes on it?\" would be a reasonable ask, right? Well, you _cannot_ answer this from the web UI. To solve this, you need to reach for the API, which results in you needing to list _every_ Discussion to get the vote count, and then work out what's highest from there, as there's no way to order by votes.\n\nThis is a simple example of some insights we want to get into our community, but it's by no means the only question we're missing the means to answer.\n\nAs a maintainer on the project, and our community manager, I'd like to see information like:\n\n  * What new Discussions are there?\n    * Setting up an RSS-feed-to-Slack for the GitHub Discussions works well as a starting point, but you can sometimes miss things when there's a torrent of requests coming through\n  * Where have we had Discussions that haven't had a reply by _anyone_ , for some time?\n  * How many Discussions did we close this week/month?\n  * How many (possible) bugs have been reported recently?\n  * How many (actual) bugs have been reported recently?\n  * Are there any Discussions that had a bot request they provide more information, but they haven't replied since?\n    * As they're likely stale (but we won't close it)\n  * Are there any Discussions that had a bot / contributor / maintainer request they provide more information, but they _have_ replied since?\n    * As they need us to follow-up on them\n  * Where do we have highly upvoted requests from the community?\n    * As we may not be focussing on things the community want/need\n  * What has maintainer input needed?\n    * Where should I or the other maintainers focus on responding to?\n  * Which user(s) are being the most helpful and/or answering the most questions?\n    * Split by the maintainers and contributors, and anyone else, who may be folks who we may want to \"promote\" to a contributor, if they're interested\n    * This data is only available for the last 30 days, but we may want to see usefulness-over-time\n  * Which Discussions do I need to reply to, because there hasn't yet been a contributor or maintainer review?\n\n\n\nUnfortunately, none of this is straightforward to work out, and makes triage a little bit painful for the project.\n\n## Implementing our own tooling\n\nInstead of being grumpy about it, I set about taking my own advice, and decided to work on fixing that for myself.\n\nI had a long-term vision of building a maintainer-and-contributor-only web UI that folks could log into, see the various information that's relevant to us, and provide information about \"which Discussions are lower priority, as another maintainer/contributor has already answered it\", or \"which Discussions have you replied to that have been replied to by the user\".\n\nBefore I got too far in trying to design the right web interface and experience, I wanted to focus on the underlying data that we would need to ingest, as that would drive quick feedback.\n\nFolks who've worked with me over the last ~5 years have noticed that I'll often reach for the great tool that is SQLite, and will be unsurprised to know that this was going to be no different.\n\nBy focussing on extracting the data from GitHub into the local SQLite database, this would give me the fastest feedback. I could then utilise my local workflow with SQLite, as well as allow me to use great tools like Datasette or sql-studio, to write the queries I needed, without needing to also build out the complexity of a web UI.\n\nThis would mean I could push the building of the web UI to a later point, at the point that others on the team would be wanting to use it without hand-writing SQL queries.\n\n## Hitting the API\n\nTo decide on the data model for the SQLite database, I first had to understand how GitHub's data model worked, and so spent some time exploring the GraphQL APIs available.\n\n(Aside: I'm still sad that GitHub no longer host `graphql.github.com` for an interactive GraphQL browser)\n\nOnce I'd done my poking around, I set about writing a set of Go tooling to hit the GitHub API, fetch Discussion data, and sync it to the local database via `sqlc`.\n\nRenovate is a very active project, with 5 years worth of Discussions data, so I had a lot of data to sync from GitHub to the local database - at least for the first time.\n\nGitHub's rate limits seemed to agree with me:\n\nWhile doing significant backfill (culminating in a ~125MB SQLite database, with ~11k Discussions and ~55k comments), I ended up repeatedly hitting GitHub's rate limits - which was very understandable!\n\nTo monitor the rate limits for each Installation of my GitHub App, I ended up writing a tool that would give me a view of how close to rate limits I was running:\n\nAs each Installation ID got close to the rate limits, I cancelled the backfill process, and I then moved to another Installation ID, until I'd completed the backfill.\n\nAt a later point, I implemented the ability to auto-rotate with Installation the GitHub App was using, so I wouldn't have to keep cancelling the process, but could instead transparently move between them.\n\n## SQLite-as-UI\n\nWith the data available, I was already able to start getting a good amount of value from interrogating the SQLite database, instead of GitHub's web UI.\n\nFor instance, my first minimal database would allow me to query \"which Discussions have been closed without an answer being selected\":\n\n\n    select\n        title,\n        url,\n        category_name\n    from\n        discussions\n    where\n        (\n            state = 'RESOLVED'\n            or state = 'OUTDATED'\n        )\n        and answered_by is null\n    order by\n        updated_at desc\n\n\nThis would give me the answer to a relatively straightforward query, which I also could've used the GitHub UI for.\n\n(This specific example is more from a project hygiene basis, as it's useful to know where we've maybe missed that there was a resolution that fixed the user's request)\n\nUsing the GitHub UI would mean that I'd have to work through ~23 pages of GitHub's fairly heavy web UI to see all the results. Now, I can query that data in ~0.01s!\n\nLet's look at some more complex queries, but for straightforward asks that are seemingly impossible to query with GitHub's UI:\n\n\n    -- how many open \"Request Help\" Discussions are there which haven't had any replies?\n    select\n        number,\n        title as title,\n        url as url,\n        discussions.created_at as created_at,\n        discussions.updated_at\n    from\n        discussions\n        left join discussion_comments on discussion_comments.discussion_number = discussions.number\n    where\n        (\n            state = 'OPEN'\n            or state = 'REOPENED'\n        )\n        and discussion_comments.discussion_number is null\n        and category_name = 'Request Help'\n    group by\n        discussions.number,\n        discussions.created_at\n    order by\n        discussions.created_at asc\n\n\nAnd:\n\n\n    -- which \"Request Help\" Discussions are the highest upvoted?\n    select\n        discussions.upvote_count,\n        number,\n        title as title,\n        url as url,\n        discussions.created_at as created_at,\n        discussions.updated_at,\n        count(discussion_comments.discussion_number)\n    from\n        discussions\n        left join discussion_comments on discussion_comments.discussion_number = discussions.number\n    where\n        (\n            state = 'OPEN'\n            or state = 'REOPENED'\n        )\n        and category_name = 'Request Help'\n    group by\n        discussions.number,\n        discussions.upvote_count\n    order by\n        discussions.upvote_count desc\n\n\nIn each of these cases, now we have the raw data, if I can think of a query (or ask a Large Language Model to generate a query for me) I can get the data.\n\nThat's pretty awesome.\n\nOver time, I've added in more data, such as comments on the Discussion, the full body of the Discussion and/or comments, and the upvote count on Discussions.\n\nThis required a bit more logic, given the database already had some of the data synced, so I crafted a `backfill` command, which would look up what missing data may exist for a given Discussion locally, see what was available via the API, and sync it if needed.\n\n## MVP for visualisation with Evidence\n\nAs well as the raw queries I was writing, I was yearning for a little bit of visualisation for things like \"how many new Discussions were raised\", as well as a way to have some pre-canned queries I could reload the data for.\n\nI've written about loving Evidence for data visualisation before (including having GPT-4.1 write me a Neovim plugin), and so I reached for it again as a MVP for a UI, before needing to build my own.\n\nAdding Evidence into the mix allowed me to start building visualisations of data.\n\nIn particular, I wanted to visualise how many Discussions we had, based on when they were created:\n\nWhen looking at this graph, it's also useful to compare this to how many we've closed over time:\n\nWe can also see this on a per-category basis with open/close stats over time, for instance with the \"Request Help\" category:\n\nThese visualisations are very useful, but I've found that for daily triage work, creating purely table-based rendering of the data has been enough:\n\nThis may be all the UI we need for some time - it provides ways of rendering commonly needed queries, visualising data in interesting ways, and if we wanted to, we could deploy this to a (private) GitHub Pages deployment, so only maintainers and contributors could access it.\n\n## Open Source\n\nThe project will be Open Source _soon_. I need to finish cleaning up some Git commit history - as I've not been as focussed on making sure my commit messages are perfect, but general snapshots after implementing something - before it's open to the world, but it's the plan.\n\nThe project will be available under the `AGPL-3.0-only`, as Renovate is, and I hope it will be useful for others to look at.\n\n## What did I learn?\n\nI don't think these are going to be particularly surprising to anyone who's been building software for years, but I thought it's worth re-stating some lessons I've reminded myself of, while I've been building this.\n\n### Optimise for fast feedback\n\nWhen building something like this, getting an indication early on of whether this is going to be useful is very important.\n\nIterating quickly to build out the data model, get a chance to play with some of the queries, and then add more data or functionality was important, and helped confirm the utility of the work.\n\nNow, I could have also got other maintainers and contributors to test drive it to get additional feedback, but given the majority of this work is done by me and one other, I decided to focus on what key requirements I wanted to fulfill before it went out to others.\n\n### MVP → MVP → MVP\n\nFocussing on smaller iterations of the Minimum Viable Product provided value in iterations, where instead of trying to build the final web UI, I focussed on SQLite as the interface.\n\nThen, when I needed a little more, I used Evidence as the web UI.\n\nIf I'd aimed for the web UI as my primary focus, I'd still be building it, and getting no value 😅\n\nIt might be we never need to move on from Evidence, as it gives us exactly what we need, which allows us to focus on the important work - helping Renovate users! - rather than building a new web UI to replace the current web UI we have.\n\n### Large Language Models have been useful\n\nI'm a self-described \"cautious skeptic\" of AI, and I'm trying to do more with AI where makes sense.\n\nThis project has been a great opportunity to try and push them a little more than I have in the past, especially as this effectively a side project I've been working on alongside my other work. Being able to push it forwards with a bit of outsourcing to an LLM has been useful, and it's helped me ship a little bit quicker than I would have otherwise.\n\nLLMs have not only been useful for writing the underlying Go code to query and sync data from GitHub to my SQLite database, but it's also been useful with writing queries.\n\nIt's been really fun pointing `crush` to my SQLite database, and watching it work out how to find an answer for me. Although some of these more complex queries are things I could've written, it's been nice to parallelise myself, allowing the LLM to go and write the query, while I can focus on something else.\n\nIn particular, when using Evidence, a slight rough edge is that it uses DuckDB's SQL syntax, which means that some SQLite queries I can write won't run in Evidence. It's been useful to outsource this to the LLM, who can then convert it for me.\n\n## Was it worth it?\n\nMaybe burying the lede a little bit, but **yes** , it was absolutely worth it.\n\nEven if it's only used by me, it's provided a _huge_ amount of value since starting the project in January.\n\nAt the start of each day, I'll come in, sync the latest changes from GitHub, and see what I need to go and triage. This is a much more useful process than opening the GitHub UI, and trying to work out where things are at.\n\nI'm hoping that this will also be useful for the other maintainers and contributors. So far, I've not shared it with them, mostly out of commit history shame 😹 I'm wanting to make sure that I can share it when it's ready for them to be able to pull and start using (without the risk of me force-pushing over changes.\n\nI'm hoping that some of these insights come to GitHub's UI, but until then, we've got our own control over improvements and insights that will make the experience better for us and our users!",
  "title": "Breaking free from GitHub Discussions' limitations",
  "updatedAt": "2026-02-20T11:30:01.000Z"
}