Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreif5f62jonr3usnuaza7j5rr2e45ss3qj5w7wks2mge7ndhhixhdii",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mpjrzhzhbdu2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreie7gji6p5yx45sx5zbt5vy3r6xh6i3l6jnzhzlrc4xuuqfhvywjmy"
    },
    "mimeType": "image/webp",
    "size": 487482
  },
  "path": "/priya_sajja_c336921bbda87/the-one-mistake-that-made-my-first-2000-github-issues-almost-useless-1n5d",
  "publishedAt": "2026-06-30T19:14:12.000Z",
  "site": "https://dev.to",
  "tags": [
    "ai",
    "devex",
    "developers",
    "github"
  ],
  "textContent": "If you've ever opened a GitHub repository with hundreds—or even thousands—of issues, you've probably experienced the same feeling I did. Where do you even begin? At first glance, GitHub Issues look like an endless list of bug reports, feature requests, enhancement proposals, questions, and pull requests. Reading them one by one quickly becomes overwhelming. When I started my developer experience research, I thought collecting GitHub issues would be the easy part.\n\nI was wrong. The real challenge wasn't reading the issues. The real challenge was organizing them into research data and make that data understandable to maintainers, co-UX designers for analysis as well the engineers who wanted to explore the raw data.\n\nAfter spending months analyzing GitHub repositories, I realized that the quality of your research depends far more on how you organize the data than on how many issues you collect. A spreadsheet full of issue links is not research. **A structured dataset is.**\n\n##  My First Spreadsheet Failed\n\nLike many researchers, I started with a **simple spreadsheet**.\n\nIt contained: Issue title, GitHub link, Open date, Close date, Brief summary After documenting hundreds of issues for 2-3 months of time,\n\nI tried answering simple questions.\n\n  * Which deployment stage fails most often?\n  * Which users experience the biggest challenges?\n  * Which releases introduce the most regressions?\n  * What problems appear repeatedly?\n  * Which workflows consume the most engineering time?\n\n\n\nI couldn't answer any of them. 
On top of that, when I presented the spreadsheet to the team of engineers, they couldn't understand what I was trying to capture, and I couldn't answer their questions from the research data, because the spreadsheet read as a pile of information rather than a dataset.\n\nAlthough **I had collected a large amount of information, I hadn't organized it in a way that supported analysis.**\n\nThat was the moment I redesigned my entire research process.\n\n##  Think Like a UX Researcher, Not a Spreadsheet User\n\nEvery GitHub issue contains much more than a bug. It contains evidence. Each issue can tell you:\n\n  * who experienced the problem,\n  * where it happened,\n  * when it happened,\n  * why it happened,\n  * how it was resolved,\n  * and what impact it had.\n\n\n\nInstead of creating one large \"Summary\" column, I started breaking every issue into smaller research categories.\n\nEach category answered a different research question. **The spreadsheet needed to look like a qualitative and quantitative research dataset, similar to the raw data you would collect from a survey.** Below, I explain step by step how I organized the data and built a structured spreadsheet for research.\n\n##  Step 1: Organize Community Activity\n\nThe first section captures GitHub metadata. 
For every issue, I record information such as:\n\n  * Issue title\n  * Issue type\n  * Labels\n  * Status\n  * Created date\n  * Closed date\n  * Resolution time\n  * Linked pull request\n  * Brief summary\n  * Summary of the maintainer and engineer conversation\n\n\n\nThis allows me to analyze community activity, response times, maintainer workload, issue trends, release cycles, and project health.\n\n##  Step 2: Identify the Developer\n\nGitHub issues rarely begin with \"I'm a Platform Engineer.\" Instead, you have to infer the user's role from the technical context.\n\nFor example, I assign roles such as:\n\n  * Platform Engineer\n  * DevOps Engineer\n  * ML Engineer\n  * Software Engineer\n  * Data Scientist\n  * Site Reliability Engineer\n\n\n\nI also record supporting evidence from the issue itself.\n\nOnce hundreds of issues are categorized, patterns begin to emerge.\n\nYou can see which personas experience the most friction and which groups need better tooling or documentation.\n\n##  Step 3: Break the Workflow into Stages\n\nMost repositories involve complex workflows.\n\nInstead of labeling everything as a deployment problem, I divide the workflow into stages.\n\nFor AI infrastructure research, my **deployment workflow includes**:\n\nInstallation → Configuration → Model Download → Runtime Initialization → Readiness → Networking → Inference → Scaling → Version Upgrade.\n\nEvery issue is mapped to the stage where the failure occurred. This immediately reveals which parts of the workflow generate the most problems.\n\n##  Step 4: Separate Deployment from Operations\n\nOne **mistake I made early was grouping everything together**.\n\nDeployment problems are different from operational problems.\n\nSo I created separate workflows for Deployment, Observability, Day-2 Operations, and Maintenance.\n\nEach workflow has its own categories. 
For observability, I record activities such as:\n\n  * checking logs,\n  * inspecting Kubernetes events,\n  * reviewing metrics,\n  * debugging latency,\n  * identifying root causes.\n\n\n\nFor maintenance, I categorize:\n\n  * upgrades,\n  * configuration changes,\n  * rollbacks,\n  * runtime migration,\n  * capacity management,\n  * scaling.\n\n\n\nSeparating these workflows made the data significantly easier to analyze.\n\n##  Step 5: Capture Technical Context\n\nWithout technical context, patterns disappear. For every issue, I capture information such as:\n\n  * product version,\n  * Kubernetes version,\n  * runtime,\n  * model family,\n  * deployment type,\n  * infrastructure,\n  * storage backend,\n  * GPU or CPU usage.\n\n\n\nThis allows me to answer questions like:\n\n\"Do upgrade issues increase after a specific release?\"\n\n\"Are GPU deployments failing more frequently than CPU deployments?\"\n\n\"Does one runtime produce more networking issues than another?\"\n\n##  Step 6: Design Your Spreadsheet Around Research Questions\n\nThe biggest change I made was this:\n\nI stopped asking,\n\n**\"What information does this issue contain?\"**\n\nInstead, I started asking,\n\n**\"What research questions do I want this dataset to answer?\"**\n\nBefore adding a new column to the spreadsheet, I first designed my research questions. Then, I cross-checked every spreadsheet category against those questions to make sure the data I was collecting would actually help answer them.\n\nFor example, if one of my research questions was, \"Which deployment stage causes the most developer challenges?\", I needed columns for the deployment workflow, failure stage, developer goal, and deployment summary. If I wanted to understand \"Which developer persona experiences the most friction?\", I needed columns for the developer role, experience level, and supporting evidence. 
Likewise, if I wanted to analyze version-specific challenges, I needed columns for the KServe version, Kubernetes version, runtime, and upgrade information.\n\nEvery new spreadsheet column had to justify its existence by supporting one or more research questions. If a column couldn't contribute to answering a research question or generating meaningful insights, I removed it.\n\nThis simple validation process ensured that I wasn't just collecting data—I was collecting evidence. Over time, the spreadsheet evolved from a list of GitHub issues into a research-ready dataset that supported both qualitative and quantitative analysis.\n\n##  The Result\n\nOnce the data was cleaned and organized, everything changed.\n\nInstead of manually rereading hundreds of issues, I could:\n\n  * identify recurring patterns,\n  * measure developer pain points,\n  * compare versions,\n  * identify workflow bottlenecks,\n  * analyze personas,\n  * create dashboards,\n  * generate quantitative metrics,\n  * perform thematic coding,\n  * and produce evidence-based recommendations.\n\n\n\nThe **spreadsheet became the foundation** for qualitative and quantitative analysis.\n\n##  Final Thoughts\n\nCleaning GitHub issues may not sound exciting, but it is one of the most valuable steps in developer experience research.\n\nWithout structured data, hundreds of issues remain just individual conversations.\n\n**With structured data, they become evidence**.\n\nWhether you're a UX researcher, open-source maintainer, DevRel engineer, or contributor, investing time in organizing GitHub issues will make every future analysis faster, more accurate, and more actionable.\n\nDon't think of GitHub issues as bugs.\n\nThink of them as research participants waiting to tell you how developers really experience your product.",
  "title": "The One Mistake That Made My First 2000+ GitHub Issues Almost Useless"
}