{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreibtkoa7pslgwxv2rapwktrfy7nlqte6yzfhthu7phawaio37wyx5m",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mg7bw5xilia2"
  },
  "path": "/t/open-source-tool-for-analyzing-your-social-media-data-want-to-help-me-make-it-better/173982#post_1",
  "publishedAt": "2026-03-03T23:26:44.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "What Bluesky’s Most-Followed Accounts Actually Post About - Chris Soria",
    "github.com/chrissoria/cat-vader"
  ],
  "textContent": "I classified 2,500 posts from Bluesky’s 10 most-followed accounts using cat-vader, an open-source LLM pipeline I built. The classified dataset is now public on my HF profile.\n\ncat-vader is a fork of cat-llm, a package I originally built for classifying open-ended survey responses in academic research. It supports multi-label classification, automatic category discovery, and direct Threads/Bluesky API integration.\n\nSome findings from the analysis:\n\n  1. Account identity explains ~62% of engagement variance\n  2. Political and social content outperforms other content within any given account\n  3. Economy posts appear to tank engagement, but the effect disappears once you control for who’s posting\n\nFull writeup: What Bluesky’s Most-Followed Accounts Actually Post About - Chris Soria\nGitHub: github.com/chrissoria/cat-vader",
  "title": "Open-source tool for analyzing your social media data (want to help me make it better?)"
}