Open-source tool for analyzing your social media data (want to help me make it better?)
Hugging Face Forums [Unofficial]
March 3, 2026
I classified 2,500 posts from Bluesky’s 10 most-followed accounts using an open-source LLM pipeline I built called cat-vader. The classified dataset is now public on my HF profile.
cat-vader is a fork of cat-llm, a package I originally built for classifying open-ended survey responses in academic research. It supports multi-label classification, automatic category discovery, and direct Threads/Bluesky API integration.
Some findings from the analysis:
1. Account identity explains ~62% of engagement variance
2. Political and social content outperforms other topics within any given account
3. Economy posts appear to tank engagement, but the effect disappears once you control for who’s posting
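For anyone wondering how findings 1 and 3 can both hold, here is a minimal sketch on made-up numbers. It is not the real dataset and not necessarily the model used in the writeup. The `posts` list, the `demean` and `slope` helpers, and the two account names are all hypothetical. The idea: if a low-engagement account happens to post more about the economy, a pooled comparison makes economy posts look bad, while a within-account comparison (subtracting each account's own average first) shows no gap.

```python
from collections import defaultdict

# Synthetic toy data, NOT from the real dataset: (account, is_economy, engagement)
posts = [
    ("big", 1, 100), ("big", 0, 90), ("big", 0, 110), ("big", 0, 100),
    ("small", 1, 10), ("small", 1, 12), ("small", 1, 8), ("small", 0, 10),
]

def demean(values, groups):
    """Subtract each group's mean from the values belonging to that group."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def slope(x, y, groups):
    """Least-squares slope of y on x after demeaning both within each group."""
    xd, yd = demean(x, groups), demean(y, groups)
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

accounts = [p[0] for p in posts]
economy = [float(p[1]) for p in posts]
engagement = [float(p[2]) for p in posts]
one_group = ["all"] * len(posts)  # a single group means plain pooled demeaning

# Pooled: ignores who is posting. Within: compares posts inside the same account.
pooled_gap = slope(economy, engagement, one_group)
within_gap = slope(economy, engagement, accounts)

# Share of engagement variance explained by account identity alone
total_ss = sum(d * d for d in demean(engagement, one_group))
within_ss = sum(d * d for d in demean(engagement, accounts))
account_share = 1 - within_ss / total_ss

print(f"pooled economy gap: {pooled_gap:.1f}")
print(f"within-account economy gap: {within_gap:.1f}")
print(f"variance explained by account: {account_share:.2f}")
```

In this toy example the pooled gap is strongly negative while the within-account gap is zero, because the economy posts are concentrated in the smaller account. The share of variance explained by account is much higher here than the ~62% in the real analysis, since the toy accounts were picked to be very far apart.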
Full writeup: "What Bluesky’s Most-Followed Accounts Actually Post About" by Chris Soria
GitHub: chrissoria/catvader