{
"$type": "site.standard.document",
"content": {
"$type": "site.standard.content.markdown",
"text": "In the last few weeks, there has been a lot of activity on Bluesky. Bluesky is a social network built on open standards. Specifically, it is built on top of the [AT Protocol](https://atproto.com/). Most (if not all) data is exposed via XRPC endpoints.\n\nThis post is a quick glance at the AT Protocol and [its Python SDK](https://atproto.blue/en/latest/index.html). To do that we'll create a script to download all the `#dataBS` posters and create a graph with the connections around that community.\n\nYou can [explore the final interactive graph online](https://public.graphext.com/2b808d92830c526b/index.html?section=graph&colorMap=cluster&areaMap=page_rank)!\n\n[](https://public.graphext.com/2b808d92830c526b/index.html)\n\n## Setup\n\nTo follow along, you'll need to install the [`atproto` Python package](https://github.com/MarshalX/atproto) and, since we'll be using some API endpoints that require authentication, you'll need to have an account there and get your credentials.\n\nOnce you have your credentials, you can create the client with:\n\n```python\nfrom atproto import Client\n\nclient = Client()\nprofile = client.login('yourusername.com', 'hopefully-not-12345678')\n```\n\nIf you are using `uv`, you can spin up a quick Jupyter Notebook with `atproto` installed with:\n\n```bash\nuvx --with 'atproto' --with 'jupyterlab' jupyter lab\n```\n\n## Getting the posts\n\nTo get all the `#dataBS` posts, we can use the `app.bsky.feed.searchPosts`. From the Python SDK, [this is mapped to `app.bsky.feed.search_posts`](https://atproto.blue/en/latest/atproto/atproto_client.models.app.bsky.feed.search_posts.html). 
The endpoint returns a `cursor` that we can use to paginate through the results.\n\n```python\ncursor = None\ndatabs_posts = []\n\n# Keep requesting pages until the endpoint stops returning a cursor\nwhile True:\n    fetched = client.app.bsky.feed.search_posts(params={'q': '#databs', 'cursor': cursor})\n    databs_posts = databs_posts + fetched.posts\n\n    if not fetched.cursor:\n        break\n\n    cursor = fetched.cursor\n```\n\n## Getting the social graph\n\nEach post has a `post.author.handle` property that can be used in our next XRPC call, to the `app.bsky.graph.getFollows` endpoint. This endpoint returns all the actors that the given actor follows (also using the `cursor` property to paginate through the results).\n\nWe can write a quick function to get all the follows for a given actor:\n\n```python\ndef get_all_follows(author):\n    cursor = None\n    follows = []\n    # Page through the results until no cursor is returned\n    while True:\n        fetched = client.app.bsky.graph.get_follows(params={'actor': author, 'cursor': cursor})\n        follows = follows + fetched.follows\n        if not fetched.cursor:\n            break\n        cursor = fetched.cursor\n    return follows\n```\n\nAnd then we can use that function to get all the follows for all the `#dataBS` authors. 
Since we don't know how big the graph is, we'll dump the results into a CSV file.\n\nBefore writing the results, let's get the unique authors:\n\n```python\nunique_authors = list(set(post.author.handle for post in databs_posts))\n```\n\nNow we can loop through all the unique authors and get all their follows and some profile information (obtained from `app.bsky.actor.getProfile`):\n\n```python\nfrom tqdm import tqdm\n\nwith open('databs.csv', 'w') as f:\n\n    # Header row: one edge per line, plus some profile data about the source\n    f.write(\"source,target,source_avatar_url,source_posts_count,source_followers_count,source_follows_count\\n\")\n\n    for source in tqdm(unique_authors):\n\n        author_follows = get_all_follows(source)\n        source_actor = client.app.bsky.actor.get_profile(params={'actor': source})\n\n        for follow in author_follows:\n            f.write(f\"{source},{follow.handle},{source_actor.avatar},{source_actor.posts_count},{source_actor.followers_count},{source_actor.follows_count}\\n\")\n\n```\n\nAs of November 2024, this takes around 30 minutes to run. After that, you should have a `databs.csv` file with all the connections between the `#dataBS` authors.\n\n## Visualizing the Graph\n\nAlthough there are many libraries to create graphs in Python, I've been a big fan of [Graphext](https://graphext.com) for this kind of task, as it allows you to share the graphs in an interactive way.\n\nUpload the `databs.csv` file to Graphext, make sure the columns are typed as `category` and, after creating a source-target graph, you should see something like this:\n\n\n\nThe graph is also [accessible and explorable online](https://public.graphext.com/2b808d92830c526b/index.html)!\n\n## Conclusion\n\nThis small post shows how powerful open APIs can be, and the AT Protocol is a great example of that. 
It allows us to build all kinds of applications on top of it!\n\nIf you want to learn more about the AT Protocol, I recommend checking out the [official documentation](https://atproto.com/docs), the [Python SDK](https://atproto.blue/en/latest/index.html), and of course, joining the discussion on [Bluesky](https://bsky.app/profile/davidgasquez.com).",
"version": "1.0"
},
"description": "In the last few weeks, there has been a lot of activity on Bluesky. Bluesky is a social network built on open standards. Specifically, it is built on top of the AT Protocol. Most (if not all) data is exposed via XRPC endpoints. This post is a quick glance at the AT Protocol an...",
"path": "/exploring-atproto-python",
"publishedAt": "2024-11-14T00:00:00.000Z",
"site": "at://did:plc:4z5i7njrld66ew36htufcwry/site.standard.publication/3mo43d2tmt2ov",
"textContent": "In the last few weeks, there has been a lot of activity on Bluesky. Bluesky is a social network built on open standards. Specifically, it is built on top of the AT Protocol. Most (if not all) data is exposed via XRPC endpoints.\n\nThis post is a quick glance at the AT Protocol and its Python SDK. To do that we'll create a script to download all the #dataBS posters and create a graph with the connections around that community.\n\nYou can explore the final interactive graph online!\n\n[](https://public.graphext.com/2b808d92830c526b/index.html)\n\nSetup\n\nTo follow along, you'll need to install the atproto Python package and, since we'll be using some API endpoints that require authentication, you'll need to have an account there and get your credentials.\n\nOnce you have your credentials, you can create the client with:\n\nIf you are using uv, you can spin up a quick Jupyter Notebook with atproto installed with:\n\nGetting the posts\n\nTo get all the #dataBS posts, we can use the app.bsky.feed.searchPosts. From the Python SDK, this is mapped to app.bsky.feed.search_posts. The endpoint returns a cursor that we can use to paginate through the results.\n\nGetting the social graph\n\nEach post has a post.author.handle property that can be used in our next XRPC call to app.bsky.graph.getFollows endpoint. This endpoint returns all the actors that the given actor follows (also using the cursor property to paginate through the results).\n\nWe can write a quick function to get all the follows for a given actor:\n\nAnd then we can use that function to get all the follows for all the #dataBS authors. Since we don't know how big the graph is, we'll be dumping the results into a CSV file.\n\nBefore writing the results, let's get the unique authors:\n\nNow we can loop through all the unique authors and get all their follows and some profile information (obtained from app.bsky.actor.getProfile):\n\nThis, in Novermber 2024, takes around 30 minutes to run. 
After that, you should have a databs.csv file with all the connections between the #dataBS authors.\n\nVisualizing the Graph\n\nAlthough there are many libraries to create graphs in Python, I've been a big fan of Graphext for this kind of task, as it allows you to share the graphs in an interactive way.\n\nUpload the databs.csv file to Graphext, make sure the columns are typed as category and, after creating a source-target graph, you should see something like this:\n\nThe graph is also accessible and explorable online!\n\nConclusion\n\nThis small post shows how powerful open APIs can be, and the AT Protocol is a great example of that. It allows us to build all kinds of applications on top of it!\n\nIf you want to learn more about the AT Protocol, I recommend checking out the official documentation, the Python SDK, and of course, joining the discussion on Bluesky.",
"title": "Exploring AT Protocol with Python"
}