Raw Record Source

{
  "$type": "site.standard.document",
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreibyr4fwqodtcbrgb5rzx2ktpnidk4dbkrvouxqlgldbslaqud4mqu"
    },
    "mimeType": "image/jpeg",
    "size": 77155
  },
  "description": "A better version of the script I wrote for Wallabag",
  "path": "/readeck-api-like-star-favorite-entries-by-url-python-script/",
  "publishedAt": "2026-01-26T23:29:00.000Z",
  "site": "https://www.autodidacts.io",
  "tags": [
    "100DaysToOffload",
    "Readeck"
  ],
  "textContent": "**Note:** this post is part of #100DaysToOffload, a challenge to publish 100 posts in 365 days. These posts are generally shorter and less polished than our normal posts; expect typos and unfiltered thoughts! View more posts in this series.\n\nI have adapted the script I wrote for Wallabag to work with the Readeck API, and made it vastly better in the process.\n\nPlease refer to the previous article for basic information on usage, jq pipelines for bulk starring, configuration, and dependencies; that stuff is mostly the same.\n\nIn this post, I’ll just cover what’s _different._\n\n  1. This script uses bearer token authentication (an `Authorization: Bearer` header). This means, in your environment variables, you can skip username, password, etc., and just provide two environment variables: `READECK_BASE_URL` and `READECK_API_TOKEN`.\n  2. Of the imports in the script below, only requests is a third-party package (json and urllib.parse are part of the Python standard library), so the uv pip install command is: `uv pip install requests`\n  3. Readeck doesn’t provide an “exists” endpoint. I tried using the search parameter, but it didn’t work, so what I ended up doing is fetching all bookmarks from the (paginated) API the first time it runs, and caching them in /tmp/readeck-bookmarks.json. Subsequent runs always use the cached version if it’s there (delete the file to force a refresh).\n  4. While Wallabag seemed to store the URLs imported from the Pocket CSV basically unchanged, Readeck URLs don’t always match the original URL that was imported. So this script has a long chain of redirect-following and URL cleaning logic. It first tries the original URL, a cleaned version (with the scheme, query parameters, and fragment removed), and a percent-decoded version. If none of those match, it fetches the headers, checks where the URL currently redirects to, and tries that URL along with its cleaned and decoded versions. Then it gives up.\n\nThough this script is way fancier than my Wallabag script, it _still_ doesn’t work as well. 
The Wallabag script was able to star 654 articles (i.e., it missed one).\n\nThis script starred 621 articles. It admits it failed, for good reason, on two. I’m still sorting out the remaining 31. One of them was `https://en.wikipedia.org/wiki/M%C3%BCnchhausen_trilemma`…\n\nHere’s the script:\n\n\n    import requests\n    from urllib.parse import urlparse, urlunparse, unquote\n    import json\n    import os\n    import os.path\n    import sys\n\n    class ReadeckAPI:\n        def __init__(self, BASE_URL, API_TOKEN):\n            self.BASE_URL = BASE_URL\n            self.API_TOKEN = API_TOKEN\n            self.access_token = None\n\n        def fetch_bookmarks(self):\n            params = {'limit': 100}\n            headers = {'Authorization': f'Bearer {self.API_TOKEN}'}\n\n            # params = {}\n            endpoint = f\"{self.BASE_URL}/api/bookmarks\"\n            self.data = []\n            def fetch_page(endpoint):\n                response = requests.get(endpoint,\n                                    headers=headers,\n                                    params=params)\n                response.raise_for_status()\n                newdata = response.json()\n                # print(newdata)\n                self.data = self.data + newdata\n                # print(self.data)\n                print(len(newdata))\n                print(len(self.data))\n                if  response.headers['Current-Page'] != response.headers['Total-Pages']:\n                    print(f\"Page {response.headers['Current-Page']} of {response.headers['Total-Pages']}\")\n                    # print(f\"Fetching {response.links['next']['url']}\")\n                    fetch_page(response.links['next']['url'])\n\n            fetch_page(endpoint)\n\n        def get_redirect_destination(self,url):\n            try:\n                headers = {}\n                # headers = {'User-Agent':'If you get 403 errors or redirect following, try putting something in here'}\n                r = 
requests.head(url, headers=headers,allow_redirects=True)\n                if r.status_code == 301 or r.status_code == 200:\n                    return r.url\n                else:\n                    print(r.status_code)\n                    return False\n            except Exception as e:\n                print(f\"Redir err:{e}\")\n\n        def star_article_by_url(self, urls):\n            \"\"\"Star an article by its URL\"\"\"\n            for bookmark in self.data:\n                for url in urls:\n                    if url in bookmark['url'] or bookmark['url'] in url:\n                        print(f\"Starring {bookmark['id']} ({url})\")\n\n                        # Star the article\n                        headers = {'Authorization': f'Bearer {self.API_TOKEN}'}\n                        response = requests.patch(\n                            f\"{self.BASE_URL}/api/bookmarks/{bookmark['id']}\",\n                            headers=headers,\n                            data={'is_marked': True}\n                        )\n                        response.raise_for_status()\n\n                        return True\n                    else:\n                        # print(f\"URL doesn't match ({url} != {bookmark['url']}\")\n                        continue\n            print(f\"URL not found in bookmarks: {urls}\")\n            return False\n\n    # Initialize API client\n    Readeck = ReadeckAPI(\n        BASE_URL=os.environ[\"READECK_BASE_URL\"],\n        API_TOKEN=os.environ[\"READECK_API_TOKEN\"],\n    )\n\n    if __name__ == \"__main__\":\n        if len(sys.argv) != 2:\n          print(\"Usage: python script.py <article_url>\")\n          sys.exit(1)\n\n        url = sys.argv[1]\n\n        # Readeck.fetch_bookmarks()\n\n        # Cache all bookmarks for bulk starring\n        if os.path.isfile('/tmp/readeck-bookmarks.json') != True:\n            Readeck.fetch_bookmarks()\n            with open('/tmp/readeck-bookmarks.json', 'w+', encoding='utf-8') as f:\n  
              json.dump(Readeck.data, f, ensure_ascii=False, indent=4)\n        else:\n            with open('/tmp/readeck-bookmarks.json') as f:\n                Readeck.data = json.load(f)\n\n        # success = Readeck.star_article_by_url(url)\n        # print(\"success\",success)\n        # print(\"trying redirect\")\n        u = urlparse(url)\n        # print(u)\n        newu = u._replace(scheme=\"\",fragment=\"\",query=\"\")\n        # print(newu)\n        cleaned_url= urlunparse(newu)\n        # print(f\"Trying with cleaned URL {cleaned_url}\")\n\n        cleaned_decoded_url = unquote(url)\n        # print(f\"Trying with cleaned idecoded URL {cleaned_decoded_url}\")\n        urls = [url,cleaned_url,cleaned_decoded_url]\n        success = Readeck.star_article_by_url(urls)\n        if success == True:\n          exit(0)\n\n        redirected_url =  Readeck.get_redirect_destination(url)\n        if redirected_url and redirected_url != url:\n            # print(\"redir url not match url\")\n            # print(type(redirected_url))\n            # print(f\"URL redirects to {redirected_url}. Trying that URL.\")\n            u = urlparse(redirected_url)\n            # print(u)\n            newu = u._replace(scheme=\"\",fragment=\"\",query=\"\")\n            # print(newu)\n            cleaned_redirected_url = urlunparse(newu)\n            cleaned_redirected_decoded_url = unquote(cleaned_redirected_url)\n            # print(f\"Trying with cleaned redirected URL {cleaned_redirected_url}\")\n            redirected_urls = [redirected_url,cleaned_redirected_url,cleaned_redirected_decoded_url]\n            urls = urls + redirected_urls\n            success = Readeck.star_article_by_url(redirected_urls)\n            if success == True:\n                exit(0)\n\n\n        print(f\"Total fail for url: {url} ({urls})\")\n        exit(1)\n",
  "title": "Script: bulk star Readeck entries by URL (with URL cleaning)",
  "updatedAt": "2026-02-05T02:48:38.978Z"
}
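
The URL-cleaning step described in item 4 of `textContent` can be sketched in isolation. This is a minimal sketch, not the author's code: `candidate_urls` and `matches` are hypothetical helper names, and the `example.com` URLs are placeholders. It builds the same kind of variants the script tries (original, cleaned, percent-decoded) and applies the same two-way substring comparison.

```python
from urllib.parse import unquote, urlparse, urlunparse


def candidate_urls(url):
    """Return the original URL plus cleaned variants, without duplicates.

    Hypothetical helper: the name and the exact set of variants are
    assumptions made for illustration.
    """
    parts = urlparse(url)
    # Drop scheme, query string, and fragment. Because the host is kept,
    # urlunparse leaves a leading "//" on the result.
    cleaned = urlunparse(parts._replace(scheme="", query="", fragment=""))
    candidates = [url, cleaned, unquote(url), unquote(cleaned)]
    # dict.fromkeys removes repeats while preserving order.
    return list(dict.fromkeys(candidates))


def matches(candidate, bookmark_url):
    """Substring match in either direction, as the script's lookup does."""
    return candidate in bookmark_url or bookmark_url in candidate


if __name__ == "__main__":
    for variant in candidate_urls("https://example.com/a%20b?x=1#top"):
        print(variant)
```

Because the cleaned form keeps the `//host/path` prefix, it still matches a stored bookmark URL that differs only in scheme, query string, or fragment.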