{
  "$type": "site.standard.document",
  "content": {
    "$type": "site.standard.content.markdown",
    "text": "I've been using [Claude Code to take care of scrappy data cleaning tasks](/scrappy-data-cleaning) for a while. These days, though, I'm using [Codex](https://github.com/openai/codex) as my coding agent. Similar to what I did with Claude, I've been \"fine-tuning\" Codex CLI to work on a few different vaguely defined tasks like classification, voting, filtering, or ranking.\n\nThe pattern in this post works surprisingly well under the following conditions:\n\n- Loosely defined, open-ended tasks. For example, tagging tweets with a set of predefined labels, extracting structured information from a GitHub issue, ...\n- Powerful agentic capabilities. The task requires something more than a simple [`llm` call](https://github.com/simonw/llm) or [PydanticAI](https://ai.pydantic.dev/) script. For example, using the `gh api` CLI to get the number of stars of a repository.\n- [Structured outputs](https://github.com/openai/codex/blob/main/docs/exec.md#structured-output). You need a response in a certain shape! This is something `codex exec` can do that `claude` couldn't, and it's really powerful. For example, return exactly `True` or `False` and nothing else.\n- Save money! Unlike `llm` or other tools/libraries that require an `OPENAI_API_KEY`, Codex can use your ChatGPT subscription, making things \"free\".\n\nLet's walk through an example of how everything fits together. We'll build a very silly command, `chooser`, that compares two GitHub repositories and returns the one that is friendlier to new users.\n\n## How\n\nSince we want an isolated `codex exec` experience, the first step is to create a folder that acts as the new agent home.\n\n```bash\nmkdir ~/.chooser\n```\n\nOnce the folder exists, sign in with your ChatGPT subscription by running `CODEX_HOME=~/.chooser codex`.\n\nNow, add an `AGENTS.md` file there to act as the \"persona\" or \"instructions\" prompt across all calls. Here is mine.\n\n```md\n# Chooser\n\nChoose which project is friendlier to new users. 
Take into account DX and repository activity.\n\n## Skills\n\n- **GitHub Stats**. Use `gh api` to explore repository stats, descriptions, and any other metadata that can help you decide if the answer is not clear.\n- **GitHub Readme**. You can read any repository README with `gh api repos/OWNER/REPO/readme --jq '.content' | base64 --decode`.\n```\n\nYou can also add a [configuration file (`config.toml`)](https://github.com/openai/codex/blob/main/docs/config.md) to pin parameters like `model` or `model_reasoning_effort`.\n\nIf your task maps to a [structured output](https://github.com/openai/codex/blob/main/docs/exec.md#structured-output), create a JSON Schema file, `schema.json`, in `~/.chooser` describing the expected response. This is the one I used for `chooser`.\n\n```json\n{\n  \"type\": \"object\",\n  \"properties\": {\n    \"winner\": {\n      \"type\": \"string\",\n      \"enum\": [\"item_a\", \"item_b\"]\n    }\n  },\n  \"required\": [\"winner\"],\n  \"additionalProperties\": false\n}\n```\n\nFinally, call `codex exec` to tie everything together.\n\n```bash\nCODEX_HOME=~/.chooser codex exec --full-auto --output-schema ~/.chooser/schema.json \"\n  Which project is friendlier to new users? Check contributor diversity.\n  <item_a>Flask</item_a>\n  <item_b>Django</item_b>\n\"\n```\n\nIn my case, Codex took around 2 minutes (with reasoning effort set to medium), made a bunch of `gh` CLI calls, ran a couple of web searches, and returned `{\"winner\":\"item_b\"}`. Yay! 🎉\n\nTo wrap things up, I created a bash script to give it a nicer interface. I wanted to be able to run `chooser ItemA ItemB` and get back just the JSON, with no extra logs or context. Here is the script. 
Save it as `~/.local/bin/chooser`, make it executable with `chmod +x`, and call it from anywhere!\n\n```bash\n#!/usr/bin/env bash\nset -euo pipefail\n\nexport CODEX_HOME=~/.chooser\n\nif (( $# < 2 )); then\n  echo \"Usage: ${0##*/} ITEM_A ITEM_B [CODEX_ARGS...]\" >&2\n  exit 1\nfi\n\nitem_a=\"$1\"\nitem_b=\"$2\"\nshift 2\n\nprompt=$(cat <<EOF\nWhich project is friendlier to new users?\n<options>\n<item_a>${item_a}</item_a>\n<item_b>${item_b}</item_b>\n</options>\nEOF\n)\n\ncodex_args=(\n  exec\n  -m gpt-5\n  -c model_reasoning_effort=\"medium\"\n  -c model_verbosity=\"low\"\n  --output-schema \"$CODEX_HOME/schema.json\"\n  --full-auto\n  --skip-git-repo-check\n)\n\n# Forward any extra arguments to codex exec\nif (( $# > 0 )); then\n  codex_args+=(\"$@\")\nfi\n\n# Discard the activity log on stderr; only the final JSON reaches stdout\ncodex \"${codex_args[@]}\" \"$prompt\" 2>/dev/null\n```\n\nI also wrote a Python script to act as the orchestrator. In my tests, I could run up to 500 `chooser` instances concurrently without being rate-limited. Quite wild, considering that's GPT-5 with thinking!\n\n## Conclusion\n\nI've run more than 10,000 invocations of `chooser` and friends for loosely defined tasks, and I'm really happy with the results. The main drawback I'd note is that the \"fine-tuned\" command still carries the [`codex` system prompt](https://github.com/openai/codex/blob/bac7acaa7c3476361859905f708eba82c53abf68/codex-rs/core/gpt_5_codex_prompt.md). One could fork Codex and change it, but my hunch is that this doesn't work if you are using the ChatGPT subscription.\n\nI'm not sure how much I would have spent using plain OpenAI API calls, but having this pattern at hand makes experimenting a bit less scary for me. Also, I don't need to worry about model costs; I just need to check that the job fits within the usage limits for the current time window!",
    "version": "1.0"
  },
  "description": "I've been using Claude Code to take care of scrappy data cleaning tasks for a while. These days though, I'm using Codex as my coding agent. Similar to what I did with Claude, I've been \"fine-tuning\" Codex CLI to work on a few different vaguely defined tasks like classification...",
  "path": "/specializing-codex",
  "publishedAt": "2025-10-22T00:00:00.000Z",
  "site": "at://did:plc:4z5i7njrld66ew36htufcwry/site.standard.publication/3mo43d2tmt2ov",
  "textContent": "I've been using Claude Code to take care of scrappy data cleaning tasks for a while. These days, though, I'm using Codex as my coding agent. Similar to what I did with Claude, I've been \"fine-tuning\" Codex CLI to work on a few different vaguely defined tasks like classification, voting, filtering, or ranking.\n\nThe pattern in this post works surprisingly well under the following conditions:\nLoosely defined, open-ended tasks. For example, tagging tweets with a set of predefined labels, extracting structured information from a GitHub issue, ...\nPowerful agentic capabilities. The task requires something more than a simple llm call or PydanticAI script. For example, using the gh api CLI to get the number of stars of a repository.\nStructured outputs. You need a response in a certain shape! This is something codex exec can do that claude couldn't, and it's really powerful. For example, return exactly True or False and nothing else.\nSave money! Unlike llm or other tools/libraries that require an OPENAI_API_KEY, Codex can use your ChatGPT subscription, making things \"free\".\n\nLet's walk through an example of how everything fits together. We'll build a very silly command, chooser, that compares two GitHub repositories and returns the one that is friendlier to new users.\n\nHow\n\nSince we want an isolated codex exec experience, the first step is to create a folder that acts as the new agent home.\n\nOnce the folder exists, sign in with your ChatGPT subscription by running CODEX_HOME=~/.chooser codex.\n\nNow, add an AGENTS.md file there to act as the \"persona\" or \"instructions\" prompt across all calls. Here is mine.\n\nYou can also add a configuration file (config.toml) to pin parameters like model or model_reasoning_effort.\n\nIf your task maps to a structured output, create a JSON Schema file, schema.json, in ~/.chooser describing the expected response. 
This is the one I used for chooser.\n\nFinally, call codex exec to tie everything together.\n\nIn my case, Codex took around 2 minutes (with reasoning effort set to medium), made a bunch of gh CLI calls, ran a couple of web searches, and returned {\"winner\":\"item_b\"}. Yay! 🎉\n\nTo wrap things up, I created a bash script to give it a nicer interface. I wanted to be able to run chooser ItemA ItemB and get back just the JSON, with no extra logs or context. Here is the script. Save it as ~/.local/bin/chooser, make it executable with chmod +x, and call it from anywhere!\n\nI also wrote a Python script to act as the orchestrator. In my tests, I could run up to 500 chooser instances concurrently without being rate-limited. Quite wild, considering that's GPT-5 with thinking!\n\nConclusion\n\nI've run more than 10,000 invocations of chooser and friends for loosely defined tasks, and I'm really happy with the results. The main drawback I'd note is that the \"fine-tuned\" command still carries the codex system prompt. One could fork Codex and change it, but my hunch is that this doesn't work if you are using the ChatGPT subscription.\n\nI'm not sure how much I would have spent using plain OpenAI API calls, but having this pattern at hand makes experimenting a bit less scary for me. Also, I don't need to worry about model costs; I just need to check that the job fits within the usage limits for the current time window!",
  "title": "Specializing Codex"
}