{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidi2mge7czfrksgfcnrafbd63twqysd4cjd4kzo42rytygesyc2xy",
"uri": "at://did:plc:46ti67tc37qcmwp2vaynk6fq/app.bsky.feed.post/3meiowr4okyn2"
},
"path": "/blog/debusine-write-task/",
"publishedAt": "2026-02-10T09:53:13.634Z",
"site": "https://www.freexian.com",
"tags": [
"Debusine",
"debusine.debian.net",
"salsa.debian.org",
"Sbuild",
"Lintian",
"Debdiff",
"available tasks",
"worker task",
"reprotest",
"Contribute section",
"we might add",
"lookup",
"@staticmethod",
"@staticmethod"
],
"textContent": "Debusine is a tool designed for Debian developers and Operating System developers in general. You can try out Debusine on debusine.debian.net, and follow its development on salsa.debian.org.\n\nThis post describes how to write a new worker task for Debusine. It can be used to add tasks to a self-hosted Debusine instance, or to submit to the Debusine project new tasks to add new capabilities to Debusine.\n\nTasks are the lower-level pieces of Debusine workflows. Examples of tasks are Sbuild, Lintian, Debdiff (see the available tasks).\n\nThis post will document the steps to write a new basic worker task. The example will add a worker task that runs reprotest and creates an artifact of the new type `ReprotestArtifact` with the reprotest log.\n\nTasks are usually used by workflows. Workflows solve high-level goals by creating and orchestrating different tasks (e.g. a Sbuild workflow would create different Sbuild tasks, one for each architecture).\n\n## Overview of tasks\n\nA task usually does the following:\n\n * It receives structured data defining its input artifacts and configuration\n * Input artifacts are downloaded\n * A process is run by the worker (e.g. `lintian`, `debdiff`, etc.). In this blog post, it will run `reprotest`\n * The output (files, logs, exit code, etc.) is analyzed, artifacts and relations might be generated, and the work request is marked as completed, either with `Success` or `Failure`\n\n\n\nIf you want to follow the tutorial and add the `Reprotest` task, your Debusine development instance should have at least one worker, one user, a debusine client set up, and permissions for the client to create tasks. All of this can be setup following the steps in the Contribute section of the documentation.\n\nThis blog post shows a functional `Reprotest` task. This task is not currently part of Debusine. 
The Reprotest task implementation is simplified (no error handling, unit tests, specific view, docs, some shortcuts in the environment preparation, etc.). At some point, in Debusine, we might add a `debrebuild` task which is based on buildinfo files and uses snapshot.debian.org to recreate the binary packages.\n\n## Defining the inputs of the task\n\nThe input of the reprotest task will be a source artifact (a Debian source package). We model the input with pydantic in `debusine/tasks/models.py`:\n\n\n class ReprotestData(BaseTaskDataWithExecutor):\n \"\"\"Data for Reprotest task.\"\"\"\n\n source_artifact: LookupSingle\n\n class ReprotestDynamicData(BaseDynamicTaskDataWithExecutor):\n \"\"\"Reprotest dynamic data.\"\"\"\n\n source_artifact_id: int | None = None\n\n\nThe `ReprotestData` is what the user will input. A `LookupSingle` is a lookup that resolves to a single artifact.\n\nWe would also have configuration for the desired `variations` to test, but we have left that out of this example for simplicity. Configuring variations is left as an exercise for the reader.\n\nSince `ReprotestData` is a subclass of `BaseTaskDataWithExecutor` it also contains `environment` where the user can specify in which environment the task will run. The environment is an artifact with a Debian image.\n\nThe `ReprotestDynamicData` holds the resolution of all lookups. 
These can be seen in the “Internals” tab of the work request view.\n\n## Add the new `Reprotest` artifact data class\n\nFor the reprotest task to create a new artifact of type `DebianReprotest` holding the log and output metadata, first add the new category to `ArtifactCategory` in `debusine/artifacts/models.py`:\n\n\n REPROTEST = \"debian:reprotest\"\n\n\nIn the same file add the `DebianReprotest` class:\n\n\n class DebianReprotest(ArtifactData):\n \"\"\"Data for debian:reprotest artifacts.\"\"\"\n\n reproducible: bool | None = None\n\n def get_label(self) -> str:\n \"\"\"Return a short human-readable label for the artifact.\"\"\"\n return \"reprotest analysis\"\n\n\nIt could also include the package name or version.\n\nTo have the category listed in the work request’s output artifacts table, edit the file `debusine/db/models/artifacts.py`: in `ARTIFACT_CATEGORY_ICON_NAMES` add `ArtifactCategory.REPROTEST: \"folder\",` and in `ARTIFACT_CATEGORY_SHORT_NAMES` add `ArtifactCategory.REPROTEST: \"reprotest\",`.\n\n## Create the new Task class\n\nIn `debusine/tasks/` create a new file `reprotest.py`.\n\nreprotest.py\n\n\n # Copyright © The Debusine Developers\n # See the AUTHORS file at the top-level directory of this distribution\n #\n # This file is part of Debusine. It is subject to the license terms\n # in the LICENSE file found in the top-level directory of this\n # distribution. 
No part of Debusine, including this file, may be copied,\n # modified, propagated, or distributed except according to the terms\n # contained in the LICENSE file.\n\n \"\"\"Task to use reprotest in debusine.\"\"\"\n\n from pathlib import Path\n from typing import Any\n\n from debusine import utils\n from debusine.artifacts.local_artifact import ReprotestArtifact\n from debusine.artifacts.models import (\n ArtifactCategory,\n CollectionCategory,\n DebianSourcePackage,\n DebianUpload,\n WorkRequestResults,\n get_source_package_name,\n get_source_package_version,\n )\n from debusine.client.models import RelationType\n from debusine.tasks import BaseTaskWithExecutor, RunCommandTask\n from debusine.tasks.models import ReprotestData, ReprotestDynamicData\n from debusine.tasks.server import TaskDatabaseInterface\n\n\n class Reprotest(\n RunCommandTask[ReprotestData, ReprotestDynamicData],\n BaseTaskWithExecutor[ReprotestData, ReprotestDynamicData],\n ):\n \"\"\"Task to use reprotest in debusine.\"\"\"\n\n TASK_VERSION = 1\n\n CAPTURE_OUTPUT_FILENAME = \"reprotest.log\"\n\n def __init__(\n self,\n task_data: dict[str, Any],\n dynamic_task_data: dict[str, Any] | None = None,\n ) -> None:\n \"\"\"Initialize object.\"\"\"\n super().__init__(task_data, dynamic_task_data)\n\n self._reprotest_target: Path | None = None\n\n def build_dynamic_data(\n self, task_database: TaskDatabaseInterface\n ) -> ReprotestDynamicData:\n \"\"\"Compute and return ReprotestDynamicData.\"\"\"\n input_source_artifact = task_database.lookup_single_artifact(\n self.data.source_artifact\n )\n\n assert input_source_artifact is not None\n self.ensure_artifact_categories(\n configuration_key=\"input.source_artifact\",\n category=input_source_artifact.category,\n expected=(\n ArtifactCategory.SOURCE_PACKAGE,\n ArtifactCategory.UPLOAD,\n ),\n )\n assert isinstance(\n input_source_artifact.data, (DebianSourcePackage, DebianUpload)\n )\n subject = get_source_package_name(input_source_artifact.data)\n version 
= get_source_package_version(input_source_artifact.data)\n\n assert self.data.environment is not None\n\n environment = self.get_environment(\n task_database,\n self.data.environment,\n default_category=CollectionCategory.ENVIRONMENTS,\n )\n\n return ReprotestDynamicData(\n source_artifact_id=input_source_artifact.id,\n subject=subject,\n parameter_summary=f\"{subject}_{version}\",\n environment_id=environment.id,\n )\n\n def get_input_artifacts_ids(self) -> list[int]:\n \"\"\"Return the list of input artifact IDs used by this task.\"\"\"\n if not self.dynamic_data:\n return []\n\n return [\n self.dynamic_data.source_artifact_id,\n self.dynamic_data.environment_id,\n ]\n\n def fetch_input(self, destination: Path) -> bool:\n \"\"\"Download the required artifacts.\"\"\"\n assert self.dynamic_data\n\n artifact_id = self.dynamic_data.source_artifact_id\n assert artifact_id is not None\n self.fetch_artifact(artifact_id, destination)\n\n return True\n\n def configure_for_execution(self, download_directory: Path) -> bool:\n \"\"\"\n Find a .dsc in download_directory.\n\n Install reprotest and other utilities used in _cmdline.\n Set self._reprotest_target to it.\n\n :param download_directory: where to search the files\n :return: True if valid files were found\n \"\"\"\n self._prepare_executor_instance()\n\n if self.executor_instance is None:\n raise AssertionError(\"self.executor_instance cannot be None\")\n\n self.run_executor_command(\n [\"apt-get\", \"update\"],\n log_filename=\"install.log\",\n run_as_root=True,\n check=True,\n )\n self.run_executor_command(\n [\n \"apt-get\",\n \"--yes\",\n \"--no-install-recommends\",\n \"install\",\n \"reprotest\",\n \"dpkg-dev\",\n \"devscripts\",\n \"equivs\",\n \"sudo\",\n ],\n log_filename=\"install.log\",\n run_as_root=True,\n )\n\n self._reprotest_target = utils.find_file_suffixes(\n download_directory, [\".dsc\"]\n )\n return True\n\n def _cmdline(self) -> list[str]:\n \"\"\"\n Build the reprotest command line.\n\n Use 
configuration of self.data and self._reprotest_target.\n \"\"\"\n target = self._reprotest_target\n assert target is not None\n\n cmd = [\n \"bash\",\n \"-c\",\n f\"TMPDIR=/tmp ; cd /tmp ; dpkg-source -x {target} package/; \"\n \"cd package/ ; mk-build-deps ; apt-get install --yes ./*.deb ; \"\n \"rm *.deb ; \"\n \"reprotest --vary=-time,-user_group,-fileordering,-domain_host .\",\n ]\n\n return cmd\n\n @staticmethod\n def _cmdline_as_root() -> bool:\n r\"\"\"apt-get install --yes ./\\*.deb must be run as root.\"\"\"\n return True\n\n def task_result(\n self,\n returncode: int | None,\n execute_directory: Path, # noqa: U100\n ) -> WorkRequestResults:\n \"\"\"\n Evaluate task output and return success.\n\n For a successful run of reprotest:\n -must have the output file\n -exit code is 0\n\n :return: WorkRequestResults.SUCCESS or WorkRequestResults.FAILURE.\n \"\"\"\n reprotest_file = execute_directory / self.CAPTURE_OUTPUT_FILENAME\n\n if reprotest_file.exists() and returncode == 0:\n return WorkRequestResults.SUCCESS\n\n return WorkRequestResults.FAILURE\n\n def upload_artifacts(\n self, exec_directory: Path, *, execution_result: WorkRequestResults\n ) -> None:\n \"\"\"Upload the ReprotestArtifact with the files and relationships.\"\"\"\n if not self.debusine:\n raise AssertionError(\"self.debusine not set\")\n\n assert self.dynamic_data is not None\n assert self.dynamic_data.parameter_summary is not None\n\n reprotest_artifact = ReprotestArtifact.create(\n reprotest_output=exec_directory / self.CAPTURE_OUTPUT_FILENAME,\n reproducible=execution_result == WorkRequestResults.SUCCESS,\n package=self.dynamic_data.parameter_summary,\n )\n\n uploaded = self.debusine.upload_artifact(\n reprotest_artifact,\n workspace=self.workspace_name,\n work_request=self.work_request_id,\n )\n\n assert self.dynamic_data is not None\n assert self.dynamic_data.source_artifact_id is not None\n self.debusine.relation_create(\n uploaded.id,\n self.dynamic_data.source_artifact_id,\n 
 RelationType.RELATES_TO,\n )\n\n\nIn order for Debusine to discover the task, add `\"Reprotest\"` to the `__all__` list in the file `debusine/tasks/__init__.py`.\n\nLet’s go through the main methods of the `Reprotest` class, with a basic explanation of each:\n\n### `build_dynamic_data` method\n\nThe worker has no access to Debusine’s database. Lookups are all resolved before the task gets dispatched to a worker, so all the worker has to do is download the specified input artifacts.\n\nThe `build_dynamic_data` method looks up the artifact, asserts that it has a valid category, extracts the package name and version, and gets the environment in which the task will be executed.\n\nThe `environment` is needed to run the task (`reprotest` will run in a container using `unshare`, `incus`…).\n\n\n def build_dynamic_data(\n self, task_database: TaskDatabaseInterface\n ) -> ReprotestDynamicData:\n \"\"\"Compute and return ReprotestDynamicData.\"\"\"\n input_source_artifact = task_database.lookup_single_artifact(\n self.data.source_artifact\n )\n\n assert input_source_artifact is not None\n self.ensure_artifact_categories(\n configuration_key=\"input.source_artifact\",\n category=input_source_artifact.category,\n expected=(\n ArtifactCategory.SOURCE_PACKAGE,\n ArtifactCategory.UPLOAD,\n ),\n )\n assert isinstance(\n input_source_artifact.data, (DebianSourcePackage, DebianUpload)\n )\n subject = get_source_package_name(input_source_artifact.data)\n version = get_source_package_version(input_source_artifact.data)\n\n assert self.data.environment is not None\n\n environment = self.get_environment(\n task_database,\n self.data.environment,\n default_category=CollectionCategory.ENVIRONMENTS,\n )\n\n return ReprotestDynamicData(\n source_artifact_id=input_source_artifact.id,\n subject=subject,\n parameter_summary=f\"{subject}_{version}\",\n environment_id=environment.id,\n )\n\n\n### `get_input_artifacts_ids` method\n\nUsed to list the task’s input artifacts in the web UI.\n\n\n 
def get_input_artifacts_ids(self) -> list[int]:\n \"\"\"Return the list of input artifact IDs used by this task.\"\"\"\n if not self.dynamic_data:\n return []\n\n assert self.dynamic_data.source_artifact_id is not None\n return [self.dynamic_data.source_artifact_id]\n\n\n### `fetch_input` method\n\nDownload the required artifacts on the worker.\n\n\n def fetch_input(self, destination: Path) -> bool:\n \"\"\"Download the required artifacts.\"\"\"\n assert self.dynamic_data\n\n artifact_id = self.dynamic_data.source_artifact_id\n assert artifact_id is not None\n self.fetch_artifact(artifact_id, destination)\n\n return True\n\n\n### `configure_for_execution` method\n\nInstall the packages needed by the task and set `_reprotest_target`, which is used to build the task’s command line.\n\n\n def configure_for_execution(self, download_directory: Path) -> bool:\n \"\"\"\n Find a .dsc in download_directory.\n\n Install reprotest and other utilities used in _cmdline.\n Set self._reprotest_target to it.\n\n :param download_directory: where to search the files\n :return: True if valid files were found\n \"\"\"\n self._prepare_executor_instance()\n\n if self.executor_instance is None:\n raise AssertionError(\"self.executor_instance cannot be None\")\n\n self.run_executor_command(\n [\"apt-get\", \"update\"],\n log_filename=\"install.log\",\n run_as_root=True,\n check=True,\n )\n self.run_executor_command(\n [\n \"apt-get\",\n \"--yes\",\n \"--no-install-recommends\",\n \"install\",\n \"reprotest\",\n \"dpkg-dev\",\n \"devscripts\",\n \"equivs\",\n \"sudo\",\n ],\n log_filename=\"install.log\",\n run_as_root=True,\n )\n\n self._reprotest_target = utils.find_file_suffixes(\n download_directory, [\".dsc\"]\n )\n return True\n\n\n### `_cmdline` method\n\nReturn the command line to run the task.\n\nIn this case, and to keep the example simple, we will run `reprotest` directly in the worker’s executor VM/container, without giving it an isolated virtual server.\n\nSo, this command 
installs the build dependencies required by the package (so `reprotest` can build it) and runs reprotest itself.\n\n\n def _cmdline(self) -> list[str]:\n \"\"\"\n Build the reprotest command line.\n\n Use configuration of self.data and self._reprotest_target.\n \"\"\"\n target = self._reprotest_target\n assert target is not None\n\n cmd = [\n \"bash\",\n \"-c\",\n f\"TMPDIR=/tmp ; cd /tmp ; dpkg-source -x {target} package/; \"\n \"cd package/ ; mk-build-deps ; apt-get install --yes ./*.deb ; \"\n \"rm *.deb ; \"\n \"reprotest --vary=-time,-user_group,-fileordering,-domain_host .\",\n ]\n\n return cmd\n\n\nSome reprotest variations are disabled to keep the example simple, both in the set of packages to install and in the reprotest features used.\n\n### `_cmdline_as_root` method\n\nBecause the command installs packages during execution, it must run as root (in the container):\n\n\n @staticmethod\n def _cmdline_as_root() -> bool:\n r\"\"\"apt-get install --yes ./\\*.deb must be run as root.\"\"\"\n return True\n\n\n### `task_result` method\n\nThe task succeeds if a log is generated and the return code is 0.\n\n\n def task_result(\n self,\n returncode: int | None,\n execute_directory: Path, # noqa: U100\n ) -> WorkRequestResults:\n \"\"\"\n Evaluate task output and return success.\n\n For a successful run of reprotest:\n -must have the output file\n -exit code is 0\n\n :return: WorkRequestResults.SUCCESS or WorkRequestResults.FAILURE.\n \"\"\"\n reprotest_file = execute_directory / self.CAPTURE_OUTPUT_FILENAME\n\n if reprotest_file.exists() and returncode == 0:\n return WorkRequestResults.SUCCESS\n\n return WorkRequestResults.FAILURE\n\n\n### `upload_artifacts` method\n\nCreate the `ReprotestArtifact` with the log and the reproducible boolean, upload it, and then add a relation between the `ReprotestArtifact` and the source package:\n\n\n def upload_artifacts(\n self, exec_directory: Path, *, execution_result: WorkRequestResults\n ) -> None:\n \"\"\"Upload the ReprotestArtifact 
with the files and relationships.\"\"\"\n if not self.debusine:\n raise AssertionError(\"self.debusine not set\")\n\n assert self.dynamic_data is not None\n assert self.dynamic_data.parameter_summary is not None\n\n reprotest_artifact = ReprotestArtifact.create(\n reprotest_output=exec_directory / self.CAPTURE_OUTPUT_FILENAME,\n reproducible=execution_result == WorkRequestResults.SUCCESS,\n package=self.dynamic_data.parameter_summary,\n )\n\n uploaded = self.debusine.upload_artifact(\n reprotest_artifact,\n workspace=self.workspace_name,\n work_request=self.work_request_id,\n )\n\n assert self.dynamic_data is not None\n assert self.dynamic_data.source_artifact_id is not None\n self.debusine.relation_create(\n uploaded.id,\n self.dynamic_data.source_artifact_id,\n RelationType.RELATES_TO,\n )\n\n\n## Execution example\n\nTo run this task in a local Debusine (see steps to have it ready with an environment, permissions and users created), first import a source package:\n\n\n $ python3 -m debusine.client artifact import-debian -w System http://deb.debian.org/debian/pool/main/h/hello/hello_2.10-5.dsc\n\n\n(get the artifact ID from the output of that command)\n\nThe artifact can be seen in `http://$DEBUSINE/debusine/System/artifact/$ARTIFACT_ID/`.\n\nThen create a `reprotest.yaml`:\n\n\n $ cat <<EOF > reprotest.yaml\n source_artifact: $ARTIFACT_ID\n environment: \"debian/match:codename=bookworm\"\n EOF\n\n\nInstead of `debian/match:codename=bookworm`, the environment could also be given as an artifact ID.\n\nFinally, create the work request to run the task:\n\n\n $ python3 -m debusine.client create-work-request -w System reprotest --data reprotest.yaml\n\n\nIn the Debusine web UI you can see the work request, which should go to `Running` status and then `Completed` with `Success` or `Failure` (depending on whether `reprotest` could reproduce the build). The `Output` tab shows an artifact of type `debian:reprotest` with one file: the log. 
The `Metadata` tab of the artifact shows its Data: the package name and `reproducible` (true or false).\n\n## What is left to do?\n\nThis was a simple example of creating a task. Other things that could be done:\n\n * unit tests\n * documentation\n * configurable `variations`\n * running `reprotest` directly on the worker host, using the executor environment as a `reprotest` “virtual server”\n * in this specific example, the command line does many things that other parts of the task, such as `prepare_environment`, could perhaps handle instead\n * integrate it in a workflow so it’s easier to use (e.g. part of `QaWorkflow`)\n * extract more from the log than just pass/fail\n * display the output in a more useful way (implement an artifact specialized view)\n\n",
"title": "Freexian Collaborators: Writing a new worker task for Debusine (by Carles Pina i Estany)",
"updatedAt": "2026-02-10T00:00:00.000Z"
}