{
"$type": "site.standard.document",
"canonicalUrl": "https://rednafi.com/python/dataclasses-and-methods/",
"description": "Why you shouldn't add state-mutating methods to Python dataclasses. Maintain semantic clarity and use dataclasses as pure data containers.",
"path": "/python/dataclasses-and-methods/",
"publishedAt": "2023-12-16T00:00:00.000Z",
"site": "at://did:plc:fgtm2c26vfcj74rfmeggbyqj/site.standard.publication/3mnl6f7ob462z",
"tags": [
"Python"
],
"textContent": "Data classes are containers for your data - not behavior. The delineation is right there in\nthe name. Yet, I see state-mutating methods getting crammed into data classes and polluting\ntheir semantics all the time. While this text will primarily talk about data classes in\nPython, the message remains valid for any language that supports data classes and allows you\nto add state-mutating methods to them, e.g., Kotlin, Swift, etc. By state-mutating method, I\nmean methods that change attribute values during runtime. For instance:\n\nIn this case, calling the make_older method will change the value of age in-place.\n\nEvery time I spot a data class decked out with such methods, I feel like I'm looking at the\n[penguin with an elephant head] from the Family Guy. Whenever I traverse down to see how the\ninstances of the class are being used, more often than not, I find them being treated just\nlike regular mutable class instances with fancy reprs. But if you only need a nice repr\nfor your large OO class, adding a __repr__ to the class definition is not that difficult.\nWhy pay the price for building heavier data class instances only for that?\n\nIn Python, data classes are [considerably slower] to define and import compared to vanilla\nclasses. However, they serve a different purpose than your typical run-of-the-mill classes.\nWhen you decorate a class with the @dataclass decorator without changing any of the\ndefault parameters, Python automatically generates __init__, __eq__, and __repr__\nmethods. If you set @dataclass(order=True), it'll also generate __lt__, __le__,\n__gt__, and __ge__ special methods that enable you to compare and sort the data class\ninstances. All of this implicates that the construct was specifically designed to contain\nrich data that provides the means for you to create nice abstractions around lower-level\nprimitives.\n\nMy gripe isn't against using data classes because of their heavier size. 
If it were, Python\nprobably wouldn't be one of my favorite languages. I use data classes all the time and love\nhow they often let me craft nicer APIs with little effort. My issue is when people add\nstate-mutating methods to data classes. The moment you do that, you break the\nsemantics of the data structure. You probably wouldn't use hashmaps to represent sequential\ndata, even though Python [maintains insertion order] for dict keys.\n\nIn Kotlin, I almost always define immutable data classes and pass them around to different\nfunctions that perform transformations and calculations. In Python, however, instantiating\nfrozen data classes (@dataclass(frozen=True)) is almost [twice as slow as mutable ones].\nSo I just set slots=True (available since Python 3.10) to make instantiation quicker and call it a day. But in\neither case, if I need to add a method that mutates the attributes of the class instance, I\nreconsider whether a data class is the right abstraction for the problem at hand. The\nneed for a state-mutating method is a sign that you want a regular OO class.\nYou'll signal the wrong intent to the reader if you keep using data classes in this context.\n\nData classes are also great candidates for domain modeling with types. With the help of mypy,\nyou can leverage [sum types] to emulate [ADTs], for example by declaring a union of small data\nclasses such as Barcode and Sku with [PEP-695] generic syntax.\n\nBut this only works if your data containers don't exhibit any behavior. Here, the data classes\nare just labels: each one tags a variant in the set of values that the union can hold. Adding\nstate-mutating methods to either Barcode or Sku would break the semantics of how these\ntypes compose.\n\nI still think it's fine to validate the data class attributes in a\n__post_init__ method or to override __eq__ or __hash__ when you need to. Read-only\nmethods are also acceptable since they don't modify state in place. 
Comparing two\ndata class instances that only have read-only methods is simple: calling those methods never\nchanges the result of ==. With methods that mutate attributes, equality depends on which\nmethods ran earlier, and that's awkward to reason about. So if you feel the need to slap a\nstate-mutating method on a data class, write a function that takes the instance as a parameter,\nor write a normal class with a __repr__ and add the method there. This way, the reader won't\nhave to wonder whether your data containers have some hidden behavior attached to them.\n\n[penguin with an elephant head]:\n https://i.imgflip.com/3gb0nh.jpg?a472776\n\n[considerably slower]:\n https://discuss.python.org/t/improving-dataclasses-startup-performance/15442/20\n\n[maintains insertion order]:\n https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/en/\n\n[twice as slow as mutable ones]:\n https://docs.python.org/3.12/library/dataclasses.html#frozen-instances\n\n[sum types]:\n https://fsharpforfunandprofit.com/posts/discriminated-unions/\n\n[ADTs]:\n https://threeofwands.com/algebraic-data-types-in-python/\n\n[PEP-695]:\n https://peps.python.org/pep-0695/",
"title": "Banish state-mutating methods from data classes"
}