Reduce boilerplate code with Python's dataclasses
Recently, my work required me to create many custom data types and compare their instances. As a result, my code was littered with classes that looked something like this:
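A minimal sketch of such a class, assuming three coordinates named x, y, and z (the field names used later in this post). The exact attribute names and repr format are illustrative assumptions:

```python
class CartesianPoint:
    def __init__(self, x, y, z):
        # Store the three coordinates on the instance.
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        # Readable representation used by print() and the REPL.
        return f"CartesianPoint(x={self.x}, y={self.y}, z={self.z})"


point = CartesianPoint(1, 2, 3)
print(point)  # CartesianPoint(x=1, y=2, z=3)
```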
This class only creates a CartesianPoint type and gives its instances a readable printed form. However, it already needs two methods, __init__ and __repr__, that don't do much.
Dataclasses
Let's see how data classes can improve this situation. Data classes were introduced in Python 3.7. They can be regarded as code generators that reduce the amount of boilerplate you need to write when defining classes that mostly hold data. Rewritten with dataclass, the above class looks like this:
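A minimal sketch of the dataclass version, assuming the three fields are annotated as float (the annotation choice is an assumption; dataclass does not enforce it at runtime):

```python
from dataclasses import dataclass


@dataclass
class CartesianPoint:
    # Each annotated name becomes a field and an __init__ parameter.
    x: float
    y: float
    z: float


point = CartesianPoint(1, 2, 3)
print(point)  # CartesianPoint(x=1, y=2, z=3)
```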
In the above code, the magic is done by the dataclass decorator. Data classes require you to annotate each field with a type, and in return the decorator generates methods like __init__, __repr__, and __eq__ for you. You can inspect the methods that dataclass defines automatically via Python's help function.
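Calling help(CartesianPoint) prints the full class documentation, which is fairly long. As a shorter check, the sketch below looks for a few generated methods directly in the class namespace; the list of names being checked is my own choice for illustration:

```python
from dataclasses import dataclass


@dataclass
class CartesianPoint:
    x: float
    y: float
    z: float


# help(CartesianPoint) would print the full listing, including generated methods.
# vars() exposes the class's own namespace, where dataclass installs them.
generated = [
    name
    for name in ("__init__", "__repr__", "__eq__")
    if name in vars(CartesianPoint)
]
print(generated)  # ['__init__', '__repr__', '__eq__']
```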
Using default values
You can provide default values to the fields in the following way:
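A minimal sketch, assuming every coordinate defaults to 0.0 (the default values are illustrative). Note that fields without defaults cannot come after fields with defaults:

```python
from dataclasses import dataclass


@dataclass
class CartesianPoint:
    # Defaults are written as plain assignments after the annotation.
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


origin = CartesianPoint()
point = CartesianPoint(1.0, z=3.0)
print(origin)  # CartesianPoint(x=0.0, y=0.0, z=0.0)
print(point)   # CartesianPoint(x=1.0, y=0.0, z=3.0)
```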
Using arbitrary field type
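If you don't want to commit to a specific field type, you can annotate the field with Any from Python's typing module. The annotation itself is still required, because dataclass uses annotations to discover the fields.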
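A short sketch of this, with deliberately mixed value types to show that any object is accepted (the sample values are arbitrary):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CartesianPoint:
    # Any tells readers and type checkers that no specific type is expected.
    x: Any
    y: Any
    z: Any


point = CartesianPoint(1, "two", [3])
print(point)  # CartesianPoint(x=1, y='two', z=[3])
```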
Instance ordering
You can check if two instances are equal without making any modification to the class.
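A minimal sketch using the generated __eq__ method; the instance names point_1 and point_2 follow the ones used later in this post, and the values are arbitrary:

```python
from dataclasses import dataclass


@dataclass
class CartesianPoint:
    x: float
    y: float
    z: float


point_1 = CartesianPoint(1, 2, 3)
point_2 = CartesianPoint(1, 2, 3)

# Equality compares field values, not object identity.
print(point_1 == point_2)  # True
print(point_1 is point_2)  # False
```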
However, if you want to order instances of a dataclass, that is, have the __lt__, __le__, __gt__, and __ge__ methods generated for you, you have to turn on the order flag explicitly by passing order=True to the decorator.
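A minimal sketch with order=True; the coordinate values are arbitrary examples:

```python
from dataclasses import dataclass


@dataclass(order=True)
class CartesianPoint:
    x: float
    y: float
    z: float


point_1 = CartesianPoint(10, 12, 13)
point_2 = CartesianPoint(14, 25, 36)

print(point_1 < point_2)   # True
print(point_1 >= point_2)  # False

# Ordering also makes the instances sortable.
print(sorted([point_2, point_1])[0])  # CartesianPoint(x=10, y=12, z=13)
```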
By default, all of the fields are used when comparing instances, in the order they are defined, as if each instance were a tuple of its fields. In the case above, the fields x, y, z of the point_1 instance are compared with the corresponding fields of the point_2 instance. You can customize this using the field function.
Suppose you want to treat two instances as equal whenever their x attributes are equal, regardless of y and z. You can exclude y and z from comparisons by declaring them with field(compare=False), in the following way:
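A minimal sketch, assuming y and z are the fields excluded from comparison via field(compare=False); the values are arbitrary:

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class CartesianPoint:
    x: float
    # compare=False leaves these fields out of __eq__ and the ordering methods.
    y: float = field(compare=False)
    z: float = field(compare=False)


point_1 = CartesianPoint(10, 12, 13)
point_2 = CartesianPoint(10, 25, 36)

print(point_1 == point_2)  # True
```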
You can see that the above code prints True even though the instances have different y and z attributes.
Adding methods
Methods can be added to dataclasses just like normal classes. Let's add a method called dist to our CartesianPoint class. This method calculates the distance of a point from the origin.
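A minimal sketch of dist, assuming the Euclidean distance sqrt(x² + y² + z²) is what is meant:

```python
import math
from dataclasses import dataclass


@dataclass
class CartesianPoint:
    x: float
    y: float
    z: float

    def dist(self) -> float:
        # Euclidean distance from the origin (0, 0, 0).
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)


point = CartesianPoint(3, 4, 12)
print(point.dist())  # 13.0
```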
Making instances immutable
By default, instances of dataclasses are mutable. If you want to prevent mutating your instance attributes, you can set frozen=True while defining your dataclass.
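A minimal sketch of a frozen dataclass, together with an attempted assignment; the mutation_blocked flag is only there to make the outcome visible:

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class CartesianPoint:
    x: float
    y: float
    z: float


point = CartesianPoint(1, 2, 3)

mutation_blocked = False
try:
    point.x = 10  # assignment to a field of a frozen instance
except FrozenInstanceError:
    mutation_blocked = True

print(mutation_blocked)  # True
print(point.x)           # 1
```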
If you try to mutate any of the attributes of the above class, it will raise FrozenInstanceError.
Making instances hashable
You can turn on the unsafe_hash parameter of the dataclass decorator to make the class instances hashable. This comes in handy when you want to use your instances as dictionary keys or perform set operations on them. However, if you use unsafe_hash, make sure that the fields don't hold mutable data structures and that you don't modify instances after using them as keys, since the generated __hash__ is computed from the field values.
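A minimal sketch with unsafe_hash=True, using instances as dictionary keys and set members; the labels and values are arbitrary:

```python
from dataclasses import dataclass


@dataclass(unsafe_hash=True)
class CartesianPoint:
    x: float
    y: float
    z: float


point = CartesianPoint(1, 2, 3)

# Equal instances hash alike, so an equal point finds the same dict entry.
labels = {point: "first"}
print(labels[CartesianPoint(1, 2, 3)])  # first

# Duplicates collapse inside a set.
unique = {CartesianPoint(1, 2, 3), CartesianPoint(1, 2, 3), CartesianPoint(4, 5, 6)}
print(len(unique))  # 2
```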
Converting instances to dicts
The asdict() function converts a dataclass instance to a dict of its fields.
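A minimal sketch of asdict from the dataclasses module; the sample values are arbitrary:

```python
from dataclasses import asdict, dataclass


@dataclass
class CartesianPoint:
    x: float
    y: float
    z: float


point = CartesianPoint(1, 2, 3)
as_dict = asdict(point)
print(as_dict)  # {'x': 1, 'y': 2, 'z': 3}
```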
Post-init processing
When dataclass generates the __init__ method, it makes it call a __post_init__ method at the end, if the class defines one. You can put additional processing in __post_init__. Here, I've added another attribute, tup, that stores the Cartesian point as a tuple.
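A minimal sketch of this. Declaring tup as a field with init=False is my assumption about how the attribute is set up; it keeps tup out of the generated __init__ signature while still showing it in the repr:

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class CartesianPoint:
    x: float
    y: float
    z: float
    # init=False: not an __init__ parameter; filled in by __post_init__.
    tup: Tuple[float, float, float] = field(init=False)

    def __post_init__(self) -> None:
        # Runs at the end of the generated __init__.
        self.tup = (self.x, self.y, self.z)


point = CartesianPoint(1, 2, 3)
print(point.tup)  # (1, 2, 3)
```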
Refactoring the CartesianPoint class
The feature-rich original CartesianPoint looks something like this:
Let's see the class in action:
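The sketch below is a hand-written version followed by a short usage run. The feature set (repr, equality, ordering, hashing, and dist) is an assumption based on the features covered in the sections above, and the _key helper and the use of functools.total_ordering are my own choices to keep it short:

```python
import math
from functools import total_ordering


@total_ordering  # derives <=, >, >= from __eq__ and __lt__
class CartesianPoint:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

    def __repr__(self):
        return f"CartesianPoint(x={self.x}, y={self.y}, z={self.z})"

    def _key(self):
        # Hypothetical helper: the tuple used for comparison and hashing.
        return (self.x, self.y, self.z)

    def __eq__(self, other):
        if not isinstance(other, CartesianPoint):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, CartesianPoint):
            return NotImplemented
        return self._key() < other._key()

    def __hash__(self):
        return hash(self._key())

    def dist(self):
        # Euclidean distance from the origin.
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)


point_1 = CartesianPoint(3, 4, 12)
point_2 = CartesianPoint(5, 6, 7)

print(point_1)             # CartesianPoint(x=3, y=4, z=12)
print(point_1 == point_2)  # False
print(point_1 < point_2)   # True
print(point_1.dist())      # 13.0
print(len({point_1, CartesianPoint(3, 4, 12)}))  # 1
```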
Below is the same class refactored using dataclass.
Use this class like before.
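A sketch of the dataclass version with a short usage run, assuming the features to preserve are repr, equality, ordering, hashing, and dist. The choice of order=True and unsafe_hash=True is my assumption for a mutable but hashable class; frozen=True would be the safer option if instances never need to change:

```python
import math
from dataclasses import dataclass


# order=True generates the ordering methods; unsafe_hash=True generates __hash__.
@dataclass(order=True, unsafe_hash=True)
class CartesianPoint:
    x: float
    y: float
    z: float

    def dist(self) -> float:
        # Euclidean distance from the origin.
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)


point_1 = CartesianPoint(3, 4, 12)
point_2 = CartesianPoint(5, 6, 7)

print(point_1)             # CartesianPoint(x=3, y=4, z=12)
print(point_1 == point_2)  # False
print(point_1 < point_2)   # True
print(point_1.dist())      # 13.0
print(len({point_1, CartesianPoint(3, 4, 12)}))  # 1
```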