ehmatthes.com

code, politics, and life

Review: Serious Python

Jul 26, 2020

Serious reading at the driving range.

I just finished reading Serious Python, by Julien Danjou. It was really satisfying to read a technical book cover to cover again, and it was also nice to read a technical book that didn’t require sitting in front of a laptop the whole time.

Serious Python was a really good read for me, because I had heard of most of the topics covered in the book but hadn’t dug into any of them in depth. As I was reading, I could think of a number of previous projects where I could have applied these concepts, and I’ll use many of them in my current and future projects.

The book is well organized; you can either read it cover to cover, or pick the parts you’re most interested in and read just those chapters. I read it cover to cover, skimming just a few parts that are less relevant to my work. You can find a complete table of contents here. I had a lot of takeaways, and even the concepts I won’t use directly leave me with a better understanding of how Python works internally. It also helps me understand some of the code I see in the libraries I use. The parts that will have the most impact on my projects are the sections on unit testing, methods, and optimizing your code.

I liked the profiling section.

I used to keep my books nice and clean, but then I found that marking them up helped me get a lot more out of them. I’ll share some of my takeaways from the sections I marked up most heavily in the book.


Writing better classes and methods

I have long been aware of more advanced concepts related to object-oriented programming, but I’ve gotten a lot of work done using just the basics. As I start to work on more significant data analysis projects though, with larger data sets and more complex analysis, I’m starting to see my code slow down. People used to claim this happens because Python is inherently a slow language, but it almost always means you’re just not using Python efficiently. That certainly applies to me. One clear takeaway from reading this book is how to write better classes and methods that make my intentions clearer to myself and other programmers, and result in more efficient code as well.

Using __slots__

The first takeaway for me is to consider more carefully how my classes will be used, and then consider the most efficient way to structure my classes. For example many of my classes will only ever use the attributes I define; they’ll never have new attributes added at run time. One of my projects uses readings from a stream gauge. The gauge measures things like river height and river flow rate. Here’s a simplified version of a class I created, focusing on river height readings and associated timestamps:

class RiverReading:

    def __init__(self, height, ts_reading):
        self.height = height
        self.ts_reading = ts_reading

I’ve written classes like this for a long time now, without spending much time thinking about what’s happening internally when this code is run. Now that my code is slowing down due to the volume of data I’m working with and the complexity of the analysis I’m doing, it’s helpful to know more about what Python is doing internally. One thing I learned from this book is that by default Python stores each object’s attributes in a per-instance dictionary (__dict__), which is what allows us to add more attributes later. This is the simple flexibility that we love about Python, but when we create a large enough number of objects, the extra memory for all those dictionaries can start to affect performance.

If we know we’re going to use only these attributes, we can tell Python this by defining the __slots__ attribute. This tells Python to build a data structure for just these attributes, without the flexibility it normally includes. Here’s how the RiverReading class would look with __slots__ defined:

class RiverReading:

    __slots__ = ('height', 'ts_reading')

    def __init__(self, height, ts_reading):
        self.height = height
        self.ts_reading = ts_reading

I’m going to try this on my project when I finish this post, and I’m curious to see whether it has any impact on the project’s performance.
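Here’s a minimal sketch of what __slots__ changes in practice. The class names DictReading and SlotsReading, the flow_rate attribute, and the sample values are all made up for illustration; they’re not from the book or my project.

```python
class DictReading:
    """Regular class: each instance carries its own __dict__."""

    def __init__(self, height, ts_reading):
        self.height = height
        self.ts_reading = ts_reading


class SlotsReading:
    """Same attributes, but declared up front with __slots__."""

    __slots__ = ('height', 'ts_reading')

    def __init__(self, height, ts_reading):
        self.height = height
        self.ts_reading = ts_reading


# Hypothetical sample values.
regular = DictReading(23.25, '2020-07-26T09:00')
slotted = SlotsReading(23.25, '2020-07-26T09:00')

# A regular instance accepts new attributes at run time.
regular.flow_rate = 1200

# A slotted instance has no per-instance __dict__ ...
has_dict = hasattr(slotted, '__dict__')

# ... so assigning an attribute that isn't listed in __slots__ fails.
try:
    slotted.flow_rate = 1200
    slots_rejected_new_attr = False
except AttributeError:
    slots_rejected_new_attr = True

print(has_dict, slots_rejected_new_attr)
```

Whether this makes a measurable difference depends on how many objects you create, so it’s worth measuring before and after rather than assuming.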

Named Tuples

Using __slots__ is good if we know the attributes of a class won’t change, but we still need to write a number of custom methods. If we don’t need any methods, we can use an even simpler data structure, the named tuple. A named tuple keeps the dot notation that makes class attributes simple to work with, but stores the data even more efficiently. Named tuples are useful if we don’t need methods, and the values assigned to attributes won’t change once an object is created.

Here’s how RiverReading would be written as a named tuple:

from collections import namedtuple

RiverReading = namedtuple('RiverReading', ['height', 'ts_reading'])

You create an object just like you would for a class, and access attributes using dot notation. You can also pass a list of default values for the attributes in a named tuple. The values in a named tuple object can’t be modified, but there’s a _replace() method that lets you create a new object from an existing one, replacing the values of whichever attributes you need to. There’s also an _asdict() method that returns a dictionary representation of the object.
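Here’s a short sketch of those features together. It assumes Python 3.7 or later, where namedtuple() accepts a defaults argument; the sample values are made up.

```python
from collections import namedtuple

# defaults apply to the rightmost fields, so only ts_reading gets a default.
RiverReading = namedtuple(
    'RiverReading', ['height', 'ts_reading'], defaults=[None])

# Create an object and read attributes with dot notation.
reading = RiverReading(23.25)
height = reading.height

# Values can't be changed in place.
try:
    reading.height = 24.0
    is_immutable = False
except AttributeError:
    is_immutable = True

# _replace() builds a new object; the original is left untouched.
updated = reading._replace(ts_reading='2020-07-26T09:00')

# _asdict() returns a dictionary mapping field names to values.
as_dict = updated._asdict()

print(reading, updated, as_dict, is_immutable)
```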

I don’t know that I’ll use named tuples in my current project, because I need some methods to work with readings. But I know I’ve written regular classes in the past where named tuples would have sufficed, and I’ll keep my eye out for the opportunity to use them in new projects.

Data classes

There’s another option that we should be aware of, if the values assigned to attributes might need to be modified after creation. The dataclass structure is similar to a named tuple, but you can change the value of attributes using dot notation after an object has been created.

Here’s what RiverReading looks like as a dataclass:

import datetime
from dataclasses import dataclass


@dataclass
class RiverReading:
    ts_reading: datetime.datetime
    height: float


ts = datetime.datetime.now()
rr = RiverReading(ts, 23.25)

Data classes require you to declare what type of data each attribute will refer to. The @dataclass decorator automatically generates an __init__() method that creates attributes from the values you pass in when you create an object; you don’t need to manually attach these values to the self object. As with regular classes and namedtuples, you can assign default values for attributes in a dataclass. You can also write custom methods in a dataclass.
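As a rough sketch of those last two points, here’s the same dataclass with a default value and a custom method. The units field and the is_above() method are hypothetical additions for illustration, and the timestamp is a made-up sample.

```python
import datetime
from dataclasses import dataclass


@dataclass
class RiverReading:
    ts_reading: datetime.datetime
    height: float
    units: str = 'ft'  # hypothetical field; fields with defaults come last

    def is_above(self, threshold):
        """Return True if this reading is higher than threshold."""
        return self.height > threshold


ts = datetime.datetime(2020, 7, 26, 9, 0)
rr = RiverReading(ts, 23.25)

# Unlike a named tuple, attributes can be reassigned after creation.
rr.height = 24.0

print(rr.units, rr.is_above(23.5))
```

The decorator also generates __repr__() and __eq__() methods, so two readings with the same values compare as equal.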

Data classes weren’t covered in the book, but reading about slots and named tuples reminded me to finally research them, and make sure I’m ready to use them in the next project I work on where they’d be appropriate.

Static methods and class methods

I’ve been aware of static and class methods before, and I’ve used them at times, but I haven’t been entirely clear about how to think about them when designing a class from scratch.

A static method doesn’t need access to any of an object’s attributes. When this is the case, we should decorate the function definition with @staticmethod. This tells Python that the method doesn’t receive a self argument when it’s called. Also, Python doesn’t need to create a new bound method object each time a static method is accessed through an object; it just returns the one function defined on the class, which is slightly more efficient.

A class method needs access to class attributes, but it doesn’t need access to individual objects. A class method should be decorated with @classmethod. These methods automatically receive an argument referencing the class, which is usually labelled cls or klass to avoid a name clash with the keyword class.
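Here’s a minimal sketch of how I’d think about both decorators on the RiverReading class. The flood_stage attribute, the method names, and the comma-separated row format are all assumptions made up for this example.

```python
class RiverReading:

    # Hypothetical class attribute shared by every reading.
    flood_stage = 25.0

    def __init__(self, height, ts_reading):
        self.height = height
        self.ts_reading = ts_reading

    @staticmethod
    def feet_to_meters(feet):
        """Needs neither the object nor the class, so no self or cls."""
        return feet * 0.3048

    @classmethod
    def is_flood_height(cls, height):
        """Needs a class attribute, but no individual object."""
        return height >= cls.flood_stage

    @classmethod
    def from_row(cls, row):
        """Alternate constructor: build a reading from 'height,timestamp'."""
        height, ts_reading = row.split(',')
        return cls(float(height), ts_reading)


# Both kinds of method can be called on the class itself.
meters = RiverReading.feet_to_meters(10)
flooding = RiverReading.is_flood_height(26.0)
reading = RiverReading.from_row('23.25,2020-07-26T09:00')

print(meters, flooding, reading.height)
```

Using cls instead of hard-coding the class name means from_row() would also return the right type if it were called on a subclass.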


Optimization and performance

The second big takeaway for me was a more disciplined and informed approach to refactoring. Since I’ve mostly worked on small solo projects for nontechnical users, I’ve gotten away with writing messy code that works. This has been perfectly fine, and I wouldn’t change much going back. Most of these were one-off projects, and there was no need for optimization; my time was better spent working with the results of my code, not the code itself.

Now that I’m writing more complex code and collaborating more with other technical people, I need to pay more attention to what my code looks like and how it performs after I’ve gotten through the exploratory phase of a project. This book was hugely helpful in offering a clear approach to optimization, with a focus on performance. In the past, I would just look for my ugliest and most repetitive code and start refactoring, writing clearer comments and breaking things into more coherent chunks. Sometimes I’d even write a few tests along the way. After reading Serious Python, my approach will be:

I was aware of these kinds of tools and approaches, but I never had a pressing need to learn about them earlier. This book offered a great high-level overview that really helps me get started on a more disciplined approach to optimization. I’m going to start with cProfile, and see how far that takes me. I’m really looking forward to using this on one of my current projects, which takes about a minute to run and has code that’s messy enough that I’m embarrassed to show it in its current state.
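As a starting point, here’s a rough sketch of profiling a piece of code with cProfile and the pstats module from the standard library. The slow_sum() and analyze() functions are hypothetical stand-ins for real analysis code, not anything from my project.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Stand-in for an expensive calculation."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analyze():
    """Stand-in for the main analysis routine."""
    return [slow_sum(10_000) for _ in range(50)]


# Collect timing data only while the code of interest runs.
profiler = cProfile.Profile()
profiler.enable()
results = analyze()
profiler.disable()

# Sort by cumulative time and write the top entries to a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()

print(report)
```

For a whole script, running python -m cProfile -s cumulative followed by the script name gives a similar report without changing any code.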


Other takeaways

I had many smaller takeaways. Here’s a brief summary of some of these:


It’s been a long time since I read a technical book cover to cover, and learned as much as I could from it. As a mostly self-taught programmer, I fell into many of the less disciplined and less efficient approaches that Julien Danjou set out to help people move past. I appreciate his efforts in putting together this fantastic resource.

If you want to work through Serious Python yourself, you can buy it direct from No Starch Press and you’ll get a copy of the ebook with your print copy. You can also find it at Barnes and Noble, and on Amazon. (I do not use affiliate links, and I was not asked to write this review.)