
Exploring fractals on a cloud computer

Oct 12, 2020

I’ve always been curious about fractals, and the book I’m reading uses fractals in an example. I started playing with the example, and found it a good chance to see how much faster a powerful cloud computer can render a fractal than my modest laptop. In this post you’ll see what a fractal is, how to generate an image of a fractal, and how to animate a fractal. We’ll do all of this locally, and then when we’ve found something interesting to focus on we’ll move the rendering to a more powerful cloud computer.

This post will not get into highly-optimized cloud deployments such as clusters, but it will show you how to run medium-size jobs much faster than you can locally, for just a few dollars of cloud computing time. You will need a credit card to try this, as the free tiers on most cloud platforms are no more powerful than a modest laptop.

By the time you’re finished, you’ll be able to generate an animation like this:

If you’re following along but don’t want to build up all the code files yourself, you can find all of the code shown here in this repository, with commits corresponding to each section in this post.



Generating a fractal image

Fractals can be visualized on the complex plane, which shows the real and imaginary components of complex numbers on separate axes. It might be helpful to have a quick overview of complex numbers.

Complex numbers

If you’re unfamiliar with complex numbers, they’re numbers made up of a real component and an imaginary component. A simple complex number would be written like this:

2 + 3j

In this notation, j is the square root of -1. Mathematicians tend to use i to represent this value, while computer scientists tend to use j to avoid confusion with the convention of using i to represent an index in a loop. An arbitrary complex number is typically written as a + bi. We can do many operations with complex numbers such as adding and subtracting, multiplying, squaring, and more.
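
You can verify this definition of j in a Python terminal:

>>> (1j) ** 2
(-1+0j)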

In Python, you can define a complex number in several ways. You can simply use the notation just shown:

>>> c = 2 + 3j
>>> type(c)
<class 'complex'>

You can also explicitly construct a complex number by passing the real and imaginary components to the complex() function:

>>> c = complex(2, 3)
>>> c
(2+3j)

When you have a complex number, you can extract its real and imaginary parts at any time:

>>> c = 2 + 3j
>>> c.real
2.0
>>> c.imag
3.0

Once you have complex numbers, you can perform all of the common operations on them:

>>> c1 = 2 + 3j
>>> c2 = -1.5 + 2.4j
>>> c1 + c2
(0.5+5.4j)
>>> c1 - c2
(3.5+0.6000000000000001j)
>>> c1 * c2
(-10.2+0.2999999999999998j)
>>> c1**2
(-5+12j)
>>> abs(c1)
3.605551275463989

Complex numbers can be plotted on a plane in much the same way we plot coordinate pairs on an xy plane. Instead of an x-axis and a y-axis, we have a real axis and an imaginary axis. The real axis is usually horizontal, like the x axis, and the imaginary axis is typically vertical.
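
For example, here's a minimal sketch that plots the single point 2 + 3j on the complex plane, using Matplotlib:

from matplotlib import pyplot as plt

c = 2 + 3j

fig, ax = plt.subplots()
ax.scatter(c.real, c.imag)
ax.axhline(0, color='gray')  # real axis
ax.axvline(0, color='gray')  # imaginary axis
plt.show()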

Complex numbers, despite their use of the word “imaginary”, have many real-world applications. Complex numbers appear in the physics of electromagnetic phenomena, fluid dynamics, and quantum mechanics.

Testing points in a fractal

A fractal can be generated by asking a question at every point in a given region of the complex plane. Consider the point p = 0.8 + 0.8j, which is a little up and to the right of the origin. Consider another complex number, c = -0.62772 - 0.42193j. We want to repeatedly apply a simple sequence of operations: square the current value, add c, and take the absolute value of the result. Then we ask if that result ever exceeds a critical value.

Let’s look at this sequence for the point 0.8 + 0.8j, with a critical value of 2:

Iteration | Input value | Input value squared plus c | Abs value of result
0 | 0.8+0.8j | -0.628+0.858j | 1.063
1 | -0.628+0.858j | -0.97-1.499j | 1.786
2 | -0.97-1.499j | -1.934+2.486j | 3.15

After the 0th iteration, the value we’re examining is 1.063, which is less than our critical value of 2.0. After iteration 1, the value is 1.786. After iteration 2, the value for this point is 3.15, which exceeds our critical value of 2.0.
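
You can verify these values in a Python terminal:

>>> c = -0.62772 - 0.42193j
>>> z = 0.8 + 0.8j
>>> for iteration in range(3):
...     z = z**2 + c
...     print(iteration, round(abs(z), 3))
...
0 1.063
1 1.786
2 3.15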

Not all points are as simple as this. Consider the point p = 0 + 0j, which is the origin in the complex plane. Let’s look at the sequence for this point:

Iteration | Input value | Input value squared plus c | Abs value of result
0 | 0+0j | -0.628-0.422j | 0.756
1 | -0.628-0.422j | -0.412+0.108j | 0.426
2 | -0.412+0.108j | -0.47-0.511j | 0.694
3 | -0.47-0.511j | -0.668+0.058j | 0.67
4 | -0.668+0.058j | -0.185-0.499j | 0.533
5 | -0.185-0.499j | -0.843-0.237j | 0.875
6 | -0.843-0.237j | 0.026-0.022j | 0.035
7 | 0.026-0.022j | -0.628-0.423j | 0.757
8 | -0.628-0.423j | -0.413+0.109j | 0.427
9 | -0.413+0.109j | -0.469-0.512j | 0.694

The final value for each iteration is bouncing around between 0 and 1. We have no way of knowing at this point if it will ever break out of this region. Let’s graph these values over a larger number of iterations. Here’s what that pattern looks like:

The point 0 + 0j appears to bounce without leaving this region.

After 50 iterations, the result is still oscillating between 0 and 2. For any given point, we can't tell whether the result will stay bounded as we complete more iterations; we can only say that after a given number of iterations, the result is still within the bounds we've set.
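
Here's a minimal sketch of code that generates this kind of graph (it uses Matplotlib, which we'll install in the next section):

from matplotlib import pyplot as plt

c = -0.62772 - 0.42193j
z = 0 + 0j
abs_values = []
for _ in range(50):
    z = z**2 + c
    abs_values.append(abs(z))

fig, ax = plt.subplots()
ax.plot(abs_values)
ax.set_xlabel('Iteration')
ax.set_ylabel('Abs value of result')
plt.show()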

Generating a fractal image

To generate a fractal image, we choose a region and conduct this set of calculations for every point in the region. We then assign each point in the region a color based on how many iterations it took for that point to exceed the critical value. We’ll first do this mathematically, by making a scatter plot. But then, to focus on the visualization, we’ll simply generate an image.

To get started, make a new directory for this project. Then create a virtual environment and install the two libraries we’ll need:

$ mkdir exploring_fractals
$ cd exploring_fractals
$ python3 -m venv ef_env
$ source ef_env/bin/activate
(ef_env)$ pip install --upgrade pip
(ef_env)$ pip install numpy matplotlib
(ef_env)$ pip freeze > requirements.txt

We’re going to need NumPy to make a range of floats, and we’ll plot the fractal using Matplotlib. We’ll use the requirements.txt file to create an identical project environment on the remote server.

Here’s the first part of our code, which I’m saving as julia_set_plot.py:

"""Generate an image of the julia set."""

import numpy as np
from matplotlib import pyplot as plt


# Bounds and critical values
a_bound, b_bound = 1.8, 1.8
c = -0.62772 - 0.42193j
max_iterations = 20
escape_bound = 2
num_steps = 200

The values a_bound and b_bound specify the region of the complex plane that we’ll focus on. We’ll focus on the region bounded by a_bound + b_bound*j, a_bound - b_bound*j, -a_bound - b_bound*j, and -a_bound + b_bound*j. We need to specify how many iterations to try on each point; we’ll set this value at 20 right now to make sure our program runs reasonably quickly. The value num_steps specifies how many points to test between -a_bound and a_bound, and -b_bound and b_bound. A value of 200 steps will give us a scatter plot with 200 * 200 points. We’ll increase this value once we know things are working.
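
If you haven't used linspace() before, here's a quick look at how it spaces values evenly between two bounds, using a small number of steps:

>>> import numpy as np
>>> np.linspace(-1.8, 1.8, 5)
array([-1.8, -0.9,  0. ,  0.9,  1.8])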

Now we need a function to determine the number of iterations it takes to exceed the critical value for any given point, up to max_iterations:

...
num_steps = 200

def get_iterations(point):
    """How many iterations does it take to exceed the critical value?"""
    iteration = 0
    pv = point**2 + c
    while iteration < max_iterations:
        if abs(pv) > escape_bound:
            return iteration
        else:
            pv = pv**2 + c
            iteration += 1
    return iteration

The function get_iterations() takes in a complex point. It keeps track of the number of iterations, starting at 0. We square the point and add c, and call this pv for point value. While the iteration number is less than max_iterations, we check whether the absolute value of the point value has exceeded the critical escape bound we set earlier. If it has, we return the number of iterations. If the point value hasn’t exceeded the critical value yet, we prepare for another iteration by setting the new point value equal to the previous value squared plus c, and increment the number of iterations. Finally, if we exit the loop without having exceeded the critical value, we return the iteration number.

Now we need a function that will loop over all the points in the region, and return these points along with the number of iterations for each point. Here’s the function, which we’ll call get_julia_points():

...
def get_iterations(point):
    ...

def get_julia_points():
    """Define the points, and the number of iterations for each point."""
    points, point_iterations = [], []
    for a in np.linspace(-a_bound, a_bound, num_steps):
        for b in np.linspace(-b_bound, b_bound, num_steps):
            point = complex(a, b)
            iterations = get_iterations(point)
            points.append(point)
            point_iterations.append(iterations)

    return (points, point_iterations)

We first make a list to store the points in the region, and a list to store the number of iterations for each point. We use the NumPy linspace() function to get a set of values evenly spaced between the bounds along the real axis, and between the bounds along the imaginary axis. At each of these points we create a complex number representing that point, get the number of iterations (up to max_iterations) it takes for that point to exceed the critical value, and append these values to the two lists. Finally, we return both lists.

Now we need a function to plot a set of julia points.

...
def get_iterations(point):
    ...

def get_julia_points():
    ...

def plot_julia_points(points, point_iterations):
    # Plot set.
    x_values = [p.real for p in points]
    y_values = [p.imag for p in points]

    plt.style.use('classic')
    fig, ax = plt.subplots(figsize=(14, 10))

    ax.scatter(x_values, y_values, c=point_iterations, cmap=plt.cm.viridis,
            edgecolors='none', s=4)
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    plt.show()

This function takes in the list of points, and the number of iterations for each point. We then separate the complex points into their real and imaginary parts for plotting along the x and y axes respectively. Finally we make a scatter plot, using a colormap.

Now we can put all this together to generate a fractal image:

...
def get_iterations(point):
    ...

def get_julia_points():
    ...

def plot_julia_points(points, point_iterations):
    ...

if __name__ == '__main__':
    points, point_iterations = get_julia_points()
    plot_julia_points(points, point_iterations)

We fill the points and point_iterations lists, and then call plot_julia_points(). Here’s the result:

A low-resolution image of the Julia set.

Generating a higher resolution plot

When you’re writing code that processes a large amount of data, it’s good to start with a subset of that data during the initial phase of the project. This way you get results quickly, and you don’t spend time waiting for buggy code to run over the entire dataset. At one point in developing the code above, I accidentally took the absolute value of each point too early, and I ended up with a blob in the middle of the plot instead of a fractal shape. It was interesting to see, but it’s not what I was after. It was nice to get a quick image of the blob and correct the code, rather than taking minutes to render buggy output.

Now that we know the code is correct, let’s generate a higher-resolution plot. To change the resolution, we need to increase the value of num_steps, and decrease the size of each point. Here’s the plot that’s generated with num_steps = 1000 and s=1:

A higher resolution image of the Julia set.

We can also change the value of max_iterations to spend more time examining whether each point exceeds the critical value; here’s a plot with max_iterations = 300:

A plot with max_iterations = 300.

Next we’ll benchmark this code, so we can know if we’re improving things or not.

Simple benchmarking

Let’s get a rough benchmark for this program. Before we can time the code, we need to have it save the plot as an image file instead of showing it in Matplotlib’s viewer. Otherwise we’d be timing how long we keep the plot viewer open, rather than the actual program execution.

Make a new directory in your project folder called output. Then change the line plt.show() to this:

    filename = f"output/julia_plot_mpl-{num_steps}-maxiter-{max_iterations}.png"
    plt.savefig(filename)

Now we can get a rough sense of the efficiency of any algorithm we try. We’ll use time.perf_counter() to time the two main function calls. First add a call to import time at the top of the file. Then modify the main section to time the execution of the analysis and plotting sections of the program:

if __name__ == '__main__':
    start = time.perf_counter()

    points, point_iterations = get_julia_points()
    generated_points = time.perf_counter()

    plot_julia_points(points, point_iterations)
    generated_plot = time.perf_counter()

    # Summarize execution time:
    analysis_time = round(generated_points - start, 1)
    plotting_time = round(generated_plot - generated_points, 1)
    total_time = round(generated_plot - start, 1)

    print(f"num_steps: {num_steps}")
    print(f"max_iterations: {max_iterations}")

    print(f"\nSpent {analysis_time} seconds generating julia points.")
    print(f"Spent {plotting_time} seconds generating plot.")
    print(f"Spent {total_time} seconds in total.")

Now when we run the file, we get a helpful summary of how complex the plot was, and how long it took to generate:

num_steps: 1000
max_iterations: 300

Spent 6.9 seconds generating julia points.
Spent 10.1 seconds generating plot.
Spent 17.0 seconds in total.

It’s interesting to note that this is taking longer to plot than it is to generate the points.

Let’s try an even higher resolution plot. If we double the value of num_steps, we’ll process 4 times as many points. I’m guessing this will take about 4 times as long, or just over a minute. Let’s see:

num_steps: 2000
max_iterations: 300

Spent 27.1 seconds generating julia points.
Spent 39.4 seconds generating plot.
Spent 66.5 seconds in total.

That prediction was right. Next we’ll make it easier to experiment with the resolution by using command-line arguments for num_steps and max_iterations.

Using argparse for CLI arguments

It would be nice to specify the main parameters for a plot without having to change the code directly in the file. Let’s add some command-line arguments to allow setting the values for num_steps and max_iterations.

Here’s what the first part of the file looks like, using argparse to build a command-line argument to set the value of num_steps:

"""Generate an image of the julia set."""

import time, argparse

import numpy as np
from matplotlib import pyplot as plt

# Parse cli arguments.
parser = argparse.ArgumentParser()

parser.add_argument('--num-steps',
    help="How many steps between horizontal and vertical bounds?",
    action='store', type=int, default=200)

args = parser.parse_args()


# Bounds and critical values
a_bound, b_bound = 1.8, 1.8
c = -0.62772 - 0.42193j
max_iterations = 300
escape_bound = 2


def get_iterations(point):
    ...

def get_julia_points():
    """Define the points, and the number of iterations for each point."""
    points, point_iterations = [], []
    for a in np.linspace(-a_bound, a_bound, args.num_steps):
        for b in np.linspace(-b_bound, b_bound, args.num_steps):
            ...

def plot_julia_points(points, point_iterations):
    ...
    filename = f"output/julia_plot_mpl-{args.num_steps}-maxiter-{max_iterations}.png"
    plt.savefig(filename)

if __name__ == '__main__':
    ...
    # Summarize execution time:
    analysis_time = round(generated_points - start, 1)
    plotting_time = round(generated_plot - generated_points, 1)
    total_time = round(generated_plot - start, 1)

    print(f"num_steps: {args.num_steps}")
    print(f"max_iterations: {max_iterations}")
    ...

We import argparse, and then make an instance of ArgumentParser. We add an argument --num-steps, and define it as an integer with a default value of 200. We call parse_args(), which assigns any arguments that have been defined to the object args. If you’re unfamiliar with building CLI arguments in Python, I’ve found this article from Real Python quite helpful.

A CLI argument of the form --num-steps will be assigned to the attribute args.num_steps. Anywhere we used num_steps, we now need to use args.num_steps. Notice we remove the line num_steps = 200 as well.

We can see that this works by first running the file without a CLI argument:

$ python julia_set_plot.py
num_steps: 200
max_iterations: 300

Spent 0.3 seconds generating julia points.
Spent 0.8 seconds generating plot.
Spent 1.1 seconds in total.

This is what we expected; without passing a specific value for num_steps, the default value of 200 is used. Now let’s try 400 steps:

$ python julia_set_plot.py --num-steps 400
num_steps: 400
max_iterations: 300

Spent 1.1 seconds generating julia points.
Spent 1.9 seconds generating plot.
Spent 3.0 seconds in total.

This works as expected. We can now do the same for max_iterations:

parser.add_argument('--num-steps',
    ...

parser.add_argument('--max-iterations',
    help="What's the max number of iterations to try on each point?",
    action='store', type=int, default=20)

args = parser.parse_args()
...

Make sure you remove the line max_iterations = 300, and change all references to max_iterations to args.max_iterations.

Running with no arguments gives a quick, low-resolution plot, but now we can specify exactly how detailed to make the plot by using both arguments:

$ python julia_set_plot.py --num-steps 800 --max-iterations 500
num_steps: 800
max_iterations: 500

Spent 7.8 seconds generating julia points.
Spent 6.6 seconds generating plot.
Spent 14.4 seconds in total.

If you’re curious to explore different boundaries, you could build in more CLI arguments for values such as a_bound and b_bound.

Using PIL instead of Matplotlib

I started using Matplotlib because this was a mathematical investigation, and we were plotting values at specific points. But what we’re really interested in are the images that are being generated. If we’re just generating images, it’s more efficient to work with an imaging library than a plotting library. We’ll generate the images directly, rather than having a plotting framework generate a plot and then convert that plot into an image. However, we’ll still use Matplotlib’s colormaps to specify the color for each point.

To do this we’ll use the PIL imaging library, from the pillow package. Pillow should already be installed as a dependency of Matplotlib, but if it’s not for some reason you can use the command pip install pillow. Make sure you add a call to import PIL.Image in the import section of the file; importing PIL alone isn’t guaranteed to make the Image submodule available.

We need to build the set of points differently, because PIL expects information about what color each pixel should be, rather than a series of x and y values. Here’s what get_julia_points() should look like now:

def get_julia_points():
    """Get the number of iterations for each point.
    Returns a list of rows."""
    point_iterations = []
    for a in np.linspace(-a_bound, a_bound, args.num_steps):
        row = []
        for b in np.linspace(-b_bound, b_bound, args.num_steps):
            point = complex(a, b)
            num_iterations = get_iterations(point)
            row.append(num_iterations / args.max_iterations)

        point_iterations.append(row)

    return point_iterations

This function needs to generate a list of rows, where each row contains the information for each pixel in that row. So we get rid of the list points, and keep just the list point_iterations. Inside the first loop, over the real components of the complex points, we make a new list called row. In the inner loop we create each complex point, and get the number of iterations for that point. Instead of appending the value of num_iterations directly, we divide it by args.max_iterations to get a normalized value, and append that to the current row. After filling each row, we append the full row to point_iterations, and when all the rows have been calculated we return point_iterations.

Now we need a function to generate the image. This function takes the place of plot_julia_points():

def generate_julia_image(point_iterations):
    """Generate an image of the julia set."""
    
    colors = plt.cm.viridis(point_iterations)*255
    colors = np.array(colors, dtype=np.uint8)
    new_image = PIL.Image.fromarray(colors)

    filename = f"output/julia_plot_pil-{args.num_steps}-maxiter-{args.max_iterations}.png"
    new_image.save(filename)
    print(f"Wrote file: {filename}")

This is shorter than the plotting code. We convert the values in point_iterations to colors using the Viridis colormap from Matplotlib. We multiply each of these values by 255, to convert them to the color format that PIL expects. Then we convert these color values to a NumPy array, which is easier and more efficient for PIL to use. We use the fromarray() function to generate a new image, and save this image.
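
To see what’s happening in that conversion, here’s a sketch of the same steps for a single normalized value (the specific value 0.5 is just an illustration):

from matplotlib import pyplot as plt
import numpy as np

rgba = np.array(plt.cm.viridis(0.5))          # RGBA floats between 0 and 1
pixel = np.array(rgba * 255, dtype=np.uint8)  # 8-bit channel values in the format PIL expects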

The main section of the program looks a little different now as well:

if __name__ == '__main__':
    start = time.perf_counter()

    point_iterations = get_julia_points()
    generated_points = time.perf_counter()

    generate_julia_image(point_iterations)
    generated_image = time.perf_counter()

    # Summarize execution time:
    analysis_time = round(generated_points - start, 1)
    image_generation_time = round(generated_image - generated_points, 1)
    total_time = round(generated_image - start, 1)

    print(f"num_steps: {args.num_steps}")
    print(f"max_iterations: {args.max_iterations}")

    print(f"\nSpent {analysis_time} seconds generating julia points.")
    print(f"Spent {image_generation_time} seconds generating image.")
    print(f"Spent {total_time} seconds in total.")

When we ran the plotting version of the program with num_steps=800 and max_iterations=500, it took 7.8 seconds to generate the points and 6.6 seconds to generate the plot, for a total of 14.4 seconds. Let’s see how fast it works now:

$ python julia_set_plot.py --num-steps 800 --max-iterations 500
Wrote file: output/julia_plot_pil-800-maxiter-500.png
num_steps: 800
max_iterations: 500

Spent 7.9 seconds generating julia points.
Spent 0.2 seconds generating image.
Spent 8.0 seconds in total.

It takes about the same time to generate the points, which is expected, but it only takes two tenths of a second to generate the image! This is a much more efficient approach if all we’re interested in is the final image.

There’s another advantage to working with an imaging library. The value of num_steps is now the pixel size of the final image. We don’t have to think about dpi issues and plot sizes to work out the resolution of our output image; we just set num_steps to the pixel size we want our final image to have. If we generate a 4000-pixel image, we can zoom in and see a higher level of detail for this image.

A zoomed-in portion of a 4000-pixel image.

It’s also much faster to generate these higher-resolution images:

$ python julia_set_plot.py --num-steps 4000 --max-iterations 500
num_steps: 4000
max_iterations: 500

Spent 169.0 seconds generating julia points.
Spent 4.3 seconds generating image.
Spent 173.3 seconds in total.

This is still somewhat slow, but it’s almost twice as fast as it would have been using the plotting approach. Next we’ll focus on animating this fractal.

Animating a fractal

There are many ways you can generate a slightly different fractal. You can zoom in or out by changing the bounds of the image. You can move the bounds in any direction. You can change the value of c, and since c is a complex number you can do this in a number of different ways. You can vary the real part of c, the imaginary part, or both.

CLI arguments for num_frames and framerate

Let’s start by adding CLI arguments for num_frames and framerate, which we’ll want to modify often as we explore various animations, and when we want to generate higher-quality animations:

parser.add_argument('--num-frames',
    help="How many frames do you want in the animation?",
    action='store', type=int, default=20)

parser.add_argument('--framerate',
    help="What frame rate do you want for the animation?",
    action='store', type=int, default=5)

This will let us generate quick, low-resolution animations by default, but also allow us to generate larger, smoother animations when we want to.

Varying c values

To keep things simple, we’ll just vary one factor: the imaginary part of c. We’ll generate a list of values to use for c, then make a separate image for each value of c, and then generate a single animation file from these separate images.

We’ll want to use a different c value for each frame in the animation. Here’s a function that takes in a value for how much to increment the imaginary component of c overall. This function returns a dictionary where the keys are the frame number, and the values are the c values:

def get_plot_nums_c_vals(c_imag_increment):
    """Return a set of c values for a fractal animation.
    c values range from complex(a, b) to complex(a, b+c_imag_increment),
      where a and b are the initial components of c.

    We want one c-value for every frame in the animation.
    Return a dict of form {'00001': complex}
      The keys will be used in the filename, to make sure image files are
      processed in the correct order for the animation.
    """
    c_initial = complex(-0.62772, -0.42193)
    c_imag_max = c_initial.imag + c_imag_increment
    c_dict = {}
    for plot_num, c_imag in enumerate(
            np.linspace(c_initial.imag, c_imag_max, args.num_frames)):
        key = f"{plot_num:05}"
        new_c = complex(c_initial.real, c_imag)
        c_dict[key] = new_c

    return c_dict

We start with the same c value we were using to generate a single image, which we assign to c_initial. We then get a range of values for c_imag, the imaginary component of c. We build a sequence of complex numbers with the same real component, and varying imaginary components. The keys are not just numbers; they’re zero-padded numbers. These will be used to generate a sequence of unique filenames, which can be sorted alphabetically. (We’ll use ffmpeg to generate the animation, and this program sorts image files alphabetically when generating an animation.)

Python provides a simple way to generate zero-padded numbers. Using f-strings, a numerical value followed by a colon and a value like 05 will produce a number padded to the specified number of characters:

>>> x = 4
>>> f"{x:05}"
'00004'

Note: Make sure you remove the initial definition we had for c near the beginning of the file. Now that we’re making animations, all of the values of c will come from this function.

Generating a sequence of images

First, make a new directory to hold the images that will be used to generate the animation file:

$ mkdir output/animation_files

This gives us a specific place to store animation files, which we can empty each time we want to generate a new animation.

Now that we can generate a set of c values, we can generate one image for each value of c. We’ll do this in a function called generate_images():

def generate_images():
    """Generate a sequence of images to make an animation."""
    plot_nums_c_vals = get_plot_nums_c_vals(1.5)

    # Generate one file for each value of c.
    for plot_num, c in plot_nums_c_vals.items():
        point_iterations = get_julia_points(c)
        generate_julia_image(point_iterations, plot_num)

We pass a value for c_imag_increment in the call to get_plot_nums_c_vals(). This means the c values will range evenly from -0.62772 - 0.42193j to -0.62772 + 1.07807j. We fill the dict plot_nums_c_vals, and then loop over this dictionary. For each value of c, we grab the point_iterations list and call generate_julia_image(). The definition of get_julia_points() needs a parameter for c, since it will be different every time it’s called:

def get_julia_points(c):
    """Define the points, and the number of iterations for each point.
    Returns a list of rows."""
    point_iterations = []
    for a in np.linspace(-a_bound, a_bound, args.num_steps):
        row = []
        for b in np.linspace(-b_bound, b_bound, args.num_steps):
            point = complex(a, b)
            num_iterations = get_iterations(point, c)
            row.append(num_iterations/args.max_iterations)

        point_iterations.append(row)

    return point_iterations

The call to get_iterations() needs to pass c, and get_iterations() needs to accept c:

def get_iterations(point, c):
    ...

The function generate_julia_image() needs a slight modification as well:

def generate_julia_image(point_iterations, plot_num):
    """Generate an image of the julia set."""
    colors = plt.cm.viridis(point_iterations)*255
    colors = np.array(colors, dtype=np.uint8)
    new_image = PIL.Image.fromarray(colors)

    filename = f"output/animation_files/julia_plot_{plot_num}.png"
    new_image.save(filename)
    print(f"Wrote file: {filename}")

We add a parameter for plot_num, and include this value in the filename. We also save the output to output/animation_files/.

Now we need to call the function generate_images(), and update the timing code:

if __name__ == '__main__':
    start = time.perf_counter()

    generate_images()
    generated_images = time.perf_counter()

    # Summarize execution time:
    image_generation_time = round(generated_images - start, 1)
    total_time = round(generated_images - start, 1)

    print(f"\nGenerated {args.num_frames} image files.")
    print(f"num_steps: {args.num_steps}")
    print(f"max_iterations: {args.max_iterations}")

    print(f"Spent {image_generation_time} seconds generating images.")
    print(f"Spent {total_time} seconds in total.")

Make sure you remove the calls to get_julia_points() and generate_julia_image() that were in the main section as well.

Here are the results of running this code:

$ python julia_set_plot.py 
Wrote file: output/animation_files/julia_plot_00000.png
Wrote file: output/animation_files/julia_plot_00001.png
...
Wrote file: output/animation_files/julia_plot_00019.png

Generated 20 image files.
num_steps: 200
max_iterations: 20
Spent 1.4 seconds generating images.
Spent 1.4 seconds in total.

At the default low-resolution settings, it takes 1.4 seconds to generate 20 image files.

Generating an animation file

The program ffmpeg is a command-line tool for manipulating video files. One of the many things it can do is generate an animation from a set of image files. You’ll need to install ffmpeg if it’s not already set up on your system.

Once you have ffmpeg installed, you can use it from within a Python program using the os.system() function. Add a call to import os in the import section. Here’s a function that will generate an animation from the image files that were generated:

def generate_animation():
    """Process image files to generate an animation."""
    os.system(f"cd output/animation_files && ffmpeg -framerate {args.framerate} -pattern_type glob -i '*.png'   -c:v libx264 -pix_fmt yuv420p julia_animation.mp4")

We need to call this function in the main section, after the image files are generated:

if __name__ == '__main__':
    ...
    # Generate one file for each value of c.
    generate_images()
    generated_images = time.perf_counter()

    # Generate animation.
    generate_animation()
    generated_animation = time.perf_counter()

    # Summarize execution time:
    image_generation_time = round(generated_images - start, 1)
    animation_generation_time = round(generated_animation - generated_images, 1)
    total_time = round(generated_animation - start, 1)

    print(f"\nGenerated {num_frames} image files.")
    print(f"num_steps: {num_steps}")
    print(f"max_iterations: {max_iterations}")

    print(f"Spent {image_generation_time} seconds generating images.")
    print(f"Spent {animation_generation_time} seconds generating animation.")
    print(f"Spent {total_time} seconds in total.")

When we run this file, this is the summary part of the output:

Generated 20 image files.
num_steps: 200
max_iterations: 20
Spent 1.4 seconds generating images.
Spent 0.1 seconds generating animation.
Spent 1.5 seconds in total.

It takes one tenth of a second to generate the animation file from 20 low-resolution image files. This is a really jerky animation, but it shows that the process works:

We’ll do a bit more to clean things up before generating a higher quality animation, and exploring other sequences.

Clearing previous output

If you try to run the program a second time, you’ll get a prompt asking if you want to overwrite the current animation file. There’s a subtler problem as well: if you generate an animation with a large number of frames and then generate one with fewer frames, the leftover higher-numbered frames from the earlier run will be included in the new animation.

To fix this, let’s automatically clear the previous output files before generating new files:

def generate_images():
    """Generate a sequence of images to make an animation."""
    plot_nums_c_vals = get_plot_nums_c_vals(1.5)

    # Clear previously-generated animation output files.
    os.system('rm -rf output/animation_files')
    os.system('mkdir -p output/animation_files')

    # Generate one file for each value of c.
    ...

We first remove the directory animation_files, and then recreate it. This effectively deletes all the files that were in output/animation_files.
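
If you’d rather not shell out to rm and mkdir, the standard library can do the same cleanup. Here’s an equivalent sketch using shutil and os (an alternative, not what the rest of this post uses):

import os
import shutil

# Remove the directory and everything in it, then recreate it.
shutil.rmtree('output/animation_files', ignore_errors=True)
os.makedirs('output/animation_files', exist_ok=True)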

Generating a longer, higher-resolution animation

The flag --num-steps sets the pixel size of each frame. The following command allows us to make a 500px animation with 90 frames and a framerate of 30, for a 3-second animation:

$ python julia_set_plot.py --num-steps 500 --num-frames 90 --framerate 30
...
Generated 90 image files.
num_steps: 500
max_iterations: 20
Spent 36.6 seconds generating images.
Spent 0.5 seconds generating animation.
Spent 37.0 seconds in total.

It took about 37 seconds, but this is the first reasonably pleasing animation we’ve made:

Now that things are working well, let’s implement parallel processing to speed things up locally, and set ourselves up to run much faster on a more powerful cloud computer.

Implementing parallel processing

If you’re unfamiliar with parallel processing, it’s an approach where a workload is spread over several processes instead of all the work being done in one process. This shares the work among the cores of a multi-core CPU.

Start by adding a call to import multiprocessing in the import section of the file.

You can write code that spreads the load over as many processes as you want to; there’s an ideal number for each system, and it can take some experimentation to find the right number of processes to use. We’ll add a CLI argument to let us do this experimentation easily:

parser.add_argument('--num-processes',
    help="How many processes should be used?",
    action='store', type=int, default=2)

I know 2 processes works best on my system, and it’s also a safe default for the number of processes. In cases where parallel processing will produce a speedup, 2 processes will always help on any CPU with more than one core. When we run this code on a system that can handle more processes, we can use a CLI flag to bump the number of processes.

Here’s the loop that we were using to generate images in the function generate_images():

    for plot_num, c in plot_nums_c_vals.items():
        point_iterations = get_julia_points(c)
        generate_julia_image(point_iterations, plot_num)

This loop calls two separate functions for every combination of plot_num and c. It’s easier to write multiprocessing code if you’re just calling one function inside a loop, so we’ll move the call to get_julia_points() into the generate_julia_image() function. This way we’ll just have one function to call for each item in plot_nums_c_vals.

Let’s modify generate_images() to use multiprocessing instead of a simple for loop:

def generate_images():
    """Generate a sequence of images for an animation, in parallel."""
    plot_nums_c_vals = get_plot_nums_c_vals(1.5)

    # Clear previously-generated animation output files.
    os.system('rm -rf output/animation_files')
    os.system('mkdir -p output/animation_files')

    # Generate image files in parallel.
    args_list = [(plot_num, c) for plot_num, c in plot_nums_c_vals.items()]
    pool = multiprocessing.Pool(processes=args.num_processes)
    pool.map(generate_julia_image, args_list)

In place of the for loop, we use the map() method to do this. If you haven’t used the multiprocessing module before, the best way to understand this code is by looking at this line first:

    pool = multiprocessing.Pool(processes=args.num_processes)

This line creates a pool of processes. In the default version of this code, we’ll end up with a pool consisting of two processes.

Now let’s consider the next line:

    pool.map(generate_julia_image, args_list)

The map() function in Python applies a function to each item in a collection. We can only call one function using pool.map(), so we’ll move the get_julia_points() call to generate_julia_image(). This line calls generate_julia_image() once for each item in args_list, and it does this by spreading the workload over all the processes in the pool.

Each time we call generate_julia_image(), we need to send it two arguments, a value for plot_num and a value for c. We need to create a list of these arguments that pool.map() can use. The list of arguments is created by this line:

    args_list = [(plot_num, c) for plot_num, c in plot_nums_c_vals.items()]

This is a list comprehension that generates a sequence of tuples - one tuple for each combination of plot_num and c. Each tuple contains a plot_num string and a value for c. The pool.map() method calls generate_julia_image() once for each of these tuples, and spreads out this workload over the number of processes we have specified.

Now we’ll modify generate_julia_image() to work with these tuples:

def generate_julia_image(args_list):
    """Generate an image of the julia set."""
    plot_num, c = args_list
    point_iterations = get_julia_points(c)

    colors = plt.cm.viridis(point_iterations)*255
    ...

We update the function definition to accept a single tuple instead of separate arguments, and unpack that tuple on the first line of the body. We also call get_julia_points() from inside this function. This goes against the principle of having each function do just one thing, but it’s much simpler than building a collection of point values for every image ahead of time. In a sense the function still does one thing, generating an image of a julia set; it just calculates the points it needs along the way. Every program is a tradeoff between how loosely its parts are coupled and how efficient it is. Here we’re tightening the coupling between calculating the points and generating an image, but that’s worthwhile for the efficiency we’ll gain, especially when we move to a system with more CPU cores.

We don’t need to change the main section at all. Here’s the output from running the program with its default values:

...
Generated 20 image files.
num_steps: 200
max_iterations: 20
Spent 1.4 seconds generating images.
Spent 0.1 seconds generating animation.
Spent 1.5 seconds in total.

This isn’t a significant speedup. When we use the multiprocessing module, there’s additional work involved in distributing and monitoring the workload across multiple processes. With a small workload, this overhead can cancel out the performance gain, and in some cases it can even slow down execution.

But let’s try generating the same higher-resolution animation we did a moment ago, and see if that’s any faster:

$ python julia_set_plot.py --num-steps 500 --num-frames 90 --framerate 30
Wrote file: output/animation_files/julia_plot_00000.png
Wrote file: output/animation_files/julia_plot_00012.png
Wrote file: output/animation_files/julia_plot_00001.png
Wrote file: output/animation_files/julia_plot_00013.png
...
Generated 90 image files.
num_steps: 500
max_iterations: 20
Spent 22.4 seconds generating images.
Spent 0.5 seconds generating animation.
Spent 22.9 seconds in total.

What previously took 37.0 seconds now takes only 22.9 seconds, for a 1.6x speedup. You can see from the output that the images are not being created in order, and if you watch carefully you can see that the output appears roughly in batches of 2. This batching will be even clearer on systems with more than 2 cores, as we’ll see in the last section when we move this work to a cloud server.

A slightly simpler way of defining args_list

In the last section I used a list comprehension to generate args_list, because you can almost always make a list of tuples to pass a set of arguments to a function using the multiprocessing module. This is what I’ve always done when I’ve used multiprocessing. But the second argument to pool.map() can be any iterable. The .items() method of a dictionary returns an iterable view of (key, value) tuples; since those tuples are exactly the arguments we need, we can use it directly to define args_list:

args_list = plot_nums_c_vals.items()

This is simpler, and if all you’re passing to a function in pool.map() is the contents of a dictionary, this is an appropriate way to define args_list.

How many processes?

It’s not always clear how many processes is optimal for a given program and system. For example, I’m using a 13” MacBook Pro with a base-level processor. System Report says this computer has 1 processor and 2 cores. Documentation for some systems also mentions virtual processors, and distinguishes physical cores from logical cores.

There are several ways to find out how many processes your system can support. Here’s one using the psutil module, which you’ll need to install with pip:

>>> import psutil
>>> psutil.cpu_count()
4
>>> psutil.cpu_count(logical=False)
2

When you call psutil.cpu_count(), the default value for the logical argument is True. My CPU has 4 logical cores. Passing logical=False, I can see that my system has 2 physical cores, as I saw with System Report. This kind of probing can be especially helpful when working on a remote system such as a cloud server.
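
If you don’t want to install anything for this, the standard library can report the logical core count as well; this should match psutil’s default count:

>>> import os
>>> os.cpu_count()
4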

To see what value to use for num_processes, run your code with a moderate workload using several different values of num_processes. I’m going to try generating a 10-second animation file at a framerate of 30fps, with 1, 2, 4, and 8 processes:

$ python julia_set_plot.py --framerate 30 --num-frames 300 --num-processes 1
...
Spent 23.5 seconds in total.
$ python julia_set_plot.py --framerate 30 --num-frames 300 --num-processes 2
...
Spent 14.4 seconds in total.
$ python julia_set_plot.py --framerate 30 --num-frames 300 --num-processes 4
...
Spent 14.6 seconds in total.
$ python julia_set_plot.py --framerate 30 --num-frames 300 --num-processes 8
...
Spent 15.1 seconds in total.

To summarize:

num_processes | seconds
1 | 23.5
2 | 14.4
4 | 14.6
8 | 15.1

If you try to use more processes than your system can handle, the multiprocessing code that parcels out work to individual processes has a harder time managing the distribution of work. It’s important to understand the optimal settings for any given system.

When you’re finished exploring your system, make sure to run pip freeze > requirements.txt again to add psutil to our set of requirements. This way we’ll be able to use this utility on the remote server as well.

More CLI arguments

Now that we have a sense of how to run this code reasonably efficiently on our own system, let’s add a few more CLI arguments that make it easier to explore fractals before moving to a cloud server.

Duration

When we want to make an animation, we often know the framerate we want, and how long we want the animation to last. We’ll specify a --duration flag; when this flag is set it will calculate the appropriate value for num_frames:

parser.add_argument('--duration',
    help="How long do you want the animation to be? (overrides --num-frames)",
    action='store', type=int)

In the help documentation, we note that passing a value for --duration will override any value that’s manually passed for --num-frames.

Here’s how we’ll use --duration:

...
args = parser.parse_args()

if args.duration:
    # Override num_frames, and calculate num_frames to get requested duration.
    args.num_frames = args.duration * args.framerate
...
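
For example, running with --framerate 30 --duration 10 now generates 300 frames, matching the 10-second animations we timed in the last section.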

c_imag_initial and c_imag_increment

Let’s make it easier to choose the initial imaginary component of c, and how much to vary this component:

parser.add_argument('--c-imag-initial',
    help="What's the initial value for the imaginary component of c?",
    action='store', type=float, default=-0.42193)

parser.add_argument('--c-imag-increment',
    help="How much do you want to vary the imaginary component of c?",
    action='store', type=float, default=1.5)

We need to change the function get_plot_nums_c_vals():

def get_plot_nums_c_vals():
    ...
    c_initial = complex(-0.62772, args.c_imag_initial)
    c_imag_max = c_initial.imag + args.c_imag_increment
    c_dict = {}
    for plot_num, c_imag in enumerate(
            np.linspace(c_initial.imag, c_imag_max, args.num_frames)):
        ...

We no longer need any parameters for this function, and we need to use the args. prefix for both c_imag_initial and c_imag_increment.

Finally, we need to remove the argument from the call to this function in generate_images():

def generate_images():
    """Generate a sequence of images for an animation."""
    plot_nums_c_vals = get_plot_nums_c_vals()
    ...

Other CLI arguments

I won’t show any more arguments here, but you might want to add arguments for the boundary of the region that’s being examined. If you’re interested in single images, you might also create a CLI flag like --single-image that would set num_frames to 1, and not call generate_animation(). You might even save the image file to a different location if this flag is set. You might also want to make an argument for which colormap to use, which would have a significant impact on the visual appeal of the final animation.

Selecting an animation to focus on

Now that we have all of these parameters, let’s explore this region a little more closely to find an animation to focus on when we run this code on a higher-performance server. One of the first things I tried was running this code with a higher value of max_iterations. This should create more detail in each image of the animation. I first did this at a low framerate and duration:

$ python julia_set_plot.py --max-iterations 500
...
Spent 5.5 seconds in total.

The initial image has more detail, and there’s something interesting happening near the end. Let’s increase the framerate and the duration, and see if things get more interesting:

$ python julia_set_plot.py --max-iterations 500 --framerate 15 --duration 5
...
Spent 20.4 seconds in total.

There’s some really interesting stuff happening near the end of this. This is working from a starting c.imag value of -0.42193, and increasing 1.5 from here, for a range of approximately -0.4 to 1.1. I’m going to try using a range that’s symmetrical around 0; I’ll start c_imag at -0.7, and end at 0.7. I’m also going to spend some processor time and generate a higher-resolution animation with a longer duration and a higher framerate. That will allow me to move frame-by-frame through the entire range and get a much better sense of where to focus.

$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 10 --num-steps 500 --c-imag-initial -0.7 --c-imag-increment 1.4
Spent 608.5 seconds in total.

This is really interesting, and this is one of the videos I want to make a high-resolution version of on a more efficient computer. This starts off with a mostly dark screen, then some interesting areas appear, then it grows to the large mostly-yellow shape, and then it recedes again following the same pattern in reverse. I’d like to see this whole sequence play out at a larger size, and in a longer high quality video.

I want to focus on the first half of the animation, where it grows from a mostly-dark region to the yellow region with a vertical orientation. There are several frames that get really intricate, and I’d like to focus in on that segment before setting up a remote server. To make a new animation focusing on the first half of this one, we’ll start at the same value for c_imag, but halve the value for --c-imag-increment:

$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 10 --num-steps 500 --c-imag-initial -0.7 --c-imag-increment 0.7
Spent 578.4 seconds in total.

This is nice; I’ll probably make a high-quality version of this as well. It starts off mostly dark, and then we watch the interesting regions grow. We see the intricate segment briefly appear, and then the large yellow region grows until it’s in a pleasing vertical orientation:

Now let’s see if we can focus on the portion of this animation that has the most intricate patterns. When I step through it frame by frame, the most interesting parts seem to happen between 4 and 5 seconds. In this one second it turns from light blue to blue and yellow to all yellow, before fading back into a yellow and blue pattern that resembles a series of spiral galaxies.

How do we make a new animation that focuses on seconds 4-5 of the previous animation? There were 10 seconds in the animation and the increment was 0.7, so every second covers an increment of 0.07 in the imaginary component of c. After 4 seconds, c_imag has increased by 4 * 0.07 = 0.28. c_imag started at -0.7, so this time we want to start at -0.7 + 0.28 = -0.42. We want to end one second later, which is another increment of 0.07. So we’ll use --c-imag-increment 0.07 in the next animation.

$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 10 --num-steps 500 --c-imag-initial -0.42 --c-imag-increment 0.07
Spent 608.8 seconds in total.

This is really enjoyable to watch; we can really see the spiral regions come into existence and then fade back out. But the animation spends an unsatisfyingly long time in the less interesting large yellow region, so let’s try one more segment.

We want to start at the same place, so --c-imag-initial will be the same. But the most interesting part of the video is the first six seconds. The total increment over 10 seconds for this last animation was 0.07, so for each second the increment was 0.007. We want to keep the first six seconds, which covered an increment of 6 * 0.007 = 0.042:

$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 10 --num-steps 500 --c-imag-initial -0.42 --c-imag-increment 0.042

This is great; we can watch the image change from an interesting yellow pattern, to the spiral shapes, and back to a yellow pattern. Most of the animation is spent in the intricate spiral patterns.

Making a plan for what we want to run on a server

The first thing I’ll do on the server is run the original quick plot and make sure there are no errors. Then I’ll run one of these 10-second animations, which each took about ten minutes on my system with num_processes=2. If we run on a server with 48 cores, this should run on the order of 24 times faster, for a total of about 25 seconds. The extrapolation from 2 to 48 cores isn’t that simple, so I won’t be surprised by anything between 20 seconds and one minute.

So, here are the commands we’ll run on the faster server, except for the --num-processes flag which we’ll finalize later:

If all of these estimates are correct, I’m looking at about an hour of rendering time. With setup time and time to upload and download files, I’m looking at 2-3 hours of server time in all.

Using a powerful cloud server to render animations

Now we’ll get to the fun part: running this code on a much more powerful computer than most of us have at home. Here’s an outline of the steps we’ll take:

What kinds of servers are available?

For this project I’m going to use Linode, although you can use any cloud provider that you’re familiar with. We need a server with a dedicated CPU. We’re going to max out the CPU, and it’s not a good idea to run that kind of a workload on a server with a shared CPU.

For this project I’m going to use a Dedicated CPU server from Linode. Here’s a summary of the options for this kind of server:

Linode dedicated CPU server options.

I’m looking for something that’s not going to be too expensive to use for 2-3 hours, but is also significantly more powerful than my laptop. I’m going to go with the 48-core option, which is $1.08/hr. That should be just a few dollars, and if I somehow forget to destroy the server or don’t do it correctly, it will be about $25/day. That’s better than the ~$125/day I’d accrue if I chose the 64-core option and didn’t destroy the server correctly.

If you’ve never used a cloud server before, you might want to try one of the lowest tiers, and not run any risk of accruing significant charges. It’s not too difficult to destroy a server, but I would not have wanted my first exploration of cloud servers to be on something that costs on the order of $1000/month or more.

If you’ve used a cloud server before but aren’t super familiar with the process, you might want to run through this once on a low-end server, and then run through it again on a higher end server once you’re familiar with the process. This is the approach I took when drafting this post.

Setting up the server

To do this, you’ll need an account with a valid credit card. There are some free cloud servers, but none of them are going to be more powerful than a moderate laptop. The steps you see here largely follow Linode’s documentation for the most basic setup and securing of a server. There’s a lot more that goes into effective server management, but this is enough considering that we’re going to destroy the server in just a few hours at most.

To start, log in to Linode and click Create > Linode. Choose the Ubuntu 20.04 LTS distribution, and choose an appropriate region. Under Linode Plan click Dedicated CPU, and select the option you’re interested in. If you want you can add a “Linode Label” for this server. Enter a Root Password, and jot it down or make sure you’ll remember it. If you forget it you can open a console in the Linode dashboard, or you can always destroy the server and start over. When you’ve made all the selections, click Create.

In the next page that appears you can see statistics about this server’s status, its resource usage, and you can see the server’s IP Address. There are also a number of actions you can take such as resizing, rebuilding, backing up, and others. Just to be clear, you can delete the server by clicking Settings > Delete Linode > Delete.

To log in to the server, open a new terminal on your local system. I like to change the background color of a terminal that I’ll use for remote access, so it’s really clear that this terminal is not acting on my local system. In this terminal, enter the following to log in to the server you just created (use the IP address for your server wherever you see xx.xxx.x.xxx):

$ ssh root@xx.xxx.x.xxx

You’ll be asked to verify that you want to connect to the remote server, and then asked to enter the root password. Finally, you’ll have a prompt for the remote server:

$ ssh root@xx.xxx.x.xxx
The authenticity of host 'xx.xxx.x.xxx (xx.xxx.x.xxx)' can't be established.
...
Are you sure you want to continue connecting (yes/no)? yes
root@xx.xxx.x.xxx's password: 
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-48-generic x86_64)
...
root@li433-138:~#

Now we’re going to set up a non-root user, and give this user administrative privileges.

root@li433-138:~# adduser julia
Adding user `julia' ...
Adding new group `julia' (1000) ...
Adding new user `julia' (1000) with group `julia' ...
Creating home directory `/home/julia' ...
Copying files from `/etc/skel' ...
New password: 
Retype new password: 
passwd: password updated successfully
Changing the user information for julia
Enter the new value, or press ENTER for the default
    Full Name []: 
    Room Number []: 
    Work Phone []: 
    Home Phone []: 
    Other []: 
Is the information correct? [Y/n] y
root@li433-138:~# usermod -aG sudo julia
root@li433-138:~#

The adduser command creates a new user, and the usermod command modifies the user by adding them to the sudo group. We can now use sudo to run administrative commands as this user.
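If you want to double-check that change from Python (the groups julia command works just as well), the standard library can read the group database directly. From a quick python3 session on the server:

>>> import grp
>>> 'julia' in grp.getgrnam('sudo').gr_mem
True

The gr_mem attribute lists a group’s supplementary members, so True here confirms that julia can use sudo.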

Now we’ll configure a basic firewall, and make sure we can continue to log in through SSH:

root@li433-138:~# ufw allow OpenSSH
Rules updated
Rules updated (v6)
root@li433-138:~# ufw enable
Command may disrupt existing ssh connections. Proceed with operation (y|n)? y
Firewall is active and enabled on system startup
root@li433-138:~# ufw status
Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere                  
OpenSSH (v6)               ALLOW       Anywhere (v6)             

root@li433-138:~# 

This tells ufw to allow connections using OpenSSH, and then enables the firewall. Checking the status shows that connections through OpenSSH are allowed.

Open a new terminal tab and make sure you can log in as the non-root user:

$ ssh julia@xx.xxx.x.xxx
julia@xx.xxx.x.xxx's password: 
Welcome to Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-48-generic x86_64)
...
julia@li433-138:~$ 

If you have trouble with any of these steps, you can click Launch Console in the dashboard for this Linode, and run any command you need to.

Now that we have logged in as a non-root user, we can exit the root session and close that terminal tab:

root@li433-138:~# exit
logout
Connection to xx.xxx.x.xxx closed.
~$

Now we’ll update the server:

julia@li433-138:~$ sudo apt-get update
[sudo] password for julia: 
Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [107 kB]
...
Fetched 4,412 kB in 7s (595 kB/s)
Reading package lists... Done
julia@li433-138:~$ sudo apt-get dist-upgrade
Reading package lists... Done
...
15 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 8,044 kB of archives.
After this operation, 225 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
...
julia@li433-138:~$ 

On Ubuntu, the output of ls /var/run will tell you whether you need to restart after an update. Run this command, and look for a file named reboot-required. If it’s there, run sudo shutdown -r now, which will end your session and restart the server. Wait a moment, and then log back in.
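Rather than scanning the directory listing by eye, you can also test for the flag file directly; Ubuntu creates /var/run/reboot-required when a restart is needed. From a quick python3 session on the server:

>>> from pathlib import Path
>>> Path('/var/run/reboot-required').exists()
False

If this returns True, reboot before moving on.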

We need to install ffmpeg:

julia@li433-138:~$ sudo apt-get install ffmpeg

Python 3 is installed by default on Ubuntu 20.04, and it should be a recent enough version to run the code we’ve been working with. Let’s check the version:

julia@li433-138:~$ python3 -V
Python 3.8.5

The default Python installation on Ubuntu doesn’t include the venv module, which we’ll use to make a virtual environment for this project. We’ll also install python3-dev and gcc, which some packages need in order to build C extensions during installation:

julia@li433-138:~$ sudo apt-get install python3-venv python3-dev gcc

Now we’ll make a directory for our project on the remote server, and change into this directory:

julia@li433-138:~$ mkdir exploring_fractals
julia@li433-138:~$ cd exploring_fractals

Now we need to do a little work on our local system. This back-and-forth is why it’s helpful to have different background colors in your local and remote terminal windows, or some other way to clearly and quickly distinguish between the two. There are some commands you don’t want to accidentally run on the wrong system, especially potentially destructive commands, or commands that install things you don’t want or need locally.

Now we need to copy our local project to the remote server. Before you do this, delete any existing output from the output/ and output/animation_files/ directories. Also, make a new file called exclude.txt in your local project; this will specify which files and directories we want to exclude from copying to the remote server. Here’s what goes in exclude.txt:

.DS_Store

ef_env/
.git/

.gitignore
readme.md
exclude.txt

The .DS_Store entry is a default hidden file that’s created on macOS systems. If you have any other cruft like project configuration files from an IDE, you can add them as well. Including those files shouldn’t cause any issues, but it’s nice to leave them out if they’re not needed on the remote server.

The following command should be run from the exploring_fractals/ directory on your local system:

$ rsync -arv --exclude-from exclude.txt . julia@xx.xxx.x.xxx:/home/julia/exploring_fractals/

The rsync command is really useful for copying files and directories between local and remote systems. It works in both directions, and is highly configurable. This command says to copy all the files and directories in the current directory (.), except what’s listed in exclude.txt, to the directory /home/julia/exploring_fractals/ on the remote server. The -a flag copies in archive mode, which preserves file attributes and recurses into directories; -r asks for a recursive copy (already implied by -a); and -v generates verbose output that tells us exactly what’s being copied. Make sure you don’t leave out the lone dot in the middle of this command; that dot represents the current directory, which is where we’re copying from.
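If you end up syncing repeatedly as you tweak the project, it can be handy to wrap this command in a small script so you don’t mistype the destination. Here’s a minimal sketch; sync_project.py is not part of the project repository, and the IP address is a placeholder just like in the command above:

# sync_project.py - hypothetical helper; run it from the local
# exploring_fractals/ directory.
import subprocess

REMOTE = "julia@xx.xxx.x.xxx:/home/julia/exploring_fractals/"

# The same command as above: archive mode, recursive, verbose,
# honoring the exclusions in exclude.txt.
subprocess.run(
    ["rsync", "-arv", "--exclude-from", "exclude.txt", ".", REMOTE],
    check=True,  # raise an error if rsync fails
)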

If we go back to the remote server, we can see what was copied there:

julia@li433-138:~/exploring_fractals$ ls -alh
total 24K
drwxr-xr-x 3 julia julia 4.0K Oct  8 01:57 .
drwxr-xr-x 4 julia julia 4.0K Oct  8 04:11 ..
-rw-r--r-- 1 julia julia 5.2K Oct  7 16:32 julia_set_plot.py
drwxr-xr-x 3 julia julia 4.0K Oct  8 01:52 output
-rw-r--r-- 1 julia julia  150 Oct  7 15:17 requirements.txt

Now we can build a virtual environment on the remote server that matches our local environment:

julia@li433-138:~/exploring_fractals$ python3 -m venv ef_env
julia@li433-138:~/exploring_fractals$ source ef_env/bin/activate
(ef_env) julia@li433-138:~/exploring_fractals$ pip install --upgrade pip
(ef_env) julia@li433-138:~/exploring_fractals$ pip install -r requirements.txt 
Collecting certifi==2020.6.20
  ...
Installing collected packages: certifi, six, cycler, kiwisolver, numpy, python-dateutil, Pillow, pyparsing, matplotlib
Successfully installed Pillow-7.2.0 certifi-2020.6.20 cycler-0.10.0 kiwisolver-1.2.0 matplotlib-3.3.2 numpy-1.19.2 pyparsing-2.4.7 python-dateutil-2.8.1 six-1.15.0
(ef_env) julia@li433-138:~/exploring_fractals$ 

Now we should be able to generate an animation on the remote server. We’ll use the default configuration first:

(ef_env) julia@li433-138:~/exploring_fractals$ python julia_set_plot.py 
...
Generated 20 image files.
num_steps: 200
max_iterations: 20
Spent 0.7 seconds generating images.
Spent 0.1 seconds generating animation.
Spent 0.8 seconds in total.

Now we can go back to the local terminal, make a directory for the remote output, and use rsync to copy the animation file into it:

$ mkdir remote_output
$ rsync julia@xx.xxx.x.xxx:/home/julia/exploring_fractals/output/animation_files/julia_animation.mp4 remote_output/
julia@xx.xxx.x.xxx's password: 
$ ls remote_output/
julia_animation.mp4

Opening this file, we can see it’s the same low-resolution animation we were generating earlier. That’s great; it means everything is working and we can try generating higher-resolution files!

First let’s verify the number of cores on the remote server:

(ef_env) julia@li433-138:~/exploring_fractals$ python
Python 3.8.5 (default, Jul 28 2020, 12:59:40) 
...
>>> import psutil
>>> psutil.cpu_count()
48

If you use the logical=False argument you’ll probably see 1 physical core; this is an artifact of how virtualized cloud servers present CPUs to the guest system. If you’re on a dedicated CPU plan, you’ll be able to make full use of the number of logical cores you see in this output.
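You can check the physical count from the same interpreter session:

>>> psutil.cpu_count(logical=False)
1

It’s the logical count that matters when choosing a value for --num-processes on a dedicated plan.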

Now we’ll generate the 90-frame animation that took 22.9 seconds on my local system earlier, with 2 processes. We’ll use 48 processes now:

(ef_env) julia@li433-138:~/exploring_fractals$ python julia_set_plot.py --num-steps 500 --num-frames 90 --framerate 30 --num-processes 48
...
Generated 90 image files.
num_steps: 500
max_iterations: 20
Spent 1.4 seconds generating images.
Spent 0.3 seconds generating animation.
Spent 1.7 seconds in total.

This is a 13.5x speedup over what we saw locally.
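That figure is just the ratio of the two total runtimes:

>>> round(22.9 / 1.7, 1)
13.5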

Now we’ll generate the file that took 10 minutes locally, and see how much faster that is:

(ef_env) julia@li433-138:~/exploring_fractals$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 10 --num-steps 500 --c-imag-initial -0.7 --c-imag-increment 1.4 --num-processes 48
...
Generated 300 image files.
num_steps: 500
max_iterations: 500
Spent 32.3 seconds generating images.
Spent 0.8 seconds generating animation.
Spent 33.1 seconds in total.

At 18.4x, this is an even bigger speedup. As the workload increases, the benefit of a parallel approach increases as well.

Now we’re ready to generate the higher-resolution, longer animations we were looking forward to. Here goes:

(ef_env) julia@li433-138:~/exploring_fractals$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 60 --num-steps 500 --c-imag-initial -0.7 --c-imag-increment 1.4 --num-processes 48
...
Generated 1800 image files.
num_steps: 500
max_iterations: 500
Spent 202.6 seconds generating images.
Spent 3.1 seconds generating animation.
Spent 205.7 seconds in total.

This took just over three minutes, which is what we were expecting. Here’s the video:

This is nice to see. It’s slow to start, but the animation is really smooth. It spends a lot of time in the yellow region, so let’s move on to the next animation.

(ef_env) julia@li433-138:~/exploring_fractals$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 60 --num-steps 1080 --c-imag-initial -0.7 --c-imag-increment 1.4 --num-processes 48
...
Generated 1800 image files.
num_steps: 1080
max_iterations: 500
Spent 943.6 seconds generating images.
Spent 10.1 seconds generating animation.
Spent 953.7 seconds in total.

This took just under 16 minutes, a little over the 12 minutes we were expecting. Here’s the video:
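One way to sanity-check this runtime: if num_steps sets the number of points sampled along each axis of a frame, the work per frame should grow roughly with its square. Scaling the earlier 205.7-second run by that factor predicts almost exactly what we saw:

>>> round(205.7 * (1080 / 500) ** 2, 1)
959.7

The actual 953.7 seconds is within one percent of that estimate.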

The first time I tried to generate this animation, I forgot to add the --num-processes setting. I couldn’t figure out why it was taking 25+ minutes to run, so I ran top, and saw only two processes running at max CPU. I finally realized I had left the flag off, and the script was using its default of two processes. I canceled the run, re-ran the command with --num-processes 48, and it completed in just under 16 minutes. Knowing some basic Linux terminal commands like top is really helpful in these kinds of situations.
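One way to avoid that mistake entirely is to make the script default to every available core instead of a fixed count. Here’s a sketch of what the argument definition might look like, assuming julia_set_plot.py defines its options with argparse; the flag name matches the one we’ve been using, but the rest is hypothetical:

import argparse
import multiprocessing

parser = argparse.ArgumentParser()
# Default to every available core, so a forgotten flag costs
# nothing on a big server.
parser.add_argument(
    "--num-processes",
    type=int,
    default=multiprocessing.cpu_count(),
)
args = parser.parse_args()
print(f"Using {args.num_processes} processes.")

The tradeoff is that the same forgotten flag would then peg every core on your laptop, so a conservative default has its place too.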

Now let’s generate the animation that focuses on the first half of the sequence.

(ef_env) julia@li433-138:~/exploring_fractals$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 60 --num-steps 1080 --c-imag-initial -0.7 --c-imag-increment 0.7 --num-processes 48
...
Generated 1800 image files.
num_steps: 1080
max_iterations: 500
Spent 983.9 seconds generating images.
Spent 10.4 seconds generating animation.
Spent 994.3 seconds in total.

This took about 16 and a half minutes, similar to the previous run, and much nicer than the ~3-5 hours it would probably take on my local system. Here’s the video:

If you haven’t done so yet, try playing this in the pop-out window that lets you see it at full resolution. It’s fun to step through some of the more interesting sequences and find individual frames that you particularly like.

Now we’ll generate the most focused sequence.

(ef_env) julia@li433-138:~/exploring_fractals$ python julia_set_plot.py --max-iterations 500 --framerate 30 --duration 60 --num-steps 1080 --c-imag-initial -0.42 --c-imag-increment 0.042 --num-processes 48
...
Generated 1800 image files.
num_steps: 1080
max_iterations: 500
Spent 842.3 seconds generating images.
Spent 9.8 seconds generating animation.
Spent 852.1 seconds in total.

Here’s the video:

This is a bit slow to watch, but it’s also interesting to see the detailed transition through the blue-green phase of the animation.

This hasn’t taken too long, so let’s go ahead and generate a 60fps version.

(ef_env) julia@li433-138:~/exploring_fractals$ python julia_set_plot.py --max-iterations 500 --framerate 60 --duration 60 --num-steps 1080 --c-imag-initial -0.42 --c-imag-increment 0.042 --num-processes 48
...
Generated 3600 image files.
num_steps: 1080
max_iterations: 500
Spent 1635.8 seconds generating images.
Spent 18.8 seconds generating animation.
Spent 1654.5 seconds in total.

This took just under half an hour, or $0.50 on this 48-core server. Here’s the video:

It doesn’t look any different from the previous animation during normal playback, but if you’re stepping through individual frames, for example while searching for a single interesting image to render, the extra frames are useful.
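That $0.50 figure, by the way, is just the runtime in hours multiplied by the hourly rate:

>>> round(1654.5 / 3600 * 1.08, 2)
0.5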

Destroying the server

Once you’ve finished generating all the animations you want, make sure to destroy the server. You can do this on the Settings tab for your server. If you have closed out of this page, go to the Linode cloud dashboard, and click Linodes. Click the server you’ve been using, and click Settings > Delete Linode > Delete.

I was able to run all the animations for this blog post for under $6, using a low-end server for the first run through while drafting the steps and then using a 48-core server to do the final renders.

Unless you’ve worked with remote servers often, it’s probably a good idea to log in over the next couple of days, especially if you used a high-end server, and make sure you aren’t accruing significant charges. It’s easy to accidentally leave a server running; for small servers that’s not a big deal, but for larger servers it can be a more costly mistake than you want to make. Log in, look at the Account tab on your dashboard, and make sure the Uninvoiced Charges line item is not increasing.

Note: Exiting your local terminal session or shutting down your server does not destroy it! If all you do is issue a shutdown command, your server still exists and you will continue to accrue charges. The only way to stop accruing charges is to fully delete the server.
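If you’d rather verify this from code than from the dashboard, the same Python bindings mentioned earlier can list everything still running on your account. As before, this is a hedged sketch that assumes the linode_api4 package and a personal access token:

# check_linodes.py - hypothetical sketch: list any servers that
# still exist, and are therefore still accruing charges.
from linode_api4 import LinodeClient

client = LinodeClient("your-api-token")

found = False
for instance in client.linode.instances():
    found = True
    print(instance.label, instance.status, instance.ipv4)
if not found:
    print("No Linodes found; nothing is accruing charges.")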

Conclusions

While there’s a lot more that goes into running high-load processes on cloud servers, this was a fun way to explore the basic concepts involved. If you have a task that can be spread out over multiple processes, uploading your project to a cloud server for an hour or two can be well worth it, especially once you’re comfortable with the setup and teardown process. It’s also a great way to get familiar with remote servers of varying sizes.

Feedback

I don’t have comments enabled at this time, but if you have any feedback feel free to reach out. I will correct any mistakes or misunderstandings on my part that are pointed out.