Efficiency
1) Introduction
What makes for a good program? One of the first things that people may think of is program correctness. In most programming courses, style and readability are also emphasized. Correct and readable programs are not very useful, however, if they do not run in a reasonable amount of time. This is where program efficiency comes into play.
Print-statement debugging and thoughtful test cases can help programmers fix correctness issues, and tools like Pylint and Black can find and fix many style issues. Efficiency bugs, pieces of code that cause extra work or memory use and slow a program down without affecting correctness, can be more difficult to pinpoint. In 6.101, efficiency bugs often show up as a timed-out test case on the submission server, but may also appear as a program that takes a long time to run on your own machine -- or even never stops running!
Efficiency bugs may be difficult to locate, but learning about common efficiency bugs is a good first step towards finding and fixing them. And, just like program correctness bugs and style issues, efficiency bugs become easier to find and fix with practice.
2) Categorizing Efficiency Bugs
What constitutes an efficiency bug is often dependent on the characteristics of the current problem. For example, many small problems do not require a highly optimized solution; it is fine to sort through a small list of numbers with a slow selection sort, as long as the list will always be small. Efficiency becomes more important as the size or frequency of the problem increases. The efficiency of sorting algorithms used by search engines has important implications for global energy usage, for example.
When approaching a new problem, it can be helpful to focus initially on correctness for small, simple cases first, because these programs are usually easier to implement and debug. It takes time and effort to optimize a program, so it's best to start with a simple, working version, and then optimize it to make it more efficient as needed.
The root causes of efficiency bugs are often similar across problems. For the purposes of 6.101, this reading focuses on time-related efficiency bugs, across three general categories:
- Superfluous Computation : Code that can be deleted, or computation that can be repeated fewer times, without affecting correct functionality.
- Suboptimal Data Structure Design : Creating or using a data structure that has slow operations, when a more efficient data structure exists.
- Suboptimal Algorithm Design : Not taking advantage of problem characteristics that would enable a significantly more efficient algorithm.
2.1) Superfluous Computation
Superfluous computations can be fixed with small modifications to code that eliminate unnecessary or duplicate computation. The 6.101 course staff have noticed that the first two bugs in this category — failing to cache and failing to short circuit — are the most common efficiency bugs in student code.
2.1.1) Failing to Cache
- Definition : Not storing or not using data, resulting in repeated computation.
The term caching means temporarily storing a data value that was expensive to obtain, with the expectation of reusing it later in a program. In the code below, the result of calling slow_call on y is cached in the variable y_result. Notice, however, that the cached value stored in y_result is never used -- instead, slow_call(y) is just called repeatedly.
y_result = slow_call(y)
if slow_call(x) < 0:
foo(slow_call(x), slow_call(y))
else:
bar(slow_call(x), slow_call(y))
The following code is an improvement, using the cached value stored in y_result rather than recalculating slow_call(y). However, there still remains a failure to cache in the code below.
y_result = slow_call(y)
if slow_call(x) < 0:
foo(slow_call(x), y_result)
else:
bar(slow_call(x), y_result)
The following code caches the result of slow_call(x) in the variable x_result, eliminating the redundant calls to slow_call(x).
y_result = slow_call(y)
x_result = slow_call(x)
if x_result < 0:
foo(x_result, y_result)
else:
bar(x_result, y_result)
Keep an eye out for ways to avoid calling functions repeatedly with the same inputs in your own code!
2.1.2) Failing to Short Circuit
- Definition : Continuing a computation after the answer has already been determined, or ordering computation in a way that prevents exiting early when it is possible to do so.
The following function determines if a given list of words from a book has any spelling errors.
def book_has_error(book_words):
has_error = False
for word in book_words:
if spelling_error(word):
has_error = True
return has_error
If there is an error in a word early in the book, this function will still keep running until the end of the book, long after the return value of the function has been determined. This is a failure to short circuit. One way to fix the code could be to add a break statement after the return value is determined, to prevent the continued execution of the loop.
def book_has_error(book_words):
has_error = False
for word in book_words:
if spelling_error(word):
has_error = True
break
return has_error
Another way to fix the efficiency bug is by returning as soon as spelling_error(word) evaluates to True.
def book_has_error(book_words):
for word in book_words:
if spelling_error(word):
return True
return False
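As an aside, Python's builtin any function short-circuits in exactly this way: it stops consuming values as soon as it finds a truthy one. Here is a minimal sketch of an even more compact fix, assuming the same spelling_error helper as above:
def book_has_error(book_words):
    # any() returns True as soon as spelling_error(word) is truthy,
    # so it short-circuits just like the explicit loop above
    return any(spelling_error(word) for word in book_words)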
The term short circuiting is also used in programming to refer to the short-circuit evaluation of conditional expressions. The Python interpreter stops evaluating boolean expressions as soon as it determines the value of the expression.
What happens when you execute the following block of code?
if not (1 == 2 and ("5" + 5) == 10):
print("I am printed")
This program outputs:
I am printed
This may be surprising because evaluating "5" + 5 would normally cause a TypeError. However, because 1 == 2 evaluates to False, and False and something_else will always evaluate to False, the Python interpreter short-circuits and does not evaluate the rest of the conditional statement. Because "5" + 5 is never executed, the TypeError never occurs!
Ordering the boolean-valued functions in a boolean expression strategically can often lead to program speedups. Which of the following code blocks do you think will generally run faster on random integer inputs?
def even_with_n_prime_factors(x, n):
return len(get_prime_factors(x)) == n and x % 2 == 0
def even_with_n_prime_factors(x, n):
return x % 2 == 0 and len(get_prime_factors(x)) == n
The second implementation will generally run faster. Whenever x is odd, x % 2 == 0 evaluates to False, the and expression short-circuits, and the get_prime_factors function will not be called. In contrast, the first implementation will call get_prime_factors on every integer.
2.1.3) Executing Dead Code
- Definition : Including code that can be removed with very slight changes, when removing the code would increase efficiency without modifying correct functionality.
Strategically placed print statements are a great debugging tool. However, when print statements are left in debugged code, they can cause significant code slowdown.
How much more time do you think it will take to run the second code block than the first?
total = 0
for i in range(1_000_000):
total += i
total = 0
for i in range(1_000_000):
total += i
print(i, total)
To find out, we can wrap each block with timing code from the time module. Here is the first block:
import time
start_time = time.time()
total = 0
for i in range(1_000_000):
total += i
end_time = time.time()
total_time = end_time - start_time
print(f"{total_time=}")
Running the code above produces the output total_time=0.09513139724731445, which means the first block ran in under 0.1 seconds.
start_time = time.time()
total = 0
for i in range(1_000_000):
total += i
print(i, total)
end_time = time.time()
total_time = end_time - start_time
print(f"{total_time=}")
This code, however, prints out total_time=90.36704635620117. While the total_time you get on your computer may differ, it is safe to say that adding the print statement makes the code nearly a thousand times slower!
Not all instances of dead code are as straightforward to find as stray print statements. For example, if you look closely at the get_nested_items function below, you will notice that the inner helper function does a lot of duplicate work.
def get_nested_items(nested_list):
"""
Given a nested list of numbers, return a set of all the numbers that it contains.
input: [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
output: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
"""
items = set()
def helper(nested_list):
for item in nested_list:
if isinstance(item, list):
# Recurse on sublist
helper(item)
# Add items of sublist
for sub_item in item:
if isinstance(sub_item, list):
helper(sub_item)
else:
items.add(sub_item)
else:
items.add(item)
helper(nested_list)
    return items
The second version of the function, shown below, produces an identical result:
def get_nested_items(nested_list):
"""
Given a nested list of numbers, return a set of all the numbers that it contains.
input: [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
output: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
"""
items = set()
def helper(nested_list):
for item in nested_list:
if isinstance(item, list):
# Recurse on sublist.
# After this, there's no need to iterate!
helper(item)
else:
items.add(item)
helper(nested_list)
    return items
How much extra work does the first version of get_nested_items perform? Changing items from a set to a list makes it clear how much extra work the initial code does.
def get_nested_items(nested_list):
"""
input: [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
(expected) output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
"""
items = []
def helper(nested_list):
for item in nested_list:
if isinstance(item, list):
helper(item)
for sub_item in item:
if isinstance(sub_item, list):
helper(sub_item)
else:
items.append(sub_item)
else:
items.append(item)
helper(nested_list)
    return items

test_input = [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
print(get_nested_items(test_input))
def get_nested_items(nested_list):
"""
input: [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
"""
items = []
def helper(nested_list):
for item in nested_list:
if isinstance(item, list):
helper(item)
else:
items.append(item)
helper(nested_list)
    return items

test_input = [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
print(get_nested_items(test_input))
The first code block outputs [1, 2, 3, 4, 5, 6, 7, 5, 6, 7, 3, 4, 5, 6, 7, 8, 9, 10, 9, 10], whereas the second version outputs [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], as desired.
It is often tempting when writing a recursive function to include an extra layer of explicit computation beyond the necessary base cases and recursive cases, but this can lead to the creation of dead code.
2.1.4) Extra Copying
- Definition : Creating shallow or deep copies of data that either will not be modified, or that could safely be modified without copying.
Unexpected modification of a mutable object (called an aliasing bug) is a common source of correctness bugs. For example, in the following function, modifying a list while iterating through it results in an infinite loop:
def duplicate_even_elts(my_list):
"""
Modify my_list to create adjacent duplicates of even numbers.
input: [1, 2, 3, 4, 5, 6]
(expected) output: [1, 2, 2, 3, 4, 4, 5, 6, 6]
"""
index = 0
for elt in my_list:
if elt % 2 == 0:
my_list.insert(index, elt)
index += 2
else:
index += 1
return my_list
This issue can be quickly fixed by making a copy of my_list to iterate through, as shown in the code block below (which produces the correct output):
def duplicate_even_elts(my_list):
"""
Modify my_list to create adjacent duplicates of even numbers.
input: [1, 2, 3, 4, 5, 6]
output: [1, 2, 2, 3, 4, 4, 5, 6, 6]
"""
index = 0
for elt in my_list.copy():
if elt % 2 == 0:
my_list.insert(index, elt)
index += 2
else:
index += 1
return my_list
How can you fix duplicate_even_elts without creating a copy of my_list? This is one example where a while loop is useful!
def duplicate_even_elts(my_list):
"""
Modify my_list to create adjacent duplicates of even numbers.
input: [1, 2, 3, 4, 5, 6]
output: [1, 2, 2, 3, 4, 4, 5, 6, 6]
"""
index = 0
while index < len(my_list):
elt = my_list[index]
if elt % 2 == 0:
my_list.insert(index, elt)
index += 2
else:
index += 1
return my_list
Because aliasing bugs are so common, many programmers have a tendency to create copies of data structures even when there is no risk of unwanted mutation. In the case below, the programmer copies the list before iterating over it, which results in unnecessary computation:
def list_special_sum(my_list):
"""
Sums list values, doubling the values of even elements
input: [1, 2, 3, 4, 5, 6]
output: 33
"""
total = 0
for elt in my_list.copy(): # unnecessary copy -- we can use my_list without copying
if elt % 2 == 0:
total += 2 * elt
else:
total += elt
return total
Other instances of extra copying are less explicit. Can you spot the extra copying in the code below?
def make_full_names(name_tuples):
"""
name_tuples is a list of tuples in the form of
[("Kevin", "Bacon"), ("Tim, "Beaver"), ...]
Returns a list of strings in the form of ["Kevin Bacon", "Tim Beaver", ... ]
"""
full_names = []
for first, last in name_tuples:
full_names = full_names + [str(first) + " " + str(last)]
return full_names
In this case, first and last are already of type str, so calling str() results in the creation of extra copies of first and last. This extra-copying bug is fixed in the following code:
def make_full_names(name_tuples):
"""
name_tuples is a list of tuples in the form of
[("Kevin", "Bacon"), ("Tim, "Beaver"), ...]
Returns a list of strings in the form of ["Kevin Bacon", "Tim Beaver", ... ]
"""
full_names = []
for first, last in name_tuples:
full_names = full_names + [first + " " + last]
return full_names
However, even after removing the extra copying of strings, there is still another instance of extra copying in the code! Another type of extra copying to watch out for results from creating a new list during concatenation, rather than adding values to an existing list through mutation.
The following example illustrates this difference:
def concat_copy(list1, list2):
list1 = list1 + list2
return list1
def concat_mutate(list1, list2):
list1 += list2
return list1
a = [1, 2, 3]
b = [4, [5]]
print(f"output: {a=} {b=}")
# output: a=[1, 2, 3] b=[4, [5]]
copy_result = concat_copy(a, b)
print(f"output: {a=} {b=} {copy_result=}")
# output: a=[1, 2, 3] b=[4, [5]] copy_result=[1, 2, 3, 4, [5]]
mutate_result = concat_mutate(a, b)
print(f"{a=} {b=} {mutate_result=}")
# output: a=[1, 2, 3, 4, [5]] b=[4, [5]] mutate_result=[1, 2, 3, 4, [5]]
During list concatenation with list1 = list1 + list2, a new list is created containing a shallow copy of all of the elements in the two lists, and that new list is then stored in list1. If list1 is very long and list2 is short, a lot of extra time will be spent copying the references to all the elements in list1. On the other hand, during list1 += list2, list1 is extended in place to contain references to all of the elements in the second list. More details about these operations can be found in the environment model reading.
What will the code below print after executing the code above?
a[0] -= 1
b[1][0] += 10
print(f"{a=}, {b=}")
print(f"{copy_result=}, {mutate_result=}")
a=[0, 2, 3, 4, [15]], b=[4, [15]]
copy_result=[1, 2, 3, 4, [15]], mutate_result=[0, 2, 3, 4, [15]]
If this is surprising, try drawing an environment diagram!
How would you resolve the last extra copying bug in make_full_names?
def make_full_names(name_tuples):
"""
name_tuples is a list of tuples in the form of
[("Kevin", "Bacon"), ("Tim, "Beaver"), ...]
Returns a list of strings in the form of ["Kevin Bacon", "Tim Beaver", ... ]
"""
full_names = []
for first, last in name_tuples:
full_names += [first + " " + last]
# or full_names.extend([first + " " + last])
# or full_names.append(first + " " + last)
return full_names
2.1.5) Failing to Hoist
- Definition : Performing a computation at every recursive step or loop iteration, when a single computation external to the loop or recursive structure would suffice.
It may be best to demonstrate failing to hoist with an example. In the code below, the is_palindrome function checks the type of the input word during each recursive call, even though whether or not word is a str does not change between calls.
def is_palindrome(word):
if not isinstance(word, str):
raise TypeError("Input must be a str")
if not word:
return True
return word[0] == word[-1] and is_palindrome(word[1:-1])
The repeated type checking can be hoisted out of the recursive structure by using a helper function, as shown below:
def is_palindrome(word):
if not isinstance(word, str):
raise TypeError("Input must be a str")
def helper(word):
if not word:
return True
return word[0] == word[-1] and helper(word[1:-1])
return helper(word)
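The same bug appears in loops. In the hypothetical sketch below (the function names and data are made up for illustration), the call to max(allowed_values) is loop-invariant, so it can be hoisted out of the loop and computed only once:
def count_in_range(nums, allowed_values):
    count = 0
    for num in nums:
        # max(allowed_values) is recomputed on every iteration,
        # even though it never changes
        if num <= max(allowed_values):
            count += 1
    return count

def count_in_range_hoisted(nums, allowed_values):
    limit = max(allowed_values)  # hoisted: computed once, outside the loop
    count = 0
    for num in nums:
        if num <= limit:
            count += 1
    return count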
2.1.6) Extra Loop / Recursive Overhead
- Definition : Implementing multiple steps of an algorithm using similar loops or recursive operations, when those steps could be combined more efficiently into one loop or recursive operation.
The check_name_formats function below checks the first and last names in separate for loops:
def check_name_formats(name_tuples):
"""
name_tuples is a list of tuples in the form of
[("Kevin", "Bacon"), ("Tim, "Beaver"), ...]
Returns whether the first letter of each first and last name
is capitalized and that the remaining letters are lowercase
"""
for first, last in name_tuples:
if not str.isupper(first[0]) or not str.islower(first[1:]):
return False
for first, last in name_tuples:
if not str.isupper(last[0]):
return False
return True
This not only costs more in loop overhead (since the list is iterated twice), but it also means failing to short circuit when all the first names are properly formatted but an early last name is not.
The code below improves on the previous version by combining the two separate loops for the first and last names into a single for loop.
def check_name_formats(name_tuples):
"""
    name_tuples is a list of tuples in the form of
    [("Kevin", "Bacon"), ("Tim", "Beaver"), ...]
    Returns whether the first letter of each first and last name
    is capitalized and the remaining letters are lowercase
"""
for first, last in name_tuples:
if not str.isupper(first[0]) or not str.islower(first[1:]):
return False
if not str.isupper(last[0]):
return False
return True
2.2) Suboptimal Data Structure Design
Optimally structuring data often depends both on the information that is being stored and the tasks that it is being used for. Designing efficient data structures requires careful planning.
2.2.1) Slow Data Structure
- Definition : Poor choice of data structure resulting in inefficient operations.
When choosing a data structure for a program, it helps to think about the most common operations the data will be used for. For example, if frequent containment checks will be necessary, a set or dict may be most effective, but if data is going to be added to a collection in order, a list might be most appropriate.
For example, the following function returns the unique words from a collection of words.
def get_unique_words(words):
unique_words = []
for word in words:
if word not in unique_words:
unique_words.append(word)
return unique_words
As discussed in the Flood Fill reading, the convenience of the Python in keyword disguises the slow list containment check operation. Using a set for unique_words leads to faster containment checks, so it is better suited to this problem:
def get_unique_words(words):
unique_words = set()
for word in words:
unique_words.add(word)
return unique_words
Conveniently, the get_unique_words function can be written much more simply as follows:
def get_unique_words(words):
return set(words)
The 6.101 course staff have noticed that using a list or tuple where a set would lead to more efficient operations is one of the most common causes of slow data structure efficiency bugs.
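To see the difference for yourself, you can time the containment checks directly, in the style of the timing examples above (the sizes here are arbitrary, and the exact times will vary by machine):
import time

nums_list = list(range(100_000))
nums_set = set(nums_list)

start_time = time.time()
for _ in range(1000):
    found = 99_999 in nums_list  # scans the list element by element
print(f"list containment: {time.time() - start_time}")

start_time = time.time()
for _ in range(1000):
    found = 99_999 in nums_set  # hash lookup, roughly constant time
print(f"set containment: {time.time() - start_time}")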
Sometimes, the provided data isn’t formatted well for solving a given problem. Consider the following code:
color_preferences = [
("Jaeyun", "blue"),
("Paola", "blue"),
("Nayra", "violet"),
("Julia", "green"),
("Mayu", "blue"),
("Jack", "pink"),
("Ximeng", "green"),
("Shreya", "red"),
("Kareem", "blue"),
]
unique_colors = {color for _, color in color_preferences}
for color in unique_colors:
names = [name for name, color_pref in color_preferences if color_pref == color]
print(f"These people like {color}: {names}")
In order to go from the color_preferences list to the names of people with each color preference, the code above loops through color_preferences once for each color. While this does not pose a big problem for a small list, if color_preferences grew to hundreds of thousands or millions of people, it would start to pose a greater issue.
What kind of data structure would allow us to more efficiently display people's color preferences?
color_preferences = [
("Jaeyun", "blue"),
("Paola", "blue"),
("Nayra", "violet"),
("Julia", "green"),
("Mayu", "blue"),
("Jack", "pink"),
("Ximeng", "green"),
("Shreya", "red"),
("Kareem", "blue"),
]
color_preference_dict = {}
for name, color in color_preferences:
if color not in color_preference_dict:
color_preference_dict[color] = []
color_preference_dict[color].append(name)
for color in color_preference_dict:
print(f"These people like {color}: {color_preference_dict[color]}")
Adding a little additional computation to store the color preferences in a dict eliminates all but one iteration through the original color_preferences list.
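As an aside, the standard library's collections.defaultdict expresses the same grouping pattern without the explicit membership check. A minimal sketch, assuming the same color_preferences list as above:
from collections import defaultdict

color_preference_dict = defaultdict(list)
for name, color in color_preferences:
    # a missing key is automatically initialized to an empty list
    color_preference_dict[color].append(name)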
2.2.2) Expensive Data Transformation
- Definition : The amount of time taken to create a data structure is significantly greater than the amount of time saved by using this new structure.
There is a tradeoff between time spent transforming data and the time spent interacting with data. As shown in the previous section, effective data transformations may take some extra initial computation, but can greatly improve the overall efficiency of a program, especially when common operations will be repeated many times. In some cases, however, the time taken to transform data outweighs the efficiency gains from the transformation.
The code below precomputes the positive factors of a million integers. In cases where these factors were used many times, this could be useful; but here, where a human is entering the input manually, precomputing a million answers will likely lead to a lot of wasted computation.
def get_positive_factors(x):
factors = set()
for num in range(1, int(x ** 0.5) + 1):
if x % num == 0:
factors.add(num)
factors.add(x // num)
return factors
factor_map = {num: get_positive_factors(num) for num in range(1, 1_000_001)}
while True:
try:
x = int(input("Enter a positive integer between 1 and 1,000,000: "))
except ValueError:
break
if not 1 <= x <= 1_000_000:
print("The integer must be between 1 and 1,000,000 \n")
continue
print(f"The factors of {x} are: {', '.join(str(num) for num in factor_map[x])}\n")
How would you restructure the code to avoid the expensive data transformation?
def get_positive_factors(x):
factors = set()
for num in range(1, int(x ** 0.5) + 1):
if x % num == 0:
factors.add(num)
factors.add(x // num)
return factors
factor_map = {}
while True:
try:
x = int(input("Enter a positive integer between 1 and 1,000,000: "))
except ValueError:
break
if not 1 <= x <= 1_000_000:
print("The integer must be between 1 and 1,000,000 \n")
continue
if x not in factor_map:
factor_map[x] = get_positive_factors(x)
print(f"The factors of {x} are: {', '.join(str(num) for num in factor_map[x])}\n")
Notice that we are still using factor_map as a cache, to avoid recomputing the factors of a number we've already seen. But getting rid of factor_map and computing the factors each time a number is entered would also work, assuming that a human user isn't likely to ask for the same numbers over and over again.
If you are unsure whether a data transformation is too expensive, you can use time.time() to determine how long the transformation takes and how long the actual operation takes.
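For example, here is a rough sketch of such a comparison, reusing the get_positive_factors function defined above (exact times will vary from machine to machine):
import time

start_time = time.time()
factor_map = {num: get_positive_factors(num) for num in range(1, 1_000_001)}
print(f"time to precompute all factors: {time.time() - start_time}")

start_time = time.time()
factors = get_positive_factors(360)
print(f"time for one direct computation: {time.time() - start_time}")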
2.3) Suboptimal Algorithm Design
Different solutions to the same problem may be appropriate for different contexts. As mentioned in the introduction, a slow sorting algorithm may work well for small cases, but more efficient sorting algorithms become necessary as the data size scales up. Whereas superfluous computations tend to be low-level efficiency bugs that can be solved with small adjustments to existing code, suboptimal algorithm design may require making more substantial changes to the code as a result of changing the higher-level algorithm.
The first three subcategories of suboptimal algorithm design can be fixed with algorithm optimizations or more effective use of problem constraints. The last subcategory is more language specific: reimplementing builtins reduces performance for Python programs and may result in slowdowns in other languages as well, depending on their implementation.
2.3.1) Solving Irrelevant Subproblems
- Definition : A brute-force approach that solves unnecessary subproblems before checking constraints.
Consider the get_valid_moves function below, which finds the valid_moves from a current_loc while avoiding obstacles.
def get_all_locations(board):
for r in range(board["rows"]):
for c in range(board["cols"]):
yield (r, c)
def get_valid_moves(board, current_loc):
"""
Gets valid_moves from current_loc
A valid move must be in bounds and adjacent horizontally
or vertically to the current_loc and not be on an obstacle
inputs:
board = {
"rows" : 5,
"cols" : 6,
"obstacles" : {(0, 0), (3, 0), (2, 5)}, # (row, col)
}
current_loc = (1, 5)
output: {(1, 4), (0, 5)}
"""
valid_moves = set()
for r, c in get_all_locations(board):
if (r, c) in board["obstacles"]:
continue
# move must be directly above or to the side of current_loc
distance = abs(r - current_loc[0]) + abs(c - current_loc[1])
if distance == 1:
valid_moves.add((r, c))
return valid_moves
In the code above, generating and looping through all the locations before checking the constraint that a move must be directly adjacent to the current_loc is an example of unnecessary brute force. The moves that are not directly adjacent horizontally or vertically to current_loc do not need to be explored at all, so generating and checking each of these distant locations is irrelevant.
How would you refactor get_valid_moves? The approach shown here uses the constraint that valid_moves must be directly adjacent to the current_loc in order to explore only the viable neighboring moves.
def get_valid_moves(board, current_loc):
valid_moves = set()
for row_delta, col_delta in ((1, 0), (-1, 0), (0, 1), (0, -1)):
r = current_loc[0] + row_delta
c = current_loc[1] + col_delta
        if not (0 <= r < board["rows"]):
            continue
        if not (0 <= c < board["cols"]):
            continue
        move = (r, c)
if move in board["obstacles"]:
continue
valid_moves.add(move)
return valid_moves
However, the utility of writing brute force approaches should not be fully discounted. It is often helpful to write a correct solution to a problem (even if it is slow) before figuring out a more efficient one. Sometimes, this means writing a preliminary brute force version and then applying problem constraints to produce a faster version later.
2.3.2) Misordering Subproblems
- Definition : Failing to explore subproblems according to a valid heuristic, when doing so would lead to performance gains.
Typically, a heuristic technique involves a ranking of items to explore, such that items which are more likely to lead to a solution are ranked higher, and explored first. For example, the popular word game Wordle involves guessing a five-letter word in as few guesses as possible. Each time you guess a word, you gain information about which letters are correct, which letters are in the word but in the wrong location, and which letters are incorrect. A good heuristic to help solve the subproblem of unknown letters within the word might involve selecting a word that has distinct letters that haven't been used in previous guesses but are commonly found in five-letter words. For example, a word like "raise" may be a good first guess because it contains five distinct common letters, while "esses" may be less wise because it contains only two distinct letters.
More concretely, in the following example, suppose shortest_to_longest is a list of the words found in Mary Shelley's Frankenstein, sorted in ascending order by length. The goal of the code is to identify the longest word in Frankenstein starting with the letter 'z'.
# explore words from shortest to longest by length
longest_z_word = None
for word in shortest_to_longest:
    if word[0] == "z":
        longest_z_word = word
print(longest_z_word)
This code has the subproblem of finding the words in the book that start with the desired letter "z". However, exploring the words from shortest to longest means we are exploring some of the least likely words first. If instead we used a heuristic like exploring the longest words first, we could potentially avoid looking through all the words, because the first word found starting with 'z' would also be the longest word starting with 'z'.
# explore words from longest to shortest by length
longest_z_word = None
for word in reversed(shortest_to_longest):
    if word[0] == "z":
        longest_z_word = word
        break
print(longest_z_word)
However, if there are no words starting with 'z' in all of Frankenstein, both solutions would have to check all the words. (This is not the case; Frankenstein uses the word "zeal" four times.) While iterating through words from longest to shortest will not lead to performance gains for every book or every letter, in the typical case this change will significantly improve efficiency.
2.3.3) Repeatedly Solving Subproblems
- Definition : Re-exploring parts of the problem space.
Consider the problem of finding all the pairwise sums of numbers in a list called nums:
all_pairwise_sums = set()
for num1 in nums:
for num2 in nums:
all_pairwise_sums.add(num1 + num2)
This code is doing about twice as much work as it has to, because addition is commutative.
Suppose that nums = [1, 2, 4, 7, 13]. Then, the code above will add both 2 + 13 and 13 + 2 to all_pairwise_sums. By starting the inner loop at the index of the outer loop, this duplicate computation can be eliminated:
all_pairwise_sums = set()
for i in range(len(nums)):
for j in range(i, len(nums)):
all_pairwise_sums.add(nums[i] + nums[j])
However, sometimes we need to keep track of which subproblems we have solved in order to avoid re-solving them. This is the purpose of using a visited set, as discussed in the Flood Fill reading . Later on, the Functional Programming reading discusses the concept of memoization , which is when a function caches the answers to inputs to avoid re-computing the answer when it comes across an input it has seen previously. For the purposes of 6.101, memoization is a useful technique but is not emphasized in the labs.
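As a preview of memoization, here is a minimal sketch using the standard library's functools.lru_cache on a recursive Fibonacci function (a classic illustration, not one of the lab problems):
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    # without the cache, fib(n - 1) and fib(n - 2) would re-solve
    # the same subproblems exponentially many times
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # fast, because each fib(k) is computed only once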
2.3.4) Reimplementing Builtins
- Definition : Implementing an algorithm instead of using a more efficient builtin.
Because CPython, the default Python interpreter, implements builtin functions such as sum and max in the C language (which is much faster), using builtins rather than reimplementing equivalent functionality in Python usually leads to faster code.
The code below finds the maximum value in a list of numbers.
def max_num(nums):
if not nums:
raise ValueError("nums argument is empty")
best = nums[0]
for num in nums:
if num > best:
best = num
return best
The max_num function has nearly identical behavior to the builtin max function. Using the Python time module, the time difference between the builtin and the reimplemented max function can be seen.
from random import randint
import time
nums = [randint(1, 1000) for _ in range(1000)]
def max_num(nums):
if not nums:
raise ValueError("max_num() nums argument is empty")
best = nums[0]
for num in nums:
if num > best:
best = num
return best
start_time = time.time()
for i in range(100_000):
x = max_num(nums)
end_time = time.time()
total_time_reimplemented = end_time - start_time
print(f"{total_time_reimplemented=}")
start_time = time.time()
for i in range(100_000):
x = max(nums)
end_time = time.time()
total_time_builtin = end_time - start_time
print(f"{total_time_builtin=}")
Running the code above produces the following:
total_time_reimplemented=2.5645499229431152
total_time_builtin=0.8306694030761719
Reimplementing builtins can be a great way to learn how to program, but if efficiency is important for the task at hand, using builtins is the way to go.
3) Finding and Fixing Efficiency Bugs
The previous section discussed at length how common efficiency bugs arise and how to fix them. Even with this knowledge, optimizing and speeding up code can be a complex task, sometimes more difficult than getting the code correct in the first place, so try not to feel discouraged when it proves difficult to speed things up. Approaching efficiency bugs in a systematic way can help you find and fix them, especially when you have a large program with many interconnected pieces. The following steps can help make the process of finding and fixing efficiency bugs more systematic:
- Time your code
- Make a list of potential slow functions or blocks of code
- Make a list of efficiency bug hypotheses and possible fixes within these places
- Select your best hypothesis and implement the fix
- Time your code again (and note the difference!)
These steps are explained in greater detail in the following sections.
3.1) Time your code
The first step to timing your code is to determine what exactly you are timing. In the context of 6.101 labs, this is generally a specific test case that is already passing but runs slowly, or a test case that runs seemingly forever (despite passing the tests for simpler inputs).
As discussed in the Command Line reading, running pytest -k TEST_NAME test.py will allow you to select a specific test case to run. Adding the -s flag (i.e., pytest -k TEST_NAME test.py -s) will allow you to see the print statements for a test case as it is running.
Once you have isolated a specific test case, put time.time() and print statements in your code to measure the parts you want to learn about. The Flood Fill reading demonstrates this, and another example is found earlier in this reading.
It is generally a good idea to run the program a few times to get a sense of how long the program actually takes. Remember to write the time down so that you can compare it to later versions of your program to see whether there was any improvement.
In general, the submission server is going to give you more consistent timing data than your computer, but it also has timeouts. If you hit the timeout, you don't know exactly how long your code would have taken to finish running, so do not use the timeout limit as your time. Instead, it may be useful to note how many test cases passed within the time limit, or which test case was running when the timeout occurred. Also, different machines run code faster or slower, so when you time your code, make sure to use the same computer (control that variable in your experiment!).
A note on time.time(): throughout some of the explanations above, we have used Python's time.time() function to measure the runtime of various pieces of code.
While this will be useful as a rough measure of runtime, note that using a single real-time runtime to determine a program's overall performance can be very misleading. This is because the real-time runtime of a piece of code can vary wildly, especially for small programs, as some machines are simply faster than others. And factors beyond your control can cause varied runtimes even on the same machine running the same code repeatedly. (For example, if the operating system is running many programs in the background, the Python program may run more slowly.)
To minimize the impact of this variance, it is a good idea to run the code multiple times and take an average of the trials. This is why, for many of the small examples above, we calculated the runtime by repeating the operation thousands of times. As you make changes to your code, running repeated trials will allow you to determine whether your code's average runtime has substantially decreased or increased.
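A small sketch of this trial-averaging pattern, wrapping the code to be measured in a function (the trial count here is arbitrary):
import time

def average_runtime(func, trials=10):
    # run func repeatedly and return the average elapsed time per trial
    total = 0
    for _ in range(trials):
        start_time = time.time()
        func()
        total += time.time() - start_time
    return total / trials

def work():
    return sum(range(1_000_000))

print(f"average runtime: {average_runtime(work)}")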
There are other reliable ways of evaluating the efficiency of programs that don't involve measuring the code's runtime, but 6.101 does not focus on those methods.
3.2) Make a list of potential slow functions
After timing your code, inspect the program you are running. What functions does the test case call, and do those functions use any helper functions? From this information, you can start to guess at what functions or places within the functions are taking the most time.
If you are not certain which function is causing the slowdown, you can collect more time data on individual functions. While pytest will report how much time it took to run the test overall, you may also want to use print statements and the time module, as shown in the previous examples, to get more specific timing details about the functions that you think might be slow.
After you have at least one or two places in mind that may be sources of efficiency bugs, move on to the next step.
3.3) Make a list of efficiency bug hypotheses
For each piece of slow code, make hypotheses for what specific efficiency bugs may be causing the slowdown. Start with superfluous computation and data transformations, because those tend to be easier to spot and fix. If you can rule those out, it may be time to think about your larger algorithm and why it might be suboptimal. For each efficiency bug hypothesis, think about what it would take to fix this bug.
3.4) Select your best hypothesis and implement the fix
Take your best hypothesis (either the bug that is easiest to fix or the one you think will lead to the biggest performance improvement) and implement the fix. It is usually a good idea to save the original code in a separate file, in case you need to go back to it.
3.5) Time your code again (and note the difference)
Finally, determine if the change made things faster or slower by timing your code again on the same tests on the same machine. Note the difference between this time and your original code time.
Big efficiency bug fixes should be apparent on the first run (generally, improvements of at least 1 second or at least 10%). Small improvements may require a few measurements to confirm that they truly made a difference.
If the code turns out to be slower instead (which happens sometimes!), then revert to the previous version and go back to selecting a new hypothesis.
If the code still passes the test and is faster (hooray!), determine whether you need to continue making the code more efficient by submitting to the server. If the code still needs to run faster, go back to collecting more time data and selecting a different function or a different hypothesis. Note that the same piece of code may contain multiple different kinds of efficiency bugs!
4) Conclusion
This reading has described some of the most common efficiency bugs that 6.101 course staff have observed in student code, across three general categories: superfluous computation, suboptimal data structure design, and suboptimal algorithm design. Being aware of these categories will help you systematically time, identify, and fix potential sources of efficiency bugs, both in 6.101 and in the programs you work with in the future.