Efficiency
1) Introduction
What makes for a good program? One of the first things that people may think of is program correctness. In most programming courses, style and readability are also emphasized. Correct and readable programs are not very useful, however, if they do not run in a reasonable amount of time. This is where program efficiency comes into play.
Print-statement debugging and thoughtful test cases can help programmers fix correctness issues, and tools like Pylint and Black can find and fix many style issues. Efficiency bugs, pieces of code that cause extra work or memory use and slow a program down without affecting correctness, can be more difficult to pinpoint. In 6.101, efficiency bugs often show up as a timed-out test case on the submission server, but may also appear as a program that takes a long time to run on your own machine -- or even never stops running!
Efficiency bugs may be difficult to locate, but learning about common efficiency bugs is a good first step towards finding and fixing them. And, just like program correctness bugs and style issues, efficiency bugs become easier to find and fix with practice.
2) Categorizing Efficiency Bugs
What constitutes an efficiency bug is often dependent on the characteristics of the current problem. For example, many small problems do not require a highly optimized solution; it is fine to sort through a small list of numbers with a slow selection sort, as long as the list will always be small. Efficiency becomes more important as the size or frequency of the problem increases. The efficiency of sorting algorithms used by search engines has important implications for global energy usage, for example.
When approaching a new problem, it can be helpful to focus initially on correctness for small, simple cases first, because these programs are usually easier to implement and debug. It takes time and effort to optimize a program, so it's best to start with a simple, working version, and then optimize it to make it more efficient as needed.
The root causes of efficiency bugs are often similar across problems. For the purposes of 6.101, this reading focuses on time-related efficiency bugs, across three general categories:
- Superfluous Computation : Code that can be deleted, or computation that can be repeated fewer times, without affecting correct functionality.
- Suboptimal Data Structure Design : Creating or using a data structure that has slow operations, when a more efficient data structure exists.
- Suboptimal Algorithm Design : Not taking advantage of problem characteristics that would enable a significantly more efficient algorithm.
2.1) Superfluous Computation
Superfluous computations can be fixed with small modifications to code that eliminate unnecessary or duplicate computation. The 6.101 course staff have noticed that the first two bugs in this category — failing to cache and failing to short circuit — are the most common efficiency bugs in student code.
2.1.1) Failing to Cache
- Definition : Not storing or not using data, resulting in repeated computation.
The term caching means temporarily storing a data value that was expensive to obtain, with the expectation of reusing it later in a program. In the code below, the result of calling slow_call on y is cached in the variable y_result. Notice, however, that the cached value stored in y_result is never used -- instead, slow_call(y) is just called repeatedly.
y_result = slow_call(y)
if slow_call(x) < 0:
foo(slow_call(x), slow_call(y))
else:
bar(slow_call(x), slow_call(y))
The following code is an improvement, using the cached value stored in y_result rather than recalculating slow_call(y). However, there still remains a failure to cache in the code below.
y_result = slow_call(y)
if slow_call(x) < 0:
foo(slow_call(x), y_result)
else:
bar(slow_call(x), y_result)
The following code caches the result of slow_call(x) in the variable x_result, eliminating the redundant calls to slow_call(x).
y_result = slow_call(y)
x_result = slow_call(x)
if x_result < 0:
foo(x_result, y_result)
else:
bar(x_result, y_result)
Keep an eye out for ways to avoid calling functions repeatedly with the same inputs in your own code!
2.1.2) Failing to Short Circuit
- Definition : Continuing a computation after the answer has already been determined, or ordering computation in a way that prevents exiting early when it is possible to do so.
The following function determines if a given list of words from a book has any spelling errors.
def book_has_error(book_words):
has_error = False
for word in book_words:
if spelling_error(word):
has_error = True
return has_error
If there is an error in a word early in the book, this function will still keep running until the end of the book, long after the return value of the function has been determined. This is a failure to short circuit. One way to fix the code could be to add a break statement after the return value is determined, to prevent the continued execution of the loop.
def book_has_error(book_words):
has_error = False
for word in book_words:
if spelling_error(word):
has_error = True
break
return has_error
Another way to fix the efficiency bug is by returning as soon as spelling_error(word) evaluates to True.
def book_has_error(book_words):
for word in book_words:
if spelling_error(word):
return True
return False
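As an aside, Python's builtin any function short-circuits in exactly this way: it stops consuming values as soon as it finds a truthy one. Here is a minimal sketch of an even more compact fix, assuming the same spelling_error helper as above:
def book_has_error(book_words):
    # any() returns True as soon as spelling_error(word) is truthy,
    # so it short-circuits just like the explicit loop above
    return any(spelling_error(word) for word in book_words)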
The term short circuiting is also used in programming to refer to the short-circuit evaluation of conditional expressions. The Python interpreter stops evaluating boolean expressions as soon as it determines the value of the expression.
What happens when you execute the following block of code?
if not (1 == 2 and ("5" + 5) == 10):
print("I am printed")
This program outputs:
I am printed
This may be surprising because evaluating "5" + 5 would normally cause a TypeError. However, because 1 == 2 evaluates to False, and False and something_else will always evaluate to False, the Python interpreter short-circuits and does not evaluate the rest of the conditional statement. Because "5" + 5 is never executed, the TypeError never occurs!
Ordering the boolean-valued functions in a boolean expression strategically can often lead to program speedups. Which of the following code blocks do you think will generally run faster on random integer inputs?
def even_with_n_prime_factors(x, n):
return len(get_prime_factors(x)) == n and x % 2 == 0
def even_with_n_prime_factors(x, n):
return x % 2 == 0 and len(get_prime_factors(x)) == n
The second implementation will generally run faster. Whenever x is odd, x % 2 == 0 evaluates to False, the and expression short-circuits, and the get_prime_factors function will not be called. In contrast, the first implementation will call get_prime_factors on every integer.
2.1.3) Executing Dead Code
- Definition : Including code that can be removed with very slight changes, when removing the code would increase efficiency without modifying correct functionality.
Strategically placed print statements are a great debugging tool. However, when print statements are left in debugged code, they can cause significant code slowdown.
How much more time do you think it will take to run the second code block than the first?
total = 0
for i in range(1_000_000):
total += i
total = 0
for i in range(1_000_000):
total += i
print(i, total)
To find out, we can wrap each block with timing code from the time module. Here is the first block:
import time
start_time = time.time()
total = 0
for i in range(1_000_000):
total += i
end_time = time.time()
total_time = end_time - start_time
print(f"{total_time=}")
Running the code above produces the output total_time=0.09513139724731445, which means the first block ran in under 0.1 seconds.
start_time = time.time()
total = 0
for i in range(1_000_000):
total += i
print(i, total)
end_time = time.time()
total_time = end_time - start_time
print(f"{total_time=}")
This code, however, prints out total_time=90.36704635620117. While the total_time you get on your computer may differ, it is safe to say that adding the print statement makes the code nearly a thousand times slower!
Not all instances of dead code are as straightforward to find as stray print statements. For example, if you look closely at the get_nested_items function below, you will notice that the inner helper function does a lot of duplicate work.
def get_nested_items(nested_list):
"""
Given a nested list of numbers, return a set of all the numbers that it contains.
input: [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
output: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
"""
items = set()
def helper(nested_list):
for item in nested_list:
if isinstance(item, list):
# Recurse on sublist
helper(item)
# Add items of sublist
for sub_item in item:
if isinstance(sub_item, list):
helper(sub_item)
else:
items.add(sub_item)
else:
items.add(item)
helper(nested_list)
    return items
The second version of the function, shown below, produces an identical result:
def get_nested_items(nested_list):
"""
Given a nested list of numbers, return a set of all the numbers that it contains.
input: [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
output: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
"""
items = set()
def helper(nested_list):
for item in nested_list:
if isinstance(item, list):
# Recurse on sublist.
# After this, there's no need to iterate!
helper(item)
else:
items.add(item)
helper(nested_list)
    return items
How much extra work does the first version of get_nested_items perform? Changing items from a set to a list makes it clear how much extra work the initial code does.
def get_nested_items(nested_list):
"""
input: [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
(expected) output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
"""
items = []
def helper(nested_list):
for item in nested_list:
if isinstance(item, list):
helper(item)
for sub_item in item:
if isinstance(sub_item, list):
helper(sub_item)
else:
items.append(sub_item)
else:
items.append(item)
helper(nested_list)
    return items

test_input = [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
print(get_nested_items(test_input))
def get_nested_items(nested_list):
"""
input: [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
output: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
"""
items = []
def helper(nested_list):
for item in nested_list:
if isinstance(item, list):
helper(item)
else:
items.append(item)
helper(nested_list)
    return items

test_input = [1, 2, [3, 4, [5, 6, 7]], 8, [9, 10]]
print(get_nested_items(test_input))
The first code block outputs [1, 2, 3, 4, 5, 6, 7, 5, 6, 7, 3, 4, 5, 6, 7, 8, 9, 10, 9, 10], whereas the second version outputs [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], as desired.
It is often tempting when writing a recursive function to include an extra layer of explicit computation beyond the necessary base cases and recursive cases, but this can lead to the creation of dead code.
2.1.4) Extra Copying
- Definition : Creating shallow or deep copies of data that either will not be modified, or that could safely be modified without copying.
Unexpected modification of a mutable object (called an aliasing bug) is a common source of correctness bugs. For example, in the following function, modifying a list while iterating through it results in an infinite loop:
def duplicate_even_elts(my_list):
"""
Modify my_list to create adjacent duplicates of even numbers.
input: [1, 2, 3, 4, 5, 6]
(expected) output: [1, 2, 2, 3, 4, 4, 5, 6, 6]
"""
index = 0
for elt in my_list:
if elt % 2 == 0:
my_list.insert(index, elt)
index += 2
else:
index += 1
return my_list
This issue can be quickly fixed by making a copy of my_list to iterate through, as shown in the code block below (which produces the correct output):
def duplicate_even_elts(my_list):
"""
Modify my_list to create adjacent duplicates of even numbers.
input: [1, 2, 3, 4, 5, 6]
output: [1, 2, 2, 3, 4, 4, 5, 6, 6]
"""
index = 0
for elt in my_list.copy():
if elt % 2 == 0:
my_list.insert(index, elt)
index += 2
else:
index += 1
return my_list
How can you fix duplicate_even_elts without creating a copy of my_list? This is one example where a while loop is useful!
def duplicate_even_elts(my_list):
"""
Modify my_list to create adjacent duplicates of even numbers.
input: [1, 2, 3, 4, 5, 6]
output: [1, 2, 2, 3, 4, 4, 5, 6, 6]
"""
index = 0
while index < len(my_list):
elt = my_list[index]
if elt % 2 == 0:
my_list.insert(index, elt)
index += 2
else:
index += 1
return my_list
Because aliasing bugs are so common, many programmers have a tendency to create copies of data structures even when there is no risk of unwanted mutation. In the case below, the programmer copies the list before iterating over it, which results in unnecessary computation:
def list_special_sum(my_list):
"""
Sums list values, doubling the values of even elements
input: [1, 2, 3, 4, 5, 6]
output: 33
"""
total = 0
for elt in my_list.copy(): # unnecessary copy -- we can use my_list without copying
if elt % 2 == 0:
total += 2 * elt
else:
total += elt
return total
Other instances of extra copying are less explicit. Can you spot the extra copying in the code below?
def make_full_names(name_tuples):
"""
name_tuples is a list of tuples in the form of
[("Kevin", "Bacon"), ("Tim, "Beaver"), ...]
Returns a list of strings in the form of ["Kevin Bacon", "Tim Beaver", ... ]
"""
full_names = []
for first, last in name_tuples:
full_names = full_names + [str(first) + " " + str(last)]
return full_names
In this case, first and last are already of type str, so calling str() results in the creation of extra copies of first and last. This extra-copying bug is fixed in the following code:
def make_full_names(name_tuples):
"""
name_tuples is a list of tuples in the form of
[("Kevin", "Bacon"), ("Tim, "Beaver"), ...]
Returns a list of strings in the form of ["Kevin Bacon", "Tim Beaver", ... ]
"""
full_names = []
for first, last in name_tuples:
full_names = full_names + [first + " " + last]
return full_names
However, even after removing the extra copying of strings, there is still another instance of extra copying in the code! Another type of extra copying to watch out for results from creating a new list during concatenation, rather than adding values to an existing list through mutation.
The following example illustrates this difference:
def concat_copy(list1, list2):
list1 = list1 + list2
return list1
def concat_mutate(list1, list2):
list1 += list2
return list1
a = [1, 2, 3]
b = [4, [5]]
print(f"output: {a=} {b=}")
# output: a=[1, 2, 3] b=[4, [5]]
copy_result = concat_copy(a, b)
print(f"output: {a=} {b=} {copy_result=}")
# output: a=[1, 2, 3] b=[4, [5]] copy_result=[1, 2, 3, 4, [5]]
mutate_result = concat_mutate(a, b)
print(f"{a=} {b=} {mutate_result=}")
# output: a=[1, 2, 3, 4, [5]] b=[4, [5]] mutate_result=[1, 2, 3, 4, [5]]
During list concatenation with list1 = list1 + list2, a new list is created containing a shallow copy of all of the elements in the two lists, and that new list is then stored in list1. If list1 is very long and list2 is short, a lot of extra time will be spent copying the references to all the elements in list1. On the other hand, during list1 += list2, list1 is extended in place to contain references to all of the elements in the second list. More details about these operations can be found in the environment model reading.
What will the code below print after executing the code above?
a[0] -= 1
b[1][0] += 10
print(f"{a=}, {b=}")
print(f"{copy_result=}, {mutate_result=}")
a=[0, 2, 3, 4, [15]], b=[4, [15]]
copy_result=[1, 2, 3, 4, [15]], mutate_result=[0, 2, 3, 4, [15]]
If this is surprising, try drawing an environment diagram!
How would you resolve the last extra copying bug in make_full_names?
def make_full_names(name_tuples):
"""
name_tuples is a list of tuples in the form of
[("Kevin", "Bacon"), ("Tim, "Beaver"), ...]
Returns a list of strings in the form of ["Kevin Bacon", "Tim Beaver", ... ]
"""
full_names = []
for first, last in name_tuples:
full_names += [first + " " + last]
# or full_names.extend([first + " " + last])
# or full_names.append(first + " " + last)
return full_names
2.1.5) Failing to Hoist
- Definition : Performing a computation at every recursive step or loop iteration, when a single computation external to the loop or recursive structure would suffice.
It may be best to demonstrate failing to hoist with an example. In the code below, the is_palindrome function checks the type of the input word during each recursive call, even though whether or not word is a str does not change between calls.
def is_palindrome(word):
if not isinstance(word, str):
raise TypeError("Input must be a str")
if not word:
return True
return word[0] == word[-1] and is_palindrome(word[1:-1])
The repeated type checking can be hoisted out of the recursive structure by using a helper function, as shown below:
def is_palindrome(word):
if not isinstance(word, str):
raise TypeError("Input must be a str")
def helper(word):
if not word:
return True
return word[0] == word[-1] and helper(word[1:-1])
return helper(word)
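The same bug appears in loops. In the hypothetical sketch below (the function names and data are made up for illustration), the call to max(allowed_values) is loop-invariant, so it can be hoisted out of the loop and computed only once:
def count_in_range(nums, allowed_values):
    count = 0
    for num in nums:
        # max(allowed_values) is recomputed on every iteration,
        # even though it never changes
        if num <= max(allowed_values):
            count += 1
    return count

def count_in_range_hoisted(nums, allowed_values):
    limit = max(allowed_values)  # hoisted: computed once, outside the loop
    count = 0
    for num in nums:
        if num <= limit:
            count += 1
    return count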
2.1.6) Extra Loop / Recursive Overhead
- Definition : Implementing multiple steps of an algorithm using similar loops or recursive operations, when those steps could be combined more efficiently into one loop or recursive operation.
The check_name_formats function below checks the first and last names in separate for loops:
def check_name_formats(name_tuples):
"""
name_tuples is a list of tuples in the form of
[("Kevin", "Bacon"), ("Tim, "Beaver"), ...]
Returns whether the first letter of each first and last name
is capitalized and that the remaining letters are lowercase
"""
for first, last in name_tuples:
if not str.isupper(first[0]) or not str.islower(first[1:]):
return False
for first, last in name_tuples:
if not str.isupper(last[0]):
return False
return True
This not only costs more in loop overhead (since the list is iterated twice), but it also means failing to short circuit when all the first names are properly formatted but an early last name is not.
The code below improves on the previous version by combining the two separate loops for the first and last names into a single for loop.
def check_name_formats(name_tuples):
"""
    name_tuples is a list of tuples in the form of
    [("Kevin", "Bacon"), ("Tim", "Beaver"), ...]
    Returns whether the first letter of each first and last name
    is capitalized and the remaining letters are lowercase
"""
for first, last in name_tuples:
if not str.isupper(first[0]) or not str.islower(first[1:]):
return False
if not str.isupper(last[0]):
return False
return True
2.2) Suboptimal Data Structure Design
Optimally structuring data often depends both on the information that is being stored and the tasks that it is being used for. Designing efficient data structures requires careful planning.
2.2.1) Slow Data Structure
- Definition : Poor choice of data structure resulting in inefficient operations.
When choosing a data structure for a program, it helps to think about the most common operations the data will be used for. For example, if frequent containment checks will be necessary, a set or dict may be most effective, but if data is going to be added to a collection in order, a list might be most appropriate.
For example, the following function returns the unique words from a collection of words.
def get_unique_words(words):
unique_words = []
for word in words:
if word not in unique_words:
unique_words.append(word)
return unique_words
As discussed in the Flood Fill reading, the convenience of the Python in keyword disguises the slow list containment check operation. Using a set for unique_words leads to faster containment checks, so it is better suited to this problem:
def get_unique_words(words):
unique_words = set()
for word in words:
unique_words.add(word)
return unique_words
Conveniently, the get_unique_words function can be written much more simply as follows:
def get_unique_words(words):
return set(words)
The 6.101 course staff have noticed that using a list or tuple where a set would lead to more efficient operations is one of the most common causes of slow data structure efficiency bugs.
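To see the difference for yourself, you can time the containment checks directly, in the style of the timing examples above (the sizes here are arbitrary, and the exact times will vary by machine):
import time

nums_list = list(range(100_000))
nums_set = set(nums_list)

start_time = time.time()
for _ in range(1000):
    found = 99_999 in nums_list  # scans the list element by element
print(f"list containment: {time.time() - start_time}")

start_time = time.time()
for _ in range(1000):
    found = 99_999 in nums_set  # hash lookup, roughly constant time
print(f"set containment: {time.time() - start_time}")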
Sometimes, the provided data isn’t formatted well for solving a given problem. Consider the following code:
color_preferences = [
("Jaeyun", "blue"),
("Paola", "blue"),
("Nayra", "violet"),
("Julia", "green"),
("Mayu", "blue"),
("Jack", "pink"),
("Ximeng", "green"),
("Shreya", "red"),
("Kareem", "blue"),
]
unique_colors = {color for _, color in color_preferences}
for color in unique_colors:
names = [name for name, color_pref in color_preferences if color_pref == color]
print(f"These people like {color}: {names}")
In order to go from the color_preferences list to the names of people with each color preference, the code above loops through color_preferences once for each color. While this does not pose a big problem for a small list, if color_preferences grew to hundreds of thousands or millions of people, it would start to pose a greater issue.
What kind of data structure would allow us to more efficiently display people's color preferences?
color_preferences = [
("Jaeyun", "blue"),
("Paola", "blue"),
("Nayra", "violet"),
("Julia", "green"),
("Mayu", "blue"),
("Jack", "pink"),
("Ximeng", "green"),
("Shreya", "red"),
("Kareem", "blue"),
]
color_preference_dict = {}
for name, color in color_preferences:
if color not in color_preference_dict:
color_preference_dict[color] = []
color_preference_dict[color].append(name)
for color in color_preference_dict:
print(f"These people like {color}: {color_preference_dict[color]}")
Adding a little additional computation to store the color preferences in a dict eliminates all but one iteration through the original color_preferences list.
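As an aside, the standard library's collections.defaultdict expresses the same grouping pattern without the explicit membership check. A minimal sketch, assuming the same color_preferences list as above:
from collections import defaultdict

color_preference_dict = defaultdict(list)
for name, color in color_preferences:
    # a missing key is automatically initialized to an empty list
    color_preference_dict[color].append(name)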
2.2.2) Expensive Data Transformation
- Definition : The amount of time taken to create a data structure is significantly greater than the amount of time saved by using this new structure.
There is a tradeoff between time spent transforming data and the time spent interacting with data. As shown in the previous section, effective data transformations may take some extra initial computation, but can greatly improve the overall efficiency of a program, especially when common operations will be repeated many times. In some cases, however, the time taken to transform data outweighs the efficiency gains from the transformation.
The code below precomputes the positive factors of a million integers. In cases where these factors were used many times, this could be useful; but here, where a human is entering the input manually, precomputing a million answers will likely lead to a lot of wasted computation.
def get_positive_factors(x):
factors = set()
for num in range(1, int(x ** 0.5) + 1):
if x % num == 0:
factors.add(num)
factors.add(x // num)
return factors
factor_map = {num: get_positive_factors(num) for num in range(1, 1_000_001)}
while True:
try:
x = int(input("Enter a positive integer between 1 and 1,000,000: "))
except ValueError:
break
if not 1 <= x <= 1_000_000:
print("The integer must be between 1 and 1,000,000 \n")
continue
print(f"The factors of {x} are: {', '.join(str(num) for num in factor_map[x])}\n")
How would you restructure the code to avoid the expensive data transformation?
def get_positive_factors(x):
factors = set()
for num in range(1, int(x ** 0.5) + 1):
if x % num == 0:
factors.add(num)
factors.add(x // num)
return factors
factor_map = {}
while True:
try:
x = int(input("Enter a positive integer between 1 and 1,000,000: "))
except ValueError:
break
if not 1 <= x <= 1_000_000:
print("The integer must be between 1 and 1,000,000 \n")
continue
if x not in factor_map:
factor_map[x] = get_positive_factors(x)
print(f"The factors of {x} are: {', '.join(str(num) for num in factor_map[x])}\n")
Notice that we are still using factor_map as a cache, to avoid recomputing the factors of a number we've already seen. But getting rid of factor_map and computing the factors each time a number is entered would also work, assuming that a human user isn't likely to ask for the same numbers over and over again.
If you are unsure whether a data transformation is too expensive, you can use time.time() to determine how long the transformation takes and how long the actual operation takes.
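For example, here is a rough sketch of such a comparison, reusing the get_positive_factors function defined above (exact times will vary from machine to machine):
import time

start_time = time.time()
factor_map = {num: get_positive_factors(num) for num in range(1, 1_000_001)}
print(f"time to precompute all factors: {time.time() - start_time}")

start_time = time.time()
factors = get_positive_factors(360)
print(f"time for one direct computation: {time.time() - start_time}")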
2.3) Suboptimal Algorithm Design
Different solutions to the same problem may be appropriate for different contexts. As mentioned in the introduction, a slow sorting algorithm may work well for small cases, but more efficient sorting algorithms become necessary as the data size scales up. Whereas superfluous computations tend to be low-level efficiency bugs that can be solved with small adjustments to existing code, suboptimal algorithm design may require making more substantial changes to the code as a result of changing the higher-level algorithm.
The first three subcategories of suboptimal algorithm design can be fixed with algorithm optimizations or more effective use of problem constraints. The last subcategory is more language specific: reimplementing builtins reduces performance for Python programs and may result in slowdowns in other languages as well, depending on their implementation.
2.3.1) Solving Irrelevant Subproblems
- Definition : A brute-force approach that solves unnecessary subproblems before checking constraints.
Consider the get_valid_moves function below, which finds the valid_moves from a current_loc while avoiding obstacles.
def get_all_locations(board):
for r in range(board["rows"]):
for c in range(board["cols"]):
yield (r, c)
def get_valid_moves(board, current_loc):
"""
Gets valid_moves from current_loc
A valid move must be in bounds and adjacent horizontally
or vertically to the current_loc and not be on an obstacle
inputs:
board = {
"rows" : 5,
"cols" : 6,
"obstacles" : {(0, 0), (3, 0), (2, 5)}, # (row, col)
}
current_loc = (1, 5)
output: {(1, 4), (0, 5)}
"""
valid_moves = set()
for r, c in get_all_locations(board):
if (r, c) in board["obstacles"]:
continue
# move must be directly above or to the side of current_loc
distance = abs(r - current_loc[0]) + abs(c - current_loc[1])
if distance == 1:
valid_moves.add((r, c))
return valid_moves
In the code above, generating and looping through all the locations before checking the constraint that a move must be directly adjacent to the current_loc is an example of unnecessary brute force. The moves that are not directly adjacent horizontally or vertically to current_loc do not need to be explored at all, so generating and checking each of these distant locations is irrelevant.
How would you refactor get_valid_moves? The approach shown here uses the constraint that valid_moves must be directly adjacent to the current_loc in order to explore only the viable neighboring moves.
def get_valid_moves(board, current_loc):
valid_moves = set()
for row_delta, col_delta in ((1, 0), (-1, 0), (0, 1), (0, -1)):
r = current_loc[0] + row_delta
c = current_loc[1] + col_delta
        if not (0 <= r < board["rows"]):
            continue
        if not (0 <= c < board["cols"]):
            continue
        move = (r, c)
if move in board["obstacles"]:
continue
valid_moves.add(move)
return valid_moves
However, the utility of writing brute force approaches should not be fully discounted. It is often helpful to write a correct solution to a problem (even if it is slow) before figuring out a more efficient one. Sometimes, this means writing a preliminary brute force version and then applying problem constraints to produce a faster version later.
2.3.2) Misordering Subproblems
- Definition : Failing to explore subproblems according to a valid heuristic, when doing so would lead to performance gains.
Typically, a heuristic technique involves a ranking of items to explore, such that items which are more likely to lead to a solution are ranked higher, and explored first. For example, the popular word game Wordle involves guessing a five-letter word in as few guesses as possible. Each time you guess a word, you gain information about which letters are correct, which letters are in the word but in the wrong location, and which letters are incorrect. A good heuristic to help solve the subproblem of unknown letters within the word might involve selecting a word that has distinct letters that haven't been used in previous guesses but are commonly found in five-letter words. For example, a word like "raise" may be a good first guess because it contains five distinct common letters, while "esses" may be less wise because it contains only two distinct letters.
More concretely, in the following example, suppose shortest_to_longest is a list of the words found in Mary Shelley's Frankenstein, sorted in ascending order by length. The goal of the code is to identify the longest word in Frankenstein starting with the letter 'z'.
# explore words from shortest to longest by length
longest_z_word = None
for word in shortest_to_longest:
    if word[0] == "z":
        longest_z_word = word
print(longest_z_word)
This code has the subproblem of finding the words in the book that start with the desired letter "z". However, exploring the words from shortest to longest means we are exploring some of the least likely words first. If instead we used a heuristic like exploring the longest words first, we could potentially avoid looking through all the words, because the first word found starting with 'z' would also be the longest word starting with 'z'.
# explore words from longest to shortest by length
longest_z_word = None
for word in reversed(shortest_to_longest):
    if word[0] == "z":
        longest_z_word = word
        break
print(longest_z_word)
However, if there are no words starting with 'z' in all of Frankenstein, both solutions would have to check all the words. (This is not the case; Frankenstein uses the word "zeal" four times.) While iterating through words from longest to shortest will not lead to performance gains for every book or every letter, in the typical case this change will significantly improve efficiency.
2.3.3) Repeatedly Solving Subproblems
- Definition : Re-exploring parts of the problem space.
Consider the problem of finding all the pairwise sums of numbers in a list called nums:
all_pairwise_sums = set()
for num1 in nums:
for num2 in nums:
all_pairwise_sums.add(num1 + num2)
This code is doing about twice as much work as it has to, because addition is commutative.
Suppose that nums = [1, 2, 4, 7, 13]. Then, the code above will add both 2 + 13 and 13 + 2 to all_pairwise_sums. By starting the inner loop at the index of the outer loop, this duplicate computation can be eliminated:
all_pairwise_sums = set()
for i in range(len(nums)):
for j in range(i, len(nums)):
all_pairwise_sums.add(nums[i] + nums[j])
However, sometimes we need to keep track of which subproblems we have solved in order to avoid re-solving them. This is the purpose of using a visited set, as discussed in the Flood Fill reading . Later on, the Functional Programming reading discusses the concept of memoization , which is when a function caches the answers to inputs to avoid re-computing the answer when it comes across an input it has seen previously. For the purposes of 6.101, memoization is a useful technique but is not emphasized in the labs.
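As a preview of memoization, here is a minimal sketch using the standard library's functools.lru_cache on a recursive Fibonacci function (a classic illustration, not one of the lab problems):
import functools

@functools.lru_cache(maxsize=None)
def fib(n):
    # without the cache, fib(n - 1) and fib(n - 2) would re-solve
    # the same subproblems exponentially many times
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(100))  # fast, because each fib(k) is computed only once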
2.3.4) Reimplementing Builtins
- Definition : Implementing an algorithm instead of using a more efficient builtin.
Because CPython, the default Python interpreter, implements builtin functions such as sum and max in the C language (which is much faster), using builtins rather than reimplementing equivalent functionality in Python usually leads to faster code.
The code below finds the maximum value in a list of numbers.
def max_num(nums):
if not nums:
raise ValueError("nums argument is empty")
best = nums[0]
for num in nums:
if num > best:
best = num
return best
The max_num function has nearly identical behavior to the builtin max function. Using the Python time module, the time difference between the builtin and the reimplemented max function can be seen.
from random import randint
import time
nums = [randint(1, 1000) for _ in range(1000)]
def max_num(nums):
if not nums:
raise ValueError("max_num() nums argument is empty")
best = nums[0]
for num in nums:
if num > best:
best = num
return best
start_time = time.time()
for i in range(100_000):
x = max_num(nums)
end_time = time.time()
total_time_reimplemented = end_time - start_time
print(f"{total_time_reimplemented=}")
start_time = time.time()
for i in range(100_000):
x = max(nums)
end_time = time.time()
total_time_builtin = end_time - start_time
print(f"{total_time_builtin=}")
Running the code above produces the following:
total_time_reimplemented=2.5645499229431152
total_time_builtin=0.8306694030761719
Reimplementing builtins can be a great way to learn how to program, but if efficiency is important for the task at hand, using builtins is the way to go.
3) Finding and Fixing Efficiency Bugs
The previous section discussed at length how common efficiency bugs arise and how to fix them. Even with this knowledge, optimizing and speeding up code can be a complex task, sometimes more difficult than getting the code correct in the first place, so try not to feel discouraged when it proves difficult to speed things up. Approaching efficiency bugs in a systematic way can help you find and fix them, especially when you have a large program with many interconnected pieces. The following steps can help make the process of finding and fixing efficiency bugs more systematic:
- Time your code
- Make a list of potential slow functions or blocks of code
- Make a list of efficiency bug hypotheses and possible fixes within these places
- Select your best hypothesis and implement the fix
- Time your code again (and note the difference!)
These steps are explained in greater detail in the following sections.
3.1) Time your code
The first step to timing your code is to determine what exactly you are timing. In the context of 6.101 labs, this is generally a specific test case that is already passing but runs slowly, or a test case that runs seemingly forever (despite passing the tests for simpler inputs).
As discussed in the Command Line reading, running pytest -k TEST_NAME test.py will allow you to select a specific test case to run. Adding the -s flag (i.e., pytest -k TEST_NAME test.py -s) will allow you to see the print statements for a test case as it is running.
Once you have isolated a specific test case, put time.time() and print statements in your code to measure the parts you want to learn about. The Flood Fill reading demonstrates this, and another example is found earlier in this reading.
It is generally a good idea to run the program a few times to get a sense of how long the program actually takes. Remember to write the time down so that you can compare it to later versions of your program to see whether there was any improvement.
In general, the submission server is going to give you more consistent timing data than your computer, but it also has timeouts. If you hit the timeout, you don't know exactly how long your code would have taken to finish running, so do not use the timeout limit as your time. Instead, it may be useful to note how many test cases passed within the time limit, or which test case was running when the timeout occurred. Also, different machines run code faster or slower, so when you time your code, make sure to use the same computer (control that variable in your experiment!).
A note on time.time(): throughout some of the explanations above, we have used Python's time.time() function to measure the runtime of various pieces of code.
While this will be useful as a rough measure of runtime, note that using a single real-time runtime to determine a program's overall performance can be very misleading. This is because the real-time runtime of a piece of code can vary wildly, especially for small programs, as some machines are simply faster than others. And factors beyond your control can cause varied runtimes even on the same machine running the same code repeatedly. (For example, if the operating system is running many programs in the background, the Python program may run more slowly.)
To minimize the impact of this variance, it is a good idea to run the code multiple times and take an average of the trials. This is why, for many of the small examples above, we calculated the runtime by repeating the operation thousands of times. As you make changes to your code, running repeated trials will allow you to determine whether your code's average runtime has substantially decreased or increased.
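A small sketch of this trial-averaging pattern, wrapping the code to be measured in a function (the trial count here is arbitrary):
import time

def average_runtime(func, trials=10):
    # run func repeatedly and return the average elapsed time per trial
    total = 0
    for _ in range(trials):
        start_time = time.time()
        func()
        total += time.time() - start_time
    return total / trials

def work():
    return sum(range(1_000_000))

print(f"average runtime: {average_runtime(work)}")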
There are other reliable ways of evaluating the efficiency of programs that don't involve measuring the code's runtime, but 6.101 does not focus on those methods.
3.2) Make a list of potential slow functions
After timing your code, inspect the program you are running. What functions does the test case call, and do those functions use any helper functions? From this information, you can start to guess at what functions or places within the functions are taking the most time.
If you are not certain which function is causing the slowdown, you can collect more time data on individual functions. While pytest will report how much time it took to run the test overall, you may also want to use print statements and the time module, as shown in the previous examples, to get more specific timing details about the functions that you think might be slow.
After you have at least one or two places in mind that may be sources of efficiency bugs, move on to the next step.
3.3) Make a list of efficiency bug hypotheses
For each piece of slow code, make hypotheses for what specific efficiency bugs may be causing the slowdown. Start with superfluous computation and data transformations, because those tend to be easier to spot and fix. If you can rule those out, it may be time to think about your larger algorithm and why it might be suboptimal. For each efficiency bug hypothesis, think about what it would take to fix this bug.
3.4) Select your best hypothesis and implement the fix
Take your best hypothesis (either the bug that is easiest to fix or the one you think will lead to the biggest performance improvement) and implement the fix. It is usually a good idea to save the original code in a separate file, in case you need to go back to it.
3.5) Time your code again (and note the difference)
Finally, determine if the change made things faster or slower by timing your code again on the same tests on the same machine. Note the difference between this time and your original code time.
Big efficiency bug fixes should be apparent on the first run (generally, improvements of at least 1 second or at least 10%). Small improvements may require a few measurements to confirm that they truly made a difference.
If the code turns out to be slower instead (which happens sometimes!), then revert to the previous version and go back to selecting a new hypothesis.
If the code still passes the test and is faster (hooray!), determine whether you need to continue making the code more efficient by submitting to the server. If the code still needs to run faster, go back to collecting more time data and selecting a different function or a different hypothesis. Note that the same piece of code may contain multiple different kinds of efficiency bugs!
4) Conclusion
This reading has described some of the most common efficiency bugs that 6.101 course staff have observed in student code, across three general categories: superfluous computation, suboptimal data structure design, and suboptimal algorithm design. Being aware of these categories will help you systematically time, identify, and fix potential sources of efficiency bugs, both in 6.101 and in the programs you work with in the future.