Chapter 6 Lists and Sequences
In this module, we will cover using lists in Python, which is a data type representing a sequence of data values (similar to how a string
is a squence of letters, and a range
is a sequence of numbers). A list is a fundamental data type in Python, and key to writing almost all practical programs. Lists are used to store and organize large sets of data (and computer programs usually deal with lots of data). This module cover how to create, access, and utilize lists to automate the processing of larger amounts of data.
Contents
6.2 Lists
A list is a mutable, ordered sequence of values that are all stored in a single variable. For example, you can make a vector names
that contains the strings “Sarah”, “Amit”, and “Zhang”, or a vector one_to_hundred
that stores the numbers from 1 to 100. Each value in a list is refered to as an element of that list; thus our names
list would have 3 elements: "Sarah"
, "Amit"
, and "Zhang"
.
Lists are written as literals inside square brackets ([]
), with each element in the list separated by a comma (,
):
# a list of names
names = ["Sarah", "Amit", "Zhang"]
# a list of numbers (can contain "duplicate" values)
numbers = [1, 2, 2, 3, 5, 8]
# lists can contain different types (including other lists!)
things = ["raindrops", 2, True, [5, 9, 8]]
# lists can be empty (with no elements)
empty = []
- List variables should be named using plurals (name_s_, number_s_, etc.), because lists hold multiple values!
Other sequences (such as strings or ranges) can be converted into lists by using the built-in list()
function:
list("hello") # ['h', 'e', 'l', 'l', 'o']
list(range(1,5)) # [1, 2, 3, 4]
6.2.1 List Indices
We can refer to individual elements in a list by their index, which is the number of their position in the list. Lists are zero-indexed, which means that positions are counted starting at 0
. For example, in the list:
vowels = ['a', 'e', 'i', 'o', 'u']
The 'a'
(the first element) is at index 0, 'e'
(the second element) is at index 2, and so on.
- This also means that the last element can be found at the index
length_of_list - 1
. This pattern is why values likerange(5)
include0
but exclude the last5
.
You can retrieve an element from a list using bracket notation: you refer to the element at a particular index of a list by writing the name of the list, followed by square brackets ([]
) that contain the index of interest:
names = ["Sarah", "Amit", "Zhang"]
# access the element at index 0
name_first = names[0]
print(name_first) # Sarah
# access the element at index 2
name_third = names[2]
print(name_third) # Zhang
# accessing an index not in the list will give an error
name_fourth = names[3] # IndexError!
# negative indices count backwards from the end
name_last = names[-1] # Zhang
name_second_to_last = names[-2] #Amit
The value inside the square brackets can any expression that resolves to an integer, including variables:
names[1+1] # "Amit"
last_index = len(names) - 1 # last index is length of list - 1
names[last_index] # "Zhang"
# Don't forget to subtract one from the length!
names[len(names)] # IndexError!
# Using an index of `-1` is a better solution
It is possible to select multiple, consecutive elements from a list by specifying a slice. A slice is written as the starting and ending indices separated by a colon (:
); the starting index is included and the ending index is excluded. For example:
letters = ['a', 'b', 'c', 'd', 'e', 'f']
# indices 1 through 3 (non-inclusive)
letters[1:3] # ['b', 'c']
# indices 3 to the end (inclusive)
letters[3:] # ['d', 'e', 'f']
# indices up to 3 (non-inclusive)
letters[:3] # ['a', 'b', 'c']
# indices 2 to (2 from end) (non-inclusive)
letters[2:-2] # ['c', 'd'], the `e` is excluded
# all the indices. This produces a new list with the same contents!
letters[:]
An indexed reference to a list element (e.g., names[0]
) is effectively a variable in its own right: you can think of names[0]
, names[1]
, and names[2]
as being equivalent to having variables names_0
, names_1
, names_2
(each of which has its own value). Lists effectively provide a “shortcut” for having lots and lots of variables that are all related; instead we can “collect” those variables into a list!
Because list references are variables, they can be used anywhere that a “normal” variable can be. In particular, this means that variables can be assigned to them, allowing the list to be mutated (changed):
# a list of school supplies
school_supplies = ["Backpack", "Laptop", "Pen"]
# replace "Pen" with "Pencil"
school_supplies[2] = "Pencil"
print(school_supplies) # ['Backpack', 'Laptop', 'Pencil']
# You can only assign values to "variables" that exist in the list!
school_supplies[3] = "Paper" # IndexError!
Just as with variables: if the list index (e.g., my_list[index]
) is on the left side of an assignment, it means the variable (which “slot” in the list). If it is on the right side of an assignment, it means the value (which element is in that slot).
Finally, note that strings are also indexed sequences of characters. Thus you can use bracket notation to refer to individual letters, and many string methods utilize the index in their arguments:
message_str = "Hello world"
message_str[1:5] # "ello"
# find the index of the 'w'
message_str.find('w') # 6
6.3 List Operations and Methods
Lists support a number of different operations and methods (functions):
# Addition (+) to combine lists
['a','b'] + [1,2] # ['a', 'b', 1, 2]
# Multiplication (*) performs multiple additions
[1,2] * 3 # [1, 2, 1, 2, 1, 2]
# A sample list
s = ['a', 'b', 'c', 'd']
# Add value to end of list
s.append('x') # add 'x' at end
# Add value in middle of list
s.insert(2, 'y') # put 'y' at index 2 (everyone else shifts over)
# Remove from end of list ("pop off")
s.pop() # removes and returns the last item (`x`)
# Remove from middle
s.pop(2) # removes and returns item at index 2 (`y`)
# Remove specific value
s.remove('c') # remove the first 'c' in the list; nothing returned
# Remove all elements
s.clear()
- Note that list methods such as
append()
orclear()
usually mutate the existing list value and then returnNone
. In comparison, string methods (such aslower()
orreplace()
) will return a different string value. This is because lists can be changed, but strings cannot (they are immutable).
When comparing lists using a relational operator (e.g., ==
or >
), the operation is applied to the lists member-wise: each element in the first list operand is compared to the element at the same index in the second list operand. If the comparison is True
for each pair of elements, then the expression is True
. In practice, this means that (a) you can use ==
to compare the contents of lists, and (b) a list is “smaller” than another if it’s first item is smaller than the other’s.
But be careful: just because two lists have the same contents (are ==
) does not mean that they are the same list! In particular, two lists can be different objects (values) but still have the same contents. In Python, you can test whether two values are actually the same value (as opposed to having the same content) using the is
operator.
# With strings, literals are shared (because they cannot be mutated)
str_a = "banana" # `a` labels string literal "banana"
str_b = "banana" # `b` labels string literal "banana"
# Both variables label the same (literal) value
str_a is str_b # True
# With lists, each list created is a different object!
list_a = [1,2,3] # `a` labels a new list [1,2,3]
list_b = [1,2,3] # `b` labels a new list [1,2,3]
a == b # True, have same values as contents
a is b # False, are two different objects
list_c = list_a # `c` labels the value that `a` labels
a is c # True, both are the same object
# Modify the list!
list_a[0] = 10
print(list_b[0]) # 1 (a different list)
print(list_c[0]) # 10 (the same list)
Keeping track of whether a new list has been created is particularly important when using lists as arguments to functions. Function arguments are local variables that are assigned the passed value—if this value is a list, then the assigned variable will refer to the same list, and any modifications to the argument will affect the value outside of the function:
# A version of a function that modifies the list
def delete_first(a_list):
a_list.pop(0) # modifies the given value
letters = ['a','b','c']
delete_first(letters) # call function
print(letters) # ['b', 'c'], variable is changed
# A version of a function that does not modify the list
def delete_first(a_list):
a_list = a_list[1:] # create new local variable (replacing old local var)
letters = ['a','b','c']
delete_first(letters) # call function
print(letters) #=> ['a', 'b', 'c'], variable is not changed
6.3.1 Lists and Loops
As with other iterable sequences like strings and files, it is possible to iterate through the contents of a list by using a for loop:
numbers = [3.98, 8, 10.8, 3.27, 5.21]
for element in numbers: # `number` is a better local variable name
print(element)
# This will not let you modify the list, since the "number" variable is local
for number in numbers:
number = round(number, 1)
print(numbers) # [3.98, 8, 10.8, 3.27, 5.21], not rounded
In order to modify the list while iterating through it, you need a way to refer to the element you want to change. Since we refer to elements by their index, a better solution is to instead iteate through the range of indices rather than through the elements themselves:
numbers = [3.98, 8, 10.8, 3.27, 5.21]
for i in range(len(numbers)): # `i` for "index"
print(numbers[i]) # refer to elements by index
# Can now modify the list
for i in range(len(numbers)):
numbers[i] = round(numbers[i], 1) # change value at index to rounded version
print(numbers) # [4.0, 8, 10.8, 3.3, 5.2], rounded!
- Note that this process of applying some change to each element in a list is known as a mapping (each value “maps” or goes to some transformed version of itself). We will discuss mapping more in a later module.
6.4 Nested Lists
As noted at the start of the module, lists elements can be of any data type (and any combination of data types)—including other lists! These “lists of lists” are known as nested lists or 2-dimensional lists (or 3d-list for a “list of lists of lists”, etc). Nested lists are most commonly used to represent information such as tables or matrices.
Nested lists work exactly like normal lists; the elements just happen to themselves be indexable (like strings!):
# a list of different dinners available at a fancy party
# this list has 4 elements, each of which is a list of 3 elements
# the indentation is just for human readability
dinner_options = [
["chicken", "mashed potatoes", "mixed veggies"],
["steak", "seasoned potatoes", "asparagus"],
["fish", "seasoned rice", "green beans"],
["portobello steak", "seasoned rice", "green beans"]
]
len(dinner_options) # 4
fish_option = dinner_options[2] # ["fish", "seasoned rice", "green beans"]
# because fish_option is a list, we can reference its elements by index
print(fish_option[0]) # "fish"
In this example fish_option
is a variable that refers to a list, and thus its elements can be accessed by index using bracket notation. But as with any operator or function, it is also possible to use bracket notation on an anonymous value (e.g., a literal value that has not been assigned to a variable). That is, because dinner_options[2]
is a list, we can use bracket notation refer to an element of that list without assigning it to a variable first:
# Access the 2th element's 0th element
dinner_options[2][0] # "fish"
This “pair of brackets” notation allows you to easily access elements within nested lists. This is particularly useful for 2d-lists that represent tables as a list of “rows” (often data records), each of which is a list of “column cells” (often data features):
# a simple table of values
table = [ ['aa','ab','ac','ad'],
['ba','bb','bc','bd'],
['ca','cb','cc','cd'] ]
row = 1 # cells starting with 'b'
col = 3 # cells ending with 'd'
table[row][col] # "bd", the cell at row/col
Note that we often use nested for loops to iterate through a nested list:
for i in range(len(table)): # go through each row (with index)
for j in range(len(table[i])) # go through each col of that row (with index)
print(table[i][j]) # access ith row, jth column
- We use a
j
for the index of the nested loop, because thei
for “index” was already taken!row
andcol
are also excellent local variable names.
6.5 Tuples
While lists are mutable (changeable) sequences of data, tuples represent immutable sequences of data. These are useful if you want to enforce that a data value won’t be changed, such as for a function argument (or a dictionary key; see module 10). Indeed, many built-in Python functions utilize tuples.
Tuples are written as comma-separated sequences as values. They are often placed inside parentheses for clarity (to help indicate the start and end of the tuple values):
letters_tuple = ('a', 'b', 'c')
print(letters_tuple) # ('a', 'b', 'c')
# also a tuple (without parentheses)
numbers_tuple = 1, 2, 3
print(numbers_tuple) # (1, 2, 3)
# A tuple representing a person's name, age, and whether they are hungry
# Tuple values have _implied_ meanings, which should be explained in comments
hungry_person = ('Ada', 28, True)
# In English, tuples may be named based on the number of elements
triple = (1,2,3)
double = (4,5)
single = (6,) # extra comma indicates is a tuple, not just int `6`
empty = () # an empty expression is a tuple!
type(()) # <class 'tuple'> (type of empty expression)
- It’s important to note that while we often write tuples in parentheses (and they are printed in parentheses), it is the commas that makes a sequence of literals into a tuple. The parentheses act just like like they do in mathematical expressions—they are only necessary to clarify ambiguity in the order-of-operations. You will find that some “idiomatic” expressions using tuples forgo the parentheses, making the syntax look more magical than it is!
Elements in tuples can be accessed by index using bracket notation, just like the elements in lists. However, tuples cannot be modified, so you cannot assign a new value to an index in a tuple:
letter_triple = ('a','b','c')
print(letter_triple[0]) # 'a'
print(letter_triple[1:3]) # ('b','c'), a tuple
letter_triple[0] = 'z' # TypeError!
Tuples can be compared using relational operators just like lists, and have the “member-wise” comparison behavior described above. This makes it easy to order the immutable tuples just like you would order numbers or strings.
Finally, tuples provide one additional useful feature. The Python interpreter uses tuples to perform multiple assignments, where you assign multiple values to multiple variables in a single statement. We have already been able to assign multiple, comma-separated values to a single tuple variable (a process called packing). But Python also supports having a single sequential value (e.g., a tuple) be assigned to multiple, comma-separate variables (a process called unpacking)!
triple = 1, 2, 3 # assign multiple values to single variable (packing)
print(triple) # (1, 2, 3)
x, y, z = (1,2,3) # assign single tuple value to multiple variables (unpacking)
print(x) # 1
print(y) # 2
print(z) # 3
# the VALUE in this statement is evaluated as a tuple, and then is assigned to
# multiple variables!
a, b, c = x, y, z # a=x; b=y; c=z
# the same process can be used to swap values!
a, b = b, a
This is mostly a useful shortcut (syntactical sugar), but is also used by some idiomatic Python constructions.