Sequences are an abstract type in Python that represents ordered collections of elements, e.g., strings, lists, and range objects.
Today we will focus on strings, which are ordered sequences of individual characters (each character is also of type str).
word = "Hello"
In everyday language, 'H' is the first character of word, 'e' is the second character, and so on.
In Python, indexing starts at zero: 'H' is the zeroth character of word, 'e' is the first character, and so on.
We can access each character of a string using indices in Python.
[] operator
word = 'Williams'
word[0] # character at 0th index?
'W'
word[3] # character at 3rd index?
'l'
word[7] # character at 7th index?
's'
word[8] # will this work?
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 word[8] # will this work?

IndexError: string index out of range
len() function. Python has a built-in len() function that computes the length of a sequence such as a string (or a list, which we will see in the next lecture).
Thus, a string word has (non-negative) indices 0, 1, 2, ..., len(word)-1.
len("Williams")
8
len("pneumonoultramicroscopicsilicovolcanoconiosis") # longest word in English
45
Python also allows negative indices, starting at -1, which is a handy way to refer to the last element of a non-empty sequence (regardless of its length).
Thus, a string word has (negative) indices -1, -2, ..., -len(word).
place = "Williamstown"
place[-1]
'n'
len(place)
12
place[-12]
'W'
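The two indexing schemes line up: for a non-empty string, index -k refers to the same character as index len(word) - k. A minimal sketch of that relationship, using only the place string from above (the variable names n, last_positive, and so on are illustrative, not from the lecture):

```python
place = "Williamstown"
n = len(place)  # 12

last_positive = place[n - 1]   # last character via a non-negative index
last_negative = place[-1]      # same character via a negative index

first_positive = place[0]      # first character via a non-negative index
first_negative = place[-n]     # same character via a negative index

print(last_positive == last_negative)    # True
print(first_positive == first_negative)  # True
print(place[-3] == place[n - 3])         # True
```

Going one step further in either direction, place[n] or place[-n - 1], raises an IndexError.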
is_vowel function
Let us write a function is_vowel that takes a character as input and returns whether or not it is a vowel.
We can check if a letter is a vowel by comparing it against every possible upper or lower case vowel. We will learn a better way to write this function soon, but this works for now!
def is_vowel(char):
    """Takes a char (str) and returns True if char is a vowel, otherwise False."""
    l_case = char == 'a' or char == 'e' or char == 'i' or char == 'o' or char == 'u'
    u_case = char == 'A' or char == 'E' or char == 'I' or char == 'O' or char == 'U'
    return l_case or u_case
is_vowel('A')
True
is_vowel('z')
False
is_vowel('u')
True
Problem. Write a function count_vowels that takes a string word as input, counts and returns the number of vowels in the string.
def count_vowels(word):
    '''Returns number of vowels in the word'''
    pass
Expected behavior:
>>> count_vowels('Williamstown')
4
>>> count_vowels('Ephelia')
4
>>> count_vowels('rhythm')
0
Re-using functions. Since we have defined is_vowel, we can use it to test individual characters of the string, rather than starting from scratch.
What do we need to do to solve this problem?
We need to examine each character of the string and keep a counter (for all vowels seen so far).
Suppose we manually check each character of the string and update a counter if it is a vowel.
word = 'Williams'
counter = 0
if is_vowel(word[0]):
    counter += 1
if is_vowel(word[1]):
    counter += 1
if is_vowel(word[2]):
    counter += 1
if is_vowel(word[3]):
    counter += 1
if is_vowel(word[4]):
    counter += 1
if is_vowel(word[5]):
    counter += 1
if is_vowel(word[6]):
    counter += 1
if is_vowel(word[7]):
    counter += 1
print(counter)
3
Question. How good is this approach? Will it work for any word?
word = 'Williamstown'
counter = 0
if is_vowel(word[0]):
    counter += 1
if is_vowel(word[1]):
    counter += 1
if is_vowel(word[2]):
    counter += 1
if is_vowel(word[3]):
    counter += 1
if is_vowel(word[4]):
    counter += 1
if is_vowel(word[5]):
    counter += 1
if is_vowel(word[6]):
    counter += 1
if is_vowel(word[7]):
    counter += 1
print(counter)
3
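Takeaway. Downsides of this approach are many: the code is repetitive, it only inspects the first 8 characters (so it reports 3 for 'Williamstown', which has 4 vowels), and it raises an IndexError for any word shorter than 8 characters.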
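One downside is easy to see with a short word: hard-coded indices reach past the end of the string. The sketch below reproduces that failure for a three-letter word. The word 'cat' and the variable message are illustrative choices, and try/except is used here only to capture the error text without stopping the program.

```python
word = 'cat'      # only indices 0, 1, 2 are valid
message = None

try:
    word[3]       # a hard-coded check of word[3] reaches past the end
except IndexError as err:
    message = str(err)

print(message)    # string index out of range
```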
for loops
We can "iterate" over the elements of a sequence using a for loop. A loop is a mechanism to repeat the same operations for every element of a sequence.
for loop
for var in seq:
    do something
var above is called the loop variable of the for loop. It takes on the value of each of the elements of the sequence one by one.
# simple example of for loop
word = "Williams"
for char in word:
    print(char)
W
i
l
l
i
a
m
s
count_vowels
Now, we are ready to implement our function that takes a string as input and returns the number of vowels in it.
def count_vowels(word):
    '''Takes a string as input and returns
    the number of vowels in it'''
    count = 0 # initialize the counter
    # iterate over the word one character at a time
    for char in word:
        if is_vowel(char):
            count += 1 # update counter
    return count
count_vowels('Williams')
3
count_vowels('Ephelia')
4
# count_vowels() # give me a word with a lot of vowels
Pythonic looping. Notice that the for loop does not need to know the length of the sequence ahead of time. In an index-based loop, such as the classic counting for loop in C or Java, you use the length of the sequence to know when to stop. In Python, the for loop automatically finishes after the sequence runs out of elements, e.g., when word runs out of characters, even though we have not computed the length manually.
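To make the contrast concrete, here is a sketch that counts vowels two ways: once by walking the indices 0, 1, ..., len(word)-1 with a range object, and once by iterating over the characters directly. The names count_vowels_by_index and count_vowels_direct are hypothetical, and is_vowel is repeated from above only so that the block runs on its own.

```python
def is_vowel(char):
    """Return True if char is a vowel (copy of the helper defined earlier)."""
    l_case = char == 'a' or char == 'e' or char == 'i' or char == 'o' or char == 'u'
    u_case = char == 'A' or char == 'E' or char == 'I' or char == 'O' or char == 'U'
    return l_case or u_case


def count_vowels_by_index(word):
    """Count vowels by visiting indices 0, 1, ..., len(word) - 1."""
    count = 0
    for i in range(len(word)):   # the length is computed explicitly
        if is_vowel(word[i]):
            count += 1
    return count


def count_vowels_direct(word):
    """Count vowels by iterating over the characters themselves."""
    count = 0
    for char in word:            # no length or index needed
        if is_vowel(char):
            count += 1
    return count


print(count_vowels_by_index('Williamstown'))  # 4
print(count_vowels_direct('Williamstown'))    # 4
```

Both versions return the same counts. The direct version has fewer moving parts, which is why it tends to be preferred when the index itself is not needed.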
Tracing the loop. To observe how the variables char and count change state as the loop proceeds, we can add print statements.
def trace_count_vowels(word):
    '''Traces the execution of the count_vowels function'''
    count = 0 # initialize the counter
    for char in word: # iterate over the word one character at a time
        print('char, count: ('+ char + ' , ' + str(count) +')')
        if is_vowel(char):
            print('Incrementing counter')
            count += 1
    return count
trace_count_vowels('Williams')
char, count: (W , 0)
char, count: (i , 0)
Incrementing counter
char, count: (l , 1)
char, count: (l , 1)
char, count: (i , 1)
Incrementing counter
char, count: (a , 2)
Incrementing counter
char, count: (m , 3)
char, count: (s , 3)
3
trace_count_vowels('Queue')
char, count: (Q , 0)
char, count: (u , 0)
Incrementing counter
char, count: (e , 1)
Incrementing counter
char, count: (u , 2)
Incrementing counter
char, count: (e , 3)
Incrementing counter
4
Summary. As you can see, the loop variable char takes the value of every character in the string one by one until the last character. Inside the loop, we check if char is a vowel and if so we increment the counter.
vowel_seq
Define a function vowel_seq that takes a string word as input and returns a string containing all the vowels in word in the same order as they appear.
Example function calls:
>>> vowel_seq("Chicago")
'iao'
>>> vowel_seq("protein")
'oei'
>>> vowel_seq("rhythm")
''
def vowel_seq(word):
    '''Returns the vowel subsequence in given word'''
    vowels = "" # initialize accumulation var
    for let in word:
        if is_vowel(let): # if vowel
            vowels += let # accumulate
    return vowels
vowel_seq("Chicago")
'iao'
vowel_seq("protein")
'oei'
vowel_seq("rhythm")
''
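The counting loop and the accumulating loop follow the same pattern: initialize a variable, visit each character, and update the variable when the character is a vowel. One consequence is that the number of vowels in a word should equal the length of its vowel sequence. The sketch below checks this for a few words. The three functions are repeated from above so that the block runs on its own, and the list named words is an illustrative choice.

```python
def is_vowel(char):
    """Return True if char is a vowel (copy of the helper defined earlier)."""
    l_case = char == 'a' or char == 'e' or char == 'i' or char == 'o' or char == 'u'
    u_case = char == 'A' or char == 'E' or char == 'I' or char == 'O' or char == 'U'
    return l_case or u_case


def count_vowels(word):
    """Accumulate a number: how many vowels have been seen so far."""
    count = 0
    for char in word:
        if is_vowel(char):
            count += 1
    return count


def vowel_seq(word):
    """Accumulate a string: the vowels seen so far, in order."""
    vowels = ""
    for let in word:
        if is_vowel(let):
            vowels += let
    return vowels


words = ["Chicago", "protein", "rhythm", "Williamstown"]
for word in words:
    # The two accumulators should always agree.
    print(word, count_vowels(word), len(vowel_seq(word)))
```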
Recall that sequences are an abstract type in Python that represent ordered collections of elements.
In the last lecture we focused on strings. Today we will discuss lists.
Unlike strings, which are homogeneous sequences of characters, lists can be collections of heterogeneous objects.
# Examples of various lists:
word_list = ['What', 'a', 'beautiful', 'day']
num_list = [1, 5, 8, 9, 15, 27]
char_list = ['a', 'e', 'i', 'o', 'u']
mixed_list = [3.145, 'hello', 13, True] # lists can be heterogeneous
type(num_list)
list
word_list = ['What', 'a', 'beautiful', 'day']
word_list[3]
'day'
word_list[-1]
'day'
len(word_list)
4
# can loop over lists just like we loop over strings
name_list = ["Chels", "Artie", "Pixel", "Linus", "Jerry", "Velma", "Wally"]
for name in name_list:
    print(name)
Chels
Artie
Pixel
Linus
Jerry
Velma
Wally
Here are several operators that apply to any sequence, including lists and strings: indexing with [], the len function to find length, slicing with [:], the in and not in operators, and concatenation with +.
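As a quick illustration, the sketch below applies indexing, len, slicing, and + to one string and one list. The variable names word and nums and their values are illustrative.

```python
word = "Williams"
nums = [1, 5, 8, 9, 15, 27]

# Indexing with []
print(word[0], nums[0])       # W 1

# Length with len
print(len(word), len(nums))   # 8 6

# Slicing with [:] returns a sequence of the same type
print(word[1:4])              # ill
print(nums[1:4])              # [5, 8, 9]

# Concatenation with + (both operands must be the same type)
longer_word = word + "town"
longer_nums = nums + [30]
print(longer_word)            # Williamstown
print(longer_nums)            # [1, 5, 8, 9, 15, 27, 30]
```

Note that + builds a new sequence each time; word and nums themselves are unchanged.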
The in operator tests membership and returns True if and only if an element is in a sequence. On the other hand, the not in operator returns True if and only if a given element is not in the sequence.
Note that it is preferable and more readable to say if el not in seq compared to the (logically equivalent) if not el in seq.
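A small sketch of both operators on a list, using char_list and word_list from the examples above (the result variable names are illustrative):

```python
char_list = ['a', 'e', 'i', 'o', 'u']
word_list = ['What', 'a', 'beautiful', 'day']

has_e = 'e' in char_list              # True: 'e' is an element
has_z = 'z' in char_list              # False
missing_z = 'z' not in char_list      # True
missing_z_alt = not 'z' in char_list  # True: parsed as not ('z' in char_list)

# For lists, in compares against whole elements, not parts of elements.
has_beau = 'beau' in word_list        # False: no element equals 'beau'
has_day = 'day' in word_list          # True

print(has_e, has_z, missing_z, missing_z_alt)  # True False True True
print(has_beau, has_day)                       # False True
```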
Python allows us to extract subsequences of a sequence using the slicing operator [:].
For example, suppose we want to extract the substring Williams from Williamstown. We can use the starting and ending indices of the substring and the slicing operator [:].
place = "Williamstown"
# return the sequence from 0th index up to (not including) 8th
place[0:8]
'Williams'
place[5:7] # what will this return?
'am'
place[4:4] # what will this return?
''
place[1:] # if second index not provided, defaults to len
'illiamstown'
place[:8] # if first index not provided, defaults to 0
'Williams'
place[:] # what will this do?
'Williamstown'
place[8:100] # notice no IndexError
'town'
place[-4:-1] # can also use negative indices to slice
'tow'
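Because the end index is excluded, splitting a sequence at any index k gives two pieces that fit back together exactly. The sketch below checks this, and also shows that out-of-range slice bounds are quietly clamped rather than raising an error. The names k, head, and tail are illustrative.

```python
place = "Williamstown"
k = 8

head = place[:k]   # indices 0 up to (not including) k
tail = place[k:]   # indices k through the end

print(head)                    # Williams
print(tail)                    # town
print(head + tail == place)    # True

# Slice bounds past the end are clamped to the length of the sequence.
print(place[:100] == place)    # True
print(place[20:] == '')        # True
```

The same behavior holds for lists, since slicing is defined for sequences in general.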
The slicing operator [:] optionally takes a third step parameter that determines in what direction to traverse, and whether to skip any elements while traversing and creating the subsequence.
By default the step is set to +1 (which means move left to right in increments of one).
We can pass other step parameters to obtain new sliced sequences; see examples below.
place = "Williamstown"
place[:8:1] # 1 is default
'Williams'
place[:8:2] # go left to right in increments of 2
'Wlim'
place[::2] # can you guess the answer?
'Wlimtw'
The optional parameter does not come up too often, but does provide a nifty way to reverse sequences.
For example, to reverse a string, we can set the optional step parameter to -1.
place[::-1] # reverse the sequence
'nwotsmailliW'
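One common use of the reversing slice is checking whether a sequence reads the same in both directions. A minimal sketch, in which the function name is_palindrome and the test values are illustrative and not part of the lecture:

```python
def is_palindrome(seq):
    """Return True if seq equals its own reversal (hypothetical helper)."""
    return seq == seq[::-1]


print(is_palindrome("racecar"))     # True
print(is_palindrome("Williams"))    # False
print(is_palindrome([1, 2, 1]))     # True, the slice works on lists too
print([1, 2, 3][::-1])              # [3, 2, 1]
```

The comparison is case-sensitive, so "Racecar" would not count as a palindrome under this definition.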
in operator
The in operator in Python returns a True/False value. For strings, it tests more than single-character membership: it tests whether one string is a substring of another.
A substring is a contiguous sequence of characters within a string, e.g. Williams is a substring of Williamstown.
'Williams' in 'Williamstown'
True
'W' in 'Williams'
True
'w' in 'Williams' # capitalization matters
False
'liam' in 'WiLLiams' # will this work?
False
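The in operator also gives a shorter way to write the vowel check from earlier. The sketch below is one possible version, not necessarily the one the lecture has in mind. The name is_vowel_in is hypothetical, and the length check is an added assumption: because in tests for substrings, inputs such as 'ae' or the empty string would otherwise be reported as vowels.

```python
def is_vowel_in(char):
    """Return True if char is a single vowel character (hypothetical variant)."""
    # Without the length check, 'ae' in 'aeiouAEIOU' would also be True.
    return len(char) == 1 and char in 'aeiouAEIOU'


print(is_vowel_in('A'))    # True
print(is_vowel_in('z'))    # False
print(is_vowel_in('ae'))   # False

# Lowercasing first is one way to make a substring test ignore case.
print('liam' in 'WiLLiams'.lower())   # True
```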