## New Mutable Collection: Sets

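In Python, a **set** is a **mutable**, **unordered** collection of **unique**, **immutable objects**. (More precisely, every element must be *hashable*: immutable built-in values such as numbers and strings are hashable, while mutable ones such as lists are not.)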

**Syntax.** A nonempty set can be written directly as comma-separated elements enclosed in curly braces. The empty set is written `set()` rather than `{}`, because `{}` means an empty dictionary in Python (a different data structure we'll see shortly).

In [None]:
nums = {42, 17, 8, 57, 23}
flowers = {"tulips", "daffodils", "asters", "daisies"}
empty_set = set() # empty set

In [10]:
# what if we make a set with duplicates?
dup_set = {1, 1, 2, 2, 2, 3, 4, 5, 5, 5}

In [11]:
# what is in dup_set?
dup_set

{1, 2, 3, 4, 5}

In [12]:
# will this work?

l_set = {[1, 2, 3], "hello"}

TypeError: unhashable type: 'list'
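The error occurs because a list is mutable and therefore unhashable, so it cannot be a set element. A minimal sketch of a workaround, assuming we only need to store the values and never change them, is to use an immutable tuple instead (the names `t_set` and `nums_list` are just for illustration):

```python
# A tuple of immutable values is hashable, so it can be a set element
t_set = {(1, 2, 3), "hello"}
print((1, 2, 3) in t_set)        # True

# Converting a list to a tuple lets us store (or look up) its contents
nums_list = [1, 2, 3]
print(tuple(nums_list) in t_set) # True
```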

**Removing duplicates.** Unlike lists, sets cannot have duplicate values, which is why they are a handy way to remove duplicates from sequences.

In [2]:
first_choice = ['a', 'b', 'a', 'a', 'b', 'c']

In [3]:
uniques = set(first_choice)
print(list(uniques))

['a', 'b', 'c']


In [5]:
print(set("aabrakadabra"))

{'a', 'k', 'r', 'b', 'd'}



**Question.** What is a potential downside of this approach compared to the `uniques()` and `candidates()` helper functions we used in Labs 3 and 4?

We lose the original ordering of the items.
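If the original order matters, one option is to combine a list (which keeps order) with a set (which makes "have I seen this?" checks fast). This is a sketch, not the lab's helper; the name `unique_in_order` is hypothetical, and it uses the set method `add()`, which inserts one element in place:

```python
def unique_in_order(items):
    '''Returns a list of the unique values in items, keeping the
    order in which each value first appears.'''
    seen = set()      # values we have already kept
    result = []       # unique values, in first-seen order
    for item in items:
        if item not in seen:
            seen.add(item)        # remember item (mutates seen)
            result.append(item)
    return result

print(unique_in_order(['a', 'b', 'a', 'a', 'b', 'c']))  # ['a', 'b', 'c']
print(unique_in_order("aabrakadabra"))                  # ['a', 'b', 'r', 'k', 'd']
```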

### Membership, Length and Iteration on Sets

Some familiar operators and functions work on sets as well:

- the `in` and `not in` operators test membership in sets, as with lists and other collections
- the `len()` function computes the number of items in a set
- we can iterate over a set in a `for` loop (however, the iteration order is arbitrary)

In [None]:
nums = {42, 17, 8, 57, 23}
flowers = {"tulips", "daffodils", "asters", "daisies"}

In [None]:
print(16 in nums)
print("asters" in flowers)
print("iris" not in flowers)

**Counting Items.** We can use the built-in `len()` function to count the number of items in a set, as with lists and other collections.

In [None]:
print(len(flowers))

**Iterable.** We can also iterate over a set using a `for` loop, as with other collections such as lists.

In [None]:
# iterable 
for f in flowers:
    print(f, end=' ') # end=' ' replaces the newline at end of print with a space

**Note.** The notebook displays a set's contents in sorted order when it can, but sets have no inherent order. Printing a set with `print()` shows its elements in an arbitrary order that may change between runs.
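When a predictable order is needed (for example, to print results consistently), one approach is to pass the set to the built-in `sorted()`, which returns a new sorted *list* and leaves the set itself unchanged:

```python
flowers = {"tulips", "daffodils", "asters", "daisies"}

ordered = sorted(flowers)   # a new list, in alphabetical order
print(ordered)              # ['asters', 'daffodils', 'daisies', 'tulips']
print(type(flowers))        # <class 'set'>: the set is not modified
```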

In [13]:
print(flowers)
print(type(flowers))


**Sets are unordered.** Because sets are unordered, we cannot index into them, and they do not support concatenation with `+`.

In [None]:
# will this work?
flowers[1]

In [None]:
# will this work?
flowers + {"lilies"}
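Both cells raise a `TypeError`. The sketch below catches the two errors to show their messages, then uses the union operator `|` (covered under Set Operations below) to get the effect of "adding" items; the name `more_flowers` is just for illustration:

```python
flowers = {"tulips", "daffodils", "asters", "daisies"}

# Indexing fails: set elements have no positions
try:
    flowers[1]
except TypeError as err:
    print("Indexing:", err)       # 'set' object is not subscriptable

# Concatenation with + is not defined for sets
try:
    flowers + {"lilies"}
except TypeError as err:
    print("Concatenation:", err)

# Union builds a new set containing the extra element
more_flowers = flowers | {"lilies"}
print(len(more_flowers))          # 5
```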

### Creating Sets

We can create a new set:
  - by writing comma-separated values enclosed in `{}`, or
  - by using the built-in `set()` function (similar to `list()`, `str()`, etc.)
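The `set()` function accepts any iterable (a list, a `range`, a string, and so on) and keeps one copy of each item. Because sets are unordered, the sketch below checks results with `==`, which compares sets by their contents regardless of order:

```python
from_list = set([3, 1, 3, 2])   # duplicates are dropped
from_range = set(range(4))      # 0, 1, 2, 3
from_string = set("hello")      # one entry per distinct character

print(from_list == {1, 2, 3})       # True
print(from_range == {0, 1, 2, 3})   # True
print(len(from_string))             # 4 ('l' is stored only once)
```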

We **cannot** define an empty set the way we define an empty list, with an empty pair of brackets: `{}` creates an empty dictionary, not a set.


In [21]:
emp_set = {}
type(emp_set)

dict

In [19]:
emp_set = set()

In [20]:
type(emp_set)

set

### Set Operations
The usual operations you think of in set theory are implemented as follows:

In [22]:
even_nums = {2, 4, 8, 10}
sqs = {2, 4, 9, 16}

**Set Union.** Returns a new set that has all elements that are in either set.

In [23]:
new_set = even_nums | sqs
new_set

{2, 4, 8, 9, 10, 16}

**Set Intersection.**  Returns a new set that has all the elements that are common to both sets.

In [24]:
new_set = even_nums & sqs
new_set

{2, 4}

**Set Difference.**   Returns a new set that has all the elements of the first set that are not in the second set.

In [25]:
new_set = even_nums - sqs
new_set

{8, 10}
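The operators above build *new* sets and leave both operands unchanged. Python's built-in set type also offers equivalent methods, `union()`, `intersection()` and `difference()`, which additionally accept lists and other iterables as arguments. A short sketch checking both points (note that difference depends on the order of the operands):

```python
even_nums = {2, 4, 8, 10}
sqs = {2, 4, 9, 16}

union = even_nums | sqs
common = even_nums & sqs
only_even = even_nums - sqs

# The originals are unchanged by |, & and -
print(even_nums == {2, 4, 8, 10})   # True
print(sqs == {2, 4, 9, 16})         # True

# Method forms give the same results
print(even_nums.union(sqs) == union)             # True
print(even_nums.intersection(sqs) == common)     # True
print(even_nums.difference(sqs) == only_even)    # True

# Difference is not symmetric
print(sqs - even_nums == {9, 16})   # True
```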

**Set Mutators.** _Sets are mutable!_ Placing an assignment operator `=` _after_ a set operator (`|=`, `&=`, `-=`) **mutates** the first set in place, rather than returning a new set.

In [26]:
even_nums = {2, 4, 8, 10}
sqs = {2, 4, 9, 16}

even_nums |= sqs

In [27]:
even_nums # has changed!

{2, 4, 8, 9, 10, 16}

In [32]:
even_nums = {2, 4, 8, 10}
sqs = {2, 4, 9, 16}

even_nums &= sqs
even_nums # has changed

{2, 4}

In [31]:
even_nums = {2, 4, 8, 10}
sqs = {2, 4, 9, 16}

even_nums -= sqs  # in-place difference
even_nums

{8, 10}
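One way to see the difference between `even_nums = even_nums | sqs` and `even_nums |= sqs` is to give the set a second name (an alias). The in-place operator changes the shared object, so the alias sees the change; the plain operator builds a new set and rebinds only `even_nums`. The name `alias` is just for illustration:

```python
sqs = {2, 4, 9, 16}

# In place: both names refer to one set, which is mutated
even_nums = {2, 4, 8, 10}
alias = even_nums
even_nums |= sqs
print(alias is even_nums)   # True: still the same object
print(alias == {2, 4, 8, 9, 10, 16})   # True: alias sees the change

# Not in place: | builds a new set and rebinds even_nums only
even_nums = {2, 4, 8, 10}
alias = even_nums
even_nums = even_nums | sqs
print(alias is even_nums)   # False: two different objects
print(alias == {2, 4, 8, 10})          # True: alias is unchanged
```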

### Example: `get_candidates` using `set`

Let us rewrite the `get_candidates` function from Lab 4 using sets.

In [48]:
def get_candidates_list(ballots):
    '''Takes as input a list of lists of strings ballots and returns a
    list of unique candidate names that appear in ballots.
    '''
    candidate_list = []
    for b in ballots:
        for candidate in b:
            if candidate not in candidate_list:
                candidate_list.append(candidate)
    return candidate_list

In [53]:
from datasets import throwdown_ballots
ballots = throwdown_ballots()

In [54]:
get_candidates_list(ballots)

['Blue Mango',
 'Spring St Market',
 'The Log',
 'Tunnel City Coffee',
 'Spice Root']

In [55]:
def get_candidates_set(ballots):
    '''Takes as input a list of lists of strings ballots and returns a
    set of unique candidate names that appear in ballots.
    '''
    candidate_set = set()
    for b in ballots:
        for candidate in b:
            candidate_set |= {candidate}
    return candidate_set

In [56]:
get_candidates_set(ballots)

{'Blue Mango',
 'Spice Root',
 'Spring St Market',
 'The Log',
 'Tunnel City Coffee'}
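Two other ways to fill the set, sketched below: the set method `add()` inserts a single element in place, and `|= set(b)` merges a whole ballot at once. The function names and `sample_ballots` are hypothetical stand-ins for the course's `throwdown_ballots()` data. If a list is required, `sorted()` turns the resulting set into a sorted list:

```python
def get_candidates_add(ballots):
    '''Returns a set of the unique candidate names in ballots,
    a list of lists of strings.'''
    candidate_set = set()
    for b in ballots:
        for candidate in b:
            candidate_set.add(candidate)   # insert one name in place
    return candidate_set

def get_candidates_union(ballots):
    '''Same result, merging one whole ballot at a time.'''
    candidate_set = set()
    for b in ballots:
        candidate_set |= set(b)   # union with every name on this ballot
    return candidate_set

# Hypothetical ballots for illustration (not the course dataset)
sample_ballots = [['The Log', 'Blue Mango'],
                  ['Blue Mango', 'Spice Root'],
                  ['Spice Root', 'The Log', 'Blue Mango']]

print(get_candidates_add(sample_ballots) == get_candidates_union(sample_ballots))  # True
print(sorted(get_candidates_add(sample_ballots)))  # ['Blue Mango', 'Spice Root', 'The Log']
```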

### Example: `madlibs_replacement` using tuples

Let us rewrite the `madlibs_replacement` function from Lab 3 using tuples.

In [66]:
from datasets import sample_key_list_tuples
from datasets import sample_story_list

In [73]:
story_list = sample_story_list()
key_tuples = sample_key_list_tuples()

In [74]:
story_list

['Mary',
 'had',
 'a',
 '/',
 'an',
 '<adjective1>',
 'lamb',
 '.',
 "It's",
 '<noun1>',
 'was',
 '<adjective2>',
 'as',
 '<noun2>',
 '.']

In [75]:
key_tuples

[('<adjective1>', 'little'),
 ('<noun1>', 'fleece'),
 ('<adjective2>', 'white'),
 ('<noun2>', 'snow')]

In [77]:
def find_madlibs_replacement(string, key_tuples):
    '''Takes a placeholder string and returns the corresponding
    swap value (also a str) from key_tuples'''
    for tup in key_tuples:
        placeholder, word = tup
        if string == placeholder:
            return word
    # if placeholder is not in key_tuples
    return string
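Tuple unpacking can also happen directly in the `for` loop header, which removes the separate `placeholder, word = tup` line. A minimal sketch with the same behavior; the `_v2` suffix is a hypothetical name chosen only to avoid clashing with the function above:

```python
def find_madlibs_replacement_v2(string, key_tuples):
    '''Returns the word paired with placeholder string in key_tuples,
    a list of (placeholder, word) tuples, or string itself if there
    is no match.'''
    for placeholder, word in key_tuples:   # unpack each 2-tuple
        if string == placeholder:
            return word
    return string   # not a known placeholder

key_tuples = [('<adjective1>', 'little'), ('<noun1>', 'fleece'),
              ('<adjective2>', 'white'), ('<noun2>', 'snow')]
print(find_madlibs_replacement_v2('<noun2>', key_tuples))   # snow
print(find_madlibs_replacement_v2('<verb1>', key_tuples))   # <verb1>
```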

In [80]:
solved_story = []
for word in story_list:
    if len(word):
        if word[0] == '<' and word[-1] == '>':
            solved_story.append(find_madlibs_replacement(word, key_tuples))
        else:
            solved_story.append(word)

solved_story

['Mary',
 'had',
 'a',
 '/',
 'an',
 'little',
 'lamb',
 '.',
 "It's",
 'fleece',
 'was',
 'white',
 'as',
 'snow',
 '.']
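The loop above can be packaged as a reusable function. In this sketch the name `solve_madlibs` is hypothetical, the replacement helper is repeated so the block runs on its own, and `' '.join(...)` turns the solved word list back into one string. It tests for placeholders with `str.startswith()` and `str.endswith()`, which are safe on empty strings, so unlike the cell above an empty word would be kept rather than skipped:

```python
def find_madlibs_replacement(string, key_tuples):
    '''Returns the word paired with placeholder string in key_tuples,
    or string itself if there is no match.'''
    for placeholder, word in key_tuples:
        if string == placeholder:
            return word
    return string

def solve_madlibs(story_list, key_tuples):
    '''Returns a new list where every <placeholder> in story_list is
    replaced using key_tuples. story_list itself is not modified.'''
    solved_story = []
    for word in story_list:
        if word.startswith('<') and word.endswith('>'):
            solved_story.append(find_madlibs_replacement(word, key_tuples))
        else:
            solved_story.append(word)
    return solved_story

story_list = ['Mary', 'had', 'a', '/', 'an', '<adjective1>', 'lamb', '.',
              "It's", '<noun1>', 'was', '<adjective2>', 'as', '<noun2>', '.']
key_tuples = [('<adjective1>', 'little'), ('<noun1>', 'fleece'),
              ('<adjective2>', 'white'), ('<noun2>', 'snow')]

print(' '.join(solve_madlibs(story_list, key_tuples)))
# Mary had a / an little lamb . It's fleece was white as snow .
```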