Sue Evans & David Walser
Press space bar for next page
Whenever we describe a data type, we not only need to give a definition
of what that type is but also the operations that can be done on that
type. The definition may include the range of possible values that type
can hold.
Example: integer
Choosing the right data structure to solve a problem affects how easy the problem is to solve, how readable the code will be, and how efficient the resulting program will be.
You should substitute "algorithms" for "flowcharts" and "data structures" for "tables" in that last quote to use modern terminology.
You may think that a list in Python is the same as an array in C or other languages. They are similar but there are some significant differences.
There are several ways to create new lists:
>>> # assign the literal empty list >>> empty = [] >>> empty [] >>> # use the list constructor >>> empty = list() >>> empty [] >>> # give a literal assignment of items >>> items = ["bread", "milk", "cheese", "cider"] >>> # 0 1 2 3 >>> items ['bread', 'milk', 'cheese', 'cider'] >>> # use the list constructor with a string >>> letters = list("hello") >>> letters ['h', 'e', 'l', 'l', 'o'] >>> # use string.split which returns a list of strings >>> words = "a bunch of words".split() >>> words ['a', 'bunch', 'of', 'words']
Operators:
>>> items[2] 'cheese' >>> items[3] 'cider' >>> items[3] = "lemonade" >>> items[3] 'lemonade'
>>> "milk" in items True >>> "juice" in items False
>>> "two words".split() == ["two", "words"] True >>> ["this", "that"] == ["that", "this"] False
Methods:
Built-in functions that operate on lists:
append(item) - add an item to the end of the list
>>> items ['bread', 'milk', 'cheese', 'lemonade'] >>> items.append('ham') >>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham'] >>> items.append('ham') >>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham'] >>>
count(item) - count occurences of an item in the list
>>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham'] >>> items.count('ham') 2 >>> items.count('cheese') 1 >>>
remove(item) - remove the first occurence of an item in the list
>>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham'] >>> items.remove('ham') >>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham'] >>>
extend(list) - append multiple items to end of the list
>>> items.extend(['lettuce', 'tomatoes']) >>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>>
index(item) - locate the index of an item in the list
>>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>> items.index('milk') 1 >>>
insert(index, item) - insert an item before the one at the index
>>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>> items.insert(2, 'eggs') >>> items ['bread', 'milk', 'eggs', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>>
reverse() - reverse the order of the items in the list
>>> items ['bread', 'milk', 'eggs', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>> items.reverse() >>> items ['tomatoes', 'lettuce', 'ham', 'lemonade', 'cheese', 'eggs', 'milk', 'bread'] >>>
sort() - sort the list
>>> items ['tomatoes', 'lettuce', 'ham', 'lemonade', 'cheese', 'eggs', 'milk', 'bread'] >>> items.sort() >>> items ['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes'] >>>
len(list) - count number of items in the list
>>> items ['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes'] >>> len(items) 8 >>>
del(list[index]) - remove the item at the index from the list
>>> items ['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes'] >>> del(items[4]) >>> items ['bread', 'cheese', 'eggs', 'ham', 'lettuce', 'milk', 'tomatoes'] >>>
Use a list when order matters.
names = [] name = raw_input("Enter a name (or . to quit): ") while name != ".": names.append(name) name = raw_input("Enter a name (or . to quit): ") names.sort() for name in names: print name
An object that is mutable can be changed in-place. Lists are
mutable.
Example: list[2] = "foo"
>>> list = ['cat', 'dog', 'bird', 'hamster'] >>> list ['cat', 'dog', 'bird', 'hamster'] >>> list[2] = "foo" >>> list ['cat', 'dog', 'foo', 'hamster'] >>>
Strings are immutable (aka, not mutable).
Example: str[3] = 'f' is not supported
>>> str = 'fooooo' >>> str[3] = 'f' Traceback (most recent call last): File "", line 1, in ? TypeError: object does not support item assignment >>>
Lists, and other data structures can be used as return values from functions. They can also be used as arguments. Mutable data structures can be changed by the function.
names.dat
Nicholas, Charles Finin, Tim desJardins, Marie Evans, Sue
Example output:
1 DESJARDINS, MARIE 2 EVANS, SUE 3 FININ, TIM 4 NICHOLAS, CHARLES
How about a top-down design.
import string # printGreeting() explains the program to the user # Inputs: None # Output: None def printGreeting(): print "This program reads in a list of names from a file," print "capitalizes them, sorts them alphabetically, and prints" print "them out in a numbered table." # readNames() reads in the names from a file # and puts them into a list called names # Input: filename, the name of the file to open # Output: names, the list of names def readNames(filename): names = [] file = open(filename, 'r') for line in file: name = line.strip() names.append(name) file.close() return names # capitalizeNames() capitalizes names (changes the list in-place) # Inputs: the list of names # Outputs: None, but the list was modified by this function def capitalizeNames(names): for i in range(len(names)): names[i] = names[i].upper() # printTable() prints the table of numbered names # one per line def printTable(names): length = len(names) for i in range(length): print (i + 1), "\t", names[i] def main(): printGreeting() # get filename from user filename = raw_input("What's the name of the file ? ") # process the names names = readNames(filename) capitalizeNames(names) names.sort() # print the table printTable(names) main()
Definition:
Sets are an unordered collection of items, where duplicates are not allowed.
Items must be immutable (such as numbers or strings).
Operations:
>>> empty = set()
>>> cards = set(["jack", "queen", "king"])
Operators
>>> cards = set(['jack', 'queen', 'king']) >>> cards set(['king', 'queen', 'jack']) >>> 'queen' in cards True
>>> cards2 = set(['queen', 'jack', 'king']) >>> cards2 set(['king', 'queen', 'jack']) >>> cards == cards2 True
Methods
>>> cards set(['king', 'queen', 'jack']) >>> cards.add('ace') >>> cards set(['king', 'queen', 'jack', 'ace']) >>>
>>> cards set(['king', 'queen', 'jack', 'ace']) >>> cards2 = cards.copy() >>> cards2 set(['king', 'queen', 'jack', 'ace']) >>>
>>> cards2 set(['king', 'queen', 'jack', 'ace']) >>> cards2.remove('king') >>> cards2 set(['queen', 'jack', 'ace']) >>>
>>> cards set(['king', 'queen', 'jack', 'ace']) >>> cards2 set(['queen', 'jack', 'ace']) >>> cards.difference(cards2) set(['king']) >>>
>>> cards set(['king', 'queen', 'jack', 'ace']) >>> cards2 set(['queen', 'jack', 'ace']) >>> cards.intersection(cards2) set(['queen', 'jack', 'ace']) >>>
>>> cards set(['king', 'queen', 'jack', 'ace']) >>> cards3 set(['trey', 'deuce']) >>> cards3.add('ace') >>> cards3 set(['trey', 'deuce', 'ace']) >>> allcards = cards.union(cards3) >>> allcards set(['trey', 'king', 'deuce', 'jack', 'ace', 'queen']) >>>
>>> allcards set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce']) >>> cards3 set(['trey', 'deuce', 'ace']) >>> cards3.issubset(allcards) True >>> allcards.issubset(cards3) False >>>
>>> allcards set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce']) >>> cards3 set(['trey', 'deuce', 'ace']) >>> allcards.issuperset(cards3) True >>> cards3.issuperset(allcards) False >>>
>>> cards3 set(['trey', 'deuce', 'ace']) >>> cards3.clear() >>> cards3 set([]) >>>
Built-in functions that operate on sets
>>> allcards set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce']) >>> len(allcards) 6 >>> allcards.clear() >>> len(allcards) 0 >>>
When would you use a set?
When order doesn't matter and you don't want to allow duplicates.
names = set() name = raw_input("Enter a name (or . to quit): ") while name != ".": names.add(name) name = raw_input("Enter a name (or . to quit): ") for name in names: print name
linuxserver1.cs.umbc.edu[128] python nameset.py Enter a name (or . to quit): sue Enter a name (or . to quit): ben Enter a name (or . to quit): dan Enter a name (or . to quit): dawn Enter a name (or . to quit): tess Enter a name (or . to quit): dan Enter a name (or . to quit): dawn Enter a name (or . to quit): . sue ben dan dawn tess linuxserver1.cs.umbc.edu[129]
import string # partAB() prints part A or part B courses the student still needs to take # Inputs: the set of courses needed for this part, needed # the set of courses taken, taken # the part being processed, part # Output: None def partAB(needed, taken, part): print if needed == taken: print 'You have satisfied the Part', part, 'requirements.' else: remaining = needed.difference(taken) print 'Part', part, 'Requirements:' for course in remaining: print "You need to take", course def main(): # set up course sets requiredSet = set(['CMSC 201', 'CMSC 202', 'CMSC 203', 'CMSC 304']) more = set(['CMSC 313', 'CMSC 331', 'CMSC 341', 'CMSC 345', 'CMSC 411']) more2 = set(['CMSC 421', 'CMSC 441']) requiredSet = requiredSet.union(more) requiredSet = requiredSet.union(more2) requiredMath = set(['MATH 151', 'MATH 152', 'MATH 221']) # start userClasses as the empty set userClasses = set() # get the classes the user has taken putting them into a set print "You should have a file that contains the names of the courses" print "you've taken toward the CS major. There should be one course" print "per line." filename = raw_input("Please enter the name of this file : ") file = open(filename, 'r') for line in file: course = line.strip() individual = set() individual.add(course) # removed all the code that counted classes and credits # all used if individual.issubset(some set of courses): userClasses = userClasses.union(individual) file.close() # call functions to produce output for each part partAB(requiredSet, userClasses, 'A') partAB(requiredMath, userClasses, 'B') main()
CMSC 201 MATH 151 CMSC 202 CMSC 203 MATH 152
You should have a file that contains the names of the courses you've taken toward the CS major. There should be one course per line Please enter the name of this file : courses.txt Part A Requirements: You need to take CMSC 411 You need to take CMSC 313 You need to take CMSC 421 You need to take CMSC 341 You need to take CMSC 345 You need to take CMSC 331 You need to take CMSC 441 You need to take CMSC 304 Part B Requirements: You need to take MATH 221
Oh, no!
We have the right answer, but our list of courses is out of order!
So after trying 2 different data structures, strings and sets for HW4, what data structure do you think would be the best for this program ?
Definition:
Associative arrays are tuples of key-value pairs. The keys must be immutable (such as numbers or strings). Values can be any type. They associate one value with each key. Keys aren't in any particular order. Associative arrays are implemented with the Python dictionary type.
Operations:
Associative arrays are built into many languages, but they are called many different things:
Creating new dictionaries:
# use the constructor to make an empty dictionary empty = dict() # use the empty dictionary literal empty = {} # use the constructor with some initial pairings # in this case the keys are strings numbers = dict(one=1, two=2) # use the dictionary literal with some initial pairings capitals = {'MD':'Annapolis', 'ME':'Augusta'}
Operators:
Methods:
Built-in functions that operate on dictionaries:
Operators:
>>> capitals = {'MD':'Annapolis', 'ME':'Augusta', 'MI':'Lansing'} >>> capitals {'ME': 'Augusta', 'MD': 'Annapolis', 'MI': 'Lansing'} >>> capitals['MD'] 'Annapolis' >>> capitals['MI'] 'Lansing'
>>> 'ME' in capitals True >>> 'VA' in capitals False
Methods:
copy() - create a copy of the dictionary
>>> cap2 = capitals.copy() >>> cap2 {'ME': 'Augusta', 'MD': 'Annapolis', 'MI': 'Lansing'}
clear() - remove all entries from the dictionary
>>> cap2.clear() >>> cap2 {}
get(key) - find the value associated with a key
>>> cap = capitals.get('MD') >>> cap 'Annapolis' >>> cap = capitals.get('VA') >>> cap >>>
get(key, default) - use a default value when a key isn't found
>>> cap = capitals.get('MD', 'unknown') >>> cap 'Annapolis' >>> cap = capitals.get('VA', 'unknown') >>> cap 'unknown'
has_key(key) - another way to test for key containment
>>> capitals.has_key('ME') True >>> capitals.has_key('ND') False
keys() - create a list of the keys in the dictionary
>>> capitals.keys() ['ME', 'MD', 'MI']
values() - create a list of the values in the dictionary
>>> capitals.values() ['Augusta', 'Annapolis', 'Lansing']
Built-in functions that operate on dictionaries:
len(dict) - count # of keys in the dictionary
>>> len(capitals) 3
del(dict[key]) - remove a key (and its value) from the dictionary
>>> del(capitals['ME']) >>> capitals {'MD': 'Annapolis', 'MI': 'Lansing'} >>>
Why use an associative array?
When you want to associate keys with values. Keys don't have to be numbers, or if they are, they don't have to be contiguous and starting from zero (like in a list).
import random counts = {} for i in range(1000): # simulates rolling 2 dice a = random.randrange(1, 7) b = random.randrange(1, 7) sum = a + b # gets the old count for that roll or # 0 if that sum hasn't been rolled yet count = counts.get(sum, 0) # writes the new count to an existing entry or # creates a new dictionary entry with a key of # sum and a count of 1 counts[sum] = count + 1 # sort the keys, sums is a List of keys sums = counts.keys() sums.sort() # print table print "Sum\tOccurences" print "-" * 18 for sum in sums: print sum, "\t", counts[sum]
ite207-pc-01.cs.umbc.edu[126] python dice.py Sum Occurences ------------------ 2 38 3 60 4 92 5 104 6 141 7 169 8 132 9 120 10 68 11 43 12 33 ite207-pc-01.cs.umbc.edu[127]
Here's the data file
Plymouth 02360 Bar_Harbor 04609 Mystic 06355 Cape_May 08204 Hyde_Park 12538 Niagara_Falls 14301 Paradise 17562 Rehoboth_Beach 19971 D.C. 20001 UMBC 21250 Key_West 33040 Nashville 37201 Custer 57730 West_Yellowstone 59758 Whitefish 59901 Chicago 60610 St._Louis 63121 Kansas_City 66101 Baton_Rouge 70801 Austin 78701 Boulder 80301 Cody 82414 Ketchum 83340 Moab 84532 Tempe 85281 Taos 87571 Las_Vegas 89110 Big_Sur 93920 Carmel 93923 Monterey 93940 San_Francisco 94110
import string zipcodes = dict() filename = raw_input('Enter the filename of the zipcode file : ') file = open(filename, 'r') for line in file: string.strip(line) place, zip = string.split(line) zipcodes[place] = zip file.close() place = raw_input('Which location (Q to quit) ? ') while place != 'Q': zip = zipcodes.get(place, 'unknown') print zip place = raw_input('Which location ? ')
Let's try it out!
linuxserver1.cs.umbc.edu[149] python zipcodes.py Enter the filename of the zipcode file : zipcodes.txt Which location (Q to quit) ? Plymouth 02360 Which location ? Carmel 93923 Which location ? UMBC 21250 Which location ? Oshkosh unknown Which location ? Niagara_Falls 14301 Which location ? Q linuxserver1.cs.umbc.edu[150]
What data structures would you use for these programs?