Lecture Notes
Introduction to Programming and Algorithm Design (COP-1000)

Strings

This chapter discusses how to work with the data type string, and how to use functions available in the string module.


A string is a sequence

  • This section introduces the compound data type string (abbreviated str in Python)
  • A compound data type is a data type in which the values are made up of components, or elements, which are themselves values
  • Variables of type str hold a sequence of characters (text)
  • String literals are enclosed in matching single or double quotes:
>>> a = 'Steve'
>>> type(a)
<type 'str'>
>>> b = "Smith"
>>> print a, b
Steve Smith
  • When obtaining a value for a string variable from the keyboard, use the raw_input function:
>>> c = raw_input('Enter a string: ')
Enter a string: 2 eggs + ham
>>> print c
2 eggs + ham
  • Note: input evaluates the expression entered; raw_input returns the text exactly as typed:
>>> d = input("Enter an expression: ")
Enter an expression: 2 + 3
>>> print d
5
>>> d = raw_input("Enter an expression: ")
Enter an expression: 2 + 3
>>> print d
2 + 3
  • In Python programming, you can access the entire string, a single character of the string (indexing), or a substring of the string, that is, any contiguous sequence of characters in the string (slicing)
  • Each character of the string is assigned a number, its index, starting with 0 on the left (positive indexes) and with -1 on the right (negative indexes); for example, here is the string 'Green eggs':
String:    G   r   e   e   n       e   g   g   s
+Index:    0   1   2   3   4   5   6   7   8   9
-Index:  -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
  • To extract a single character from a string (indexing), place its index in square brackets following the string name:
string_name[expression]
>>> d = 'Green eggs'
>>> print d[0], d[4], d[-1]
G n s
  • Strings can be joined (concatenated) with the + operator:
>>> d = 'Green eggs'
>>> print d + ' and ham'
Green eggs and ham
  • Strings can be repeated with the * operator:
>>> e = 'elephant' * 3
>>> print e
elephantelephantelephant

Length

  • The length of a string can be obtained with the built-in len function:
>>> d = 'Green eggs'
>>> e = ' and ham'
>>> f = d + e
>>> print len(d), len(e), len(f), len(d + e)
10 8 18 18
  • Note that the length of a string is one greater than the highest index, since indexes start at 0; put another way, the highest index of a string s is len(s) - 1
  • Two ways to access the last (right-most) character of a string:
     
       print someString[-1]
       print someString[len(someString) - 1]
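
  • As a minimal sketch of the relationship between len and the two indexing schemes, the code below checks that both ways of reaching the last character agree. The helper name last_char_two_ways is made up for illustration, and the code assumes a non-empty string (indexing an empty string raises IndexError). These notes use Python 2; the sketch sticks to syntax that also runs under Python 3.

```python
def last_char_two_ways(some_string):
    """Return the last character, once via a negative index and once via len."""
    via_negative = some_string[-1]
    via_length = some_string[len(some_string) - 1]
    return via_negative, via_length

d = 'Green eggs'
negative, positive = last_char_two_ways(d)
print(negative)   # s
print(positive)   # s

# More generally, d[-k] and d[len(d) - k] name the same character
# for every k from 1 up to len(d).
for k in range(1, len(d) + 1):
    assert d[-k] == d[len(d) - k]
```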

Traversal and the for loop

  • Many computations involve processing a string one character at a time
  • Starting at the beginning of a string and processing each element in turn through to the end is called a traversal (from "traverse", as in "cross over")
  • We can traverse a string with a while loop: see example program twhile.py
  • Python provides for traversing strings with a for loop as well:

        for var in string:
            statement


    The variable var is assigned the value of the first character in string and the body of the loop (statement) is executed; then var is assigned the value of the next character in string, and so on, until all characters have been processed (i.e., the string has been traversed)
  • See example program tfor.py, which uses a for loop to traverse a string
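
  • The two traversal styles can be sketched side by side. The functions below are illustrative only and are not the course files twhile.py or tfor.py; the names reverse_with_while and reverse_with_for are hypothetical. Each one visits the characters from left to right and builds a reversed copy, so the result shows the order in which the characters were visited.

```python
def reverse_with_while(text):
    """Traverse text by index with a while loop, building the reverse."""
    result = ''
    index = 0
    while index < len(text):
        result = text[index] + result   # put each new character in front
        index = index + 1
    return result

def reverse_with_for(text):
    """Traverse text with a for loop; no index variable is needed."""
    result = ''
    for ch in text:
        result = ch + result
    return result

print(reverse_with_while('Green eggs'))   # sgge neerG
print(reverse_with_for('Green eggs'))     # sgge neerG
```

    The for version is shorter because Python supplies each character in turn; the while version is useful when the index itself is needed.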

String slices

  • A substring is any contiguous sequence of characters of a string
  • In Python, a substring is called a slice
  • We specify a slice by using this notation:

        string[ start_index : up_to_index ]

    The slice begins with the character at start_index and ends with the character right before up_to_index; if start_index is omitted, the slice starts at the beginning of the string; if up_to_index is omitted, the slice goes to the end of the string  
  • Here are some examples:

>>> d = 'Green eggs'
>>> print d[1:3], d[:3], d[6:]
re Gre eggs
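
  • A few more slices, as a hedged sketch: negative indexes work in slices too, and splitting a string at any index gives two pieces that join back into the original. The helper split_at is a hypothetical name used only for this illustration.

```python
def split_at(text, index):
    """Return the part of text before index and the part from index onward."""
    return text[:index], text[index:]

d = 'Green eggs'
left, right = split_at(d, 5)
print(left)       # Green
print(right)      # ' eggs' (note the leading space)

print(d[2:8])     # een eg
print(d[-4:])     # eggs   (the last four characters)
print(d[:-5])     # Green  (everything except the last five)
print(d[:])       # Green eggs (a copy of the whole string)

# The two halves always rebuild the original string.
assert left + right == d
```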


Strings are immutable

  • We can access each character or substring of a string by indexing and slicing, but we cannot modify them; strings are immutable
  • However, you can create a new string from pieces of existing strings:

>>> s = "Hello"
>>> print s[0]       # accessing is ok
H
>>> s[0] = "J"       # modifying is not

Traceback (most recent call last):
File "<pyshell#9>", line 1, in -toplevel-
s[0] = "J"
TypeError: object does not support item assignment


>>> newString = "J" + s[1:]
>>> print newString
Jello
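
  • The same idea can be wrapped in a small function. This is a sketch under stated assumptions: replace_char_at is a hypothetical helper, and it assumes index is a valid non-negative position in the string. It builds a new string from two slices and leaves the original untouched.

```python
def replace_char_at(text, index, new_char):
    """Return a copy of text with the character at index replaced.

    Assumes 0 <= index < len(text); negative indexes are not handled.
    """
    return text[:index] + new_char + text[index + 1:]

s = "Hello"
try:
    s[0] = "J"                  # not allowed: strings are immutable
except TypeError:
    print("cannot modify a string in place")

new_string = replace_char_at(s, 0, "J")
print(new_string)               # Jello
print(s)                        # Hello (the original is unchanged)
```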


A find function

  • This section describes how to construct a function find() that accepts a string and a character as parameters; it searches the string for the character and, if found, returns the index at which the character occurs
  • See the program findch.py that implements this function
    • Note the return statement inside the loop, which exits the function as soon as the character is found (a "Eureka!" traversal)
    • The final return statement is executed only if the traversal completes without finding the character in the string
  • Example program findch2.py starts the search at a specified index, not just 0, which makes it more general, allowing the function to be used to find multiple occurrences
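
  • A minimal sketch of such a function follows. It is not the course file findch.py or findch2.py, which are not reproduced here. Two choices are assumptions made for this illustration: the function returns -1 when the character is not found (mirroring the built-in str.find method), and the optional start parameter defaults to 0.

```python
def find(text, ch, start=0):
    """Return the index of the first ch in text at or after start, else -1."""
    index = start
    while index < len(text):
        if text[index] == ch:
            return index        # "Eureka!": stop as soon as ch is found
        index = index + 1
    return -1                   # reached only if ch never matched

d = 'Green eggs'
print(find(d, 'e'))             # 2
print(find(d, 'e', 4))          # 6
print(find(d, 'z'))             # -1

# Restarting the search just past each hit finds every occurrence.
position = find(d, 'e')
while position != -1:
    print(position)             # prints 2, then 3, then 6
    position = find(d, 'e', position + 1)
```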

Looping and counting

  • Traversals are often used to count the number of occurrences of something
  • Any time counting is to be implemented you need:
    • An integer variable that will serve as the counter
    • An initialization to set the counter to a starting value (usually 0)
    • An incrementing statement that will add 1 to the counter when the condition of interest occurs
  • See the example program countch.py that demonstrates the use of a counter
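
  • The three ingredients above can be sketched as follows. This is an illustration, not the course file countch.py; the function name count_char is hypothetical.

```python
def count_char(text, ch):
    """Return how many times the character ch occurs in text."""
    count = 0                   # the counter, initialized to 0
    for current in text:
        if current == ch:       # the condition of interest
            count = count + 1   # the incrementing statement
    return count

print(count_char('banana', 'a'))       # 3
print(count_char('Green eggs', 'e'))   # 3
print(count_char('Green eggs', 'z'))   # 0
```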

String methods

  • A string is really an object: it has methods associated with it that can be used to manipulate the string
  • A method is similar to a function, but it uses different syntax:
     
       new_word = upper(word)         # function syntax
       new_word = word.upper()        # method syntax
     
  • We call a function; we invoke a method
  • Here are some of the most useful string methods:
Method ( s.x() )    Meaning
capitalize()        Copy of s with only the first character capitalized
center(w)           Center s in a field w characters wide
count(sub)          Count the number of occurrences of substring sub in s
find(sub)           Find the first position where sub occurs in s
ljust(w)            Like center, but s is left-justified
lower()             Copy of s with all lowercase characters
lstrip()            Copy of s with leading whitespace removed
replace(old,new)    Replace all occurrences of 'old' in s with 'new'
rfind(sub)          Like find but returns the rightmost position of sub
rjust(w)            Like center, but s is right-justified
rstrip()            Copy of s with trailing whitespace removed
upper()             Copy of s with all uppercase characters
  • Some examples:

>>> s = "watermelon"
>>> print s.upper()
WATERMELON
>>> print s.find('e')
3
>>> print s.find('el')
6
>>> print s.replace("water", "honeydew ")
honeydew melon

  • You can get help for all built-in str methods in the Python shell with help(str), or for any individual method:

  >>> help(str.find)
  Help on method_descriptor:

  find(...)
  S.find(sub [,start [,end]]) -> int

  Return the lowest index in S where substring sub is found,
  such that sub is contained within s[start:end]. Optional
  arguments start and end are interpreted as in slice
  notation.

  Return -1 on failure.
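
  • A short sketch exercising several of the methods from the table that the examples above do not cover. The variable names are chosen only for this illustration. Because strings are immutable, each method returns a new string (or a number) and leaves the original unchanged.

```python
s = "watermelon"
padded = "  ham  "

capitalized = s.capitalize()       # 'Watermelon'
e_count = s.count("e")             # 2
last_e = s.rfind("e")              # 6
missing = s.find("z")              # -1, since 'z' does not occur

centered = "ham".center(9)         # '   ham   '
left = "ham".ljust(9)              # 'ham      '
right = "ham".rjust(9)             # '      ham'

no_leading = padded.lstrip()       # 'ham  '
no_trailing = padded.rstrip()      # '  ham'

# Brackets make the padding visible when printed.
print("[" + centered + "]")        # [   ham   ]
print("[" + left + "]")            # [ham      ]
print("[" + right + "]")           # [      ham]
print(s)                           # watermelon (unchanged)
```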


The in operator and character classification

  • The in operator is a boolean operator that takes two strings and returns True if the first appears as a substring in the second, or False otherwise:

>>> "i" in "Team"
False
>>> "excel" in "excellent"
True

  • See the example program in_both.py
  • The logical operator not is often combined with in:

>>> email = "biff<at>aol.com"
>>> if "@" not in email:
        print "invalid email"

invalid email
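
  • Combining in with a traversal gives a simple way to find the letters two words share. The function below is a sketch only and is not the course file in_both.py, which is not reproduced here; returning the shared letters as a string (rather than printing them) is an assumption made for this illustration.

```python
def in_both(word1, word2):
    """Return the letters of word1 that also appear in word2, in order."""
    common = ''
    for letter in word1:
        if letter in word2:
            common = common + letter
    return common

print(in_both('apples', 'oranges'))    # aes
print(in_both('ham', 'eggs'))          # prints an empty line: nothing in common
```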


String comparison

  • Strings can be compared with the usual equality and relational operators
  • Strings are compared lexicographically; that is, character by character from left to right, using each character's numeric ASCII code; the first pair of characters that differ decides the result
  • A character's ASCII code (in decimal) can be determined using the built-in ord() function:

>>> for ch in "Biff":
        print ord(ch),

66 105 102 102
>>> for ch in "Bill":
        print ord(ch),

66 105 108 108
>>> print "Biff" < "Bill"
True

  • A complete ASCII table is available here
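
  • To make the comparison rule concrete, here is a hand-written sketch of what the < operator does for strings. The function name less_than is hypothetical; in practice you would simply write a < b. When one string is a prefix of the other, the shorter string is the smaller one.

```python
def less_than(a, b):
    """Return True if string a sorts before string b, comparing ord() codes."""
    index = 0
    while index < len(a) and index < len(b):
        if a[index] != b[index]:
            # The first differing pair of characters decides the result.
            return ord(a[index]) < ord(b[index])
        index = index + 1
    # No difference found: the shorter string (if any) comes first.
    return len(a) < len(b)

print(less_than("Biff", "Bill"))      # True: 'f' (102) < 'l' (108)
print(less_than("Zebra", "apple"))    # True: 'Z' (90) < 'a' (97)
print(less_than("ham", "hamster"))    # True: "ham" is a prefix of "hamster"
print(less_than("ham", "ham"))        # False: the strings are equal
```

    Note the second result: every uppercase letter has a smaller ASCII code than every lowercase letter, so "Zebra" sorts before "apple".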

The string formatting operator (%)

  • Strings can be formatted using the % operator, with this syntax:
     
       format_string % (value_to_be_formatted)
     
  • The result can be assigned to a string or printed
  • Detail is available here
  • Here is an example of formatting a floating-point value to be displayed right-justified in a 15-character-wide field, rounded to two decimal places:
     
       >>> fp = 13.12876
       >>> print "Value is: %15.2f" % (fp)
       Value is:           13.13
 Updated: 12.13.2010