Lecture Notes
Introduction to Programming and Algorithm Design (COP-1000)

File processing and exceptions

This chapter introduces two important subjects: files (external, non-volatile storage for programs and persistent data) and exceptions (errors that occur during the execution of a program, at runtime).

The notes below do not follow the text exactly; the text includes some topics (like variable formatting and "pickling") that we will bypass for now. You can use these notes in place of the text (Chapter 14).


Persistent data: Files and the operating system

  • A file is a container for a stream of data that is stored in secondary memory, usually a disk drive
  • Files are used to hold programs and persistent data: data that is retained after the program that created it has terminated
  • Every operating system (OS) has a file manager that maintains the computer's file system, and provides access to the files
  • The OS maintains a directory of all files, with the file's name, size and other information about the file:
C:\TEMP>dir /-n
 Volume in drive C is Bilbo
 Volume Serial Number is 0C1D-4158

 Directory of C:\TEMP

.            <DIR>              11/15/2004  09:06 AM
..           <DIR>              11/15/2004  09:06 AM
bp       xls             14,336 05/02/2003  05:44 AM
chrtoInt c                  245 05/02/2004  02:18 PM
chrtoInt exe             14,510 11/15/2004  09:06 AM
coin     c                1,436 09/30/2004  07:40 AM
counter  exe                 78 02/14/2004  09:18 AM
crefcard pdf            106,208 09/21/2004  08:08 PM
faq      htm              2,516 09/26/2003  07:56 PM
fileio   cpp              1,647 04/18/2004  10:37 AM
kona     py                 411 11/15/2004  06:08 AM
mailbox  gif              2,967 02/14/2004  09:18 AM
steinitz jpg             13,150 02/14/2004  09:18 AM
              11 File(s)        157,504 bytes
               2 Dir(s)  72,341,573,632 bytes free
  •  The OS locates a file on disk by its physical file name; in DOS/Windows systems, this consists of a drive, path, filename, and extension:
C:\TEMP\kona.py
  • Most programs do not access files directly; instead, they request file services from the OS's file manager


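The parts of a physical file name can be pulled apart with Python's standard ntpath module, which is the Windows version of os.path and is available on every platform. A minimal sketch using the file from the listing above:

```python
import ntpath  # Windows-style path functions; importable on any OS

# Each \\ in the literal stands for a single backslash
physical_name = 'C:\\TEMP\\kona.py'

drive, rest = ntpath.splitdrive(physical_name)  # drive letter and the remainder
path, filename = ntpath.split(physical_name)    # directory part (drive + path) and file name
base, extension = ntpath.splitext(filename)     # file name without extension, and extension

print(drive)      # C:
print(path)       # C:\TEMP
print(base)       # kona
print(extension)  # .py
```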
Text files

  • Files consist of a stream of bytes, which can be coded to represent any type of data: numbers, text, pictures, or sound
  • A text file is one that contains just text; each byte in the file is interpreted as the ASCII code of a character
  • Text files are organized as lines of characters delimited by the newline character, written '\n', whose ASCII decimal code is 10:
>>> for ch in 'Hello\n':
        print ord(ch),

72 101 108 108 111 10
  • Each line of a text file can be handled as a string
  • Text files are very flexible, because it is easy to convert between strings and other data types, and because there are many functions available for string processing
  • Text files are normally accessed sequentially, from beginning to end; other files can also be accessed randomly, reading from and writing to the file at any point
  • Before a file can be used by a program, it must be opened: this is really a request to the OS to find a specific file in secondary storage
  • If the file is found (or created), the OS notifies the user program that the file is ready for processing
  • Some files are used only for input, to obtain data to be processed by the program; some files may be used just to hold program output; some files need to be kept current by updating (adding, deleting, or changing data in the file)
  • Text files (and sequential files in general) are not updated directly in secondary storage; rather, they are retrieved into main memory, updated there, and then rewritten to secondary storage
  • If an updated file is written back to secondary memory with the same physical file name, it replaces (overwrites) the original file; alternatively, the updated file can be written with a new file name, in which case both the old and the new (updated) file will exist
  • When file processing is complete, the file is closed by notifying the OS that the file is no longer needed, to free up system resources for other uses
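The read, update, rewrite cycle for a text file can be sketched as follows. The file name scores.txt, its contents (one integer per line), and the function add_bonus are hypothetical; the point is that each line arrives as a string, is converted to a number and updated in main memory, and the whole file is then rewritten under the same physical name, overwriting the original.

```python
import os
import tempfile

def add_bonus(physical_name, bonus):
    """Hypothetical helper: add bonus to every integer in a text file
    (one integer per line) and rewrite the file under the same name."""
    # 1. Retrieve the file into main memory, then close it
    infile = open(physical_name, 'r')
    lines = infile.readlines()            # each line is a string ending in '\n'
    infile.close()

    # 2. Update the data in memory: string -> int -> updated string
    updated = []
    for line in lines:
        value = int(line)                 # int() ignores surrounding whitespace, including '\n'
        updated.append(str(value + bonus) + '\n')

    # 3. Rewrite to secondary storage with the same name (overwrites the original)
    outfile = open(physical_name, 'w')
    outfile.writelines(updated)
    outfile.close()

# Usage: build a small sample file in a temporary directory
name = os.path.join(tempfile.mkdtemp(), 'scores.txt')
f = open(name, 'w')
f.write('70\n85\n92\n')
f.close()

add_bonus(name, 5)

f = open(name, 'r')
result = f.read()
f.close()
print(result)
```

Writing to a different name in step 3 would instead leave both the old and the updated file on disk.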


File Processing in Python

  •  In Python, a file is opened by creating a file object:
file_ref = open(physical_file_name, mode)
  • The syntax is as follows:
    • file_ref is the object reference assigned to the file object that will be used in the program; it is also called the logical file name
    • physical_file_name is a string that the OS will use to locate the file
    • mode is 'r' (open the file for reading, the default), 'w' (writing; creates the file, or overwrites it if it already exists), 'a' (appending to the end of the file, creating it if necessary) or 'r+' (open an existing file for both reading and writing)
  • For example:
outFile = open('c:\\temp\\somefile.txt', 'w')
  • Here the file object (logical file name) is outFile, and it has been opened for output (writing). Note that the string containing the physical file name uses two backslash characters (\\) as delimiters: a single backslash would be interpreted as the beginning of an escape sequence, like \n or \t
  • Once the file object has been created, operations on the file are performed with the file object's methods: functions built into the object itself. Here are simplified forms of some of the most useful:
Method              Description
read([size])        Read at most size bytes from the file or, if size is omitted, read all data until EOF. The data are returned as a string; an empty string is returned when EOF is encountered immediately.
readline()          Read one entire line from the file. The trailing newline character is kept in the string (it may be missing from the last line). An empty string is returned only when EOF is encountered immediately.
readlines()         Read until EOF using readline() and return a list containing the lines read.
write(string)       Write the string argument to the file. No newline is added.
writelines(list)    Write a list of strings to the file. No newlines are added between the strings.
close()             Close the file. A closed file cannot be read or written any more.
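A short sketch exercising the modes and methods above on a throwaway file (the name demo.txt and the temporary directory are arbitrary choices for illustration). It also shows two details that often surprise beginners: write() and writelines() add no newlines of their own, and readline() returns an empty string at end of file.

```python
import os
import tempfile

name = os.path.join(tempfile.mkdtemp(), 'demo.txt')    # arbitrary file name

# 'w' creates the file (or overwrites an existing one)
outfile = open(name, 'w')
outfile.write('first line\n')                          # the '\n' must be supplied
outfile.writelines(['second line\n', 'third line\n'])  # writelines() adds none either
outfile.close()

# 'a' adds to the end of the existing file
outfile = open(name, 'a')
outfile.write('fourth line\n')
outfile.close()

# 'r' (the default) opens the file for input
infile = open(name, 'r')
first5 = infile.read(5)             # at most 5 bytes (characters): 'first'
rest_of_line = infile.readline()    # continues from the current position: ' line\n'
remaining = infile.readlines()      # the three remaining lines, as a list
at_eof = infile.readline()          # '' because end of file has been reached
infile.close()
```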

  • To use these methods, the dot operator (.) is needed to reference the file object on which the method is being invoked:
file_ref.method(argument)
  • For example, to write the string "Hello\n" to the (already opened) file outFile, you would use:
outFile.write("Hello\n")
  • Example (createfile.py). Creates a text file using the write() method.
  • Example (createfile2.py). Same as the above program, except the user is prompted for the location to save the file.
  • Example (readfile.py). Reads the text file written above, inputting the entire file into a string with the read() method, then displays the string.
  • Example (readfile2.py): Reads the text file one line at a time with the readline() method, displaying each line as it is input.
  • Example (readfile3.py): Reads the text file, inputting each line into a list of strings, then displays each line by iterating through the list.
  • Example (readfile4.py): Reads the text file by traversing it, one line at a time.
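The example programs themselves are not reproduced in these notes. As a rough sketch of the four reading styles they illustrate, the hypothetical functions below each read the same small file and return its lines so the results can be compared. The fourth treats "traversing" the file as iterating over the file object with a for loop, which is an assumption about what readfile4.py does.

```python
import os
import tempfile

def read_whole(name):
    """read(): the entire file as one string, then split into lines."""
    infile = open(name, 'r')
    text = infile.read()
    infile.close()
    return text.splitlines(True)    # True keeps each line's '\n'

def read_line_by_line(name):
    """readline(): one line per call; '' signals end of file."""
    lines = []
    infile = open(name, 'r')
    line = infile.readline()
    while line != '':
        lines.append(line)
        line = infile.readline()
    infile.close()
    return lines

def read_into_list(name):
    """readlines(): all lines at once, as a list of strings."""
    infile = open(name, 'r')
    lines = infile.readlines()
    infile.close()
    return lines

def read_by_traversal(name):
    """for loop over the file object itself, one line at a time."""
    lines = []
    infile = open(name, 'r')
    for line in infile:
        lines.append(line)
    infile.close()
    return lines

# Usage: write a sample file, then read it back four ways
name = os.path.join(tempfile.mkdtemp(), 'sample.txt')   # arbitrary file name
outfile = open(name, 'w')
outfile.write('one\ntwo\nthree\n')
outfile.close()

for reader in (read_whole, read_line_by_line, read_into_list, read_by_traversal):
    print(reader(name))
```

All four produce the same list of lines. read() and readlines() hold the whole file in memory at once, while the readline() loop and the for loop handle one line at a time, which matters for very large files.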

I/O errors in file processing: exceptions

  • Programs that process files have an increased likelihood of exceptions: runtime errors that occur because of problems in the environment in which the program is run
  • A typical exception occurs when an attempt is made to open a file for input but the file does not exist (at least not under the physical file name specified); other runtime errors may occur because the disk being written to is full, or has been removed (for removable media, like a floppy disk or CD)
  • Provisions can be made to "trap" and handle some of these common errors, rather than let the program simply terminate abnormally
  • Python includes an exception handling facility that allows runtime errors to be handled, including those resulting from file I/O problems; the syntax is:
try:
    stmts that might generate exception

except [type of exception [,message]]:
    stmts to handle exception
[else:
    stmts if no exception occurs]
  • If an exception occurs in the try block, Python raises it; if the type of the exception raised matches the type specified in the except clause, the statements there are executed (an exception of a different type is not handled by this block). If no exception occurs, the statements in the optional else block, if present, are executed. After the except or else block completes, control passes to the next statement after the try statement
  • If the optional type of exception is not used, all types of exceptions are handled by the except block
  • When an I/O operation (such as a print statement, the built-in open() function or a method of a file object) fails for an I/O-related reason (like "file not found" or "disk full"), an IOError exception is raised, which can then be trapped:
import sys

try:
    myfile = open("somefile", "r")

except IOError, msg:
    print "File could not be opened:", msg
    sys.exit(1)    # quit gracefully (inside a function, a return works instead)

# if we get here, myfile has been opened
  • Example (ioexception.py): demonstrates how to interactively obtain the physical file name to open and how to trap an IOError exception; in this design, the program simply exits with a return
  • Example (ioexception2.py): a slightly more sophisticated version of the above allows the user to create a file if the one specified does not exist.
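The idea behind the second example, offering to create a missing file, can be sketched without the interactive prompt. The function open_for_input and its create_if_missing parameter are hypothetical stand-ins for the user's answer; this is not the code of ioexception2.py. It also shows the else clause at work: it runs only when the try block raised no exception. The except IOError as msg form used here is equivalent to the comma form shown above and works in Python 2.6 or later as well as Python 3.

```python
import os
import tempfile

def open_for_input(name, create_if_missing=False):
    """Try to open name for reading. On IOError, either create an empty
    file and open that (if create_if_missing is true) or return None."""
    try:
        infile = open(name, 'r')
    except IOError as msg:                  # same as 'except IOError, msg:' in Python 2
        print('File could not be opened: %s' % msg)
        if not create_if_missing:
            return None                     # caller should quit gracefully
        open(name, 'w').close()             # create an empty file...
        infile = open(name, 'r')            # ...and open it for input
    else:
        print('Opened %s' % name)           # runs only if no exception occurred
    return infile

# Usage: a file name that does not exist yet (hypothetical)
missing = os.path.join(tempfile.mkdtemp(), 'nosuchfile.txt')

result1 = open_for_input(missing)                          # error trapped, returns None
result2 = open_for_input(missing, create_if_missing=True)  # file created, then opened
data = result2.read()                                      # '' since the new file is empty
result2.close()
result3 = open_for_input(missing)                          # now succeeds; else block runs
result3.close()
```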

A File Example: Maintaining an email Directory (optional)

  • Program: email.py
  • This program uses a variety of string, list, and file processing techniques to create and maintain a directory of email addresses on disk
  • Constructed in a modular fashion: each major operation (add, change, delete, and save) is implemented in its own function, called from the main module in an "infinite" processing loop
  • Design features:
    • the user is prompted for the name of the file that contains the email directory
    • the file specified is opened if it exists, and each line is loaded into a global list of strings, where each string is a 'record' (name and email address)
    • after the list is loaded, the file is closed; all updating to the list (add, change, delete records) is done in memory
    • the entire list is displayed after each operation (probably not a great design choice for large directories, but simplifies this demo program)
    • the user can save the updated list at any time (to the same file), or is prompted to do so upon exit (if the list has changed)
  • Some features of this program to note, function by function:
    • main()
      • the startup() function returns an empty string if it could not open the file specified, and the user did not want to create a new file; otherwise, startup() returns the physical file name that was opened and from which the items were read into the global list elist
      • the local variable chFlag is used to indicate if the list has or has not been changed since last saved, using the built-in Boolean values True and False
      • the main loop uses the while True: construct to implement a "loop-and-a-half"
      • the main loop calls showlist() to display the list, then displays the prompt and obtains the user's choice
      • the appropriate processing function is called in the if...elif decision structure; an invalid choice simply cycles the main loop for new input
    • startup()
      • after obtaining a string from the user representing a physical file name, an attempt is made to open the file in mode "r" (input), which raises an IOError exception if the file is not found
      • the potential IOError exception is handled by giving the user the option to create the file; if the user declines, the function returns an empty string, indicating to main() to exit
      • if the file is opened, the lines in it are read into the global list elist, and then the file closed; a string containing the physical file name is returned to main(), indicating that startup() was successful
    • addrec()
      • add a record (a string) to elist
      • inputs string containing name and email address, separated by a comma
      • extracts the name and address using the string method split(), then reassembles them with a comma between them, using the string method lstrip() to strip any whitespace after the comma, and appends a newline character
      • the reformatted record is added to elist with the list's append() method, then sorted with sort()
    • chgrec()
      • changes the email address for a record (string) in elist; the name in the record cannot be changed
      • inputs the 'record number' of the record to change, as shown in the directory display, and the new email address
      • the name in the record at index n - 1 (for 'record # n') is extracted, concatenated with a comma, the new address, and a newline character, and the formatted string assigned to (overwriting) the same list item
    • delrec()
      • deletes a record (string) from elist
      • inputs the 'record number' of the record to delete, as shown in the directory display
      • uses the del operation to remove the record from elist, which, for 'record # n' is at index n - 1
    • savelist()
      • saves elist to a file
      • receives the physical file name to open (the same one input in startup()), and opens it in mode "w" (output), which overwrites the existing file
      • the records (strings) in elist are written to the file with the file's writelines() method, and the file is closed
    • showlist()
      • displays a directory of the records in elist
      • iterates through elist, extracting name and email address from each record using the split() method
      • record number, name, and address are displayed with format specifiers; the formatting string "%4d %-20.18s %s" specifies the display of the three values as 1) an integer, right-justified in a 4-character wide field; 2) at most 18 characters of a string, left-justified in a 20-character wide field; 3) an unformatted string
  • What enhancements could be implemented? How?
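The listing of email.py is not part of these notes. The following is a minimal, non-interactive sketch of the processing functions as described above, not the program itself: the menu loop and prompts are left out, the list is passed as a parameter instead of being held in the global elist, a hypothetical load() stands in for startup() and returns None (rather than an empty string) when the file cannot be opened, and showlist() returns its formatted lines instead of printing them. Records use the format name, comma, address, newline.

```python
import os
import tempfile

def load(name):
    """Stand-in for startup(): read the records, or return None if the file can't be opened."""
    try:
        infile = open(name, 'r')
    except IOError:
        return None
    elist = infile.readlines()
    infile.close()
    return elist

def addrec(elist, text):
    """text is 'name, address'; store it as 'name,address' plus a newline, then sort."""
    name, addr = text.split(',')
    elist.append(name + ',' + addr.lstrip() + '\n')
    elist.sort()

def chgrec(elist, n, newaddr):
    """Replace the address of record # n (index n - 1); the name is kept."""
    name = elist[n - 1].split(',')[0]
    elist[n - 1] = name + ',' + newaddr + '\n'

def delrec(elist, n):
    """Delete record # n, which is at index n - 1."""
    del elist[n - 1]

def savelist(elist, name):
    """Overwrite the file with the current records."""
    outfile = open(name, 'w')
    outfile.writelines(elist)
    outfile.close()

def showlist(elist):
    """Return one display line per record: number, name (max 18 chars), address."""
    lines = []
    for i, rec in enumerate(elist):
        name, addr = rec.split(',')
        lines.append('%4d %-20.18s %s' % (i + 1, name, addr.rstrip('\n')))
    return lines

# Usage: start a new directory file in a temporary folder (hypothetical name)
fname = os.path.join(tempfile.mkdtemp(), 'emails.txt')
elist = load(fname)
if elist is None:                 # file not found: start with an empty directory
    elist = []
addrec(elist, 'Zoe Smith, zoe@example.com')
addrec(elist, 'Al Jones,  al@example.com')
chgrec(elist, 2, 'zsmith@example.com')
savelist(elist, fname)

reloaded = load(fname)
for line in showlist(reloaded):
    print(line)
```

Passing the list explicitly makes each function easy to test on its own. Note that, like the description above, this sketch does not check that a record number entered for chgrec() or delrec() is within range.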
 Updated: 12.13.2010