So far, our scripts run just fine as long as the information we give them is what they expect. Errors appear when a script encounters conditions it does not know how to handle. Dealing with them is not as scary as it sounds, and it is very reassuring to have scripts you can trust not to “crash” as soon as someone else tries to use them.
Your script may work perfectly on your own system, but will it also work on another system?
A brief list of what can commonly go wrong:
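As you write your script, it is very common to make syntax errors: typos in the structure of the code, such as a missing colon, bracket or quote. (Misspelt function names or passing the wrong number of arguments are not syntax errors; Python only discovers those when the line runs.) Catching syntax errors early is easier with an editor like Spyder, which highlights them as you type.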
if (1 == 1)
    print("This is a syntax error")
You should see an output showing that something is missing (what should this be?):
if (1 == 1)
^
SyntaxError: invalid syntax
Syntax errors prevent the code from running at all. Errors that only appear while your program is running are known as “runtime errors”; in Python they are raised as exceptions, such as NameError or ValueError. We will now see how to deal with them.
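To make the difference concrete, here is a small sketch that runs a few lines which are perfectly valid Python syntax but fail when executed. The helper error_name is hypothetical, written just for this illustration; it uses a try-except block (explained below) only to report the type of each error.

```python
def error_name(action):
    """Run action() and return the name of the exception it raises, or None."""
    try:
        action()
    except Exception as e:  # catch anything, just to report its type
        return type(e).__name__
    return None


# Each line below is valid syntax, so Python starts running it;
# the problem is only discovered when the code actually executes.
print(error_name(lambda: int("abc")))     # ValueError: "abc" is not a number
print(error_name(lambda: 1 / 0))          # ZeroDivisionError
print(error_name(lambda: [1, 2, 3][10]))  # IndexError: no item at position 10
print(error_name(lambda: int("42")))      # None: no error at all
```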
Fortunately, we can catch these errors and deal with them before they abruptly end our program. The structure is a “try-except-finally” block wrapped around the code that might fail:
try:
    prnt("This is a name error")  # misspelt print, so no function of that name is found
except NameError as e:
    print("Caught this error:", e)
finally:
    print("I'm always going to run this bit")
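The finally section is optional. The code in it runs whether or not an error occurred, which makes it a good place for clean-up such as closing files.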
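The sketch below shows that the finally section runs on both the success path and the error path. The function to_number is a hypothetical example that records each step it takes in a list.

```python
def to_number(text):
    """Try to convert text to an int, logging each step that runs."""
    log = []
    try:
        value = int(text)
        log.append("converted")
    except ValueError:
        value = None
        log.append("not a number")
    finally:
        # runs whether the conversion worked or not
        log.append("finally ran")
    return value, log


print(to_number("7"))      # (7, ['converted', 'finally ran'])
print(to_number("seven"))  # (None, ['not a number', 'finally ran'])
```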
When running in a loop and the error is minor, for example the first line in a text file is blank and you just want to skip it, use continue in the except block. This tells the program to skip the rest of the loop body and go on to the next iteration of the loop.
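A minimal sketch of skipping bad lines with continue, assuming the lines of a text file have already been read into a list (the list contents here are made up):

```python
# Hypothetical lines read from a text file: one blank, one not a number
lines = ["", "12", "abc", "7"]

total = 0
count = 0
for line in lines:
    try:
        number = int(line)
    except ValueError:
        # minor problem: skip the rest of the loop body for this line
        continue
    total += number
    count += 1

print(total, count)  # 19 2
```

If continue were replaced by break, the loop would stop at the very first (blank) line, leaving total and count at 0.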
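When running in a loop and the error is major, so that you cannot continue without the missing information:
- break ends the loop, and the program carries on with the first statement after the loop.
- pass does nothing: the error is silently ignored and the program carries on with the next statement after the try-except block.
- sys.exit(), exit(), quit() and os._exit(0) stop the Python interpreter altogether, so avoid them if possible or at least use them with caution. (exit() and quit() are intended for the interactive prompt, and os._exit() exits immediately without any clean-up.)
So, we will now see how to put this into a script which reads the gapminder data, as we have done in previous lessons. Files can be missing or inaccessible, so we will include error catching to notify us of these problems.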
Previously:
import pandas
data = pandas.read_csv('data/gapminder_gdp_oceania.csv')
print(data)
Now we use sep from a library called os. It holds the character that separates directory and file names on the current operating system, so this code can run on Mac, Unix or Windows:
import pandas
from os import sep
datadir = "data"
fname = "gapminder_gdp_oceania.csv"
filename = datadir + sep + fname
data = pandas.read_csv(filename)
print(data)
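Next, we wrap the reading of the file in a try: block:
import pandas
from os import sep
inputdir = "data"
fname = "gapminder_gdp_oceania.csv"
try:
    filename = inputdir + sep + fname
    data = pandas.read_csv(filename)
    if not data.empty:
        # print the first cell (row 0, column 0)
        print(data.iloc[0, 0])
except OSError as e:
    # a missing file raises FileNotFoundError, which is a kind of OSError
    print("ERROR: Unable to find or access file:", e)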
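Catching OSError works here because pandas.read_csv raises FileNotFoundError for a missing file, and FileNotFoundError (like PermissionError) is a subclass of OSError. The sketch below checks this with the built-in open() instead of pandas, to stay dependency-free; the file name is made up and is placed inside a fresh temporary directory so it cannot exist.

```python
import os
import tempfile

# FileNotFoundError and PermissionError are both subclasses of OSError
print(issubclass(FileNotFoundError, OSError))  # True
print(issubclass(PermissionError, OSError))    # True

with tempfile.TemporaryDirectory() as tmp:
    # a path inside a new, empty directory, so the file cannot exist
    missing = os.path.join(tmp, "gapminder_missing.csv")
    try:
        with open(missing) as f:
            f.read()
        caught = None
    except OSError as e:
        # the FileNotFoundError is caught by the OSError handler
        caught = type(e).__name__

print(caught)  # FileNotFoundError
```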
Often, you will be using a Python script to read all the files in a directory. listdir from os returns the names of the entries in a directory, so we can loop over them:
import pandas
from os import sep, listdir
inputdir = "data"
try:
    for fname in listdir(inputdir):
        print(fname)
        filename = inputdir + sep + fname
        data = pandas.read_csv(filename)
        if not data.empty:
            print(data.iloc[0, 0])
except OSError as e:
    print("ERROR: Unable to find or access file:", e)
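Because the try block above wraps the whole loop, the first file that cannot be read stops the loop. If you would rather report that file and carry on with the rest, one option is to move try-except inside the loop and use continue. The sketch below does this; to keep it dependency-free it swaps pandas for the standard csv module, and the function name first_cells is hypothetical.

```python
import csv
import os
import tempfile


def first_cells(inputdir):
    """Return {filename: first cell below the header} for each readable file.

    A file that cannot be opened or decoded is reported and skipped,
    instead of stopping the whole loop.
    """
    results = {}
    for fname in sorted(os.listdir(inputdir)):
        filename = os.path.join(inputdir, fname)
        try:
            with open(filename, newline="") as f:
                rows = list(csv.reader(f))
        except (OSError, UnicodeDecodeError) as e:
            print("ERROR: skipping", fname, "-", e)
            continue  # go on to the next file
        # rows[0] is the header row, as pandas.read_csv assumes by default
        if len(rows) > 1 and rows[1]:
            results[fname] = rows[1][0]
    return results


# Usage on a throwaway directory: one small CSV file plus a subdirectory,
# which open() cannot read, so it is skipped rather than ending the loop
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.csv"), "w", newline="") as f:
        f.write("country,value\nAustralia,1\n")
    os.mkdir(os.path.join(tmp, "subdir"))
    found = first_cells(tmp)

print(found)  # {'a.csv': 'Australia'}
```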
Finally, we will allow the directory to be passed in from outside the script, as a command-line argument read with the argparse library.
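We also place the code under if __name__ == '__main__': so that it runs when the script is launched from the terminal console, but not when the file is imported by another script:
import pandas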
import argparse
from os import sep, listdir
if __name__ == '__main__':
    parser = argparse.ArgumentParser(prog='Read CSV Files',
                                     description='''Reads a directory and extracts first cell from each file''')
    parser.add_argument('--filedir', action='store', help='Directory containing files', default="data")
    args = parser.parse_args()
    inputdir = args.filedir
    try:
        for fname in listdir(inputdir):
            print(fname)
            filename = inputdir + sep + fname
            data = pandas.read_csv(filename)
            if not data.empty:
                print(data.iloc[0, 0])
    except OSError as e:
        print("ERROR: Unable to find or access file:", e)
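From a terminal you would run something like python readcsv.py --filedir mydata, where readcsv.py and mydata are hypothetical names; argparse also adds a --help option automatically. To see what the parser does without a terminal, parse_args() also accepts a list of strings in place of the real command line, as in this sketch:

```python
import argparse

parser = argparse.ArgumentParser(prog='Read CSV Files',
                                 description='Reads a directory and extracts first cell from each file')
parser.add_argument('--filedir', action='store', help='Directory containing files', default="data")

# parse_args() normally reads sys.argv; a list of strings simulates the command line
default_args = parser.parse_args([])
custom_args = parser.parse_args(['--filedir', 'mydata'])

print(default_args.filedir)  # data
print(custom_args.filedir)   # mydata
```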
Take this one step further and put your code into functions.
Combine all our CSV files into an Excel file.
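For the first task, one possible way to split the script into functions is sketched below. The names parse_arguments, first_cell and main are hypothetical, and the standard csv module stands in for pandas so the sketch has no dependencies. Passing argv into main makes the script easy to test without a terminal.

```python
import argparse
import csv
import os


def parse_arguments(argv=None):
    """Read --filedir from argv (None means the real command line)."""
    parser = argparse.ArgumentParser(prog='Read CSV Files',
                                     description='Reads a directory and extracts first cell from each file')
    parser.add_argument('--filedir', action='store', help='Directory containing files', default="data")
    return parser.parse_args(argv)


def first_cell(filename):
    """Return the first cell below the header row, or None if there is none."""
    with open(filename, newline="") as f:
        rows = list(csv.reader(f))
    if len(rows) > 1 and rows[1]:
        return rows[1][0]
    return None


def main(argv=None):
    """Print the first cell of every file; return 0 on success, 1 on error."""
    args = parse_arguments(argv)
    try:
        for fname in sorted(os.listdir(args.filedir)):
            print(fname, first_cell(os.path.join(args.filedir, fname)))
    except OSError as e:
        print("ERROR: Unable to find or access file:", e)
        return 1
    return 0


if __name__ == '__main__':
    main()
```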
TIPS
- The glob library is very good at filtering files from a directory based on their filenames, for example selecting all CSV files, or all files with numbers in the filename:
  files = glob.glob('data/*.csv')
  The data directory can be replaced by an argument.
- Use to_excel, and you will need a special file writer called ExcelWriter from the pandas library:
  writer = pandas.ExcelWriter(outputfile, engine='xlsxwriter') sets up the file writer.
- (fsheet, _) = splitext(basename(f2)) extracts just the name of the file, without its directory or extension, using basename() and splitext() from the os.path library.
- data.to_excel(writer, sheet_name=fsheet) writes the data to the writer, in an Excel sheet named after the file.
One possible solution:
import argparse
import glob
from os import R_OK, access
from os.path import join, basename, splitext
import pandas
if __name__ == "__main__":
    parser = argparse.ArgumentParser(prog='Combine CSV files to an Excel file',
                                     description='''\
Reads a directory and combines csv files into an excel file
''')
    parser.add_argument('--filedir', action='store', help='Directory containing files', default="data")
    parser.add_argument('--output', action='store', help='Output file name', default="output.xlsx")
    args = parser.parse_args()
    inputdir = args.filedir
    outputfile = join(args.filedir, args.output)
    print("Input directory:", inputdir)
    if access(inputdir, R_OK):
        seriespattern = '*.csv'
        # engine='xlsxwriter' requires the xlsxwriter package to be installed
        writer = pandas.ExcelWriter(outputfile, engine='xlsxwriter')
        try:
            files = glob.glob(join(inputdir, seriespattern))
            print("Files:", len(files))
            for f2 in files:
                print(f2)
                # sheet name = file name without directory or extension
                (fsheet, _) = splitext(basename(f2))
                data = pandas.read_csv(f2)
                if not data.empty:
                    data.to_excel(writer, sheet_name=fsheet)
            print("Files combined to:", outputfile)
        except ValueError as e:
            print("ERROR:", e)
        except OSError as e:
            print("ERROR: Unable to find or access file:", e)
        finally:
            # close() saves the workbook; writer.save() was removed in pandas 2.0
            writer.close()
    else:
        print("Cannot access directory:", inputdir)
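The file-selection and sheet-naming steps can be tried on their own, without pandas or Excel. This sketch creates a few empty files in a temporary directory (the file names are only examples) and shows that glob picks out the CSV files while basename() and splitext() reduce each path to a sheet name.

```python
import glob
import tempfile
from os.path import basename, join, splitext

with tempfile.TemporaryDirectory() as tmp:
    # create empty example files; only the .csv ones should be picked up
    for name in ["gapminder_gdp_africa.csv", "gapminder_gdp_asia.csv", "notes.txt"]:
        open(join(tmp, name), "w").close()

    # glob returns files in arbitrary order, so sort for a predictable result
    files = sorted(glob.glob(join(tmp, '*.csv')))

    sheets = []
    for f2 in files:
        (fsheet, _) = splitext(basename(f2))  # drop the directory, then the extension
        sheets.append(fsheet)

print(sheets)  # ['gapminder_gdp_africa', 'gapminder_gdp_asia']
```

Note that Excel limits sheet names to 31 characters, so very long file names may need shortening before being used as sheet_name.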