Programming for Performance or Simplicity
March 11th, 2008

I was recently writing some code, to be run from the command line, that recursively walks through a network folder and all its subfolders, compiling a list of every file along with its last-modified date and size.
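Python's standard library already handles this kind of recursive walk with `os.walk`. What follows is a minimal sketch of the data-gathering step, not the script used for the timings in this post. The function name `collect_files` and the decision to silently skip files that cannot be read are assumptions.

```python
import os
from datetime import datetime


def collect_files(root):
    """Recursively walk root; return (path, last_modified, size_bytes) tuples.

    Hypothetical helper: a sketch, not the original script.
    """
    records = []
    # os.walk visits root and every subfolder beneath it (top-down by default).
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # Assumption: skip files that vanish or are unreadable mid-walk.
                continue
            # st_mtime is seconds since the epoch; convert to local time.
            modified = datetime.fromtimestamp(st.st_mtime)
            records.append((path, modified, st.st_size))
    return records
```

On Windows, `root` could be a UNC path such as `r"\\server\share"` (a hypothetical share name), so the same code works against a network folder.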
Since writing VBScript is quick and dirty, I tend to start with it, and in most cases it works just fine. However, when speed really matters, it’s just not up to the job.
The network directory contains about 35,000 files in 3,200 folders, consuming 45.4 GB of disk space. The script only needs to gather the required data and write it to a flat text file (delimited with semicolons). When I ran the VBScript version, it took 306 seconds (5 minutes 6 seconds) to gather the data.
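The output step can be sketched with Python's standard `csv` module, which accepts any single-character delimiter. The layout here is an assumption: the header row, the `YYYY-MM-DD HH:MM:SS` date format and the name `write_index` are illustrative choices, not details taken from the original script.

```python
import csv
from datetime import datetime


def write_index(records, out_path):
    """Write (path, last_modified, size_bytes) records as semicolon-delimited text.

    Hypothetical helper; returns the number of data rows written.
    """
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=";")
        # Assumed header row; drop it if the consumer expects raw data only.
        writer.writerow(["path", "last_modified", "size_bytes"])
        for path, modified, size in records:
            writer.writerow([path, modified.strftime("%Y-%m-%d %H:%M:%S"), size])
            count += 1
    return count
```

Paths that themselves contain a semicolon are quoted automatically by `csv.writer`, which keeps the file parseable. Wrapping the gather-and-write run between two `time.perf_counter()` calls is one way to take timings like the ones reported here.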
Not too bad, but not fast enough. So I decided to write the same script in both Python and C++. Python is a very high-level scripting language, so it is nowhere near as fast as a compiled language like C++, but I wanted to see how it compared with VBScript. Here are the results for the three programs.
| Language | Files indexed | Time taken (seconds) | Files per Second |
|---|---|---|---|
| VBScript | 34999 | 306 | 114.4 |
| Python | 35018 | 68.5 | 511.2 |
| C++ | 35017 | 13.9 | 2519.2 |
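The "Files per Second" column is simply files indexed divided by time taken. This short calculation reproduces it from the table's figures and also expresses each rate as a multiple of VBScript's; `files_per_second` and `RESULTS` are illustrative names.

```python
def files_per_second(files, seconds):
    """Throughput: files indexed divided by wall-clock seconds."""
    return files / seconds


# Figures copied from the results table: (files indexed, seconds taken).
RESULTS = {
    "VBScript": (34999, 306.0),
    "Python": (35018, 68.5),
    "C++": (35017, 13.9),
}

baseline = files_per_second(*RESULTS["VBScript"])
for language, (files, seconds) in RESULTS.items():
    rate = files_per_second(files, seconds)
    # Compare throughput rather than raw time, since the file counts differ slightly.
    print(f"{language:8} {rate:7.1f} files/s  {rate / baseline:5.1f}x VBScript")
```

It prints 114.4, 511.2 and 2519.2 files/s, matching the table, with Python at about 4.5x and C++ at about 22.0x VBScript's throughput.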
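As expected, C++ outstrips the scripting languages by a mile when it comes to raw performance, indexing about 22 times as many files per second as VBScript, but Python also fares pretty well at roughly 4.5 times VBScript’s rate.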
When I also consider how simple the code is and how long it took to develop, I think Python is the clear winner unless every ounce of performance is vital. Developing real-world programs in Python is extremely easy, and the code is fast.
I use another short Python script to parse Apache log files, and it amazes me how fast it runs. The current log is over 122,000 lines long (27.1 MB of data), yet the script parses the file and writes the results to a text file in just over a second! Very impressive for an interpreted language.
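The log-parsing script isn't shown, so the following is only a sketch of the kind of job described. It assumes Apache's Common Log Format and a summary of requests per HTTP status code and per requested path; the regular expression, `summarize_log` and the choice of counts are all illustrative assumptions.

```python
import re
from collections import Counter

# Apache Common Log Format, e.g.
# 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def summarize_log(lines):
    """Count requests per status code and per path; skip unparseable lines.

    Hypothetical helper: a sketch, not the original script.
    """
    statuses = Counter()
    paths = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        statuses[match.group("status")] += 1
        parts = match.group("request").split()
        # A well-formed request line is "METHOD PATH PROTOCOL".
        if len(parts) >= 2:
            paths[parts[1]] += 1
    return statuses, paths
```

Because it accepts any iterable of lines, an open file can be streamed through it without loading the whole log into memory, e.g. `with open("access.log", encoding="utf-8", errors="replace") as f: statuses, paths = summarize_log(f)`, where `access.log` is a hypothetical filename. Combined Log Format lines also match, since the pattern only anchors at the start of each line.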