How treecomp.py works

Python's dircmp class

treecomp.py uses filecmp, the Python standard library module for comparing files and directories. filecmp contains a class named dircmp for comparing two directory trees. In filecmp jargon, the two trees are called

  • left (corresponds to our "new")
  • right (corresponds to our "old")

filecmp.dircmp is discussed here: http://www.python.org/doc/2.5/lib/dircmp-objects.html

treecomp.py (or better: treecomplib.py, see below) uses the following dircmp attributes:

  • right_only – the names of files and subdirectories present only in the old directory. Used to detect violations of the assumption that, in terms of file names, the new tree is a superset of the old tree (see treecomp.py overview). If this list is not empty, an exception is raised to signal the violation.
  • left_only – the names of files and subdirectories present only in the new directory. Used to find the entries marked "xtra:" in the output.
  • diff_files – the names of files present in both directories whose contents differ. Used to find the entries marked "diff:" in the output.

The actual comparison work is done in treecomplib.py. This file also contains the unit tests and can be run stand-alone: running treecomplib.py without parameters runs the unit tests:

C:\Development\Remotion-Contrib\PhoneBook\trunk\tools>treecomplib.py
....
----------------------------------------------------------------------
Ran 4 tests in 0.009s

OK

The actual tool, treecomp.py, is essentially a simple report generator that prints a batch file of copy commands to stdout.
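The report-generation step could look roughly like the following Python 3 sketch. Everything about the output format is an assumption: the rem lines, the copy direction (new tree over old tree) and the function name batch_lines are hypothetical, and the input is assumed to be a list of ("xtra" or "diff", relative path) pairs produced by the comparison.

```python
import ntpath  # Windows path rules, regardless of the OS the script runs on


def batch_lines(entries, new_root, old_root):
    """Render (tag, relative_path) pairs as Windows batch lines (hypothetical format)."""
    lines = ["@echo off"]
    for tag, rel in entries:
        # Record why the file is copied ("xtra" or "diff") as a comment line.
        lines.append("rem %s: %s" % (tag, rel))
        # Copy from the new tree over the old one; /Y suppresses the overwrite prompt.
        src = ntpath.join(new_root, rel)
        dst = ntpath.join(old_root, rel)
        lines.append('copy /Y "%s" "%s"' % (src, dst))
    return lines


for line in batch_lines([("xtra", "b.txt"), ("diff", "a.txt")], r"C:\new", r"C:\old"):
    print(line)
```

Creating missing target directories (left_only can also report whole new subdirectories) is not handled in this sketch.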

CmpWalkerCore

CmpWalkerCore walks the directory trees recursively and accumulates what it finds along the way in a parameter named accu (the accumulator-parameter idiom Lisp programmers learn for recursion :-).

For each directory level, CmpWalkerCore creates a dircmp instance, which is essentially a "DOM tree" of the two directory trees being compared. It is certainly possible to call filecmp.dircmp just once and walk the resulting tree recursively, but we found that to be more work than recursing through the subdirectories and calling filecmp.dircmp anew for each one.

CmpWalker is a wrapper function that hides the accu parameter from its callers.
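Putting these paragraphs together, a per-level walker might look like the following minimal Python 3 sketch. The names cmp_walker_core and cmp_walker only mirror CmpWalkerCore and CmpWalker as described here; their signatures, the ("xtra"/"diff", path) tuples, the ValueError and the IGNORE constant are assumptions, not the actual treecomplib.py code.

```python
import filecmp
import os

IGNORE = [".svn"]  # hypothetical: skip Subversion metadata directories


def cmp_walker_core(new_dir, old_dir, rel, accu):
    """Compare one level with a fresh dircmp, record findings in accu, then recurse."""
    cmp = filecmp.dircmp(new_dir, old_dir, ignore=IGNORE)
    if cmp.right_only:
        # The new tree must be a superset of the old one.
        missing = [os.path.join(rel, n) for n in sorted(cmp.right_only)]
        raise ValueError("only in old tree: %s" % missing)
    accu.extend(("xtra", os.path.join(rel, n)) for n in sorted(cmp.left_only))
    accu.extend(("diff", os.path.join(rel, n)) for n in sorted(cmp.diff_files))
    for name in sorted(cmp.common_dirs):
        # A new dircmp per common subdirectory, passing the same accumulator down.
        cmp_walker_core(os.path.join(new_dir, name), os.path.join(old_dir, name),
                        os.path.join(rel, name), accu)
    return accu


def cmp_walker(new_dir, old_dir):
    """Wrapper that hides the accumulator: callers never pass accu themselves."""
    return cmp_walker_core(new_dir, old_dir, "", [])
```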

Ignore list

What's cool about dircmp is the optional ignore parameter: we use it to filter out .svn directories, so they are not compared at all. The dircmp documentation (http://www.python.org/doc/2.5/lib/dircmp-objects.html) lists "RCS" and "CVS" among the names ignored by default. Note that passing an explicit list replaces these defaults rather than extending them.
Batteries included!
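A small Python 3 sketch of how the ignore parameter behaves: an explicit list replaces the default ignores, so a CVS directory that would normally be skipped shows up once ignore=[".svn"] is passed. Extending filecmp.DEFAULT_IGNORES (available since Python 3.4) keeps both. The helper name svn_blind_dircmp is hypothetical.

```python
import filecmp
import os
import tempfile


def svn_blind_dircmp(new_dir, old_dir):
    """dircmp that skips .svn while keeping the library's default ignores."""
    # Passing ignore= replaces the defaults, so extend them explicitly.
    return filecmp.dircmp(new_dir, old_dir, ignore=filecmp.DEFAULT_IGNORES + [".svn"])


with tempfile.TemporaryDirectory() as root:
    new, old = os.path.join(root, "new"), os.path.join(root, "old")
    for d in (new, old):
        os.makedirs(os.path.join(d, ".svn"))  # also creates new/ and old/
    os.mkdir(os.path.join(new, "CVS"))
    with open(os.path.join(new, "extra.txt"), "w"):
        pass  # an empty file that exists only in the new tree
    only_svn = filecmp.dircmp(new, old, ignore=[".svn"]).left_only  # CVS is no longer skipped
    both = svn_blind_dircmp(new, old).left_only                     # CVS and .svn both skipped
    print(sorted(only_svn), both)
```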

Unit tests

The unit tests essentially perform the toy exercises described in "Using treecomp.py programmatically" and check the results.
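A toy exercise of this kind can be written as an ordinary unittest.TestCase. The following is a hedged Python 3 sketch, not the actual tests from treecomplib.py: the class name ToyTreeTest and its file names are hypothetical, and the tests check filecmp.dircmp directly so the block is self-contained. Ending such a module with the usual if __name__ == "__main__": unittest.main() guard is the standard way to make running the file without parameters execute its tests.

```python
import filecmp
import os
import shutil
import tempfile
import unittest


class ToyTreeTest(unittest.TestCase):
    """Builds a tiny old/new pair of trees for each test (hypothetical toy exercise)."""

    def setUp(self):
        self.root = tempfile.mkdtemp()
        self.new = os.path.join(self.root, "new")
        self.old = os.path.join(self.root, "old")
        os.mkdir(self.new)
        os.mkdir(self.old)
        self._write(self.old, "same.txt", "unchanged")
        self._write(self.new, "same.txt", "unchanged")
        self._write(self.old, "changed.txt", "v1")
        self._write(self.new, "changed.txt", "version 2")  # different size: never shallow-equal

    def tearDown(self):
        shutil.rmtree(self.root)

    @staticmethod
    def _write(directory, name, text):
        with open(os.path.join(directory, name), "w") as f:
            f.write(text)

    def test_diff_and_xtra(self):
        self._write(self.new, "added.txt", "new file")
        cmp = filecmp.dircmp(self.new, self.old)
        self.assertEqual(cmp.left_only, ["added.txt"])     # would be reported as "xtra:"
        self.assertEqual(cmp.diff_files, ["changed.txt"])  # would be reported as "diff:"
        self.assertEqual(cmp.right_only, [])

    def test_superset_violation_is_visible(self):
        self._write(self.old, "removed.txt", "gone from new")
        self.assertEqual(filecmp.dircmp(self.new, self.old).right_only, ["removed.txt"])
```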