How treecomp.py works
Python's dircmp class
treecomp.py uses the Python standard library module filecmp, which is dedicated to comparing files and directories. Part of filecmp is a class named dircmp for comparing two directory trees. In filecmp jargon, these two directory trees are named
left (corresponds to our "new")
right (corresponds to our "old")
filecmp.dircmp is discussed here: http://www.python.org/doc/2.5/lib/dircmp-objects.html
treecomp.py (or better: treecomplib.py, see below) uses the following dircmp attributes:
right_only – lists the names of files and subdirectories that are present only in the old directory. Used for detecting a violation of the assumption that the new tree is a superset of the old tree in terms of filenames (see treecomp.py overview). If this list has any entries, an exception is raised to signal the violation.
left_only – lists the names of files and subdirectories that are present only in the new directory. Used for finding the files denoted by "xtra:" in the output.
diff_files – lists the names of files that are present in both directories but differ in content. Used for finding the files denoted by "diff:" in the output.
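To make the three attributes concrete, here is a minimal sketch that compares a single directory level. The names compare_top_level and write_file are hypothetical and not taken from treecomplib.py, and the choice of ValueError is an assumption, since the text above does not say which exception is raised.

```python
import filecmp
import os
import tempfile


def write_file(root, name, text):
    """Create a small text file below root (helper for this demo only)."""
    with open(os.path.join(root, name), "w") as handle:
        handle.write(text)


def compare_top_level(new_dir, old_dir):
    """Compare one directory level (hypothetical helper, not the real tool)."""
    comparison = filecmp.dircmp(new_dir, old_dir)  # left = new, right = old
    if comparison.right_only:
        # The new tree is assumed to be a superset of the old tree.
        # ValueError is an assumption; the real exception type is not documented here.
        raise ValueError("only in old tree: %s" % sorted(comparison.right_only))
    return sorted(comparison.left_only), sorted(comparison.diff_files)


new_dir = tempfile.mkdtemp()
old_dir = tempfile.mkdtemp()
write_file(new_dir, "same.txt", "unchanged")
write_file(old_dir, "same.txt", "unchanged")
write_file(new_dir, "changed.txt", "new and longer content")
write_file(old_dir, "changed.txt", "old content")
write_file(new_dir, "added.txt", "only in new")

extra, different = compare_top_level(new_dir, old_dir)
print("xtra:", extra)
print("diff:", different)
```

The two versions of changed.txt deliberately differ in size, so the difference is detected regardless of file timestamps.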
The actual comparison work is implemented in treecomplib.py. This file also contains the unit tests and can be run stand-alone: running treecomplib.py without parameters runs the unit tests:
C:\Development\Remotion-Contrib\PhoneBook\trunk\tools>treecomplib.py
....
----------------------------------------------------------------------
Ran 4 tests in 0.009s

OK
The tool itself, treecomp.py, is little more than a simple report generator that prints the batch file with copy commands to stdout.
CmpWalkerCore
CmpWalkerCore walks the directories in the tree recursively and stores what it finds along the way in an accumulator parameter named accu (this is how Lisp people learn to do such things in recursions (: ).
CmpWalkerCore creates a dircmp object instance, essentially a "DOM tree" of the directory trees inspected. It is certainly possible to call filecmp.dircmp just once and walk the resulting tree recursively, but we found that this is more work than recursing through the subdirectories and calling filecmp.dircmp anew for each subdirectory.
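For comparison, the single-call alternative mentioned above could look roughly like the following sketch. It relies on the subdirs attribute of dircmp, which maps each common subdirectory name to its own dircmp object. The generator name walk_dircmp is hypothetical, and this is not how treecomplib.py does it.

```python
import filecmp
import os
import tempfile


def walk_dircmp(comparison, rel_path=""):
    """Yield (relative path, dircmp object) for every common directory level."""
    yield rel_path, comparison
    # subdirs maps each common subdirectory name to its own dircmp object.
    for name in sorted(comparison.subdirs):
        yield from walk_dircmp(comparison.subdirs[name],
                               os.path.join(rel_path, name))


# Toy trees: both contain "sub", but only the new tree has sub/extra.txt.
new_dir = tempfile.mkdtemp()
old_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(new_dir, "sub"))
os.mkdir(os.path.join(old_dir, "sub"))
with open(os.path.join(new_dir, "sub", "extra.txt"), "w") as handle:
    handle.write("only in new")

top = filecmp.dircmp(new_dir, old_dir)  # filecmp.dircmp is called exactly once
only_in_new = [
    os.path.join(rel_path, name)
    for rel_path, level in walk_dircmp(top)
    for name in level.left_only
]
print(only_in_new)
```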
CmpWalker is a wrapper function that hides the accu parameter from the caller.
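The accumulator pattern described above can be sketched as follows. The functions walk_core and walk are hypothetical stand-ins for CmpWalkerCore and CmpWalker; their signatures and the shape of the accumulated entries are assumptions, not the actual treecomplib.py code.

```python
import filecmp
import os
import tempfile


def walk_core(new_dir, old_dir, rel_path, accu):
    """Compare one level, record findings in accu, then recurse (sketch)."""
    comparison = filecmp.dircmp(new_dir, old_dir)  # a fresh dircmp per level
    for name in comparison.left_only:
        accu.append(("xtra", os.path.join(rel_path, name)))
    for name in comparison.diff_files:
        accu.append(("diff", os.path.join(rel_path, name)))
    # Only directories present in both trees are descended into.
    for name in comparison.common_dirs:
        walk_core(os.path.join(new_dir, name), os.path.join(old_dir, name),
                  os.path.join(rel_path, name), accu)
    return accu


def walk(new_dir, old_dir):
    """Wrapper that hides the accumulator from the caller."""
    return walk_core(new_dir, old_dir, "", [])


def make_file(root, rel_path, text):
    """Create a file below root, including parent directories (demo helper)."""
    path = os.path.join(root, rel_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(text)


new_dir = tempfile.mkdtemp()
old_dir = tempfile.mkdtemp()
make_file(new_dir, "top.txt", "new and longer content")
make_file(old_dir, "top.txt", "old content")
make_file(new_dir, os.path.join("sub", "same.txt"), "unchanged")
make_file(old_dir, os.path.join("sub", "same.txt"), "unchanged")
make_file(new_dir, os.path.join("sub", "extra.txt"), "only in new")

findings = sorted(walk(new_dir, old_dir))
print(findings)
```

Passing a fresh list from the wrapper on every call also avoids the classic pitfall of a mutable default argument being shared between calls.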
Ignore list
What's cool about dircmp is its optional ignore parameter, a list of names to skip: we use it to filter out .svn directories. These are not compared. The dircmp documentation (http://www.python.org/doc/2.5/lib/dircmp-objects.html) explicitly lists "RCS", "CVS" and "tags" as the default values for such filtering.
Batteries included!
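A small sketch of the filtering, assuming a Python 3 interpreter (filecmp.DEFAULT_IGNORES exists since Python 3.4). Note that an explicit ignore list replaces the defaults, so the sketch appends ".svn" to them rather than passing it alone. The directory layout is made up for the demo.

```python
import filecmp
import os
import tempfile

# Toy trees: only the new tree carries a .svn bookkeeping directory.
new_dir = tempfile.mkdtemp()
old_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(new_dir, ".svn"))
with open(os.path.join(new_dir, ".svn", "entries"), "w") as handle:
    handle.write("version control bookkeeping")

# An empty ignore list switches all filtering off.
unfiltered = filecmp.dircmp(new_dir, old_dir, ignore=[])
# Keep the default ignores and additionally skip .svn directories.
filtered = filecmp.dircmp(new_dir, old_dir,
                          ignore=filecmp.DEFAULT_IGNORES + [".svn"])

print(unfiltered.left_only)
print(filtered.left_only)
```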
Unit-tests
The unit tests essentially perform the toy exercises discussed in "using treecomp.py programmatically" and check the results.
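To give an idea of what such a toy-exercise test might look like, here is a minimal sketch using the standard unittest module. It exercises filecmp.dircmp directly, because the treecomplib.py API is not shown on this page. The class name, test names and toy files are all hypothetical.

```python
import filecmp
import os
import shutil
import tempfile
import unittest


class ToyTreeTest(unittest.TestCase):
    def setUp(self):
        # Build two tiny trees that share one identical file.
        self.new_dir = tempfile.mkdtemp()
        self.old_dir = tempfile.mkdtemp()
        for root in (self.new_dir, self.old_dir):
            with open(os.path.join(root, "shared.txt"), "w") as handle:
                handle.write("unchanged")

    def tearDown(self):
        shutil.rmtree(self.new_dir)
        shutil.rmtree(self.old_dir)

    def test_identical_trees_report_nothing(self):
        comparison = filecmp.dircmp(self.new_dir, self.old_dir)
        self.assertEqual(comparison.left_only, [])
        self.assertEqual(comparison.right_only, [])
        self.assertEqual(comparison.diff_files, [])

    def test_extra_file_is_reported(self):
        with open(os.path.join(self.new_dir, "added.txt"), "w") as handle:
            handle.write("only in new")
        comparison = filecmp.dircmp(self.new_dir, self.old_dir)
        self.assertEqual(comparison.left_only, ["added.txt"])


# Run the tests without calling sys.exit(), unlike unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyTreeTest)
result = unittest.TextTestRunner().run(suite)
```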