PyPedal: Python Pedigree Analysis
Links
Sourceforge Page
Screenshots
My Python Blog
Manual [PDF, PS]
API [classes, demog, graphics, io, metric, newclasses, nrm, utils]
Other [AUTHORS.txt, CHANGES.txt, LICENSE.txt, OPTIONS.txt, PEDIGREE FORMAT CODES.txt]

Personal Website

Support This Project

News
07/18/2005: No new version yet. I am adding setuptools support. This should (I hope) make it easier for people to get PyPedal installed and running on their systems. I also found a project called mfGraph which may allow me to package an interactive pedigree viewer with a reasonable amount of work, although I am not making any guarantees on that front. Somehow I managed to lose a bunch of changes made to pyp_metrics in response to little bugs exposed by the unit testing. I hope to get that sorted out tomorrow, finish a few more half-implemented features, and get back to work on the unit tests and the documentation for a release in the near future. Also note that I have made this page much shorter; if you want to see older posts you will have to read the full index/changelog.
 
06/20/2005: One day I need to clean this page up. You know, some sort of actual CMS. Anyway. No new release yet. Today I started working on a unit testing framework, and the unit test for a single routine (pyp_metrics/effective_founders_lacy()) found several small errors. Some of them are a direct result of the big push to convert everything to the new object model, which was declared done in 2.0.0a17, and they required a little backing-off in that area. Specifically, pyp_nrm/fast_a_matrix() cannot be converted to the new model. It is used by several metrics which quietly pass it subpedigrees (lists of NewAnimal() objects rather than actual NewPedigree() objects), causing breakage.
 
05/18/2005: Version 2.0.0a17 has been released. The use of strings for animal, sire, and dam IDs is now supported; see PEDIGREE FORMAT CODES for the appropriate format codes to use. This feature has been gently tested, but needs a good workout to catch any lingering bugs. Most of the work in this release went into cleaning up code; all of the routines in PyPedal have now been updated to use the new-for-2.0.0 object model. I also seem to recall adding a new item or two to OPTIONS. Note that these changes have NOT been well-tested and I did a lot of the work this morning. Since I went to see "Revenge of the Sith" last night and only got four hours of sleep there are almost certainly typos. Of course, the distribution RPMs won't build if there are obvious errors (screwed-up tabs, etc.) so maybe I got lucky. The big push for the next release will be some sort of testing framework, as well as adding logging functionality to most of the routines in PyPedal.
 
05/03/2005: Version 2.0.0a16 has been released. The big new feature in this release is a wrapper class, NewAMatrix, for handling numerator relationship matrices; the object has save(), load(), and info() methods. Matrices are stored in a binary format as described in the Numarray documentation. This provides the user with an easy way to store an NRM for later work, such as visualization or computation of CoI/CoR. I also added a new option, missing_parent, that can be used to specify the value assigned to missing parents in the input pedigree file.

The other noteworthy bit of news is that I have [finally] started working on documentation again. The PDF and PS manuals posted over on the left are up-to-date, but they are not complete. As usual, all API docs are up-to-date. Note that the typography of the manuals is kind of dodgy in places, which reflects in part the questionable LaTeX generated by html2latex (used to get the API docs into the manual). I will put updated HTML manuals up tomorrow after I install latex2html.

Please read the AUTHORS file over on the left sometime. It's where I give credit and thanks to the people who have helped make PyPedal useable. Any flaws or faults in the software are my own and do not reflect poorly on those who have helped me.
 
04/27/2005: Version 2.0.0a14 has been released. The changelog is now linked over on the left under Other. This is an important release as it deals with a potentially serious bug in the pyp_utils/fast_reorder() procedure that affects pedigrees in which (i) animals are not guaranteed to have larger IDs than their parents and (ii) birth dates/years are either not provided or contain errors. pyp_newclasses/NewPedigree.load() has been changed to use a newly-rewritten pyp_utils/reorder() procedure that is a fair bit slower than fast_reorder() but much faster than the previous version of reorder(). This default behavior can be overridden using an OPTION by users who are willing to risk it.
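
To illustrate the kind of reordering described above, here is a minimal sketch that sorts a pedigree so that every parent comes before its offspring without relying on ID size or birth years. It is not the code in pyp_utils/reorder(); the function name, the tuple layout, and the use of "0" for an unknown parent are assumptions made for the example.

```python
UNKNOWN = "0"  # assumed code for an unknown parent


def reorder(pedigree):
    """Return the records ordered so that parents precede their offspring.

    `pedigree` is a list of (animal, sire, dam) tuples. Parents that have no
    record of their own are ignored here.
    """
    records = {rec[0]: rec for rec in pedigree}
    ordered = []
    placed = set()
    for start in records:
        # The stack always holds one line of descent: an animal, then one of
        # its unplaced parents, then one of that parent's unplaced parents...
        stack = [start]
        while stack:
            animal = stack[-1]
            if animal in placed:
                stack.pop()
                continue
            _, sire, dam = records[animal]
            waiting = [p for p in (sire, dam)
                       if p != UNKNOWN and p in records and p not in placed]
            if waiting:
                parent = waiting[0]
                if parent in stack:
                    # An animal cannot be its own ancestor.
                    raise ValueError("pedigree loop involving %r" % parent)
                stack.append(parent)
            else:
                placed.add(animal)
                ordered.append(records[animal])
                stack.pop()
    return ordered
```

Each animal is written out only after both of its known parents have been written, which is the property the downstream routines depend on. A sort on padded IDs or birth years is faster but only gives the same guarantee when those values are trustworthy.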
 
04/26/2005: CHANGELOG for PyPedal 2.0.0a13
  • pyp_newclasses/NewPedigree.save() accepts an option, idformat, that specifies which animal, sire, and dam IDs are written. The 'o' (original) option writes a pedigree with the original IDs as read from the original input pedigree file. The 'r' (renumbered) option will write a pedigree file containing renumbered animal, sire, and dam IDs.
  • pyp_newclasses/NewPedigree.save() accepts an option, outformat, that specifies how the saved pedigree is written. The 'o' (original) option writes a pedigree with the same pedformat as the original input pedigree file; this is useful if you have computed CoI, inferred sex, and that kind of thing. The 'l' (long) option will write a pedigree file containing all known fields in the animal object for which there are pedigree format codes (see the file PEDIGREE_FORMAT_CODES).
  • In pyp_newclasses/NewPedigree.__init__() the default logfile name is now self.kw['filetag'].log.
  • Some changes were made to layout options in pyp_graphics/draw_pedigree(). Pedigrees are now drawn landscaped on US letter-sized pages (8.5 in x 11 in) and will, in theory, be tiled across pages if they cannot fit on a single page. This does not work as well as hoped, but I am working on it.
  • pyp_graphics/draw_pedigree() now takes an optional parameter, gdot, that tells draw_pedigree() whether or not to write the raw (dot language) representation of the pedigree to a file. Code is written to a file named gfilename_pedigree.dot.
  • pyp_graphics/draw_pedigree() now takes an optional parameter, gsize, that specifies the size of the resulting graphic: 'f' (default) produces as large a graph as necessary to accommodate the layout and 'l' produces a diagram scaled to fit on a letter-sized sheet of paper.
  • Added a new method, save(), to pyp_newclasses/NewPedigree(). This long-overdue feature lets you easily save a pedigree after, for example, computing CoI. It eliminates the need to perform time-consuming computations on pedigrees every time they are accessed by making it easy to store a "large format" PyPedal pedigree.
  • Fixed a bug in pyp_newclasses/NewPedigree.preprocess() in which records for sires and dams that appear in a pedigree, but which do not have individual entries in the pedigree file, were assigned birth years of 0 when dummy records were inserted into the pedigree. This was causing pyp_newclasses/NewAnimal.pad_id() to return a munged up paddedID that caused problems in pyp_utils/fast_reorder(). Tricky problem to find, that was.
  • Made a small change to pyp_newclasses/NewPedigree.preprocess() so that blank lines are caught and handled correctly. Before this fix a blank line with, say, an embedded TAB character would cause a fatal error because it was treated as a "regular" record.
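
The two preprocess() fixes in this list (whitespace-only lines, and parents without their own entries) can be sketched roughly as follows. This is an illustration only, not the PyPedal code: the function name, the three-column "animal sire dam" layout, and "0" as the unknown-parent code are assumptions.

```python
UNKNOWN = "0"  # assumed code for an unknown parent


def read_pedigree(lines):
    """Parse 'animal sire dam' lines into a dict of animal -> (sire, dam).

    Lines that hold only whitespace (spaces, TABs, a bare newline) are
    skipped. Parents that never appear as an animal get a dummy record
    with unknown sire and dam.
    """
    records = {}
    for line in lines:
        if not line.strip():  # catches "", "\n" and "\t\n" alike
            continue
        animal, sire, dam = line.split()[:3]
        records[animal] = (sire, dam)
    # Take a snapshot of the values because the dict grows inside the loop.
    for sire, dam in list(records.values()):
        for parent in (sire, dam):
            if parent != UNKNOWN and parent not in records:
                records[parent] = (UNKNOWN, UNKNOWN)
    return records
```

Testing `line.strip()` rather than comparing against an empty string is what keeps a line containing only a TAB from being treated as a record.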
 
04/15/2005: PyPedal 2.0.0a11 has been released. Since I have been working on the pyp_graphics module lately I posted a few screenshots. Note that the API documentation to the left has been updated; the classes module is deprecated in favor of the newclasses module. Changes are as follows:

CHANGELOG for PyPedal 2.0.0a11
  • I think that pyp_graphics/draw_pedigree() may be inserting a spurious node when drawing the pedigree, but I have not yet figured out where it is happening.
  • Removed references to "species" from pyp_newclasses/NewAnimal.printme() and pyp_newclasses/NewAnimal.stringme().
  • Tweaked pyp_newclasses/NewAnimal.pad_id() so that it casts values to INTs before concatenating them.
  • pyp_newclasses/NewPedigree.preprocess() has been fixed to handle parents that do not have their own entry in the pedigree file. They are added to the pedigree with an unknown sire and dam.
  • Changed pyp_nrm/inbreeding() so that the output file written contains the original ID, the renumbered ID, and the CoI (in that order).
  • Added a dictionary, "backmap", to pyp_newclasses/NewPedigree that maps renumbered IDs (keys) to original IDs (values). It is the reverse direction of that provided by idmap.
  • Added pyp_graphics/plot_pct_founders_by_year() to plot the frequency of founders in each birth year. NOTE: This requires matplotlib. If matplotlib is not installed/cannot be imported, a value of 0 is returned.
  • Fixed pyp_graphics/draw_pedigree() so that it labels animals with their original IDs instead of their renumbered IDs.
  • Fixed pyp_graphics/draw_pedigree() so that it displays the gtitle.
  • Fixed a typo in pyp_newclasses/NewAnimal.__init__() that broke proper birthyear assignment.
  • Added pyp_graphics/plot_founders_by_year() to write a histogram of number-of-founders by year of birth. NOTE: This requires matplotlib. If matplotlib is not installed/cannot be imported, a value of 0 is returned.
  • Changed pyp_demog/BASE_DEMOGRAPHIC_YEAR from 1950 to 1900. This brings it in line with the default birthyear of 1900 used in pyp_newclasses.
  • Added pyp_demog/founders_by_year() which provides a dictionary, keyed by birthyear, of the number of founders with each birthyear.
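
Two of the entries above are plain dictionary operations, and a short sketch may make them concrete. The helper names, the record layout (dicts with 'sire', 'dam', and 'by' keys), and the rule that a founder is an animal with both parents unknown (coded 0) are assumptions for the example, not the PyPedal implementation.

```python
from collections import Counter


def build_backmap(idmap):
    """Invert an original-ID -> renumbered-ID map.

    The result maps renumbered IDs (keys) to original IDs (values).
    """
    return {renumbered: original for original, renumbered in idmap.items()}


def founders_by_year(pedigree):
    """Count founders per birth year.

    `pedigree` is a list of dicts with 'sire', 'dam' and 'by' (birth year)
    keys; an animal counts as a founder when both parents are 0.
    """
    counts = Counter(a["by"] for a in pedigree
                     if a["sire"] == 0 and a["dam"] == 0)
    return dict(counts)
```

A dictionary like the one returned by `founders_by_year()` is the natural input for a histogram of founders by birth year.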
 
04/14/2005: PyPedal 2.0.0a10 has been released. There are only very minor changes in the package: disabled I18N (gettext) in pyp_classes.py, a __version__.py file was added, and a small fix was made to the MANIFEST file used to roll the distributions. Thanks to Thomas von Hassel for reporting the gettext problem under FreeBSD. Only one (1) method in pyp_classes used it anyway, so I am going to let that lie until I have some more time to work on it.
 
03/30/2005: I uploaded PyPedal 2.0.0a9 today for your edification. Please note that updated documentation has not yet been posted. The CHANGELOG (see below) is posted on the SourceForge site and is included in the distribution; it is worth a read. A bunch of code has been refactored, and 2.0.0a10 will include a lot more code cleanups. If you are using PyPedal right now, please take a look at examples/new_lacy.py for an example of how to use the new object model. One interesting thing to note is that pyp_metrics/effective_founders_lacy() (a rewrite of pyp_metrics/a_effective_founders_lacy() to support large pedigrees and the new object model) is "smart" enough to accept and use NewPedigree objects, while pyp_metrics/a_effective_founders_lacy() knows nothing about the new classes and must be handed a pedigree-as-Python-list by hand. I am thinking about how to migrate a lot more of the old functions to the new way of doing things, and I expect that I will probably keep all of the current code in a working state and write parallel code to accommodate the Version 2 stuff. I expect that someone will eventually point out that I should consider using multimethods to hide that from the user. Hm. Maybe later.

CHANGELOG for PyPedal 2.0.0a9
  • pyp_io/pyp_file_header() and pyp_io/pyp_file_footer() now work.
  • Added pyp_metrics/effective_founders_lacy(), which is a re-write of pyp_metrics/a_effective_founders_lacy() that works with the new object model. Correctness was verified by comparing results against Table 3 in Lacy (1989) and Tables I and II in Boichard et al. (1997). You can use examples/new_lacy.py to verify the results.
  • Fixed a nasty bug in pyp_metrics/a_effective_ancestors_definite() that was due to an indentation screwup when moving from one editor to another. Correctness was verified by comparing results against Tables I and II in Boichard et al. (1997). You can use examples/new_lacy.py to verify the results.
  • Added pyp_utils/pyp_nice_time() which returns the current date and time as a nicely-formatted string.
  • Added pyp_metrics/descendants() and pyp_metrics/founder_descendants() to support the rewritten pyp_metrics/effective_founders_lacy() routine.
  • Added pyp_utils/assign_offspring(), which adds offspring of an animal to that animal's 'unks' list.
  • Stubbed pyp_io/pyp_file_header() and pyp_io/pyp_file_footer() in preparation for standardizing the output files written by PyPedal.
  • Added pyp_graphics module. It currently includes three functions from the ASPN Python Cookbook for visualizing the sparsity and the elements of matrices. I have also moved the draw_pedigree() function from pyp_utils to pyp_graphics. From now on, any functions related to visualization will go in pyp_graphics.
  • It looks like the sons and daus lists get screwed up when the pedigree is renumbered, but I think that it is a consequence of the item below.
  • When a pedigree that needs renumbering is read, pyp_utils/preprocess() throws an exception when trying to assign sex codes because it uses the sire's and dam's original IDs as keys. This represents fundamental breakage in the ordering of events in pedigree creation. I have sort-of hacked around this for the moment, but the bug is still there.
  • Added a new pedigree format code, asdgb, to pyp_utils/preprocess().
  • Added pyp_metrics/generation_lengths_all() which computes the average generation interval in years for each of the four selection paths (sire-son, sire-daughter, dam-son, and dam-daughter) for all births of a parent's offspring.
  • Added pyp_utils/assign_sexes() which iterates over a renumbered PyPedal pedigree to update sexes of sires and dams based on knowledge of their sons and daughters. This seems to catch cases that are missed in pyp_utils/preprocess(), which needs to be cleaned up.
  • Upon further examination, it seems like males and females are being correctly assigned. Hm...OK. Fixed a bug in pyp_utils/preprocess() that incorrectly assigned sires and dams with unknown parents to the sons and daus lists of the last animal in the pedigree. This was fixed by casting to an INT before a comparison with 0.
  • See examples/generations.py -- sons and daughters are not being correctly assigned to foo.sons and foo.daus.
  • Need to fix a bug in pyp_utils/new_preprocess() in which unknown sires and dams (animals with IDs of 0) were being put into male, female, son, and daughter lists.
  • Fixed a bug in pyp_utils/preprocess() in which unknown sires and dams (animals with IDs of 0) were being put into male, female, son, and daughter lists.
  • Added pyp_metrics/generation_lengths() which computes the average generation interval in years for each of the four selection paths (sire-son, sire-daughter, dam-son, and dam-daughter) for the oldest (first-born) of parents.
  • Added pyp_metrics/num_traced_gens(), pyp_metrics/num_equiv_gens(), and pyp_metrics/pyp_partial_inbreeding().
  • Lots of code cleanup in pyp_classes. Removed pad_id() and renamed pad_id_new() to pad_id().
  • Removed the originalID and species attributes from the Animal() class.
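
For readers unfamiliar with the quantity computed by effective_founders_lacy(), the effective number of founders is usually written as f_e = 1 / sum(p_i^2), where p_i is the expected proportional contribution of founder i to the population of interest. The sketch below shows that calculation on a toy pedigree. It is not the PyPedal routine; the function names and the input layout are assumptions, and it expects every non-founder to have both parents known and the dict to list parents before offspring.

```python
def founder_contributions(pedigree, population):
    """Average expected contribution of each founder to `population`.

    `pedigree` maps animal -> (sire, dam), with parents listed before their
    offspring (dicts keep insertion order in Python 3.7+). Founders are
    entered as (None, None).
    """
    contrib = {}  # animal -> {founder: proportion of genes}
    for animal, (sire, dam) in pedigree.items():
        if sire is None and dam is None:
            contrib[animal] = {animal: 1.0}
            continue
        share = {}
        for parent in (sire, dam):
            # Each parent passes on half of what it carries.
            for founder, p in contrib[parent].items():
                share[founder] = share.get(founder, 0.0) + 0.5 * p
        contrib[animal] = share
    totals = {}
    for animal in population:
        for founder, p in contrib[animal].items():
            totals[founder] = totals.get(founder, 0.0) + p
    return {f: t / len(population) for f, t in totals.items()}


def effective_founders(pedigree, population):
    """Effective number of founders, f_e = 1 / sum(p_i ** 2)."""
    p = founder_contributions(pedigree, population)
    return 1.0 / sum(v * v for v in p.values())
```

With two founders contributing equally, f_e is 2; unequal contributions pull it below the actual number of founders. Storing a contribution dictionary for every animal is fine for a toy example but would be wasteful on a large pedigree.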
 
03/09/2005: A few of you may have figured out that I am not the fastest shoe in the closet when it comes to some software stuff. I have been trying to be a good programmer and use CVS to track the codebase. Well, no more. When I opened pyp_newclasses.py this afternoon for a little bit of light hacking (after refreshing my local tree from the CVS server) I found all kinds of diff-looking stuff in the file. I have had all I can stand of CVS, thank you very much. I will stick with the boring but tried-and-true method of syncing my tree between machines by way of my trusty USB stick. I think that perhaps the complexity of CVS, with which I am not comfortable, is just not worthwhile on a relatively small project such as PyPedal.
 
02/25/2005: Made a commit to the CVS tree. Migration to the new class system as outlined in pyp_newclasses.py is in full swing. More later.
 
12/01/2004: I suppose that this is my periodic reminder that PyPedal is not dead and it is not abandonware. However, I have not had any time lately to work on the project. I did a little profiling this morning and discovered that PyPedal requires a minimum of 717 bytes per animal record in a pedigree (!!!). This is not really a big deal for a few hundred animals, but it is a big deal for 100,000 animals. Back at the end of August I decided to add a bunch of new things, such as refactored classes and I18N support. I think that I am going to have to back off of some of those changes in order to focus on more important (to me) performance issues.
 
08/31/2004: It took me all morning but I have a start on using gettext to provide support for I18N in PyPedal. I have started on a German translation, but since I do not actually speak German it is based on Babelfish translations. I do not yet know if that is good or bad. :-)
 
08/30/2004: Lots of big changes are in the works for PyPedal. Today I started working on the new class structure that will see the introduction of a real pedigree class with most of the computational routines as methods. I also started tinkering with some custom exceptions for PyPedal but I think that's not a very productive way to spend my time. Although I have not started coding on it, I am thinking about what needs to be done to enable translations in PyPedal. I have a few ideas here and they may make their way into Alpha 8. One thing that must be done before I can make a Beta release is to write some unit tests. The examples subdirectory is a mess and if I write some tests it will force me to clean that up. I hope that unit testing does not reveal a bunch of unknown bugs. :-)

I am adding a configuration file that can be used to control the default options used by a lot of PyPedal procedures. The idea here is to reduce the number of parameters that have to get passed around by the user. At the moment, though, it is unclear to me how best to use the config file (thank Bob for the OptionsParser module). I am adding **kw argument handling to the classes and methods, too.
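
One way to combine configured defaults with **kw arguments is to copy the defaults and let the caller's keywords override them. The sketch below shows that pattern only; it is not how PyPedal does it. The option names appear elsewhere on this page, but the default values and the helper name are made up for the example.

```python
# Assumed defaults; the option names come from this page, the values do not.
DEFAULTS = {"debug": 0, "missing_parent": "0", "filetag": "untitled"}


def resolve_options(defaults, **kw):
    """Return a new options dict: the defaults overridden by keyword args.

    Unknown keywords raise KeyError so that a typo in an option name is
    caught instead of being silently ignored.
    """
    unknown = set(kw) - set(defaults)
    if unknown:
        raise KeyError("unknown option(s): %s" % ", ".join(sorted(unknown)))
    options = dict(defaults)  # copy, so the shared defaults stay untouched
    options.update(kw)
    return options
```

A routine written this way only needs the parameters that differ from the defaults, e.g. `resolve_options(DEFAULTS, debug=1)`.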

One of the hard things about this part of development is to stay focused on the little details that need attention. It is much more interesting to add new features than to perfect older ones. For example, there is still a nasty little bug in the new pedigree format code handling procedure that affects all columns following allelotypes in an input file. It should not be very hard to fix, but still, boring. :-)

Anyway, what does this mean for the release schedule? In short, I do not know. I will probably keep releasing Alpha versions until I am happy with everything. This will probably take until at least Christmas. We are looking for a new place to live and I need to get at least two manuscripts written and submitted by then, so PyPedal is not first on my list of priorities. Once unit tests and a basic GUI are completed I will start the Beta series of releases. If I can recruit some people to bug-check then the Beta cycle may be relatively short. Once the system is pretty well debugged I will freeze the codebase and start working on the documentation. When there is documentation that is actually useful and I have not found any bugs that slipped through the testers I will release PyPedal 2.0.0.

What can you do to help? If you are interested in PyPedal, download it and use it. If you think that you have found a bug enter it into the bug-tracking system on the Sourceforge site. This is very important. You can also submit test pedigrees for the unit tests; better yet, submit a test! Anyone who wants to work on documentation will receive ample praise and as many props as I can throw to them. Did I mention that you will also get your name in the credits? If there is a missing feature that will prevent you from using PyPedal visit the discussion forum on the Sourceforge site and make a suggestion. Please note that I am going to institute a feature freeze when PyPedal enters Beta. I welcome feedback from my users!
 
08/12/2004: Alpha 7 has been released on the Sourceforge site. Details available in the CHANGES file in the distribution, in the changelog on the website, and in earlier entries on this page. There are a few things that are not finished for this release, notably the new pedigree format parser, but release early, release often, right?
 
08/12/2004: Work continues on Alpha 7. I have made changes to several routines that use pyp_metrics/fast_a_matrix() to build a relationship matrix. fast_a_matrix() will fail if you throw it a large enough pedigree (the exact limit depends on your computer, and in particular on the amount of available RAM that you have), and I put some code into several routines so that they will (hopefully) recover a little more gracefully from that situation. If you get "-999.9" back from a routine when you expect a different answer (such as a positive value) it might be because your pedigree is too large for fast_a_matrix().
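
To see why a large pedigree can exhaust memory, it helps to look at the classic tabular method for building a numerator relationship matrix. The sketch below is a plain-Python version of that method, not the code in fast_a_matrix(); the function name and the input layout (animals renumbered 1..n with parents before offspring, 0 for an unknown parent) are assumptions.

```python
def a_matrix(pedigree):
    """Build the numerator relationship matrix with the tabular method.

    `pedigree[i - 1]` holds the (sire, dam) numbers of animal i. Animals are
    numbered 1..n with parents before offspring; 0 means unknown. Row and
    column 0 of the result stand for the unknown parent and stay at zero.
    """
    n = len(pedigree)
    a = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        sire, dam = pedigree[i - 1]
        # Diagonal: one plus half the relationship between the parents.
        a[i][i] = 1.0 + 0.5 * a[sire][dam]
        # Off-diagonal: average of the relationships of j with i's parents.
        for j in range(1, i):
            a[i][j] = a[j][i] = 0.5 * (a[j][sire] + a[j][dam])
    return a
```

The matrix holds (n + 1) squared values, so storage grows with the square of the number of animals, which is the wall a dense approach eventually hits. The coefficient of inbreeding of animal i is `a[i][i] - 1`.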

I also made some changes to several methods in the pyp_classes/Pedigree() class, mostly adding debugging messages and replacing list use with dictionary use whenever possible. The performance gain here is minimal on small pedigrees but very substantial on large pedigrees. Do NOT set the debug switch to 1 unless you really, really want a lot of output.
 
08/09/2004: Alpha 7 will be released soon! While I was in East Lansing and St. Louis last month I spent several hours working on PyPedal. The biggest improvement will be the new code for handling the pedigree format string. It is not quite finished and is not yet debugged, but it will be a big win for the user. Currently you can only use a format from a defined set of codes. The new system will let you specify your own format string from a list of something like 15 different column data types recognized by PyPedal. There are also some new features, including the first sketchy bit of code for a demographics module. The vigilant user will also notice that there is a stub for a peeling module (to compute genotype probabilities from incomplete marker data) which will not be written until at least Alpha 8.
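
The idea behind a user-defined format string is that each character names the data type of one column. A minimal sketch, assuming a handful of one-letter codes: the meanings given to 'g' and 'b' here, the field names, and the function itself are guesses for illustration, so check PEDIGREE FORMAT CODES for the real list.

```python
# Assumed meanings for a few one-letter column codes.
FIELD_CODES = {
    "a": "animal",
    "s": "sire",
    "d": "dam",
    "g": "generation",
    "b": "birthyear",
}


def parse_record(line, pedformat, sep=None):
    """Split one pedigree line into named fields according to `pedformat`.

    Each character of `pedformat` describes one column, in order. With
    sep=None the line is split on any run of whitespace.
    """
    fields = line.split(sep)
    if len(fields) != len(pedformat):
        raise ValueError("expected %d columns, found %d"
                         % (len(pedformat), len(fields)))
    record = {}
    for code, value in zip(pedformat, fields):
        if code not in FIELD_CODES:
            raise ValueError("unknown format code %r" % code)
        record[FIELD_CODES[code]] = value
    return record
```

Because the format string drives the parsing, supporting a new column type means adding one entry to the table rather than writing a new reader for every fixed format.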

Alpha 7 has taken longer than I thought to get out, but family needs come before coding needs, as I am sure you can understand. I also got sidetracked into some debugging because I thought that there was a problem with pedigree preprocessing that I had missed. The problem turned out to be a dodgy pedigree, but the input file contains something like 800,000 animals and could not easily be debugged by hand. Watch this space for the impending new release, hopefully before the end of August.
 
05/25/2004: Alpha 6 has been released on the Sourceforge site. Alpha 6 includes pyp_metrics/effective_founder_genomes(), which uses a gene-dropping algorithm to compute the effective number of founder genomes for a population of interest. The population of interest is the generation with the largest generation code (in a pedigree with explicitly-defined generations) or all animals but the founders (in a pedigree with no explicitly-defined generations). The procedure has been validated against the sample pedigrees in Lacy (1989) and Boichard et al. (1997).
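
Gene dropping is easy to sketch: give every founder two unique alleles, pass one randomly chosen allele from each parent down to each offspring, and count what is left in the population of interest. The version below uses one common formulation, N_g = 1 / (2 * sum(f_k^2)) with f_k the frequency of founder allele k, and averages it over the replicates. That choice, along with the function name and the input layout, is an assumption; it is not taken from effective_founder_genomes().

```python
import random


def effective_founder_genomes(pedigree, population, rounds=1000, seed=None):
    """Estimate the effective number of founder genomes by gene dropping.

    `pedigree` maps animal -> (sire, dam) with parents listed before their
    offspring; founders are (None, None) and every other animal needs both
    parents known. `population` lists the animals of interest.
    """
    rng = random.Random(seed)
    n_copies = 2 * len(population)
    total = 0.0
    for _ in range(rounds):
        genotype = {}
        next_allele = 0
        for animal, (sire, dam) in pedigree.items():
            if sire is None and dam is None:
                # Each founder carries two alleles found nowhere else.
                genotype[animal] = (next_allele, next_allele + 1)
                next_allele += 2
            else:
                # Mendelian sampling: one random allele from each parent.
                genotype[animal] = (rng.choice(genotype[sire]),
                                    rng.choice(genotype[dam]))
        counts = {}
        for animal in population:
            for allele in genotype[animal]:
                counts[allele] = counts.get(allele, 0) + 1
        sum_sq = sum((c / n_copies) ** 2 for c in counts.values())
        total += 1.0 / (2.0 * sum_sq)
    return total / rounds
```

The result is a Monte Carlo estimate, so it varies between runs unless a seed is supplied, and more rounds give a steadier answer at the cost of run time.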
 
05/06/2004: Alpha 5 has been released on the Sourceforge site. The big news is that all of the online documentation has been updated to cover the latest changes. The Tutorial is not finished, but is also included in the Manual. In addition, I have squashed a few minor bugs, significantly improved the performance of pyp_utils/preprocess(), and added pyp_nrm/fast_a_matrix_r() which corrects numerator relationships for the inbreeding of parents to produce actual coefficients of relationship.
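
The usual correction from numerator relationships to coefficients of relationship divides each element by the square root of the product of the two matching diagonal elements: r_ij = a_ij / sqrt(a_ii * a_jj). The sketch below applies that to a matrix given as a list of lists. It shows the formula only, under the assumption that this is the correction meant here; it is not the fast_a_matrix_r() code.

```python
import math


def relationship_matrix(a):
    """Convert numerator relationships to coefficients of relationship.

    `a` is a square list of lists holding a numerator relationship matrix
    with no padding row or column. Each a[i][i] equals 1 + F_i, so dividing
    by sqrt(a[i][i] * a[j][j]) removes the effect of inbreeding.
    """
    n = len(a)
    return [[a[i][j] / math.sqrt(a[i][i] * a[j][j]) for j in range(n)]
            for i in range(n)]
```

For non-inbred animals the diagonal elements are 1 and the two matrices agree; the correction only matters once some animals are inbred.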
 
04/23/2004: Alpha 4 has been released on the Sourceforge site. There have been many feature additions and bugfixes lately:
  • Possibly corrected a subtle bug in the Animal.pad_id_new method that resulted in incorrect sorting in some cases.
  • Added pyp_metrics/mating_coi(), which computes the coefficient of inbreeding of offspring that would result from a mating between two animals.
  • Added pyp_metrics/relationship(), which computes the coefficient of relationship between two animals.
  • Added three new attributes to Animal() objects: self.sons, self.daus, and self.unks, which are lists to store renumbered animalIDs of sons and daughters of an animal, as well as the IDs of offspring with unknown sex.
  • Added a 'name' attribute to the Animal() object to accommodate, e.g., dog breeders.
  • Added a new procedure, pyp_utils/draw_pedigree(), to draw pedigrees using the pydot interface to Graphviz. If the necessary modules are not installed the procedure will return a result of '0' rather than exploding. :-)
  • Beginnings of a tutorial in the PyPedal manual.
  • Corrected a minor bug in pyp_nrm/inbreeding_tabular() that resulted in negative CoI being written to returned dictionary.
  • Enhanced pyp_nrm/inbreeding() to update Animal() instances with the CoI computed by that routine.
  • Enhanced pyp_utils/preprocess() to assign sex codes to Animal() instances based on the inferred sex iff no sex code was specified in the pedigree file.
  • Added a new routine, pyp_metrics/a_effective_ancestors(), that will call either a_effective_ancestors_definite() or a_effective_ancestors_indefinite() depending on the size of the pedigree passed in. Currently, the cutoff is 1,000.
  • Added a new routine, pyp_metrics/a_effective_ancestors_indefinite(), that attempts to estimate upper and lower bounds for f_a in large pedigrees rather than computing all contributions explicitly. a_effective_ancestors_indefinite() is NOT WELL TESTED. There are almost certainly bugs; the routine does not iterate. All I can really tell you for sure is that it sometimes returns values that are extreme underestimates of f_a. It is supposed to work reasonably well on large pedigrees rather than small ones.
  • FINALLY fixed all known bugs in the tragically-written pyp_metrics/a_effective_ancestors_definite() routine!
  • Added pyp_utils/set_ancestor_flag() to be used to set ancestor flags.
  • Added an ancestor flag to pyp_classes/Animal/__init__().
  • Fixed bugs in pyp_metrics/a_effective_founders_lacy() and pyp_metrics/a_effective_founders_boichard() that were introduced by changes in pyp_utils/preprocess().
  • Changed pyp_utils/preprocess() so that pedigree entries are not made for unknown parents by the "add parent records to the pedigree if they are not already there" routine.
  • Added pyp_metrics/common_ancestors() which returns a list of all of the ancestors that two animals share in common.
  • Added pyp_metrics/related_animals() which recurses through a pedigree to build a list of all animals related to a given animal, if any.
Isn't that impressive? And the software is free to boot! Wow!
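
As an illustration of what a routine like common_ancestors() in the list above has to do, here is a small sketch that collects the ancestors of two animals and intersects the sets. The helper names and the dict-based pedigree layout (animal -> (sire, dam), with None for an unknown parent) are assumptions; this is not the PyPedal code.

```python
def ancestors(pedigree, animal):
    """Return the set of all known ancestors of `animal`.

    `pedigree` maps animal -> (sire, dam); None marks an unknown parent.
    An explicit stack is used instead of recursion so that deep pedigrees
    do not run into Python's recursion limit.
    """
    found = set()
    stack = [animal]
    while stack:
        current = stack.pop()
        for parent in pedigree.get(current, (None, None)):
            if parent is not None and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def common_ancestors(pedigree, animal_a, animal_b):
    """Return a sorted list of the ancestors shared by two animals."""
    return sorted(ancestors(pedigree, animal_a) & ancestors(pedigree, animal_b))
```

An empty list means the two animals are unrelated as far as the recorded pedigree goes.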
04/14/2004: Documentation, source files, and binary files were released on the Sourceforge site this evening!
Introduction
PyPedal is a Python language application for pedigree analysis.  I am in the process of moving the source tree to Sourceforge.  Documentation and access to the CVS tree will be provided as soon as possible.

Questions?  Comments?  Send e-mail to the author.