7.1 Reordering and Renumbering

Many computations on pedigrees require that the pedigree be renumbered such that animal IDs are consecutive from 1 to "n", where "n" is the total number of animals in the pedigree. By default, all pedigrees are renumbered at load time. If your pedigree is already renumbered and you do not want it renumbered again, you must set the "renumber" option to 0. The renumbering process requires that the pedigree be reordered such that parents always precede their offspring in the list of animal IDs. The actual ID assigned to an animal is of no particular importance, and it is even possible for parents to have larger IDs than their offspring. PyPedal can reorder any pedigree unless it contains an error that prevents parents from being placed unambiguously before their offspring. For example, a pedigree containing a keypunch error such that an animal is one of its own grandparents cannot be reordered, because the animals involved form a loop for which no valid ordering exists.

The pyp_utils module provides two routines for pedigree reordering, reorder() and fast_reorder(). By default, reorder() is used to reorder pedigrees in place. It does this by maintaining a list of animal IDs that have been processed; whenever a parent is encountered that is not yet in that list, the offspring of that parent are moved to the end of the pedigree. This ensures that the pedigree is sorted such that all parents precede their offspring. Founders are also grouped together at the beginning of the pedigree. This procedure will always correctly reorder a pedigree, but it can be quite inefficient: it is similar to an insertion sort, which has a worst-case runtime proportional to $n^{2}$ [Cormen et al., 2003].
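The idea of placing parents before offspring and then assigning consecutive IDs can be illustrated with a short sketch. This is not PyPedal's implementation of reorder(). The function names, the (animal, sire, dam) tuple layout, and the use of 0 for an unknown parent are all assumptions made for illustration. Like the routine described above, the sketch defers animals whose parents have not yet been placed, so its worst case is also quadratic, and it reports an error when a loop makes reordering impossible.

```python
def reorder_sketch(pedigree, missing=0):
    """Return a copy of pedigree with parents placed before their offspring.

    pedigree is a list of (animal, sire, dam) tuples (hypothetical layout);
    missing is the code used for an unknown parent.
    """
    known = {animal for animal, _, _ in pedigree}
    pending = list(pedigree)
    ordered = []
    placed = set()
    while pending:
        deferred = []
        for animal, sire, dam in pending:
            # Only parents that have their own record constrain the order.
            parents = [p for p in (sire, dam) if p != missing and p in known]
            if all(p in placed for p in parents):
                ordered.append((animal, sire, dam))
                placed.add(animal)
            else:
                # A parent has not been placed yet, so try again later.
                deferred.append((animal, sire, dam))
        if len(deferred) == len(pending):
            # No progress in a full pass: some animal is its own ancestor.
            raise ValueError("pedigree contains a loop and cannot be reordered")
        pending = deferred
    return ordered


def renumber_sketch(ordered, missing=0):
    """Assign consecutive IDs 1..n to an already reordered pedigree."""
    new_id = {animal: i for i, (animal, _, _) in enumerate(ordered, start=1)}
    return [
        (new_id[animal], new_id.get(sire, missing), new_id.get(dam, missing))
        for animal, sire, dam in ordered
    ]


# Animal 30 is listed before its parents 10 and 20.
example = [(30, 10, 20), (10, 0, 0), (20, 0, 0), (40, 30, 10)]
reordered = reorder_sketch(example)
renumbered = renumber_sketch(reordered)
```

In this example the two founders end up first, and the renumbered pedigree uses IDs 1 through 4 with every parent ID smaller than the ID of its offspring.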

fast_reorder() provides a much faster means of reordering a pedigree, but it can reorder a pedigree incorrectly in some cases. When an instance of a NewAnimal object is created, the pad_id() method is called. pad_id() uses the animal ID and birth year to form an ID that pyp_utils.fast_reorder() uses for quick sorting. If your pedigree file is numbered such that offspring always have larger IDs than their parents, and your birth years (if provided) are correct (that is, parents are always born BEFORE their offspring), then pyp_utils.fast_reorder() works as expected. If you do not provide birth years, all animals in the pedigree are assigned a default birth year of 1900. In that case the reordering is still correct as long as parent IDs are always smaller than the IDs of their offspring; if any parent has an ID larger than that of one or more of its offspring, the pedigree will be incorrectly reordered by fast_reorder(). If your pedigree file contains birth years, or you know that parents always have smaller IDs than their offspring, then fast_reorder() will correctly reorder your pedigree in linear time. Founders are not guaranteed to be grouped at the beginning of the pedigree when fast_reorder() is used; if you are going to calculate coefficients of partial inbreeding (Section 7.4.3), you should use reorder() instead.
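The failure case described above can be reproduced with a small sketch. The exact form of the ID built by pad_id() is not given here, so the sketch assumes a hypothetical sort key of (birth year, animal ID) and uses Python's built-in sorted() rather than PyPedal's own code. The record layout (animal, sire, dam, birth year) and all function names are likewise assumptions.

```python
DEFAULT_BIRTH_YEAR = 1900  # default described in the text for missing birth years


def sort_key_sketch(record):
    """Hypothetical stand-in for the padded ID: sort on (birth year, animal ID)."""
    animal, _sire, _dam, birth_year = record
    return (birth_year, animal)


def fast_reorder_sketch(pedigree):
    """Sort records on the key alone, without looking at parentage."""
    return sorted(pedigree, key=sort_key_sketch)


def parents_precede_offspring(pedigree):
    """Check that every parent with a record appears before its offspring."""
    position = {record[0]: i for i, record in enumerate(pedigree)}
    return all(
        position[parent] < position[record[0]]
        for record in pedigree
        for parent in record[1:3]
        if parent in position
    )


# Animal 1 is the offspring of animal 3, so the parent has the larger ID.
with_years = [(1, 3, 0, 2001), (3, 0, 0, 1995)]
without_years = [(1, 3, 0, DEFAULT_BIRTH_YEAR), (3, 0, 0, DEFAULT_BIRTH_YEAR)]

ok_with_years = parents_precede_offspring(fast_reorder_sketch(with_years))
ok_without_years = parents_precede_offspring(fast_reorder_sketch(without_years))
```

With correct birth years the parent sorts first even though its ID is larger. With the default year for every animal, the sort falls back to the IDs alone and places the offspring ahead of its parent.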

The performance difference between the two reordering routines is not very noticeable on pedigrees of a few hundred to a few thousand animals, but it is quite dramatic for very large pedigrees. If your pedigree file is already reordered, there is essentially no performance difference between the two. When creating a pedigree file from data stored in a relational database, let the database perform the sort for you by using an "ORDER BY" clause in your query.
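As a sketch of the database approach, the following uses Python's standard sqlite3 module with an in-memory database. The table name, the column names, and the space-separated output layout are assumptions for illustration only; adapt them to your own schema and pedigree format. Sorting on birth year puts parents first only when the recorded birth years are correct.

```python
import sqlite3

# Hypothetical schema: one row per animal, 0 for an unknown parent.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals ("
    "animal_id INTEGER PRIMARY KEY, sire_id INTEGER, "
    "dam_id INTEGER, birth_year INTEGER)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?, ?)",
    [(1, 3, 0, 2001), (3, 0, 0, 1995), (2, 3, 1, 2004)],
)

# Let the database do the sorting: oldest animals first, ties broken by ID.
rows = conn.execute(
    "SELECT animal_id, sire_id, dam_id, birth_year "
    "FROM animals ORDER BY birth_year, animal_id"
).fetchall()
conn.close()

# One line per animal, ready to be written to a pedigree file.
lines = ["%d %d %d %d" % row for row in rows]
```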
