PAIRWISE COMPARISONS OF TREES
Topological changes
- Did anything change, in general, or in a
sub-tree? Were there small changes or major changes?
A qualitative indication of the magnitude of
the differences between the two trees or sub-trees is given by the
number of entries in each tab of the Assignment panel. Trees that are very similar will
have many entries in the Common taxa tab and
few entries in the other tabs (particularly the Different taxa tab). Conversely, trees that are
very different will have few entries in the Common
taxa tab and many in the Different
taxa tab. The nature of the changes (small versus major) often requires
judgement on the part of the user (in our case, taxonomists) as to the
importance of the change.
- What nodes were added, deleted?
A node that was added to one
tree is, equivalently, a node that was deleted from the other, and vice versa. These simple
differences between two trees are listed in the Missing taxa tab in the Assignment table.
- Did any node or sub-tree "move" in the
tree? Can you characterize those movements?
Entries in the Different taxa tab indicate movement, in other
words, nodes which have different linkage (paths to the root node) in the two
trees. Highlighting an entry in the table will show the node in the hierarchy
display panel. If the sub-tree moves en masse (so that the root of the sub-tree
has a different parent), then just one difference will be recorded: that the
root node has different parents in the two hierarchies. If the sub-tree
fragments as it is moved, so that nodes in the sub-tree end up in different
places, then each difference will be recorded in the
Different taxa tab.
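The way nodes fall into the Common, Missing and Different tabs can be sketched as follows. This is a minimal illustration, not the TaxoNote Comparator's actual code: it assumes each tree is a dictionary mapping a node name to its parent name, and it treats a shared node as "different" when its parent differs. The function name `classify_nodes` and the sample taxa are hypothetical.

```python
def classify_nodes(tree_a, tree_b):
    """Classify nodes of two trees given as {node: parent} maps (root -> None)."""
    names_a, names_b = set(tree_a), set(tree_b)
    shared = names_a & names_b
    # Nodes present in only one of the two trees.
    missing = sorted(names_a ^ names_b)
    # Shared nodes whose parent differs have "moved".
    different = sorted(n for n in shared if tree_a[n] != tree_b[n])
    # Shared nodes with the same parent in both trees.
    common = sorted(n for n in shared if tree_a[n] == tree_b[n])
    return {"common": common, "missing": missing, "different": different}


# Illustrative data only: Felidae moves, with its sub-tree, under a new parent.
tree_a = {
    "Mammalia": None,
    "Carnivora": "Mammalia",
    "Felidae": "Carnivora",
    "Panthera": "Felidae",
    "Ursidae": "Carnivora",
}
tree_b = {
    "Mammalia": None,
    "Carnivora": "Mammalia",
    "Feliformia": "Carnivora",
    "Felidae": "Feliformia",
    "Panthera": "Felidae",
}
result = classify_nodes(tree_a, tree_b)
print(result)
```

Because only the parent is compared, a sub-tree that moves en masse yields a single "different" entry (here Felidae), while Panthera, whose parent is unchanged, stays in the common list.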
Attribute value changes
- Global impression: did things change a lot
or not?
The TaxoNote Comparator is designed to
manage the Latin name and the rank from the classification data sets. Its
primary use in the comparison of two trees lies in distinguishing between
similar nodes and in resolving conflicts between the two trees. In order to do
this with some degree of confidence, many more attributes than are available in
the classification data sets are required. (Principal among these is the
authority for the name, in other words, the author of the publication where the
name first appeared.) In the context of taxonomy, changes in attribute values
are recorded in the Synonyms tab of the
Assignment panel (two nodes which are compatible but have different ranks or
names).
- What nodes or sub-trees changed the most?
A visual scan of entries in the
Different taxa tab will show which entries
have changed, but not by how much. Hence we can detect where the differences
lie, but not their magnitude.
- Did the value of attribute XYZ for this node
increase or decrease, in absolute terms or relative to its siblings or
other nodes?
Attributes that could be interpreted
numerically, such as those having integer values, are not handled in a different
way by our software. Therefore we are able to detect that the value of the same
attribute from two nodes is different, but we are not able to interpret this as
an increase or decrease.
GENERAL VISUALIZATION OF TREES
Topology
- Overall characteristics: How large is the
tree? How many levels deep? What is the deepest branch? Does the depth vary
between sub-trees or not?
The overall characteristics of the tree,
such as those given above, are obtainable by inspection. It would be possible
to calculate such metrics for each of the trees involved in the comparison but
this has not been implemented yet. We envisage taxonomists using these
metrics in determining which trees are largest (covering the most taxonomic
ranks), hence are most likely to be the product of taxonomic revisions.
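As a rough indication of what such metrics would involve, the sketch below computes the size and depth of a tree. It is not part of the current software; it assumes the tree is held as a dictionary mapping each node to a list of its children, and the name `tree_metrics` and the sample taxa are hypothetical.

```python
def tree_metrics(children, root):
    """Return (number of nodes, number of levels) for the tree under root."""
    size, max_depth = 0, 0
    stack = [(root, 1)]  # the root counts as level 1
    while stack:
        node, depth = stack.pop()
        size += 1
        max_depth = max(max_depth, depth)
        for child in children.get(node, []):
            stack.append((child, depth + 1))
    return size, max_depth


# Illustrative data only.
children = {
    "Mammalia": ["Carnivora", "Primates"],
    "Carnivora": ["Felidae"],
    "Felidae": ["Panthera"],
}
size, depth = tree_metrics(children, "Mammalia")
print(size, depth)
```

Calling the same function on the root of each sub-tree would show whether depth varies between sub-trees.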
- Path: What is the path of this node?
The path to a node is displayed in the
popup in the hierarchy display panel.
- Local relatives: what are the children,
siblings or cousins of this node?
A pop-up could be implemented to give this
information. However, since the hierarchy display window will always display
nodes legibly, this information can be obtained by inspection of the
hierarchies.
- Filtering by level: e.g. show me only the
first level, or show only 3 levels down, or remove all the leaves
Since expansion and contraction of the
hierarchy is under user control, manual filtering by level is possible.
- Topology questions that involve counting
nodes can be seen as attribute-dependent questions: e.g. Which branch contains
the largest number of nodes? or Which branch has the largest fan-out?
At present these topological functions are
not addressed explicitly in the TaxoNote Comparator although they may be added
in later versions of the software.
Attribute based
Those tasks only occur when leaf nodes have
attributes that can be aggregated at the parent level.
- Find nodes with value Y of categorical
attribute X
- What value of a categorical attribute occurs more often? e.g. Are
there more farm animals or pets?
One of the Assignment Table tabs that we
intend to implement post-InfoVis is a general Search Results tab. Such a tab
would record all the results of a search, allowing answers to questions of this
nature to be addressed.
- Find nodes with certain values of two or
more attributes e.g. what video file is used the most
At present our search facility does not
allow Boolean operators, hence general searches on attributes and combinations
of attributes are not supported. However, it is possible to search by Rank and
Taxonomic name.
Some topology queries fall under
attribute-dependent queries because all trees/sub-trees have at least one
attribute: their number of nodes!
- Number of nodes in a tree, or sub-tree? e.g.
How many animals? How many mammals?
At present we do not count the number of
nodes in a sub-tree, although this could be implemented in the popup that
currently reports the path to a node.
- Comparison of branches of the tree (i.e.
sub-trees with most nodes)? e.g. Are there more mammals or fish?
Again, we do not count the number of nodes
in a sub-tree. Questions of this nature could be answered using the current
version of the software, but answers would be obtained by manual inspection of
the displayed hierarchies.
- Largest fan-out e.g. What is the largest
group of animals with same lineage?
Counting nodes is not a feature that is
requested by our intended audience, taxonomists; hence we have not
implemented this feature yet.
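Should node counting be added, the three questions above (nodes per sub-tree, comparison of branches, largest fan-out) could all be answered from one pass over the tree. The sketch below is an assumption about how this might be done, not existing functionality; the tree is again a dictionary of node to list of children, and `subtree_sizes`, `largest_fan_out` and the sample data are hypothetical.

```python
def subtree_sizes(children, root):
    """Return {node: number of nodes in the sub-tree rooted at node}."""
    order, stack = [], [root]
    while stack:  # pre-order walk: every parent is recorded before its children
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))
    sizes = {}
    for node in reversed(order):  # children are therefore sized before parents
        sizes[node] = 1 + sum(sizes[c] for c in children.get(node, []))
    return sizes


def largest_fan_out(children):
    """Return the node with the most direct children."""
    return max(children, key=lambda node: len(children[node]))


# Illustrative data only.
children = {
    "Animalia": ["Mammalia", "Pisces"],
    "Mammalia": ["Felis", "Canis", "Ursus"],
    "Pisces": ["Salmo"],
}
sizes = subtree_sizes(children, "Animalia")
print(sizes["Mammalia"], sizes["Pisces"])  # compare two branches
print(largest_fan_out(children))
```

The per-node counts could equally be shown in the popup that currently reports the path to a node.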
Known items
- Which node(s) has a label containing this
string? e.g. find "giraffe" in a tree of animals
This is implemented in the Search
panel.
- Locate a node knowing its path.
If you know the path to a node then you can
find it simply by expanding parent nodes sequentially until the node is found.
- Go back to a node you have visited before.
At present we do not have a history or
bookmark mechanism, although we can appreciate the utility of such a facility
in a tree exploration and navigation context.
Labeling
- Review all the labels in a sub-tree
Lists of names are important to taxonomists
(e.g. a list of the members of a genus). At present this is supported through
the hierarchy comparison panel, and in the appropriate Assignment Table tab
(e.g. the Common taxa tab if all the nodes in the sub-tree are common to both
trees).
Browsing
- Explore the tree by performing a series of
ups and downs in the tree e.g. you are looking for a cute animal... so you look
into mammals, then primates, then gorillas, and chimpanzees, but you realize
that they are not that cute, so you go to felines, to tigers and cheetahs, but now
you remember that pandas are your favorites and you go there.
When we first explored the InfoVis data
sets, we realized that one of the major issues would be supporting navigation
in large data sets, as opposed to comparison of hierarchies within the data
sets. Targeted navigation using the Search panel and Assignment Table is
supported, but browsing, as characterized above, is not supported beyond
expanding and collapsing nodes of interest. Of course, individual and
synchronous scrolling of the hierarchy display panes is also supported.
Managing the analysis
- Marking nodes of interest, removing special
anomalies.
Marking nodes of interest has not been
implemented yet, although we can appreciate the utility of being able to
bookmark or otherwise highlight such nodes. The second issue, of being able to
remove special anomalies, requires that issues of maintaining an audit trail be
addressed, such as recording who made which modifications to a node, and
when.
- Saving visualization settings for future
reference.
Not yet implemented.
- Keeping the history of your analysis,
reviewing it and replaying it with different parameters.
Again, not implemented yet.
- To what extent are the differences in the
classifications due to differences in how animals are thought to be related?
Are there other kinds of differences and can you explain them?
All taxonomic classifications are based on
some assessment of relationship, either phenetic or phylogenetic;
consequently all differences are attributable to differences in these
relationships. Where classifications are formed from quite different
relationship models, particularly phylogenetic and phenetic, it can be
particularly difficult to map classifications one onto another, and therefore
difficult to explain individual differences. As in all systematics, it is
easier to explain differences when both the relationship model and taxa are
similar.
We suspect that this question specifically
means phylogenetic relationship rather than relatedness in general. It is
crucial to know whether the hierarchies analysed were phylogenetically derived
in order to answer this question and such information was not included in the
dataset.
Considering one dataset or the other:
- Can you say in how many different subtrees a
particular common name (such as "dolphin" or "horse") is used? How closely are
these animals related? Are common names a good guide to understanding
relationships?
We have not implemented common name
management at this stage of development of our user interface, although common names
exist in our data model. Our data structures allow for searching of individual
names, as exemplified in question 3 below, so determining how many times a name
occurs is straightforward. The question of subtrees, though, is intriguing: we
have not found a method of identifying sub-trees without user-intervention,
beyond the trivial set of sub-trees created at each node. Users can be shown
each instance of the target name and can manually identify the sub-tree to
which it belongs, but without the ability to identify such sub-trees, our
software is unable to answer this query automatically.
The question of degree of relatedness of
two taxa is probably best assessed by determining their lowest common
root node. Such information could be expressed as the rank of the taxon
immediately below the common root node (for instance belonging to different
Classes) and as such would be most easily accessible to a wide audience. Our
software is able to display the hierarchies necessary to determine these ranks,
but is not set up to calculate the lowest common root of a pair of taxa.
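The lowest common root calculation described above could be sketched as follows. This is an illustration only, not a feature of the software: it assumes a dictionary mapping each node to its parent (the root maps to None) and a second dictionary of ranks, and all function names and sample taxa are hypothetical.

```python
def path_to_root(parent, node):
    """Return the list of nodes from node up to the root, inclusive."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def lowest_common_root(parent, a, b):
    """Return the deepest node that lies on both paths to the root, or None."""
    ancestors_a = set(path_to_root(parent, a))
    for node in path_to_root(parent, b):
        if node in ancestors_a:
            return node
    return None


def taxon_below(parent, node, root):
    """Return the taxon on node's path immediately below root (None if node is root)."""
    path = path_to_root(parent, node)
    index = path.index(root)
    return path[index - 1] if index > 0 else None


# Illustrative data only.
parent = {
    "Chordata": None,
    "Mammalia": "Chordata",
    "Aves": "Chordata",
    "Equus": "Mammalia",
    "Delphinus": "Mammalia",
    "Corvus": "Aves",
}
rank = {"Chordata": "Phylum", "Mammalia": "Class", "Aves": "Class",
        "Equus": "Genus", "Delphinus": "Genus", "Corvus": "Genus"}

root = lowest_common_root(parent, "Equus", "Corvus")
diverging = taxon_below(parent, "Equus", root)
print(root, diverging, rank[diverging])
```

In this example the two taxa meet at Chordata and diverge at the rank of Class, which is the kind of statement ("belonging to different Classes") suggested above.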
Shared common names are not a good guide to
phylogenetically related taxa, but they might indicate an ecological
relationship of some sort, such as "horse" and "horse fly".
- How many species or subspecies are named
after biologists named "Townsend"? Note that the answer will be different if
you are looking at common names versus Latin names. Can you look at the pattern
of names to deduce where in the world they might have done research? On what
kinds of animals?
Within the Mammalia data there are 9 taxa
containing the string "townsend" using wildcard completion of the name,
although such names may have been given for a geographical location rather than
a biologist: the data set does not contain the information to discriminate
these cases. These taxa are highlighted in the Hierarchy Comparison panel. The user can assess
the hierarchical position of each instance by inspection, which will inform the
educated user of the kind of animal involved. (Our software is intended as a
tool for taxonomists and not for naïve users.) Information on the
geographical origin of these taxa is not included in the hierarchical data
sets, so the user would have to use the names recovered in other search engines
to resolve queries beyond the scope of hierarchical comparison.
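A wildcard search of the kind used for "townsend" can be sketched as below. This is not the Search panel's implementation; it assumes names are held as plain strings and uses Python's standard `fnmatch` module, and the three names are an illustrative list rather than the nine taxa found in the Mammalia data.

```python
from fnmatch import fnmatchcase


def wildcard_search(names, pattern):
    """Return the names matching a shell-style wildcard pattern, ignoring case."""
    pattern = pattern.lower()
    return [name for name in names if fnmatchcase(name.lower(), pattern)]


# Illustrative names only, not taken from the contest data set.
names = ["Microtus townsendii", "Scapanus townsendii", "Equus caballus"]
hits = wildcard_search(names, "*townsend*")
print(hits)
```

As noted above, a match on the string alone cannot tell a taxon named after a biologist from one named after a place.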
- Some scientific names are maddeningly
similar. For example, Spirulida and Spirurida are two nodes in two different
subtrees. A user types in the wrong one. What kind of feedback does your tool
provide to alert the user quickly? Do the names have the same rank? Is the
typed name in the expected part of the tree?
Our software allows users to select taxa
either by pointing at the hierarchy or at a name-list or by typing in the name
of the query taxon. When the user types a name that exists in the data set, the
software will display the local region of the hierarchy: if the name was not
the one intended the user must recognise the fact from the hierarchy displayed.
The software does not provide any scheme for highlighting taxa with similar
names. Again we point out that our software is intended for taxonomists not
naïve users.
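Although the software offers no such scheme, a warning about near-identical names could be sketched with approximate string matching. The example below is an assumption about one possible approach, using `difflib.get_close_matches` from the Python standard library; the function name `similar_names`, the cutoff of 0.8 and the third name in the list are hypothetical choices.

```python
from difflib import get_close_matches


def similar_names(typed, all_names, cutoff=0.8):
    """Return names that closely resemble the typed name, excluding the name itself."""
    candidates = [name for name in all_names if name != typed]
    return get_close_matches(typed, candidates, n=5, cutoff=cutoff)


# Spirulida and Spirurida come from the question; Strigiformes is illustrative.
all_names = ["Spirurida", "Spirulida", "Strigiformes"]
warnings = similar_names("Spirulida", all_names)
print(warnings)
```

A non-empty result could prompt the user to confirm which taxon was intended, alongside the rank and hierarchy position already displayed.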
- For the top five subtrees with the most
nodes, are they likely to have a parent of a particular rank? Or does this
happen in many ranks? Can you comment on how useful "rank" is?
Our software is unable to detect sub-trees
automatically, beyond the trivial case where each node is the root of a
sub-tree. This question seems to be asking whether the density of taxa in a
tree is evenly distributed or whether some regions are highly differentiated
into a large number of taxa. Such phenomena are readily seen by reducing the
scale of the hierarchy display, although this facility is not included within
our software. The question appears to wish to explore the evenness of
distribution in both the vertical (with rank) and horizontal (number of taxa at
any rank) directions. Such an analytical facility has not been included in our
software, which is focussed on comparison between hierarchies rather than
analysis of individual hierarchies.
Rank is an essential property of a nested
hierarchy, being simply a measure of the degree of nesting. Rankless ordering
such as that based on the concept of clades is an alternative means of managing
statements of relationship, but it can work at only one level and cannot form
a hierarchical classification. Our software is ultimately intended to manage
nomenclature and this component is designed to compare hierarchies.
Phylogenetic trees are, as such, beyond the scope of the software.