intervaltree ~master
Interval Tree implementations.
To use this package, run `dub add intervaltree` in your project's root directory.
Manual usage
Put the following dependency into your project's dependencies section: `"intervaltree": "~master"` for dub.json, or `dependency "intervaltree" version="~master"` for dub.sdl.
intervaltree provides 3 implementations of an interval tree structure:
- Augmented AVL tree
- Augmented Splay tree
- Implicit Interval Tree
A classic Red-Black tree is not included but would be welcomed.
In addition to the package module itself, which includes a "BasicInterval" struct and an "overlaps" function, the package includes 3 sub-modules, one for each of the tree types listed above:
- intervaltree.avltree
- intervaltree.splaytree
- intervaltree.iitree
Simply import `intervaltree.<treetype>` in your code.
Overview
Each tree is implemented as a container type via templates, so in addition to the half-open [start, end) interval coordinates it may contain arbitrary other data. For example, instantiate the tree as `IntervalTree!MyStruct`.
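To make the half-open [start, end) convention concrete, here is a minimal sketch of a payload-carrying interval and an overlap test. `DemoInterval` and `demoOverlaps` are hypothetical names for illustration only; they are not the package's `BasicInterval` or `overlaps`, and the field names are assumptions.

```d
/// Hypothetical payload-carrying interval; not the library's BasicInterval.
struct DemoInterval
{
    long start;   // inclusive
    long end;     // exclusive
    string label; // arbitrary extra data carried alongside the coordinates
}

/// True when two half-open intervals share at least one position.
bool demoOverlaps(const DemoInterval a, const DemoInterval b) pure nothrow @nogc @safe
{
    return a.start < b.end && b.start < a.end;
}

unittest
{
    auto a = DemoInterval(10, 20, "exon1");
    auto b = DemoInterval(20, 30, "exon2");
    auto c = DemoInterval(15, 25, "probe");
    assert(!demoOverlaps(a, b)); // touching at 20 is not an overlap
    assert(demoOverlaps(a, c));
    assert(demoOverlaps(b, c));
}
```

The sketch can be exercised with `dmd -unittest -main -run demo.d`. Because the end coordinate is exclusive, adjacent intervals such as [10, 20) and [20, 30) do not overlap.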
API (unstable until 1.0.0)
Each tree is an encapsulating struct templated on `IntervalType`; it contains values of `IntervalType`, or a pointer to them in the case of IITree.
Operations implemented in various combinations across the 3 tree types: `insert`, `remove`, `find`, `findOverlapsWith`, `findMin`.
Currently, `avltree` and `splaytree` share a common API requiring only coordinates, whereas `iitree` differs in that a string key identifying a distinct interval tree is an implicit part of the data structure; i.e., the IITree structure may contain multiple independent trees. IITree was developed for genomics, where the key is understood as the "contig" (chromosome), but it could be used for any other grouping. For an example of software consuming this library and using `version` to access any of the trees, see https://github.com/blachlylab/swiftover/
ForwardRange interface: planned.
@nogc status: insert and delete operations are `@nogc`. Currently, `findOverlapsWith` returns `Node*[]` using D dynamic arrays and thus cannot be `@nogc`. It would be great for the entire library to be `@nogc`, but I haven't settled on a suitable array implementation, and I also hate to make the caller remember to free() the returned nodes.
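One way to see the tradeoff is a sketch in which the caller owns the result buffer, so the query itself needs neither the GC nor a later free(). This is only an illustration of the design question, not the library's API or a planned one: `DemoSpan` and `collectOverlaps` are hypothetical, and a linear scan over an array stands in for the tree descent.

```d
/// Hypothetical stand-in for a tree node; the library's node types differ.
struct DemoSpan
{
    long start; // inclusive
    long end;   // exclusive
}

/// Writes pointers to overlapping items into the caller-owned `buf` and
/// returns the filled prefix. Results are truncated when `buf` is full.
DemoSpan*[] collectOverlaps(DemoSpan[] items, long qStart, long qEnd,
                            DemoSpan*[] buf) @nogc nothrow
{
    size_t n = 0;
    foreach (ref item; items)
    {
        if (n == buf.length) break; // no room left: stop early
        if (item.start < qEnd && qStart < item.end)
            buf[n++] = &item;
    }
    return buf[0 .. n];
}

unittest
{
    DemoSpan[3] items = [DemoSpan(0, 10), DemoSpan(5, 15), DemoSpan(20, 30)];
    DemoSpan*[4] buf; // stack storage, nothing to free afterwards
    auto hits = collectOverlaps(items[], 8, 12, buf[]);
    assert(hits.length == 2);
    assert(hits[0] is &items[0]);
}
```

The cost of this style is that the caller must size the buffer up front and decide what to do about truncation, which is part of why no single array strategy is an obvious choice.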
Debugging
Debug messages:
Debug messages are only printed when debug symbol `intervaltree_debug`
is defined in order to better preserve debug messages from your own
program's `debug { }` blocks.
Instrumentation:
* Defining version `instrument` creates a variable `__gshared int[] _{treename}_visited`, where `{treename}` is one of (avltree, splaytree), which holds statistics on the number of nodes visited to find results.
* For cgranges (iitree), you must additionally `#define INSTRUMENT` and recompile `cgranges.c`.
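Both mechanisms above rely on D's conditional compilation. The following sketch shows the general pattern with a hypothetical function and counter (`demoContains`, `_demotree_visited`); it follows the naming scheme described above but is not the library's code. With DMD, the two identifiers are enabled by `-debug=intervaltree_debug` and `-version=instrument` respectively.

```d
import std.stdio : stderr;

// Hypothetical counter, present only when compiled with -version=instrument.
version (instrument) __gshared int[] _demotree_visited;

/// Hypothetical lookup over a plain array, standing in for a tree descent.
bool demoContains(const(long)[] keys, long key)
{
    int visited = 0;
    bool found = false;
    foreach (k; keys)
    {
        ++visited;
        // Compiled in only with -debug=intervaltree_debug.
        debug (intervaltree_debug) stderr.writeln("visiting ", k);
        if (k == key)
        {
            found = true;
            break;
        }
    }
    // Record how many elements this query touched.
    version (instrument) _demotree_visited ~= visited;
    return found;
}

unittest
{
    assert(demoContains([1, 2, 3], 2));
    assert(!demoContains([1, 2, 3], 9));
}
```

Using a named debug identifier rather than a bare `debug { }` block is what keeps a library's messages separate from the ones in your own program.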
Brief discussion of interval trees and relative tradeoffs
Interval trees are often implemented as augmented binary search trees. Here, we explore several different types of binary search trees.
Red-Black trees are relatively well-balanced, but not perfectly so. Insertion is fastest; query is slightly slower than AVL tree due to imperfect balance, but it is a good compromise and widely used.
AVL trees are more strictly balanced than Red-Black trees. This makes insertion slightly slower, but provides the fastest amortized lookups.
Splay trees "splay" the most recently accessed node to the top/root of the tree, so they may become extremely unbalanced. However, this imbalance provides an implicit caching effect when the next insertion, deletion, or lookup is very close in coordinate space to the most recently accessed node. In sequential queries, one may need only to descend a single node from the root. This means for sequentially ordered operations, it can beat the perfectly balanced AVL tree. Random access, on the other hand, can be extremely poor. In this library, we introduce another uncommon optimization, the "probabilistic" splay tree. Randomizing the likelihood of performing the splay operation on read can substantially improve access times for some workloads. (Albers & Karpinski 2002)
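The idea of splaying on read only some of the time can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's splay tree: `DemoNode`, `demoSplay`, and `probabilisticFind` are hypothetical, the tree is keyed on a single coordinate with no interval augmentation, the splay is a textbook recursive version, and Phobos `std.random` is used here rather than the package's mir-random dependency.

```d
import std.random : Random, uniform01;

/// Hypothetical bare BST node keyed on a single coordinate.
struct DemoNode
{
    long key;
    DemoNode* left, right;
}

DemoNode* rotateRight(DemoNode* x)
{
    auto y = x.left;
    x.left = y.right;
    y.right = x;
    return y;
}

DemoNode* rotateLeft(DemoNode* x)
{
    auto y = x.right;
    x.right = y.left;
    y.left = x;
    return y;
}

/// Recursive splay: brings `key` (or the last node on its search path) to the root.
DemoNode* demoSplay(DemoNode* root, long key)
{
    if (root is null || root.key == key) return root;
    if (key < root.key)
    {
        if (root.left is null) return root;
        if (key < root.left.key)       // zig-zig
        {
            root.left.left = demoSplay(root.left.left, key);
            root = rotateRight(root);
        }
        else if (key > root.left.key)  // zig-zag
        {
            root.left.right = demoSplay(root.left.right, key);
            if (root.left.right !is null) root.left = rotateLeft(root.left);
        }
        return root.left is null ? root : rotateRight(root);
    }
    if (root.right is null) return root;
    if (key > root.right.key)          // zig-zig
    {
        root.right.right = demoSplay(root.right.right, key);
        root = rotateLeft(root);
    }
    else if (key < root.right.key)     // zig-zag
    {
        root.right.left = demoSplay(root.right.left, key);
        if (root.right.left !is null) root.right = rotateRight(root.right);
    }
    return root.right is null ? root : rotateLeft(root);
}

/// Plain BST search; splays the hit to the root only with probability `p`.
DemoNode* probabilisticFind(ref DemoNode* root, long key, ref Random rng, double p)
{
    auto cur = root;
    while (cur !is null && cur.key != key)
        cur = key < cur.key ? cur.left : cur.right;
    if (cur !is null && uniform01(rng) < p)
        root = demoSplay(root, key);
    return cur;
}

unittest
{
    auto root = new DemoNode(20, new DemoNode(10, new DemoNode(5)), new DemoNode(30));
    auto rng = Random(42);
    assert(probabilisticFind(root, 5, rng, 0.0).key == 5 && root.key == 20); // never splays
    assert(probabilisticFind(root, 5, rng, 1.0).key == 5 && root.key == 5);  // always splays
}
```

With `p = 1.0` this behaves like an ordinary splay tree on reads, and with `p = 0.0` it behaves like an unbalanced BST that is never restructured by lookups; intermediate values trade the caching effect against the cost of restructuring on every read.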
Implicit Interval Trees (IIT) store the entire tree in a compact linear array sorted by start position. They were created by Heng Li and implemented as the "cgranges" C library. This library is intended for genome applications, and includes a "contig" parameter. The IIT structure excels at both sequential and random access, with the disadvantage that it must be reindexed (resorted) after any/all inserts or deletes, so it works best with static trees.
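The "sort once, then query" workflow can be illustrated with a deliberately simplified sketch. It is not the cgranges algorithm (which also maintains augmented maximum-end values so queries can skip subtrees); it only shows a start-sorted array that must be re-indexed after every insert. `DemoRange` and `DemoStaticIndex` are hypothetical names.

```d
import std.algorithm.sorting : sort;

/// Hypothetical half-open interval [start, end).
struct DemoRange
{
    long start;
    long end;
}

/// Simplified static index: an array sorted by start, not a full implicit tree.
struct DemoStaticIndex
{
    DemoRange[] items;
    bool indexed;

    void add(long s, long e)
    {
        items ~= DemoRange(s, e);
        indexed = false; // any insert invalidates the sort order
    }

    void index()
    {
        sort!((a, b) => a.start < b.start)(items);
        indexed = true;
    }

    DemoRange[] overlaps(long qStart, long qEnd)
    {
        assert(indexed, "call index() after adding intervals");
        // Binary search for the first item whose start is >= qEnd;
        // nothing at or after that position can overlap the query.
        size_t lo = 0, hi = items.length;
        while (lo < hi)
        {
            immutable mid = lo + (hi - lo) / 2;
            if (items[mid].start < qEnd) lo = mid + 1;
            else hi = mid;
        }
        // Linear filter of the remaining candidates by end coordinate.
        DemoRange[] hits;
        foreach (r; items[0 .. lo])
            if (r.end > qStart) hits ~= r;
        return hits;
    }
}

unittest
{
    DemoStaticIndex idx;
    idx.add(30, 40);
    idx.add(10, 20);
    idx.add(15, 35);
    idx.index();
    assert(idx.overlaps(20, 30) == [DemoRange(15, 35)]);
    assert(idx.overlaps(40, 50).length == 0);
}
```

The `indexed` flag makes the main disadvantage visible: every `add` forces another `index()` call before the next query, which is cheap for static data and wasteful for data that changes often.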
Credits
- AVL tree based on attractivechaos' klib https://github.com/attractivechaos/klib
- Splay tree is my own implementation
- IITree is a D wrapper around Heng Li's cgranges C library, which is included as source https://github.com/lh3/cgranges
References
- https://en.wikipedia.org/wiki/Interval_tree
- https://en.wikipedia.org/wiki/Red%E2%80%93black_tree
- https://en.wikipedia.org/wiki/AVL_tree
- https://en.wikipedia.org/wiki/Splay_tree
- https://github.com/lh3/cgranges
- http://www14.in.tum.de/personen/albers/papers/ipl02.pdf -- Albers & Karpinski, doi: 10.1016/S0020-0190(01)00230-7
- https://github.com/blachlylab/swiftover/ -- Example library consumer
- ~master released 3 years ago
- Repository: github.com/blachlylab/intervaltree
- License: MIT
- Copyright 2019 James Blachly
- Dependencies: mir-random
- Versions: 0.22.1 (2021-Feb-23), 0.22.0 (2020-Dec-05), 0.21.0 (2020-Aug-22), 0.20.0 (2020-Jan-10), 0.15.0 (2020-Jan-06)
- Download stats: 0 today, 0 this week, 0 this month, 167 total
- Score: 0.0
- Short URL: intervaltree.dub.pm