dhtslib ~file_abstract
D bindings for htslib
To use this package, run `dub add dhtslib` in your project's root directory, or add the dependency manually as described under Installation.
dhtslib provides D bindings, high-level abstractions, and additional functionality for htslib, the most widely used library for manipulating high-throughput sequencing data. We currently support Linux and OSX. Windows support is still in progress (see #38). More extensive documentation can be found at our gitbook.
Installation
Add `dhtslib` as a dependency to `dub.json`:
"dependencies": {
    "dhtslib": "~>0.13.3+htslib-1.13"
}
(The version number 0.13.3 is only an example; the `+htslib-1.13` suffix indicates the compatible htslib version. See https://dub.pm/package-format-json.)
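In dub version specifications, `~>` restricts a dependency to a minor-version range: the dub documentation describes `~>2.2.13` as equivalent to `>=2.2.13 <2.3.0`. So `~>0.13.3+htslib-1.13` accepts later 0.13.x releases but not 0.14.0. The C sketch below models that rule for illustration only; `parse_semver`, `cmp_semver`, and `tilde_match` are hypothetical helpers, not part of dub. It assumes three-component versions, does not handle pre-release tags, and drops `+build` metadata, following SemVer, which ignores build metadata when comparing versions.

```c
#include <stdio.h>

/* Hypothetical helper (not part of dub): parse "MAJOR.MINOR.PATCH".
   sscanf stops at the first non-matching character, so a "+build"
   suffix such as "+htslib-1.13" is simply ignored.
   Pre-release tags ("-rc1") are NOT handled by this sketch. */
static int parse_semver(const char *s, int v[3]) {
    return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) == 3;
}

/* Compare two parsed versions; returns <0, 0, >0 like strcmp. */
static int cmp_semver(const int a[3], const int b[3]) {
    for (int i = 0; i < 3; i++) {
        if (a[i] != b[i]) {
            return a[i] < b[i] ? -1 : 1;
        }
    }
    return 0;
}

/* Hypothetical model of "~>X.Y.Z": accept X.Y.Z <= version < X.(Y+1).0.
   Returns 1 if `version` satisfies the spec, 0 otherwise (or on parse error). */
static int tilde_match(const char *spec, const char *version) {
    int lo[3], v[3];
    if (!parse_semver(spec, lo) || !parse_semver(version, v)) {
        return 0;
    }
    int hi[3] = { lo[0], lo[1] + 1, 0 }; /* exclusive upper bound */
    return cmp_semver(v, lo) >= 0 && cmp_semver(v, hi) < 0;
}
```

For example, `tilde_match("0.13.3", "0.13.3+htslib-1.13")` succeeds, while `tilde_match("0.13.3", "0.14.0+htslib-1.13")` does not, so picking up 0.14.0 would require changing the spec to `~>0.14.0`.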
Requirements
htslib
A system installation of htslib >= v1.13 is required. You can find detailed install instructions here.
Usage
dhtslib usage information and examples can be found here.
Dhtslib API (OOP Wrappers)
Object-oriented, idiomatic D wrappers are available for:
- SAM/BAM/CRAM files and streams (`dhtslib.sam`)
- VCF/BCF files (`dhtslib.vcf`)
- BGZF compressed files (`dhtslib.bgzf`)
- FASTA indexes (`dhtslib.faidx`)
- Tabix-indexed files (`dhtslib.tabix`)
Additional functionality is provided for:
- GFF(2|3) files and streams (`dhtslib.gff`)
- BED files and streams (`dhtslib.bed`)
- FASTQ files and streams (`dhtslib.fastq`)
- Compile-time coordinate system (`dhtslib.coordinates`) to avoid off-by-one errors
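To illustrate the kind of off-by-one error a coordinate system guards against: htslib stores positions 0-based and BED intervals are 0-based half-open, while GFF, VCF `POS`, and region strings such as `chr1:100-200` are 1-based and closed. The C sketch below is not dhtslib's API (`dhtslib.coordinates` does this at compile time in D); it only approximates the idea with two distinct struct types, so the C compiler rejects passing a 1-based interval where a 0-based one is expected. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical types, NOT dhtslib's API. Two distinct struct types, so
   mixing them up is a compile-time error rather than a silent off-by-one. */
typedef struct { int64_t start, end; } zero_based_half_open; /* [start, end): BED, htslib internals */
typedef struct { int64_t start, end; } one_based_closed;     /* [start, end]: GFF, VCF POS, "chr1:100-200" */

/* 1-based closed -> 0-based half-open: start shifts down by one, end is unchanged. */
static zero_based_half_open to_zbho(one_based_closed c) {
    zero_based_half_open z = { c.start - 1, c.end };
    return z;
}

/* 0-based half-open -> 1-based closed: the inverse conversion. */
static one_based_closed to_obc(zero_based_half_open z) {
    one_based_closed c = { z.start + 1, z.end };
    return c;
}

/* Interval length in bases; note the different formulas per system. */
static int64_t zbho_length(zero_based_half_open z) { return z.end - z.start; }
static int64_t obc_length(one_based_closed c) { return c.end - c.start + 1; }
```

For example, the 1-based closed region 100-200 becomes the 0-based half-open interval [99, 200); both describe 101 bases, and calling `to_obc` on a `one_based_closed` value fails to compile.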
All htslib bindings can be found under the `htslib` namespace (in prior versions they were under `dhtslib.htslib`). These can be used directly, as you would use htslib from C.
htslib API
Direct bindings to the htslib C API are available as submodules under the `htslib` namespace. Module names mirror the original `.h` include files; for example, `import htslib.faidx` gives direct access to the C functions declared in `faidx.h`. Whereas the OOP wrappers manage their own memory alongside the D garbage collector, these functions use traditional C memory management (or lack thereof). The current compatible htslib versions are 1.10+.
Currently implemented:
- bgzf
- cram (untested)
- faidx
- hfile
- hts_endian
- hts_expr (untested)
- hts_log
- hts_os (untested)
- hts
- kbitset (untested)
- kfunc (untested)
- knetfile (untested)
- kroundup
- kstring
- regidx
- sam
- synced_bcf_reader (untested)
- tbx
- thread_pool (untested)
- vcf
- vcf_sweep (untested)
- vcfutils (untested)
Missing or work-in-progress:
- khash (see dklib), klist, kseq, ksort (mostly used internally anyway)
dstep has matured into an incredibly powerful tool for machine-assisted C-to-D translation, and we have used it for the majority of the bindings since version v0.11.0. After dstep translation, we port inline functions by hand, as dstep does not translate them; convert some macros into templates (by hand, although dstep already does an amazing job of translating simple `#define` macros into D templates!); and update the documentation comments to ddoc format.
FAQ
Q: Does this work with the latest htslib?
A: Yes
Q: Why not use bioD?
A: bioD, as a more general bioinformatics framework, is more comparable to bio-python, bio-ruby, bio-rust, etc.
bioD does have some excellent hts file format (BGZF and SAM) handling, and at one time sambamba, which relied on it, was faster than samtools.
However, the development resources poured into htslib overall are tremendous, and we wish to leverage them rather than write VCF, tabix, etc. code from scratch.
Q: How does this compare to bio-Rust's htslib bindings?
A: We love Rust, but dhtslib has far more complete bindings and more and better high-level constructs :smile:. We have also implemented a novel compile-time, type-safe coordinate system to avoid most off-by-one errors.
Q: Why am I getting a segfault?
A: It's easy to get a segfault by using the direct C API incorrectly. Or possibly correctly. We have tried to eliminate most of these (use after free, etc.) in the OOP wrappers via reference counting. If you are getting a segfault you cannot understand while using purely the high-level D API, please post an issue.
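The reference-counting idea mentioned above can be sketched in C: several owners share one handle to an htslib-style resource, and only the last release frees it, so no owner is left holding a dangling pointer. This is a hypothetical illustration (`rc_handle`, `rc_new`, `rc_retain`, and `rc_release` are made-up names), not how dhtslib implements it in D, and it is not thread-safe.

```c
#include <stdlib.h>

/* Hypothetical wrapper, NOT dhtslib's implementation: one C resource shared
   by several owners, destroyed only when the last owner releases it. */
typedef struct {
    void *resource;           /* stands in for e.g. a bam1_t* or htsFile* */
    void (*destroy)(void *);  /* matching C destructor, e.g. free() */
    int refcount;             /* number of live owners (not atomic) */
} rc_handle;

/* Wrap a resource with a count of 1. On allocation failure returns NULL
   and the caller still owns (and must destroy) `resource`. */
static rc_handle *rc_new(void *resource, void (*destroy)(void *)) {
    rc_handle *h = malloc(sizeof *h);
    if (h == NULL) {
        return NULL;
    }
    h->resource = resource;
    h->destroy = destroy;
    h->refcount = 1;
    return h;
}

/* A new owner shares the handle: bump the count, return the same handle. */
static rc_handle *rc_retain(rc_handle *h) {
    h->refcount++;
    return h;
}

/* Drop one owner. Returns 1 if this call destroyed the resource, else 0.
   The handle must not be used after a call that returns 1. */
static int rc_release(rc_handle *h) {
    if (--h->refcount > 0) {
        return 0;
    }
    h->destroy(h->resource);
    free(h);
    return 1;
}
```

The first `rc_release` on a shared handle only decrements the count; the resource is freed exactly once, on the final release.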
Bugs and Warnings
Do not call `hts_log_*` from a destructor with `ctx` set to anything other than a string literal, as it may allocate via `toStringz`, and allocating GC memory inside a destructor run by the garbage collector is not allowed.
Programs made with dhtslib
- fade: Fragmentase Artifact Detection and Elimination
- recontig: a program to convert different bioinformatics data types from one reference naming convention to another, e.g. UCSC to Ensembl (chr1 to 1)
- ~file_abstract released 2 years ago
- Repository: blachlylab/dhtslib (github.com/blachlylab/dhtslib)
- License: MIT
- Sub packages: dhtslib:coordinates
- Dependencies: dhtslib:coordinates
- Versions:
  - 0.14.0+htslib-1.13 (2022-Mar-02)
  - 0.13.3+htslib-1.13 (2021-Oct-01)
  - 0.13.2+htslib-1.13 (2021-Oct-01)
  - 0.13.1+htslib-1.13 (2021-Sep-30)
  - 0.13.0+htslib-1.13 (2021-Sep-30)