dentist ~no-damapper
Close assembly gaps using long-reads with focus on correctness.
Today, many genome sequencing projects have been conducted using
second-generation sequencers, which produce short reads. The resulting
assemblies have many gaps. dentist closes these gaps using a (small) set of
long reads. Furthermore, it can be used to scaffold contigs freely using a set
of long reads. This can be used to fix known scaffolding errors or to further
scaffold the output of a long-read assembly pipeline.
Install
Use Pre-Built Binaries
Download the latest pre-built binaries from the releases section
and extract the contents. The tarball contains a dentist binary as well as
the snakemake workflow, example config files and this README. In short, everything you need to run DENTIST.
Build from Source
Be sure to install the D package manager DUB. Install using either
dub install dentist
or
git clone https://github.com/a-ludi/dentist.git
cd dentist
dub build
Runtime Dependencies
The following software packages are required to run dentist:
- The Dazzler Data Base (>=2020-07-27)
- daligner ((>=2019-07-21 && <=2020-01-15) || >=2020-07-27)
- damapper (>=2020-03-10)
- TANmask (>=2020-01-15)
- dascrubber (>=2020-07-26)
- daccord (>=v0.0.17)
Please see their own documentation for installation instructions. Note that the packages available on Bioconda are outdated and should not be used at the moment.
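Before starting a run, it can help to confirm that the external tools are reachable. The following is a minimal sketch, not part of DENTIST: check_tools is a hypothetical helper that only tests whether each given name resolves on PATH. It does not check the version constraints listed above, and the names in the example call are assumptions about how the tools are installed (the Dazzler Data Base and dascrubber ship several executables each and are not covered by a single name).

```sh
# Report which of the given executables are missing from PATH.
# Returns 0 if all are found, 1 otherwise. Versions are NOT checked.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (executable names are assumptions; adjust to your installation):
# check_tools dentist daligner damapper TANmask daccord
```

Missing tools are reported on standard error, so the function can be used as a guard in front of a longer script.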
Usage
Suppose we have the genome assembly reference.fasta that is to be updated
and a set of reads reads.fasta with 25× coverage.
Quick execution with snakemake
Install snakemake version >=5.10.0 and copy these files into your working directory:
- ./snakemake/Snakefile
- ./snakemake/workflow_helper.py
- ./snakemake/snakemake.example.yml → ./snakemake/snakemake.yml
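The copy step could be scripted roughly as follows. This is a sketch under assumptions: setup_dentist_workdir is a hypothetical helper, the first argument is the directory of the extracted release tarball or git checkout, and the example config is saved as snakemake.yml directly in the working directory so that it matches the --configfile=snakemake.yml argument used in the commands of this README. Adjust the destination if your layout differs.

```sh
# Copy the workflow files into a working directory.
# $1: directory with the extracted DENTIST files (contains snakemake/)
# $2: working directory (default: current directory)
setup_dentist_workdir() {
    src=$1
    dest=${2:-.}
    mkdir -p "$dest" || return 1
    cp "$src/snakemake/Snakefile" "$src/snakemake/workflow_helper.py" "$dest/" || return 1
    # Assumption: the config lives next to the Snakefile as snakemake.yml.
    # An existing config is kept so that manual edits are not overwritten.
    if [ -e "$dest/snakemake.yml" ]; then
        echo "keeping existing $dest/snakemake.yml" >&2
    else
        cp "$src/snakemake/snakemake.example.yml" "$dest/snakemake.yml"
    fi
}
```

Running the helper a second time refreshes the Snakefile and workflow_helper.py but leaves an already edited snakemake.yml untouched.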
Next edit snakemake.yml to fit your needs and optionally test your
configuration with
snakemake --configfile=snakemake.yml -- extend_dentist_config
If no errors occurred, the whole workflow can be executed using
snakemake --configfile=snakemake.yml
For small genomes of a few 100 Mbp this should run on a regular workstation.
One may use Snakemake's --jobs option to run independent jobs in parallel.
Larger data sets may require a cluster, in which case you can use Snakemake's
cloud or cluster facilities.
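The two commands above can be combined into one small wrapper. This is a sketch only: run_dentist_workflow and the SNAKEMAKE override variable are hypothetical names introduced here, and passing --jobs=1 to the configuration test is an assumption made because newer Snakemake releases refuse to run without a job or core count.

```sh
# Test the configuration, then run the whole workflow.
# $1: maximum number of jobs to run in parallel (default: 1)
# SNAKEMAKE may be set to use a specific snakemake executable.
run_dentist_workflow() {
    jobs=${1:-1}
    snakemake_cmd=${SNAKEMAKE:-snakemake}
    # Step 1: optional configuration test (target name taken from this README).
    "$snakemake_cmd" --configfile=snakemake.yml --jobs=1 -- extend_dentist_config || return 1
    # Step 2: full workflow with up to $jobs independent jobs at once.
    "$snakemake_cmd" --configfile=snakemake.yml --jobs="$jobs"
}

# Example call on a workstation with 8 cores:
# run_dentist_workflow 8
```

The wrapper stops before the full run if the configuration test fails.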
Executing on a Cluster
To make execution on a cluster easy, DENTIST comes with example files to make Snakemake use SLURM via DRMAA. Please read the documentation of Snakemake if this does not suit your needs. Another good starting point is the Snakemake-Profiles project.
Start by copying these files to your working directory:
- ./snakemake/profile-slurm.yml → ~/.config/snakemake/<profile>/config.yaml
- ./snakemake/cluster.example.yml → ./snakemake/cluster.yml
Next adjust the profile according to your cluster. This should enable
Snakemake to submit and track jobs on your cluster. You may use the
configuration values specified in cluster.yml to configure job names and
resource allocation for each step of the pipeline. Now, submit the workflow
to your cluster by
snakemake --configfile=snakemake.yml --profile=<profile>
Note that parameters specified in the profile provide default values and can be overridden by specifying a different value on the CLI.
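Installing the SLURM profile under a name of your choice might look like the sketch below. install_dentist_profile is a hypothetical helper; it assumes that Snakemake looks for profiles under $HOME/.config/snakemake, as the target path above suggests, and it does not handle cluster.yml.

```sh
# Install the example SLURM profile under a chosen profile name.
# $1: profile name (replaces the <profile> placeholder)
# $2: directory with the extracted DENTIST files (contains snakemake/)
install_dentist_profile() {
    profile=$1
    src=$2
    profile_dir="$HOME/.config/snakemake/$profile"
    mkdir -p "$profile_dir" || return 1
    cp "$src/snakemake/profile-slurm.yml" "$profile_dir/config.yaml"
}

# Example call, followed by the submission command from this README:
# install_dentist_profile slurm ./dentist
# snakemake --configfile=snakemake.yml --profile=slurm
```

The profile name slurm in the example is arbitrary; any option given on the command line afterwards, such as --jobs, takes precedence over the value stored in the profile.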
Manual execution
Please inspect the Snakemake workflow to get all the details. It might be
useful to execute Snakemake with the -p switch, which causes Snakemake to
print the shell commands. If you plan to write your own workflow management
for DENTIST please feel free to contact the maintainer!
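To look at the shell commands without running anything, the -p switch can be combined with Snakemake's dry-run switch -n. The helper below is a sketch: list_dentist_commands and the SNAKEMAKE override variable are hypothetical names, and jobs that Snakemake only discovers at runtime may be missing from a dry-run listing.

```sh
# Print the shell commands of the workflow without executing them.
# Extra arguments (e.g. a target name) are passed through to snakemake.
list_dentist_commands() {
    snakemake_cmd=${SNAKEMAKE:-snakemake}
    # -n: dry run, nothing is executed; -p: print the shell command of each job.
    "$snakemake_cmd" --configfile=snakemake.yml -n -p "$@"
}

# Example call that stores the listing for later inspection:
# list_dentist_commands > dentist-commands.txt
```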
Configuration
DENTIST comprises a complex pipeline with many options for tweaking. This section points out some important parameters and their effect on the result.
How to Choose DENTIST Parameters
The following list comprises the important/influential parameters for DENTIST itself. Please keep in mind that the alignments generated by daligner/damapper have immense influence on the performance of DENTIST.
- --max-insertion-error: Strong influence on quality and sensitivity. Lower values lead to lower sensitivity but higher quality. The maximum recommended value is 0.05.
- --min-anchor-length: Higher values result in higher accuracy but lower sensitivity. In particular, large gaps cannot be closed if the value is too high. Usually the value should be at least 500 and up to 10_000.
- --reference-error, --reads-error: Determines the -e parameter for daligner/damapper. Use dentist generate-dazzler-options to see the effect of these parameters or consult commandline.d in the source code.
- --min-reads-per-pile-up: Choosing higher values for the minimum number of reads drastically reduces sensitivity but has little effect on the quality. Small values may be chosen to get the maximum sensitivity in de novo assemblies. Make sure to thoroughly validate the results though.
- --min-spanning-reads: Higher values give more confidence in the correctness of closed gaps but reduce sensitivity. The value must be well below the expected coverage.
- --allow-single-reads: May be used under careful consideration. This is intended for one of the following scenarios:
  - DENTIST is meant to close as many gaps as possible in a de novo assembly. Then the closed gaps must be validated by other means afterwards.
  - DENTIST is used not with real reads but with an independent assembly.
- --existing-gap-bonus: If DENTIST finds evidence to join two contigs that are already consecutive in the input assembly (i.e. joined by Ns), then this join will be preferred over conflicting joins (if present) with this bonus. The default value is rather conservative, i.e. the preferred join almost always wins over other joins in case of a conflict.
- --join-policy: Choose according to your needs:
  - scaffoldGaps: Closes only gaps that are marked by Ns in the assembly. This is the default mode of operation. Use this if you do not want to alter the scaffolding of the assembly. See also --existing-gap-bonus.
  - scaffolds: Allows whole scaffolds to be joined in addition to the effects of scaffoldGaps. Use this if you have (many) scaffolds that are not yet full chromosome-scale.
  - contigs: Allows contigs to be rearranged freely. This is especially useful in de novo assemblies before applying any other scaffolding methods as it increases the contiguity, thus increasing the chance that large-scale scaffolding (e.g. Bionano or Hi-C) finds proper joins.
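The numeric recommendations above can be turned into a quick sanity check. The sketch below is not part of DENTIST: check_dentist_params is a hypothetical helper that only compares candidate values against the limits stated in this section (0.05, 500, 10_000 and the expected coverage). It tests that --min-spanning-reads is below the coverage, which is weaker than the "well below" asked for above.

```sh
# Warn about values outside the ranges recommended in this section.
# $1: --max-insertion-error   $2: --min-anchor-length
# $3: --min-spanning-reads    $4: expected read coverage
# Prints one line per problem; returns 1 if any problem was found.
check_dentist_params() {
    awk -v err="$1" -v anchor="$2" -v spanning="$3" -v cov="$4" 'BEGIN {
        bad = 0
        if (err + 0 > 0.05) {
            print "max-insertion-error is above the recommended maximum of 0.05"; bad = 1
        }
        if (anchor + 0 < 500) {
            print "min-anchor-length is below 500"; bad = 1
        }
        if (anchor + 0 > 10000) {
            print "min-anchor-length is above 10000"; bad = 1
        }
        if (spanning + 0 >= cov + 0) {
            print "min-spanning-reads is not below the expected coverage"; bad = 1
        }
        exit bad
    }'
}

# Example call with illustrative values for the 25x data set of this README:
# check_dentist_params 0.05 500 3 25
```

The values in the example call are illustrative and are not DENTIST's defaults.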
Choosing the Read Type
In the examples PacBio long reads are assumed but DENTIST can be run using any
kind of long reads. Currently, this is either PacBio or Oxford Nanopore reads.
For using non-PacBio reads, the reads_type in snakemake.yml must be set
to anything other than PACBIO_SMRT. The recommendation is to use
OXFORD_NANOPORE for Oxford Nanopore. These names are borrowed from the NCBI.
Further details on the rationale can be found in this issue.
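Switching the read type can be done in an editor or with a small script. The following sketch assumes that snakemake.yml contains a single-line entry of the form "reads_type: VALUE" at any indentation; set_reads_type is a hypothetical helper and the real layout of the config file may differ.

```sh
# Set reads_type in a Snakemake config file.
# $1: path to the config (e.g. snakemake.yml)
# $2: new value (e.g. OXFORD_NANOPORE); must not contain "/" or "&"
# Assumption: the file has a single-line entry "reads_type: VALUE".
set_reads_type() {
    config=$1
    value=$2
    grep -q '^[[:space:]]*reads_type:' "$config" || {
        echo "no reads_type entry found in $config" >&2
        return 1
    }
    tmp_cfg=$(mktemp) || return 1
    # Keep the indentation and key, replace everything after the colon.
    sed "s/^\([[:space:]]*reads_type:\).*/\1 $value/" "$config" > "$tmp_cfg" || return 1
    # Write back in place so that the file keeps its permissions.
    cat "$tmp_cfg" > "$config" && rm -f "$tmp_cfg"
}

# Example call:
# set_reads_type snakemake.yml OXFORD_NANOPORE
```

The helper refuses to touch a file without a reads_type entry instead of appending one, since the correct position inside the config is not known here.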
Citation
Arne Ludwig, Martin Pippel, Gene Myers, Michael Hiller. DENTIST – close assembly gaps with high confidence. In preparation.
Maintainer
DENTIST is being developed by Arne Ludwig <ludwig@mpi-cbg.de> at the Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany.
Contributing
Contributions are warmly welcome. Just create an issue or pull request on GitHub. If you submit a pull request please make sure that:
- the code compiles on Linux using the current release of dmd,
- your code is covered with unit tests (if feasible) and
- dub test runs successfully.
It is recommended to install the Git hooks included in the repository to avoid premature pull requests. You can enable all shipped hooks with this command:
git config --local core.hooksPath .githooks/
If you want to enable just a subset, use ln -s .githooks/{hook} .git/hooks. If you want to audit code changes before they get executed on your machine, you can use cp .githooks/{hook} .git/hooks instead.
License
This project is licensed under MIT License (see LICENSE).
- Repository: a-ludi/dentist
- License: MIT
- Copyright © 2018, Arne Ludwig <arne.ludwig@posteo.de>
- Dependencies: darg, vibe-d:data, string-transform-d
- Versions: 4.0.0 (2022-Sep-14), 3.0.0 (2021-Dec-09), 2.0.0 (2021-Jun-21), 1.0.2 (2021-Apr-26), 1.0.1 (2021-Feb-22)
- Short URL: dentist.dub.pm