llama-cpp-d 0.1.3

D bindings for llama-cpp


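To use this package, run the following command in your project's root directory:

dub add llama-cpp-d

Manual usage
Put the following dependency into your project's dependencies section:

dub.json: "llama-cpp-d": "~>0.1.3"
dub.sdl:  dependency "llama-cpp-d" version="~>0.1.3"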

llama-cpp-d


D bindings for llama.cpp.

Requirements

Tool             Minimum
LDC or DMD       D frontend ≥ 2.111 (ImportC required)
CMake            ≥ 3.14
C++17 compiler   GCC / Clang / MSVC

How to use

dub add llama-cpp-d

Tools

hf-download

List and download GGUF files from the Hugging Face Hub:

cd tools && dub build --build=release

# List available .gguf files in a repository
./build/hf-download -r unsloth/Qwen3.5-0.8B-GGUF

# Download a specific file
./build/hf-download -r unsloth/Qwen3.5-0.8B-GGUF -f Qwen3.5-0.8B-Q4_K_M.gguf -o ~/models

# With authentication (private repos / higher rate limits)
HF_TOKEN=hf_xxx ./build/hf-download -r myorg/mymodel -f model.gguf
Flag            Description
-r owner/repo   Hugging Face repository (required)
-f filename     File to download; omit to list .gguf files
-o outdir       Output directory (default: .)
-t token        HF access token (or set the HF_TOKEN environment variable)
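The README does not describe how hf-download talks to the Hub. As a rough sketch, assuming it uses the public Hugging Face Hub endpoints (`GET https://huggingface.co/api/models/<owner>/<repo>`, whose JSON carries a `siblings` array of `{"rfilename": ...}` objects, and `https://huggingface.co/<repo>/resolve/<revision>/<file>` for downloads), the listing and URL logic might look like this in D. The helper names are hypothetical and not part of llama-cpp-d; the network request is left out so the logic can be checked offline.

```d
import std.algorithm : endsWith, filter, map;
import std.array : array;
import std.json : JSONValue, parseJSON;
import std.uni : toLower;

/// Hypothetical helper: extract the .gguf file names from the JSON returned by
/// the (assumed) model-info endpoint GET https://huggingface.co/api/models/<repo>.
string[] listGgufFiles(string apiJson)
{
    JSONValue root = parseJSON(apiJson);
    // Each "siblings" entry is an object such as {"rfilename": "model.gguf"}.
    return root["siblings"].array
        .map!(s => s["rfilename"].str)
        .filter!(name => name.toLower.endsWith(".gguf"))
        .array;
}

/// Hypothetical helper: download URL for one file on a given revision.
string resolveUrl(string repo, string file, string revision = "main")
{
    return "https://huggingface.co/" ~ repo ~ "/resolve/" ~ revision ~ "/" ~ file;
}

/// Hypothetical helper: Authorization header value for an HF token
/// (empty string when no token is configured).
string authHeader(string token)
{
    return token.length ? "Bearer " ~ token : "";
}
```

The JSON itself could be fetched with `std.net.curl`, for example through an `HTTP` connection with `addRequestHeader("Authorization", authHeader(token))` when `-t` or `HF_TOKEN` is set.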

Examples

# Text completion
dub run :simple -- -m model.gguf -n 64 "Tell me a joke"

# Tokenization inspector
dub run :tokenize -- -m model.gguf -s "Hello, world!"

# Sentence embeddings (cosine similarity between prompts)
dub run :embedding -- -m model.gguf
dub run :embedding -- -m model.gguf -p "custom sentence"

# Context state save/load (verifies two runs produce identical output)
dub run :save-load-state -- -m model.gguf -n 32

# Multimodal (vision/audio) — text only
dub run :multimodal -c default -- -m model.gguf --mmproj mmproj.gguf -n 200 "Describe this."

# Multimodal with an image
dub run :multimodal -c default -- -m model.gguf --mmproj mmproj.gguf -i photo.jpg "What do you see?"
Example           Required flags               Optional flags
simple            -m <path>                    -n <tokens> (default 32), -ngl <gpu-layers> (default 99)
tokenize          -m <path>                    -s (include BOS/EOS)
embedding         -m <path>                    -p <text>, -ngl (default 99)
save-load-state   -m <path>                    -n <tokens> (default 16), -ngl, --state-file <path>
multimodal        -m <path>, --mmproj <path>   -i <image>, -n <tokens> (default 512), -ngl (default 99), --no-gpu
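The embedding example reports the cosine similarity between prompt embeddings. The metric itself does not depend on llama.cpp; a minimal D sketch, assuming each embedding is already available as a `float[]` of the same dimension (how the example extracts embeddings from the context is not shown here, and `cosineSimilarity` is a hypothetical name):

```d
import std.math : sqrt;

/// Cosine similarity dot(a, b) / (|a| * |b|), a value in [-1, 1].
/// Returns 0 when either vector has zero norm, to avoid dividing by zero.
double cosineSimilarity(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length, "embeddings must have the same dimension");
    double dot = 0, na = 0, nb = 0;
    foreach (i; 0 .. a.length)
    {
        // Accumulate in double to limit rounding error on long vectors.
        dot += cast(double) a[i] * b[i];
        na  += cast(double) a[i] * a[i];
        nb  += cast(double) b[i] * b[i];
    }
    if (na == 0 || nb == 0)
        return 0;
    return dot / (sqrt(na) * sqrt(nb));
}
```

If the embeddings are L2-normalised, both norms are 1 and the score reduces to the plain dot product.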

Configurations

Config     Description
default    CPU only
mtmd       CPU multimodal (llama + libmtmd)
cuda       CUDA GPU acceleration
vulkan     Vulkan GPU acceleration
metal      Apple Metal (macOS)
hipblas    AMD ROCm/HIP
openblas   OpenBLAS
openmp     OpenMP threading
sycl       Intel oneAPI SYCL

Quick start

Text completion

import llama;

void main()
{
    loadAllBackends();

    // D-string overload; second arg is GPU layer count (0 = CPU only)
    auto model = LlamaModel.loadFromFile("model.gguf", 99);
    assert(model);

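    // Context window = prompt length + 32 tokens of headroom for generation;
    // batch size = number of prompt tokens (the whole prompt is decoded at once)
    auto tokens = tokenize(model.vocab, "Hello");
    auto ctx    = LlamaContext.fromModel(model,
                      cast(uint) tokens.length + 32,  // nCtx
                      cast(uint) tokens.length);       // nBatch
    assert(ctx);

    // Two-statement form: SamplerChain is non-copyable, so no chaining on init
    auto smpl = SamplerChain.create();
    smpl.topK(40).topP(0.9f).temp(0.8f).dist();

    // Decode the prompt
    auto batch = batchGetOne(tokens);
    ctx.decode(batch);

    // Generate up to 32 tokens (the headroom reserved in nCtx), one at a time
    import std.stdio : write, writeln;
    llama_token[1] buf;
    foreach (i; 0 .. 32)
    {
        auto next = smpl.sample(ctx); // samples from the last output position
        if (isEog(model.vocab, next)) break;
        write(tokenToString(model.vocab, next));
        smpl.accept(next);
        buf[0] = next;
        ctx.decode(batchGetOne(buf[]));
    }
    writeln();
}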
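SamplerChain delegates to llama.cpp's samplers. To make the order topK(40), topP(0.9f), temp(0.8f), dist() concrete, here is a simplified, self-contained D illustration of what such a chain does to a vector of logits. It is not llama.cpp's implementation (which handles extra details such as minimum-keep counts), and the names `Cand` and `sampleTopKTopPTemp` are hypothetical.

```d
import std.algorithm : map, sort, sum;
import std.array : array;
import std.math : exp;
import std.random : Random, uniform01;

struct Cand { int id; float logit; }

/// Illustrative chain: top-k -> top-p -> temperature -> weighted random draw.
int sampleTopKTopPTemp(const(float)[] logits, size_t k, float p, float temp, ref Random rng)
{
    // 1. Top-k: sort candidates by logit (descending) and keep the first k.
    auto c = new Cand[logits.length];
    foreach (i, l; logits) c[i] = Cand(cast(int) i, l);
    c.sort!((x, y) => x.logit > y.logit);
    if (k > 0 && k < c.length) c = c[0 .. k];

    // Softmax over the current candidates with logits scaled by 1/t.
    // cs[0] holds the largest logit, so subtracting it keeps exp() stable.
    double[] softmax(const(Cand)[] cs, float t)
    {
        auto w = cs.map!(x => exp(cast(double)(x.logit - cs[0].logit) / t)).array;
        immutable total = w.sum;
        foreach (ref v; w) v /= total;
        return w;
    }

    // 2. Top-p: keep the smallest prefix whose probability mass reaches p.
    auto probs = softmax(c, 1.0f);
    double mass = 0;
    size_t keep = c.length;
    foreach (i, pr; probs)
    {
        mass += pr;
        if (mass >= p) { keep = i + 1; break; }
    }
    c = c[0 .. keep];

    // 3. Temperature, then 4. dist: renormalise with logits / temp and draw.
    auto w = softmax(c, temp);
    double r = uniform01(rng);
    foreach (i, pr; w)
    {
        r -= pr;
        if (r < 0) return c[i].id;
    }
    return c[$ - 1].id; // guard against floating-point rounding
}
```

Because top-k and top-p only ever shrink the candidate set, the temperature step reshapes probabilities among tokens that survived both filters. Seeding the generator, e.g. `auto rng = Random(42);`, makes the draws reproducible.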

Multimodal (vision/audio)

import llama;

void main() @trusted
{
    loadAllBackends();

    auto model = LlamaModel.loadFromFile("model.gguf", 99);
    assert(model);

    auto mparams = mtmd_context_params_default();
    mparams.use_gpu = true;

    auto mtmd = MtmdContext.initFromFile("mmproj.gguf", model.ptr, mparams);
    assert(mtmd);

    // Load an image (or skip for text-only)
    auto bitmap = mtmd.loadBitmap("photo.jpg");
    assert(bitmap);

    import std.string : fromStringz, toStringz;
    string marker    = fromStringz(mtmd_default_marker()).idup;
    string prompt    = marker ~ "\nDescribe the image.";
    auto   chunks    = InputChunks.create();
    // mtmd_input_text.text is a C string: pass a NUL-terminated copy
    auto   inputTxt  = mtmd_input_text(toStringz(prompt), true, true);
    const(mtmd_bitmap)*[1] bitmaps = [bitmap.ptr];
    mtmd.tokenize(chunks, inputTxt, bitmaps[]);

    auto ctx = LlamaContext.fromModel(model,
                   cast(uint)(chunks.nTokens + 256),
                   512);
    assert(ctx);

    llama_pos nPast;
    mtmd.evalChunks(ctx.ptr, chunks, 0, 0, 512, true, nPast);

    auto smpl = SamplerChain.create();
    smpl.temp(0.8f).topK(40).topP(0.95f).dist();

    // Generation loop
    llama_token[1] buf;
    foreach (i; 0 .. 256)
    {
        auto tok = smpl.sample(ctx);
        if (isEog(model.vocab, tok)) break;
        import std.stdio : write;
        write(tokenToString(model.vocab, tok));
        smpl.accept(tok);
        buf[0] = tok;
        ctx.decode(batchGetOne(buf[]));
    }
}
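The prompt above contains exactly one media marker because exactly one bitmap is passed. libmtmd expects one marker per bitmap, and tokenization fails when the counts differ. A small D sketch for building a prompt for N images and checking the count before tokenizing; `buildMediaPrompt` and `markersMatch` are hypothetical helpers, not part of llama-cpp-d. The marker is passed in: in real code read it from `mtmd_default_marker()` rather than hard-coding llama.cpp's current default, `<__media__>`.

```d
import std.algorithm : count;
import std.array : replicate;

/// Hypothetical helper: prefix `text` with one marker (plus newline) per image.
string buildMediaPrompt(string marker, string text, size_t nImages)
{
    return replicate(marker ~ "\n", nImages) ~ text;
}

/// Hypothetical helper: true when the prompt holds exactly one marker per bitmap.
bool markersMatch(string prompt, string marker, size_t nBitmaps)
{
    return prompt.count(marker) == nBitmaps;
}
```

Checking up front gives a clearer error message than a failed tokenize call deep inside the pipeline.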

License

MIT

Authors:
  • Matheus Catarino França
Dependencies:
none
Versions:
0.1.3 2026-Mar-31
0.1.2 2026-Mar-20
0.1.1 2026-Mar-19
0.1.0 2026-Mar-19
Download Stats:
  • 0 downloads today
  • 0 downloads this week
  • 4 downloads this month
  • 4 downloads total

Score:
0.1
Short URL:
llama-cpp-d.dub.pm