Package dataframes on DUB

DataFrame

Simple DataFrame for the D programming language. Each field of the given struct becomes a DataFrame column that stores that field's values as an array. The library focuses on making DataFrames easy to use in D.
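To illustrate the struct-to-columns idea, here is a minimal Phobos-only sketch. ItemColumns is a hypothetical type, not part of the dataframes API: it keeps one array per struct field, appends a row by pushing each field onto its own array, and rebuilds row i by reading the i-th element of every array.

```d
import std.stdio : writeln;

struct Item
{
    string name;
    double unitPrice;
    int quantity;
}

// Hypothetical struct-of-arrays container: one array ("column") per Item field.
struct ItemColumns
{
    string[] name;
    double[] unitPrice;
    int[] quantity;

    // Appending a row pushes each field onto its own column.
    void add(Item item)
    {
        name ~= item.name;
        unitPrice ~= item.unitPrice;
        quantity ~= item.quantity;
    }

    // Reading row i gathers the i-th element of every column.
    Item row(size_t i)
    {
        return Item(name[i], unitPrice[i], quantity[i]);
    }
}

void main()
{
    ItemColumns cols;
    cols.add(Item("Pencil", 5.0, 5));
    cols.add(Item("Pen", 10.0, 2));
    writeln(cols.unitPrice); // the whole unitPrice column
    writeln(cols.row(1));    // the second row as an Item
}
```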

Install

Add dataframes to your project by running the following command.

dub add dataframes

Create a new DataFrame

Create a struct that represents a row of the DataFrame. For example, to store item and price information:

struct Item
{
    string name;
    double unitPrice;
    int quantity;
}

Columns cannot be added to a DataFrame at runtime, so include any additional fields you need in the struct up front. For example:

struct Item
{
    string name;
    double unitPrice;
    int quantity;
    double totalPrice;
}

Now create the DataFrame.

auto df = new DataFrame!Item;

Adding items

To start with initial data, pass each column's values to the constructor as named arguments:

auto df = new DataFrame!Item(
    name: ["Pencil", "Pen", "Notebook"],
    unitPrice: [5.0, 10.0, 25.0],
    quantity: [5, 2, 7]
);

To add items one at a time:

df.add(Item("Pen", 10.0, 1));

// OR from a list of Items
foreach (item; items)
    df.add(item);

Preview the DataFrame data

Print the DataFrame to see its contents. If the DataFrame has 10 rows or fewer, it is printed in full; otherwise only the first 5 and the last 5 rows are printed.

Example:

df.writeln;

Sample output:

name         unitPrice    quantity  totalPrice
Pencil            5.00           5         nan
Pen              10.00           2         nan
Notebook         25.00           7         nan

3 rows
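The preview rule can be expressed as a small function that returns the row indices to print. previewIndices is a hypothetical helper written for illustration only; it is not the library's printing code.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.range : chain, iota;
import std.stdio : writeln;

// Hypothetical helper: the row indices a preview shows under the rule above.
size_t[] previewIndices(size_t nrow)
{
    if (nrow <= 10)
        return iota(nrow).array;                 // small DataFrame: every row
    return chain(iota(size_t(0), size_t(5)),     // first 5 rows
                 iota(nrow - 5, nrow))           // last 5 rows
        .array;
}

void main()
{
    writeln(previewIndices(3));
    writeln(previewIndices(12));
}
```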

Full Example

import std.stdio;

import dataframes;

struct Item
{
    string name;
    double unitPrice;
    int quantity;
    double totalPrice;
}

void main()
{
    auto df = new DataFrame!Item(
        name: ["Pencil", "Pen", "Notebook"],
        unitPrice: [5.0, 10.0, 25.0],
        quantity: [5, 2, 7]
    );

    // Preview
    df.writeln;
}

Number of Columns and Rows

writeln("Columns: ", df.ncol);
writeln("Rows   : ", df.nrow); // OR `df.length`

Column names

writeln(df.columnNames);
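Because the columns come from the struct's fields, the column names and their count can be derived at compile time with std.traits.FieldNameTuple. The sketch below assumes that columnNames and ncol reflect the struct fields in declaration order; the enum names are hypothetical.

```d
import std.stdio : writeln;
import std.traits : FieldNameTuple;

struct Item
{
    string name;
    double unitPrice;
    int quantity;
    double totalPrice;
}

// Both values are computed at compile time from the struct definition.
enum string[] itemColumnNames = [FieldNameTuple!Item];
enum size_t itemColumnCount = FieldNameTuple!Item.length;

void main()
{
    writeln("Columns: ", itemColumnCount);
    writeln(itemColumnNames);
}
```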

Access Rows and Columns

Access rows by index.

auto firstRow = df.row(0);
writeln(firstRow.name, " ", firstRow.unitPrice * firstRow.quantity);

Access a column:

auto firstPrice = df.unitPrice[0];

auto names = df.name;
// OR
auto names = df["name"].get!string;
// OR
auto names = df[0].get!string;

To access a field of a row:

auto firstRow = df.row(0);
string name = firstRow.name;
// OR
string name = firstRow["name"].get!string;
// OR
string name = firstRow[0].get!string;

Updating the derived columns

In the example above, totalPrice is not part of the initial data. To calculate it:

df.totalPrice = df.unitPrice * df.quantity;

This updates totalPrice for all rows.
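The column expression multiplies the two columns row by row. A plain-array equivalent using zip and map from Phobos is sketched below; multiplyColumns is a hypothetical name, and the library's own column arithmetic may differ in details such as length checks.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.range : zip;
import std.stdio : writeln;

// Hypothetical helper: multiply two equal-length columns row by row.
double[] multiplyColumns(double[] unitPrice, int[] quantity)
{
    assert(unitPrice.length == quantity.length, "columns must have the same length");
    return zip(unitPrice, quantity)
        .map!(row => row[0] * row[1])   // unitPrice * quantity for one row
        .array;
}

void main()
{
    writeln(multiplyColumns([5.0, 10.0, 25.0], [5, 2, 7]));
}
```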

For more complex formulas or business logic, compute a temporary column first and then use it to calculate the total price:

Column!double discounts;
foreach(name; df.name)
{
    if (name == "Notebook")
        discounts ~= 0.05;
    else if (name == "Pen")
        discounts ~= 0.02;
    else
        discounts ~= 0;
}

df.totalPrice = (df.unitPrice - discounts) * df.quantity;

Or multiply the column by a single number.

df.totalPrice = (df.unitPrice - df.unitPrice * 0.05) * df.quantity;
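The per-row discount calculation can be sketched with plain arrays as well. discountFor and discountedTotals are hypothetical helpers that reproduce the if/else chain and the (unitPrice - discount) * quantity formula for each row.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.math : isClose;
import std.range : zip;
import std.stdio : writeln;

// Hypothetical per-item discount, mirroring the if/else chain above.
double discountFor(string name)
{
    switch (name)
    {
        case "Notebook": return 0.05;
        case "Pen":      return 0.02;
        default:         return 0;
    }
}

// (unitPrice - discount) * quantity for each row.
double[] discountedTotals(string[] names, double[] unitPrice, int[] quantity)
{
    return zip(names, unitPrice, quantity)
        .map!(t => (t[1] - discountFor(t[0])) * t[2])
        .array;
}

void main()
{
    auto totals = discountedTotals(["Pencil", "Pen", "Notebook"],
                                   [5.0, 10.0, 25.0], [5, 2, 7]);
    writeln(totals);
}
```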

Head and Tail

To get the first n rows of the DataFrame:

auto firstTwo = df.head(2);

To get the last n rows of the DataFrame:

auto lastValue = df.tail(1);
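In slice terms, head(n) and tail(n) correspond to the first and last n rows. The hypothetical headOf and tailOf below clamp n to the column length; whether the library clamps or reports an error when n exceeds the row count is an assumption not covered by this README.

```d
import std.algorithm.comparison : min;
import std.stdio : writeln;

// Hypothetical helpers: first / last n elements, clamped to the column length.
T[] headOf(T)(T[] column, size_t n)
{
    return column[0 .. min(n, column.length)];
}

T[] tailOf(T)(T[] column, size_t n)
{
    return column[column.length - min(n, column.length) .. $];
}

void main()
{
    auto names = ["Pencil", "Pen", "Notebook"];
    writeln(headOf(names, 2));
    writeln(tailOf(names, 1));
}
```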

Using `std.algorithm` goodies with the DataFrame

The following example sums the total prices of a few selected items.

df.rows
  .filter!(item => item.name == "Pencil" || item.name == "Pen")
  .map!(item => item.totalPrice)
  .sum
  .writeln;
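The same pipeline works on any array of structs, without a DataFrame. A self-contained version using only Phobos (selectedTotal is a hypothetical name):

```d
import std.algorithm.iteration : filter, map, sum;
import std.stdio : writeln;

struct Item
{
    string name;
    double unitPrice;
    int quantity;
    double totalPrice;
}

// Sum of totalPrice over the rows whose name is "Pencil" or "Pen".
double selectedTotal(Item[] items)
{
    return items
        .filter!(item => item.name == "Pencil" || item.name == "Pen")
        .map!(item => item.totalPrice)
        .sum;
}

void main()
{
    auto items = [
        Item("Pencil", 5.0, 5, 25.0),
        Item("Pen", 10.0, 2, 20.0),
        Item("Notebook", 25.0, 7, 175.0),
    ];
    writeln(selectedTotal(items));
}
```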

Sort with multiSort, by name in ascending order and then by quantity in descending order:

df.rows
    .multiSort!("a.name < b.name", "a.quantity > b.quantity")
    .writeln;
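multiSort sorts in place: rows are ordered by the first predicate, and ties are broken by the next one. A plain-array sketch (sortItems is a hypothetical wrapper):

```d
import std.algorithm.sorting : multiSort;
import std.stdio : writeln;

struct Item
{
    string name;
    double unitPrice;
    int quantity;
}

// Sorts in place: ascending by name, then descending by quantity for equal names.
Item[] sortItems(Item[] items)
{
    items.multiSort!("a.name < b.name", "a.quantity > b.quantity");
    return items;
}

void main()
{
    writeln(sortItems([Item("Pen", 10.0, 1), Item("Notebook", 25.0, 7), Item("Pen", 10.0, 2)]));
}
```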

Importing data from CSV file

If the struct fields match the CSV columns, csvReader!Item can import all the items directly. When they do not match one-to-one (here, Item has a totalPrice field that is not in the CSV file), define a Tuple type that matches the CSV columns and build each Item from it.

import std.algorithm.iteration : joiner;
import std.csv;
import std.stdio : File;
import std.typecons;

//                         name    unitPrice  quantity
alias ItemCsvData = Tuple!(string, double,    int);
auto df = new DataFrame!Item;

auto file = File("items_2024_10_26.csv", "r");
// header: null reads the header row and ignores it
foreach (record; file.byLine.joiner("\n").csvReader!ItemCsvData(header: null))
    df.add(Item(record[0], record[1], record[2]));

// Preview the imported data
df.writeln;
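The Tuple-based parsing can be tried without a file by giving csvReader an in-memory string. The hypothetical parseItems below assumes a header row followed by exactly the three columns in the Tuple; totalPrice is not in the CSV and stays nan.

```d
import std.csv : csvReader;
import std.stdio : writeln;
import std.typecons : Tuple;

struct Item
{
    string name;
    double unitPrice;
    int quantity;
    double totalPrice;
}

//                         name    unitPrice  quantity
alias ItemCsvData = Tuple!(string, double,    int);

// Parses CSV text whose first row is a header; totalPrice keeps its NaN default.
Item[] parseItems(string text)
{
    Item[] items;
    // A null header: the first row is read as the header and not used for mapping.
    foreach (record; text.csvReader!ItemCsvData(null))
        items ~= Item(record[0], record[1], record[2]);
    return items;
}

void main()
{
    writeln(parseItems("name,unitPrice,quantity\nPencil,5.0,5\nPen,10.0,2"));
}
```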

Copying the DataFrame or creating a DataFrame of a new type

To create a PriceList DataFrame from the list of items:

struct PriceList
{
    string name;
    double price;
}

df.rows
    .sort!("a.name < b.name")
    .uniq!("a.name == b.name")
    .toDataFrame!PriceList
    .writeln;
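How toDataFrame!PriceList maps Item.unitPrice onto PriceList.price is not described in this README, since the field names differ. The sketch below performs the same sort and uniq steps on a plain array and makes that mapping explicit; priceList is a hypothetical helper.

```d
import std.algorithm.iteration : map, uniq;
import std.algorithm.sorting : sort;
import std.array : array;
import std.stdio : writeln;

struct Item
{
    string name;
    double unitPrice;
    int quantity;
}

struct PriceList
{
    string name;
    double price;
}

// One PriceList entry per distinct name; price is taken from unitPrice.
PriceList[] priceList(Item[] items)
{
    return items
        .sort!("a.name < b.name")      // uniq only removes *adjacent* duplicates
        .uniq!("a.name == b.name")     // keep the first row for each name
        .map!(item => PriceList(item.name, item.unitPrice))
        .array;
}

void main()
{
    writeln(priceList([Item("Pen", 10.0, 1), Item("Notebook", 25.0, 7), Item("Pen", 10.0, 2)]));
}
```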

Resampling

Group the records with chunkBy, then specify the logic to apply within each group. The logic can be a string, an array of strings, or an associative array (hash map).

auto dfSummary = df.resample(df.rows.chunkBy!((a, b) => a.name == b.name), logic);

The first argument defines the groups; the logic determines how each column is aggregated within a group.

string logic = "sum";
// OR
//                name     quantity   price
string[] logic = ["first", "sum",     "sum"];
// OR
string[string] logic = ["name": "first", "quantity": "sum", "price": "sum"];

Currently supported logic values are:

first - Select the first element from the group.
last - Select the last element from the group.
max - Maximum value of the column within the group.
min - Minimum value of the column within the group.
count - Count of the values in each group.
sum - Sum of the values of the column within the group.
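Note that chunkBy only groups adjacent rows that satisfy the predicate, so rows should be sorted by the grouping key first. The following Phobos-only sketch applies the per-column logic first/sum/sum from the example above; summarize and its three-field Item are hypothetical and are not the library's resample.

```d
import std.algorithm.iteration : chunkBy, map, sum;
import std.algorithm.sorting : sort;
import std.array : array;
import std.stdio : writeln;

// Hypothetical row type matching the logic example: name, quantity, price.
struct Item
{
    string name;
    int quantity;
    double price;
}

// One summary row per name: "first" for name, "sum" for quantity and price.
Item[] summarize(Item[] items)
{
    // chunkBy only groups *adjacent* rows, so bring equal names together first.
    items.sort!("a.name < b.name");
    return items
        .chunkBy!((a, b) => a.name == b.name)
        .map!(group => group.array)                      // materialize each group
        .map!(g => Item(g[0].name,                       // first
                        g.map!(r => r.quantity).sum(0),  // sum
                        g.map!(r => r.price).sum(0.0)))  // sum
        .array;
}

void main()
{
    writeln(summarize([Item("Pen", 1, 10.0), Item("Notebook", 7, 175.0), Item("Pen", 2, 20.0)]));
}
```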

DataFrame to JSON

To convert a DataFrame to JSON,

auto jsonData = df.toJSON;
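The exact JSON layout produced by toJSON is not documented in this README. Purely as an illustrative assumption, the sketch below builds a row-oriented array of objects with std.json; itemsToJSON is a hypothetical helper and its output need not match the library's.

```d
import std.json : JSONValue;
import std.stdio : writeln;

struct Item
{
    string name;
    double unitPrice;
    int quantity;
}

// Hypothetical row-oriented layout: a JSON array with one object per row.
JSONValue itemsToJSON(Item[] items)
{
    JSONValue[] rows;
    foreach (item; items)
    {
        JSONValue row;                  // starts as JSON null
        row["name"] = item.name;        // assigning a key turns it into an object
        row["unitPrice"] = item.unitPrice;
        row["quantity"] = item.quantity;
        rows ~= row;
    }
    return JSONValue(rows);
}

void main()
{
    writeln(itemsToJSON([Item("Pen", 10.0, 2)]).toString);
}
```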

Rolling

Use rolling to apply an aggregation function over a moving window. For example, to calculate the 21-period simple moving average (SMA) of the close price:

//                           FUNC  RETTYPE WINDOW
df.sma21 = df.close.rolling!(mean, double)(21);
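A simple moving average takes the mean of each run of window consecutive values. The plain-array sketch below uses std.range.slide and mean; padding the first window - 1 positions with nan so the result has one value per row is an assumption, not documented library behaviour. rollingMean is a hypothetical name.

```d
import std.algorithm.iteration : map, mean;
import std.array : array;
import std.math : isNaN;
import std.range : chain, repeat, slide;
import std.stdio : writeln;
import std.typecons : No;

// Hypothetical plain-array SMA. Positions before the first full window are
// filled with NaN (an assumption) so the output has the same length as the input.
double[] rollingMean(double[] values, size_t window)
{
    if (window == 0 || values.length < window)
        return repeat(double.nan, values.length).array;

    auto means = values
        .slide!(No.withPartial)(window)   // every run of `window` consecutive values
        .map!(w => w.mean);               // average of each run
    return chain(repeat(double.nan, window - 1), means).array;
}

void main()
{
    writeln(rollingMean([10.0, 20.0, 30.0, 40.0, 50.0], 2));
}
```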

To use a custom function with rolling, pass it in place of mean. For example, an exponential moving average (EMA):

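import std.algorithm.iteration : mean;
import std.math : isNaN;
import std.stdio : writeln;

import dataframes;

double[] ema(T)(T arr, int period)
{
    double prevEMA; // double.init is NaN, so the first window seeds the EMA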
    auto multiplier = 2.0 / (period + 1);

    double emaFunc(int[] data)
    {
        if (isNaN(prevEMA))
        {
            prevEMA = mean(data);
            return prevEMA;
        }

        auto lastValue = data[$-1];

        prevEMA = lastValue * multiplier + prevEMA * (1 - multiplier);
        return prevEMA;
    }

    return arr.rolling!(emaFunc, double)(period);
}

void main()
{
    auto window = 2;
    auto arr = [10, 20, 30, 40, 50, 60, 70, 82, 91, 100];
    writeln("SMA: ", arr.rolling!(mean, double)(window));
    writeln("EMA: ", arr.ema(window));
}
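The EMA above follows the recurrence EMA(t) = value(t) * multiplier + EMA(t-1) * (1 - multiplier), with multiplier = 2 / (period + 1), seeded by the mean of the first window. A Phobos-only sketch of that recurrence (emaSeries is a hypothetical helper; the nan entries before the first full window are an assumption):

```d
import std.algorithm.iteration : mean;
import std.math : isClose, isNaN;
import std.stdio : writeln;

// Hypothetical plain-array EMA using the same multiplier and seeding as `ema` above.
double[] emaSeries(double[] values, size_t period)
{
    auto result = new double[values.length];   // every entry starts as NaN (double.init)
    if (period == 0 || values.length < period)
        return result;

    immutable multiplier = 2.0 / (period + 1);
    double prev = values[0 .. period].mean;    // seed: mean of the first window
    result[period - 1] = prev;
    foreach (i; period .. values.length)
    {
        prev = values[i] * multiplier + prev * (1 - multiplier);
        result[i] = prev;
    }
    return result;
}

void main()
{
    auto arr = [10.0, 20, 30, 40, 50, 60, 70, 82, 91, 100];
    writeln("EMA: ", emaSeries(arr, 2));
}
```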

1.1.0	2025-Oct-12
1.0.3	2024-Nov-18
1.0.2	2024-Oct-29
1.0.1	2024-Oct-28
1.0.0	2024-Oct-26
