ankane/libmf

Large-scale sparse matrix factorization for PHP

v0.2.0 2024-06-03 02:59 UTC

Last update: 2024-10-09 08:55:13 UTC


LIBMF - large-scale sparse matrix factorization - for PHP

Check out Disco for higher-level collaborative filtering

Installation

Run:

composer require ankane/libmf

Add scripts to composer.json to download the shared library:

    "scripts": {
        "post-install-cmd": "Libmf\\Vendor::check",
        "post-update-cmd": "Libmf\\Vendor::check"
    }

And run:

composer install

Getting Started

Prep your data in the format rowIndex, columnIndex, value

$data = new Libmf\Matrix();
$data->push(0, 0, 5.0);
$data->push(0, 2, 3.5);
$data->push(1, 1, 4.0);
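
If your ratings start out as a dense array, a small helper can turn them into the rowIndex, columnIndex, value triples described above. This is only a sketch: the `toTriples` function and the use of `null` for missing entries are assumptions made for illustration, not part of the package.

```php
<?php
// Hypothetical dense input: null marks a missing (unobserved) entry.
$dense = [
    [5.0, null, 3.5],
    [null, 4.0, null],
];

// Collect [rowIndex, columnIndex, value] for every observed entry.
function toTriples(array $dense): array
{
    $triples = [];
    foreach ($dense as $rowIndex => $row) {
        foreach ($row as $columnIndex => $value) {
            if ($value !== null) {
                $triples[] = [$rowIndex, $columnIndex, $value];
            }
        }
    }
    return $triples;
}

$triples = toTriples($dense);
```

Each triple can then be passed to `$data->push($rowIndex, $columnIndex, $value)` as shown above.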

Create a model

$model = new Libmf\Model();
$model->fit($data);

Make predictions

$model->predict($rowIndex, $columnIndex);

Get the latent factors (these approximate the training matrix)

$model->p();
$model->q();
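
To build intuition for how the factors approximate the matrix, the sketch below takes the dot product of one row vector and one column vector. It assumes `p()` and `q()` each return an array of factor vectors (one per row or column) and uses hard-coded hypothetical values instead of a trained model. The library may handle unseen indices or non-real-valued losses differently, so treat this as an illustration rather than a description of `predict`.

```php
<?php
// Hypothetical factor vectors with 2 latent factors each
// (stand-ins for what p() and q() are assumed to return).
$p = [[0.5, 1.0], [1.5, 0.2]];
$q = [[2.0, 1.0], [0.4, 3.0]];

// Dot product of two equal-length vectors.
function dotProduct(array $a, array $b): float
{
    $sum = 0.0;
    foreach ($a as $k => $value) {
        $sum += $value * $b[$k];
    }
    return $sum;
}

// Approximate entry at row 0, column 1.
$score = dotProduct($p[0], $q[1]);
```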

Get the bias (average of all elements in the training matrix)

$model->bias();

Save the model to a file

$model->save('model.txt');

Load the model from a file

$model = Libmf\Model::load('model.txt');

Pass a validation set

$model->fit($data, $validSet);

Cross-Validation

Perform cross-validation

$model->cv($data);

Specify the number of folds

$model->cv($data, 5);

Parameters

Pass parameters - default values below

use Libmf\Loss;

new Libmf\Model(
    loss: Loss::RealL2,     // loss function
    factors: 8,             // number of latent factors
    threads: 12,            // number of threads used
    bins: 25,               // number of bins
    iterations: 20,         // number of iterations
    lambdaP1: 0,            // coefficient of L1-norm regularization on P
    lambdaP2: 0.1,          // coefficient of L2-norm regularization on P
    lambdaQ1: 0,            // coefficient of L1-norm regularization on Q
    lambdaQ2: 0.1,          // coefficient of L2-norm regularization on Q
    learningRate: 0.1,      // learning rate
    alpha: 1,               // importance of negative entries
    c: 0.0001,              // desired value of negative entries
    nmf: false,             // perform non-negative MF (NMF)
    quiet: false            // no outputs to stdout
);

Loss Functions

For real-valued matrix factorization

  • Loss::RealL2 - squared error (L2-norm)
  • Loss::RealL1 - absolute error (L1-norm)
  • Loss::RealKL - generalized KL-divergence

For binary matrix factorization

  • Loss::BinaryLog - logarithmic error
  • Loss::BinaryL2 - squared hinge loss
  • Loss::BinaryL1 - hinge loss

For one-class matrix factorization

  • Loss::OneClassRow - row-oriented pair-wise logarithmic loss
  • Loss::OneClassCol - column-oriented pair-wise logarithmic loss
  • Loss::OneClassL2 - squared error (L2-norm)

Metrics

Calculate RMSE (for real-valued MF)

$model->rmse($data);

Calculate MAE (for real-valued MF)

$model->mae($data);
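
For reference, both metrics can be computed by hand from pairs of actual and predicted values. The sketch below uses the standard definitions with hypothetical data; the function names are made up for illustration and are not part of the package.

```php
<?php
// Hypothetical [actual, predicted] pairs.
$pairs = [[5.0, 4.0], [3.0, 3.0], [4.0, 6.0]];

// Root mean squared error: sqrt of the mean squared difference.
function rootMeanSquaredError(array $pairs): float
{
    $sum = 0.0;
    foreach ($pairs as [$actual, $predicted]) {
        $sum += ($actual - $predicted) ** 2;
    }
    return sqrt($sum / count($pairs));
}

// Mean absolute error: mean of the absolute differences.
function meanAbsoluteError(array $pairs): float
{
    $sum = 0.0;
    foreach ($pairs as [$actual, $predicted]) {
        $sum += abs($actual - $predicted);
    }
    return $sum / count($pairs);
}

$rmse = rootMeanSquaredError($pairs);
$mae = meanAbsoluteError($pairs);
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently.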

Calculate generalized KL-divergence (for non-negative real-valued MF)

$model->gkl($data);

Calculate logarithmic loss (for binary MF)

$model->logloss($data);

Calculate accuracy (for binary MF)

$model->accuracy($data);

Calculate MPR (for one-class MF)

$model->mpr($data, $transpose);

Calculate AUC (for one-class MF)

$model->auc($data, $transpose);

Example

Download the MovieLens 100K dataset and use:

$trainSet = new Libmf\Matrix();
$validSet = new Libmf\Matrix();

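if (($handle = fopen('u.data', 'r')) !== false) {
    $i = 0;
    // u.data is tab-separated: user id, item id, rating, timestamp
    while (($row = fgetcsv($handle, separator: "\t")) !== false) {
        // first 80,000 rows go to training, the rest to validation
        $data = $i < 80000 ? $trainSet : $validSet;
        // fgetcsv returns strings, so cast to numeric types explicitly
        $data->push((int) $row[0], (int) $row[1], (float) $row[2]);
        $i++;
    }
    fclose($handle);
}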

$model = new Libmf\Model(factors: 20);
$model->fit($trainSet, $validSet);

echo $model->rmse($validSet), "\n";

History

View the changelog

Contributing

Everyone is encouraged to help improve this project.

To get started with development:

git clone https://github.com/ankane/libmf-php.git
cd libmf-php
composer install
composer test