mdcass / file-validator
A Laravel package for applying validation rules to CSV and XLS files
This package's canonical repository appears to be gone and the package has been frozen as a result.
Requires
- mdcass/support: ^2.1
- vlucas/valitron: ^1.3
Requires (Dev)
- phpunit/phpunit: ^5.7
This package is not auto-updated.
Last update: 2020-11-08 19:05:52 UTC
README
Overview
A Laravel package for applying validation rules to CSV and XLS files. Provides an interface for making sure files you're reading in your applications are valid against your defined schema.
The class loads a file from the local disk, accepts several options to validate against, then provides a collection of validation results.
use Mdcass\FileValidator\Facades\FileValidator; $results = FileValidator::load('file.csv')->execute(); $results->file_exists; // true
Installation
Installation instructions are for Laravel 5.3
Require the library
composer require mdcass/file-validator
Add the Service Provider to the providers
list in config/app.php
// config/app.php 'providers' => [ ... /* * Third Party Service Providers */ Mdcass\FileValidator\FileValidatorServiceProvider::class ]
Optionally alias the Facade in the aliases
list in config/app.php
// config/app.php 'aliases' => [ ... 'FileValidator' => Mdcass\FileValidator\Facades\FileValidator::class ]
The package loads files relative to the path set at config('filesystems.disks.local.root')
.
Validating and Reading
The two methods below represent the suggested implementation for this library.
Validating an Entire File
The execute
method on the validator facade uses the configuration you specify to return a ResultsBag.
$results = FileValidator::load('relative/file/path.csv') ->rules([ 'required' => [ ['date'] ], 'date' => [ ['date'] ], 'in' => [ ['company', ['Schoen-Kertzmann']] ], 'lengthMax' => [ ['company', 255] ] ]) ->execute(); $results->valid; // true
Traversing a File
The reader
attribute on the validator facade represents the class's underlying Reader class (the configuration of which can be specified in the same fashion as when setting the header row).
The following example uses the chunk
method on the underlying library to get results in a more efficient way from a large file.
$validator = FileValidator::load('accessor.csv') ->aliases([ 'date' => 'datum' ]) ->rules([ 'required' => [ ['datum'] ], ]); // The $validator is passed by reference into the chunk callback, // as each $row it is given to validate will contribute to the internal // results (such as the total row_count, valid_row_count, and invalid_row_count) $validator->reader->chunk(1000, function($row) use (&$validator) { // Pass the row into the validator to validate, as well as set // any aliased rows and accessors $results = $validator->validateRow($row); $results; // ['valid' => true, 'row' => Mdcass\FileValidator\Support\Row] $results['row']->toArray(); // ['date' => '2016-11-23', ..., 'datum' => '2016-11-23'] }, false); // Third parameter prevents the underlying library for queueing the chunk $results = $validator->results; $results->valid // true
Shorthand Instantiation
The class can be instantiated by passing an array to the constructor of the methods to call.
$validator = new FileValidator([ 'load' => 'simple.csv', 'rules' => [ 'required' => [ ['date'] ] ], 'distinct' => ['randomelementab'], ]); $results = $validator->execute();
Methods
Method | Params | Description |
---|---|---|
load() | string | Load a file from a relative path |
config() | array | Set config variables on the underlying library |
sheet() | string | Set the sheet on an Excel file |
rules() | array | Set validation rules |
distinct() | array | Set fields to retrieve distinct values |
aliases() | array | Set field aliases |
accessors() | array | Set field accessors |
execute() | Validate the entire file |
Attributes
Attribute | Returns | Description |
---|---|---|
results | ResultsBag | Validation results and distinct values |
Examples
File Exists
Verify the file exists. The path passed to the load
method should be relative to the Laravel Storage config variable at config('filesystems.disks.local.root')
.
$results = FileValidator::load('simple.csv')->execute(); $results->file_exists; // true $results->has_data; // true
The
has_data
flag in the results will befalse
if only the header exists in the sheet
Set Header Row
Set the header row to use
$results = FileValidator::load('simple.csv') ->config(['start_row' => 2]) ->execute(); $results->has_data; // true
The
config
method customises the attributes on the underlying Excel manipulation package, and you can see what can be configured here at Maatwebsite/Laravel-Excel@2.1
Specifying a sheet
$results = FileValidator::load('multisheet.xls') ->sheet('Second Sheet') ->execute(); $results->has_data; // true
The
has_data
flag in the results will befalse
if the sheet doesn't exist
Validation Rules
Validation rule reference available at the excellent library vlucas/valitron
simple_rules.csv
date | company | randomDigit | randomFloat | |
---|---|---|---|---|
2016-11-30 | Schoen-Kertzmann | bbruen@yahoo.com | 2 | 1.023789456 |
2016-12-01 | Aufderhar LLC | smitham.gayle@yahoo.com | 7 | 183601.189291 |
$results = FileValidator::load('simple_rules.csv') ->rules([ 'required' => [ ['date'] ], 'date' => [ ['date'] ], 'in' => [ ['company', ['Schoen-Kertzmann']] ], 'lengthMax' => [ ['company', 255] ] ]) ->execute(); $results->valid; // false $results->row_count; // 2 $results->valid_row_count; // 1 $results->invalid_row_count; // 1 $results->errors; // ['company' => 1]
Retrieving Distinct Values
The library can collect distinct values of fields in the file which may be useful for further validation outside of the library.
Any fields set to return distinct values are added as
required
in the validation rules
simple_distinct.csv
date | company | randomDigit | randomFloat | |
---|---|---|---|---|
2016-11-30 | Schoen-Kertzmann | bbruen@yahoo.com | 2 | 1.023789456 |
2016-12-01 | Aufderhar LLC | smitham.gayle@yahoo.com | 7 | 183601.189291 |
2016-12-01 | Adel Sef | rediase@yahoo.com | 74 | 56.937856 |
$results = FileValidator::load('simple_distinct.csv') ->distinct(['date']) ->execute(); $results->values->date; // ['2016-11-30', '2016-12-01']
Accessors
You may specify a callback for a field to manipulate the value returned from the row.
Validation happens on the value returned from the callback
accessors.csv
date | company | randomDigit | randomFloat | |
---|---|---|---|---|
2016-11-30 | Schoen-Kertzmann | bbruen@yahoo.com | 2 | 1.023789456 |
2016-12-01 | Aufderhar LLC | smitham.gayle@yahoo.com | 7 | 183601.189291 |
2016-12-01 | Adel Sef | rediase@yahoo.com | 74 | 56.937856 |
// define an accessor for the date field to change the format from Y-m-d to m/d/Y $validator = FileValidator::load('accessor.csv') ->accessors([ [ 'field' => 'date', 'callback' => function($val) { return \Carbon\Carbon::parse($val)->format('m/d/Y'); }, ], ]); // Retrieve a row from the underlying reader and // pass it to the $validator for a validation result $row = $validator->reader->take(1)->get(); $res = $validator->validateData($row); $res['row']->date; // 30/11/2016
Aliases
You may alias an accessor with a given callback, and validate it separately to the original field.
$validator = FileValidator::load('accessor.csv') ->accessors([ [ 'field' => 'date', 'callback' => function($val) { return 1; }, 'alias' => 'datum', ], ]) ->rules([ 'required' => [ ['datum'] ], 'numeric' => [ ['datum'] ], ]); $row = $validator->reader->take(1)->get(); $res = $validator->validateData($row); $res['valid']; // true $res['row']->toArray(); // ['date' => '2016-11-23', ..., 'datum' => 1]
Results Bag
Available Attributes
The execute
method returns a class Mdcass\FileValidator\ResultsBag
which has the following attributes:
attribute | type | Description |
---|---|---|
valid | boolean | Whether the file exists, has data, and has no validation errors |
file_exists | boolean | Whether the file exists |
has_data | boolean | Whether the file has data |
row_count | integer | Total rows (not including the header row) |
valid_row_count | integer | Total valid rows |
invalid_row_count | integer | Total invalid rows |
errors | array | List of invalid fields and the number of rows that failed validation |
values | ValuesBag | Where distinct values are collected |
timers | Support/ScriptExecution | Where timers are tracked |
Timers
The Results Bag also contains a timers
attribute, a list of script execution times. Each item in the list contains the start
and end
unix timestamps with microseconds, as well as the result
in seconds.
name | description |
---|---|
load | Time to load the file |
has_data | Time to establish there is data to process in the file |
execute | Time to traverse the file, and validate against the set rules and collect distinct values |
$results = FileValidator::load('file.csv')->execute(); $results->timers->load; // [ 'start' => 1482743338.946, 'end' => 1482743338.9575, 'result' => 0.011478900909424 ] var_dump($results->timers->toArray()); // ['load' => [...], 'has_data' => [...], 'execute' => [...]]
FAQ
Can the library load files from a stream, or can a storage disk be specified?
No, the file must exist on the local disk. Whilst out of scope of this package, another package in the future will contain this functionality (however will simply by a hacky store locally, validate, delete as per this GitHub issue discussion on this package's underlying file reading package dependency)
Contributing
Testing
Package testing follows this article. To test:
$ cd /path/to/package/mdcass/file-validator
$ ../../../vendor/bin/phpunit
Tests create dummy files of data in the location defined in the package's config file config/file-validator-testing.php
the variable storage_path
.