Banishing Loops

PHP has a number of functions for manipulating arrays of data, which will almost always achieve a result quicker than creating a solution in PHP itself, often involving one or more nested loops. Part of the reason for this is that PHP is interpreted whereas the underlying PHP functions are compiled. 

Output data as csv

A common solution for converting an associative array in to a format which can be returned as a CSV file is to use one loop to output each value followed by a comma nested inside another loop to traverse the rows.

$dataset = [
    ['a', 'b', 'c'],
    ['d', 'e', 'f'],
    ['g', 'h', 'i'],
];

foreach ($dataset as $row) {
    $outputRow = '';
    foreach ($row as $value) {
        $outputRow .= (empty($outputRow)) ? $value : ",$value";
    }
    $outputCSV .= $outputRow . "\n";
}

This can be improved quite easily by switching the internal loop for a call to implode:

foreach ($dataset as $row) {
    $outputRow = implode(',',$row);
    $outputCSV .= $outputRow . "\n";
}

But we can take this a step further to remove both loops, using array_map() to execute a callable on each row and array_fill() to provide the separator to the implode() function, followed by a second call to implode():

$outputRows = array_map(
    'implode', 
    $dataset, 
    array_fill(0, count($dataset), ',')
);

$outputCSV = implode("\n", $outputRows);

Whats happening here is array_fill() is creating an array of the same length as $dataset where each value = ‘,’. array_map() is calling implode() on each row and supplying the value at the same index of the result from the call to array_fill() as the first parameter – i.e.

$param = array_fill(0, count($dataset), ',');
for ($i = 0; $i < count($dataset); $i++) {
    $output .= implode($param[i], $dataset[i]);
}

As a footnote, if there is a possibility that empty rows may occur in the data set, the following can be used to remove them:

$dataset = array_filter($dataset);  

As oppose to building csv output to be returned in the browser it may be preferable to output to file, if so then use fputcsv() in place of implode, passing in the file handle as the first parameter, Note the order of the parameters passed to array_map(), after the first argument which is the name of the callable function, the following parameters are the values to be passed to the callable in the order expected by the callable: 

$dataset = [
    ['a', 'b', 'c'],
    ['d', 'e', 'f'],
    ['g', 'h', 'i'],
];

$file = fopen('out.csv', 'w');

$outputRows = array_map(
    'fputcsv', 
    array_fill(0, count($dataset), $file),
    $dataset
);

fclose($file);

Sorting

There are numerous sorting functions provided by PHP which will sort() an array,  or in reverse, rsort(). Sort an array maintaining the key mappings, asort() and arsort(). Or sort an array by the keys as oppose to its values with ksort() and krsort().

How to sort an associative array containing a dataset, e.g. [date, clicks, impressions, acquisitions]? For this task the function in the toolbox to reach for is the array_multisort(). This function will sort an associative array on a specified column, or columns:

$dataset = [
    [1,3,1], [2,3,2], [2,2,3],
    [1,2,1], [3,1,2], [1,3,3]
];

$sortColumn = array_column($dataset, ‘date’);
array_multisort($sortColumn, $dataset);

It can also be given a sort order and sort flags, i.e. whether the data should be treated as numeric int or string value:

array_multisort(
    array_column($dataset, 2),  //sort 3rd column, $dataset[2]
    SORT_ASC,
    SORT_NUMERIC
);

As hinted above this method can also be used to sort on multiple columns, e.g.  date and impressions, this could be descending order for date and then ascending order for impressions. As many columns as are present in the dataset can be added for sorting on.

array_multisort(

    array_column($dataset, 2),
    SORT_ASC,
    SORT_NUMERIC,

    array_column($dataset, 1),
    SORT_ASC,
    SORT_NUMERIC,

    $dataset
);

DataSet to [Key] => value

This derived from a requirement to pull a number of rows from the database and needing to map one field from each row as the key for a second field in each row. This is reasonably simple to do using a loop by looping over each row and building an array as :

for ($row in $resultSet) { 
     $array[$row[‘field1’] = $row[‘field2’];
}

For smaller result sets this is fine, but may not scale so well when the result set gets large, an alternative more performant solution can be achieved by combining array_combine() which creates an array using 2 other arrays for the keys and values respectively and array_column() which extracts a column from an associative array:

$keyValueArray = array_combine(
    array_column($resultSet, ‘field1’),
    array_column($resultSet, ‘field2’)
)

Slicing horizontally

It’s simple enough to slice an array vertically using the array_pop(), array_shift(), array_slice(), array_splice() or array_chunk() to split an array in to parts and then use array_push(), array_unshift() or array_merge() to construct an array from any number of other arrays (preferably with the same number of columns).  However it may not be quite so obvious to achieve a similar result horizontally.

Say we have a dataset of username and ages, and both columns are formatted as strings although the age column contains numeric data. If we require the age column to be treated as an integer, possibly because we intend to send this data out via an api, we need to convert them so that they will not be surrounded by quotes in the output. One option would be to loop over each row and build a new array casting the age as an int as we go, here is an alternative approach :

$a = [
    ['Username 1', '25', 'user1@email.com'],
    ['Username 2', '46', 'user2@email.com'],
    ['Username 3', '23', 'user3@email.com']
];

$ages = array_column($a, 1);
$agesInt = array_map('intval', $ages);

$dataset = array_map(
    null,
    array_column($a, 1),
    $agesInt,
    array_column($a, 2)
);

print json_encode($dataset);

 What this does is extract the 1st column, the one with the ages, and runs intval() on each element in that vector. Then by passing null as the callback to array_map() it splices the given arrays together horizontally, refoming the original dataset but replacing the age column with the parsed one.

Non utf8 characters passed to json_encode()

A similar problem can arise when converting an array of data to JSON if the dataset contains some non-UTF8 characters, which results in json_encode() returning an empty string. The issue can be caught by checking the output of json_last_error(), from PHP 7.3 it will alsos throws an exception which can be caught in a try … catch. In order to resolve this issue it is necessary to traverse the array of data and utf8 encode the values, if the column(s) are known this can be achieved in a similar manner as the previous example, calling utf8_encode in place of intval:

$a = [
    ['Username 1', '25', 'user1@email.com'],
    ['Username 2', '46', 'user2@email.com'],
    ['Username 3', '23', 'user3@email.com']
];

$ages = array_column($a, 1, 0);
$agesInt = array_map('intval', $ages);

foreach($dataset as $key => $row) {
    $dataset[$key][1] = $agesInt[$row[0]];
}

—-

$dataset = array_walk(
    null,
    array_column($a, 1),
    $agesInt,
    array_column($a, 2)
);

print json_encode($dataset);
$utf8EncodedOutput = array_map(
    "utf8_encode", 
     array_column($dataSet, 'columnToParse')
); 

array_column() returns the column to be encoded, ‘name’, and the resulting output can then be merged back in to the original dataset. N.B.  array_column() takes an optional 3rd parameter to specify another column from the dataset to use for the index for the resulting vector array, which can be ussful if a loop is being used to generate the final output. 

N.B. This does not maintain keys, if you still requre to maintain keys a third parameter can be passed to array_column() to specify a column to use to index the output array, this should be a column which is unique to the rows in the dataset. The original dataset can then be traversed using a loop, or using a lambda call back passed to array_walk() in order to replace the values in the dataset with the converted ones in the parsed column:

$a = [
    ['k1' => 'Username 1', 'k2' => '25', 'user1@email.com'],
    ['k1' => 'Username 2', 'k2' => '46', 'user2@email.com'],
    ['k1' => 'Username 3', 'k2' => '23', 'user3@email.com']
];

$ages = array_column($a, 'k2', 'k1');
$agesInt = array_map('intval', $ages);

// foreach($a as $key => $row) {
//    $a[$key]['k2'] = $agesInt[$row['k1']];
// }

array_walk($a, function(&$row) use ($agesInt) {
    $row['k2'] = $agesInt[$row['k1']];
});

print json_encode($a);




Other Useful Array functions

Conclusion

There are many ways to skin a cat, but generally loops may provide a quick solution to a problem which can later become a performance issue simply because more data === more iterations === more time to process. PHP’s provided utility functions can handle much of the heavy lifting for us and should be the first port of call before employing native solutions, whether home grown or imported as a package.