Performant Handling of Excel Files in PHP

Avatar von Martin Brotzeller

When you look at Excel solutions for PHP, the first thing you find is PHPExcel, digging a little bit further lets you find libXl. The prior is a pure PHP solution, the latter a native library for a couple of operating systems. There is a PHP extension that exposes this library to PHP, so you can do the same things with it that you can do with PHPExcel.

TL;DR: PHPExcel is nice, libXl with php_excel can do the same WAY faster. Use the Diagrams below to convince your customer/boss/CFO to invest in performance.

Recently, over at Sitepoint, Taylor Ren wrote about PHPExcel for generating Excel files with pure PHP, reacting to feedback demanding windows-free solutions. The same applied to us in a recent project. Reading and writing XLS aswell as XLSX files sent us to the same library, since it is state-of-the-art, robust, open-source and most importantly runs on linux servers.

We had one problem though: XLS files can get huge. Any modern pure PHP solution has to represent Cells and Rows and Sheets with a huge number of objects and will thus grow exponentially. But there is an alternative:
libXl offers the same range of Excel formats as PHPExcel (sans ODS, HTML and other non-proprietary types). It is available for Linux, Mac and Windows, so it can support pure Linux environments. For use in PHP, there is an extension that exposes all library functionality to PHP. If your customer requires you to read and write files that are generated and might be edited by typical corporate users, the task can be solved by either LibXl or PHPExcel. One is free to use, the other one costs between 200 and 1500 EUR – not that much in corporate terms, but you need to explain why the customer should spend his money.

So we did some benchmarking. First a couple of writes to a document with 5 sheets of 26 columns and 2000 rows.

wxlsruntimewxlsxruntime As you can see in the graphs (note the logarithmic y scale), both PHPExcel and libXl use an exponential amount of time, but libXl is more than one order of magnitude faster than PHPExcel.

Memory can not be compared directly unfortunately, since PHP can not measure what the system library uses. PHPExcel’s memory usage follows an exponential course with a peak value of 180 megabyte for 200000 writes to the XLSX document.

Next, reads are measured. We read from a document with 5 sheets and 20 columns.
rxlsruntimerxlsxruntime Similar to the writes, the runtime of both PHPExcel and libXl follow an exponential curve, again libXl is significantly better, this times almost two orders of magnitude. In absolute numbers, for 200000 reads from a XLSX document with 40000 entries PHPExcel needs 55 seconds, whereas libXl is done in 700 milliseconds.

rxlsmemory Reading lets libXl use some memory, so we at least have something to compare, although the difference can rather be used to determine what PHP uses to store the data, instead of the amount of memory used by the library itself.

Finally, we did some measurements of extreme cases, just to be able to reach the limits of what can be done.

Test-Case Number of Writes Execution Time (libxl) Memory Usage* (libxl) Execution Time (PHPExcel) Memory Usage* (PHPExcel)
1 sheet, 10 columns, 1.000 rows 10.000 58ms 262k / 28M** 599ms 14M / 40M**
1 sheet, 10 columns, 500.000 rows 5.000.000 34s 262k / 985M** 37min 4,2G / 4.2G**
1 sheet, 10 columns, 1.048.576 rows
(max rows since Excel 2007)
10.485.760 74s 262k / 2.1G** PHP Fatal error: Allowed memory size of 8589934592 bytes exhausted
1 sheet, 10 rows, 1.000 columns 10.000 58ms 262k / 27M** 567ms 14M / 40M**
1 sheet, 10 rows, 16.384 columns
(max columns since Excel 2007)
163.840 1.010ms 262k / 54M** 10s 148M / 170M**
1 sheet, 16.384 columns, 3.500 rows 57.344.000 8min 262k / > 8G** PHP Fatal error: Allowed memory size of 8589934592 bytes exhausted
1 sheet, 16.384 columns, 1.048.576 rows 17.179.869.184 Neither tool can handle this
10.000 sheets, 1 column, 1 row 10.000 124s 262k / 62M** 124s 93M / 123M**
20.000 sheets, 1 column, 1 row 20.000 10min 252k / 94M** 39min 188M / 220M**

* measurement with memory_get_peak_usage(true) gives an idea how much PHPExcel uses, libXl can’t be measured from within PHP.
** approximate overall memory use with ps

As you can see, due to a smaller memory footprint, libXl is slightly more useful when you have a special set of data to crunch. Apparently you should be able to solve problems with either by simply adding enough memory. PHPExcel is perfectly sufficient for all kinds of data that normally fits on your screen. For larger data sets, you can use libXl to boost your performance significantly.

Avatar von Martin Brotzeller

Kommentare

4 Antworten zu „Performant Handling of Excel Files in PHP“

  1. Lesenswert: Performant Handling of Excel Files in PHP http://t.co/DaYMsVm4HS

  2. Hey Martin,

    That’s a great article! If you are still looking for alternatives, you can take a look at Spout: https://github.com/box/spout.
    According to the measurements you did, Spout is way faster than libXL and only requires 10MB of memory to process any spreadsheet. It was designed to be able to handle large files so it does it pretty well, compared to other libraries. And it’s free :)

  3. @Adrien sagte I would like to give a try to Spout, however I don’t find any documentation of it except for the „Basic usage“ in Git, any help on that? Thanks.

    1. We do not have any affiliation with spout or it’s company, box. The readme states their support address, besides there are already questions about spout on stackoverflow – perhaps that would be a better place to get the answers you want.

Schreibe einen Kommentar

Deine E-Mail-Adresse wird nicht veröffentlicht. Erforderliche Felder sind mit * markiert


Für das Handling unseres Newsletters nutzen wir den Dienst HubSpot. Mehr Informationen, insbesondere auch zu Deinem Widerrufsrecht, kannst Du jederzeit unserer Datenschutzerklärung entnehmen.