PHPWord: 3 Ways to create Word documents with PHPWord

Three ways to create Word documents with PHPWord

By Maria Haubner

Creating Microsoft Word documents in PHP can be a challenge. Word offers a multitude of options and settings, and while creating a document in PHP you want to take advantage of those options to achieve a satisfying result. Perhaps you need to dynamically create documents for a client, and the client will only know the capabilities of Microsoft Word, but not the limitations of PHP libraries. This can result in an unsatisfied client.

In this article we will take a closer look at PHPWord and three different ways to create Word documents with it: basic easy templating, the creation of Word documents from scratch, and (going a little crazy there) the combination of both by merging existing templates with dynamically created documents. Hopefully, after reading through the text, you will have an idea of how to implement the perfect Word creator for your needs.

PHPWord

PHPWord is part of the PHPOffice library, which lets you read and write documents in different file formats from PHP. Supported formats are HTML, ODText, PDF, RTF and Word2007, which we will concentrate on. PHPWord is open source and currently available in version 0.14 on GitHub. Not every feature of Microsoft Word is implemented in PHPWord (yet), but the available features are numerous. Some of them will be presented or at least mentioned below using example code. For the examples to work, you of course need to include PHPWord in your project. So let’s dive in.
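
A minimal sketch of that setup, assuming PHPWord was installed with Composer and that the script sits next to the vendor directory. The path and the variable names are assumptions for illustration; adjust them to your project:

```php
<?php
// Assumption: PHPWord was installed via Composer in this directory:
//   composer require phpoffice/phpword
$autoload = __DIR__ . '/vendor/autoload.php';

if (is_file($autoload)) {
    require_once $autoload;
}

// True once the Composer autoloader can resolve the PHPWord classes.
$phpWordAvailable = class_exists('PhpOffice\PhpWord\PhpWord');

echo $phpWordAvailable
    ? "PHPWord is ready to use\n"
    : "PHPWord not found, check your Composer setup\n";
```

With the autoloader in place, the fully qualified class names used in the examples below (such as \PhpOffice\PhpWord\PhpWord) should resolve without further includes.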

Basic and easy templating

Fig. 1: Basic Template created in Microsoft Word

PHPWord provides easy-to-use templating. If you need to create simple documents in a static layout in which, for example, only the addressee changes, this will be your fastest way to go.

In fig. 1 you can see a template named Template.docx that was created in Microsoft Word. The date and address are made up of placeholder strings in the form of ${placeholder}.

The PHP code you need to fill this template dynamically with your desired data is as follows:

$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('Template.docx');
 
$templateProcessor->setValue('date', date("d-m-Y"));
$templateProcessor->setValue('name', 'John Doe');
$templateProcessor->setValue(
    ['city', 'street'],
    ['Sunnydale, 54321 Wisconsin', '123 International Lane']);
 
$templateProcessor->saveAs('MyWordFile.docx');
Fig. 2: Template with replaced placeholders

Let’s take a closer look. You load your template into a new TemplateProcessor by instantiating the class with the path to your template as the parameter. Now you can replace your placeholder strings with values. Note that setValue needs the placeholder name as its first parameter, without the dollar sign and curly brackets used to mark it in the template.

After replacing all placeholders, you save the modified template and you are done. The result will come close to fig. 2.
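
To make the naming rule concrete, here is a small sketch that mimics it on a plain string. The function fillPlaceholders is a hypothetical helper written for this illustration. It is not part of PHPWord and does not reflect how TemplateProcessor works internally:

```php
<?php
/**
 * Illustrative only: mimics the naming rule of setValue() on a plain string.
 * Hypothetical helper, not PHPWord's implementation.
 */
function fillPlaceholders(string $text, array $values): string
{
    $replacements = [];
    foreach ($values as $name => $value) {
        // The caller passes 'name'; the template contains '${name}'.
        $replacements['${' . $name . '}'] = $value;
    }

    return strtr($text, $replacements);
}

$template = 'Dear ${name}, your letter is dated ${date}.';
echo fillPlaceholders($template, ['name' => 'John Doe', 'date' => '01-02-2018']), "\n";
// prints: Dear John Doe, your letter is dated 01-02-2018.
```

The caller only ever deals with the bare names, while the ${...} markers stay a detail of the template.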

The template processor can do more than just set values, but it is rather limited. It is, for example, not possible to put multiple paragraphs into one of those placeholders. They only hold single-line strings. If you want to take a closer look at the TemplateProcessor, check out the corresponding docs.

The main advantage of this method is the freedom to create your templates in Microsoft Word. You are free to take full advantage of the settings Word offers and you are not limited by PHP. But if you need to insert multiple paragraphs or more complex structures like tables, this won’t be the way for you. So let’s take a look at the next method and build documents completely in PHPWord.

Create a Word document from scratch

To generate a Word document you need to create a PhpWord object which you then fill with the contents:

$phpWord = new PhpWord();

To create your Word file from this object you need to save it using the Word2007 Writer:

$objWriter = IOFactory::createWriter($phpWord, 'Word2007');
$objWriter->save('MyDocument.docx');

It’s no use saving an empty document, though, so let’s put some content in there.

PHPWord has two basic objects: containers and elements. Unsurprisingly, containers can hold elements as well as other containers. Some examples for containers are headers, footers, textruns (which represent paragraphs), tables, rows and cells. Elements are texts, links, images, checkboxes and more. Most containers and elements can receive style declarations which we will not cover here. Please read the docs for further information on styling.

The basic container that holds all the others is the section, which has to be added directly to the PhpWord object we just created. So if you only want to produce a simple text document, all you have to do is add some text to the section. Basically you need the following code:

$phpWord = new PhpOffice\PhpWord\PhpWord();
$section = $phpWord->addSection();
$section->addText('Hello World');
 
$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');
$objWriter->save('MyDocument.docx');

And there you have your first document! Granted, it’s a blank page with "Hello World" as content. So let’s spice it up a bit and create a header, a footer and a body containing a table (check out the PHPWord Samples for the complete table example and others).

$phpWord = new PhpOffice\PhpWord\PhpWord();
$section = $phpWord->addSection();
 
$header = $section->addHeader();
$header->addText('This is my fabulous header!');
 
$footer = $section->addFooter();
$footer->addText('Footer text goes here.');
 
$textrun = $section->addTextRun();
$textrun->addText('Some text. ');
$textrun->addText('And more Text in this Paragraph.');
 
$textrun = $section->addTextRun();
$textrun->addText('New Paragraph! ', ['bold' => true]);
$textrun->addText('With text...', ['italic' => true]);
 
$rows = 8;
$cols = 5;
$section->addText('Basic table', ['size' => 16, 'bold' => true]);
 
$table = $section->addTable();
for ($row = 1; $row <= $rows; $row++) {
    $table->addRow();
    for ($cell = 1; $cell <= $cols; $cell++) {
        // each cell is 1750 twips wide
        $table->addCell(1750)->addText("Row {$row}, Cell {$cell}");
    }
}
 
$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');
$objWriter->save('MyDocument.docx');

Step by step: Let’s take a closer look at this

First we create our PhpWord object and add a section to it. Then we add a header to the section and add some text. The footer follows accordingly. Those two objects will then be displayed on every page of your document (as expected). But PHPWord of course provides more options for headers and footers. You can create a special header for the first page, you can add tables for more precise text positioning, your footer can show page numbers etc.
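
A short sketch of two of these options, a separate first-page header and page numbers in the footer. It assumes a Composer install of PHPWord; the output file name HeaderFooterDemo.docx and the texts are made up for illustration:

```php
<?php
require_once __DIR__ . '/vendor/autoload.php'; // assumes a Composer install

use PhpOffice\PhpWord\Element\Header;
use PhpOffice\PhpWord\IOFactory;
use PhpOffice\PhpWord\PhpWord;

$phpWord = new PhpWord();
$section = $phpWord->addSection();

// Header shown on the first page only.
$firstHeader = $section->addHeader(Header::FIRST);
$firstHeader->addText('Cover page header');

// Default header for all following pages.
$header = $section->addHeader();
$header->addText('Regular header');

// {PAGE} and {NUMPAGES} are field placeholders that Word fills in.
$footer = $section->addFooter();
$footer->addPreserveText('Page {PAGE} of {NUMPAGES}');

// Two pages, so the difference between the headers becomes visible.
$section->addText('Page one');
$section->addPageBreak();
$section->addText('Page two');

IOFactory::createWriter($phpWord, 'Word2007')->save('HeaderFooterDemo.docx');
```

Check the result in Word rather than relying on this sketch alone, since the exact behaviour may differ between PHPWord versions.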

Next we add some text. As you can see in the example, we add two textruns and add some text to each of them. The text parts can be styled individually, as you can see in the second paragraph: the first part ("New Paragraph!") is bold, while the next ("With text...") is italic. If you add text to the section directly, PHPWord will create a textrun for you, so the text will be in a new paragraph. However, this paragraph will be in the given style throughout and you cannot emphasize parts of it through styling. You can see this in the example in the code line $section->addText('Basic table', ['size' => 16, 'bold' => true]);

Furthermore we add a table to the section. A table obviously needs some rows and cells. As we only want a basic table to show how it’s done we use a loop for adding eight rows to our table. Again in a loop we add five cells (1750 twips wide) to each row displaying the row and cell count as text.

Finally, we save the PhpWord object using the Word2007 writer. The resulting document MyDocument.docx is shown in fig. 3.

Fig. 3: A table with eight rows and five cells

As you can see, PHPWord is straightforward. You always start with a new PhpWord object and add a section. This section holds all the elements of your document, be it a header, footer or other elements you want displayed in the document body.

It’s up to you how complex it gets. PHPWord supports lists, charts, watermarks, images, interactive elements like input fields and checkboxes, and much more. Check out the docs and look through the samples in the GitHub repository.

The advantage of this approach is the ability to dynamically create very complex documents using only PHP. You are completely free to style and arrange your content, and everything is handled in your code. The disadvantage is that managing all the detailed settings of a Word document in PHPWord (like margins of pictures, logos, page margins, text styles and so on) can lead to messy code if you don’t pay close attention from the start, especially if you want to create your documents in different styles for different clients.

It may be more desirable for you to concentrate only on creating your content in PHP, while keeping template files in your clients’ designs on your server. This can be achieved with merging …

Go crazy and merge pre-existing templates with created files!

Imagine you need to create documents that contain complex content and need to be in the corporate design of your different recipients or customers. Ideally you can create the document content dynamically in PHP using PHPWord and merge this content into existing corporate design templates stored on your server. This way, if the corporate design of your customers changes, you only need to replace the template, while your content generation remains untouched.

Strictly speaking this method has nothing to do with PHPWord, but combining the benefits of the previously introduced methods of templating and content creation seems the logical next step.

Let’s take a general look at how the merging can be done taking advantage of the XML structures of Word documents. A .docx file can be unpacked like a .zip archive with an archive tool like 7zip. If you unpack a .docx file you will see the following base structure:

- /_rels
- /docProps
- /word
|- /_rels
|- /theme
|- document.xml
|- fontTable.xml
|- numbering.xml
|- settings.xml
|- styles.xml
|- stylesWithEffects.xml
|- webSettings.xml
- [Content_Types].xml

There might be additional files, for example a header1.xml or a footer1.xml, or additional folders, but for our example we concentrate on the base files in the /word folder.
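
You don’t need an external archive tool to inspect this structure. The following sketch lists the entries of a .docx with PHP’s ZipArchive; listDocxEntries is a hypothetical helper name chosen for this example:

```php
<?php
/**
 * Returns the names of all entries inside a .docx (or any other zip) file.
 * Hypothetical helper for illustration.
 */
function listDocxEntries(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open {$path} as a zip archive");
    }

    $names = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $names[] = $zip->getNameIndex($i);
    }
    $zip->close();

    return $names;
}

// Usage, assuming MyDocument.docx from the previous chapter exists:
if (is_file('MyDocument.docx')) {
    print_r(listDocxEntries('MyDocument.docx'));
}
```

The entry names come back as paths relative to the archive root, for example word/document.xml.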

All of your document’s content can be found in document.xml. If you take a look you’ll see something like this:

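<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:body>
        <w:p>
            <w:r>
                <w:rPr/>
                <w:t xml:space="preserve">Some text. </w:t>
            </w:r>
            <w:r>
                <w:rPr/>
                <w:t xml:space="preserve">And more Text in this Paragraph.</w:t>
            </w:r>
        </w:p>
        <w:p>
            ...
        </w:p>
        ...
        <w:p>
            ...
        </w:p>
    </w:body>
</w:document>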

You see an XML structure with the root node being w:document. It contains a w:body that in turn contains all the paragraphs w:p.

In your templates you need to mark the place where the content should be merged in. You can do this by putting a placeholder variable there, e.g. $CONTENT$. Now you can use an XPath parser, like PHP’s own DOMXPath class, to locate this placeholder. As text in a Word file is always stored in a w:p element, you only need to find the w:p tag containing your placeholder. The XPath to query for the placeholder is something like

//w:p[contains(translate(normalize-space(), " ", ""), "$CONTENT$")]

In the XML of your dynamically created document all your content can be found via the XPath //w:document/w:body/*[not(self::w:sectPr)].
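
Both queries can be tried out on a tiny in-memory document before touching real files. The XML string below is a made-up stand-in for word/document.xml; real files carry far more markup:

```php
<?php
// A tiny stand-in for word/document.xml, made up for this example.
$ns  = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
$xml = '<w:document xmlns:w="' . $ns . '"><w:body>'
     . '<w:p><w:r><w:t>Intro</w:t></w:r></w:p>'
     . '<w:p><w:r><w:t>$CONTENT$</w:t></w:r></w:p>'
     . '<w:sectPr/>'
     . '</w:body></w:document>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('w', $ns);

// Paragraph that holds the marker, ignoring any spaces inside it.
$marker = $xpath->query(
    '//w:p[contains(translate(normalize-space(), " ", ""), "$CONTENT$")]'
)->item(0);

// Everything in the body except the section properties.
$content = $xpath->query('//w:document/w:body/*[not(self::w:sectPr)]');

echo $marker->textContent, "\n"; // prints: $CONTENT$
echo $content->length, "\n";     // prints: 2
```

Note that the queries are written in single-quoted PHP strings, so PHP does not try to interpolate $CONTENT as a variable.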

So let’s merge the content of our Table example from the last chapter into a nice template with a header, footer and watermark in the background (fig. 4).

Fig. 4: A template with header, footer, and watermark

This template is named MergeTemplate.docx. You can see that it only contains the $CONTENT$-marker. The code you need to merge the contents from MyDocument.docx (meaning the two paragraphs and the table) into this template is as follows:

$templateFile  = "MergeTemplate.docx";
$generatedFile = "MyDocument.docx";
$targetFile    = "MergeResult.docx";
 
// copy template to target
copy($templateFile, $targetFile);
 
// open target
$targetZip = new \ZipArchive();
$targetZip->open($targetFile);
$targetDocument = $targetZip->getFromName('word/document.xml');
$targetDom      = new DOMDocument();
$targetDom->loadXML($targetDocument);
$targetXPath = new \DOMXPath($targetDom);
$targetXPath->registerNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
 
// open source
$sourceZip = new \ZipArchive();
$sourceZip->open($generatedFile);
$sourceDocument = $sourceZip->getFromName('word/document.xml');
$sourceDom      = new DOMDocument();
$sourceDom->loadXML($sourceDocument);
$sourceXPath = new \DOMXPath($sourceDom);
$sourceXPath->registerNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");
 
/** @var DOMNode $replacementMarkerNode node containing the replacement marker $CONTENT$ */
$replacementMarkerNode = $targetXPath->query('//w:p[contains(translate(normalize-space(), " ", ""),"$CONTENT$")]')[0];
 
// insert source nodes before the replacement marker
$sourceNodes = $sourceXPath->query('//w:document/w:body/*[not(self::w:sectPr)]');
 
foreach ($sourceNodes as $sourceNode) {
    $imported = $replacementMarkerNode->ownerDocument->importNode($sourceNode, true);
    $inserted = $replacementMarkerNode->parentNode->insertBefore($imported, $replacementMarkerNode);
}
 
// remove $replacementMarkerNode from the target DOM
$replacementMarkerNode->parentNode->removeChild($replacementMarkerNode);
 
// save target
$targetZip->addFromString('word/document.xml', $targetDom->saveXML());
$targetZip->close();

First you create the target file by copying the template file to the name MergeResult.docx. You can skip this step if you don’t need to keep your template file. Next you open your template and the source for the content with ZipArchive or another library of your liking. To find the DOMNode containing your $CONTENT$ placeholder, you use the XPath query introduced earlier.

To fetch the contents from MyDocument.docx you want to get all nodes in the document body that are not section properties (sectPr, which contain for example style information). Then you take your $CONTENT$ node as reference and insert the found contents before it. Lastly you have to delete your marker node and save the XML.

Fig. 5: Dynamically created content in a Word-created template

And finally our dynamically created content is merged into our nice template.

There are some stumbling blocks, though, depending on how complex your merge project gets. We won’t go into the specifics here, because the XML of Word documents is a whole article of its own, but let’s list some of them:

  • Styles and settings in Word documents are referenced by IDs. Of course those differ in your template and your generated Word. So when merging the two you need to use the correct IDs to reference styles and settings to text.
  • Word breaks its text strings up for version control purposes. So make sure your placeholder variable in the templates is not broken up before attempting to replace it.
  • In the /word/document.xml of your template you’ll find the section properties to your placeholder section. Copy those to every section of your generated content.
  • Remember to copy the styling files from your generated Word to the final merge result.
  • If your generated document contains pictures or other external content, remember to copy those files into your target Word document and make sure the IDs are still correct.
  • Word is very complex in styling lists. So if your content contains styled lists, it’s easier to tell PHPWord to do the styling and copy the corresponding files rather than trying to have the styling defined in the template files and linking it to your lists.
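
As an example for the second point, the following sketch checks whether a marker still sits in one single w:t text node or has been split across several runs. markerIsIntact is a hypothetical helper written for this illustration, not part of PHPWord, and the XML strings are simplified stand-ins for a real document.xml:

```php
<?php
/**
 * Checks whether the marker sits in one single w:t text node.
 * Hypothetical helper for illustration, not part of PHPWord.
 */
function markerIsIntact(string $documentXml, string $marker): bool
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    foreach ($xpath->query('//w:t') as $textNode) {
        if (strpos($textNode->textContent, $marker) !== false) {
            return true;
        }
    }

    return false;
}

$ns     = 'xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"';
$intact = "<w:document $ns><w:body><w:p><w:r><w:t>\$CONTENT\$</w:t></w:r></w:p></w:body></w:document>";
$split  = "<w:document $ns><w:body><w:p><w:r><w:t>\$CON</w:t></w:r><w:r><w:t>TENT\$</w:t></w:r></w:p></w:body></w:document>";

var_dump(markerIsIntact($intact, '$CONTENT$')); // bool(true)
var_dump(markerIsIntact($split, '$CONTENT$'));  // bool(false)
```

If the check fails for one of your templates, retyping the placeholder in Word in one go is usually the simplest thing to try.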

Obviously the devil is in the details. There may be some more difficulties while merging depending on your use case, but this article only wants to give you the general idea of how it can be done. The basic merging example here doesn’t touch on styled text or media. Dive into the XML structure of Word documents to make sure you understand how your merged document has to be put together.

Final thoughts

It is worth mentioning that at the time of writing PHPWord is still in version 0.14, so you may encounter functionality in Word that is not yet implemented in the library. Additionally, the GitHub repository was without a maintainer for a while, so there are some pending pull requests. But a new maintainer has been found, so work on this library continues.

But: PHPWord is a powerful tool to create Word documents and hopefully one of the described methods will help you generate your perfect Word documents.

Photo: Kelly Sikkema (Unsplash)

Comments

16 responses to "Three ways to create Word documents with PHPWord"

  1. Three ways to create Word documents with PHPWord https://t.co/p3NN530ItE via @mayflowergmbh

  2. Thank you. Good article. I was trying to use the 3rd option to merge a template (docm) and a Word docx file into a "docm" target file. The merge is successful, but opening the target file in MS Word results in the following errors:
    1) We’re sorry. We can’t open "file.docm" because we found a problem with its contents.
    On selecting OK, I get
    2) Word found unreadable content in "file.docm". Do you want to recover the contents of this document? If you trust the source of this document, click Yes.
    On clicking Yes, the "docm" opens.

    What might be the issue in merging? I also want to copy media files.

  3. Thanks for this tutorial. Been looking for an easier integration for my site to allow clients to download content. I hope this will work.

  4. Maria Haubner

    Hey Arun,
    thanks for your feedback!

    As general advice when using the 3rd option for merging documents, I can only encourage you to delve deeper into Microsoft’s XML structure (the link is in the article). That’s also what you need to do to include media files like pictures in the merging process. Going into detail on this topic would greatly exceed this answer here.

    Good luck :)

  5. Richard

    There is an error in the table creation:
    for ($row = 1; $row >= 8; $row++)
    should be:
    for ($row = 1; $row <= 8; $row++)
    and
    for ($cell = 1; $cell >= 5; $cell++)
    should be
    for ($cell = 1; $cell <= 5; $cell++)

  6. Richard

    Not sure why but the blog removes some code – anyway check these lines

  7. Joan Varon

    This article was awesome, thanks!

  8. Daniel

    I LOVE YOU !!!!

    Save my life !!!

  9. Pintér Ferenc

    Thank You for the valuable content! I have only one question: Trying the third method I experienced that formattings of MyDocument.docx can disappear in the MergeResult.docx target file. Can You tell me what should I change in the code to keep the formattings? Many thanks in advance!

  10. Hey Pintér Ferenc
    I’m glad to hear the article could help you! :) As stated in the article, styles and settings are referenced by IDs. Have you checked that those IDs are the correct ones in the resulting docx? I remember that issue being a bit finicky but if the IDs were referencing the right styles and the referenced xml has been copied, it should work (assuming word didn’t change its inner workings).
    I hope you can fix your issue and good luck with your project!

    1. Pintér Ferenc

      Thanks for the quick answer! Sorry, I am not familiar with xml, so can You be a bit more specific on this „id“ topic? Where can I find the ids and they look like? If I unzip my MergeResult.docx target file, In the document.xml part I see the code detail that probably creates the table (I am just guessing) Unfortunately the table loses its original formats (border color, bgcolor of head cell) and appears as a no-border, no bgcolor table. The code part reads as follows:

      …GOOD AND BADRow …

      Is this the part I need to modify? Can You tell me where I find the crucial ids in it?

      Many thanks in advance!

      1. Pintér Ferenc

        The code is not displayed correctly :(((

        1. Here you can find documentation on the XML of Word documents at the time I was writing the article: https://docs.microsoft.com/en-us/previous-versions/office/developer/office-2007/bb266220(v=office.12)?redirectedfrom=MSDN#open-packaging-conventions-for-the-word-xml-format
          As you can see, there should be a styles.xml. Those styles should have an ID property. If the structure has changed since I wrote the article, then I can not advise you on how to handle that.

          I know there is a lot to untangle in the XML files and corresponding documentation, but I highly advise you to read up on that. Apart from that it could help if you unzip your template.docx and compare the xml files with your result.
          As stated in the article, going into deeper detail on that topic is a whole new article on its own…

          Sorry that I can not be of more help to you but since writing the article I actually didn’t have to deal with Word XML so I can’t give you working sandbox or something like that…

          1. Pintér Ferenc

            Thank You very much!

  11. Martin

    Source document had a bullet numbering after merge with template docx using XML bullets were converted into numbering.

    Thanks in advance!
