Natural Language Understanding (NLU) with PHP

November 22, 2018

Anyone can write an Alexa Skill in PHP, but what about NLU, natural language understanding? There are some tools for it, but, as is often the case in machine learning, they are written in Python. Below, I show you how to use RASA NLU from PHP.

Why not use tools from Amazon or Google?

Of course, that would make life a lot easier, but nobody really knows how long data lives in the networks of those global companies. And what about data protection?

So, from my point of view, there is only one solution: host it yourself using an existing open-source project. Unless your training data is very complex, the hardware requirements are modest.

NLU vs. NLP – some basics

The first thing to know is that NLU is not NLP: natural language understanding is not natural language processing. If you want to read more about the differences, I recommend this blog post. In brief, NLP is the complete ecosystem for human-machine interaction, while NLU is “only” the AI part that takes unstructured data (text) and handles:

bringing it into a form machines can understand
evaluating an intent
extracting meaningful entities
offering a way to train models

Let’s dig a little bit deeper.

Intent

Understanding the intent, i.e. what the other person means, is tricky even in human interactions. What is the other person actually trying to say? But that is a different story altogether.

Here, we want to train machines to understand the intent of a sentence. Take the following example:

I have to stay at home until Friday.

This sentence can have multiple intents, which makes it hard to understand:

The narrator has kids and no nanny for the upcoming week.
The narrator is ill.
The narrator has to work remotely for some reason.

Those intents have almost nothing in common except that the narrator will be back on Friday.

Entity

For all of those intents, it is important to extract useful values from the sentence. Take the example from above: which value should we try to get? “Friday”, I’d say.

Why? Depending on the intent, this information is really useful, so let us store it in a variable, say last_day. Given an intent like “sick report”, this value can be used to create, for instance, a Jira ticket with the subject “Sick: Maximilian Berghoff [date of today] – [date of last day]”.
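To make this concrete, here is a minimal PHP sketch that turns a weekday entity value such as “friday” into a date and builds the ticket subject from it. The function name buildSickReportSubject and the Y-m-d date format are my own assumptions for illustration; it also assumes the absence starts today. Creating the actual Jira ticket is out of scope.

```php
<?php
// Hypothetical helper (not from the post's repository): builds the subject of a
// sick-report ticket from the extracted "last day" entity, e.g. "friday".
// Assumes the absence starts today and the entity is an English weekday name.

function buildSickReportSubject(string $name, string $lastDay, DateTimeImmutable $today): string
{
    $lastDay = strtolower(trim($lastDay));
    $weekdays = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'];
    if (!in_array($lastDay, $weekdays, true)) {
        throw new InvalidArgumentException("Unsupported weekday: $lastDay");
    }

    // A bare weekday name moves to the next day with that name,
    // or stays on $today if today already is that weekday.
    $last = $today->modify($lastDay);

    return sprintf('Sick: %s %s – %s', $name, $today->format('Y-m-d'), $last->format('Y-m-d'));
}

// 2018-11-22 was a Thursday, so "until friday" ends on 2018-11-23.
echo buildSickReportSubject('Maximilian Berghoff', 'friday', new DateTimeImmutable('2018-11-22')), PHP_EOL;
// prints "Sick: Maximilian Berghoff 2018-11-22 – 2018-11-23"
```

Validating the weekday before calling modify() matters: an unexpected entity value would otherwise be passed straight to PHP's relative date parser.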

Model

To get a machine to understand our intent and extract the entities, we need to train it. For this blog post, I used RASA NLU.

The training file, which contains both the pipeline configuration and the training data, looks like this:

language: "en"

pipeline:
- name: "nlp_spacy"
  model: "en"
- name: "tokenizer_spacy"
- name: "ner_crf"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"

data: |
  ## intent:report_sick_duration
  - I will stay home for [3 days](duration)
  - I can not come the next [4 days](duration)
  - I will stay in bed for the next [5 days](duration)

  ## intent:report_sick_from_to
  - I will stay home until [friday](last)
  - My doctor suggests me to stay home until [friday](last)
  - I am absent from [monday](first) to [friday](last)
  - I will be back on [friday](last)

  ## lookup:duration
  - 1 day
  - 2 days
  - a week

  ## lookup:first
  - monday
  - tuesday
  - wednesday
  - thursday
  - friday

  ## lookup:last
  - monday
  - tuesday
  - wednesday
  - thursday
  - friday

Let’s look at the data key, which contains the intents. I split the sick report into two intents: one to notify your colleagues or boss that you are absent due to sickness from X to Y, and one to give a duration (number of days).

Having two different intents is useful, as I can define two different sets of entities.
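If you maintain the examples in PHP, a small helper can render them into the Markdown-style format used under the data key (## intent: and ## lookup: headers, one example per “- ” line, entities annotated inline as [value](entity)). This is a sketch; the function name toRasaMarkdown and the array layout are assumptions for illustration.

```php
<?php
// Hypothetical helper: renders intents and lookup tables into the Markdown
// training format used under the "data" key of the training file.
// Entity annotations ([value](entity)) are expected inside the example strings.

/**
 * @param array<string, list<string>> $intents intent name => example sentences
 * @param array<string, list<string>> $lookups entity name => possible values
 */
function toRasaMarkdown(array $intents, array $lookups = []): string
{
    $lines = [];
    foreach ($intents as $intent => $examples) {
        $lines[] = "## intent:$intent";
        foreach ($examples as $example) {
            $lines[] = "- $example";
        }
        $lines[] = ''; // blank line between sections
    }
    foreach ($lookups as $entity => $values) {
        $lines[] = "## lookup:$entity";
        foreach ($values as $value) {
            $lines[] = "- $value";
        }
        $lines[] = '';
    }
    return implode("\n", $lines);
}

echo toRasaMarkdown(
    ['report_sick_duration' => ['I will stay home for [3 days](duration)']],
    ['duration' => ['1 day', '2 days', 'a week']]
);
```

When pasting the result under data: | in the YAML file, each line has to be indented (two spaces in the example) so that it stays part of the block scalar.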

For the first intent (from X to Y), I need a date for the first and the last day. At least the last day must be given; if the first day is missing, let’s assume today.

For the second intent, I only need the duration, again assuming today as the first day.
In the example, I also used lookup tables to list the possible values for each entity. These lists are taken into account during the training process.
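These rules translate into a few lines of date handling on the PHP side. A minimal PHP 8 sketch, assuming entities arrive in the list shape RASA NLU returns (each item with entity and value keys), weekday values are English day names, and a duration of N days includes today. The last day is required for the from-to intent and the first day defaults to today. All function names are hypothetical, and intents are matched by suffix (_from_to, _duration) rather than by their full names.

```php
<?php
// Hypothetical sketch: turns a parsed intent and its entities into an absence
// period [first day, last day].

function resolveWeekday(string $day, DateTimeImmutable $from): DateTimeImmutable
{
    $day = strtolower(trim($day));
    $valid = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'];
    if (!in_array($day, $valid, true)) {
        throw new InvalidArgumentException("Unsupported weekday: $day");
    }
    return $from->modify($day); // next occurrence of $day, counting $from itself
}

function durationInDays(string $duration): int
{
    if (!preg_match('/^(\d+|a|one)\s+(day|week)s?$/i', trim($duration), $m)) {
        throw new InvalidArgumentException("Unsupported duration: $duration");
    }
    $count = ctype_digit($m[1]) ? (int) $m[1] : 1; // "a" / "one" count as 1
    if ($count < 1) {
        throw new InvalidArgumentException("Duration must be positive: $duration");
    }
    return strtolower($m[2]) === 'week' ? $count * 7 : $count;
}

/** @return array{0: DateTimeImmutable, 1: DateTimeImmutable} */
function resolveAbsence(string $intent, array $entities, DateTimeImmutable $today): array
{
    $values = [];
    foreach ($entities as $entity) {
        $values[$entity['entity']] = $entity['value'];
    }

    if (str_ends_with($intent, '_from_to')) {
        if (!isset($values['last'])) {
            throw new InvalidArgumentException('The last day is missing.');
        }
        $first = isset($values['first']) ? resolveWeekday($values['first'], $today) : $today;
        // Resolve the last day relative to the first one, so "monday to friday" stays in order.
        return [$first, resolveWeekday($values['last'], $first)];
    }

    if (str_ends_with($intent, '_duration')) {
        if (!isset($values['duration'])) {
            throw new InvalidArgumentException('The duration is missing.');
        }
        // "3 days" starting today means today plus the two following days.
        $days = durationInDays($values['duration']);
        return [$today, $today->modify('+' . ($days - 1) . ' days')];
    }

    throw new InvalidArgumentException("Unknown intent: $intent");
}
```

Resolving the last weekday relative to the first one is a deliberate choice: resolving both against today would turn “from monday to friday”, said on a Thursday, into an end date before the start date.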

Although this looks like a set of patterns for looking up intents and entities, it isn’t one.

In the end, the machine should also recognize sentences that are not on this list. Until it does so with acceptable confidence, we have to add more examples like those above.

Tool to use – RASA NLU

There are other software tools out there to do NLU, but I decided to do it with RASA NLU. It has all the features we need for our use case, and it is quite easy to install:

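pip install "rasa_nlu[spacy]"
# OR (to get the bleeding edge):
# git clone https://github.com/RasaHQ/rasa_nlu.git
# cd rasa_nlu
# pip install -r requirements.txt
# pip install -e .

# The spaCy pipeline above also needs an English language model:
python -m spacy download en

# Start the HTTP API (port 5000 by default):
python -m rasa_nlu.server --path projects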

Now you can create a configuration and start training. There is one small downside, though: RASA NLU is written in Python. What if you run a PHP application? In that case, I recommend using RASA’s HTTP API and calling it from your application with simple curl requests.
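A minimal sketch of such a curl call from PHP, assuming the legacy rasa_nlu server used in this post, listening on port 5000 and exposing POST /parse, which takes a JSON body with the text in q plus project and, optionally, model. Later Rasa versions use different endpoints, so check your server’s documentation. The function names are hypothetical; only buildParseBody can be checked without a running server.

```php
<?php
// Sketch of a tiny RASA NLU client using ext-curl and ext-json.
// Assumes the legacy rasa_nlu HTTP server, whose POST /parse endpoint
// accepts {"q": ..., "project": ..., "model": ...}.

/** Builds the JSON request body for POST /parse. */
function buildParseBody(string $text, string $project, ?string $model = null): string
{
    $payload = ['q' => $text, 'project' => $project];
    if ($model !== null) {
        $payload['model'] = $model; // optional: ask for a specific model
    }
    return json_encode($payload, JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR);
}

/** Sends $text to the server and returns the decoded JSON response. */
function rasaParse(string $baseUrl, string $text, string $project, ?string $model = null): array
{
    $ch = curl_init(rtrim($baseUrl, '/') . '/parse');
    if ($ch === false) {
        throw new RuntimeException('Could not initialize curl.');
    }
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => buildParseBody($text, $project, $model),
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT => 10,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error = curl_error($ch);

    if (!is_string($body) || $status !== 200) {
        throw new RuntimeException("RASA request failed (HTTP $status) $error");
    }
    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}

// Usage (requires a running server):
// $result = rasaParse('http://localhost:5000', 'I will be back on friday', 'illness_report');
// echo $result['intent']['name'], PHP_EOL;
```

Keeping the body builder separate from the transport makes the request format testable without a server and keeps the curl part easy to swap for an HTTP client library.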

PHP Integration

For a talk at PHP Central Europe, I prepared some code to show that it is possible to integrate NLU into PHP without implementing NLU in PHP. You can check out the repository; have a look at rasa_client/lib in particular. This code should be enough to do some basic requests and get meaningful models back. For our example (sick report), I also wrote a command-line application with Symfony.

This is not mandatory at all; it is just the fastest way for me to call the code and get readable output.

To try the example, start both Docker containers and open a shell in the one for the app code:

$ cd examples/
$ docker-compose up -d
$ docker exec -it rasa-nlu-client sh
$ cd /app/src/
$ bin/console

The command bin/console should return a list looking like this:

rasa
  rasa:nlu:parse          Parse a given text for its intents.
  rasa:nlu:remove-model   Remove a training model.
  rasa:nlu:status         return the available projects and models
  rasa:nlu:train          Train a project by a well-defined training data. For the training data, you should have a look at https://rasa.com/docs/nlu/dataformat/

… which gives an overview of the available commands. So let’s try them one by one.

status

$ bin/console rasa:nlu:status
Got following projects

Project: illness_report
=======================

currently training:0
----------------------- 
Available Model        
----------------------- 
model_20181027-164038  
model_20181027-173358  
----------------------- 

----------------------- 
Loaded Model           
----------------------- 
model_20181027-173358  
-----------------------

This gives an overview of the available models and of any training currently running. Models listed under “Loaded Model” are held in memory, so they give you the fastest answers.
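The same information can be fetched over HTTP from GET /status. A sketch, assuming the response shape of the legacy rasa_nlu server (an available_projects map whose entries contain available_models and loaded_models lists); please check this against your server’s actual output. The helper name modelsNotLoaded is hypothetical: it lists models that exist but are not yet held in memory.

```php
<?php
// Sketch: reads the decoded JSON of GET /status (legacy rasa_nlu server) and
// lists the models of a project that are available but not loaded into memory.
// Assumed response shape:
// ['available_projects' => ['<project>' => [
//     'available_models' => [...], 'loaded_models' => [...], ...]]]
// Fetching it could look like:
// $status = json_decode(file_get_contents('http://localhost:5000/status'), true);

function modelsNotLoaded(array $status, string $project): array
{
    $info = $status['available_projects'][$project] ?? null;
    if ($info === null) {
        throw new InvalidArgumentException("Unknown project: $project");
    }
    $available = $info['available_models'] ?? [];
    $loaded = $info['loaded_models'] ?? [];

    // array_values() re-indexes the result of array_diff().
    return array_values(array_diff($available, $loaded));
}
```

For the status output shown above, this would return only model_20181027-164038.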

train

$ bin/console rasa:nlu:train --project=illness_report data/config_train_illness_report.yml

new model trained
=================

Created Model: model_20181029-062412

This posts a valid training data file to a project (I used the one shown above) to train a new model. You can also specify a model with --model to train an existing one.
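Behind such a command, training is a single HTTP request. A sketch, assuming the legacy rasa_nlu server’s POST /train endpoint, which takes the project (and optionally a model name) as query parameters and the YAML training file as the request body with the content type application/x-yml. The function names are hypothetical, and since the exact response format may differ between versions, the raw response body is returned.

```php
<?php
// Sketch: posts a training file to the legacy rasa_nlu server (POST /train).
// The project and an optional model name go into the query string,
// the YAML training file into the request body.

function buildTrainUrl(string $baseUrl, string $project, ?string $model = null): string
{
    $query = ['project' => $project];
    if ($model !== null) {
        $query['model'] = $model;
    }
    return rtrim($baseUrl, '/') . '/train?' . http_build_query($query);
}

function rasaTrain(string $baseUrl, string $project, string $yamlFile, ?string $model = null): string
{
    $yaml = file_get_contents($yamlFile);
    if ($yaml === false) {
        throw new RuntimeException("Cannot read $yamlFile");
    }
    $ch = curl_init(buildTrainUrl($baseUrl, $project, $model));
    if ($ch === false) {
        throw new RuntimeException('Could not initialize curl.');
    }
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $yaml,
        CURLOPT_HTTPHEADER => ['Content-Type: application/x-yml'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT => 600, // training can take a while
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if (!is_string($body) || $status !== 200) {
        throw new RuntimeException("Training failed (HTTP $status)");
    }
    return $body; // raw server response
}

// Usage (requires a running server):
// echo rasaTrain('http://localhost:5000', 'illness_report', 'data/config_train_illness_report.yml');
```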

Status (with the new model)

$ bin/console rasa:nlu:status
Got following projects

Project: illness_report
=======================

currently training:0
 ----------------------- 
  Available Model        
 ----------------------- 
  model_20181027-164038  
  model_20181027-173358  
  model_20181029-062412
 ----------------------- 

 ----------------------- 
  Loaded Model           
 ----------------------- 
  model_20181027-173358  
 -----------------------

After training with new data without specifying a model, the newly created model shows up in the status list. Note that it is not loaded yet: the “Loaded Model” section still lists the previous one.
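Because the model names shown here embed a timestamp (model_YYYYMMDD-HHMMSS), the newest model can be picked by sorting the names as strings, for example to pass it explicitly as the model when parsing. This sketch relies on that naming scheme, which is an assumption based on the output above; models with custom names are ignored. The function name latestModel is hypothetical.

```php
<?php
// Sketch: picks the newest model from a list of timestamped model names such as
// "model_20181029-062412". Zero-padded timestamps sort correctly as plain strings.

function latestModel(array $models): ?string
{
    $timestamped = array_values(array_filter(
        $models,
        static fn (string $name): bool => preg_match('/^model_\d{8}-\d{6}$/', $name) === 1
    ));
    if ($timestamped === []) {
        return null; // no model follows the timestamp naming scheme
    }
    rsort($timestamped, SORT_STRING); // newest first
    return $timestamped[0];
}

echo latestModel(['model_20181027-164038', 'model_20181027-173358', 'model_20181029-062412']), PHP_EOL;
// prints "model_20181029-062412"
```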

Parse

$ bin/console rasa:nlu:parse --project=illness_report "I will be absent due to sickness until friday"

Intent: report_illness_from_to - Confidence: 0.8078944273721
============================================================

Entities found:
 ------ -------- ------- ----- ----------- ------------------ 
  Name   Value    start   end   extractor   confidence        
 ------ -------- ------- ----- ----------- ------------------ 
  last   friday   20      26    ner_crf     0.93667437133644  
 ------ -------- ------- ----- ----------- ------------------ 


Ranking:
 ------------------------- ----------------- ------------ 
  Pos.                      Name              Confidence  
 ------------------------- ----------------- ------------ 
  report_illness_from_to    0.8078944273721               
  report_illness_duration   0.1921055726279               
 ------------------------- ----------------- ------------

Now you can start parsing text. The answers are statistical: every intent you get back comes with a calculated confidence, nothing more. A confidence of 0.8 is quite OK, but more training data should increase it. You also get entities back, provided they are defined in your training data.
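Since every answer is only a probability, the application has to decide which confidence is good enough. A sketch, assuming the JSON shape the legacy rasa_nlu server returns from /parse (an intent object with name and confidence, and an entities list whose items carry entity and value); the threshold of 0.7 and the function name interpretParse are arbitrary choices for illustration.

```php
<?php
// Sketch: accepts the top intent only above a confidence threshold and
// collects the entities as a name => value map.

/** @return array{intent: ?string, entities: array<string, string>} */
function interpretParse(array $response, float $minConfidence = 0.7): array
{
    $intent = $response['intent'] ?? null;
    $accepted = is_array($intent) && ($intent['confidence'] ?? 0.0) >= $minConfidence;

    $entities = [];
    foreach ($response['entities'] ?? [] as $entity) {
        $entities[$entity['entity']] = $entity['value'];
    }

    return [
        // null means "not sure enough": ask again or add more training examples
        'intent' => $accepted ? $intent['name'] : null,
        'entities' => $entities,
    ];
}
```

With the parse result shown above (confidence 0.8078944273721), the default threshold accepts report_illness_from_to and returns last => friday; raising the threshold to 0.9 would reject it.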

Conclusion

As you can see, it is possible to implement NLU in your PHP application without writing Python yourself or falling back on AWS and the like. And you don’t need to rely on web services where you have no control over your data. Instead, you can use tools like RASA NLU that are quite easy to set up and let you analyze text using AI without leaving the PHP context.

How about you, have you used PHP to implement NLU? If so, what for?

I have been working as a developer at Mayflower since October 2014. In my spare time, I take care of the Symfony CMF; for example, I am the author of the so-called SeoBundle.

Comments

One response to “Natural Language Understanding (NLU) with PHP”

How to use PHP to integrate NLU without Python.

January 14, 2019

[…] This article first appeared in English, written by Maximilian Berghoff on the Mayflower blog. […]
