Anyone can write an Alexa Skill in PHP, but what about NLU, natural language understanding? There are now some tools for this, but they were written in Python, as is often the case in the machine learning environment. Below I show you how to do this with RASA NLU in PHP.
Why not use tools from Amazon or Google?
Of course, it would make life a lot easier, but nobody really knows about the lifetime of data in the networks of those global companies. And what about data protection?
So there can be only one solution from my point of view: set it up on your own by using an existing open source project. Unless your training data is too complex, it won’t really hurt you from a hardware point of view.
NLU vs. NLP – some basics
The first thing to know is that NLU is not NLP, which means natural language understanding is not natural language processing. If you want to read more about the differences, I recommend this blog post. In brief, NLP is the complete ecosystem for human-machine interaction, while NLU is “only” the AI part that sorts unstructured data (text) and brings it into a form machines can understand. This includes:
- evaluating an intent
- extracting meaningful entities
- the possibility to train models
Let’s dig a little bit deeper.
Intent
Understanding the intent (that is, what the other person actually means) is quite tricky even between humans. What is the other person really trying to say? But that is a different story altogether.
At the moment we try to train machines to understand the intent of a sentence. Let’s take a look at the following example:
I have to stay at home until Friday.
This sentence can have multiple intents which makes it quite hard to understand:
- The narrator has kids and no nanny for the upcoming week.
- The narrator is ill.
- The narrator has to work remotely for some reason.
Those intents have almost nothing in common except the fact that the narrator is back on Friday.
Entity
For those intents, it is important to grab some useful values out of this sentence. Let’s take the example from above. Which value should we try to get? It is “Friday,” I’d say.
Why? Depending on the intent, this information is really useful. So let us persist it in a kind of variable. Say it is the last_day of something. With an intent like a “sick report”, this value can be used to create, for instance, a Jira ticket with the subject “Sick: Maximilian Berghoff [date of today] – [date of last day]”.
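To make the idea concrete, here is a minimal sketch of how an extracted last_day entity could be turned into such a subject line. It is written in Python purely for illustration; the function names `next_weekday` and `sick_report_subject` are hypothetical, and the rule "take the next matching weekday, today included" is my own assumption, not something RASA NLU does for you.

```python
from datetime import date, timedelta

# Weekday names in the order used by date.weekday() (Monday == 0).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(today: date, weekday_name: str) -> date:
    """Return the next date (today included) that falls on the given weekday."""
    target = WEEKDAYS.index(weekday_name.lower())
    offset = (target - today.weekday()) % 7
    return today + timedelta(days=offset)


def sick_report_subject(name: str, today: date, last_day: str) -> str:
    """Build a ticket subject from the reporter's name and the last_day entity."""
    last = next_weekday(today, last_day)
    return f"Sick: {name} {today.isoformat()} - {last.isoformat()}"


if __name__ == "__main__":
    print(sick_report_subject("Maximilian Berghoff", date(2018, 10, 29), "friday"))
```

The entity value stays a plain string such as “friday”; turning it into a real date is the job of your application.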
Model
However, to get a machine to understand our intent and grab the entities, we need to train it. For this blog post, I used RASA NLU.
The training data looks like this:
language: "en"

pipeline:
- name: "nlp_spacy"
  model: "de"
- name: "tokenizer_spacy"
- name: "ner_crf"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"

data: |
  ## intent:report_sick_duration
  - I will stay home for [3 days](duration)
  - I can not come the next [4 days](duration)
  - I will stay in bed for the next [5 days](duration)

  ## intent:report_sick_from_to
  - I will stay home until [friday](last)
  - My doctor suggests me to stay home until [friday](last)
  - I am absent from [monday](first) to [friday](last)
  - I will be back on [friday](last)

  ## lookup:duration
  - 1 days
  - 2 days
  - a week

  ## lookup:first
  - monday
  - tuesday
  - wednesday
  - thursday
  - friday

  ## lookup:last
  - monday
  - tuesday
  - wednesday
  - thursday
  - friday
Let’s look into the data key. Here you can find the intents. I split up the sick report into two different intents: One to notify your colleagues/boss that you are absent due to sickness from X to Y and one to give a duration (number of days).
Having two different intents is useful, as I can define two different sets of entities.
For the first intent (from/to), I need a date for the first and the last day. At least the first day should exist, so if it is missing, let’s assume today as the first day.
For the second intent (duration), I only need the duration (and again assume today as the first day).
In that example, I also listed which values are possible for each entity (the lookup sections). This list will be enriched during the training process.
Although this looks like a kind of a pattern or algorithm to do the lookup for intent and entities, it isn’t.
In the end, the machine should recognize sentences which are not on this list. Until this is performed with acceptable confidence, we have to give more examples like those above.
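The `[text](entity)` markup in the examples above simply marks which part of a sentence is an entity. The following is a minimal sketch of how one such line can be read into plain text plus entity positions. It is a hypothetical helper in Python for illustration only, not the parser RASA NLU uses internally.

```python
import re

# Matches annotations of the form [value](entity_name).
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_example(line: str):
    """Split an annotated training example into plain text and entity spans."""
    text_parts = []
    entities = []
    cursor = 0   # position in the annotated line
    length = 0   # length of the plain text built so far
    for match in ANNOTATION.finditer(line):
        before = line[cursor:match.start()]
        text_parts.append(before)
        length += len(before)
        value, entity = match.group(1), match.group(2)
        entities.append({
            "entity": entity,
            "value": value,
            "start": length,
            "end": length + len(value),
        })
        text_parts.append(value)
        length += len(value)
        cursor = match.end()
    text_parts.append(line[cursor:])
    return "".join(text_parts), entities


if __name__ == "__main__":
    print(parse_example("I am absent from [monday](first) to [friday](last)"))
```

The start and end offsets refer to the plain sentence without the markup, which is the same kind of position information the parse command reports for each entity.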
Tool to use – RASA NLU
There are other software tools out there to do NLU, but I decided to do it with RASA NLU. It has all the features we need for our use case, and it is quite easy to install:
pip install rasa_nlu

# OR, to get the bleeding edge:
git clone https://github.com/RasaHQ/rasa_nlu.git
cd rasa_nlu
pip install -r requirements.txt
pip install -e .
You can pass/create some configurations now and start training. There is one little downside, though: RASA NLU is written in Python. So what if you run a PHP application? For that, I recommend using the HTTP API of RASA and integrating it into your application with simple curl requests.
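To show what such a request looks like on the wire, here is a minimal sketch. It is written in Python only for illustration; any language that can speak HTTP, PHP with curl included, can send the same request. The base URL `http://localhost:5000`, the `/parse` endpoint and the `q` and `project` fields are assumptions based on the HTTP API of the rasa_nlu versions from that time, so check them against the version you run.

```python
import json
import urllib.request


def build_parse_request(text, project, base_url="http://localhost:5000"):
    """Build (but do not send) a POST request asking the server to parse a text."""
    # Assumed payload shape: "q" is the text to parse, "project" selects the project.
    payload = json.dumps({"q": text, "project": project}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/parse",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse(text, project, base_url="http://localhost:5000"):
    """Send the request and decode the JSON answer (needs a running server)."""
    request = build_parse_request(text, project, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request building from sending keeps the part that needs a running server small, which makes the rest easy to test.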
PHP Integration
For a talk at PHP Central Europe, I prepared some code to show that it is possible to integrate NLU into PHP without implementing NLU in PHP. You can check out the repository, and you should also have a look at rasa_client/lib. This code should be enough to do some basic requests and get meaningful models back. For our example (sick report) I also introduced a command-line application built with Symfony.
This is not mandatory at all, it is just the fastest way for me to call the given code and to get something readable.
To start working with that example, you should run both docker containers and enter the one for the app code:
$ cd examples/
$ docker-compose up -d
$ docker exec -it rasa-nlu-client sh
$ cd /app/src/
$ bin/console
The command bin/console should return a list looking like this:
rasa
  rasa:nlu:parse         Parse a given text for its intents.
  rasa:nlu:remove-model  Remove a training model.
  rasa:nlu:status        return the available projects and models
  rasa:nlu:train         Train a project by a well-defined training data. For the training data, you should have a look at https://rasa.com/docs/nlu/dataformat/
… which is an overview of the available commands. Let’s run them one by one.
status
$ bin/console rasa:nlu:status
Got following projects\
Project: sick_report
=======================
currently training:0
-----------------------
Available Model
-----------------------
model_20181027-164038
model_20181027-173358
-----------------------
-----------------------
Loaded Model
-----------------------
model_20181027-173358
-----------------------
This gives an overview of the available models and of any training currently running. Models listed under “Loaded Model” are loaded into memory, which means they give you the fastest answers.
train
$ bin/console rasa:nlu:train --project=illness_report data/config_train_illness_report.yml
new model trained
=================
Created Model: model_20181029-062412
This posts a valid training data file to a project (I used the one shown above) to train a new model. You can also name a model with --model to train an existing model.
Status (new model there)
# bin/console rasa:nlu:status
Got following projects\
Project: illness_report
=======================
currently training:0
-----------------------
Available Model
-----------------------
model_20181027-164038
model_20181027-173358
model_20181029-062412
-----------------------
-----------------------
Loaded Model
-----------------------
model_20181027-173358
-----------------------
If you add new training data without naming a model, the newly created model shows up in the status list.
Parse
$ bin/console rasa:nlu:parse --project=illness_report "I will be absent due to sickness until friday"

Intent: report_illness_from_to - Confidence: 0.8078944273721
============================================================

Entities found:
 ------ -------- ------- ----- ----------- ------------------
  Name   Value    start   end   extractor   confidence
 ------ -------- ------- ----- ----------- ------------------
  last   friday   20      26    ner_crf     0.93667437133644
 ------ -------- ------- ----- ----------- ------------------

Ranking:
 ------------------------- ----------------- ------------
  Pos.                      Name              Confidence
 ------------------------- ----------------- ------------
  report_illness_from_to    0.8078944273721
  report_illness_duration   0.1921055726279
 ------------------------- ----------------- ------------
Now you can start parsing text. The answer you get back is a statistical one: every intent comes with a calculated confidence, not a certainty. “0.8” is quite OK, but a little more training will increase the confidence level. You also get an entity back if it has been defined in your training data.
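Because the answer is only a confidence value, your application has to decide when to trust it. Here is a minimal sketch of such a decision, again in Python purely for illustration. The dictionary shape (an `intent` with `name` and `confidence`, and a list of `entities` with `entity` and `value`) is an assumption modeled on the fields shown in the output above, and the threshold of 0.7 is an arbitrary example value.

```python
def pick_intent(result, threshold=0.7):
    """Return the intent name, or None if the confidence is below the threshold."""
    intent = result.get("intent") or {}
    if intent.get("confidence", 0.0) < threshold:
        return None
    return intent.get("name")


def entities_by_name(result):
    """Map entity names to their extracted values (a later duplicate wins)."""
    return {e["entity"]: e["value"] for e in result.get("entities", [])}


if __name__ == "__main__":
    # Hand-written sample using the values from the parse output above.
    sample = {
        "intent": {"name": "report_illness_from_to", "confidence": 0.8078944273721},
        "entities": [{"entity": "last", "value": "friday"}],
    }
    print(pick_intent(sample), entities_by_name(sample))
```

If `pick_intent` returns None, a sensible reaction is to ask the user to rephrase and to add the sentence to your training data.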
Conclusion
As you can see, it is possible to use PHP to implement NLU in your application without having to fall back on Python, AWS and the like. And you don’t need to rely on web services where you have no control over your data. Instead, you can use tools like RASA NLU that are quite easy to integrate and allow you to analyze text using AI without leaving the PHP context.
How about you, have you used PHP to implement NLU? If so, what for?