Machine learning (ML) is part of Artificial Intelligence (AI) that allows a system to learn on its own. Recently, Machine Learning is growing and has been applied in various fields ranging from video recommendations on YouTube to self-driving cars.
PHP-ML is a Machine Learning library for the PHP programming language. This library provides various tools to create an ML system starting from reading data, pre-processing, training model, to testing. For the complete function can be seen on the documentation site belonging to php-ml. This article will provide a simple example of using php-ml.
Setup
Installing php-ml is very easy, but composer must be installed on your computer beforehand. Create a new folder and open a command line in that folder and run the command below.
Create a .php file, e.g., index.php and write the following code.
Data
The data use for this article is student performance data in exams taken from https://www.kaggle.com/spscientist/students-performance-in-exams. Please download the .csv file from the site and move the .csv file into the folder where you installed php-ml. Here are some initial lines from the data. In this article the ML system will be made to predict the value of "math_score" by using the first 5 columns, so the last 2 columns will be ignored.
Pre-processing
Reading data
To read .csv data the CsvDataset function is required. The function requires 3 parameters, the .csv file path, the number of feature columns (starting from the first column & the last column is a label/value), and a Boolean to ignore the first row (because the first row is a header). Here is the code to read the .csv data.
Sharing data
The data that has been read needs to be divided into training data and test data by using the StratifiedRandomSplit function. The function requires 3 parameters, a dataset variable, a division ratio (e.g., 0.2), and a seed so that the division results can be repeated (not random). Here is the code to share the data.
Encode Data
The data used in this article are all categorical data and still in the form of text. Machine learning models can only read numerical data; therefore, the data needs to be converted into numerical data. One of the methods that can be used is LabelEncoder, the label encoder will change the value to 0, 1, 2, ... for each category. But I couldn't find LabelEncoder documentation, and the following code is my guess to use php-ml's own LabelEncoder.
Model training
The model use for this article is a decision tree, for details about the decision tree can be read here. Here is the code to train the decision tree model using php-ml.
Model testing
The previously separated test data is used to determine the quality of the model. In this article, the metric used is the mean absolute error, the metric will even out the absolute value of the difference between the predicted value and the actual value. Here is the code to test our model.
From the results of the testing, the mean absolute error value obtained was 18.5601..., which means that the prediction produced deviated by as much as 18.5601... from the actual value.