# The easy way to predict stock prices using machine learning

## A step-by-step guide to predicting a company’s stock price. Please read the disclaimer.

In this post, I show a step-by-step method for making stock price predictions using the R language and the H2O.ai framework. Please understand that this article is only a simple demonstration of how to get started with the H2O.ai framework. It is not financial advice. Don’t make any financial decision based on this post.

The framework is also available in Python; however, as I am more familiar with R, I will present the tutorial in that language. You may have already asked yourself how to predict a stock price using artificial intelligence. Here are the steps to do it:

1. Collect the data
2. Import the data
3. Clean and manipulate the data
4. Split test and training observations
5. Choose a model
6. Train the model
7. Apply the model to the test data
8. Evaluate the results
9. Enhance the model if necessary
10. Repeat steps 5 to 9 until you are satisfied with the result.

In the last post, I showed how to plot high-frequency data using the Plotly library, and I explained how to collect the data to perform the analysis. Let’s skip straight to step 3 on our list. If you want to know how to do steps 1 and 2, visit the previous publication.

Our research problem is this: “What will be the closing value of the asset in the next hour?”

# Data cleaning

Once we have imported the MetaTrader data for the asset we want to predict, we need to change some variables. First, we define the names of the variables:

```r
# Setting the names of the variables
col_names <- c("Date", "Open", "High", "Low", "Close", "Tick", "Volume")
colnames(data) <- col_names
head(data)
```

Our data will take the following form:

We will only use some of the available variables: Open, High, Low, Close and Volume. We therefore drop the others:

```r
# Assigning NULL removes a column from the data frame
data$Date <- NULL
data$Tick <- NULL
```

Since we want to know the closing price of the next observation, we need to move each closing value up by one row, so that every row also holds the close that follows it. To do this, we create a function and add a new variable to the original dataset:

```r
# Shifting a given variable n rows up
shift <- function(x, n) {
  c(x[-(seq(n))], rep(NA, n))
}

data$shifted <- shift(data$Close, 1)
tail(data)
```

Notice that the values of the variable Close now also appear one row above, in the new column. As a result, the last row contains an NA, so we use the `na.omit()` function to drop that row:

```r
# Remove NA observations
data <- na.omit(data)
# row.names = FALSE avoids writing an extra row-index column to the file
write.csv(data, "data.csv", row.names = FALSE)
```
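To make the shifting step concrete, here is a minimal sketch on a made-up five-row data frame (the prices are invented for illustration, not real market data). It reuses the `shift` function from above and shows that each remaining row ends up paired with the Close of the row that follows it, while the last row is dropped.

```r
# Toy data: values are invented purely for illustration
toy <- data.frame(Close = c(10, 11, 12, 13, 14))

# Same helper as in the article: move values n rows up, pad the end with NA
shift <- function(x, n) {
  c(x[-(seq(n))], rep(NA, n))
}

toy$shifted <- shift(toy$Close, 1)  # target = Close of the following row
toy <- na.omit(toy)                 # the last row has no "next" value

print(toy)
```

After this step, `shifted` is the value we want to predict and `Close` is one of the inputs available at prediction time.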

Perfect, we have our data ready to start modeling.

# Splitting the data

In this problem, we will use the `h2o` package from H2O.ai, which offers us a complete solution for analyzing and training artificial intelligence models. Its user-friendly structure allows people without a data science background to solve complex problems. Let’s start by loading the library into our environment:

```r
# Installing the package
install.packages("h2o")

# Loading the library
library(h2o)
```

Once the package is installed and loaded, we start the local H2O instance (it runs on a Java Virtual Machine) that will serve as the basis for building our model. When starting it, we set the desired number of threads and the maximum amount of memory:

```r
# Initializing H2O using all available threads (-1) and 16 GB of memory
h2o.init(nthreads = -1, max_mem_size = "16g")
```

Importing the data:

```r
# Import the CSV as an H2OFrame and keep the result for the next steps
data <- h2o.importFile("data.csv")
h2o.describe(data)
```

We now define which variable we want to predict in our data set and those that will serve to “teach” our model.

```r
y <- "shifted"  # variable we want to forecast
x <- setdiff(names(data), y)
```
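Because `setdiff` treats every remaining column as a predictor, it is worth checking which columns actually came back from the CSV. The sketch below is an illustration in base R only: it uses `read.csv` as a stand-in for the H2O import (an assumption, and H2O may name the extra column differently) and invented toy values. It shows that `write.csv` stores row names by default, which adds an index column that would otherwise slip into `x`.

```r
# Toy frame standing in for the cleaned data (values are invented)
toy <- data.frame(Open = 1:3, Close = 2:4, shifted = 3:5)

# write.csv() stores row names by default, which adds an index column
path <- tempfile(fileext = ".csv")
write.csv(toy, path)
with_index <- read.csv(path)

# Writing without row names keeps only the real variables
write.csv(toy, path, row.names = FALSE)
clean <- read.csv(path)

y <- "shifted"
x_with_index <- setdiff(names(with_index), y)  # includes the index column "X"
x_clean <- setdiff(names(clean), y)            # only Open and Close

print(x_with_index)
print(x_clean)
```

A row index carries no market information, so it is usually better to keep it out of the predictors.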

Then, we split the data into training and testing sets, with 80% of the rows going to training.

```r
parts <- h2o.splitFrame(data, ratios = 0.80)
train <- parts[[1]]
test <- parts[[2]]
```

With the data divided, we get to the part where the magic of the H2O.ai package happens. There is an important caveat about splitting the data: the method shown here is biased (`h2o.splitFrame` assigns rows to each set at random, regardless of their order in time). We will get to this issue in the next article.
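For time-ordered data, one common alternative to a random split is a chronological one: train on the oldest rows and test on the most recent ones. The following is only a sketch of that idea in base R, not necessarily the approach taken in the follow-up article. The helper `chrono_split` and the toy values are hypothetical, and it assumes the rows are already sorted from oldest to newest.

```r
# Toy time-ordered data: row 1 is the oldest observation (values invented)
toy <- data.frame(Close = seq(100, 109), shifted = seq(101, 110))

# Hypothetical helper: the first `ratio` share of rows goes to training,
# the remaining (most recent) rows go to testing
chrono_split <- function(df, ratio = 0.8) {
  stopifnot(ratio > 0, ratio < 1)
  n_train <- floor(nrow(df) * ratio)
  list(
    train = df[seq_len(n_train), , drop = FALSE],
    test  = df[(n_train + 1):nrow(df), , drop = FALSE]
  )
}

parts <- chrono_split(toy, 0.8)
train <- parts$train
test  <- parts$test

print(nrow(train))
print(nrow(test))
```

With this kind of split, the model never trains on observations that come after the ones it is tested on. To use the two pieces with H2O, they could be converted with `as.h2o()`.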

# Choosing the model

One of the tasks that every data scientist needs to perform when creating machine learning projects is identifying the best model or set of models to make predictions. This requires a lot of knowledge, especially a strong mathematical foundation, to decide which models are best for specific tasks.

Thanks to the H2O.ai package, we can ask it to choose the best model for us while it takes care of everything else. This is called automatic machine learning (AutoML). Obviously, this magic may not be the most robust solution to problems, but it is a good start.

# Train the model

To create our model, we call the automl function and pass the necessary parameters as follows:

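```r
# Arguments are named explicitly; the fourth positional argument of
# h2o.automl is the validation frame
automodel <- h2o.automl(x = x, y = y, training_frame = train,
                        validation_frame = test, max_runtime_secs = 120)
```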

After a few minutes, we obtain a list of models ranked by performance, available in `automodel@leaderboard`. To learn more about the best one, just call:

```r
automodel@leader
```

# Apply the model

Now that we have our leader, let’s apply it to the test data! This is the coolest part, as we will use data not yet observed by the model to evaluate performance.

We call the predict function with the model and test data as parameters!

```r
predictions <- h2o.predict(automodel@leader, test)
```
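Step 8 on our list is evaluating the results. As a rough sketch of what that could look like, the block below computes two common regression error measures in base R. The helper names `rmse` and `mae` and the numbers are invented for illustration; they are not results from the model above.

```r
# Hypothetical helpers for two common regression error measures
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Invented numbers standing in for real values; with H2O they could come from
# as.data.frame(test)$shifted and as.data.frame(predictions)$predict
actual    <- c(10, 12, 11, 13)
predicted <- c(11, 12, 10, 15)

print(rmse(actual, predicted))
print(mae(actual, predicted))
```

H2O also provides `h2o.performance(automodel@leader, newdata = test)`, which reports this kind of metric directly for a model and a frame.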

# Conclusion

In this post, we saw how to handle and manipulate the financial data of an asset and easily create a machine learning model to make predictions of closing prices in the hour following the analysed data.

In the next article, we’ll see how badly this model performed on the test data.

See you next week!

Disclaimer: This article is not an investment recommendation or anything like that. Forecasting stock prices is not a trivial task, and this post is simply a demonstration of how easy it is to use the H2O.ai framework to start solving machine learning problems. It’s easy to make predictions; however, that doesn’t mean they are correct or accurate. And no, I don’t have any connection with the company.

Pedro Lealdino Filho: Mathematics Education Ph.D. and Computer Science Teacher based in Brazil. Photographer. Data Science and Stock Market enthusiast. Building wealth through data.