What is machine learning?
Machine learning is a pretty cool thing. You feed a computer a dataset, it learns patterns from that data, and it can then predict the outcome for new data it has not seen before. Generally, the more good-quality data you feed it, the more accurately it can predict.
For example, let’s say I gave the computer the age, height, weight, and body mass index (BMI) of 100 males and 100 females. I could then ask the computer whether it thinks a person with an age of 20, a height of 5’3", a weight of 112 lbs, and a BMI of 20 is male or female. Based on the dataset it learned from, it would probably predict that this new person is female.
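Here is a minimal sketch of that idea in R. The 200 people below are synthetic: I made up the height and weight distributions purely for illustration, so they are not real measurements. Logistic regression (`glm` with `family = binomial`) is just one of many models you could use here.

```r
# Synthetic data, invented for illustration only (not real measurements).
set.seed(42)
n <- 100
people <- data.frame(
  sex    = factor(rep(c("male", "female"), each = n)),
  age    = sample(18:60, 2 * n, replace = TRUE),
  height = c(rnorm(n, mean = 69, sd = 3), rnorm(n, mean = 64, sd = 3)),     # inches
  weight = c(rnorm(n, mean = 180, sd = 25), rnorm(n, mean = 150, sd = 25))  # pounds
)
# BMI from pounds and inches
people$bmi <- 703 * people$weight / people$height^2

# Logistic regression. Factor levels are alphabetical ("female", "male"),
# so the model estimates the probability of "male".
fit <- glm(sex ~ age + height + weight + bmi, data = people, family = binomial)

# The new person from the example: age 20, 5'3" (63 in), 112 lbs
new_person <- data.frame(age = 20, height = 63, weight = 112)
new_person$bmi <- 703 * new_person$weight / new_person$height^2

p_male <- unname(predict(fit, newdata = new_person, type = "response"))
prediction <- ifelse(p_male > 0.5, "male", "female")
print(prediction)
```

The model never sees a rule like "shorter and lighter means female". It only sees the examples, and its prediction depends entirely on the data it was given.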
I’m going to show you an in-depth example of machine learning using the R programming language. You can download RStudio for free and follow along if you’d like.
Let’s Get Some Data
First I’m going to download a dataset from Kaggle. This dataset contains features (descriptors, or columns) describing a specific type of plant. It also tells us what species each plant is. We are going to try to use machine learning to let the computer predict a plant’s species from those features.
Once the dataset is downloaded, open RStudio, change your working directory to wherever you saved the CSV file, and read the CSV file into R.
# Change Working Directory and Read CSV File
getwd()
setwd("/Users/SamuelCuster/Desktop/R")
# List Files and Verify CSV Is in Current Directory
list.files()
# Read CSV file
iris <- read.csv("Iris.csv")
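If you would rather not change your working directory, one alternative is to build the full path to the file and read it directly. This is only a sketch: `read_iris` and `data_dir` are names I made up for illustration, and it assumes the file is still called Iris.csv.

```r
# Hypothetical helper: read Iris.csv from a folder without calling setwd().
read_iris <- function(data_dir) {
  csv_path <- file.path(data_dir, "Iris.csv")
  # Fail early with a clear message if the file is not where we expect it
  if (!file.exists(csv_path)) {
    stop("Could not find ", csv_path)
  }
  read.csv(csv_path)
}
```

You would call it with the folder you downloaded the file to, for example `iris <- read_iris("/Users/SamuelCuster/Desktop/R")`.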
Now I'm going to load the ggplot2 graphing library, which gives us some easy-to-use graphing functions in R.
# Advanced Plotting Library (install once, then load it in each session)
install.packages("ggplot2")
library(ggplot2)
Now we want to learn the basics of our data. What features are present? What are the mean, median, and mode of each feature? How many plants of each species are there?
# Understand Structure of Data
head(iris,5)
str(iris)
summary(iris)
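Note that `summary()` reports the mean and median of each numeric column but not the mode, and R's built-in `mode()` returns a storage type rather than the most frequent value. Below is a rough sketch of one way to answer the remaining questions. It assumes `iris` is the data frame read in above, and `stat_mode` is a helper name I made up.

```r
# Assumes `iris` is the data frame read from Iris.csv above.
# Count how many rows belong to each species
species_counts <- table(iris$Species)
print(species_counts)

# Keep only the numeric columns (this includes an Id column if the file has one)
numeric_cols <- iris[sapply(iris, is.numeric)]

# Hypothetical helper: most frequent value (the smallest one if there are ties)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# One column per feature, one row per statistic
feature_stats <- sapply(numeric_cols, function(x) {
  c(mean = mean(x), median = median(x), mode = stat_mode(x))
})
print(feature_stats)
```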
Now we are going to build a pie chart to visualize how many plants of each species we have.
# Visualize Data
## Pie
slices <- c(
  nrow(subset(iris, Species == "Iris-setosa")),
  nrow(subset(iris, Species == "Iris-versicolor")),
  nrow(subset(iris, Species == "Iris-virginica")))
slice.labels <- c("Iris-setosa","Iris-versicolor","Iris-virginica")
pie(slices,labels=slice.labels,main="Species of Iris")
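As an aside, the same chart can be built without typing out each species name by letting `table()` do the counting. This is a sketch of an alternative, not a replacement for the code above. It assumes `iris` has a `Species` column, and `slice_labels` is just a name I picked.

```r
# Assumes `iris` has a Species column, as in the CSV read above.
# table() counts the rows for every species it finds
species_counts <- table(iris$Species)

# Label each slice with the species name and its count
slice_labels <- paste0(names(species_counts), " (", species_counts, ")")

pie(species_counts, labels = slice_labels, main = "Species of Iris")
```

The advantage is that the code keeps working if the species labels in the file are spelled differently or if another species is added.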