
Machine learning for beginners in R #1: Introduction to KNN & reading data from a dataset

[deleted]

Hello world!
This is, in fact, my first tutorial on here, so I hope it is comprehensive and easy to work with. In this tutorial, I am going to show you the basic steps of machine learning in R. I recommend looking into the basics of R first, so you have an idea of what you are actually working with. But you can still follow along if you are an absolute newbie in R.

I'll show you how to use R to work with the pretty well-known machine learning algorithm called k-nearest neighbors (KNN). KNN is a simple machine learning algorithm and an example of instance-based learning, where new data is classified based on stored, labeled instances. (Which is a pretty good start, in my opinion.)
Specifically, the difference between the stored data and the new data is calculated by means of a similarity measure, most often the Euclidean distance.
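
To make the distance idea concrete, here is a minimal sketch in base R. The function name euclidean_distance and the two example flowers are my own picks for illustration, they are not part of any package.

```r
# Hypothetical helper: Euclidean distance between two numeric vectors
euclidean_distance <- function(a, b) {
  stopifnot(length(a) == length(b))  # both points need the same number of features
  sqrt(sum((a - b)^2))
}

# Illustrative values: sepal length, sepal width, petal length, petal width
stored_flower <- c(5.1, 3.5, 1.4, 0.2)
new_flower    <- c(5.9, 3.0, 5.1, 1.8)

euclidean_distance(stored_flower, new_flower)

# Base R's dist() computes the same quantity for the rows of a matrix
dist(rbind(stored_flower, new_flower))
```

The smaller the distance, the more similar the two flowers are. KNN simply looks for the stored instances with the smallest distance to the new one.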

So, TL;DR: for any new data input, the similarity to the data that was already in the system is calculated.

Using that similarity value, we can perform predictive modeling.
Predictive modeling is either classification (assigning a label or a class to the new instance) or regression (assigning a numeric value to the new instance). Which one you use is up to you when you use KNN later.
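
Here is a rough sketch of both flavors, written by hand in base R so you can see what happens under the hood. It uses the iris data that ships with R. The helper nearest_neighbors, the choice of k = 5 and the new flower's measurements are assumptions I made up for this example.

```r
# Hypothetical helper: row indices of the k stored instances that are
# closest (by Euclidean distance) to a new instance
nearest_neighbors <- function(stored, new_instance, k) {
  # sweep() subtracts the new instance from every row of the stored matrix
  distances <- sqrt(rowSums(sweep(stored, 2, new_instance)^2))
  order(distances)[seq_len(k)]
}

features   <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")])
new_flower <- c(Sepal.Length = 5.0, Sepal.Width = 3.4, Petal.Length = 1.5)

idx <- nearest_neighbors(features, new_flower, k = 5)

# Classification: majority vote over the neighbors' labels
votes <- table(iris$Species[idx])
predicted_species <- names(votes)[which.max(votes)]

# Regression: average of the neighbors' numeric values
predicted_petal_width <- mean(iris$Petal.Width[idx])

predicted_species
predicted_petal_width
```

Same neighbors, two different answers: the vote gives you a class, the average gives you a number.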

SO! Enough talking, let's get to the code! Finally. In this example, we are going to use the already existing dataset iris. It already exists in R, so we don't need to import anything. Try it out now!

iris

Hell yeah, this works great! (HOPEFULLY)
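
If printing the whole thing floods your console, a few base R functions give you a quicker look. This is just a small sketch using the built-in iris data:

```r
head(iris)   # first six rows
dim(iris)    # number of rows and columns
str(iris)    # column types at a glance
```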
Now, let's load the whole dataset from its original source using this script:

# Read in `iris` data
iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE)

# Print first lines
head(iris)

# Add column names
names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

# Check the result
iris

Nice! You were officially able to import data from your data set! Now, what does this script do?
The read.csv() command reads the .csv file from a source, here a URL. The header argument is set to FALSE, which tells R that the first line of the file does not contain the attribute names, so R assigns the default names V1 to V5 instead.
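
You can see the effect of header without touching the network. The sketch below feeds two illustrative lines to read.csv() through its text argument. The variable names raw_lines, no_header and with_header are just placeholders.

```r
# Two illustrative rows in comma-separated layout, without a header line
raw_lines <- "5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa"

no_header <- read.csv(text = raw_lines, header = FALSE)
names(no_header)    # default names assigned by R
nrow(no_header)     # both lines are kept as data

with_header <- read.csv(text = raw_lines, header = TRUE)
nrow(with_header)   # the first line was consumed as column names
```

With header = TRUE you would silently lose the first flower, which is why FALSE is the right choice for this file.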

To simplify working with the data set, I recommend creating the column names yourself. You can do this with the function names(), which gets or sets the names of an object. Define the names of the attributes as you would like them to appear. In the code chunk above, you'll have listed Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species.
These names are in fact not random, but listed in the dataset's description.
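
Here is the get-and-set behavior of names() on a tiny data frame. The data frame measurements and its values are made up for this sketch.

```r
# A small made-up data frame with placeholder column names
measurements <- data.frame(V1 = c(5.1, 4.9), V2 = c(3.5, 3.0))

names(measurements)                                       # get the current names
names(measurements) <- c("Sepal.Length", "Sepal.Width")   # set new names
names(measurements)                                       # check the result
```

The vector you assign should have one name per column, in the same order as the columns.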

In our next tutorial, we will talk about how to understand your data. We will look into statistics with R, and even some advanced stuff.

Comments
MahdiNohtani

Good, how can we proceed with the rest of the tutorial?

[deleted]

I might write a part two in the next few days

@MahdiNohtani

toyeiei

Hi @enigma_dev :) thanks for the tutorial. Do you know how to install R packages in repl? I was trying, but no success.

mahdikhalil

that is good.

Nicholas2323

Machine Learning is the process of building statistical models so that computers can use them to predict new data, uncover hidden relationships in existing data, or do both. As resumehelpaustralia said Machine Learning relies heavily on two internal components - computation and information.

Soddoso

Mustafs

TuralYusubov

hello

w50zt303

Great experience

rullombokekas

Top top, how to access series 2?

hasan-py

Useful. Thanks