For the last couple of years I’ve been focusing mostly on front-end development. Meanwhile, the buzzwords I keep hearing (machine learning, AI, data science and others) have become louder and louder. This series is a collection of the notes I keep while trying to dive into this new world.
Before we get started, let’s first distinguish Artificial Intelligence from Machine Learning:
Artificial Intelligence is the broader concept of machines being able to carry out tasks in a way that we would consider “smart”.
Machine Learning is a current application of AI based around the idea that we should really just be able to give machines access to data and let them learn for themselves.
So Machine Learning is an application that works really well today for achieving a level of “weak” AI. Weak AI is when we teach a machine to do one specific thing really well (image recognition, playing Mario, …); such systems are not “sophisticated” or “smart” in a general sense. General-purpose AI would be “smart” and capable of learning, just like a human.
Data Science is the exploration and quantitative analysis of all available structured and unstructured data to develop understanding, extract knowledge, and formulate actionable results.
Where does machine learning fit in the data science pipeline?
To extract knowledge from data, the usual steps are:
- Set the project objectives
- Collect and review data
- Select and cleanse data
- Modeling: manipulate the data to draw conclusions
- Evaluate the model
- Apply the conclusions to the business
So machine learning fits into steps 4 and 5 (modeling and evaluation): it is a way to manipulate our data to draw conclusions.
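To make steps 3 to 5 a bit more concrete, here is a toy sketch in plain Python. The dataset (hours studied versus passing an exam) and the threshold “model” are hypothetical, chosen only to show where cleansing, modeling and evaluation sit in the pipeline; they are not a recommended technique.

```python
# Toy sketch of pipeline steps 2-5. The data and the threshold "model"
# are hypothetical, for illustration only.
from statistics import mean

# Step 2 - collect data: hypothetical records of (hours studied, passed exam)
raw_data = [
    {"hours": 1.0, "passed": False},
    {"hours": 2.0, "passed": False},
    {"hours": None, "passed": True},   # missing value, removed during cleansing
    {"hours": 6.0, "passed": True},
    {"hours": 7.5, "passed": True},
    {"hours": 3.0, "passed": False},
    {"hours": 8.0, "passed": True},
]

def cleanse(records):
    """Step 3 - select and cleanse: drop records with missing values."""
    return [r for r in records if r["hours"] is not None and r["passed"] is not None]

def fit_threshold(records):
    """Step 4 - modeling: put a cut-off halfway between the two class means."""
    passed = [r["hours"] for r in records if r["passed"]]
    failed = [r["hours"] for r in records if not r["passed"]]
    return (mean(passed) + mean(failed)) / 2

def evaluate(threshold, records):
    """Step 5 - evaluate: fraction of records the threshold predicts correctly."""
    correct = sum((r["hours"] >= threshold) == r["passed"] for r in records)
    return correct / len(records)

clean = cleanse(raw_data)
threshold = fit_threshold(clean)
accuracy = evaluate(threshold, clean)
print(f"threshold={threshold:.2f} hours, accuracy={accuracy:.0%}")
```

To keep the sketch short, the model is evaluated on the same records it was fitted on. In practice evaluation uses data the model has not seen, otherwise the score is overly optimistic.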
- Step 1 – Install pyenv
pyenv is like nvm for Python: it lets us install different Python versions and switch between them easily => https://github.com/pyenv/pyenv#homebrew-on-mac-os-x
- Step 2 – Install Python 3.6.5
Run “pyenv install 3.6.5” in your terminal to install Python 3.6.5.
- Step 3 – Add this command to your “.bash_profile” so that your shell picks up the Python versions installed by pyenv:
eval "$(pyenv init -)"
- Step 4 – Install Anaconda
Anaconda is a Python distribution that bundles a collection of libraries and tools for data science, including Jupyter => https://www.anaconda.com/download/#macos
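Once everything is installed, a short script can act as a sanity check. This is a minimal sketch: the package list below is an assumption (typical data-science packages), so adjust it to what you actually plan to use. Which interpreter runs depends on your PATH and pyenv settings, so the script also prints `sys.executable` to make that visible.

```python
# Sanity check for the setup: report the running Python version and whether
# some common data-science packages can be found. The package list is an
# assumption; edit it to match your own needs.
import importlib.util
import sys

PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "jupyter"]

def check_environment(packages=PACKAGES):
    """Return the running Python version and a name -> found? mapping."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    # find_spec returns None when a top-level module cannot be located
    found = {name: importlib.util.find_spec(name) is not None for name in packages}
    return version, found

if __name__ == "__main__":
    version, found = check_environment()
    print(f"Python {version} at {sys.executable}")
    for name, ok in found.items():
        print(f"  {name:<12} {'OK' if ok else 'missing'}")
```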
What is machine learning or predictive modeling?
“Technique in which we train a software model using data”
f(X) = Y
where X = (X1, X2, X3, …) are the characteristics (often called features) and Y is the result we want to predict.
Machine learning is basically trying to find f(X) from examples of X and Y.
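One way to picture “finding f(X)” is as a search: try a family of candidate functions and keep the one whose predictions are closest to the known Y values. The sketch below does exactly that by brute force over a small grid of weights. The data and the candidate family f(X) = w1*X1 + w2*X2 are hypothetical, and real libraries use far more efficient methods than a grid search.

```python
# "Learning f" as a brute-force search over candidate functions.
# The data and the family f(X) = w1*X1 + w2*X2 are hypothetical.
from itertools import product

# Training data: each X is a tuple of characteristics (X1, X2), Y is the result.
X = [(1, 0), (2, 1), (3, 1), (4, 2)]
Y = [2, 5, 7, 10]

def make_candidate(w1, w2):
    """Build one candidate f(X) = w1*X1 + w2*X2."""
    return lambda x: w1 * x[0] + w2 * x[1]

def squared_error(f, xs, ys):
    """Sum of squared differences between predictions f(X) and the true Y."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))

# Try every (w1, w2) pair on a small grid and keep the lowest-error one.
grid = [w / 2 for w in range(-4, 9)]  # -2.0, -1.5, ..., 4.0
best_w = min(product(grid, grid),
             key=lambda w: squared_error(make_candidate(*w), X, Y))
f = make_candidate(*best_w)
print("learned weights:", best_w)
print("prediction for X = (5, 2):", f((5, 2)))
```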
If f(X) returns a category, such as yes or no, we call this classification.
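A minimal classification sketch, assuming a 1-nearest-neighbour rule (one simple classification technique, not the only one): f(X) returns the yes/no label of the closest known example. The training points and their labels are hypothetical.

```python
# Classification sketch: 1-nearest neighbour. f(X) returns the label of the
# closest training example. Data and labels are hypothetical.
import math

# Each X is (X1, X2); each Y is "yes" or "no".
train_X = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_Y = ["no", "no", "yes", "yes"]

def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def classify(x, xs=train_X, ys=train_Y):
    """Return the label of the training example nearest to x."""
    nearest = min(range(len(xs)), key=lambda i: distance(x, xs[i]))
    return ys[nearest]

print(classify((1.2, 1.4)))  # prints: no
print(classify((5.5, 5.0)))  # prints: yes
```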
If f(X) returns a number, we call this regression.
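A minimal regression sketch, assuming the simplest case: a single characteristic X1 and a straight line Y = a + b*X1 fitted by ordinary least squares, using the closed-form formulas for one feature. The numbers are hypothetical.

```python
# Regression sketch: fit Y = a + b*X1 by ordinary least squares
# (closed-form solution for a single feature). The data is hypothetical.
from statistics import mean

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # X1
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # Y, the number we want to predict

def fit_line(xs, ys):
    """Return intercept a and slope b minimising the sum of squared errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

a, b = fit_line(xs, ys)
print(f"f(X) = {a:.2f} + {b:.2f} * X1")
print("prediction for X1 = 6:", round(a + b * 6, 2))
```

Unlike classification, the output here is a continuous number rather than a label.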
Sometimes we don’t have Y at all, so we can only look for similarities between observations based on their characteristics X1, X2, X3, … This is usually called clustering, as we group observations into similar-looking groups.
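As an illustration of clustering, here is a minimal k-means sketch, k-means being one common clustering algorithm (the notes do not name a specific one). There is no Y, only characteristics X = (X1, X2). The points, k = 2 and the starting centroids are hypothetical choices.

```python
# Clustering sketch: k-means with no Y, only characteristics (X1, X2).
# Points, k=2 and the starting centroids are hypothetical.
import math

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.8), (8.0, 8.0), (9.0, 8.5), (8.5, 9.5)]

def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep a centroid where it is if no point was assigned to it.
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
for centroid, cluster in zip(centroids, clusters):
    print("centroid", centroid, "->", cluster)
```

The result depends on the starting centroids, which is why libraries usually run k-means several times from different starting points.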