
Python Machine Learning


Intro

Now we start moving toward AI development. Machine learning, a subset of AI, builds models from data that our programs can then use to make predictions. Typical applications include language processing, computer vision, and forecasting, for example of stock market or crypto trends.

The Process

Steps

  1. Import the Data
  2. Clean the Data: remove duplicates, incomplete rows, and anything else that could skew the results
  3. Split the Data into Training/Test Sets
  4. Create a Model: this can be a decision tree or something as sophisticated as a neural network
  5. Train the Model
  6. Make Predictions
  7. Evaluate and Improve
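The seven steps above can be sketched end to end in a few lines. This is a minimal illustration, not the application built later: the rows are made up and stand in for a real CSV file, and the column names (age, gender, genre) are assumptions chosen to resemble the music example.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Import the data (made-up rows standing in for pd.read_csv(...))
raw = pd.DataFrame(
    {
        "age": [20, 23, 25, 26, 29, 31, 35, 20, 22, 25, 27, 30, 32, 36, 36],
        "gender": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
        "genre": [
            "HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Classical", "Classical",
            "Dance", "Dance", "Dance", "Acoustic", "Acoustic",
            "Classical", "Classical", "Classical",
        ],
    }
)

# 2. Clean the data: drop exact duplicate rows and rows with missing values
clean = raw.drop_duplicates().dropna()

# 3. Split into training and test sets (20% held out for testing)
X = clean.drop(columns=["genre"])
y = clean["genre"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 4. Create a model
model = DecisionTreeClassifier(random_state=0)

# 5. Train the model
model.fit(X_train, y_train)

# 6. Make predictions
predictions = model.predict(X_test)

# 7. Evaluate: fraction of test rows predicted correctly
score = accuracy_score(y_test, predictions)
print(score)
```

With so few rows the score swings a lot depending on which rows land in the test set, which is part of why step 7 says "Evaluate and Improve" rather than just "Evaluate".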

Libraries

  1. NumPy - Python library for working with numeric arrays
  2. Pandas - Useful for working with tabular data (rows and columns)
  3. Matplotlib - Used for plotting 2D charts
  4. scikit-learn - A common library for machine learning

Tools

  1. Jupyter - Web-based notebook environment for Python
  2. Anaconda - A Python distribution that bundles Jupyter and the libraries above

Prerequisites

Installation

$ pip install -U notebook
$ pip install -U pandas
$ pip install -U scikit-learn

Run

$ jupyter notebook

Jupyter Notebook Intro

Create a new Notebook

Importing Dataset

Coding the Application

Segment 1

import pandas as pd
df = pd.read_csv('vgsales.csv')     # Pandas allows us to read the CSV file
# df                                # This allows us to dump the raw data out if we want to see it
df.shape                            # This allows us to see the rows and columns of the dataset

Example

Segment 2

df.describe()                       # Summary statistics (count, mean, std, min, max, ...) for the numeric columns

Example

Segment 3

df.values                           # The underlying data as a NumPy array

Example
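If the dataset file is not at hand, the three inspection calls can be tried on a small in-memory CSV. This is only a sketch: the column names and values below are hypothetical and are not taken from vgsales.csv.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real vgsales.csv has different columns and rows
csv_text = """title,year,sales
Game A,2001,10.5
Game B,2005,7.25
Game C,2010,3.0
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)       # (number of rows, number of columns)
print(sample.describe())  # summary statistics for the numeric columns only
print(sample.values)      # the rows as a NumPy array
```

Note that describe() skips the text column here by default, so only year and sales appear in its output.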

The Application

Segment 1

import pandas as pd

music_data = pd.read_csv('music.csv')
X = music_data.drop(columns=['genre'])  # Input data (capital X by convention)
y = music_data['genre']                 # Output data (lowercase y by convention)

Segment 2

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X.values, y.values)                        # Train the model on the full dataset
predictions = model.predict([[21, 1], [22, 0]])      # Use the trained model to predict genres for two new [age, gender] samples
predictions

Segment 3

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out 20% of the data for testing and train on the remaining 80%; the four splits are unpacked from the returned list
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model.fit(X_train.values, y_train.values)
predictions2 = model.predict(X_test.values)

score = accuracy_score(y_test.values, predictions2)
score
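Because train_test_split shuffles the rows randomly by default, the score from the segment above can change from run to run. One way to get repeatable numbers while experimenting is to pass a fixed random_state. The sketch below assumes made-up rows (not the contents of music.csv), and split_and_score is a hypothetical helper introduced only for this illustration.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Made-up rows for illustration; not the contents of music.csv
music = pd.DataFrame(
    {
        "age": [20, 23, 25, 26, 29, 31, 35, 20, 22, 25, 27, 30, 32, 36],
        "gender": [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
        "genre": [
            "HipHop", "HipHop", "HipHop", "Jazz", "Jazz", "Classical", "Classical",
            "Dance", "Dance", "Dance", "Acoustic", "Acoustic", "Classical", "Classical",
        ],
    }
)
X = music.drop(columns=["genre"])
y = music["genre"]


def split_and_score(seed):
    """Hypothetical helper: split, train, and score with a fixed seed."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train.values, y_train.values)
    return accuracy_score(y_test.values, model.predict(X_test.values))


first = split_and_score(42)
second = split_and_score(42)
print(first == second)  # same seed, same split, same score
```

Fixing the seed makes runs comparable; it does not make the score more trustworthy, so trying a few different seeds is still a reasonable sanity check.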

Segment 4

import joblib                                      # Installed alongside scikit-learn as one of its dependencies

joblib.dump(model, 'music-recommender.joblib')     # Save the trained model to a file in the current working directory

Segment 5

model2 = joblib.load('music-recommender.joblib')   # Load the saved model instead of retraining

predictions2 = model2.predict([[21, 1]])
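A quick way to confirm that saving and loading preserves the model is to compare predictions before and after the round trip. The following is a minimal sketch with made-up training rows; it writes to a temporary directory rather than the working directory so it leaves no file behind.

```python
import os
import tempfile

import joblib
from sklearn.tree import DecisionTreeClassifier

# Made-up training rows: [age, gender] -> genre
X = [[20, 1], [25, 1], [30, 1], [20, 0], [25, 0], [30, 0]]
y = ["HipHop", "HipHop", "Jazz", "Dance", "Dance", "Acoustic"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Save the model, then load it back from disk
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "music-recommender.joblib")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should give the same answer as the original
new_sample = [[21, 1]]
same = list(model.predict(new_sample)) == list(restored.predict(new_sample))
print(same)
```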

Segment 6

from sklearn import tree

tree.export_graphviz(
    model2, 
    out_file='music-recommender.dot',
    feature_names=['age', 'gender'],
    class_names=sorted(y.unique()),              # Unique genre labels, sorted to match the order of the model's classes
    label='all',
    rounded=True,
    filled=True
)
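The .dot file needs a Graphviz viewer to be rendered as an image. As an alternative for a quick look, scikit-learn can also print the tree's rules as plain text with export_text. The sketch below trains on made-up rows, so the rules it prints will not match the tree produced from music.csv.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up training rows: [age, gender] -> genre
X = [[20, 1], [25, 1], [30, 1], [20, 0], [25, 0], [30, 0]]
y = ["HipHop", "HipHop", "Jazz", "Dance", "Dance", "Acoustic"]

model = DecisionTreeClassifier(random_state=0).fit(X, y)

# Render the learned decision rules as indented text
rules = export_text(model, feature_names=["age", "gender"])
print(rules)
```

Each indented line is a threshold test on age or gender, and each leaf line names the predicted class.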

Final Product

Decision Tree