Machine Learning From Scratch [Part 2]


This is part two of Machine Learning from Scratch. You’re about to follow a short, straightforward tutorial on plotting a bar chart with Python and Pyplot, using a statistics concept called the decile.

In this lesson, you’ll learn how to:

  • Work with the collections module and its Counter class
  • Work with bucketed lists and deciles
  • Go further with bar charts by plotting histograms
  • Generate a line chart (X and Y axis) from the lists
  • Generate a bar chart

We’ll keep studying data visualization with Pyplot. Visualizing data is a big part of the job of a data scientist or machine learning engineer. The data itself is not that valuable – we must be smart enough to analyze it and display it in an understandable way.

As we’ve seen in part 1, Pyplot is an easy and fast library to plot your data, but it certainly has its limitations.

Now, let’s jump straight into our next task.

Let’s declare a list of grades that will be our data this time, and import the Counter class from the collections module.

from collections import Counter
grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]

Also, we need to import Pyplot. Assuming that you’re using the Jupyter notebook from the previous lesson, you just need to run the cell where you imported the module.

Now, let’s build our histogram using Counter. We’ll bucket all grades by decile and put 100 in with the 90s. We’ll also print the histogram variable to check its content.

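A decile is a concept from descriptive statistics: it “is any of the nine values that divide the sorted data into ten equal parts so that each part represents 1/10 of the sample or population”. In this lesson we use the word loosely, for fixed ten-point grade ranges (0 to 9, 10 to 19, and so on).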
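For comparison, here is a minimal sketch of the nine cut points from that definition, computed from the data itself. It assumes Python 3.8 or later, where statistics.quantiles is available; the name cut_points is just ours for illustration.

```python
from statistics import quantiles

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]

# n=10 asks for the 9 cut points that split the sorted data into 10 equal-sized groups
cut_points = quantiles(grades, n=10)

print(len(cut_points))  # 9
print(cut_points)
```

These cut points depend on the grades you feed in, while the buckets we build in this lesson always start at 0, 10, 20 and so on.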

To count the grades in each decile, we’ll use Counter, which is a dict subclass for counting hashable items. It stores the items as dictionary keys and their counts as dictionary values.
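If Counter is new to you, this small sketch shows how it behaves. The letter grades are made-up sample data, not part of the lesson’s dataset.

```python
from collections import Counter

# Count how many times each letter grade appears (made-up sample data)
letters = Counter(["A", "B", "A", "C", "B", "A"])

print(letters["A"])            # 3
print(letters["F"])            # 0, a missing key counts as zero instead of raising KeyError
print(letters.most_common(1))  # [('A', 3)]
```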

#Bucket grades by decile, but put 100 in with the 90s
histogram = Counter(min(grade // 10 * 10, 90) for grade in grades)
print(histogram)

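For each grade, grade // 10 * 10 rounds it down to the nearest multiple of 10; // is floor division, which keeps only the integer part of the result. Wrapping that in min(..., 90) caps the bucket at 90, so a grade of 100 lands in with the 90s.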
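To make the bucketing rule concrete, here it is applied to a few grades one at a time. The bucket function is a hypothetical helper we introduce only for this illustration; it wraps the same expression used in the Counter call.

```python
def bucket(grade):
    # Hypothetical helper: floor the grade to its tens, then cap the result at 90
    return min(grade // 10 * 10, 90)

for grade in (67, 83, 95, 100):
    print(grade, "->", bucket(grade))
# 67 -> 60
# 83 -> 80
# 95 -> 90
# 100 -> 90
```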

You’ve probably observed the output of our histogram:

Counter({80: 4, 90: 3, 70: 3, 0: 2, 60: 1})

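Each key is the lower bound of a ten-point bucket, and each value is the number of grades that fall in it. That is what our grades look like bucketed by decile.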
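Counter prints its entries with the most common first, which is why 80 comes before 0 above. If you would rather read the buckets in grade order, a small sketch like this one sorts the keys first (the name bucket_start is ours):

```python
from collections import Counter

grades = [83, 95, 91, 87, 70, 0, 85, 82, 100, 67, 73, 77, 0]
histogram = Counter(min(grade // 10 * 10, 90) for grade in grades)

# Sorting the keys lists the buckets from the lowest grades to the highest
for bucket_start in sorted(histogram):
    print(bucket_start, histogram[bucket_start])
# 0 2
# 60 1
# 70 3
# 80 4
# 90 3
```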

Now, let’s plot our histogram and see what it looks like.

plt.bar([x + 5 for x in histogram.keys()],  #shift each bar right by 5 to center it on its bucket
        list(histogram.values()),            #give each bar its correct height
        10,                                  #give each bar a width of 10
        edgecolor=(0, 0, 0))                 #draw a black edge around each bar

#x-axis from -5 to 105, y-axis from 0 to 5
plt.axis([-5, 105, 0, 5])

#x-axis labels at 0, 10, ..., 100
plt.xticks([10 * i for i in range(11)])
plt.xlabel("Decile")
plt.ylabel("# of Students")
plt.title("Distribution of Exam 1 Grades")
plt.show()
That’s what our distribution of grades looks like.

Statistics play a significant role in machine learning. Sometimes, pure statistics will satisfy your project’s objective. There is a huge discussion about whether statistics tools are machine learning or not – and that’s merely a discussion.

We should be concerned about objective goals for our machine learning projects, no matter what you call it (AI, Data Science, Statistics…). It doesn’t matter if you’re running a basic linear regression or a hardcore deep learning framework: you must deliver practical results.

By the end of this article, you’ve had more practice handling data in Python and visualizing it with Pyplot. In the next article (Part 3), we’ll jump into NumPy, which is widely used for numerical computing.



Machine Learning From Scratch [Part 1]

This is part one of Machine Learning from Scratch.

In this lesson, you’ll learn how to:

  • Import a module from a bigger library
  • Start working with Matplotlib and Pyplot
  • Declare lists of data
  • Generate a line chart (X and Y axis) from the lists
  • Generate a bar chart


Discover the power of data by implementing machine learning algorithms in Python. Here, I’ll show you the logic behind each technique, and you are going to be able to apply machine learning in different situations.

No more talking, let’s get straight to it.

Assuming that you have Anaconda and Jupyter Notebooks installed, create a new notebook.

Let’s import the pyplot module from the library matplotlib. Pyplot is useful for generating simple charts from data. It’s not recommended for heavy-duty data visualizations – you wouldn’t use it live in a web dashboard.

#For making simple plots
from matplotlib import pyplot as plt

Now, let’s declare two lists, each containing 7 elements. You’ll notice that their elements correspond: years[0] goes with gdp[0], years[1] with gdp[1], and so on for every element.

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]

gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
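A quick way to see that pairing is to walk both lists together with zip. This is only a sketch to check the data before plotting; it is not needed for the chart itself.

```python
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# zip walks both lists in step, so each year is matched with its GDP figure
for year, value in zip(years, gdp):
    print(year, value)

# The pairing only makes sense if both lists have the same length
assert len(years) == len(gdp)
```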

Now, using pyplot, let’s plot a line chart.

X-axis: years

Y-axis: gdp

Take a close look at the plt.plot syntax. The values for the X-axis go first, the values for the Y-axis go second. Then, you set the optional attributes you want:

  • color
  • marker (‘o’ means a circle as indicator in the chart)
  • linestyle
#create a line chart. Years on x-axis, gdp on y-axis
plt.plot(years, gdp, color = 'green', marker = 'o', linestyle = 'solid')

Now, let’s add a title and a y-axis label to our chart and display it right in the Jupyter notebook:

#add a title
plt.title("Nominal GDP")

#add a label to the y-axis
plt.ylabel("Billions of $")
plt.show()
This is the output you should see
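As a side note, plt.plot also accepts a short format string in place of the three keyword arguments. The sketch below should draw the same green line with circle markers; the variable name line is ours, used only to hold the object that plt.plot returns.

```python
from matplotlib import pyplot as plt

years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]

# "go-" is a format string: g = green, o = circle marker, - = solid line
(line,) = plt.plot(years, gdp, "go-")

plt.title("Nominal GDP")
plt.ylabel("Billions of $")
plt.show()
```

The keyword version is longer but easier to read, so pick whichever you prefer.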

Pyplot is a simple and fast solution to generate visualizations from data.

In business, you need to be agile. Pyplot charts may not be that good looking or interactive, but they will certainly do their job.

You don’t need to memorize each parameter of a function. For example, put your cursor inside the parentheses of plt.plot() and press shift + tab. The docstring of the function will pop up on your screen:

Here they are: all the parameters the function can receive. If you don’t specify them (apart from the x-axis and y-axis values), the default values will be used.
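If you are not working in Jupyter, you can read the same information from plain Python. This is a minimal sketch using the standard inspect module; printing only the first 300 characters is an arbitrary choice of ours to keep the output short.

```python
import inspect
from matplotlib import pyplot as plt

# Parameter list of plt.bar, including defaults such as width=0.8
print(inspect.signature(plt.bar))

# Beginning of the docstring of plt.plot
doc = inspect.getdoc(plt.plot)
print(doc[:300])
```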

Now, let’s learn how to plot a bar chart.

Bar charts are useful when you want to show how some quantity varies among some discrete set of items.

Discrete items are separate categories, such as building names, rather than points on a continuous numeric scale.

We want to visualize the names and heights in meters of the tallest buildings in the world. After a quick Google search, you will come up with two lists of corresponding items: building_names and heights.

building_names = ["Burj Khalifa", "Shanghai Tower", "Makkah Tower", "Ping An Financial Center"]
heights = [828, 632, 601, 555]

As you’ve already imported Pyplot, it’s available in your Jupyter Notebook, so there’s no need to import it again. If you’ve closed the notebook, you will have to execute the import statement again.

If you type in plt.bar() and press shift + tab, the docstring of the function will pop up on your screen:

Again, you don’t need to memorize the parameters each function receives.

plt.bar needs an x position and a height for each bar. We want one bar per building, so the positions are simply 0, 1, 2 and 3: a range with the same length as the list of building names. The bar heights come from the heights list:

plt.bar(range(len(building_names)), heights)
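To see what range(len(building_names)) actually produces, here is a short sketch you can run in a separate cell. The names positions, position and name are ours.

```python
building_names = ["Burj Khalifa", "Shanghai Tower", "Makkah Tower", "Ping An Financial Center"]

# One x position per building
positions = list(range(len(building_names)))
print(positions)  # [0, 1, 2, 3]

# enumerate pairs each position with the building that will sit there
for position, name in enumerate(building_names):
    print(position, name)
```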

Let’s add titles to our bar chart and y-axis:

plt.title("Tallest buildings in the world") #add a title
plt.ylabel("height in meters") # label the y-axis

To add labels to our X-axis, we’ll call xticks:

plt.xticks(range(len(building_names)), building_names)
plt.show()

plt.show() will display our bar chart, which should look like this:

We’ve just generated a bar chart using Pyplot. Note that the x-axis labels are messy because the building names are long. Pyplot is fast but not pixel perfect. Deal with it.
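If the crowded labels bother you, one possible workaround is to tilt them. The sketch below assumes that the rotation and ha text options passed through plt.xticks are enough for your case; the 45-degree angle is just a starting point.

```python
from matplotlib import pyplot as plt

building_names = ["Burj Khalifa", "Shanghai Tower", "Makkah Tower", "Ping An Financial Center"]
heights = [828, 632, 601, 555]

positions = range(len(building_names))
plt.bar(positions, heights)
plt.title("Tallest buildings in the world")
plt.ylabel("height in meters")

# rotation tilts the tick labels; ha="right" anchors the end of each label under its bar
ticks, labels = plt.xticks(positions, building_names, rotation=45, ha="right")

plt.tight_layout()  # make room for the tilted labels
plt.show()
```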

That’s good for now. I believe that short tutorials are more productive than larger ones.

In the next tutorial of Machine Learning from Scratch, we’ll keep playing around with Pyplot, collections, histograms and line charts.




How To Handle Meetings

It’s not always the most popular person who gets the job done.

In my experience in the business world, meetings are (almost) always terrible. In the absence of leaders who set things straight, meetings drift like unmanned ships on the ocean.

Meetings have an obligation to be productive; otherwise, they’re simply a waste of time. Of course, that’s different from building a solid and healthy relationship with your co-workers or teammates. That’s extremely important, but business meetings must be designed to be productive and to get things done.

Do you ever wonder why? Businesses are supposed to deliver value in the form of physical or digital products and services. Meetings are supposed to set and refresh operational points, data, and intelligence among leaders and workers – and that won’t get done by screwing around.

 

What is a business meeting?

A meeting is any encounter between two or more people to talk about anything.

A business meeting is an encounter between two or more people to talk about business perspectives, progress updates, feedback or any subject valuable and indispensable to operations.

Here’s a common scenario that we’ve all been through:

A meeting is called to talk about subject XYZ and, for the next thirty minutes, XYZ is not touched. Instead, participants engage in what I call “ice-breaking conversation” – which is nothing but bullshit.

I’m like him in 97.492% of all meetings I attend

How to Handle Meetings

There are ways of making a meeting productive – if you’re an executive, that’s your obligation. Meetings must be work sessions, not bull sessions.

1. Decide what kind of meeting it will be

Different meetings require different types of preparation and produce different results.

If there’s a meeting to write a marketing campaign, press release or something that needs to have a draft, a member or team has to prepare a draft beforehand. Otherwise, your meeting will be filled with brainstorms and conversation that won’t get the job done.

Objective meetings are supposed to ship the necessary or requested results quickly. If you’re developing a new product, then you may arrange brainstorm/creative sessions, modularization, operations and scaling sessions.

If you’re dealing with a crisis, you may need results even faster. Delegating the right functions to the right teams will be a key to shipping such results.

Also, leaders can set meetings to happen in strategic parts of the day. Priorities should be handled early in the week – and that’s a nice excuse to arrange an 8 AM on Monday. Brainstorming or product development events may be handled after priorities are cleared.

Informal meetings, on the other hand, could be arranged

2. Reports

If one or all members report, the meeting should be confined to that matter.

Either there should be no discussion at all or the discussion should be limited to making the points clearer. If all reports must be discussed, then they should be emailed or handed to each member beforehand. Also, each report should have a predefined time slot.

3. Product Development

Product development and brainstorming sessions could be disastrous if there are no rules to be respected. Here are some points that might help you organize creative sessions:

  • define the beginning and end of the meeting. If you planned a 1-hour session, stick to that timeframe, especially if the discussion is leading nowhere. Extend the meeting only if the discussion is being extremely productive;
  • document valuable (and only valuable) points. These are the ideas and points that should be discussed or developed in the next sessions or operation meetings;
  • don’t ask for unnecessary stuff. Just don’t.

4. Use your weapons

Slack, Google Drive, Dropbox, Evernote and thousands of other apps are there to make your day more productive. Stick to one or two platforms and integrate them as much as necessary – one of the things I offer in my consulting hours.

Now it’s time for you to speak:

  1. How do you handle your meetings?
  2. Which strategies do you think are valuable?

Comment your answers or email them to me at brunocampos.dev@gmail.com

Do you publish online content? I strongly recommend this article.