Fixing the Magical Hat 
Oh no! The Choosing Cap has lost its magic halfway through assigning students at Willow Academy to their houses, the Amber Foxes and the Blue Herons.
We need to rewire its brain using the power of machine learning and a program called TensorFlow Playground.
- The blue and orange dots show how people should be assigned to houses.
- The background of the graph shows where the cap currently wants to assign each person.
Right now, the background is blue where it should be orange, and orange where it should be blue. The Magical Hat needs your help!
In the activities below, you will help design and train the hat's new ‘brain’ as a neural network. Your goal is to make sure that the background (the hat's decision) matches the color of each dot (the correct house).

Activities
At the top left of the screen is a start (play) button. When you press it, the hat's magic starts to program its newly wired brain.
Activity 1: Separating two adjacent groups
Try this: Press go
- Hit go without making any changes.
- The background color slowly changes between blue and orange.
- Eventually, the blue points get a blue background, and the orange points get an orange background.
Try this: Help the hat learn faster
- The learning rate helps determine how quickly the network learns.
- A small learning rate results in careful, steady progress.
- But here the learning rate is set much too low.
- Hit the reset (circle) button.
- Find the learning rate drop-down at the top of the screen, and select a slightly larger number.
- Press start (play).
- Try resetting and running the simulation with a range of different learning rates.
But be careful. Setting the learning rate too high in some of the later activities will cause the magical hat's training to jump around too much.
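If you are curious what the learning rate actually does, here is a minimal sketch in Python (illustrative names, not the Playground's real code) of training a one-number "brain":

```python
# Toy training loop: the hat repeatedly nudges a number w toward the
# value that makes the error smallest. The learning rate scales the
# size of each nudge.

def slope(w):
    # Direction and steepness of the error (w - 3)^2 at the guess w;
    # the error is smallest when w = 3.
    return 2.0 * (w - 3.0)

w = 0.0                  # the hat starts with a bad guess
learning_rate = 0.1      # small: careful, steady progress
for step in range(25):
    w = w - learning_rate * slope(w)

print(round(w, 3))       # about 2.989 -- the hat has learned w = 3
```

A larger learning rate makes each nudge bigger, so learning speeds up, but only up to a point.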
Learn more: Reset the training
Next to the "play" button is a circle.
- Click this button to reset the training run.
- Try repeatedly pressing reset then play. Notice how the different training runs behave differently.
- Training a magical hat is like playing the lottery - some training runs work much better than others, even with the same settings!
Below the "Data" label is a button titled Regenerate.
- This moves around the blue and orange circles.
- If you press this button while training, the hat will work to adjust its predictions to match the new data.
Try this: Unstable learning
At the far left of the screen, there is a button called "Regenerate".
- Click that button and see your dataset update.
- This shows the magical hat how it should assign a different group of students to their houses.
Let's see one of the problems that a large learning rate can cause.
- Set the Learning Rate to 10
- Press Go
- Press Regenerate a few times, waiting a little bit between each click
If you're lucky, you will start to see the line jumping around wildly.
In more complex datasets, a high learning rate will cause the magical hat to repeatedly over-correct its estimates.
But don't worry if you don't see this behavior here - the training is random.
In the remainder of the exercises, stick with a learning rate of somewhere between 0.001 and 0.03, unless you want to see the training become unstable.
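To see that over-correction in miniature, here is the same toy training loop with the learning rate set far too high (again, just an illustrative sketch):

```python
# With a huge learning rate, every update jumps past the best value
# (w = 3) and lands even farther away on the other side, so the
# guesses swing back and forth and grow instead of settling down.
w = 0.0
learning_rate = 1.1      # far too big for this toy problem
for step in range(8):
    w = w - learning_rate * 2.0 * (w - 3.0)
    print(round(w, 2))   # 6.6, -1.32, 8.18, ... wilder every step
```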
Learn More: Measuring error
- At first, the hat does a very bad job of sorting.
- To improve, it must measure the error of its current prediction about each student's house.
- If you add up these errors, you get the current loss.
- The graph in the top right of the interactive shows a picture of how the loss has changed during training.
- You want the error (loss) to steadily go down.
- A flat loss means that the model isn't learning anything.
- An increasing loss means that the model is doing worse and worse!
- When loss becomes noisy, the training is unstable, and the colors jump around.
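Here is a rough sketch of that adding-up, assuming a squared-error style of measurement (the Playground's exact formula may differ):

```python
# Compare the hat's guess for each student with the correct house,
# square each difference, and add them all up: that total is the loss.
correct_house = [1, 1, -1, -1]          # +1 = orange house, -1 = blue house
hat_guesses   = [0.8, 0.4, -0.9, 0.3]   # the hat's current predictions

loss = sum((guess - truth) ** 2
           for guess, truth in zip(hat_guesses, correct_house))
print(loss)   # 0.04 + 0.36 + 0.01 + 1.69 = about 2.1; training pushes this down
```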
In the rest of the activities, we will experiment with the magical wiring¹ that makes the magic hat work.
- The brain has 0-6 layers
- Each layer has 1-8 neurons
- Each neuron is connected to every neuron in the next layer
Your job is to try different combinations of layers and neurons to help the hat relearn how to match each of the patterns below.
More complicated brains can learn more complicated patterns.
But magic hat brains are also very expensive, so try to find the smallest number of layers and neurons that work.
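To see why big brains get expensive, here is a small counting sketch (the layer sizes are just an example):

```python
# Every neuron connects to every neuron in the next layer, so the
# number of connections the hat must train grows quickly with size.
inputs = 2                    # the Playground's two inputs: x and y
hidden_layers = [3, 4]        # example: two hidden layers, 3 then 4 neurons
outputs = 1                   # one output: which house?

sizes = [inputs] + hidden_layers + [outputs]
connections = sum(a * b for a, b in zip(sizes, sizes[1:]))
print(connections)            # 2*3 + 3*4 + 4*1 = 22 connections
```

Every one of those connections has a weight that training must get right, so smaller brains are cheaper to train.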
Activity 2: Separating groups inside and outside of a circle
Try this: Run with no hidden layers
- First, hit "Play". What kind of pattern does the hat learn?
- Is this pattern enough to separate the people inside the circle from the ones outside the circle?
Try this: Add hidden layers
- Look for 'Hidden Layer' in the middle of the screen.
- Click +, and observe as two new 'nodes' show up in the middle of the screen.
- Hit go, and see what happens.
- Try adding a second hidden layer, and hit play. What happens? What does a third, fourth, and fifth layer do?
Try this: Add neurons to the layers
- Go back to a single hidden layer (or refresh your page and add a single new layer).
- Above each layer is a number showing how many neurons (nodes) that layer contains, along with +/- buttons to increase or decrease that number.
- Try adding a neuron to your layer, and hit play. What happens?
- Add additional layers and neurons. What is the impact of these changes?
Challenge: Find the smallest network that can learn this pattern
The smallest network that can approximate this data set has a single hidden layer of three neurons.
Activity 3: Separating groups by quadrant
Try this: Add hidden layers
- Try adding a second hidden layer, and hit play. What happens?
- What does a third, fourth, and fifth layer do?
Try this: Add neurons to the layers
- Go back to a single hidden layer of two neurons (or refresh your page and add a single new layer).
- Try adding a third neuron to your layer, and hit play. What happens?
- Now add a fourth neuron. What happens?
- Experiment with different numbers of layers and neurons in each layer.
Challenge: Find the smallest network that can learn this pattern
The smallest network that can approximate this data set has a single hidden layer of four neurons.
With a few additional layers of four or more neurons, the network converges faster.
But with too many layers, the training does not converge at all.
Now it’s time to challenge our thinking cap. None of the tricks we've learned so far will help the magic hat to match the next data set. We will need to learn two new tricks to help the same network learn better.
Activity 4: Separating groups in a spiral
Try this: Change the number of layers and neurons
- Try increasing the number of neurons in each layer, and try increasing the number of layers. Can you help the hat learn to separate the spirals?
- What if you increase the layers to 6 and neurons to 8/layer (the largest allowed)? We clearly need some more advanced magical tricks!
Learn More: The magic behind the neuron
- Put the mouse over different edges. You will see a number pop up. This is called the weight of the edge.
- Blue means positive and orange means negative, with darker colors representing larger numbers.
- Most of what the magic hat does is multiply and add numbers.
- Each neuron also includes a more complicated function, called an activation function.
- You can think of these as the magic behind the neuron.
- To train more complicated networks, it can be helpful to try changing these magical functions (see the sketch below).
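Putting those pieces together, here is a hedged sketch of a single neuron (illustrative names, not the Playground's actual code):

```python
import math

# One neuron: multiply each input by the weight on its edge, add the
# results (plus a "bias" number), then pass the total through the
# activation function -- the magic that lets the hat draw curved lines.
def neuron(inputs, weights, bias, activation):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(total)

def relu(t):
    return max(0.0, t)        # ReLU: keep positives, zero out negatives

print(neuron([0.5, -1.0], [2.0, 1.0], 0.1, math.tanh))  # about 0.0997
print(neuron([0.5, -1.0], [2.0, 1.0], 0.1, relu))       # exactly 0.1
```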
Try this: Change the activation function
- Try setting the number of neurons and layers to be very large, then press "go".
- It seems to work, but it jumps around a lot.
- Now look for the drop-down list next to the word Activation, and click on the word ReLU.
- Press "Go", and watch the network train.
- Often (but not always), the network will train faster and more smoothly.
Learn More: Fairness and division of labor
It turns out that training the hat sometimes breaks when one neuron does an unfair amount of work. We need to divide up the magic more carefully using regularization. But different versions of regularization work for different problems. Think about how work gets divided fairly in a group project:
- It's not fair to make one person do all the work if everyone is equally good at everything. L2 regularization tries to make sure that one person isn't being made to do all the work.
- If everyone in the group is good at something different, it takes less work to divide up the tasks than to have one person try to do everything. L1 regularization tries to make sure that everyone is doing the task they excel at.
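In the hat's brain, the "work" is carried by the weights, and the penalty each kind of regularization adds to the loss looks roughly like this (the Playground's exact formulas may differ):

```python
# Regularization adds a penalty based on the weights, so training
# prefers brains that spread the work around.
weights = [3.0, 0.1, -0.2, 0.05]   # one weight is doing most of the work
rate = 0.001                       # the "Regularization rate" setting

l2_penalty = rate * sum(w * w for w in weights)    # punishes big weights hardest
l1_penalty = rate * sum(abs(w) for w in weights)   # nudges small weights to exactly 0

print(l2_penalty, l1_penalty)
# total loss = sorting error + penalty, so the hat must balance
# "match the students" against "divide the work fairly"
```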
Try this: Use regularization
- Find the drop-down under the word Regularization and select L2.
- Depending on the number of layers and neurons you are using, you may need to wait some time for the training to converge.
- Keep an eye on the graph at the top right: the line gives a rough measurement of the error your network is making. A downward line tells you that the error is getting smaller and the network is still learning, even if it doesn't match the data set yet.
- Once you can train the hat to match the spiral, write down your settings.
- Then try reducing the number of layers and/or neurons to find the smallest network that will match the spiral.
Hint:
You can train the hat to match the spiral fairly quickly with the following settings:
- 5 hidden layers, with 6 neurons per layer (30 total neurons)
- Learning rate: 0.03
- Activation: ReLU
- Regularization: L2
- Regularization rate: 0.001
But this is not the best solution. How small of a network can you train to match the spiral quickly?
Can you train a smaller network if you are willing to wait longer? How much longer will you need to wait?
Not everyone is a perfect match for either house. In a regression problem, the hat is trying to provide a range of selections from dark blue (definitely Heron) to white (no strong leaning) to dark orange (definitely Fox).
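A tiny sketch of the difference (the cutoffs here are made up, just to show the idea):

```python
# In regression, the hat outputs a number between -1 and +1, and the
# background shade fades with that number instead of a hard yes/no.
def leaning(prediction):
    if prediction >= 0.5:
        return "dark orange (definitely Fox)"
    if prediction > -0.5:
        return "near white (no strong leaning)"
    return "dark blue (definitely Heron)"

for p in [0.9, 0.1, -0.8]:
    print(p, "->", leaning(p))
```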
Activity 5: Ambiguous Matches
Try adjusting the number of layers, the activation function, and the regularization again. What is the same as in the first few examples? What is different?
When you're done (Return Ticket)
The Choosing Cap has been fixed, thanks to your hard work!
When you are done with the survey, you'll get a special rhyme.
Share it with one of the activity coordinators, and you can collect a small prize!
Learning more
This kind of "magic hat" shows up in many different places, including
- Google search results
- Netflix recommendations
- Amazon suggestions
- Generative AI Chat-bots
If you want to learn more, study Mathematics and Computer Science! Machine learning builds on:
- Calculus
- Linear Algebra
- Programming Languages (like Python)
- and much more!
Want to stay in contact?
- Contact one of the organizers at stephen.flood@bridgew.edu
- Follow the BSU Mathematics department on Instagram
- Enroll at BSU, where we have excellent programs in Mathematics and Computer Science. These are also particularly good together as a double major.
Acknowledgements
The interactive graphics in each of the activities use a locally hosted version of TensorFlow Playground. The only change made in the local fork is the removal of the non-linear input features.
These activities were written by Stephen Flood in collaboration with Poonam Kumari.
¹ The magic is actually just a lot of mathematics, written up in a programming language.