Quiz: How Neural Networks Actually Learn (Gradient Descent)
Question 1 of 9
What is gradient descent?
An optimization algorithm that adjusts weights to minimize loss
A method for calculating the initial random weights
A technique for measuring prediction accuracy
An algorithm for forward propagation
Question 2 of 9
In the mountain analogy, what does the hiker use to navigate when surrounded by thick fog? (Read https://infolia.ai/archive/39 to answer.)
The slope beneath their feet
A GPS device
A compass
The position of the sun
Question 3 of 9
What is the formula for updating weights in gradient descent?
new_weight = old_weight - (learning_rate × gradient)
new_weight = old_weight + (learning_rate × gradient)
new_weight = gradient - (learning_rate × old_weight)
new_weight = old_weight × learning_rate × gradient
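
For readers who want to see the update rule from Question 3 in action, here is a minimal Python sketch; the quadratic loss f(w) = (w - 3)^2, the starting weight, and the learning rate are illustrative assumptions, not values from the newsletter:

    # Gradient descent on f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    # Loss function, starting weight, and learning rate are illustrative.
    weight = 0.0
    learning_rate = 0.1

    for step in range(25):
        gradient = 2 * (weight - 3)                  # slope of the loss at the current weight
        weight = weight - learning_rate * gradient   # new_weight = old_weight - (learning_rate x gradient)

    print(weight)  # approaches 3.0, the weight that minimizes the loss

Each iteration subtracts the scaled gradient, which moves the weight downhill toward the loss minimum; this same sketch illustrates Question 4, since the gradient points uphill and the update steps the opposite way.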
Question 4 of 9
Why do we move in the opposite direction of the gradient when updating weights?
Because the gradient points toward higher loss and we want to go toward lower loss
Because the gradient points toward lower loss and we want to maximize accuracy
Because moving opposite to the gradient increases the learning rate
Because the gradient indicates the initial random weight direction
Question 5 of 9
Which type of gradient descent uses small batches of data, typically 32, 64, or 128 samples?
Mini-Batch Gradient Descent
Batch Gradient Descent
Stochastic Gradient Descent (SGD)
Adaptive Gradient Descent
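
As a rough illustration of mini-batch gradient descent (Question 5), the sketch below takes one update step per batch of 32 samples; the synthetic linear-regression data, the single-weight model, and the hyperparameters are assumptions made for the demo:

    import numpy as np

    # One epoch of mini-batch gradient descent on synthetic data y = 4x + noise.
    # Dataset, model (single weight w), and hyperparameters are illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=1024)
    y = 4.0 * X + rng.normal(scale=0.1, size=1024)

    w, lr, batch_size = 0.0, 0.1, 32
    order = rng.permutation(len(X))            # shuffle before batching
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # the next batch of 32 samples
        grad = 2 * np.mean((w * X[idx] - y[idx]) * X[idx])  # gradient of mean squared error
        w -= lr * grad

    print(w)  # close to the true slope 4.0 after a single epoch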
Question 6 of 9
What happens when the learning rate is too high?
The algorithm bounces around the minimum without converging and loss may increase
Training becomes extremely slow but accurate
The algorithm gets stuck in local minima
Gradients become too small to calculate
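
A quick way to see the failure mode in Question 6 is to run gradient descent on f(w) = w^2 with two step sizes; the specific rates and the five-step budget are arbitrary choices for illustration:

    # On f(w) = w^2 the gradient is 2w, so each update multiplies w by (1 - 2 * lr).
    # Any lr > 1 makes |1 - 2 * lr| > 1, so the weight overshoots and diverges.
    for lr in (0.1, 1.5):
        w = 1.0
        for _ in range(5):
            w -= lr * 2 * w
        print(lr, w)  # lr=0.1 shrinks w toward 0; lr=1.5 bounces past the minimum and blows up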
Question 7 of 9
Which type of gradient descent is described as the default choice in practice?
Mini-batch
Batch Gradient Descent
Stochastic Gradient Descent (SGD)
Random Gradient Descent
Question 8 of 9
What are the common learning rate values mentioned for experimentation?
0.001, 0.01, and 0.1
0.1, 1.0, and 10.0
0.0001, 0.001, and 0.01
0.5, 1.5, and 2.0
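
To compare the three rates from Question 8 directly, one can sweep them on the same toy quadratic used earlier; the loss, the start point, and the 100-step budget are assumptions made for the comparison:

    # Sweep the learning rates 0.001, 0.01, 0.1 on f(w) = (w - 3)^2.
    for lr in (0.001, 0.01, 0.1):
        w = 0.0
        for _ in range(100):
            w -= lr * 2 * (w - 3)
        print(lr, round(w, 3))  # 0.001 is still far from 3, 0.01 gets closer, 0.1 converges to ~3.0

Within this stable range, larger rates converge faster; Question 6 shows what happens once the rate is pushed too far.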
Question 9 of 9
Why is the problem of local minima less concerning in modern deep networks?
High-dimensional spaces don't have the same local minima problems as 2D landscapes
Modern networks use higher learning rates that skip over local minima
Batch gradient descent automatically avoids local minima
Activation functions prevent local minima from forming