Quiz: How Neural Networks Actually Learn (Gradient Descent)
Question 1 of 9
What is gradient descent?
An optimization algorithm that adjusts weights to minimize loss
A method for calculating the initial random weights
A technique for measuring prediction accuracy
An algorithm for forward propagation
Question 2 of 9
In the mountain analogy, what does the hiker use to navigate when surrounded by thick fog? (Read https://infolia.ai/archive/39 to answer.)
The slope beneath their feet
A GPS device
A compass
The position of the sun
Question 3 of 9
What is the formula for updating weights in gradient descent?
new_weight = old_weight - (learning_rate × gradient)
new_weight = old_weight + (learning_rate × gradient)
new_weight = gradient - (learning_rate × old_weight)
new_weight = old_weight × learning_rate × gradient
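
For readers who want to see the update rule from Question 3 in action, here is a minimal Python sketch; the quadratic loss f(w) = (w - 3)^2, the starting weight, and the learning rate are illustrative assumptions, not values from the newsletter:

    # Gradient descent on f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
    # Loss function, starting weight, and learning rate are illustrative.
    weight = 0.0
    learning_rate = 0.1

    for step in range(25):
        gradient = 2 * (weight - 3)                  # slope of the loss at the current weight
        weight = weight - learning_rate * gradient   # new_weight = old_weight - (learning_rate x gradient)

    print(weight)  # approaches 3.0, the weight that minimizes the loss

Each iteration subtracts the scaled gradient, which moves the weight downhill toward the loss minimum; this same sketch illustrates Question 4, since the gradient points uphill and the update steps the opposite way.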
Question 4 of 9
Why do we move in the opposite direction of the gradient when updating weights?
Because the gradient points toward higher loss and we want to go toward lower loss
Because the gradient points toward lower loss and we want to maximize accuracy
Because moving opposite to the gradient increases the learning rate
Because the gradient indicates the initial random weight direction
Question 5 of 9
Which type of gradient descent uses small batches of data, typically 32, 64, or 128 samples?
Mini-Batch Gradient Descent
Batch Gradient Descent
Stochastic Gradient Descent (SGD)
Adaptive Gradient Descent
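
As a rough illustration of mini-batch gradient descent (Question 5), the sketch below takes one update step per batch of 32 samples; the synthetic linear-regression data, the single-weight model, and the hyperparameters are assumptions made for the demo:

    import numpy as np

    # One epoch of mini-batch gradient descent on synthetic data y = 4x + noise.
    # Dataset, model (single weight w), and hyperparameters are illustrative.
    rng = np.random.default_rng(0)
    X = rng.normal(size=1024)
    y = 4.0 * X + rng.normal(scale=0.1, size=1024)

    w, lr, batch_size = 0.0, 0.1, 32
    order = rng.permutation(len(X))            # shuffle before batching
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # the next batch of 32 samples
        grad = 2 * np.mean((w * X[idx] - y[idx]) * X[idx])  # gradient of mean squared error
        w -= lr * grad

    print(w)  # close to the true slope 4.0 after a single epoch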
Question 6 of 9
What happens when the learning rate is too high?
The algorithm bounces around the minimum without converging and loss may increase
Training becomes extremely slow but accurate
The algorithm gets stuck in local minima
Gradients become too small to calculate
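
A quick way to see the failure mode in Question 6 is to run gradient descent on f(w) = w^2 with two step sizes; the specific rates and the five-step budget are arbitrary choices for illustration:

    # On f(w) = w^2 the gradient is 2w, so each update multiplies w by (1 - 2 * lr).
    # Any lr > 1 makes |1 - 2 * lr| > 1, so the weight overshoots and diverges.
    for lr in (0.1, 1.5):
        w = 1.0
        for _ in range(5):
            w -= lr * 2 * w
        print(lr, w)  # lr=0.1 shrinks w toward 0; lr=1.5 bounces past the minimum and blows up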
Question 7 of 9
Which type of gradient descent is described as the default choice in practice?
Mini-batch
Batch Gradient Descent
Stochastic Gradient Descent (SGD)
Random Gradient Descent
Question 8 of 9
What are the common learning rate values mentioned for experimentation?
0.001, 0.01, and 0.1
0.1, 1.0, and 10.0
0.0001, 0.001, and 0.01
0.5, 1.5, and 2.0
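
To compare the three rates from Question 8 directly, one can sweep them on the same toy quadratic used earlier; the loss, the start point, and the 100-step budget are assumptions made for the comparison:

    # Sweep the learning rates 0.001, 0.01, 0.1 on f(w) = (w - 3)^2.
    for lr in (0.001, 0.01, 0.1):
        w = 0.0
        for _ in range(100):
            w -= lr * 2 * (w - 3)
        print(lr, round(w, 3))  # 0.001 is still far from 3, 0.01 gets closer, 0.1 converges to ~3.0

Within this stable range, larger rates converge faster; Question 6 shows what happens once the rate is pushed too far.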
Question 9 of 9
Why is the problem of local minima less concerning in modern deep networks?
High-dimensional spaces don't have the same local minima problems as 2D landscapes
Modern networks use higher learning rates that skip over local minima
Batch gradient descent automatically avoids local minima
Activation functions prevent local minima from forming