Quiz: Transformers: The Architecture That Changed Everything
Question 1 of 10
What was the title of the Google research paper that introduced transformers?
Beyond RNNs: The Future of AI
Attention Is All You Need
Multi-Head Attention Mechanisms
Transformers: A New Architecture
Question 2 of 10
What are the two main parts of each transformer layer?
Encoder and decoder
Hidden state and attention mechanism
Positional encoding and normalization
Multi-head self-attention and feed-forward network
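As background for this question, a transformer layer can be sketched in a few lines of PyTorch: a multi-head self-attention sublayer followed by a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. The sizes below (d_model=512, 8 heads, d_ff=2048) are illustrative defaults, not values taken from the newsletter.

import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Sublayer 1: multi-head self-attention, wrapped in a residual connection
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Sublayer 2: position-wise feed-forward network, also with a residual
        x = self.norm2(x + self.ff(x))
        return x

x = torch.randn(2, 10, 512)          # (batch, sequence length, d_model)
print(TransformerLayer()(x).shape)   # torch.Size([2, 10, 512])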
Question 3 of 10
How many layers does BERT-base have?
24 layers
96 layers
12 layers
8 layers
Question 4 of 10
What problem does positional encoding solve in transformers?
It helps the network know the order of words
It speeds up parallel processing
It enables multi-head attention
It normalizes the word embeddings
Question 5 of 10
What mathematical functions are used in positional encoding?
Tangent and cotangent functions
Linear and polynomial functions
Sine and cosine functions at different frequencies
Exponential and logarithmic functions
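For anyone who wants to see the formula behind this question worked out, here is a minimal NumPy sketch of sinusoidal positional encoding as described in "Attention Is All You Need": even embedding dimensions get a sine, odd dimensions get a cosine, at geometrically spaced frequencies. The max_len and d_model values are arbitrary examples.

import numpy as np

def positional_encoding(max_len=50, d_model=512):
    pos = np.arange(max_len)[:, None]                  # positions 0..max_len-1
    i = np.arange(0, d_model, 2)[None, :]              # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)      # one frequency per dimension pair
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)   # cosine on odd dimensions
    return pe

print(positional_encoding().shape)   # (50, 512)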
Question 6 of 10
What type of attention does BERT use?
Sequential attention
Unidirectional attention
Bidirectional attention
Causal attention
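The practical difference between bidirectional and causal attention is the attention mask: a bidirectional encoder lets every token attend to every other token, while a causal (decoder-style) model hides future positions. A small NumPy sketch, with an arbitrary sequence length:

import numpy as np

seq_len = 5
bidirectional = np.ones((seq_len, seq_len))      # BERT-style: every token attends to every token
causal = np.tril(np.ones((seq_len, seq_len)))    # GPT-style: token i attends only to tokens 0..i
print(causal)
# [[1. 0. 0. 0. 0.]
#  [1. 1. 0. 0. 0.]
#  [1. 1. 1. 0. 0.]
#  [1. 1. 1. 1. 0.]
#  [1. 1. 1. 1. 1.]]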
Question 7 of 10
What is the purpose of residual connections in transformer layers?
They let gradients flow smoothly during training and make deep networks possible
They enable parallel processing of words
They add positional information to embeddings
They combine outputs from multiple attention heads
Question 8 of 10
How many parameters does GPT-3 have?
175 billion parameters
300 billion parameters
96 billion parameters
1.5 billion parameters
Question 9 of 10
What type of transformer architecture does ChatGPT use?
Encoder-decoder
Encoder-only
Bidirectional encoder
Decoder-only
Question 10 of 10
What is the main reason transformers won out over RNNs?
Parallelization: transformers process all words at once instead of sequentially
Lower computational cost per word
Simpler architecture with fewer layers
Better understanding of context and meaning
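To make the parallelization contrast concrete, the sketch below compares an RNN-style loop, where each step must wait for the previous hidden state, with attention-style processing, where all pairwise scores come from one matrix multiplication. The shapes and random weights are illustrative only.

import numpy as np

seq_len, d = 6, 8
x = np.random.randn(seq_len, d)
W = np.random.randn(d, d) * 0.1

# RNN-style: strictly sequential, step t cannot start until step t-1 has finished
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(W @ h + x[t])

# Attention-style: all pairwise scores in one matrix multiplication, no step-by-step dependency
scores = x @ x.T / np.sqrt(d)                                 # (seq_len, seq_len)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
context = weights @ x                                         # every position updated in parallel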