Quiz: Transformers: The Architecture That Changed Everything
Question 1 of 10
What was the title of the Google research paper that introduced transformers?
Beyond RNNs: The Future of AI
Attention Is All You Need
Multi-Head Attention Mechanisms
Transformers: A New Architecture
Question 2 of 10
What are the two main parts of each transformer layer?
Encoder and decoder
Hidden state and attention mechanism
Positional encoding and normalization
Multi-head self-attention and feed-forward network
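As background for this question, a transformer layer can be sketched in a few lines of PyTorch: a multi-head self-attention sublayer followed by a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. The sizes below (d_model=512, 8 heads, d_ff=2048) are illustrative defaults, not values taken from the newsletter.

import torch
import torch.nn as nn

class TransformerLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Sublayer 1: multi-head self-attention, wrapped in a residual connection
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Sublayer 2: position-wise feed-forward network, also with a residual
        x = self.norm2(x + self.ff(x))
        return x

x = torch.randn(2, 10, 512)          # (batch, sequence length, d_model)
print(TransformerLayer()(x).shape)   # torch.Size([2, 10, 512])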
Question 3 of 10
How many layers does BERT-base have?
24 layers
96 layers
12 layers
8 layers
Question 4 of 10
What problem does positional encoding solve in transformers?
It helps the network know the order of words
It speeds up parallel processing
It enables multi-head attention
It normalizes the word embeddings
Question 5 of 10
What mathematical functions are used in positional encoding?
Tangent and cotangent functions
Linear and polynomial functions
Sine and cosine functions at different frequencies
Exponential and logarithmic functions
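For anyone who wants to see the formula behind this question worked out, here is a minimal NumPy sketch of sinusoidal positional encoding as described in "Attention Is All You Need": even embedding dimensions get a sine, odd dimensions get a cosine, at geometrically spaced frequencies. The max_len and d_model values are arbitrary examples.

import numpy as np

def positional_encoding(max_len=50, d_model=512):
    pos = np.arange(max_len)[:, None]                  # positions 0..max_len-1
    i = np.arange(0, d_model, 2)[None, :]              # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)      # one frequency per dimension pair
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)   # cosine on odd dimensions
    return pe

print(positional_encoding().shape)   # (50, 512)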
Question 6 of 10
What type of attention does BERT use?
Sequential attention
Unidirectional attention
Bidirectional attention
Causal attention
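The practical difference between bidirectional and causal attention is the attention mask: a bidirectional encoder lets every token attend to every other token, while a causal (decoder-style) model hides future positions. A small NumPy sketch, with an arbitrary sequence length:

import numpy as np

seq_len = 5
bidirectional = np.ones((seq_len, seq_len))      # BERT-style: every token attends to every token
causal = np.tril(np.ones((seq_len, seq_len)))    # GPT-style: token i attends only to tokens 0..i
print(causal)
# [[1. 0. 0. 0. 0.]
#  [1. 1. 0. 0. 0.]
#  [1. 1. 1. 0. 0.]
#  [1. 1. 1. 1. 0.]
#  [1. 1. 1. 1. 1.]]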
Question 7 of 10
What is the purpose of residual connections in transformer layers?
They let gradients flow smoothly during training and make deep networks possible
They enable parallel processing of words
They add positional information to embeddings
They combine outputs from multiple attention heads
Question 8 of 10
How many parameters does GPT-3 have?
175 billion parameters
300 billion parameters
96 billion parameters
1.5 billion parameters
Question 9 of 10
What type of transformer architecture does ChatGPT use?
Encoder-decoder
Encoder-only
Bidirectional encoder
Decoder-only
Question 10 of 10
What is the main reason transformers won out over RNNs?
Parallelization: transformers process all words at once instead of sequentially
Lower computational cost per word
Simpler architecture with fewer layers
Better understanding of context and meaning
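To make the parallelization contrast concrete, the sketch below compares an RNN-style loop, where each step must wait for the previous hidden state, with attention-style processing, where all pairwise scores come from one matrix multiplication. The shapes and random weights are illustrative only.

import numpy as np

seq_len, d = 6, 8
x = np.random.randn(seq_len, d)
W = np.random.randn(d, d) * 0.1

# RNN-style: strictly sequential, step t cannot start until step t-1 has finished
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(W @ h + x[t])

# Attention-style: all pairwise scores in one matrix multiplication, no step-by-step dependency
scores = x @ x.T / np.sqrt(d)                                 # (seq_len, seq_len)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
context = weights @ x                                         # every position updated in parallel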