CS 7643 Quiz 4 - Deep Learning - Georgia Tech
Question:
Teacher Forcing
Answer:
- next input to the model is not the predicted value, but the actual value from the training data
- allows the model to train effectively even if a mistake was made
- if used instead of hidden-to-hidden recurrence, can allow for parallelization, but the model becomes less powerful
- emerges from MLE
- issues may arise if the network is later going to be used in "closed-loop" mode, where output is fed back as input (see the sketch below)
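A minimal PyTorch-style sketch of teacher forcing vs. "closed-loop" decoding, assuming a toy GRU decoder; the sizes, names, and helper structure are illustrative, not from the course material:

    import torch
    import torch.nn as nn

    # Toy decoder: embedding -> GRU cell -> output projection (sizes are arbitrary)
    vocab_size, hidden_size = 100, 32
    embed = nn.Embedding(vocab_size, hidden_size)
    cell = nn.GRUCell(hidden_size, hidden_size)
    proj = nn.Linear(hidden_size, vocab_size)
    loss_fn = nn.CrossEntropyLoss()

    def sequence_loss(targets, teacher_forcing=True):
        # targets: (T, B) tensor of ground-truth token ids
        T, B = targets.shape
        h = torch.zeros(B, hidden_size)
        inp = targets[0]                      # start tokens
        loss = torch.tensor(0.0)
        for t in range(1, T):
            h = cell(embed(inp), h)
            logits = proj(h)
            loss = loss + loss_fn(logits, targets[t])
            if teacher_forcing:
                inp = targets[t]              # feed the ground-truth token from the training data
            else:
                inp = logits.argmax(dim=-1)   # "closed-loop" mode: feed the model's own prediction
        return loss

    # example: sequence_loss(torch.randint(0, vocab_size, (6, 4)), teacher_forcing=True)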
Question:
Skip-Gram Model: Loss/Objective Function
Answer:
- Loss: for each position t, we try to predict the context words within a fixed window of size m given the center word
- multiply these probabilities to get a likelihood
- L(theta) = prod_(t=1..T) prod_(-m <= j <= m, j != 0) P(w_(t+j) | w_t ; theta)
- Objective function: J(theta) = -(1/T) log(L(theta))
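A rough numpy sketch of computing this objective, assuming a hypothetical helper prob(center, context) that returns P(context | center):

    import numpy as np

    def skipgram_objective(corpus_ids, window, prob):
        # J(theta) = -(1/T) * sum_t sum_{-m<=j<=m, j!=0} log P(w_{t+j} | w_t ; theta)
        # the product of probabilities is accumulated as a sum of logs
        T = len(corpus_ids)
        log_likelihood = 0.0
        for t in range(T):
            for j in range(-window, window + 1):
                if j == 0 or not (0 <= t + j < T):
                    continue
                log_likelihood += np.log(prob(corpus_ids[t], corpus_ids[t + j]))
        return -log_likelihood / T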
Question:
Skip-Gram Model: Calculate P(w_(t+j) | w_(t) ; theta)
Answer:
- Two vectors for each word:
- u_w when w is the center word
- v_o when o is a context word
- uses the inner product u_w . v_o to measure how likely it is that center word w appears with context word o
- P(w_(t+j) | w_t ; theta) = softmax over the vocabulary: exp(u_(w_t) . v_(w_(t+j))) / sum_(w in V) exp(u_(w_t) . v_w)
- params to optimize are thus the u and v vectors
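A rough numpy sketch of this probability under the u/v convention above; the array names U and V are assumptions:

    import numpy as np

    def skipgram_prob(U, V, center_id, context_id):
        # U[w]: vector u_w used when w is the center word   (shape: vocab x d)
        # V[o]: vector v_o used when o is a context word     (shape: vocab x d)
        scores = V @ U[center_id]          # inner product u_center . v_w for every word w
        scores -= scores.max()             # shift for numerical stability
        probs = np.exp(scores) / np.exp(scores).sum()   # softmax over the whole vocabulary
        return probs[context_id]           # P(w_(t+j) = context | w_t = center)

The normalizing sum runs over the entire vocabulary, which is exactly what makes this expensive (next card).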
Question:
Skip-Gram Model: Main Disadvantage
Answer:
- Expensive to compute: the softmax denominator sums over the entire vocabulary
- Can solve this via hierarchical Softmax
- Can solve this via Negative Sampling
Question:
Skip-Gram Model: Negative Sampling
Answer:
- for each true (w, c) pair, sample k negative pairs (w, c')
- maximize the probability that the true outside word appears, minimize the probability that the randomly sampled words appear
- choose a sampling distribution that makes less frequent words more likely to be drawn than under raw unigram frequencies
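A rough numpy sketch of a per-pair negative-sampling loss and a word2vec-style sampling distribution (unigram counts raised to the 3/4 power, which boosts rarer words); array and function names are assumptions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def neg_sampling_loss(U, V, center_id, context_id, neg_ids):
        # maximize sigma(u_center . v_context) for the true pair,
        # minimize sigma(u_center . v_neg) for the k sampled negatives
        u_c = U[center_id]
        pos = np.log(sigmoid(u_c @ V[context_id]))
        neg = np.log(sigmoid(-(V[neg_ids] @ u_c))).sum()
        return -(pos + neg)

    def sample_negatives(unigram_counts, k, rng=np.random.default_rng()):
        # raising counts to the 3/4 power makes less frequent words more likely
        # to be drawn than under the raw unigram distribution
        p = unigram_counts ** 0.75
        p = p / p.sum()
        return rng.choice(len(p), size=k, p=p)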
Question:
Word Embeddings as a graph
Answer:
- each word is a node with edge connections to context words
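A tiny sketch of that graph view, building edges from fixed context windows; the function name and default window size are illustrative:

    from collections import defaultdict

    def cooccurrence_graph(tokens, window=2):
        # each word is a node; connect it to every word inside its context window
        edges = defaultdict(set)
        for t, w in enumerate(tokens):
            lo, hi = max(0, t - window), min(len(tokens), t + window + 1)
            for j in range(lo, hi):
                if j != t:
                    edges[w].add(tokens[j])
        return edges

    # example: cooccurrence_graph("the quick brown fox jumps".split())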