CS 7643 QUIZ 4: 40 QUESTIONS AND VERIFIED ANSWERS
Evaluating Word Embeddings: Extrinsic - ANSWERS: Evaluation on a real downstream task
- Can take a long time to compute
- Unclear whether the subsystem itself is the problem or its interaction with the rest of the pipeline
- If replacing exactly one subsystem with another improves accuracy -> winning
Why Graph Embeddings - ANSWERS: They are a form of unsupervised learning on graphs
- Result in task-agnostic entity representations
- Features are useful on downstream tasks without much data
- Nearest neighbors are semantically meaningful
Graph Embeddings Loss Function - ANSWERS: Margin loss between the score of an edge f(e) and a negative-sampled edge f(e')
- Negative-sampled edges are constructed by taking a real edge and replacing either the source or destination vertex with a random node
- The score of an edge f(e) is a similarity (dot product) between the source embedding and a transformed version of the destination embedding
- f(e) = cos(theta_s, theta_d + theta_r)
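The scoring function and margin loss above can be sketched in a few lines of NumPy. This is a minimal illustration, not the quiz's reference implementation; `edge_score` and `margin_loss` are names chosen here, and the relation transform is the additive (translation) form given by the formula.

```python
import numpy as np

def edge_score(theta_s, theta_d, theta_r):
    # f(e) = cos(theta_s, theta_d + theta_r): cosine similarity between
    # the source embedding and the relation-translated destination.
    t = theta_d + theta_r
    return float(theta_s @ t / (np.linalg.norm(theta_s) * np.linalg.norm(t)))

def margin_loss(pos_score, neg_score, margin=1.0):
    # Hinge on the gap: push the real edge's score f(e) above the
    # negative-sampled edge's score f(e') by at least `margin`.
    return max(0.0, margin - pos_score + neg_score)
```

In training, `pos_score` comes from a real edge and `neg_score` from the same edge with its source or destination swapped for a random node.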
Graph Embedding is Slow: Reason and Solution - ANSWERS: Training time is dominated by computing scores for "fake edges"
- Corrupt a sub-batch of edges with the same set of random nodes
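The speed-up trick can be sketched as follows, assuming edges stored as (source, destination) index pairs; `corrupt_subbatch` is an illustrative name. Because every edge in the sub-batch shares one set of random replacement nodes, their embeddings can be fetched and scored once for the whole batch instead of once per edge.

```python
import numpy as np

def corrupt_subbatch(edges, num_nodes, num_neg, rng):
    # edges: (B, 2) int array of (src, dst) node indices.
    # Draw ONE shared set of random replacement nodes for the sub-batch.
    fake_nodes = rng.integers(0, num_nodes, size=num_neg)
    neg = np.repeat(edges, num_neg, axis=0)      # each edge repeated num_neg times
    cols = rng.integers(0, 2, size=len(neg))     # corrupt src (0) or dst (1)
    neg[np.arange(len(neg)), cols] = np.tile(fake_nodes, len(edges))
    return neg                                   # (B * num_neg, 2) fake edges
```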
Debiasing word2vec - ANSWERS: Identify the gender subspace with gendered words
- Project all words onto this subspace
- Subtract those projections from the original word vectors
Problem: Not that effective; bias still pervades the word embedding space
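The three debiasing steps amount to projecting each word vector onto a gender direction and subtracting that projection. A minimal sketch, assuming a 1-D gender subspace estimated as the mean difference of gendered pairs (the original method uses PCA over several pairs); `gender_direction` and `debias` are names chosen here:

```python
import numpy as np

def gender_direction(pairs):
    # Estimate the gender direction from gendered word pairs,
    # e.g. (v_he - v_she), as their normalized mean difference.
    diffs = np.stack([a - b for a, b in pairs])
    d = diffs.mean(axis=0)
    return d / np.linalg.norm(d)

def debias(word_vec, direction):
    # Project the word onto the gender subspace, then subtract the
    # projection, keeping only the component orthogonal to it.
    g = direction / np.linalg.norm(direction)
    return word_vec - (word_vec @ g) * g
```

After `debias`, the word vector has zero component along the estimated direction, which is exactly why the fix is incomplete: bias encoded off that subspace survives.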
t-SNE Things to Remember - ANSWERS: 1. Run until it stabilizes
- Set perplexity between 2 and N
- Perplexity loosely measures the number of neighbors
- Balances between local and global aspects of the data
- Re-run t-SNE multiple times to ensure we get the same shape
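"Perplexity loosely measures the number of neighbors" can be made concrete: perplexity is 2 to the power of the entropy of the neighbor distribution, so a uniform distribution over k neighbors has perplexity exactly k. A small sketch (the function name is chosen here):

```python
import numpy as np

def perplexity(p):
    # Perplexity = 2**H(P). For a distribution spread uniformly over k
    # neighbors this equals k, hence "effective number of neighbors".
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # 0 * log(0) taken as 0
    h = -(p * np.log2(p)).sum()        # Shannon entropy in bits
    return 2.0 ** h
```

A peaked distribution (most mass on one neighbor) gives a perplexity well below the number of nonzero entries, matching the local-vs-global trade-off above.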