Note: I grew up switching between Hindi and English mid-sentence without thinking about it. One afternoon I started wondering if GPT-2, trained only on English, had any idea what was happening when you dropped a Hindi word in. Two days later I had a GitHub repo. Worth it.
# Do LLMs Have a Secret Interlingua? Testing GPT-2's Hinglish Comprehension with Sparse Autoencoders
- Project Home: [GitHub Repo](https://github.com/yash-srivastava19/sae-macaronic-analysis)
---
## The Question That Wouldn't Leave Me Alone
Something I've always been curious about: when bilingual speakers code-switch - dropping Hindi words mid-English sentence, the way you do when you're talking to your family - does a model like GPT-2 have *any* idea what's happening semantically? Or does it just see an unknown token and shrug?
The specific form of this question that grabbed me was: does GPT-2 fire the same internal features for "gaana" (Hindi for "song") as it does for "song," when both appear in otherwise identical sentences?
If the answer is yes - even partially - that's genuinely surprising. GPT-2-small was not trained on Hindi. It was trained on English web text. There is no reason it *should* generalize. But language models are strange things, and I've learned not to assume I know what they know.
This is a mechanistic interpretability experiment. The method: Sparse Autoencoders (SAEs). The dataset: 8 matched English/Hinglish sentence pairs, differing by exactly one word. The result: partial, inconclusive, and worth talking about.
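The full pipeline lives in the repo linked above; as a minimal sketch of the core comparison step, here's one way to quantify whether the same SAE features fire for a matched pair. The function names, the activation threshold, and the toy latent vectors below are my illustration, not the repo's actual code - they assume you've already extracted the SAE latent vector at the swapped token's position in each sentence.

```python
import numpy as np

def active_features(latents, threshold=0.0):
    """Indices of SAE latents firing above threshold for one token."""
    return set(np.flatnonzero(latents > threshold))

def feature_overlap(latents_a, latents_b, threshold=0.0):
    """Jaccard overlap between the active-feature sets of two tokens.

    1.0 means identical active features; 0.0 means fully disjoint.
    """
    a = active_features(latents_a, threshold)
    b = active_features(latents_b, threshold)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy example: SAE latents at the position of "song" vs. "gaana"
# in otherwise-identical sentences (made-up numbers).
song = np.array([0.0, 1.2, 0.0, 0.8, 0.0, 2.1])
gaana = np.array([0.0, 0.9, 0.0, 0.0, 0.3, 1.7])
print(feature_overlap(song, gaana))  # → 0.5 (features 1 and 5 shared)
```

Jaccard overlap on active-feature sets is just one reasonable choice; cosine similarity on the raw latent vectors is a softer alternative that doesn't depend on a hard firing threshold.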