Note: I grew up switching between Hindi and English mid-sentence without thinking about it. One afternoon I started wondering if GPT-2, trained only on English, had any idea what was happening when you dropped a Hindi word in. Two days later I had a GitHub repo. Worth it.
# Do LLMs Have a Secret Interlingua? Testing GPT-2's Hinglish Comprehension with Sparse Autoencoders
- Project Home: [GitHub Repo](https://github.com/yash-srivastava19/sae-macaronic-analysis)
---
## The Question That Wouldn't Leave Me Alone
Something I've always been curious about: when bilingual speakers code-switch - dropping Hindi words mid-English sentence, the way you do when you're talking to your family - does a model like GPT-2 have *any* idea what's happening semantically? Or does it just see an unknown token and shrug?
The specific form of this question that grabbed me was: does GPT-2 fire the same internal features for "gaana" (Hindi for "song") as it does for "song," when both appear in otherwise identical sentences?
If the answer is yes - even partially - that's genuinely surprising. GPT-2-small was not trained on Hindi. It was trained on English web text. There is no reason it *should* generalize. But language models are strange things, and I've learned not to assume I know what they know.
This is a mechanistic interpretability experiment. The method: Sparse Autoencoders (SAEs). The dataset: 8 matched English/Hinglish sentence pairs, differing by exactly one word. The result: partial, inconclusive, and worth talking about.
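The full pipeline lives in the repo linked above; as a minimal sketch of the core comparison step, here's one way to quantify whether the same SAE features fire for a matched pair. The function names, the activation threshold, and the toy latent vectors below are my illustration, not the repo's actual code - they assume you've already extracted the SAE latent vector at the swapped token's position in each sentence.

```python
import numpy as np

def active_features(latents, threshold=0.0):
    """Indices of SAE latents firing above threshold for one token."""
    return set(np.flatnonzero(latents > threshold))

def feature_overlap(latents_a, latents_b, threshold=0.0):
    """Jaccard overlap between the active-feature sets of two tokens.

    1.0 means identical active features; 0.0 means fully disjoint.
    """
    a = active_features(latents_a, threshold)
    b = active_features(latents_b, threshold)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy example: SAE latents at the position of "song" vs. "gaana"
# in otherwise-identical sentences (made-up numbers).
song = np.array([0.0, 1.2, 0.0, 0.8, 0.0, 2.1])
gaana = np.array([0.0, 0.9, 0.0, 0.0, 0.3, 1.7])
print(feature_overlap(song, gaana))  # → 0.5 (features 1 and 5 shared)
```

Jaccard overlap on active-feature sets is just one reasonable choice; cosine similarity on the raw latent vectors is a softer alternative that doesn't depend on a hard firing threshold.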