PiC/phrase_retrieval dataset (PR-pass & PR-page) is broken — does anyone have a local copy?

Hugging Face Forums [Unofficial] May 5, 2026

Hey everyone,

I’ve been trying to use the PiC (Phrase-in-Context) Phrase Retrieval dataset from Hugging Face (PiC/phrase_retrieval, configs: PR-pass and PR-page), but the loader is broken: the underlying data files hosted at auburn.edu/~tmp0038/PiC/ now return a ‘403 Forbidden’ error.

The HuggingFace dataset loader depends entirely on that external Auburn University server, so the dataset is currently unusable for anyone trying to load it programmatically.

I’ve already reached out to the authors (Thang Pham and Anh), but unfortunately I have not received a positive response yet.

If anyone downloaded this dataset before the server went down and still has the raw JSON files (train-v1.0.json, dev-v1.0.json, test-v1.0.json) for either PR-pass or PR-page, I would really appreciate it if you could share them.
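
In case it helps anyone checking an old download, here is a minimal sketch that looks for the three files in a folder and confirms each one still parses as JSON. The function name `check_local_copy` and the idea of one folder per config are my own assumptions, and the sketch does not assume anything about the schema inside the files.

```python
import json
from pathlib import Path

# File names as listed above; keeping one folder per config is an assumption.
EXPECTED_FILES = ("train-v1.0.json", "dev-v1.0.json", "test-v1.0.json")


def check_local_copy(folder):
    """Return a {file name: status} report for the three expected split files."""
    report = {}
    for name in EXPECTED_FILES:
        path = Path(folder) / name
        if not path.is_file():
            report[name] = "missing"
            continue
        try:
            # Only checks that the file is well-formed JSON, not its contents.
            with path.open(encoding="utf-8") as handle:
                json.load(handle)
            report[name] = "ok"
        except (json.JSONDecodeError, UnicodeDecodeError):
            report[name] = "invalid JSON"
    return report
```

For example, `check_local_copy("PR-pass")` would report each split as "ok", "missing", or "invalid JSON" for a hypothetical folder named `PR-pass`.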

Thanks in advance!
