Indic-faker: Generate realistic Indian synthetic data for NLP/ML — 8 languages, native scripts, batch DataFrame export
Hugging Face Forums [Unofficial]
March 30, 2026
Amazing work on the Indic synthetic profiles dataset! This kind of tooling is super valuable for Indian language NLP, especially in low-resource contexts where real data is limited. Synthetic profiles can really help with pre-training, fine-tuning, and evaluation workflows by adding diversity across languages, scripts, and entity types.
Really appreciate the effort to support multiple languages and scripts; this will make it easier for researchers and developers to build more inclusive models. If you’re planning future releases, it’d be great to see quality-check metrics, language coverage figures, or benchmarking insights.
Thanks for contributing this to the community!