Dataset
wxl_amh (on Hugging Face)
This is the speech dataset that should lead the page. It makes the Amharic voice-data work legible immediately and ties directly to the Waxal gap you called out.
- 3k rows
- 988 audio files in repo tree
- Speech + transcription
Documentation
- Best public proof point for the voice-data side of the work.
- Useful for ASR training, transcription workflows, and speech evaluation in Amharic.
- Pairs naturally with the Shook speech models on the same page.
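For the speech-evaluation use mentioned above, the usual metric is word error rate (WER): the word-level edit distance between a reference transcription and a model's output, divided by the number of reference words. The sketch below is a minimal, dependency-free illustration and not part of the dataset or its tooling. The function name `word_error_rate` is hypothetical, and it assumes transcriptions are whitespace-separated, which may need adjusting for Amharic text that uses the Ethiopic wordspace character or other punctuation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace. Normalisation
    (punctuation, Ethiopic wordspace, casing) is left to the caller.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not rows from the dataset.
    print(word_error_rate("ሰላም ለዓለም", "ሰላም"))
```

In practice each reference would come from a row's transcription field and each hypothesis from an ASR model run on the paired audio; the per-utterance edit counts are typically summed over the whole evaluation set before dividing, rather than averaging per-utterance rates.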