Slides: click
Talk based on:
- Hill, M.J. and Hengchen, S., 2019. Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study. Digital Scholarship in the Humanities, 34(4), pp.825-843. link; email me if you do not have access
- Hämäläinen, M. and Hengchen, S., 2019. From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. In Recent Advances in Natural Language Processing (pp. 432-437). INCOMA. link
- Duong, Q., Hämäläinen, M. and Hengchen, S., 2020. An Unsupervised method for OCR Post-Correction and Spelling Normalisation for Finnish. arXiv preprint arXiv:2011.03502. link