Work History
- 2021 – Current: Founder, iguanodon.ai, Belgium
- 2019 – Current: Lecturer, Université de Genève, Switzerland
- 2020 – 2022: Postdoctoral researcher, Göteborgs universitet, Sweden
- 2018 – 2020: Postdoctoral researcher, Helsingin yliopisto, Finland
Education
- 2017: PhD in Information Science, Université libre de Bruxelles. Thesis: When does it mean? Detecting semantic change in historical texts.
- 2012: MSc in Information Science, Université libre de Bruxelles.
- 2010: MA in Langues et littératures germaniques, Université libre de Bruxelles.
News
Last updated: 2026-03-24
- Perla and Jonathan are in Rabat to present our latest work (https://aclanthology.org/2026.vardial-1.27/) at EACL 2026.
- I have had the pleasure of working with Perla Al Almaoui and Pierrette Bouillon on this cool project that resulted in a dataset and a paper to be presented at MT Summit 2025.
- I am giving a talk (in French) about data quality, OCR, and text analyses at the Université libre de Bruxelles on 2024/04/16, in the context of the « Analyse critique et amélioration de la qualité de l'information numérique » FNRS contact group. More info and registration: Link.
- I have been co-organising recent editions of LChange, the International Workshop on Computational Approaches to Historical Language Change. Link.
- I have recently founded iguanodon.ai, a Brussels-based language technology and data science company. Do not hesitate to reach out.
- Our book on computational approaches to semantic change is out! More info and free download: Link.
- We're organising the 2nd International Workshop on Computational Approaches to Historical Language Change (LChange'21), co-located with ACL 2021! More info: Link.
- We're organising a workshop at SLTC 2020! Come and talk about computational language change in November in Gothenburg.
- New job! In June I am joining the Language Change project at the University of Gothenburg, Sweden.
- Our SemEval 2020 task on lexical semantic change detection has been accepted! More info: https://languagechange.org/semeval
Publications
All (most?) publications are listed on Google Scholar: Link. I do not update this list anymore.
Books and theses
- Tahmasebi, N., Borin, L., Jatowt, A., Xu, Y., & Hengchen, S. (2021). Computational Approaches to Semantic Change. Language Science Press: Berlin. https://langsci-press.org/catalog/book/303
- Hengchen, S. (2017). When does it mean? Detecting semantic change in historical texts. PhD Thesis, Université libre de Bruxelles.
- van Hooland, S., Gillet, F., Hengchen, S., & De Wilde, M. (2016). Introduction aux humanités numériques: méthodes et pratiques. De Boeck Supérieur.
Selected papers in peer-reviewed journals
- Hengchen, S. & Tahmasebi, N. (2021). 'A Collection of Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data'. Journal of Open Humanities Data, 7, p.2. https://doi.org/10.5334/johd.22
- Hengchen, S., Ros, R., Marjanen, J., & Tolonen, M. (accepted). 'A data-driven approach to studying changing vocabularies in historical newspaper collections'. Accepted for publication, Digital Scholarship in the Humanities (DSH).
- Tahmasebi, N., & Hengchen, S. (2019). 'The Strengths and Pitfalls of Large-Scale Text Mining for Literary Studies'. Samlaren, 140, 198–227. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-406938
- McGillivray, B., Hengchen, S., Lähteenoja, V.E., Palma, M. & Vatri, A. (2019). 'A computational approach to lexical polysemy in Ancient Greek'. Digital Scholarship in the Humanities (DSH). https://doi.org/10.1093/llc/fqz036
- Hill, M.J. & Hengchen, S. (2019). 'Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study'. Digital Scholarship in the Humanities (DSH). https://doi.org/10.1093/llc/fqz024
- De Wilde, M., & Hengchen, S. (2017). 'Semantic Enrichment of a Multilingual Archive with Linked Open Data'. Digital Humanities Quarterly, 11(4). http://www.digitalhumanities.org/dhq/vol/11/4/000328/000328.html
Selected papers in peer-reviewed proceedings
- Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., & Tahmasebi, N. (2020). SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics. Link
- Frossard, E., Coustaty, M., Doucet, A., Jatowt, A. & Hengchen, S. (2020). Dataset for Temporal Analysis of English-French Cognates. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC'20). European Language Resources Association (ELRA). Link
- Hämäläinen, M., & Hengchen, S. (2019). From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. In G. Angelova, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), Proceedings of Recent Advances in Natural Language Processing (pp. 432–437). Shoumen: INCOMA. Link
- Perrone, V., Palma, M., Hengchen, S., Vatri, A., Smith, J.Q. & McGillivray, B. (2019). GASC: Genre-Aware Semantic Change for Ancient Greek. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change (LChange'19). ACL, Florence, Italy. Link
- Dubossarsky, H., Hengchen, S., Tahmasebi, N. & Schlechtweg, D. (2019). Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL2019). Florence, Italy. Link
- Hengchen, S., Coeckelbergs, M., van Hooland, S., Verborgh, R., and Steiner, T. (2016). Exploring archives with probabilistic models: Topic modelling for the valorisation of digitised archives of the European Commission. In First Workshop «Computational Archival Science: digital records in the age of big data», IEEE Big Data, Washington, volume 8.
Peer-reviewed book chapters
- Hengchen, S., Tahmasebi, N., Schlechtweg, D. and Dubossarsky, H. (2021). Challenges for computational lexical semantic change. In Tahmasebi, Borin, Jatowt, Xu, and Hengchen (eds.), Computational Approaches to Semantic Change, chapter 11. Language Science Press, Berlin. Preprint
- Perrone, V., Hengchen, S., Palma, M., Vatri, A., Smith, J.Q., McGillivray, B. (2021). Lexical semantic change for Ancient Greek and Latin. In Tahmasebi, Borin, Jatowt, Xu, and Hengchen (eds.), Computational Approaches to Semantic Change, chapter 9. Language Science Press, Berlin. Preprint
Datasets
- Hengchen, S. & Tahmasebi, N. (2021). A Collection of Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data. https://zenodo.org/record/4301658
- Hengchen, S., Ros, R., Marjanen, J., & Tolonen, M. (2019). Models for "A data-driven approach to studying changing vocabularies in historical newspaper collections". https://zenodo.org/record/3585027
- Hengchen, S., Ros, R., and Marjanen, J. (2019). Models for "A data-driven approach to the changing vocabulary of the 'nation' in English, Dutch, Swedish and Finnish newspapers, 1750–1950". https://zenodo.org/record/3270648
Teaching
- Spring 2018: LDA-H502 – Data team, Digital Humanities Hackathon · University of Helsinki
- Spring 2019: LDA-H506 – Introduction to NLP for DH with Python · University of Helsinki
- Spring 2019: LDA-H502 – Group (co-)leader, Digital Humanities Hackathon · University of Helsinki
- Spring 2020: LDA-H506 – Introduction to NLP for DH with Python · University of Helsinki
- Spring 2020: TIC-MATIM – Technologies de l'information et de la communication · Université de Genève
- Spring 2021: TIC-MATIM – Technologies de l'information et de la communication · Université de Genève
- Spring 2021: LT2402/LT2215 – Masterprojekt · University of Gothenburg
- Autumn 2021: BTM0908 – Ingénierie linguistique · Université de Genève
- Spring 2022: TIC-MATIM – Technologies de l'information et de la communication · Université de Genève
- Spring 2023: TIC-MATIM – Technologies de l'information et de la communication · Université de Genève
- Autumn 2023: BTM0908 – Ingénierie linguistique · Université de Genève
- Spring 2024: TIC-MATECH – Méthodes et pratiques numériques · Université de Genève
- Autumn 2025: TIC-MATECH – Méthodes et pratiques numériques · Université de Genève
- Spring 2026: BTM0908 – Ingénierie linguistique · Université de Genève