
Hello,
my name is Simon Hengchen, and this is my resume.

About Me

I am a PhD Candidate at the Université libre de Bruxelles, where I have been for nearly three years. I research information extraction in multilingual, unstructured, OCRed, historical textual data and specialise in topic modelling (LDA). Before focusing on LDA, I also worked on named-entity recognition.

A PDF version is available here.

Education

  • 1988

    I was born in Belgium

    ... at a very young age, to Mr and Mrs Hengchen-Dubois.

  • 2010

    MA in Germanic languages

The BA was quickly followed by an MA, for which I did an internship at Volontariat, an NGO in South Eastern India.

  • 2012

    MSc in Information Science and Technologies, specialising in Natural Language Processing

    Wanting to be prepared for a life in the 21st century, I chose to tackle this challenge and enrolled in the MaSTIC program. Courses included Algorithmics, Programming (C++), Natural Language Processing, Databases, and Library Science.

  • 2017

    PhD in Information Science and Technologies, specialising in Natural Language Processing

    I research information extraction in multilingual, unstructured, OCRed, historical textual data and specialise in topic modelling (LDA).

Experience

Trinity College Dublin
2015
CENDARI Fellow
In the context of the CENDARI project, I was invited to research a 3.1-million-page dataset pertaining to daily life in the city of Ypres, Belgium. This research was carried out at the Long Room Hub over three months.
Université libre de Bruxelles
2013-2017
PhD Candidate
I research information extraction in multilingual, unstructured, OCRed, historical textual data and specialise in topic modelling (LDA). Before focusing on LDA, I also worked on named-entity recognition. I am the beneficiary of a BELSPO grant and work on the TIC Belgium project, which aims to develop a virtual research environment helping historians delve into millions of pages of historical textual data, with a focus on transnational intellectual cooperation.
As a representative of the scientific community, I also take part in various scientific commissions and am a full member of the Faculty Council.
GDF Suez (now ENGIE)
2013
Young Knowledge Officer
As part of the Department of Strategic Watch and Analysis, my tasks were, in a nutshell, to monitor various sources of information and dispatch relevant data to other departments. Owing to an NDA, any further queries should be addressed to my then-supervisor, Yohann Delzant.

Publications

Proceedings of the 15th International Symposium of Information Science (ISI 2017)
2017
Text Mining for User Query Analysis
A 5-Step Method for Cultural Heritage Institutions
This paper explores a five-step, text-mining methodology that can help automate the analysis of large volumes of log files. The methodology is illustrated by a case study from the State Archives of Belgium. The paper was presented at the Everything Changes, Everything Stays the Same? Understanding Information Spaces conference.
The first author of this paper is Anne Chardonnens, with help provided by Raphaël Hubain.
Proceedings of the 2016 IEEE Conference on Big Data
2016
Exploring archives with probabilistic models
Topic Modelling for the valorisation of digitised archives of the European Commission
This paper presents a proof of concept on the use of Latent Dirichlet Allocation (LDA) to semi-automatically create content metadata for multilingual, historical, OCRed archives. It also tackles the reconciliation of the generated metadata with an existing controlled vocabulary, enabling the institution to effortlessly integrate the results into a production system.
Presented at the 2016 IEEE International Conference on Big Data workshop on Computational Archival Science.
Co-authors: Mathias Coeckelbergs, Seth van Hooland, Ruben Verborgh and Thomas Steiner.
Proceedings of JADH2016
2016
Comparing Topic Model Stability across Language and Size
This paper presents a benchmarking study of the use of Latent Dirichlet Allocation (LDA) on parallel corpora (English and French). It furthermore tackles the problem of representativeness (how much data is enough data?) by reducing a large, DBpedia-based corpus and applying LDA to smaller versions of it.
Presented at the 2016 conference of the Japanese Association for Digital Humanities, JADH2016.
Co-authors: Alexander O'Connor, Gary Munnelly and Jennifer Edmond.
Preprint
2016
How hot is .brussels?
Analysis of the uptake of the .brussels top-level domain name extension
This paper presents an analysis of the uptake of the .brussels domain name extension. A quantitative analysis of the dataset determines several characteristics of the gTLD, such as the names and countries of registrants of .brussels domains, or the number of redirections versus domains used as such. A more qualitative analysis, based on a representative sample, indicates the language of the .brussels websites, the commercial sectors that use them, and whether there is a direct link to the city of Brussels.
Code and preprint available on howhotis.brussels.
Co-authors: Margot Waty, Seth van Hooland, Mathias Coeckelbergs and Max De Wilde.
De Boeck Université
2016
Introduction aux humanités numériques
Méthodes et pratiques numériques en sciences humaines et sociales
This book, co-written with Seth van Hooland, Max De Wilde and Florence Gillet, introduces digital methods to humanities students. The book tackles information searching, data modeling, digitisation best practices and data analysis.
Digital Humanities Quarterly
Accepted for publication
Semantic Enrichment of a Multilingual Archive with Linked Open Data
This paper, co-written with Max De Wilde, presents MERCKX, a novel tool to semi-automatically enrich a multilingual archive with Linked Open Data. Using a 3.1-million-page dataset focusing on the city of Ypres, Belgium, we introduce a robust language-independent system that beats state-of-the-art solutions.
I2D
2015
L'extraction des entités nommées : une opportunité pour le secteur culturel ?
This paper, co-written with Seth van Hooland, Ruben Verborgh and Max De Wilde, evaluates different NER services on a historical, French-language corpus. By doing so, we demonstrate that it is possible for libraries, archives and museums (LAMs) and, by extension, most cultural heritage institutions, to easily enrich their datasets with Linked Data URIs in a low-cost way. PDF available on CAIRN.info.

Skills

Natural Language Processing & Text Mining
Topic Modelling
Named-Entity Recognition
Semantic Web
Linked Data
Linux
Python

Conferences, Workshops, Talks

IEEE Big Data
2016, Washington
Exploring archives with probabilistic models: Topic modelling for the European Commission Archives
First Workshop on Computational Archival Science.
Slides.

JADH
2016, Tokyo
Comparing Topic Model Stability across Language and Size
Japanese Association for Digital Humanities.
Slides.

UCSB
2016, Santa Barbara
Topic modelling in the Library
University of California: Santa Barbara libraries.
Slides.
DHBenelux
2015, Antwerp
Semantic Enrichment of a Multilingual Archive with Linked Open Data
DHBenelux
2014, The Hague
NER as a gateway drug to the Linked Data cloud: Application of Named-Entity Recognition on cultural heritage metadata
Digital Humanities FNRS
2014, UCLouvain
Named-Entity Recognition et Linked Data: quelle valeur ajoutée pour les archives ?

Member of boards and committees

Reviewer
Frontiers of Information Technology & Electronic Engineering (ISSN: 2095-9230)
Founding Member
2013 -
FNRS contact group for Digital Humanities
Member
2013 -
TIC Belgium Technical Committee
Member
Programme Committee
DHBenelux 2016
Member
Programme Committee
DHBenelux 2017
Member
Scientific Committee
Digital Approaches towards 18th and 19th century serial publications (September 2017)

Languages

French
English
Dutch
Norwegian Bokmål
Modern Hebrew
Tamil

Hobbies

Boxing
Running
Weird languages
Robotics and automation
Electronics
Norse mythology