Simon Hengchen
Hi! I am an NLP/AI consultant at iguanodon.ai. For the past few years I have also been a lecturer in NLP at the Université de Genève, Switzerland.
This is a quick overview of my academic outputs, research stays, and teaching. For a full curriculum vitae, please email me at hengchen.simon@gmail.com.
Work History
- 2021 - Current: Founder, iguanodon.ai, Belgium
- 2019 - Current: Lecturer, Université de Genève, Switzerland
- 2020 - 2022: Postdoctoral researcher, Göteborgs universitet, Sweden
- 2018 - 2020: Postdoctoral researcher, Helsingin yliopisto, Finland
Education
- 2017: PhD in Information Science, Université libre de Bruxelles. Thesis: "When does it mean? Detecting semantic change in historical texts."
- 2012: MSc in Information Science, Université libre de Bruxelles.
- 2010: MA in Langues et littératures germaniques, Université libre de Bruxelles.
News
This page was last updated on 2024-02-21.
- I am giving a talk (in French) about data quality, OCR, and text analyses at the Université libre de Bruxelles on 2024/04/16, in the context of the « Analyse critique et amélioration de la qualité de l’information numérique » FNRS contact group. More info and registration: Link.
- I have been co-organising recent editions of LChange, the International Workshop on Computational Approaches to Historical Language Change. Link.
- I have recently founded iguanodon.ai, a Brussels-based language technology and data science company. Do not hesitate to reach out.
- Our book on computational approaches to semantic change is out! More info and free download: Link.
- We're organising the 2nd International Workshop on Computational Approaches to Historical Language Change 2021 (LChange'21), co-located with ACL2021! More info: Link.
- We're organising a workshop at SLTC 2020! Come and talk about computational language change in November in Gothenburg.
- New job! In June I am joining the Language Change project at the University of Gothenburg, Sweden.
- Our SemEval 2020 task on lexical semantic change detection has been accepted! More info: https://languagechange.org/semeval
Publications
All (most?) publications are listed on Google Scholar: Link.
Books and theses
- Tahmasebi, N., Borin, L., Jatowt, A., Xu, Y., & Hengchen, S. (2021). Computational Approaches to Semantic Change. Language Science Press: Berlin. https://langsci-press.org/catalog/book/303
- Hengchen, S., 2017. When does it mean? Detecting semantic change in historical texts. PhD Thesis, Université libre de Bruxelles.
- van Hooland, S., Gillet, F., Hengchen, S., and De Wilde, M., 2016. Introduction aux humanités numériques: méthodes et pratiques. De Boeck supérieur.
Selected papers in peer-reviewed journals
- Hengchen, S. & Tahmasebi, N. (2021). ‘A Collection of Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data’. Journal of Open Humanities Data, 7, p.2. https://doi.org/10.5334/johd.22
- Hengchen, S., Ros, R., Marjanen, J., and Tolonen, M. (accepted). ‘A data-driven approach to studying changing vocabularies in historical newspaper collections’. Digital Scholarship in the Humanities : DSH.
- Tahmasebi, N., & Hengchen, S. (2019). ‘The Strengths and Pitfalls of Large-Scale Text Mining for Literary Studies’. Samlaren, 140, 198–227. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-406938
- McGillivray, B., Hengchen, S., Lähteenoja, V.E., Palma, M. & Vatri, A. (2019). ‘A computational approach to lexical polysemy in Ancient Greek’. Digital Scholarship in the Humanities : DSH. https://doi.org/10.1093/llc/fqz036
- Hill, M.J. & Hengchen, S. (2019). ‘Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study’. Digital Scholarship in the Humanities : DSH. https://doi.org/10.1093/llc/fqz024
- De Wilde, M., & Hengchen, S. (2017). Semantic Enrichment of a Multilingual Archive with Linked Open Data. Digital Humanities Quarterly, 11(4). http://www.digitalhumanities.org/dhq/vol/11/4/000328/000328.html
Selected papers in peer-reviewed proceedings
- Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., & Tahmasebi, N. (2020). SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics. Link
- Frossard, E., Coustaty, M., Doucet, A., Jatowt, A. & Hengchen, S. (2020). Dataset for Temporal Analysis of English-French Cognates. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC'20). European Language Resources Association (ELRA). Link
- Hämäläinen, M., & Hengchen, S. (2019). From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. In G. Angelova, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), Proceedings of Recent Advances in Natural Language Processing (pp. 432-437). Shoumen: INCOMA. Link
- Perrone, V., Palma, M., Hengchen, S., Vatri, A., Smith, J.Q. & McGillivray, B. (2019). GASC: Genre-Aware Semantic Change for Ancient Greek. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change (LChange’19). ACL, Florence, Italy. Link
- Dubossarsky, H., Hengchen, S., Tahmasebi, N. & Schlechtweg, D. (2019). Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change. In 57th Annual Meeting of the Association for Computational Linguistics (ACL2019). ACL, Florence, Italy, 28/07/2019. Link
- Hengchen, S., Coeckelbergs, M., van Hooland, S., Verborgh, R., and Steiner, T., 2016. Exploring archives with probabilistic models: Topic modelling for the valorisation of digitised archives of the European Commission. In First Workshop «Computational Archival Science: digital records in the age of big data», IEEE Big Data, Washington, volume 8.
Peer-reviewed book chapters
- Hengchen, S., Tahmasebi, N., Schlechtweg, D. and Dubossarsky, H. (2021). Challenges for computational lexical semantic change. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, and Simon Hengchen, editors, Computational Approaches to Semantic Change, Language Variation, chapter 11. Language Science Press, Berlin. Preprint
- Perrone, V., Hengchen, S., Palma, M., Vatri, A., Smith, J.Q. and McGillivray, B. (2021). Lexical semantic change for Ancient Greek and Latin. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, and Simon Hengchen, editors, Computational Approaches to Semantic Change, Language Variation, chapter 9. Language Science Press, Berlin. Preprint
Datasets
- Hengchen, S. & Tahmasebi, N. (2021). A Collection of Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data. https://zenodo.org/record/4301658
- Hengchen, S., Ros, R., Marjanen, J., and Tolonen, M. (2019). Models for "A data-driven approach to studying changing vocabularies in historical newspaper collections". https://zenodo.org/record/3585027
- Hengchen, S., Ros, R., and Marjanen, J. (2019). Models for “A data-driven approach to the changing vocabulary of the ‘nation’ in English, Dutch, Swedish and Finnish newspapers, 1750-1950”. https://zenodo.org/record/3270648
Teaching
- Spring 2018: LDA-H502 – Data team, Digital Humanities Hackathon - University of Helsinki
- Spring 2019: LDA-H506 – Introduction to NLP for DH with Python - University of Helsinki
- Spring 2019: LDA-H502 – Group (co-)leader, Digital Humanities Hackathon - University of Helsinki
- Spring 2020: LDA-H506 – Introduction to NLP for DH with Python - University of Helsinki
- Spring 2020: TIC-MATIM – Technologies de l’information et de la communication - Université de Genève
- Spring 2021: TIC-MATIM – Technologies de l’information et de la communication - Université de Genève
- Spring 2021: LT2402/LT2215 – Masterprojekt - University of Gothenburg
- Autumn 2021: BTM0908 – Ingénierie linguistique - Université de Genève
- Spring 2022: TIC-MATIM – Technologies de l’information et de la communication - Université de Genève
- Spring 2023: TIC-MATIM – Technologies de l’information et de la communication - Université de Genève
- Autumn 2023: BTM0908 – Ingénierie linguistique - Université de Genève
Academic visits
- June – July 2019: Cornell University
- Feb. 2019: Hosted Dr Nina Tahmasebi, Dr Haim Dubossarsky, and Mr Dominik Schlechtweg for a one-week academic visit
- Nov. 2018: Hosted Ms Sara Budts for a one-week academic visit
- April and May 2018: Two ten-day academic visits at the Alan Turing Institute, London
- Oct. 2017: Short-term scientific mission at the Alan Turing Institute
- Aug. – Oct. 2015: CENDARI Fellow at Trinity College Dublin