Simon Hengchen
NOTE: While this page is still factually correct, it won't be updated much anymore.
NEW: Riksbankens Jubileumsfond has decided to grant us SEK 33.5M to study lexical semantic change in the 6-year programme "Change is Key!". The programme is a collaboration between the University of Gothenburg, Queen Mary University of London, IMS Stuttgart, KU Leuven, Linköping University, and Lund University.
NEW: I have recently founded iguanodon.ai, a Brussels-based language technology and data science company. Don't hesitate to reach out!
Hi! I am an NLP/AI consultant at iguanodon.ai. I used to be a postdoctoral researcher at Språkbanken Text (University of Gothenburg), where I worked within the Language Change project. My main research focus is lexical semantic change in multilingual, unstructured, OCRed, historical textual data.
This is a quick overview of my academic outputs, research stays, and teaching. For a full curriculum vitae, please email me at hengchen.simon@gmail.com.
News:
This page was last updated on 2022-09-28.
- I have recently founded iguanodon.ai, a Brussels-based language technology and data science company. Do not hesitate to reach out.
- Our book on computational approaches to semantic change is out! More info and free download: Link.
- We're organising the 2nd International Workshop on Computational Approaches to Historical Language Change 2021 (LChange'21), co-located with ACL2021! More info: Link.
- We're organising a workshop at SLTC 2020! Come and talk about computational language change in November in Gothenburg.
- New job! In June I am joining the Language Change project at the University of Gothenburg, Sweden.
- Our SemEval 2020 task on lexical semantic change detection has been accepted! More info: https://languagechange.org/semeval
Education
- 2017: PhD in Information Science, Université libre de Bruxelles. Thesis: "When does it mean? Detecting semantic change in historical texts".
- 2012: MSc in Information Science, Université libre de Bruxelles.
- 2010: MA in Langues et littératures germaniques, Université libre de Bruxelles.
Publications
Books and theses
- Tahmasebi, N., Borin, L., Jatowt, A., Xu, Y., & Hengchen, S. (2021). Computational Approaches to Semantic Change. Language Science Press: Berlin. https://langsci-press.org/catalog/book/303
- Hengchen, S., 2017. When does it mean? Detecting semantic change in historical texts. PhD Thesis, Université libre de Bruxelles.
- van Hooland, S., Gillet, F., Hengchen, S., and De Wilde, M., 2016. Introduction aux humanités numériques: méthodes et pratiques. De Boeck supérieur.
Selected papers in peer-reviewed journals
- Hengchen, S. & Tahmasebi, N. (2021). ‘A Collection of Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data’. Journal of Open Humanities Data, 7, p.2. https://doi.org/10.5334/johd.22
- Hengchen, S., Ros, R., Marjanen, J., and Tolonen, M. (accepted). ‘A data-driven approach to studying changing vocabularies in historical newspaper collections’. Accepted for publication, Digital Scholarship in the Humanities : DSH.
- Tahmasebi, N., & Hengchen, S. (2019). ‘The Strengths and Pitfalls of Large-Scale Text Mining for Literary Studies’. Samlaren, 140, 198–227. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-406938
- McGillivray, B., Hengchen, S., Lähteenoja, V.E., Palma, M. & Vatri, A. (2019). ‘A computational approach to lexical polysemy in Ancient Greek’. Digital Scholarship in the Humanities : DSH. https://doi.org/10.1093/llc/fqz036
- Hill, M.J. & Hengchen, S. (2019). ‘Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study’. Digital Scholarship in the Humanities : DSH. https://doi.org/10.1093/llc/fqz024
- De Wilde, M., & Hengchen, S. (2017). Semantic Enrichment of a Multilingual Archive with Linked Open Data. Digital Humanities Quarterly, 11(4). http://www.digitalhumanities.org/dhq/vol/11/4/000328/000328.html
Selected papers in peer-reviewed proceedings
- Schlechtweg, D., McGillivray, B., Hengchen, S., Dubossarsky, H., & Tahmasebi, N. (2020). SemEval-2020 Task 1: Unsupervised Lexical Semantic Change Detection. In Proceedings of the 14th International Workshop on Semantic Evaluation, Barcelona, Spain. Association for Computational Linguistics. Link
- Frossard, E., Coustaty, M., Doucet, A., Jatowt, A. & Hengchen, S. (2020). Dataset for Temporal Analysis of English-French Cognates. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC'20). European Language Resources Association (ELRA). Link
- Hämäläinen, M., & Hengchen, S. (2019). From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. In G. Angelova, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), Proceedings of Recent Advances in Natural Language Processing (pp. 432-437). Shoumen: INCOMA. Link
- Perrone, V., Palma, M., Hengchen, S., Vatri, A., Smith, J.Q., & McGillivray, B. (2019). GASC: Genre-Aware Semantic Change for Ancient Greek. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change (LChange’19). ACL, Florence, Italy. Link
- Dubossarsky, H., Hengchen, S., Tahmasebi, N., & Schlechtweg, D. (2019). Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change. In 57th Annual Meeting of the Association for Computational Linguistics (ACL2019). ACL, Florence, Italy, 28/07/2019. Link
- Hengchen, S., Coeckelbergs, M., van Hooland, S., Verborgh, R., and Steiner, T., 2016. Exploring archives with probabilistic models: Topic modelling for the valorisation of digitised archives of the European Commission. In First Workshop «Computational Archival Science: digital records in the age of big data», IEEE Big Data, Washington, volume 8.
Peer-reviewed book chapters
- Hengchen, S., Tahmasebi, N., Schlechtweg, D., and Dubossarsky, H. (2021). Challenges for computational lexical semantic change. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, and Simon Hengchen, editors, Computational Approaches to Semantic Change, Language Variation, chapter 11. Language Science Press, Berlin. Preprint
- Perrone, V., Hengchen, S., Palma, M., Vatri, A., Smith, J.Q., and McGillivray, B. (2021). Lexical semantic change for Ancient Greek and Latin. In Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu, and Simon Hengchen, editors, Computational Approaches to Semantic Change, Language Variation, chapter 9. Language Science Press, Berlin. Preprint
Other things (DH talks, etc.) are available at the following link: Google Scholar.
Datasets
- Hengchen, S. & Tahmasebi, N. (2021). A Collection of Swedish Diachronic Word Embedding Models Trained on Historical Newspaper Data. https://zenodo.org/record/4301658
- Hengchen, S., Ros, R., Marjanen, J., and Tolonen, M. (2019). Models for "A data-driven approach to studying changing vocabularies in historical newspaper collections". https://zenodo.org/record/3585027
- Hengchen, S., Ros, R., and Marjanen, J. (2019). Models for “A data-driven approach to the changing vocabulary of the ‘nation’ in English, Dutch, Swedish and Finnish newspapers, 1750-1950”. https://zenodo.org/record/3270648
Teaching
- Spring 2018: LDA-H502 – Data team, Digital Humanities Hackathon - University of Helsinki
- Spring 2019: LDA-H506 – Introduction to NLP for DH with Python - University of Helsinki
- Spring 2019: LDA-H502 – Group (co-)leader, Digital Humanities Hackathon - University of Helsinki
- Spring 2020: LDA-H506 – Introduction to NLP for DH with Python - University of Helsinki
- Spring 2020: TIC-MATIM – Technologies de l’information et de la communication - Université de Genève
- Spring 2021: TIC-MATIM – Technologies de l’information et de la communication - Université de Genève
- Spring 2021: LT2402/LT2215 – Masterprojekt - University of Gothenburg
Academic visits
- June – July 2019: Cornell University
- Feb. 2019: Hosted Dr Nina Tahmasebi, Dr Haim Dubossarsky, and Mr Dominik Schlechtweg for a one-week academic visit
- Nov. 2018: Hosted Ms Sara Budts for a one-week academic visit
- April and May 2018: Two ten-day academic visits at the Alan Turing Institute, London
- Oct. 2017: Short-term scientific mission at the Alan Turing Institute
- Aug. – Oct. 2015: CENDARI Fellow at Trinity College Dublin
Professional service