Simon Hengchen
Hi! I am a postdoctoral researcher at Språkbanken Text (University of Gothenburg), where I work within the Language Change project. My main research focus is lexical semantic change in multilingual, unstructured, OCRed, historical textual data. I can also do most text mining stuff.
This is a quick overview of my academic outputs, research stays, and teaching. For a full curriculum vitae, please email me at hengchen.simon@gmail.com.
News:
This page was last updated on 2020-08-11.
Education
- 2017: PhD in Information Science, Université libre de Bruxelles. Thesis: "When does it mean? Detecting semantic change in historical texts".
- 2012: MSc in Information Science, Université libre de Bruxelles.
- 2010: MA in Langues et littératures germaniques, Université libre de Bruxelles.
Publications
Books and theses
- Hengchen, S., 2017. When does it mean? Detecting semantic change in historical texts. PhD Thesis, Université libre de Bruxelles.
- van Hooland, S., Gillet, F., Hengchen, S., and De Wilde, M., 2016. Introduction aux humanités numériques: méthodes et pratiques. De Boeck supérieur.
Selected papers in peer-reviewed journals
- Hengchen, S., Ros, R., Marjanen, J., and Tolonen, M. (accepted). ‘A data-driven approach to studying changing vocabularies in historical newspaper collections’. Accepted for publication, Digital Scholarship in the Humanities : DSH.
- Tahmasebi, N., & Hengchen, S. (2019). ‘The Strengths and Pitfalls of Large-Scale Text Mining for Literary Studies’. Samlaren, 140, 198–227. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-406938
- McGillivray, B., Hengchen, S., Lähteenoja, V.E., Palma, M. & Vatri, A. (2019). ‘A computational approach to lexical polysemy in Ancient Greek’. Digital Scholarship in the Humanities : DSH. https://doi.org/10.1093/llc/fqz036
- Hill, MJ & Hengchen, S. (2019). ‘Quantifying the impact of dirty OCR on historical text analysis: Eighteenth Century Collections Online as a case study’. Digital Scholarship in the Humanities : DSH. https://doi.org/10.1093/llc/fqz024
- De Wilde, M., & Hengchen, S. (2017). Semantic Enrichment of a Multilingual Archive with Linked Open Data. Digital Humanities Quarterly, 11(4). http://www.digitalhumanities.org/dhq/vol/11/4/000328/000328.html
Selected papers in peer-reviewed proceedings
- Frossard, E., Coustaty, M., Doucet, A., Jatowt, A. & Hengchen, S. (2020). Dataset for Temporal Analysis of English-French Cognates. In Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC'20). European Language Resources Association (ELRA).
- Hämäläinen, M., & Hengchen, S. (2019). From the Paft to the Fiiture: a Fully Automatic NMT and Word Embeddings Method for OCR Post-Correction. In G. Angelova, R. Mitkov, I. Nikolova, & I. Temnikova (Eds.), Proceedings of Recent Advances in Natural Language Processing (pp. 432-437). Shoumen: INCOMA.
- Perrone, V, Palma, M, Hengchen, S, Vatri, A, Smith, JQ & McGillivray, B 2019, GASC: Genre-Aware Semantic Change for Ancient Greek. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change (LChange’19). ACL, Florence, Italy.
- Dubossarsky, H., Hengchen, S., Tahmasebi, N. & Schlechtweg, D. 2019, Time-Out: Temporal Referencing for Robust Modeling of Lexical Semantic Change. In 57th Annual Meeting of the Association for Computational Linguistics (ACL2019). ACL, Florence, Italy, 28/07/2019.
- Hengchen, S., Coeckelbergs, M., van Hooland, S., Verborgh, R., and Steiner, T., 2016. Exploring archives with probabilistic models: Topic modelling for the valorisation of digitised archives of the European Commission. In First Workshop «Computational Archival Science: digital records in the age of big data», IEEE Big Data, Washington, volume 8.
Other outputs (DH talks, etc.) are listed on Google Scholar.
Datasets
- Hengchen, S., Ros, R., Marjanen, J., and Tolonen, M. (2019). Models for "A data-driven approach to studying changing vocabularies in historical newspaper collections". https://zenodo.org/record/3585027
- Hengchen, S., Ros, R., and Marjanen, J. (2019). Models for “A data-driven approach to the changing vocabulary of the ‘nation’ in English, Dutch, Swedish and Finnish newspapers, 1750-1950”. https://zenodo.org/record/3270648
Teaching
- Spring 2018: LDA-H502 – Data team, Digital Humanities Hackathon - University of Helsinki
- Spring 2019: LDA-H506 – Introduction to NLP for DH with Python - University of Helsinki
- Spring 2019: LDA-H502 – Group (co-)leader, Digital Humanities Hackathon - University of Helsinki
- Spring 2020: LDA-H506 – Introduction to NLP for DH with Python - University of Helsinki
- Spring 2020: TIC-MATIM – Technologies de l’information et de la communication - Université de Genève
Academic visits
- June – July 2019: Cornell University
- Feb. 2019: Hosted Dr Nina Tahmasebi, Dr Haim Dubossarsky, and Mr Dominik Schlechtweg for a one-week academic visit
- Nov. 2018: Hosted Ms Sara Budts for a one-week academic visit
- April and May 2018: Two ten-day academic visits at the Alan Turing Institute, London
- Oct. 2017: Short-term scientific mission at the Alan Turing Institute
- Aug. – Oct. 2015: CENDARI Fellow at Trinity College Dublin
Professional service