Hi! I am a postdoctoral researcher at the University of Helsinki, where I have joined the COMHIS group. I research semantic change in multilingual, unstructured, OCRed, historical textual data and specialise in topic modelling (LDA), with a special interest in the relation between words and ideas and how they evolve through time.
I was a visiting researcher at the Long Room Hub of Trinity College Dublin for three months in 2015, and at the Alan Turing Institute for two weeks in 2017. In 2018 I returned to the Turing twice, each time for a little over a week.
I obtained my PhD from the Université libre de Bruxelles.
I speak French, Dutch and English. I can decipher very basic Norwegian Bokmål.
I code in Python and dabble in R.
In spring 2019, I will be teaching “LDA-H506, Method course in digital humanities II: Intro to NLP for DH (Python)”, an elective course in the Master’s Programme “Linguistic Diversity in the Digital Age”.
One day I will make a better webpage for myself, but today is not that day.
My most recent publications are here: tuhat
Others are here: difusion
There's also Google Scholar: scholar
My most important publications and other outputs, as of today (2018-09-11), are listed below, starting with this book:
van Hooland, S., Gillet, F., Hengchen, S., and De Wilde, M., 2016. Introduction aux humanités numériques: méthodes et pratiques. De Boeck supérieur.
- Hengchen, S., O’Connor, A., Munnelly, G., and Edmond, J., 2016a. Comparing topic model stability across language and size. In Proceedings of the Japanese Association for Digital Humanities Conference 2016.
- Hengchen, S., Coeckelbergs, M., van Hooland, S., Verborgh, R., and Steiner, T., 2016b. Exploring archives with probabilistic models: Topic modelling for the valorisation of digitised archives of the European Commission. In First Workshop «Computational Archival Science: digital records in the age of big data», Washington, volume 8.
- Hengchen, S., 2017a. When does it mean? Detecting semantic change in historical texts. In Digital Humanities at Oxford Summer School – Text to Tech.
- Hengchen, S., 2017b. Detecting Semantic Change Using LDA in Historical Texts: a Case Study on Dutch. In Language Technology Lab Seminars, Cambridge.
- De Wilde, M., & Hengchen, S., 2017. Semantic Enrichment of a Multilingual Archive with Linked Open Data. Digital Humanities Quarterly, 11(4).
- Hengchen, S., Kanner, A., Marjanen, J., and Mäkelä, E., 2018. Comparing Topic Model Stability Between Finnish, Swedish, English and French. Digital Humanities in the Nordic Countries 2018.
- Lahti, L., Vaara, V., Marjanen, J., Roivainen, H., Ijaz, A., Säily, T., Kanner, A., Hill, M., Mäkelä, E., and Tolonen, M., 2018. Quantitative analysis of public discourse in Europe 1470-1910. DHBenelux 2018.
A collective work, "Reassembling the Republic of Letters: Systems, Standards, Scholarship", edited by Howard Hotson and Thomas Wallnig, will be available soon. It includes a contribution by McGillivray, B., Buning, R., and Hengchen, S., titled "Extracting topics over time in the Hartlib Papers."