Simon Hengchen

Hi! I am a postdoctoral researcher at the University of Helsinki, where I have joined the COMHIS group. I research semantic change in multilingual, unstructured, OCRed, historical textual data and specialise in topic modelling (LDA), with a special interest in the relation between words and ideas and how they evolve through time.

I was a visiting researcher at the Long Room Hub of Trinity College Dublin for three months in 2015, and at the Alan Turing Institute for two weeks in 2017. In 2018 I returned to the Turing twice, both times for a bit more than a week.

I obtained my PhD from the Université libre de Bruxelles.

I speak French, Dutch and English. I can decipher very basic Norwegian Bokmål. I code in Python and dabble in R.

In spring 2019, I will be teaching “LDA-H506, Method course in digital humanities II: Intro to NLP for DH (Python)”, an elective course in the Master’s Programme “Linguistic Diversity in the Digital Age”.
One day I will make a better webpage for myself, but today is not that day.

My most recent publications are here: tuhat
Others are here: difusion
There's also Google Scholar: scholar

Most important publications/things are, as of today (2018-09-11):
Also this book: van Hooland, S., Gillet, F., Hengchen, S., and De Wilde, M., 2016. Introduction aux humanités numériques: méthodes et pratiques. De Boeck supérieur.

A collective work will be available soon, with a contribution by McGillivray, B., Buning, R., and Hengchen, S. The collective work is titled "Reassembling the Republic of Letters: Systems, Standards, Scholarship" and is edited by Howard Hotson and Thomas Wallnig. Our contribution is titled "Extracting topics over time in the Hartlib Papers."