
Hello,
my name is Simon, I am 28, and this is my resume

About Me

I have been a PhD candidate at the Université libre de Bruxelles for nearly three years. I research information extraction in multilingual, unstructured, OCRed, historical textual data and specialise in topic modelling (LDA). Before focusing on LDA, I also worked on named-entity recognition.

A PDF version is available here.

Education

  • 1988

    I was born in Belgium

    ... at a very young age, to Mr and Mrs Hengchen-Dubois.

  • 2005

    Secondary school specialising in languages and science

    I did my last four years of secondary school at Collège Saint-Michel, in Brussels, where I focused on languages and sciences.

  • 2008

    BA in Germanic languages

    At the Université libre de Bruxelles, I studied languages, mostly English and Dutch.

  • 2010

    MA in Germanic languages

    The BA was quickly followed by an MA, for which I did an internship at Volontariat, an NGO in south-eastern India.

  • 2012

    MSc in Information Science and Technologies, specialising in Natural Language Processing

    Wanting to be prepared for life in the 21st century, I enrolled in the MaSTIC programme. Courses included Algorithmics, Programming (C++), Natural Language Processing, Databases, and Library Science.

  • 2017

    PhD in Information Science and Technologies, specialising in Natural Language Processing

    I research information extraction in multilingual, unstructured, OCRed, historical textual data and specialise in topic modelling (LDA).

Experience

Trinity College Dublin
2015
CENDARI Fellow
As part of the CENDARI project, I was invited to spend three months at the Long Room Hub researching a 3.1-million-page dataset on daily life in the city of Ypres, Belgium.
Université libre de Bruxelles
2013-2017
PhD Candidate
I research information extraction in multilingual, unstructured, OCRed, historical textual data and specialise in topic modelling (LDA). Before focusing on LDA, I also worked on named-entity recognition. I hold a BELSPO grant and work on the TIC Belgium project, which aims to develop a virtual research environment that helps historians explore millions of pages of historical textual data, with a focus on transnational intellectual cooperation.
As a representative of the scientific community, I also take part in various scientific commissions and am a full member of the Faculty Council.
GDF Suez (now ENGIE)
2013
Young Knowledge Officer
As part of the Department of Strategic Watch and Analysis, my main tasks were to monitor various sources of information and forward relevant data to other departments. Because of a non-disclosure agreement, any further enquiries should be addressed to my then-supervisor, Yohann Delzant.

Publications

Preprint
2016
How hot is .brussels?
Analysis of the uptake of the .brussels top-level domain name extension
This paper presents an analysis of the uptake of the .brussels domain name extension. A quantitative analysis of the dataset establishes several characteristics of the gTLDN, such as the names and countries of registrants of .brussels domains, and the number of domains that redirect elsewhere versus those used as such. A qualitative analysis of a representative sample identifies the languages of the .brussels websites, the commercial sectors that use them, and whether they have a direct link to the city of Brussels.
Code and preprint are available at howhotis.brussels.
De Boeck Université
2016
Introduction aux humanités numériques
Méthodes et pratiques numériques en sciences humaines et sociales
This book, co-written with Seth van Hooland, Max De Wilde and Florence Gillet, introduces digital methods to humanities students. It covers information search, data modelling, digitisation best practices and data analysis.
Digital Humanities Quarterly
Accepted for publication
Semantic Enrichment of a Multilingual Archive with Linked Open Data
This paper, co-written with Max De Wilde, presents MERCKX, a novel tool to semi-automatically enrich a multilingual archive with Linked Open Data. Using a 3.1-million-page dataset on the city of Ypres, Belgium, we introduce a robust, language-independent system that outperforms state-of-the-art solutions.
I2D
2015
L'extraction des entités nommées : une opportunité pour le secteur culturel ?
This paper, co-written with Seth van Hooland, Ruben Verborgh and Max De Wilde, evaluates different NER services on a historical, French-language corpus. In doing so, we show that libraries, archives and museums (LAMs) and, by extension, most cultural heritage institutions can enrich their datasets with Linked Data URIs easily and at low cost. PDF available on CAIRN.info.

Skills

Natural Language Processing & Text Mining
Topic Modelling
Named-Entity Recognition
Semantic Web
Linked Data
Linux
Python

Conferences, Workshops, Trainings

JADH
2016, Tokyo
Comparing Topic Model Stability across Language and Size
UCSB
2016, UCSB
Topic modelling in the Library
University of California, Santa Barbara Libraries. Slides available.
DHBenelux
2015, Universiteit Antwerpen
Semantic Enrichment of a Multilingual Archive with Linked Open Data
DHBenelux
2014, The Hague
NER as a gateway drug to the Linked Data cloud: Application of Named-Entity Recognition on cultural heritage metadata
Digital Humanities FNRS
2014, UCLouvain
Named-Entity Recognition et Linked Data: quelle valeur ajoutée pour les archives ?

Boards and Committees

Reviewer
ISSN: 2095-9230
Frontiers of Information Technology & Electronic Engineering
Founding Member
2013-present
FNRS contact group for Digital Humanities
Member
2013-present
TIC Belgium Technical Committee
Member
Programme Committee
DHBenelux 2016

Languages

French
English
Dutch
Norwegian Bokmål
Modern Hebrew
Tamil

Hobbies

Boxing
Running
Weird languages
Robotics and automation
Electronics
Norse mythology