Janosch Haber

Based in Amsterdam, the Netherlands · mail@janoschhaber.com

I am a second-year Master's student in the Artificial Intelligence track at the University of Amsterdam. My interests lie mostly in Natural Language Processing, with a focus on capturing the intentions of speakers to better grasp the meaning of the words and sentences in their utterances - and vice versa. I'm currently working on my Master's thesis on partner-specificity in visually-grounded dialogue.

You can download my CV here.

Education

Master of Science (MSc) in Artificial Intelligence

University of Amsterdam (UvA), The Netherlands
Focus on the interface of Natural Language Processing and Machine Learning

Master's thesis written under the supervision of Dr. Raquel Fernández and Dr. Elia Bruni.
Topic: Partner-Specificity in Visually-Grounded Dialogue.
Supported by a Facebook ParlAI Research Award

GPA: 8.7

August 2016 - July 2018

Bachelor of Science (BSc) in Artificial Intelligence

University of Amsterdam (UvA), The Netherlands

Bachelor's thesis written under the supervision of Dr. Roberto Valenti.
Topic: Modeling Distributed Cybernetic Management for Resource Based Economies - A simulation approach to Stafford Beer’s 1971 CyberSyn Project.

Cum Laude, GPA: 8.3

August 2012 - July 2015

Zeugnis der Allgemeinen Hochschulreife (Abitur)

Internatsschule Schloss Hansenberg (ISH), Johannisberg, Germany

Advanced courses in Mathematics, Chemistry and Politics & Economy

Final grade: 1.6

August 2009 - July 2011

Research

Partner-Specificity in Visually-Grounded Dialogue (Master's Thesis)

In 1992, Clark and Schober famously stated that “The common misconception is that language has to do with words and what they mean. It doesn’t. It has to do with people and what they mean.” To understand what people mean, however, we need some context. In dialogue, this context is the common ground that the speakers have developed through previous conversations, and it can often only be interpreted by recognizing that the conventions created within it are unique to that specific pair of speakers.

In this research we focus on visually-grounded dialogue, that is, conversations about things that the speakers can see, to learn how referring expressions are collaboratively formed - and updated when the referred objects are encountered repeatedly. To this end we collect a novel dataset with previously unavailable data of a fixed speaker dyad talking about a controlled set of objects in multiple rounds of an image identification task, and we present several baselines to approach this task computationally.

Ongoing

Topic Segmentation in Spoken Dialogue through Convergence in Utterance Complexity

In 2018, Xu and Reitter introduced a novel, information-theoretic view of dialogue, in which they proposed modeling a conversation between two interlocutors as a two-way communication system. In such a system the information flow follows a number of general principles. One principle that is assumed to hold in dialogue as well is the Uniform Information Density (UID) hypothesis. The UID hypothesis states that a communication system as a whole tends to distribute information in such a way that its density remains constant.

In two-party dialogue both interlocutors are equal parts of the communication system. This means that they are jointly responsible for the level of information density at every moment of the conversation. For the UID hypothesis to hold, the two speakers must therefore implicitly agree on their respective contributions to the conversation. Xu and Reitter propose that speakers take on certain roles during a conversation: one leads the conversation by steering the ongoing topic, while the other follows along. These roles can switch during a conversation and, rather than steering turn-taking behavior, they describe a higher-level segmentation of a conversation into topics.

In this research we investigate whether we can detect the boundaries of these conversation segments, formally referred to as topic shifts, based on the speakers’ contributions to the conversation alone. To this end we extract a number of simple syntactic features that have been shown to correlate well with the amount of information transmitted and build a simple prediction model based on these features.

While we had to conclude that this simple approach does not yield a model expressive enough to correctly predict the topic shifts produced by more involved methods, we believe that it nonetheless produces coherent and intuitively sound topic segments even for noisy dialogue transcripts. As a next step, we will investigate different methods to validate this claim.

You can view our preliminary results here.

Ongoing

Text-Style Transfer for Non-parallel Corpora

Supervision: Mostafa Dehghani

In this ongoing project we aim to separate the content and style of an utterance in order to allow for text-style transfer between different domains. While this has been shown to work well for visual input, text poses a more difficult problem due to long-term dependencies and less clear-cut evaluation.

Ongoing

Automatic Timeline Summarization from Non-Curated News Streams

Supervision: Nikos Voskarides

The continuously increasing number of online news articles requires new ways of filtering relevant information into a human-digestible form. Recently, research has focused on providing such selections by generating timelines for known entities through extending and extracting information from Knowledge Graphs. In contrast to this approach, we propose a new method to generate an entity timeline based directly on a non-curated, unstructured set of news items, so that the approach can be extended to long-tail entities.

In this research, Wikipedia pages of entities are seen as a gold-label timeline consisting of information cited from newsworthy articles, while other news articles about those entities that are not cited are treated as negative examples. To learn what makes an article newsworthy, we take a supervised approach based on a set of 28 handcrafted features.

One of our main contributions is a novel, larger dataset for this task, covering 379 unique entities and containing 13,146 news articles with an equal distribution of positive and negative examples per entity. Using this dataset we obtain a basic classification accuracy of 68.9% for deciding whether an unseen news article contains relevant information about a given entity. As a baseline method of evaluation, the top article predictions per entity are then summarized and concatenated to generate dummy Wikipedia entries, which we compare to the original ones. As no standardized, gold-label evaluation methods have been developed yet, we also propose an A/B testing method for a more qualitative performance estimate.

You can read the project paper here.

2017

Modeling Distributed Cybernetic Management for Resource Based Economies (Bachelor's Thesis)

In the early 1970s, AI was once before THE big thing that would revolutionize the world as people knew it. Many great researchers were optimistic that artificial systems with general intelligence were within grasp - and big plans were made to apply such systems to solve real-world problems. Among the most notorious ones: 1971’s CyberSyn project of British cybernetician Stafford Beer - which came to an abrupt and violent end just two years later.

The context: In 1970, Chile elected its first socialist president, Salvador Allende, who in turn appointed Fernando Flores, a young scientist devoted to the study of operations research and a scholar of Beer’s work on management cybernetics, as General Technical Manager of the state development agency. In that function, Flores invited Beer to design and implement a cybernetic system to automate the administration of the entire Chilean economy. This ambitious project had only taken its first steps when it was cut short by a military coup in 1973.

In this research we aim to investigate whether the simple cybernetic approach proposed by Beer and his colleagues could have been sufficient to manage something as complex as the Chilean state economy. We do so by modeling a simplified economic setting governed by CyberSyn’s management principles and analyzing the model’s performance under a range of different parameter settings. The results of these initial experiments suggest that the model indeed exhibits emergent self-sustainability and lead to the conclusion that CyberSyn’s approach might have been feasible in principle.

You can find my Bachelor's Thesis here.

2015

Philosophical Essay about Dennett’s Answer to the Objection of Original Intentionality in Artifacts

Many critics of AI argue that intentionality in computers - or any other artifact for that matter - can never be more than derivative. In the words of John Haugeland, their “tokens only have meaning because we give it to them” and consequently, “they only mean what we say that they do”. Contemporary philosopher Daniel Dennett, however, claims that “there is no principled (theoretically motivated) way to distinguish ‘original’ intentionality from ‘derived’ intentionality.” On the basis of this idea, he developed a three-stage model to explain the assignment of intentionality and refute the objection of derived intentionality in artifacts.

In this essay, we analyze Dennett’s model and answer the question: How does Dennett’s elaborated model of the intentional stance answer Haugeland’s objection that intentionality in artifacts cannot be original?

You can read the essay here.

2015

Experience

Student Assistant for the BSc Artificial Intelligence

University of Amsterdam (UvA), The Netherlands

Courses: Computersystemen, Computational Logic, Brein & Cognitie and Natuurlijke Taalmodellen en Interfaces

September 2016 - June 2017

Volunteering as Assistant in a Community Center

House of Light, Shefa-‘Amr, Israel

Main tasks: Organizing and supporting regular youth and children's meetings, maintaining public relations, improving communications with supporters and helping out the center's founders and members in a wide range of tasks.

Internationaler Jugendfreiwilligendienst (IJFD) with CFI Freiwilligendienste

October 2016 - June 2017

Student Assistant for the BSc Artificial Intelligence

University of Amsterdam (UvA), The Netherlands

Courses: Brein & Cognitie and Natuurlijke Taalmodellen en Interfaces

January 2015 - June 2015

Volunteering as High School Assistant Teacher

Jabez Christian School, Dasmariñas, Philippines

Main tasks: Teaching classes in Informatics and Politics (seniors), assisting the Kindergarten supervisors and organizing activities for children in the affiliated orphanage.

Internationaler Jugendfreiwilligendienst (IJFD) with Co-Workers International

September 2011 - August 2012

Internship at Globals ITeS Pvt. Ltd.

Bangalore, India

Main task: Developing whitepapers concerning the topic 'IT-Offshoring in India'.

October 2009

Skills

Programming Languages & Tools
  • Python
  • Java
  • C++
  • MATLAB
  • LaTeX
  • HTML/PHP
  • Photoshop
Frameworks
  • TensorFlow and PyTorch
  • Facebook's ParlAI framework
Languages
  • Native: German, Dutch
  • Perfect: English
  • Basics: Spanish, Arabic

Awards & Scholarships

Scholarship for High-Achieving Students

Awarded by Evangelische Studienstiftung Villigst e.V., Germany
February 2013 - July 2018

Best Human-Machine Interaction

with UvA@Work at RoCKIn Camp 2014, Rome, Italy

Robotics Workshop for KUKA youBot standard platform.

2014

Jugendsoftware-Preis

International Competition hosted by Klaus Tschira Stiftung, Germany

Developed, programmed and presented interactive learning software as part of a team.

2009

Personal

Every moment not filled with studies is definitely filled with music - whether digital, analog, live or self-made. I have been playing the guitar for far longer than my skill level might indicate - though somebody once might have said something about me being my own worst critic... whatever. My little home studio is growing slowly but steadily (mostly limited by my 20 sqm freight container apartment) and you might find some of my own music here once I decide it's ready for the great big world.