Advanced Seminar (Oberseminar) on Human-Machine Communication

Talks take place on Thursdays, 14:00-15:00 (45 min talk + discussion), in the chair's library, room N0116 (deviations are noted individually).

WS 2017/2018

Thursday, 30.11.2017, 15:00, N0116 (chair library)

Gaetano Andreisek, Former Doctoral Candidate '13-'17 at Audio Information Processing
Acoustic defect detection in composite wind turbine blades using the tap test

Abstract: Wind turbine blades experience some of the highest stresses among modern civil structures. To ensure a lifetime of fifteen to twenty years, regular testing of their structural integrity is required. Advanced non-destructive testing (NDT) methods such as ultrasonic or thermographic testing can be used; however, they are difficult to apply in the field. More practical and cost-effective methods are therefore sought. The tap test is routinely used in everyday field inspections of wind turbine blades: experienced engineers periodically tap on the blade shell and analyze the generated impact sounds. This NDT method is exceptionally easy to operate, does not require bulky equipment, and covers large areas in a short time. However, it is often criticized for its subjective nature and vague defect detection mechanisms.

For this work, extensive tap test measurements were conducted on a set of wind turbine blades with different levels of defectiveness. Based on these measurements, the presentation will outline an approach to demonstrating the full detection potential of the tap test in terms of defect depth and defect size based on acoustic features. We will cover the stages of acoustic feature extraction and reduction that lead to an automated classification of the material condition based on well-defined features. As part of the analysis process, a new measure of defectiveness was developed to quantify the severity of critical voids in the glass-fiber composite material of wind turbine blades. Moreover, a comprehensive feature extraction toolbox was implemented in Matlab that facilitates a flexible definition and efficient extraction of several hundred features from acoustic impulse responses. The findings can be used to advance the tap test towards an automated testing procedure featuring the same capabilities as human inspectors while offering the opportunity to be more cost-effective and less subjective.


SS 2017

Thursday, 06.07.2017, 15:00, N0116 (chair library)

Mikhail Startsev, research associate in the junior research group VESPA
360-aware Saliency Prediction with Ensemble of Deep Networks 

Abstract: We aim to predict saliency maps for equirectangular images of panoramic 360-degree stimuli. To do so, we explore the applicability of regular 2D saliency predictors when combined with various ways of interpreting the input data, which incorporate the additional information provided by the recording scenario and counteract projection distortions.
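
To illustrate the projection distortion mentioned in the abstract: in an equirectangular image every pixel row has the same width, although rows near the poles cover far less of the sphere than rows near the equator. The following is a minimal sketch, assuming the common convention of weighting each row by the cosine of its latitude; the function name `row_weights` is hypothetical and the sketch is not the method presented in the talk.

```python
import math

def row_weights(height):
    """Return one solid-angle weight per pixel row of an equirectangular image.

    Row centres are mapped to latitudes in (-pi/2, pi/2); each weight is
    cos(latitude), normalised so that all weights sum to 1.
    """
    lats = [(0.5 - (r + 0.5) / height) * math.pi for r in range(height)]
    raw = [math.cos(lat) for lat in lats]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical image height of 180 rows (one row per degree of latitude).
weights = row_weights(180)
```

Such weights could be used, for example, when averaging a per-pixel saliency score so that polar rows do not dominate the result.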

Thursday, 04.05.2017, 14:00, N0116 (chair library)

Simon Schenk, M.Sc., research associate at the MMK chair
GazeEverywhere: Enabling Gaze-only User Interaction on an Unmodified Desktop PC in Everyday Scenarios  

Abstract: Eye tracking is becoming more and more affordable, and thus gaze has the potential to become a viable input modality for human-computer interaction. We present the GazeEverywhere solution that can replace the mouse with gaze control by adding a transparent layer on top of the system GUI. It comprises three parts: i) the SPOCK interaction method that is based on smooth pursuit eye movements and does not suffer from the Midas touch problem; ii) an online recalibration algorithm that continuously improves gaze-tracking accuracy using the SPOCK target projections as reference points; and iii) an optional hardware setup utilizing head-up display technology to project superimposed dynamic stimuli onto the PC screen where a software modification of the system is not feasible. In validation experiments, we show that GazeEverywhere's throughput according to ISO 9241-9 was improved over dwell time based interaction methods and nearly reached trackpad level. Online recalibration reduced interaction target ('button') size by about 25%. Finally, a case study showed that users were able to browse the internet and successfully run Wikirace using gaze only, without any plug-ins or other modifications.
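
The throughput figure mentioned in the abstract is commonly derived from Fitts' law. The following is a minimal sketch, assuming the widely used effective-width formulation associated with ISO 9241-9 evaluations: throughput equals the effective index of difficulty divided by the mean movement time, with the effective width taken as 4.133 times the standard deviation of the selection endpoints. The function name and all numbers are illustrative and are not taken from the GazeEverywhere study.

```python
import math
import statistics

def throughput(distance, endpoint_errors, movement_times):
    """Effective throughput in bits/s for one distance/width condition.

    distance        -- movement distance to the target (any length unit)
    endpoint_errors -- signed selection errors along the movement axis (same unit)
    movement_times  -- time per selection in seconds
    """
    w_e = 4.133 * statistics.stdev(endpoint_errors)  # effective target width
    id_e = math.log2(distance / w_e + 1)             # effective index of difficulty
    return id_e / statistics.mean(movement_times)

# Hypothetical trial data for a single condition.
tp = throughput(
    distance=300.0,
    endpoint_errors=[-12.0, 5.0, 8.0, -3.0, 10.0, -7.0],
    movement_times=[1.1, 0.9, 1.0, 1.2, 0.95, 1.05],
)
```

In a full evaluation this value would typically be computed per condition and participant and then averaged.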


WS 2016/2017

Thursday, 17.11.2016, 15:00, N0116 (chair library)

Ayako Noguchi, Student of our Guest Professor Shinji Sako
Segmental conditional random fields audio-to-score alignment distinguishing percussion sounds from other instruments

Abstract: Audio-to-score alignment is a useful technique because it can be applied to many practical applications in musical performance. However, it is still an open problem due to the complexity of audio signals, especially in polyphonic music. Additionally, real-time operation is important in practical situations. In this study, we propose a new alignment method based on segmental conditional random fields (SCRFs). The attractive feature of this method is that it distinguishes percussion sounds from other instruments. In general, percussion sounds play a coordinating role for the whole piece. Moreover, performers can pick out the percussion sounds from the overall sound thanks to their distinctive features. In the field of score alignment, hidden Markov models (HMMs) or CRFs were used in previous studies, including our own. In summary, these methods were formulated as a matching problem between the state sequence of mixed notes and the audio feature sequence. In this study, we extend our previous method by adding a state that represents percussion sounds. Furthermore, we introduce a balancing factor to control the importance of the classifying feature functions. We confirmed the effectiveness of our method by conducting experiments using the RWC music database.

Shohei Awata, Student of our Guest Professor Shinji Sako
Vowel duration dependent hidden Markov model for automatic lyrics recognition

Abstract: Recently, due to the spread of music distribution services, a large amount of music has become available on the Internet. Accordingly, the demand for music information retrieval (MIR) is increasing. In MIR research, several studies aim to extract meaningful information from music audio signals. However, automatic lyrics recognition is still a challenging problem because the variation of the singing voice is much larger than that of the speaking voice and no large database of singing voice is available. In a related study, lyrics recognition was performed by extending the speech recognition framework based on hidden Markov models (HMMs). However, the accuracy was not sufficient. To recognize the singing voice precisely, one promising approach is to utilize musical features. This study considers the task of recognizing syllables from a cappella singing voice. To handle the variation in phoneme length, we construct a duration-dependent HMM. A large database of singing voice is essential for training the acoustic model. To compensate for the lack of an a cappella singing voice database, we use singing voice synthesized by an HMM-based singing voice synthesis system. We confirmed the effectiveness of our method.


SS 2016

Thursday, 21.07.2016, 14:00, N0116 (chair library)

Dr. Marco F.H. Schmidt, head of the international junior research group "Developmental Origins of Human Normativity" at LMU Munich
On the Ontogeny of Normativity  

The "DeNo" research group ("Developmental Origins of Human Normativity"), led by Dr. Marco F. H. Schmidt, investigates the ontogenetic basis of human normativity and cooperation. Their research focuses on the question of how children develop an understanding of norms and rules and which social-cognitive and motivational capabilities contribute to the ontogeny of human normativity.

Thursday, 14.07.2016, 14:00, N0116 (chair library)

Guido Maiello, UCL Institute of Ophthalmology, University College London, London, UK
Naturalistic Depth Perception and Binocular Vision  

How is binocular visual information processed, integrated and exploited by our perceptual and motor systems?

How does the visual system employ binocular input to maintain the accuracy of oculomotor control processes across the lifespan?

How does degraded binocular input affect the perceptual and motor systems?

Most eye movements in the real world redirect the foveae to objects at a new depth and thus require the coordination of saccadic and vergence eye movements. This talk will cover a mix of studies regarding perception, misperception and motor control in binocular vision. Using a gaze-contingent display, we first examine how the time course of binocular fusion depends on depth cues from blur and stereoscopic disparity in natural images. Our findings demonstrate that disparity and peripheral blur interact to modify eye movement behavior and facilitate binocular fusion. We then examine oculomotor plasticity when error signals are independently manipulated in each eye, which can occur naturally owing to aging, or in oculomotor dysfunctions. We find that both rapid saccades and slow vergence eye movements are continuously recalibrated independently of one another and that corrections can occur in opposite directions in each eye. Lastly, since it is well known that the motor systems controlling the eyes and the hands are closely linked when executing tasks in peripersonal space, we examine how eye-hand coordination is affected by binocular and asymmetric monocular simulated visual impairment. We observe a critical impairment level up to which pursuit and vergence eye movements maintain fronto-parallel and sagittal tracking accuracy. Our results confirm that the motor control signals that guide hand movements are utilized by the visual system to plan pursuit and vergence eye movements in depth. The methods, results, and data I will present have implications for the basic understanding of oculomotor control and 3D perception, as well as applied clinical implications.


WS 2015/2016

Thursday, 03.03.2016, 14:00, N0116 (chair library)

Aiming Wang, Ph.D., director of Advanced Space System Division, R&D Center, ISSE
Introduction of CAST and ISSE’s Space Activities  

China Academy of Space Technology (CAST), a subordinate of China Aerospace Science and Technology Corporation (CASC), is the main development base for space technology and products in China. It is mainly engaged in the development and manufacturing of spacecraft, external exchange and cooperation in space technology, satellite applications, etc. Since 1970, CAST has successfully developed and launched over 180 spacecraft of various kinds, such as Dongfanghong, Shenzhou, Chang'e, Ziyuan and Beidou. The Institute of Spacecraft System Engineering (ISSE) is an institute of CAST. ISSE is the first and so far the largest institute engaged in spacecraft system design and spacecraft specialized technology in China. ISSE is mainly involved in system design, integrated assembly and the development of structure, thermal control, TT&C, data management and space environment subsystems. ISSE has successfully developed and launched about 100 spacecraft.

Aiming Wang received the Ph.D. degree in signal processing from the Institute of Electronics, Chinese Academy of Sciences, in 2002. From 2006 to 2011 he was a vice-professor, and since 2011 he has been a professor with the Institute of Spacecraft System Engineering (ISSE), China Academy of Space Technology (CAST). Currently, he is the director of the Advanced Space System Division, R&D Center, ISSE. His research interests include advanced space system design and SAR signal processing.

SS 2015

Tuesday, 30.06.2015, 14:00, N0116 (chair library)

Dipl.-Ing. Andreas Haslbeck, Chair of Ergonomics, Technische Universität München
Gaze Sequence Analysis of Airline Pilots (Blickkettenanalyse bei Linienpiloten)

Abstract: During manual flight, pilots have to check several important displays regularly. The Chair of Ergonomics is currently investigating whether pilots scan these displays in random order or according to particular patterns. A further question is whether such patterns are adapted to the situation at hand. This talk presents the approach and the corresponding results of the study for discussion.

Thursday, 28.05.2015, 14:30, N0116 (chair library)

Leana Copeland, Research School of Computer Science, Australian National University
Analysis of Eye Movements and Reading Behaviours in eLearning Environments

Abstract: Online learning extends teaching and learning from the classroom to a wide and varied audience that has different needs, backgrounds, and motivations. Particularly in tertiary education, eLearning technologies are becoming increasingly ubiquitous. As a result, designing effective eLearning environments is of growing importance. The focus of this investigation is on the use of eye tracking technology to analyse reading behaviour in eLearning environments. Its purpose is to examine how eye gaze can be used to increase reading comprehension and reduce distraction during reading within eLearning. To do this, three aspects are investigated. First, participants' eye gaze is recorded to analyse the effects of tutorial presentation on eye movements and reading behaviour. Secondly, the use of eye gaze to predict comprehension is explored. Finally, the use of eye tracking to mitigate distraction during reading in eLearning environments is explored.


WS 2014/2015

Friday, 16.01.2015, 15:00, N0116 (chair library)

D. M. Gavrila, Daimler R&D and Univ. of Amsterdam

Abstract: Human-Aware Intelligent Systems that use sensors to interact naturally with a human-inhabited environment represent the next frontier of information technology. They play an important role in applications such as intelligent vehicles, smart surveillance, social robotics, and entertainment.

The talk covers my research in this domain at Daimler R&D and the Univ. of Amsterdam focusing on visual perception and machine learning. I discuss pattern classification methods for learning visual appearance, probabilistic temporal models for pose estimation and activity recognition, and mixture models for activity discovery.

SS 2014

Wednesday, 06.08.2014, 14:00, N0116 (chair library)

Dr. Amr Ibrahim El-Desoky Mousa, Member of MISP-Group of MMK
Sub-Word Based Language Modeling of Morphologically Rich Languages for LVCSR

Abstract: Morphologically rich languages are challenging for efficient language modeling in large vocabulary continuous speech recognition. Complex morphology causes data sparsity and high out-of-vocabulary rates leading to poor language model probabilities. Traditional word m-gram models are usually characterized by high perplexities and suffer from the inability to model unseen words. In this work, we investigate alternative language modeling approaches for morphologically rich languages. We use hybrid sub-word based language models comprising multiple types of units such as morphemes, syllables, and joint character and phoneme sequence units. In addition, morphology-based classes derived on sub-word level are incorporated into the modeling process to support sparse m-grams. Models such as stream-based, class-based, and factored language models are tested and compared. The above approaches are combined with state-of-the-art models, like hierarchical Pitman-Yor language models, and feed-forward deep neural network language models. These approaches have been found to cope with data sparsity and improve the performance of Arabic, German, and Polish speech recognition systems. 

Mª Inmaculada Mohíno Herranz, University of Alcalá, Madrid/Spain:
Emotion recognition through analysis of biological signals

Abstract: First, I will introduce myself, and then I will present my previous and current work on emotion recognition systems. I worked on emotion recognition from speech and proposed a new feature. In addition, as one of the collaborators in the ATREC project (described below), I focused on emotion detection using biosignals (ECG, GSR, respiration) to detect the stress level of combatants. In the presentation I will discuss the recorded dataset, the methods used and the results I achieved. Finally, I would welcome your comments with a view to collaboration. ATREC project: The Spanish Ministry of Defense, through its Future Combatant program, has sought to develop technological aids with the aim of extending combatants' operational capabilities. This project combines multiple disciplines and fields, including wearable instrumentation, textile technology, signal processing, pattern recognition and psychological analysis of the obtained information.

Felix Weninger, Member of MISP-Group at MMK:
Supervised Training in Single-Channel Source Separation

Abstract: I will present my work on supervised training of non-negative matrix factorization (NMF) and deep neural network models for single-channel source separation and automatic speech recognition. I will give an accessible introduction to NMF and then present my pioneering work on discriminative NMF and deep neural networks for source separation, in particular long short-term memory deep recurrent neural networks (LSTM-DRNNs). My models obtain world-leading separation results on a very challenging task where speech is mixed with highly non-stationary noise. My talk will appeal to a general audience as well as to those interested in the latest developments in single-channel source separation.

Tuesday, 01.07.2014, 14:00, N0116 (chair library)

Dr.-Ing. Marc Al-Hames, managing director of Cliqz in Munich
Learning the Language of the Internet

The major purpose of this talk is to present an answer to the question: What really is the language of the Internet, and how can we learn (or classify) it, with a focus on data and models? Furthermore, it is explained why this is a key project at the company Cliqz and how it should lead to promising applications and business for the company in the future. It will also be demonstrated that this turns out to be a real "Big Data" problem, calling for the deployment of methods from this novel and popular research area. The talk will be presented in English.

Dr.-Ing. Marc Al-Hames enjoys working at the intersection of technology and business: He received a PhD in Electrical Engineering and Information Technology at TU Munich with his thesis about “Graphical Models for Pattern Recognition”. He then worked at McKinsey & Company as engagement manager. In 2011 he joined Hubert Burda Media as Head of Strategy and M&A for the publicly listed Internet holding TOMORROW FOCUS AG. In 2013 he became the managing director of the early stage Internet company Cliqz in Munich.

Thursday, 12.06.2014, 17:00, N0116 (chair library)

Aswin Wijetillake, research associate, Audio Information Processing group (Fachgebiet Audio-Signalverarbeitung), TUM
Perceptual signal segregation in bilateral cochlear implant users

Thursday, 15.05.2014, 17:00, N0116 (chair library)

Dr. Eleftheria Georganti, ENT Department (HNO), University Hospital Zurich
Evaluation schemes and methods towards the improvement of hearing devices


WS 2013/2014

Thursday, 12.12.2013, 15:00, N0116 (chair library)

Prof. Haifeng Li, Harbin Institute of Technology (HIT)
Brain Signal Processing for a Brain Controlled News Browser

A brain-computer interface (BCI) is a direct communication pathway between the brain and an external device. The key point in a BCI is the recognition of the user's intention through the analysis of their brain signal, the electroencephalogram (EEG). After a brief introduction to EEG processing technology, this talk will analyse some typical BCI mechanisms such as SSVEP, P300 and eye blink, and present how they are applied in the browser. Steady-state visually evoked potentials (SSVEPs) are signals that are natural responses to visual stimulation at specific frequencies. In the browser, the control buttons were designed to blink at different frequencies, and the recognition of an SSVEP acts as a click on the corresponding button to explore the news list or to return to an upper menu. The P300 wave is an event-related potential (ERP) component elicited in the process of decision making. Thus, in our brain-controlled news browser, the P300 is recognized to find out which news items attract the user. Moreover, eye blinks are also detected and serve to confirm or abort a choice. Such mechanisms do not require any training of the users, thus making the browser easily applicable to any person. Our work re-emphasises the goals of BCIs, which are directed at assisting, augmenting, or repairing human cognitive or sensory-motor functions.
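
The idea of mapping blink frequencies to buttons can be sketched as follows. This is a simplified illustration under stated assumptions, not the recognition method used in the browser, which the abstract does not specify: it measures the signal power at each candidate frequency with a single-bin discrete Fourier sum and selects the strongest one. The function names, the sampling rate and the button frequencies are hypothetical, and the input is a synthetic signal rather than real EEG.

```python
import math

def band_power(signal, freq, sample_rate):
    """Power of `signal` at `freq` (Hz), computed with a single-bin Fourier sum."""
    re = sum(x * math.cos(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * n / sample_rate)
             for n, x in enumerate(signal))
    return (re * re + im * im) / len(signal) ** 2

def detect_button(signal, button_freqs, sample_rate):
    """Return the name of the button whose blink frequency dominates the signal."""
    return max(button_freqs,
               key=lambda name: band_power(signal, button_freqs[name], sample_rate))

# Synthetic one-second trace: a 10 Hz response plus a weaker 7 Hz component.
sample_rate = 250
signal = [math.sin(2 * math.pi * 10 * n / sample_rate)
          + 0.3 * math.sin(2 * math.pi * 7 * n / sample_rate)
          for n in range(sample_rate)]

buttons = {"next": 8.0, "back": 10.0, "open": 12.0}  # hypothetical blink frequencies
selected = detect_button(signal, buttons, sample_rate)
```

With real recordings, filtering, artifact rejection and a more robust detector would normally be needed before such a decision is made.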

Haifeng Li is a professor at the Institute of Intelligent Human-Computer Interaction, School of Computer Science and Technology, Harbin Institute of Technology (HIT). He received his Ph.D. in Electro-Magnetical Measuring Technique & Instrumentation from HIT in 1997 and his Ph.D. in Computer, Communication and Electronic Science from University Paris 6, France in 2002. He started his teaching career in 1994 at HIT and was promoted to lecturer in 1995, professor in 2003 and doctoral supervisor in 2004. His research fields include intelligent information processing technology, brain-machine interaction technology, artificial intelligence, and cognitive computing science for application fields such as natural human-machine interaction, digital media processing and complex scientific process modeling. He has earned two first-class research and education awards and three second-class awards from the provincial government. He is working on two projects of the National Natural Science Foundation, a project of the National High-Tech Foundation and several projects of the Provincial and Ministry Science Foundation. He has published 2 books and over 80 papers in national and international journals and conferences.

Thursday, 21.11.2013, 15:00, N0116 (chair library)

Marko Takanen, Aalto University
Functional Modeling of the Auditory Pathway for the Assessment of Spatial Sound Reproduction


SS 2013

Friday, 28.06.2013, 14:00, N6507 (AIP seminar room)

Dr. Fritz Menzer, EPFL
Binaural Audio Signal Processing Using Interaural Coherence Matching

WS 2012/2013

Thursday, 14.02.2013, 14:00, N0116 (chair library)

Dr. Verena Rieser, Lecturer at Heriot-Watt University, Edinburgh/UK
Natural Language Generation as Planning under Uncertainty for Statistical Interactive Systems

In this talk I present a novel approach to Natural Language Generation (NLG) in statistical Spoken Dialogue Systems, using a data-driven statistical optimisation framework for incremental Information Presentation (IP), where there is a trade-off to be solved between presenting “enough” information to the user and keeping the utterances short and understandable. In a case study on recommending restaurants, we show that an optimised IP strategy outperforms a baseline mimicking human behaviour in terms of total reward gained, in simulation. The policy is then also tested with real users, and improves on a conventional hand-coded IP strategy with up to a 9.7% increase in task success. This methodology provides new insights into the nature of the NLG problem, which has previously been treated as a module following dialogue management with limited access to context features. This type of model is now widely used in research applications, which I will briefly discuss. Examples include optimising information presentation in recommender systems, hierarchical NLG, personalisation, efficient incremental search and trading agents.

Verena Rieser is a lecturer in Computer Science at Heriot-Watt University, Edinburgh. Before that, she undertook post-doctoral research at the Schools of Informatics and GeoSciences at the University of Edinburgh. She holds a PhD (summa cum laude) from Saarland University (2008) and an MSc with distinction from the University of Edinburgh (2004). Her PhD also received the Dr. Eduard-Martin award for distinguished doctoral dissertations. Her research is at the intersection of Machine Learning and Natural Language Processing, with applications ranging from Multi-Modal Interaction and Spoken Dialogue Systems to Computational Sustainability. She is currently serving as secretary for the ACL Special Interest Group in Natural Language Generation (SIGGEN) and has recently published a book on "Reinforcement Learning for Spoken Dialogue Systems" (Springer, 2011). Since starting her research career in 2005, she has co-authored over 40 original research papers, which have been cited over 400 times (H-Index: 13 according to Google Scholar). Verena has strong ties to industry; she is a visiting researcher at Nuance's Research Lab, Sunnyvale, in 2013.

Thursday, 08.11.2012, 15:00, room N3815

Nicholas Evans, Assistant Professor at Eurecom
Biometric spoofing: countermeasures for speaker recognition

After a brief introduction to Eurecom and our wider interests in Speech and Audio Processing research, this talk will describe European efforts within the context of the FP7 Tabula Rasa project to develop new countermeasures to protect state-of-the-art biometric authentication systems from spoofing. After introducing the problem of biometric spoofing, I will introduce the Tabula Rasa consortium, the different biometric modalities that we are addressing and our work to develop software-based and hardware-based approaches to liveness detection, challenge-response countermeasures and new recognition methods which are intrinsically robust against attack. The talk will then concentrate on Eurecom’s role in developing speaker recognition spoofing countermeasures and some work in speech quality assessment, higher-level supra-segmental features, spectral texture analysis and fused approaches to spoofing detection. The talk concludes with a brief description of our contribution to the development of multimodal biometric countermeasures, where 2D face and voice modalities are combined.

Nicholas Evans is an Assistant Professor at Eurecom within its Multimedia Signal Processing Research Group and directs research in Speech and Audio Processing. His current interests include speaker diarization, modelling and recognition, multimodal biometrics and speech signal processing. He leads work in speaker recognition in the EU FP7 ICT Tabula Rasa project, has previously participated in the internationally competitive NIST Speaker Recognition Evaluations (SREs) and the NIST Rich Transcription Evaluations (RTEs) and is the lead author of a chapter on speaker recognition spoofing countermeasures for the Handbook of Biometric Anti-Spoofing to be published by Springer next year.