Themen für studentische Arbeiten

Sachgebiet: Speech Recognition

Generative Adversarial Networks for Speech Recognition

Thema
Generative Adversarial Networks for Speech Recognition
Typ Master, Forschungspraxis, IDP
Betreuer Dipl.-Ing. (Univ.) Ludwig Kürzinger
Tel.: +49 (0)89 289-28562
E-Mail: ludwig.kuerzinger@tum.de
Sachgebiet Speech Recognition, GANs
Beschreibung Speech Recognition enables a machine to understand human voice and convert it to text. State-of-the-art speech recognition systems are based on a combination of neural networks and hidden Markov models. Modern speech recognition systems have already matured, with many methods to reduce error rates and improve robustness against noise.

Generative adversarial networks [1] provide a powerful new method to generate fake data or manipulate data. GANs can be used to better train the speech recognition system [2] or even to attack it using adversarial examples [3].

The main task is to use GANs to improve speech recognition. Neural networks will be trained in the Kaldi [4] speech recognition framework together with PyTorch or TensorFlow.
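To make the adversarial-example idea in [3] concrete, here is a toy NumPy sketch of the fast gradient sign method applied to a simple logistic classifier; the weights and inputs are made up for illustration and have nothing to do with the actual Kaldi setup.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A fixed "trained" logistic classifier: p(class 1 | x) = sigmoid(w.x + b).
# Weights are invented for this illustration.
w = np.array([2.0, -1.0, 0.5])
b = -0.2

def predict(x):
    return sigmoid(w @ x + b)

def fgsm(x, y, eps):
    """Fast gradient sign method: nudge x in the direction that
    increases the cross-entropy loss for the true label y."""
    p = predict(x)
    grad_x = (p - y) * w          # d(loss)/dx for binary cross-entropy
    return x + eps * np.sign(grad_x)

x = np.array([0.5, -0.5, 1.0])    # clean input, classified as class 1
x_adv = fgsm(x, y=1.0, eps=0.8)   # adversarially perturbed version

print(predict(x) > 0.5)           # True  (clean input: class 1)
print(predict(x_adv) > 0.5)       # False (prediction flipped)
```

The same principle, scaled up from this toy model to deep acoustic models and audio waveforms, is what the attacks in [3] exploit.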

[1] 2014, Generative Adversarial Networks, arxiv.org/abs/1406.2661
[2] www.danielpovey.com/files/2017_nips_backstitch.pdf
[3] 2017, Audio Adversarial Examples: Targeted Attacks on Speech-to-Text
[4] github.com/kaldi-asr/kaldi
Voraussetzung - Experience with Python and/or C++
- Interest in machine learning
- Independent work style
- Motivation to learn new concepts
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Speech Recognition Using Machine Learning

Thema
Speech Recognition Using Machine Learning
Typ Master, Forschungspraxis, IDP
Betreuer Dipl.-Ing. (Univ.) Ludwig Kürzinger
Tel.: +49 (0)89 289-28562
E-Mail: ludwig.kuerzinger@tum.de
oder
Tobias Watzel, M.Sc.
Tel.: +49 (0)89 289-28547
E-Mail: tobias.watzel@tum.de
Sachgebiet Speech Recognition, Machine Learning
Beschreibung Speech Recognition enables a machine to understand human voice and convert it to text. State-of-the-art speech recognition systems are based on a combination of neural networks and hidden Markov models. Modern speech recognition systems have already matured, with many methods to reduce error rates and improve robustness against noise.
However, in light of recent advances in machine learning, modern methods can be applied to speech recognition, for example (deep) neural network vector quantizers, generative adversarial networks (GANs), or attention-based neural networks.

The main task is to use neural networks for the acoustic model in speech recognition.
Neural networks will be trained in the Kaldi [1] speech recognition framework together with PyTorch or TensorFlow.
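As a rough sketch of what a neural acoustic model computes (all layer sizes and weights below are illustrative placeholders, not an actual Kaldi recipe), a feed-forward network maps each acoustic feature frame to a posterior distribution over HMM states:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative sizes: 13-dim MFCC frame -> 64 hidden units -> 200 HMM states.
W1, b1 = rng.normal(size=(13, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(64, 200)) * 0.1, np.zeros(200)

def acoustic_model(frames):
    """Map a (T, 13) batch of feature frames to (T, 200) state posteriors."""
    h = np.tanh(frames @ W1 + b1)
    return softmax(h @ W2 + b2)

posteriors = acoustic_model(rng.normal(size=(5, 13)))  # 5 random frames
print(posteriors.shape)        # (5, 200)
print(posteriors.sum(axis=1))  # each row is a probability distribution
```

In a real system, these per-frame posteriors would then be combined with the HMM and a language model during decoding.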

For more information about the topic, please contact the supervisor.

[1] github.com/kaldi-asr/kaldi
Voraussetzung - Experience with Python and/or C++
- Experience in machine learning
- Independent work style
- Motivation to learn new concepts
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Sachgebiet: Virtual Reality

Planning, Conducting and Evaluating User Studies in the MMK Driving Simulator

Thema
Planning, Conducting and Evaluating User Studies in the MMK Driving Simulator
Typ Bachelor, evtl. Master
Betreuer Patrick Lindemann, M.Sc.
Tel.: +49 (0)89 289-28538
E-Mail: patrick.lindemann@tum.de
Sachgebiet Automotive, Virtual Reality, Mixed/Augmented Reality
Beschreibung This topic actually encompasses several specific projects in the mixed-reality driving simulator of our institute. For a Bachelor's thesis, the student is provided with a relatively advanced implementation of a novel driver-car interface concept and a basic definition of a problem/research question that needs to be addressed (e.g. does our novel augmented reality interface help the driver improve their driving performance?). The student is expected to create or finalize a concept for a user study that allows answering the research question(s). In order to do this, the student should also research related work to have a solid justification for decisions in the study design. Some implementation work in the code may be necessary to support conducting the study for the examiner and the participants. The student will then conduct the user study with a sufficient number of participants and evaluate the collected data (measurements in the simulator and subjective data from questionnaires). For the evaluation, appropriate statistical tests are applied to determine whether the results are significant enough to give a definite answer to the research question(s).
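To give a flavour of the statistical testing involved, a minimal NumPy sketch of Welch's two-sample t-test, applied to made-up lane-deviation scores (not real simulator data), might look like this:

```python
import numpy as np

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom
    (does not assume equal variances in the two groups)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Invented lane-deviation scores (lower = better) for two conditions
baseline = [0.42, 0.51, 0.39, 0.47, 0.55, 0.44, 0.49, 0.52]
with_ar  = [0.35, 0.33, 0.41, 0.30, 0.38, 0.36, 0.32, 0.40]

t, df = welch_t(baseline, with_ar)
print(round(t, 2), round(df, 1))
```

The resulting |t| is then compared against the critical value of the t-distribution (from a table or statistics software) at the chosen significance level to decide whether the difference between conditions is significant.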

Please make an appointment for a personal discussion to hear more about the specific topics of this kind that are currently open. In this area, different topics will generally be available depending on current capacity and the progress of previous projects. Independent of the specific topic, you will be working with the Unity 3D engine. Changes or additions to the simulator code are done in the C# programming language.

We recommend trying out and getting used to the Unity engine before starting a topic (there are good, quick tutorials available on the web). We also recommend good programming skills in an object-oriented language, ideally C#. For evaluating study results, we recommend basic knowledge of statistical testing. However, this is not a strict requirement, as the supervisor will give support on planning, conducting, and evaluating the study.
Voraussetzung - Good programming skills, ideally in C# (or other OOP-languages: C++, Java, etc.)
- Helpful but not required beforehand: experience with the Unity 3D engine
- Helpful but not required beforehand: knowledge of statistical testing
- The thesis can be written in English or German
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Development of an AR-Interface to Support Handover in Semi-Automated Cars

Thema
Development of an AR-Interface to Support Handover in Semi-Automated Cars
Typ Master, evtl. Bachelor
Betreuer Patrick Lindemann, M.Sc.
Tel.: +49 (0)89 289-28538
E-Mail: patrick.lindemann@tum.de
Sachgebiet Automotive, Virtual Reality
Beschreibung A large part of current research & development in the automotive domain is focused on the growing (semi-)automation of driving. The long way to fully driverless cars will probably be bridged by an increasing number of advanced driver assistance systems (ADAS) for specific traffic situations. This, however, introduces new problems: how can drivers be kept attentive/in-the-loop or re-activated quickly if they need to intervene and retake control (handover)? How can the handover be facilitated, especially in critical situations (e.g. cornering)?

Such questions can be addressed particularly quickly and easily in appropriate driving simulation environments. In this topic, the student is expected to determine scenarios in which a handover request must be executed on short notice or with some additional level of danger. Then, one or more of these scenarios are simulated in the MMK driving simulation. A prototype/simulation of an AR interface to support these handovers should be implemented. The student is expected to plan, conduct, and evaluate a user study to clarify the potential effects of the interface on the driver's performance in the handover. The measured and collected data will be evaluated with appropriate statistical tests. For the practical part, the student will be working with the Unity 3D engine. Changes or additions to the simulator code are done in the C# programming language.

This topic is the continuation of an earlier Bachelor's thesis, and the results and implementations of previous work may be used as a foundation or starting point for this thesis.

We recommend trying out and getting used to the Unity engine before starting a topic (there are good, quick tutorials available on the web). We also recommend good programming skills in an object-oriented language, ideally C#. For evaluating study results, we recommend basic knowledge of statistical testing. However, this is not a strict requirement, as the supervisor will give support on planning, conducting, and evaluating the study.
Voraussetzung - Good programming skills, ideally in C# (or other OOP-languages: C++, Java, etc.)
- Helpful but not required beforehand: experience with the Unity 3D engine
- Helpful but not required beforehand: knowledge of statistical testing
- The thesis can be written in English or German
Bewerbung If you are interested in a topic in this area, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and ECTS grade report.

Sachgebiet: Computer Vision

Distracted Driver Dataset

Thema
Distracted Driver Dataset
Typ Master
Betreuer Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Sachgebiet Computer Vision
Beschreibung Motivation: According to the latest National Highway Traffic Safety Administration (NHTSA) report, one in ten fatal crashes and two in ten injury crashes were reported as distracted-driver crashes in the United States in 2014. Detecting the driver's distraction state is therefore of utmost importance to reduce driver-related accidents. For this task, a properly annotated dataset of driver actions is necessary. With such a dataset, state-of-the-art deep learning architectures can be used to recognize the distraction state of drivers.

Task: The main task is to collect a “Distracted Driver Dataset” and use a light-weight Convolutional Neural Network (CNN) architecture to detect drivers' distracting actions. The dataset should contain the following annotations:
1. Predefined distracting actions that the drivers perform
2. Drivers' hand states (whether they are on the wheel or not)

During the thesis, the following steps will be followed in general:
1. State-of-the-art research
2. Dataset collection and preparation (i.e. labeling and formatting)
3. Light-weight CNN Architecture design
4. Evaluation of the CNN Architecture on the prepared dataset
5. Demonstration of the working system
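One common building block of light-weight CNNs (e.g. MobileNet-style designs; the layer sizes below are illustrative) is replacing a standard convolution with a depthwise separable one, which cuts the parameter count roughly by a factor of the kernel area:

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Depthwise k x k convolution (one filter per input channel)
    followed by a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# Example layer: 64 -> 128 channels, 3 x 3 kernel
standard = conv_params(64, 128, 3)             # 73728 weights
separable = separable_conv_params(64, 128, 3)  # 8768 weights
print(standard, separable, round(standard / separable, 1))  # ~8x fewer
```

Such factorized layers are one reason light-weight architectures can run in a car's embedded hardware.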

References:
[1] Baheti, B., Gajre, S., & Talbar, S. (2018). Detection of Distracted Driver using Convolutional Neural Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1032-1038).
[2] Hssayeni, M. D., Saxena, S., Ptucha, R., & Savakis, A. (2017). Distracted driver detection: Deep learning vs handcrafted features. Electronic Imaging, 2017(10), 20-26.
[3] G. Borghi, E. Frigieri, R. Vezzani and R. Cucchiara, "Hands on the wheel: A Dataset for Driver Hand Detection and Tracking," 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi'an, 2018, pp. 564-570.
Voraussetzung 1. Excellent coding skills, preferably in Python
2. Experience in deep learning frameworks, preferably in Torch/PyTorch
3. Motivation to work on deep learning.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Spatio-temporal Action Localisation using 3D CNNs

Thema
Spatio-temporal Action Localisation using 3D CNNs
Typ Master
Betreuer Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Sachgebiet Computer Vision
Beschreibung Motivation: Current state-of-the-art approaches usually work offline and are too slow to be useful in real-world settings. Moreover, there are so many building blocks constituting the overall architecture that it is nearly impossible to understand it as a whole. For example, architectures usually first have a Region Proposal Network (RPN), then a classifier for the proposed regions, and finally non-maximum suppression at the end to get rid of redundant detections. Simpler, light-weight architectures are needed for real-time capability, similar to the YOLO algorithm used for object detection.

Task: The main task is to create a YOLO-like architecture for spatio-temporal action localisation. Instead of a 2D-CNN, a 3D-CNN architecture will be used so that a video clip can be fed in instead of a single image frame. Overall, the following steps will be followed during the thesis:
1. State-of-the-art research
2. 3D CNN Architecture design and pretraining it with a large dataset
3. Creating a YOLO-like spatio-temporal action localisation architecture using the pretrained 3D-CNN
4. Evaluation of the architecture on various datasets
5. Demonstration of the working system
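The step from a 2D-CNN over frames to a 3D-CNN over clips mainly changes tensor shapes; a small helper (with illustrative clip sizes) shows how a clip passes through a 3D convolution:

```python
def conv3d_output_shape(shape, kernel, stride=(1, 1, 1), pad=(0, 0, 0)):
    """Output (T, H, W) of a 3D convolution; per dimension:
    out = (in + 2*pad - kernel) // stride + 1."""
    return tuple((s + 2 * p - k) // st + 1
                 for s, k, st, p in zip(shape, kernel, stride, pad))

# A 16-frame 112x112 clip through a 3x3x3 conv, stride 1, padding 1:
print(conv3d_output_shape((16, 112, 112), (3, 3, 3), pad=(1, 1, 1)))
# -> (16, 112, 112): the clip keeps its temporal and spatial extent

# The same layer with temporal stride 2 halves the temporal resolution:
print(conv3d_output_shape((16, 112, 112), (3, 3, 3), (2, 1, 1), (1, 1, 1)))
# -> (8, 112, 112)
```

A YOLO-like head would then predict action classes and boxes on this downsampled spatio-temporal grid instead of a purely spatial one.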

References:
[1] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 779-788).
[2] Redmon, J., & Farhadi, A. (2017). YOLO9000: better, faster, stronger. arXiv preprint.
[3] Hara, K., Kataoka, H., & Satoh, Y. (2018, June). Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA (pp. 18-22).
Voraussetzung 1. Excellent coding skills, preferably in Python,
2. Experience in deep learning frameworks, preferably in Torch/PyTorch.
3. Motivation to work on deep learning.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master’s thesis application for topic 'XYZ'", and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Convolutional Neural Network for Human Re-Identification

Thema
Convolutional Neural Network for Human Re-Identification
Typ Master, Forschungspraxis, IDP
Betreuer Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Sachgebiet Computer Vision
Beschreibung Human re-identification aims to re-identify persons across multiple viewpoints or over time in a crowded video. This becomes more challenging in the presence of occlusion, background clutter, and pose and illumination variations. Convolutional neural networks (CNNs) have shown their promising ability in addressing many problems in computer vision, such as object detection, segmentation, and recognition.
In this work, we aim to develop a deep convolutional neural network to identify pedestrians in video(s) at different time steps or from different viewpoints. We intend to adopt semantic body part information [1] in order to obtain more robust and discriminative representations of persons. Below is a schematic of such an approach, where a CNN predicts the similarity score of each pair of person detections along a video sequence.
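The similarity score mentioned above can be thought of as a distance between embedding vectors that the CNN produces for two detections; a minimal NumPy sketch with random placeholder embeddings:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity in [-1, 1] between two embedding vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(42)
emb_person = rng.normal(size=128)                   # embedding of detection A
emb_same = emb_person + 0.1 * rng.normal(size=128)  # same person, slight noise
emb_other = rng.normal(size=128)                    # a different person

print(cosine_similarity(emb_person, emb_same))   # close to 1
print(cosine_similarity(emb_person, emb_other))  # close to 0
```

In the actual thesis, the embeddings would come from a CNN trained so that detections of the same person score high and different persons score low.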



Reference:
[1] github.com/MVIG-SJTU/WSHP
Voraussetzung The student is expected to have a sound background in deep learning and experience in Python and TensorFlow programming.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Object Detection and Segmentation in Video Using Deep Learning

Thema
Object Detection and Segmentation in Video Using Deep Learning
Typ Master, Forschungspraxis, IDP
Betreuer Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Sachgebiet Computer Vision
Beschreibung Deep learning now plays a key role in many computer vision problems, including object detection and segmentation. For instance, Mask R-CNN [1] is a deep learning based approach that provides both a detection box and a segmentation mask. Below, the output of this method applied to an image is depicted, where people are detected and segmented.

In our current work, we are interested in extending this approach to the detection and segmentation of people in video sequences. To this end, we are looking for a motivated student who has experience and interest in deep learning to work on this topic.

References:
[1] arxiv.org/abs/1703.06870
[2] github.com/matterport/Mask_RCNN
Voraussetzung It is expected that the candidate has solid knowledge of Python and TensorFlow programming.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Person Identification Using Deep Learning

Thema
Person Identification Using Deep Learning
Typ Master, Forschungspraxis, IDP
Betreuer Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Sachgebiet Computer Vision
Beschreibung As one of the biometric features for human identification, gait (the way of walking) has drawn attention in recent years, since it can recognize people from a large distance, unlike other biometric features such as the face or fingerprint. In gait recognition, a sequence of images showing a person walking is analyzed as input data [1].

The performance of gait recognition can be adversely affected by many sources of variation, such as the viewing angle. In real scenarios, people might walk in different directions relative to the camera, which makes gait recognition more challenging. Therefore, learning a view-invariant gait representation is highly desirable. The gait images captured from different view angles can be transformed into their corresponding side-view images, which contain more dynamic information.

Recently, Generative Adversarial Networks (GANs) [2] and their variants [3] have been successfully applied to video and image generation. In this work, we aim to deploy such a neural network for human identification based on gait cues observed from multiple viewing angles [4].
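One classical gait representation such a network could build on is the gait energy image, i.e. the average of aligned binary silhouettes over a walking cycle; a minimal NumPy sketch with tiny synthetic silhouettes (not real data):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average a (N, H, W) stack of aligned binary silhouettes
    into a single grayscale gait template with values in [0, 1]."""
    s = np.asarray(silhouettes, float)
    return s.mean(axis=0)

# Three tiny synthetic 4x4 "silhouettes" of a walking cycle
cycle = np.array([
    [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0]],
    [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]],
    [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]],
])
gei = gait_energy_image(cycle)
print(gei.shape)  # (4, 4)
print(gei.max())  # 1.0 where the body is present in every frame
```

Bright regions of the template mark stable body parts, while intermediate values capture the leg motion that makes gait discriminative.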

References:
[1] dl.acm.org/citation.cfm
[2] arxiv.org/abs/1406.2661
[3] arxiv.org/abs/1611.07004
[4] github.com/phillipi/pix2pix
Voraussetzung Preliminary knowledge of machine learning and deep learning as well as good programming skills in Python and TensorFlow are required.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Real-time Detection and Classification of Dynamic Hand Gestures

Thema
Real-time Detection and Classification of Dynamic Hand Gestures
Typ Forschungspraxis, Masterarbeit
Betreuer Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Sachgebiet Computer Vision
Beschreibung Motivation: Detection and classification of dynamic hand gestures is a challenging task, since there is no indication when an action starts in a video stream. However, most deep learning architectures that work offline can also function online with proper adjustments. The topic of this thesis is to convert an offline-working architecture into an online-working one.
Task: The main task is to bring an already working deep architecture, which can be seen below, to online functionality. Details of the architecture can be found in [1].
As further reading, [2] also provides a detailed online detection architecture.
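The core change from offline to online operation is replacing whole-video input with a sliding buffer over the live stream. A schematic Python sketch, where the classifier is a simple stand-in rather than the architecture from [1]:

```python
from collections import deque

def classify(window):
    """Stand-in for the real network: 'gesture' if the window's
    mean activity exceeds a threshold, else 'background'."""
    return "gesture" if sum(window) / len(window) > 0.5 else "background"

WINDOW = 8                     # frames per inference window
buffer = deque(maxlen=WINDOW)  # oldest frame is dropped automatically
detections = []

# Fake per-frame "activity" values from a stream (0 = idle, 1 = moving)
stream = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

for t, frame in enumerate(stream):
    buffer.append(frame)
    if len(buffer) == WINDOW:          # classify every full window
        detections.append((t, classify(buffer)))

print(detections)  # windows ending at t=7..12 classify as 'gesture'
```

An online system built this way must also handle the activation/deactivation logic (when to report a detected gesture exactly once), which is the part [2] treats in detail.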


References:
[1] O. Köpüklü, N. Köse, and G. Rigoll. Motion fused frames: Data level fusion strategy for hand gesture recognition. arXiv preprint, arXiv:1804.07187, 2018.
[2] P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree, and J. Kautz. Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4207–4215, 2016.
Voraussetzung 1. Excellent coding skills in Python,
2. Experience in deep learning frameworks, preferably in Torch/PyTorch.
3. Motivation to work on deep learning.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Joint Segmentation and Tracking of Targets in Video Using Deep Learning

Thema
Joint Segmentation and Tracking of Targets in Video Using Deep Learning
Typ Master, Forschungspraxis
Betreuer Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Sachgebiet Computer Vision
Beschreibung In some video surveillance applications, such as activity recognition, it is required to segment objects in video as well as to track them. Both segmentation and tracking of multiple targets in video are challenging problems in computer vision. Joint segmentation and tracking approaches use much more detailed information at the pixel or superpixel level, compared to detection boxes. To track people in a video, the mapping between observations in consecutive frames can be formulated as a probabilistic graphical model such as a CRF (Conditional Random Field). CRFs are a powerful framework for solving discrete optimization problems such as tracking and segmentation.
Based on a research work on semantic image segmentation [1], a CRF model can be cast as a Recurrent Neural Network (RNN). The goal is to extend this deep learning technique to the joint segmentation and multi-person tracking problem. To do this, the two problems are first formulated as a unified CRF model, and then we develop a deep RNN that mimics the proposed CRF. Below you can see three frames of a video sequence captured at different times as well as their corresponding segmentations.
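The mean-field inference that [1] unrolls into an RNN boils down to repeating one simple update, which becomes the RNN cell; a tiny NumPy sketch for a chain of pixels with a Potts-style pairwise term (all potentials made up):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mean_field(unary, weight=1.0, iters=5):
    """Mean-field inference on a chain CRF with L labels.
    unary: (N, L) negative log-potentials; neighbours of pixel i are i-1, i+1.
    Each loop iteration is the step that CRF-as-RNN implements as an RNN cell."""
    q = softmax(-unary)
    for _ in range(iters):
        # Message: neighbours' beliefs, encouraging label agreement (Potts)
        msg = np.zeros_like(q)
        msg[1:] += q[:-1]
        msg[:-1] += q[1:]
        q = softmax(-unary + weight * msg)
    return q

# 5 pixels, 2 labels; the middle pixel is only weakly decided
unary = np.array([[0.0, 2.0], [0.0, 2.0], [0.8, 1.2], [2.0, 0.0], [2.0, 0.0]])
q = mean_field(unary)
print(q.argmax(axis=1))  # per-pixel labels after neighbourhood smoothing
```

In [1], the same update runs over full images with learned, filter-based pairwise terms; here the chain and potentials are invented purely to show the iteration structure.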



Ref:
[1] www.robots.ox.ac.uk/~szheng/crfasrnndemo
Voraussetzung Basic knowledge of probabilistic graphical models and neural networks as well as solid programming skills are required. In case you have any questions, write me an email.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

CNN Application to Video Saliency

Thema
CNN Application to Video Saliency
Typ Master, Forschungspraxis, Bachelor, Ing.prax.
Betreuer Mikhail Startsev
Tel.: +49 (0)89 289-28550
E-Mail: mikhail.startsev@tum.de
Sachgebiet Computer Vision
Beschreibung One of the important questions in computer vision is how to determine what information in a scene (represented by an image or a video) is relevant. So-called “saliency models” [1] have been used to predict informativeness in images. For videos, however, the ways of incorporating the temporal component of the series of frames into an attention prediction model range from extremely computationally intensive (e.g. deep neural networks using 3D convolution operators) to hand-crafted (e.g. the use of optical flow, or using two subsequent frames as input).

In order to avoid or reduce the “hand-engineered” aspect of the features in use, different modifications of traditional 2D CNNs can be employed. Deep learning methods have already proven their worth in the image saliency task [2], and some results related to videos are starting to appear as well. In this project, the candidate will work with various CNN models that operate on video data in order to compare their performance. Depending on the progress, training several models from scratch on pre-recorded data can be beneficial.

[1] en.wikipedia.org/wiki/Salience_(neuroscience)
[2] saliency.mit.edu/results_mit300.html
Voraussetzung Understanding of machine learning concepts and solid programming skills are desirable.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Sachgebiet: Human and Computer Vision

Anticipation Mechanism in Human Attention

Thema
Anticipation Mechanism in Human Attention
Typ Master, Forschungspraxis, Bachelor, Ing.prax.
Betreuer Mikhail Startsev
Tel.: +49 (0)89 289-28550
E-Mail: mikhail.startsev@tum.de
Sachgebiet Human and Computer Vision
Beschreibung The human visual system is a very complicated mechanism, but it is, naturally, imperfect: For instance, we cannot immediately switch our attention (i.e. gaze direction, in this context) to the most important event or object in our field of view as soon as it emerges. For unpredictable events, it typically takes the eyes about 150-250 ms to react and move to the new location.

In natural scenes (and in real life), however, our visual system is much more proactive, i.e. we anticipate certain events before they happen, which gives our eyes enough time to catch the most “salient” part of the action just in time. Similar observations have been made in [1], for example (also, see the figure below: the maximal consistency of saccade landing points with the predicted saliency map is achieved with a slightly negative lag on average).

The project will quantitatively measure this anticipation effect using more recently developed tools and saliency prediction techniques. The candidate will also perform data analysis to determine how different eye movement types affect the anticipatory behaviour.
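At its core, the anticipation analysis is a lagged-correlation measurement: for each candidate time lag, measure how well the gaze signal agrees with the shifted saliency signal, and find the lag that maximises agreement. A toy NumPy version with synthetic one-dimensional signals (the real analysis uses saliency maps and saccade landing points):

```python
import numpy as np

def best_lag(gaze, saliency, max_lag=20):
    """Lag (in samples) at which gaze correlates best with the shifted
    saliency signal; a negative result means gaze leads (anticipates)
    saliency, analogous to the negative lag reported in [1]."""
    lags = np.arange(-max_lag, max_lag + 1)
    corrs = [np.corrcoef(gaze, np.roll(saliency, lag))[0, 1] for lag in lags]
    return int(lags[int(np.argmax(corrs))])

t = np.arange(200)
saliency = np.sin(t / 8.0)    # synthetic "salient event" signal
gaze = np.roll(saliency, -5)  # gaze leads saliency by 5 samples

print(best_lag(gaze, saliency))  # -5
```

With real data, the lag axis would be in milliseconds and the agreement measure would score saccade landing points against time-shifted saliency maps rather than correlating two scalar signals.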


[1] Vig et al., Eye Movements Show Optimal Average Anticipation with Natural Dynamic Scenes, 2010, www.gazecom.eu/FILES/ViDoMaBa10b.pdf

Voraussetzung Solid programming skills are desirable, experience with computer vision is a plus.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Temporal Saliency Propagation for Efficient Saliency Computation

Thema
Temporal Saliency Propagation for Efficient Saliency Computation
Typ Master, Forschungspraxis, Bachelor, Ing.prax.
Betreuer Mikhail Startsev
Tel.: +49 (0)89 289-28550
E-Mail: mikhail.startsev@tum.de
Sachgebiet Human and Computer Vision
Beschreibung Teaching a computer which parts of a scene (static or changing through time) are important (e.g. relevant or simply interesting) has been one of the important problems in computer vision, since this prediction can be used as a "preprocessing" step before virtually any learning procedure. For some applications, especially in video processing, saliency models [1] have been criticized for their high computational cost. Applying a full prediction model to each frame can simply be too slow.

When considering prediction or detection of salient regions in a continuous video, it makes sense that the shift of the average attention distribution map from one frame to another is minimal and could (in a perfect case) be mostly explained by objects simply moving around the scene. This movement can to some extent be described by optical flow or any other form of motion description. Video codecs, for instance, implement a similar technique to store frames efficiently [2].

Therefore, we could run the saliency predictor only on selected frames, morphing the resulting attention map for a few subsequent frames. The candidate would explore the extent to which such temporal propagation is useful and at which point the performance-quality trade-off is still sensible. Various motion description approaches can also be evaluated for the purposes of the project.
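The propagation step itself can be as simple as shifting each pixel's saliency value along its motion vector; a minimal NumPy sketch with a constant integer motion vector standing in for a real dense optical-flow field:

```python
import numpy as np

def propagate_saliency(saliency, flow_y, flow_x):
    """Shift a saliency map by an integer motion vector, as a crude
    stand-in for warping it along a dense optical-flow field.
    Regions moving in from outside the frame get zero saliency."""
    h, w = saliency.shape
    out = np.zeros_like(saliency)
    ys, xs = np.mgrid[0:h, 0:w]
    ty, tx = ys + flow_y, xs + flow_x
    valid = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    out[ty[valid], tx[valid]] = saliency[ys[valid], xs[valid]]
    return out

frame0 = np.zeros((5, 5))
frame0[1, 1] = 1.0                         # one salient blob
frame1 = propagate_saliency(frame0, 1, 2)  # object moved down 1, right 2

print(np.argwhere(frame1 == 1.0))  # blob now at (2, 3)
```

A realistic version would use a per-pixel flow field with sub-pixel interpolation, but the idea of reusing an attention map instead of recomputing it is the same.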

Image source: nickyguides.digital-digest.com/keyframes.htm

[1] en.wikipedia.org/wiki/Salience_(neuroscience)
[2] en.wikipedia.org/wiki/Key_frame
Voraussetzung Solid programming skills and/or an algorithmic background are desired. Familiarity with video codecs is a plus.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and grade report.

Distracting much? Curvature analysis of eye movements in dynamic natural scenes

Thema
Distracting much? Curvature analysis of eye movements in dynamic natural scenes
Typ Diplom, Master, Bachelor, Studienarbeit, IDP
Betreuer Dr.-Ing. Michael Dorr
Tel.: +49 (0)89 289-28563
E-Mail: michael.dorr@tum.de
Sachgebiet Human and Computer Vision
Beschreibung Humans make several fast eye movements (saccades) per second to serially sample informative regions in visual scenes. Because of the necessary interaction of several ocular muscles, saccade trajectories are typically not straight but slightly curved. Interestingly, experiments with simple stimulus configurations such as dots and squares on otherwise blank displays have shown that the direction and degree of curvature can be influenced by visually salient "distractors". However, little is known about saccadic curvature in dynamic natural scenes, where salient distractors abound. In this project, we want to analyse eye movement data from an existing large data set and relate saccadic curvature to the spatial distribution of salient image features in videos. Depending on the type/duration of thesis, this analysis could range from first-order statistics of the gaze data to complex image and video processing, and designing and running experiments with an eye tracker.
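A simple first-order curvature measure is the maximum perpendicular deviation of the saccade's sample points from the straight start-to-end line, normalised by saccade amplitude; a NumPy sketch on a synthetic trajectory (not real gaze data):

```python
import numpy as np

def saccade_curvature(xy):
    """Maximum perpendicular deviation of a saccade trajectory (N, 2)
    from the straight start-to-end line, normalised by amplitude.
    The sign of the 2D cross product distinguishes curvature direction."""
    xy = np.asarray(xy, float)
    start, end = xy[0], xy[-1]
    d = end - start
    amplitude = np.linalg.norm(d)
    rel = xy - start
    # Signed perpendicular distance of each sample via the cross product
    dev = (d[0] * rel[:, 1] - d[1] * rel[:, 0]) / amplitude
    idx = np.argmax(np.abs(dev))
    return dev[idx] / amplitude

# Synthetic 10-degree horizontal saccade bowing slightly upward
t = np.linspace(0.0, 1.0, 21)
traj = np.stack([10.0 * t, 0.5 * np.sin(np.pi * t)], axis=1)

print(round(saccade_curvature(traj), 3))  # 0.05, i.e. 5% of amplitude
```

On real eye-tracking data, such a statistic would be computed per saccade and then related to the spatial distribution of salient distractors in the video.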
Voraussetzung Matlab or R experience, ideally C++; knowledge of image processing highly useful. Interest in neuroscientific questions.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

Why that face? Gaze-based analysis of human observers' reaction to emotional stimuli

Thema
Why that face? Gaze-based analysis of human observers' reaction to emotional stimuli
Typ Diplom, Master, Bachelor, Studienarbeit, IDP
Betreuer Dr.-Ing. Michael Dorr
Tel.: +49 (0)89 289-28563
E-Mail: michael.dorr@tum.de
Sachgebiet Human and Computer Vision
Beschreibung Affective computing is a research domain that aims to bridge the gap between human emotions and machines. Although significant progress has been made in this area, there are still unknown factors in how humans perceive and react to emotions, especially during face-to-face interactions. The main goal of this thesis is therefore to analyse how people react to emotional stimuli displayed by humans during dyadic interactions. The stimuli will be extracted from an existing database collected in a video-conference-like situation, and an eye tracker will be used to record the gaze of participants viewing and reacting to these stimuli. These data will be used to perform a detailed analysis of the effect of emotions on oculomotor behaviour, and to investigate which parts of the face are looked at most for specific emotions, by mapping each participant's gaze onto a 3D model of the face shown in the stimuli (already computed). Automatic prediction of the emotion from a participant's gaze will also be investigated using state-of-the-art machine learning techniques.
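The "which parts of the face are looked at" analysis can be illustrated with a simple area-of-interest (AOI) hit test over the gaze samples. The region names and rectangles below are purely hypothetical screen coordinates; in the actual project the regions would be derived from the precomputed 3D face model:

```python
# Hypothetical facial AOIs as (x_min, y_min, x_max, y_max) screen rectangles.
AOIS = {
    "eyes":  (300, 200, 500, 260),
    "nose":  (360, 260, 440, 330),
    "mouth": (340, 330, 460, 400),
}

def aoi_for_gaze(x, y, aois=AOIS):
    """Return the name of the first AOI containing the gaze point."""
    for name, (x0, y0, x1, y1) in aois.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            return name
    return "other"

def dwell_proportions(samples, aois=AOIS):
    """Fraction of gaze samples falling into each AOI."""
    counts = {name: 0 for name in list(aois) + ["other"]}
    for x, y in samples:
        counts[aoi_for_gaze(x, y, aois)] += 1
    total = len(samples) or 1
    return {name: c / total for name, c in counts.items()}
```

Per-emotion dwell proportions computed this way would then serve as features for the machine-learning-based emotion prediction mentioned above.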
Voraussetzung Required: Good programming skills in C++ (at least for compiling and making minor edits), knowledge of machine learning, creativity and high motivation

Recommended, but not required: skills in other programming languages (e.g., shell scripting, Matlab, Java)
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

On-line pupil-size correction for eye tracking

Thema
On-line pupil-size correction for eye tracking
Typ Bachelor, Studienarbeit, IDP
Betreuer Dr.-Ing. Michael Dorr
Tel.: +49 (0)89 289-28563
E-Mail: michael.dorr@tum.de
Sachgebiet Human and Computer Vision
Beschreibung Video-oculographic eye trackers estimate gaze direction from the relative position of the pupil and one or more corneal reflections, which are created by infrared light sources pointed at the eye. Tracking these features is relatively easy because the reflections are typically the brightest parts of the eye and the pupil is the darkest. However, illumination changes also induce changes in pupil size, which may lead to systematic errors in gaze estimation. This is particularly problematic when recording gaze for naturalistic stimuli such as Hollywood movies, where stimulus luminance may change dramatically over time.

In this project, we want to implement and evaluate methods to correct for these errors.
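One conceivable correction strategy, sketched here under the simplifying assumption that the gaze error depends approximately linearly on pupil diameter, is to fit that relationship from calibration data (subject fixating a known target while display luminance, and hence pupil size, is varied) and subtract the predicted error on-line. All function names are illustrative; the actual method is part of the project:

```python
import numpy as np

def fit_pupil_correction(pupil_diameters, gaze_errors):
    """Least-squares fit of gaze error as a linear function of pupil
    diameter; returns (slope, intercept) from calibration data."""
    A = np.vstack([pupil_diameters, np.ones(len(pupil_diameters))]).T
    slope, intercept = np.linalg.lstsq(A, gaze_errors, rcond=None)[0]
    return slope, intercept

def correct_gaze(raw_gaze, pupil_diameter, slope, intercept):
    """Subtract the predicted pupil-size-dependent error from the raw
    gaze estimate; a single multiply-add per sample, so cheap enough
    for a 2000 Hz real-time loop."""
    return raw_gaze - (slope * pupil_diameter + intercept)
```

In the real system this per-sample step would be reimplemented in C/C++ inside the tracker's processing loop; the sketch only illustrates the fit-then-subtract idea.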
Voraussetzung Because of the strong real-time requirements of binocular eye tracking at 2000 Hz, some experience with C/C++ is required.
Bewerbung If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to “<Type of application> application for topic 'XYZ'”, e.g. “Master’s thesis application for topic 'XYZ'”, and clearly state in the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.