Topics for Student Theses

Subject Area: Speech Recognition

Deep Neural Networks for Speech Recognition

Topic
Deep Neural Networks for Speech Recognition
Type Forschungspraxis, IDP, Master
Supervisor Ludwig Kürzinger, Dipl.-Ing. (Univ.)
Tel.: +49 (0)89 289-28562
E-Mail: ludwig.kuerzinger@tum.de
Subject Area Speech Recognition
Description Motivation:
Speech recognition enables a machine to understand the human voice and convert it to text. Conventional speech recognition systems are based on a combination of neural networks and hidden Markov models. With the advent of deep learning and increasing computational power, deep neural networks can match the performance of these traditional systems while requiring no complex feature engineering.
Your work will focus on key concepts of deep neural networks that are not yet fully understood. For example, attention [1], inspired by the human ability to concentrate on important information, is a simple but powerful technique that can transform an audio signal directly into a sequence of characters.
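As an illustration, the following is a minimal NumPy sketch of the scaled dot-product attention mechanism from [1]; the array shapes and the toy encoder outputs are chosen purely for exposition.

    import numpy as np

    def scaled_dot_product_attention(queries, keys, values):
        # Computes softmax(Q K^T / sqrt(d_k)) V, the core operation in [1].
        d_k = keys.shape[-1]
        scores = queries @ keys.T / np.sqrt(d_k)           # similarity of each query to each key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the keys
        return weights @ values                            # weighted sum of values

    # Toy example: 4 decoder queries attending over 10 encoded audio frames.
    enc = np.random.randn(10, 64)    # hypothetical encoder outputs (e.g. from audio features)
    q = np.random.randn(4, 64)
    context = scaled_dot_product_attention(q, enc, enc)
    print(context.shape)             # (4, 64)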

Task Description:
The main task is to apply or examine neural networks for speech recognition. The thesis can be conducted in English or German. For more information about the topic, please contact the supervisor.

References:
[1] Vaswani, A., et al. (2017). Attention Is All You Need. In Advances in Neural Information Processing Systems.
[2] Graves, A., et al. (2006). Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. In Proceedings of the International Conference on Machine Learning.
Prerequisites - Experience with Python and/or C++
- Experience with machine learning
- Independent work style
- Motivation to learn new concepts
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

Subject Area: Computer Vision

Distracted Driver Dataset

Topic
Distracted Driver Dataset
Type Master
Supervisor Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Subject Area Computer Vision
Description Motivation: According to the latest National Highway Traffic Safety Administration (NHTSA) report, one in ten fatal crashes and two in ten injury crashes in the United States in 2014 were reported as distracted-driver crashes. Detecting the driver's distraction state is therefore of utmost importance for reducing driver-related accidents. This task requires a properly annotated dataset of driver actions. With such a dataset, state-of-the-art deep learning architectures can be used to recognize the distraction state of drivers.

Task: The main task is to collect a "Distracted Driver Dataset" and to use a lightweight Convolutional Neural Network (CNN) architecture to detect the driver's distracting actions. The dataset should contain the following annotations (a sketch of a possible annotation format follows this list):
1. Predefined distracting actions performed by the driver
2. The driver's hand states (whether the hands are on the wheel or not)
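For illustration only, a minimal per-frame annotation scheme could look like the sketch below; the CSV layout, class lists, and loader are assumptions to be refined during the thesis, not a prescribed format.

    import csv
    from dataclasses import dataclass

    # Hypothetical label vocabularies; the final class lists are part of the thesis work.
    ACTIONS = ["safe_driving", "texting", "phone_call", "drinking", "reaching_behind"]
    HAND_STATES = ["both_on_wheel", "one_on_wheel", "none_on_wheel"]

    @dataclass
    class FrameAnnotation:
        video_id: str
        frame_idx: int
        action: str      # one of ACTIONS
        hand_state: str  # one of HAND_STATES

    def load_annotations(path):
        # Reads a CSV with the columns: video_id, frame_idx, action, hand_state.
        with open(path, newline="") as f:
            return [FrameAnnotation(row["video_id"], int(row["frame_idx"]),
                                    row["action"], row["hand_state"])
                    for row in csv.DictReader(f)]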

During the thesis, the following steps will be followed in general:
1. State-of-the-art research
2. Dataset collection and preparation (i.e. labeling and formatting)
3. Lightweight CNN architecture design (a minimal sketch follows this list)
4. Evaluation of the CNN Architecture on the prepared dataset
5. Demonstration of the working system
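As a sketch of what "lightweight" can mean here, the following PyTorch model uses depthwise-separable convolutions (in the spirit of MobileNet-style architectures) with two output heads, one per annotation type; all layer sizes and class counts are illustrative assumptions.

    import torch
    import torch.nn as nn

    class DepthwiseSeparableConv(nn.Module):
        # A depthwise conv followed by a 1x1 pointwise conv: far fewer parameters than a full conv.
        def __init__(self, c_in, c_out, stride=1):
            super().__init__()
            self.depthwise = nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False)
            self.pointwise = nn.Conv2d(c_in, c_out, 1, bias=False)
            self.bn = nn.BatchNorm2d(c_out)
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):
            return self.act(self.bn(self.pointwise(self.depthwise(x))))

    class LightweightDriverCNN(nn.Module):
        def __init__(self, num_actions=5, num_hand_states=3):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                DepthwiseSeparableConv(32, 64, stride=2),
                DepthwiseSeparableConv(64, 128, stride=2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Two heads: one for the distracting action, one for the hand state.
            self.action_head = nn.Linear(128, num_actions)
            self.hand_head = nn.Linear(128, num_hand_states)

        def forward(self, x):
            h = self.features(x)
            return self.action_head(h), self.hand_head(h)

    action_logits, hand_logits = LightweightDriverCNN()(torch.rand(1, 3, 224, 224))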

References:
[1] Baheti, B., Gajre, S., & Talbar, S. (2018). Detection of Distracted Driver using Convolutional Neural Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 1032-1038).
[2] Hssayeni, M. D., Saxena, S., Ptucha, R., & Savakis, A. (2017). Distracted driver detection: Deep learning vs handcrafted features. Electronic Imaging, 2017(10), 20-26.
[3] Borghi, G., Frigieri, E., Vezzani, R., & Cucchiara, R. (2018). Hands on the Wheel: A Dataset for Driver Hand Detection and Tracking. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (pp. 564-570).
Prerequisites 1. Excellent coding skills, preferably in Python
2. Experience in deep learning frameworks, preferably in Torch/PyTorch
3. Motivation to work on deep learning.
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

Spatio-temporal Action Localisation using 3D CNNs

Topic
Spatio-temporal Action Localisation using 3D CNNs
Type Master
Supervisor Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Subject Area Computer Vision
Description Motivation: Current state-of-the-art approaches usually work offline and are too slow to be useful in real-world settings. Moreover, the overall architectures consist of so many building blocks that it is nearly impossible to understand them as a whole: typically there is first a Region Proposal Network (RPN), then a classifier for the proposed regions, and finally a non-maximum suppression step at the end to discard redundant detections. Simpler, lightweight architectures are needed for real-time capability, similar to the YOLO algorithm used for object detection.

Task: The main task is to create a YOLO-like architecture for spatio-temporal action localisation. Instead of a 2D CNN, a 3D CNN architecture will be used so that a video clip can be fed in instead of a single image frame. Overall, the following steps will be followed during the thesis (a sketch of the detection head is given after this list):
1. State-of-the-art research
2. 3D CNN architecture design and pretraining on a large dataset
3. Creating a YOLO-like spatio-temporal action localisation architecture using the pretrained 3D CNN
4. Evaluation of the architecture on various datasets
5. Demonstration of the working system
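For intuition, the following PyTorch fragment sketches the core idea under stated assumptions: a toy 3D-CNN backbone collapses a clip of frames into a 2D feature map, and a YOLO-style 1x1 convolution head predicts, per grid cell and box, the box coordinates, an objectness score, and class scores. It is not the thesis architecture.

    import torch
    import torch.nn as nn

    class Yolo3DSketch(nn.Module):
        def __init__(self, num_classes=24, boxes_per_cell=2):
            super().__init__()
            self.backbone = nn.Sequential(                    # toy 3D-CNN backbone
                nn.Conv3d(3, 32, 3, stride=(1, 2, 2), padding=1), nn.ReLU(inplace=True),
                nn.Conv3d(32, 64, 3, stride=(2, 2, 2), padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool3d((1, 7, 7)),              # collapse the temporal axis, keep a 7x7 grid
            )
            # Per grid cell and box: (x, y, w, h, objectness) + class scores.
            out_ch = boxes_per_cell * (5 + num_classes)
            self.head = nn.Conv2d(64, out_ch, kernel_size=1)

        def forward(self, clip):                              # clip: (N, 3, T, H, W)
            feat = self.backbone(clip).squeeze(2)             # -> (N, 64, 7, 7)
            return self.head(feat)                            # -> (N, out_ch, 7, 7)

    out = Yolo3DSketch()(torch.randn(1, 3, 16, 112, 112))     # a 16-frame clip
    print(out.shape)                                          # torch.Size([1, 58, 7, 7])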

References:
[1] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779-788).
[2] Redmon, J., & Farhadi, A. (2017). YOLO9000: Better, Faster, Stronger. arXiv preprint arXiv:1612.08242.
[3] Hara, K., Kataoka, H., & Satoh, Y. (2018). Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6546-6555).
Prerequisites 1. Excellent coding skills, preferably in Python,
2. Experience in deep learning frameworks, preferably in Torch/PyTorch.
3. Motivation to work on deep learning.
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

Convolutional Neural Network for Human Re-Identification

Topic
Convolutional Neural Network for Human Re-Identification
Type Master, Forschungspraxis, IDP
Supervisor Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Subject Area Computer Vision
Description Human re-identification aims to re-identify persons across multiple viewpoints or throughout a crowded video. This becomes more challenging in the presence of occlusion, background clutter, and pose and illumination variations. Convolutional neural networks (CNNs) have shown promising performance on many problems in computer vision, such as object detection, segmentation, and recognition.
In this work, we aim to develop a deep convolutional neural network to identify pedestrians in videos at different time steps or from different viewpoints. We intend to adopt semantic body-part information [1] in order to obtain more robust and discriminative representations of persons. In such an approach, a CNN predicts the similarity score of each pair of person detections along a video sequence.
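To make the pairwise-similarity idea concrete, here is a minimal Siamese-style sketch (written in PyTorch for brevity, although the prerequisites call for TensorFlow): a shared encoder embeds two person crops, and a small head maps the embedding difference to a similarity score. All sizes are illustrative.

    import torch
    import torch.nn as nn

    class SiameseSimilarity(nn.Module):
        def __init__(self, emb_dim=128):
            super().__init__()
            self.encoder = nn.Sequential(        # shared CNN encoder for both crops
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, emb_dim),
            )
            # Score the element-wise difference of the two embeddings.
            self.head = nn.Sequential(nn.Linear(emb_dim, 1), nn.Sigmoid())

        def forward(self, crop_a, crop_b):
            za, zb = self.encoder(crop_a), self.encoder(crop_b)
            return self.head(torch.abs(za - zb))   # similarity in [0, 1]

    score = SiameseSimilarity()(torch.rand(1, 3, 128, 64), torch.rand(1, 3, 128, 64))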

Reference:
[1] github.com/MVIG-SJTU/WSHP
Prerequisites The student is expected to have a sound background in deep learning and experience in Python and TensorFlow programming.
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

Object Detection and Segmentation in Video Using Deep Learning

Topic
Object Detection and Segmentation in Video Using Deep Learning
Type Master, Forschungspraxis, IDP
Supervisor Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Subject Area Computer Vision
Description Deep learning now plays a key role in many computer vision problems, including object detection and segmentation. For instance, Mask R-CNN [1] is a deep-learning-based approach that provides both a detection box and a segmentation mask for each object. Applied to an image, it detects and segments, for example, the people in the scene.

In our current work, we are interested in extending this approach to the detection and segmentation of people in video sequences. To this end, we are looking for a motivated student with experience and interest in deep learning to work on this topic.
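As a possible starting point, a per-frame baseline is easy to sketch. The snippet below uses torchvision's pretrained Mask R-CNN purely for illustration (the reference implementation [2] is TensorFlow-based); the thesis itself would add temporal consistency across frames.

    import torch
    from torchvision.models.detection import maskrcnn_resnet50_fpn

    model = maskrcnn_resnet50_fpn(pretrained=True).eval()

    def segment_frame(frame):
        # frame: float tensor (3, H, W) in [0, 1]; returns boxes and masks of detected people.
        with torch.no_grad():
            out = model([frame])[0]           # dict with 'boxes', 'labels', 'scores', 'masks'
        keep = (out["labels"] == 1) & (out["scores"] > 0.5)   # COCO class 1 = person
        return out["boxes"][keep], out["masks"][keep]

    # A naive video baseline simply runs this frame by frame; the thesis would then
    # link the resulting masks across frames for temporal consistency.
    boxes, masks = segment_frame(torch.rand(3, 480, 640))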

References:
[1] arxiv.org/abs/1703.06870
[2] github.com/matterport/Mask_RCNN
Prerequisites It is expected that the candidate has solid knowledge of Python and TensorFlow programming.
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

Person Identification Using Deep Learning

Topic
Person Identification Using Deep Learning
Type Master, Forschungspraxis, IDP
Supervisor Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Subject Area Computer Vision
Description As one of the biometric features for human identification, gait (the way of walking) has drawn attention in recent years, since, unlike other biometric features such as the face or fingerprints, it allows recognizing people from a large distance. In gait recognition, a sequence of images showing a person walking is analyzed as input data [1].

The performance of gait recognition can be adversely affected by many sources of variation, such as the viewing angle. In real scenarios, people may walk in different directions relative to the camera, which makes gait recognition more challenging. Therefore, learning a view-invariant gait representation is highly desirable. Gait images captured from different viewing angles can be transformed into their corresponding side-view images, which contain more dynamic information.

Recently, Generative Adversarial Networks (GANs) [2] and their variants [3] have been successfully applied to video and image generation. In this work, we aim to deploy such a neural network for human identification based on gait cues observed from multiple viewing angles [4].
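For orientation, the following fragment sketches one training step of a pix2pix-style conditional GAN [3, 4] that maps a gait image from an arbitrary view to a side view. It is written in PyTorch for brevity (the prerequisites call for TensorFlow), and the tiny generator and discriminator are stand-ins for a proper U-Net and PatchGAN.

    import torch
    import torch.nn as nn

    # Tiny stand-ins for the pix2pix generator and discriminator [3].
    G = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(16, 1, 3, padding=1), nn.Tanh())
    D = nn.Sequential(nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                      nn.Conv2d(16, 1, 3, padding=1))   # scores patches of (input, output) pairs

    bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

    def train_step(any_view, side_view, lam=100.0):
        # One conditional-GAN step: any-view gait image -> side-view gait image.
        fake = G(any_view)
        # Discriminator: real pairs vs. generated pairs.
        d_real = D(torch.cat([any_view, side_view], dim=1))
        d_fake = D(torch.cat([any_view, fake.detach()], dim=1))
        loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
        # Generator: fool D, plus an L1 term pulling the output toward the true side view.
        d_fake = D(torch.cat([any_view, fake], dim=1))
        loss_g = bce(d_fake, torch.ones_like(d_fake)) + lam * l1(fake, side_view)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    train_step(torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64))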

References:
[1] dl.acm.org/citation.cfm
[2] arxiv.org/abs/1406.2661
[3] arxiv.org/abs/1611.07004
[4] github.com/phillipi/pix2pix
Prerequisites Preliminary knowledge of machine learning and deep learning as well as good programming skills in Python and TensorFlow are required.
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

Real-time Detection and Classification of Dynamic Hand Gestures

Topic
Real-time Detection and Classification of Dynamic Hand Gestures
Type Forschungspraxis, Master's thesis
Supervisor Okan Köpüklü, M.Sc.
Tel.: +49 (0)89 289-28554
E-Mail: okan.kopuklu@tum.de
Subject Area Computer Vision
Description Motivation: Detection and classification of dynamic hand gestures is a challenging task, since there is no indication of when an action starts in a video stream. However, most deep learning architectures that work offline can also function online with proper adjustments. The topic of this thesis is to convert an offline-working architecture into an online-working one.
Task: The main task is to bring an already working deep architecture to online functionality. Details of the architecture can be found in [1].
As further reading, [2] also provides a detailed online detection architecture.
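To illustrate what online functionality amounts to in practice, here is a minimal sliding-window inference loop: a buffer holds the most recent frames, and a clip classifier runs every few frames, with a confidence threshold deciding whether a gesture is reported. The classifier here is a random placeholder, and the window length, stride, and threshold are illustrative assumptions.

    from collections import deque
    import torch

    WINDOW = 16       # frames per clip fed to the classifier
    STRIDE = 4        # run the classifier every STRIDE frames
    THRESHOLD = 0.2   # minimum confidence to report a detection (illustrative)

    def classify_clip(clip):
        # Placeholder for the offline-trained network; returns (label, confidence).
        scores = torch.softmax(torch.randn(10), dim=0)   # pretend 10 gesture classes
        conf, label = scores.max(dim=0)
        return int(label), float(conf)

    def online_loop(frame_stream):
        buffer = deque(maxlen=WINDOW)
        for i, frame in enumerate(frame_stream):
            buffer.append(frame)                     # keep only the last WINDOW frames
            if len(buffer) == WINDOW and i % STRIDE == 0:
                label, conf = classify_clip(list(buffer))
                if conf >= THRESHOLD:                # fire only on confident windows
                    yield i, label, conf

    # Example with a dummy stream of 100 frames:
    for frame_idx, label, conf in online_loop(torch.rand(100, 3, 112, 112)):
        print(f"frame {frame_idx}: gesture {label} ({conf:.2f})")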

References:
[1] O. Köpüklü, N. Köse, and G. Rigoll. Motion Fused Frames: Data Level Fusion Strategy for Hand Gesture Recognition. arXiv preprint arXiv:1804.07187, 2018.
[2] P. Molchanov, X. Yang, S. Gupta, K. Kim, S. Tyree, and J. Kautz. Online Detection and Classification of Dynamic Hand Gestures with Recurrent 3D Convolutional Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4207-4215, 2016.
Prerequisites 1. Excellent coding skills in Python,
2. Experience in deep learning frameworks, preferably in Torch/PyTorch.
3. Motivation to work on deep learning.
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

Joint Segmentation and Tracking of Targets in Video Using Deep Learning

Topic
Joint Segmentation and Tracking of Targets in Video Using Deep Learning
Type Master, Forschungspraxis
Supervisor Maryam Babaee, M.Sc.
Tel.: +49 (0)89 289-28543
E-Mail: maryam.babaee@tum.de
Subject Area Computer Vision
Description In some video surveillance applications, such as activity recognition, objects in video need to be both segmented and tracked. Segmentation and tracking of multiple targets in video are each challenging problems in computer vision. Joint segmentation and tracking approaches use much more detailed information, at the level of pixels or superpixels, than detection boxes provide. To track people in a video, the mapping between observations in consecutive frames can be formulated as a probabilistic graphical model such as a Conditional Random Field (CRF). CRFs are a powerful framework for solving discrete optimization problems such as tracking and segmentation.
Based on research on semantic image segmentation [1], a CRF model can be cast as a Recurrent Neural Network (RNN). The goal is to extend this deep learning technique to the joint segmentation and multi-person tracking problem. To this end, the two problems are first formulated as a unified CRF model, and a deep RNN is then developed that mimics the proposed CRF.

[Figure: three frames of a video sequence captured at different times, with their corresponding segmentations.]
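For intuition about how [1] turns CRF inference into network layers, the following toy NumPy sketch unrolls mean-field updates for a CRF over L labels; each loop iteration corresponds to one RNN timestep. The unaries, pairwise kernel, and compatibility matrix are illustrative assumptions.

    import numpy as np

    def softmax(x, axis=-1):
        e = np.exp(x - x.max(axis=axis, keepdims=True))
        return e / e.sum(axis=axis, keepdims=True)

    def mean_field(unary, kernel, compat, iters=5):
        # Mean-field inference for a CRF, unrolled as in CRF-as-RNN [1].
        # unary:  (N, L) negative log-likelihoods per node and label
        # kernel: (N, N) pairwise affinities between nodes (e.g. spatial/appearance)
        # compat: (L, L) label compatibility (penalty for differing labels)
        Q = softmax(-unary)                      # initialization
        for _ in range(iters):                   # each loop = one RNN timestep
            msg = kernel @ Q                     # message passing / filtering
            pairwise = msg @ compat              # compatibility transform
            Q = softmax(-unary - pairwise)       # add unaries and renormalize
        return Q

    # Toy example: 6 nodes, 2 labels, Potts compatibility.
    N, L = 6, 2
    Q = mean_field(np.random.rand(N, L), np.random.rand(N, N), 1.0 - np.eye(L))
    print(Q.round(2))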

Reference:
[1] www.robots.ox.ac.uk/~szheng/crfasrnndemo
Prerequisites Basic knowledge of probabilistic graphical models and neural networks as well as solid programming skills are required. In case you have any questions, write me an email.
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.

CNN Application to Video Saliency

Topic
CNN Application to Video Saliency
Type Master, Forschungspraxis, Bachelor, Ing.prax.
Supervisor Mikhail Startsev
Tel.: +49 (0)89 289-28550
E-Mail: mikhail.startsev@tum.de
Subject Area Computer Vision
Description One of the important questions in computer vision is how to determine what information in a scene (represented by an image or a video) is relevant. So-called "saliency models" [1] have been used to predict informativeness in images. For videos, however, the ways of incorporating the temporal component of the series of frames into an attention-prediction model range from extremely computationally intensive ones (e.g. deep neural networks using 3D convolution operators) to ones that rely on hand-crafted approaches (e.g. the use of optical flow, or using two subsequent frames as input).

In order to avoid or reduce the "hand-engineered" aspect of the features in use, different modifications of traditional 2D CNNs can be employed. Deep learning methods have already proven their worth on the image saliency task [2], and results for videos are starting to appear as well. In this project, the candidate will work with various CNN models that operate on video data in order to compare their performance. Depending on progress, training several models from scratch on pre-recorded data can be beneficial.
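As a simple example of the "two subsequent frames as input" idea mentioned above, the following PyTorch sketch stacks two RGB frames into a six-channel input and predicts a single-channel saliency map; the architecture is an illustrative assumption, not one of the models to be compared.

    import torch
    import torch.nn as nn

    class TwoFrameSaliency(nn.Module):
        # Predicts a saliency map from two consecutive RGB frames stacked channel-wise.
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),   # 6 = 2 frames x 3 channels
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(32, 1, 1),                                     # per-pixel saliency logit
                nn.Sigmoid(),                                            # map to [0, 1]
            )

        def forward(self, frame_t, frame_t1):
            x = torch.cat([frame_t, frame_t1], dim=1)                    # (N, 6, H, W)
            return self.net(x)                                           # (N, 1, H, W)

    saliency = TwoFrameSaliency()(torch.rand(1, 3, 120, 160), torch.rand(1, 3, 120, 160))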

[1] en.wikipedia.org/wiki/Salience_(neuroscience)
[2] saliency.mit.edu/results_mit300.html
Prerequisites An understanding of machine learning concepts and solid programming skills are desirable.
Application If you are interested in this topic, we welcome applications via the email address above. Please set the email subject to "<Type of application> application for topic 'XYZ'", e.g. "Master's thesis application for topic 'XYZ'", and clearly state in the body of the message why you are interested in the topic. Also make sure to attach your most recent CV (if you have one) and your grade report.