Data mining / big data

Profile description and modules

Description

Data mining, also known as knowledge discovery in databases (KDD), is an automated process for discovering new and interesting information in large quantities of data.

Modules

Artificial intelligence methods; machine learning methods; visual data analysis

Focus areas

The data mining/big data profile teaches the principles of data management and analysis, advanced methods of information preparation and visualisation, and the required fundamentals of core computer science. In the seminars and internships that accompany the study programme, students gain knowledge of these methods and apply them in practical contexts.

The area of data mining is represented by two internationally known researchers, Professor Keim and Professor Berthold. Their many contacts outside Europe frequently give students the opportunity to spend part of their studies abroad, for example in the US.

Structure

Sample curriculum

A sample curriculum focusing on data mining could look as follows (lectures and their classification may change over time):

1st semester

  • Data mining 1
  • Multimedia database systems
  • Digital signal processing
  • Introduction to economics (different department)

2nd semester

  • Data mining 2
  • Information visualisation 1
  • Inorganic chemistry and analytical chemistry 1 (different department)
  • Business intelligence: from reporting to analytics

3rd semester

  • Algorithms for the analysis of large volumes of data
  • Graph drawing
  • Stochastics (different department)
  • Text mining
  • Master's project: Machine learning - implementing a hierarchical self-organising map

4th semester

  • Master's thesis in one of the fields of data mining, machine learning, artificial intelligence, information visualisation or information retrieval, e.g. "Visual Clustering of Finance Arrays"

Research groups involved

Professor Michael Berthold: Bioinformatics and Information Mining
Professor Ulrik Brandes: Algorithmics
Professor Daniel Keim: Data Analysis and Visualization
Professor Marc Scholl: Database & Information Systems (DBIS)

Area of application

As the quantity and complexity of stored data from science and industry continue to increase, so does the need for intelligent, machine- and expert-supported methods for analysing this data. Due to this high demand, data mining has become an interface between a variety of research areas, such as machine learning, information visualisation, artificial intelligence and human-computer interaction. Naturally, the basic principles from the standard areas of computer science still apply, for instance with regard to databases, algorithms and software engineering.

Laboratories and features

The Powerwall

A 5.20 m x 2.15 m Powerwall for the visualisation of huge quantities of data

KNIME

KNIME, pronounced [naim], is a modular data exploration platform in which data flows, so-called "pipelines", can be assembled visually. The pipelines are then executed, "pumping" the data through them, and the results can be inspected in interactive views of the data and models.

KNIME was (and continues to be) developed at the Chair for Bioinformatics and Information Mining. Michael Berthold's working group utilises this platform for teaching and research purposes. Almost all the data mining methods developed by the working group have been integrated into KNIME.
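
The pipeline idea described above can be illustrated with a minimal Python sketch. It does not use KNIME or its API. The names Node, run_pipeline, drop_missing and scale, as well as the sample data, are hypothetical and chosen only for illustration. Each node transforms a small table, and every intermediate result is kept so that it can be inspected afterwards, loosely mirroring the way results can be viewed after a data flow has been executed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = dict
Table = List[Row]


@dataclass
class Node:
    """One processing step in the pipeline (hypothetical, not a KNIME class)."""
    name: str
    transform: Callable[[Table], Table]


def run_pipeline(table: Table, nodes: List[Node]) -> Dict[str, Table]:
    """Pump the table through the nodes in order.

    The output of each node is stored under the node's name so that
    intermediate results can be inspected later.
    """
    results: Dict[str, Table] = {}
    for node in nodes:
        table = node.transform(table)
        results[node.name] = table
    return results


def drop_missing(table: Table) -> Table:
    """Remove rows whose 'value' field is missing."""
    return [row for row in table if row["value"] is not None]


def scale(table: Table) -> Table:
    """Divide every 'value' by the largest value in the table."""
    top = max(row["value"] for row in table)
    return [{**row, "value": row["value"] / top} for row in table]


# Made-up sample data, for illustration only.
data = [
    {"id": 1, "value": 4.0},
    {"id": 2, "value": None},
    {"id": 3, "value": 8.0},
]

pipeline = [Node("filter", drop_missing), Node("normalise", scale)]
results = run_pipeline(data, pipeline)

# Inspect the output of every step.
for name, table in results.items():
    print(name, table)
```

Keeping the per-node results in a dictionary is a design choice of this sketch. It makes every step inspectable at the cost of holding all intermediate tables in memory.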

Contact persons and recommended mentors:

Professor Michael Berthold, Bioinformatics & Information Mining research group
Professor Daniel Keim, Data Analysis and Visualisation research group
Junior Professor Bela Gipp, Information Science research group