
Use Anticlustering to pack as many identical packages as possible

Sometimes it is important to sort very different things into groups that are as similar to one another as possible. The scientific term for this is "anti-clustering". The psychologist Dr. Martin Papenberg and the computer scientist Prof. Dr. Gunnar Klau from Heinrich-Heine-University Düsseldorf (HHU) have developed new methods for this purpose and made them available to the research community. They present their results in the journal Psychological Methods.

A cluster is a group of elements that are similar to each other, while different clusters differ markedly from one another. Such groups are found by means of a cluster analysis. The reverse is also possible, and is called "anti-clustering": here the aim is to divide a set of different elements in such a way that the resulting groups resemble each other.

What sounds theoretical has very concrete applications. A currently very relevant example: an exam is to be held at the university, but the available room is too small for the number of candidates, so several exam sessions have to be scheduled one after the other. This poses two challenges for the examiners. On the one hand, the different groups of candidates must be given different exam questions so that later candidates cannot get tips from those who sat the exam earlier. On the other hand, the exams must be equally difficult so that all candidates have the same chances. The sets of questions in the individual exams must therefore be of similar overall difficulty.
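The exam example can be illustrated with a small Python sketch. It is only a toy illustration under stated assumptions, not the algorithm published by Papenberg and Klau: each question is assumed to have a single numeric difficulty score, the function name split_into_similar_groups is hypothetical, and the method is a simple pairwise-exchange heuristic that swaps questions between exams as long as doing so narrows the gap between the exams' total difficulty.

```python
from itertools import combinations


def split_into_similar_groups(difficulties, k):
    """Assign items to k equal-sized groups with similar total difficulty.

    Illustrative exchange heuristic (an assumption for this sketch, not the
    published method). Returns a list with one group label per item.
    """
    n = len(difficulties)
    if k < 1 or n % k != 0:
        raise ValueError("number of items must be divisible by k")

    # Starting point: deal the items out round-robin in order of difficulty.
    order = sorted(range(n), key=lambda i: difficulties[i])
    groups = [0] * n
    for rank, item in enumerate(order):
        groups[item] = rank % k

    def spread(assignment):
        # Gap between the hardest and the easiest group (total difficulty).
        sums = [0.0] * k
        for item, group in enumerate(assignment):
            sums[group] += difficulties[item]
        return max(sums) - min(sums)

    best = spread(groups)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(n), 2):
            if groups[i] == groups[j]:
                continue
            # Try swapping two items from different groups.
            groups[i], groups[j] = groups[j], groups[i]
            candidate = spread(groups)
            if candidate < best - 1e-12:
                best = candidate  # keep the swap
                improved = True
            else:
                groups[i], groups[j] = groups[j], groups[i]  # undo the swap
    return groups


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 5, 6]  # made-up difficulty scores
    print(split_into_similar_groups(scores, 2))
```

Swapping items rather than moving them keeps every exam the same length. Real applications usually have to balance several properties of the items at once, which a single difficulty score does not capture.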

The psychologist Dr. Martin Papenberg from the Institute of Experimental Psychology and the computer scientist Prof. Dr. Gunnar Klau from the Algorithmic Bioinformatics group at HHU have jointly developed new algorithms for anti-clustering and successfully tested their performance and accuracy. They then published these algorithms in an R package that is freely available to researchers and is already being used in various research areas. "R" is a programming language used primarily for statistical computing. It is free to use and can be extended with additional packages such as "anticlust" by Papenberg and Klau.

"Our new approach is applicable in many different areas," says Dr. Papenberg, "especially in my own field, psychology. We often develop tests for several groups that are in contact with each other; these tests should all have the same level of difficulty."

The researchers have also recently started working with the University Hospital Düsseldorf, where anti-clustering is to be used in genome sequencing. Here, samples are to be divided into groups that are as heterogeneous as possible, so that the generated DNA fragments can be assigned more easily to their original samples.

"We also see an application in artificial intelligence research," adds Prof. Klau, "more precisely, in the division of data sets used for machine learning. This is important so that learning outcomes achieved with one part of the data can be transferred to other data sets."

Original publication

Papenberg, M., & Klau, G. W. (2020). Using anticlustering to partition data sets into equivalent parts. Psychological Methods. Advance Online Publication. https://doi.org/10.1037/met0000301.

Author: Arne Claussen