Custom Colloquia
As the feedback on last year's custom colloquia was very enthusiastic, we are offering this format again. To keep the main registration simple, we will send out a separate registration form for the custom colloquia.
All participants, especially PhD students, have the opportunity to propose a colloquium on a topic of their choice during the registration phase. The HeiCAD team will then assemble groups of researchers from related disciplines to form discussion and tutorial groups.
All participants will be invited to participate in one of the colloquia. We will suggest group assignments that bring together researchers from different working groups, career stages and backgrounds, so the colloquia will also be a great opportunity for all participants to network and get to know researchers from across HeiCAD!
Planned Custom Colloquia
Host: Ran Yu
Informational Web search plays a critical role in how people acquire knowledge about the world. While search engines excel at delivering relevant results for well-established topics, they still struggle when users seek information about emerging topics—those that are new, fast-evolving, and often lack established terminology or high-quality, authoritative content. This colloquium explores the limitations of current information retrieval (IR) systems in addressing such challenges and invites discussion on what kinds of topics remain difficult to support in contemporary search technologies.
In my ongoing project, EmergentIR: Information Retrieval and Fusion for Emerging Topics, we investigate how today's IR systems are fundamentally designed around stable, well-represented topics and data sources: historical queries, click-through logs, authoritative link graphs, and structured background knowledge (e.g., Wikipedia, Google Knowledge Graph). These dependencies make them vulnerable in the face of sparse, volatile, and sometimes misleading information, which is typical of emerging topics such as new technologies, societal movements, or unfolding crises.
In our research, we work with multilingual web content, search systems, and user search behavior logs to better understand and model these gaps. Our methodological approach includes topic detection, robust document retrieval and ranking in sparse data contexts, and the generation of summaries aimed at improving user knowledge gain.
The session aims to engage the audience—particularly researchers and practitioners in information retrieval, natural language processing, and human-computer interaction—in identifying which kinds of emerging topics continue to present major obstacles for IR systems. We will discuss technical limitations, societal implications, and opportunities for cross-disciplinary collaboration. Ultimately, I hope to surface new perspectives and research questions that can guide us in developing the next generation of search technologies better equipped for the information challenges of tomorrow.
Host: Stephan Linzbach
This colloquium investigates how masked language models (MLMs) overlook sociolinguistic variation in meaning. While MLMs rely on distributional semantics, their representation space captures only the signifier (the word), not the signified (the cultural or ideological concept behind it). This leads to semantic overlay—contexts with distinct meanings being embedded similarly.
We propose a structuralist, contrastive learning approach using speech-community-labeled data (e.g., Media Bias/Fact Check and FineWeb) to ground semantics in socio-cultural context. Tokens like "immigrant" vary in meaning across communities; modeling these differences improves semantic coherence and interpretability.
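To make the contrastive idea above concrete, here is a minimal sketch of a supervised contrastive loss over community-labeled embeddings. It is an illustration under assumptions, not the host's implementation: the toy 2-D vectors stand in for MLM embeddings of one token in different contexts, the community labels "A" and "B" are hypothetical, and the function name and temperature value are chosen for this example only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def supervised_contrastive_loss(embeddings, communities, temperature=0.1):
    """Average loss over anchors that have at least one same-community positive.

    Same-community contexts are pulled together; all other contexts in the
    batch act as negatives. (Hypothetical helper, for illustration only.)
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and communities[j] == communities[i]]
        if not positives:
            continue
        # Temperature-scaled similarities to every other item in the batch.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denominator
                      for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy embeddings of one token in four contexts (assumed data).
communities = ["A", "A", "B", "B"]
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
overlaid = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

loss_separated = supervised_contrastive_loss(separated, communities)
loss_overlaid = supervised_contrastive_loss(overlaid, communities)
print(loss_separated < loss_overlaid)
```

The loss is low when contexts from the same community sit close together and high when contexts from different communities are embedded alike, which is the situation the abstract calls semantic overlay.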
We invite discussion on integrating sociolinguistics into NLP, aiming to develop models with more human-like inductive bias. Target audience: computational linguists, sociolinguists, semanticists.
Host: Susmita Gangopadhyay
Abstract: Telegram is a globally popular instant messaging platform known for its strong emphasis on security, privacy, and unique social networking features. It has recently emerged as a setting for cross-domain analysis and research, for example on social media influence, propaganda, and extremism. In the first part of this talk, we introduce TeleScope, an extensive dataset suite that, to our knowledge, is the largest of its kind. It comprises metadata for about 500K Telegram channels and downloaded message metadata for about 71K public channels, accounting for around 120M crawled messages. Alongside raw message and channel data, TeleScope provides channel connections and user interaction data built using Telegram's message-forwarding feature to study multiple use cases, such as information spread and message-forwarding patterns. It also provides data enrichments such as language detection, posting activity patterns, and extracted Telegram entities, which enable online discourse analysis beyond what is possible with the original data alone. The dataset is designed for diverse applications, independent of specific research objectives, and is sufficiently versatile to facilitate the replication of social media studies comparable to those conducted on platforms like X (formerly Twitter).
In the second part of the talk, we present our ongoing work on predicting the virality of Telegram messages. Using TeleScope, we examine how message-level features, channel-level attributes, and network factors influence a message's reach. Together, this work offers a foundational resource for studying Telegram at scale and insights for understanding how messages spread on Telegram.
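As a rough illustration of how channel connections can be derived from message forwarding, the sketch below builds a weighted channel-to-channel graph from a handful of made-up records. The field names `channel` and `forwarded_from`, the channel names, and both helper functions are assumptions for this example; they do not reflect TeleScope's actual schema or tooling.

```python
from collections import Counter, defaultdict

# Hypothetical message records; real crawled data will look different.
messages = [
    {"channel": "news_a", "forwarded_from": None},
    {"channel": "news_b", "forwarded_from": "news_a"},
    {"channel": "news_c", "forwarded_from": "news_a"},
    {"channel": "news_c", "forwarded_from": "news_b"},
    {"channel": "news_b", "forwarded_from": "news_a"},
]


def build_forward_graph(records):
    """Count forwards per (source channel, forwarding channel) pair."""
    edges = Counter()
    for record in records:
        source = record.get("forwarded_from")
        # Skip original posts and self-forwards.
        if source is not None and source != record["channel"]:
            edges[(source, record["channel"])] += 1
    return edges


def distinct_forwarders(edges):
    """Number of distinct channels that forwarded from each source."""
    reach = defaultdict(set)
    for source, target in edges:
        reach[source].add(target)
    return {source: len(targets) for source, targets in reach.items()}


edges = build_forward_graph(messages)
reach = distinct_forwarders(edges)
print(dict(edges))
print(reach)
```

Edge weights of this kind are one possible network-level input for questions about message reach, alongside the message-level and channel-level features mentioned above.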
Audience & Discussion Goals:
This talk is intended for researchers in computational social science, NLP, and network analysis, as well as those interested in misinformation, online discourse, and social media analysis.
I would welcome discussion of the following questions:
* How can large-scale datasets like TeleScope help us uncover message propagation patterns, channel-to-channel forwarding networks, and user interaction dynamics on the Telegram platform?
* What factors affect the virality of messages on Telegram?
Host: Jan Meissner
Understanding and predicting the behavior of matter at the atomic scale is fundamental to advancing chemistry and materials science. Accurate simulations rely on describing the potential energy surface (PES), which governs the interactions between atoms. While quantum mechanical methods like Density Functional Theory (DFT) provide reasonable accuracy, their computational expense limits their application to relatively small to medium-sized systems and short timescales. Lowering this computational barrier makes it possible to simulate longer timescales, study complex reaction pathways, and perform rapid searches through vast numbers of molecules for promising candidates.
This colloquium will explore the rapidly evolving field of Machine Learning Interatomic Potentials (MLIPs), which offer a powerful approach to constructing accurate and computationally efficient models of the PES. By learning complex atomistic interactions directly from reference data (typically generated by quantum mechanical calculations), MLIPs can achieve near-quantum accuracy at a fraction of the computational cost.
We will begin with an overview of the fundamental concepts, discussing how machine learning techniques are adapted to respect physical symmetries and represent atomic environments. We will then delve into the practical aspects of building these potentials, covering crucial steps like reference data generation, feature engineering, model training, and rigorous validation. This includes discussing advanced strategies like active learning and uncertainty quantification to ensure the models are robust and efficiently explore the relevant chemical space.
Finally, we will discuss the current challenges, limitations, and exciting future directions for the development and application of machine learning in atomistic modeling. This presentation aims to provide a comprehensive overview of MLIPs for a broad scientific audience, showcasing their transformative potential in chemical research.
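To illustrate what it means for a representation of atomic environments to respect physical symmetries, the sketch below uses about the simplest descriptor possible: the sorted list of interatomic distances. It is a toy example under assumptions, not a method from the colloquium. The coordinates are illustrative rather than a reference geometry, the helper names are hypothetical, and MLIPs in practice use considerably richer representations.

```python
import math


def pairwise_distance_descriptor(positions):
    """Sorted interatomic distances.

    Distances do not change under rotation or translation, and sorting
    removes the dependence on atom ordering.
    """
    distances = []
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            distances.append(math.dist(positions[i], positions[j]))
    return sorted(distances)


def rotate_z(positions, angle):
    """Rotate all atoms about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in positions]


def translate(positions, shift):
    """Shift all atoms by the same vector."""
    return [(x + shift[0], y + shift[1], z + shift[2])
            for x, y, z in positions]


# Illustrative three-atom geometry (assumed coordinates, arbitrary units).
atoms = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Rotate, translate, and reorder the atoms.
moved = translate(rotate_z(atoms, 0.7), (1.0, -2.0, 0.5))[::-1]

original = pairwise_distance_descriptor(atoms)
transformed = pairwise_distance_descriptor(moved)
is_invariant = all(math.isclose(a, b, abs_tol=1e-9)
                   for a, b in zip(original, transformed))
print(is_invariant)
```

A model that only sees such a descriptor predicts the same energy for any rotated, shifted, or relabeled copy of a structure. Note that this toy descriptor ignores element types, so it treats all atoms as interchangeable.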