
Custom colloquia

Each doctoral student has the opportunity to propose a colloquium on a topic of their choice during the registration phase. The HeiCAD team will then assemble groups of researchers from related disciplines to form discussion and tutorial groups.

All participants will be invited to participate in one of the colloquia. We will suggest group assignments that bring together researchers from different working groups, career stages and backgrounds, so the colloquia will also be a great opportunity for all participants to network and get to know researchers from across HeiCAD!

 

Title: Evaluating KG Embeddings for Link Prediction: A Study on the Influence of Relational Patterns and Cardinalities

Abstract: Knowledge Graphs (KGs) serve as repositories for organizing and storing data, using graph structures to represent a wide array of real-world information. These graphs consist of nodes representing entities and edges representing relationships between these entities. However, an inherent limitation of KGs is their incompleteness, which demands KG construction methodologies that address this challenge. One such approach is link prediction using KG embeddings (KGE). This study analyzes five state-of-the-art KGE models (TransE, RESCAL, ComplEx, ConvE, DistMult) on two link prediction benchmark datasets (WN18RR, FB15K-237), with particular emphasis on the cardinalities of relations and on relational patterns such as symmetry, anti-symmetry, inverse relations, inference-based patterns, and compositional structures. We also investigate how these patterns are distributed within the KG datasets, aiming to better understand how these factors influence the overall link prediction performance of KGE models. We further aim to use the findings to address the incompleteness of our GESIS Knowledge Graph (GESIS KG), which is constructed from social science scholarly metadata, comprising metadata about scholarly articles, research datasets, and their interconnections. Finally, this study will offer insights into these KGE models, which can be used to generate embeddings for downstream tasks that leverage the knowledge graph, including question-answering systems, search engines, and entity classification.
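To make the two properties named in the abstract concrete, the following is a minimal sketch of how the symmetry and cardinality of a relation could be measured on a list of (head, relation, tail) triples. It is not the authors' method: the function name `relation_stats`, the toy triples, and the 1.5 threshold for separating "1" from "N" are assumptions made for illustration only.

```python
from collections import defaultdict


def relation_stats(triples, threshold=1.5):
    """Per-relation symmetry ratio and cardinality class.

    `triples` is an iterable of (head, relation, tail) tuples.
    `threshold` is an assumed cut-off: an average of at least this many
    tails per head (or heads per tail) counts as "N", otherwise "1".
    """
    pairs_by_relation = defaultdict(set)
    for head, relation, tail in triples:
        pairs_by_relation[relation].add((head, tail))

    stats = {}
    for relation, pairs in pairs_by_relation.items():
        # Symmetry: share of (h, t) pairs whose reverse (t, h) also exists.
        symmetry = sum((t, h) in pairs for h, t in pairs) / len(pairs)

        heads = {h for h, _ in pairs}
        tails = {t for _, t in pairs}
        avg_tails_per_head = len(pairs) / len(heads)
        avg_heads_per_tail = len(pairs) / len(tails)

        # "1-to-N" means one head links to many tails, and so on.
        head_side = "N" if avg_heads_per_tail >= threshold else "1"
        tail_side = "N" if avg_tails_per_head >= threshold else "1"
        stats[relation] = {
            "symmetry": symmetry,
            "cardinality": f"{head_side}-to-{tail_side}",
        }
    return stats


# Toy triples, invented for this example.
toy_triples = [
    ("a", "sibling", "b"),
    ("b", "sibling", "a"),
    ("cat", "hypernym", "animal"),
    ("dog", "hypernym", "animal"),
]
toy_stats = relation_stats(toy_triples)
```

On the toy data, `sibling` comes out as fully symmetric and 1-to-1, while `hypernym` has no symmetric pairs and is N-to-1. Link prediction scores could then be grouped by these per-relation labels.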

The outcome of the colloquium: Ideas to frame the narrative of this work (goal: creating a plan for a new paper) and ideas to formulate a method for interpreting KGE models (goal: adding a novel approach to the analysis of SOTA KGE models).

Target audience: Computer scientists

Title: Leveraging Prompting LLMs for Fine-Grained Named Entity Annotation

Abstract: Fine-grained entity annotation is a pivotal aspect of scholarly text analysis, enabling deeper insights and more nuanced interpretations of academic research. Traditional annotation methods rely on domain experts, which often leads to bottlenecks and inefficiencies and makes manual annotation impractical for large datasets. Recent advances in natural language processing, particularly the development of large language models (LLMs), offer promising solutions for NLP tasks such as classification, text completion, and Named Entity Recognition (NER) through prompting techniques. This research therefore explores prompting techniques with LLMs, assesses their performance, and proposes enhancements to improve the efficiency of the annotation process in NER. Our approach automates scholarly text annotation using a prompting strategy, evaluates performance with models such as Mistral and LLaMA variants, and proposes tuning methods to achieve optimal results. As our initial experimental setup, we implement and test the approach on three datasets, demonstrating the potential of LLMs for fine-grained scholarly named entity annotation.
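As a rough illustration of prompt-based annotation in general, and not of the specific strategy used in this work, the sketch below builds an annotation prompt for one sentence and filters a model's JSON reply. The label set, the prompt wording, the example sentence, and the function names are all hypothetical. The call to an actual model is deliberately left out, so a hand-written reply stands in for the model output.

```python
import json

# Hypothetical fine-grained label set, chosen only for this example.
LABELS = ["Dataset", "Method", "Task"]


def build_prompt(sentence, labels=LABELS):
    """Build a zero-shot annotation prompt for a single sentence."""
    return (
        "Annotate the named entities in the sentence below.\n"
        f"Allowed labels: {', '.join(labels)}.\n"
        'Answer with a JSON list of objects of the form '
        '{"text": ..., "label": ...}.\n'
        f"Sentence: {sentence}"
    )


def parse_annotations(response_text, sentence, labels=LABELS):
    """Keep only well-formed annotations whose text occurs in the sentence."""
    try:
        items = json.loads(response_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []

    kept = []
    for item in items:
        if not isinstance(item, dict):
            continue
        text, label = item.get("text"), item.get("label")
        # Drop spans the model invented and labels outside the schema.
        if isinstance(text, str) and text and text in sentence and label in labels:
            kept.append((text, label))
    return kept


example_sentence = "We fine-tune BERT on CoNLL-2003 for named entity recognition."
example_prompt = build_prompt(example_sentence)

# Stand-in for a model reply; no model is called in this sketch.
example_reply = (
    '[{"text": "BERT", "label": "Method"},'
    ' {"text": "ImageNet", "label": "Dataset"}]'
)
annotations = parse_annotations(example_reply, example_sentence)
```

Checking that each returned span really occurs in the sentence is one simple guard against hallucinated entities before the annotations are compared with expert labels.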

The outcome of the colloquium: I would like to discuss a paper and, more generally, would welcome feedback on what a frame for my dissertation could be.

Target audience: Computer scientists, computational linguists, and researchers from related fields

Title: Advancing Healthcare through Conversational AI and Natural Language Processing: Enhancing Patient-Healthcare Provider Interactions

The increasing integration of Conversational AI (CAI) and Natural Language Processing (NLP) in healthcare offers significant potential to improve patient outcomes, streamline healthcare provider (HP) workflows, and enhance the overall efficiency of healthcare systems. My research focuses on the intersection of CAI and healthcare, specifically investigating the dynamics of patient-HP interactions facilitated by AI-driven chatbots, virtual assistants, and NLP applications.

Research Topic: This colloquium will delve into the development and implementation of CAI technologies in healthcare settings, examining their impact on communication between patients and HPs. We aim to explore how these technologies can be optimized to support clinical decision-making, patient engagement, screening and monitoring, treatment plans, and overall interactions between patients and HPs.

Data Types & Methods: Our methodological approach combines qualitative and quantitative techniques. The research primarily involves qualitative data from patient and HP interviews and interaction logs from AI-based conversational agents. Additional data include quantitative metrics such as user engagement rates, response accuracy, HP workload measures, and patient health outcomes. Although no data have been collected thus far, we plan to discuss data collection strategies and identify the most useful data types. User-centered design principles guide the development of CAI systems, ensuring they meet the needs of both patients and HPs.

Use Cases, Applications, Motivation: Key use cases include CAI, NLP, and virtual assistants for screening, chronic disease management, and decision-making. The motivation behind this research is to enhance the accessibility and quality of healthcare services, reduce HP workload, and empower patients in managing their health. We aim to evaluate how CAI should be implemented and positioned in the patient-HP relationship to benefit both parties.

Desired Outcome: The colloquium aims to foster a collaborative discussion on the current challenges and future directions of CAI and NLP in healthcare. We seek to gather insights from the audience to refine our research approach, identify additional use cases, and explore potential collaborations. The ultimate goal is to develop a comprehensive roadmap for implementing CAI in healthcare that is both effective and ethically sound.

Preferred Target Audience: We invite healthcare professionals, AI researchers, data scientists, and policymakers to participate in this colloquium. Their diverse perspectives will be invaluable in shaping the future of AI-driven healthcare solutions and ensuring they address real-world needs and challenges effectively.

Title: Collecting Web Data for the Social Sciences

Abstract: The GESIS Web Data for the Social Sciences service acts as an umbrella for different activities around collecting digital behavioral data from the Web, especially from social media platforms. It serves as an entry point to long-term samples from specific platforms (such as Twitter/X, Telegram and 4Chan) and to additional data offers specifically prepared to enable research on current topics of societal relevance or on acute events. TweetsKB is an example of a data offer created by the service, based on a continuous crawl of a 1% Twitter sample. We are currently working on implementing new data collections and offers based on data from Telegram and 4Chan. In this colloquium, after a detailed introduction to the current state of the service, we would like to understand the needs of social and political scientists and scientists of related disciplines in order to better shape existing and planned data offers.

Data types & methods: Often, raw data from social media platforms such as Telegram and 4Chan is not directly useful to researchers or cannot be shared due to the platforms’ Terms of Service (ToS), the General Data Protection Regulation (GDPR), or ethical considerations. Further, the raw data might not be in a suitable format and may exhibit or lack relevant characteristics such as user demographics, geolocations, entities, topics, and sentiments. In this colloquium, we would like to understand which methods researchers often use in their work to infer or disguise (e.g., for anonymization) such characteristics, with the goal of applying them at scale to raw data and creating enriched data offers that are easy to use and can be freely shared.
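As one possible example of a "disguising" step, the sketch below replaces user identifiers with keyed hashes (HMAC-SHA256 from the Python standard library) before records are processed further. This is an illustration only and not a description of the service's pipeline: the field name `user_id`, the record layout, and the key are assumptions. Keyed hashing is pseudonymization rather than full anonymization, since post texts and other fields can still identify a person.

```python
import hashlib
import hmac


def pseudonymize(user_id, secret_key):
    """Map a user ID to a stable pseudonym using HMAC-SHA256."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def pseudonymize_posts(posts, secret_key, id_field="user_id"):
    """Return copies of the posts with the ID field replaced by a pseudonym."""
    return [
        {**post, id_field: pseudonymize(str(post[id_field]), secret_key)}
        for post in posts
    ]


# Invented records and key, for illustration only.
example_key = b"replace-with-a-secret-key"
example_posts = [
    {"user_id": "alice", "text": "first post"},
    {"user_id": "bob", "text": "a reply"},
    {"user_id": "alice", "text": "second post"},
]
shared_posts = pseudonymize_posts(example_posts, example_key)
```

Because the same ID always maps to the same pseudonym under one key, posts by the same account remain linkable for analysis, while the original IDs cannot be recomputed without the key.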

Use cases & applications: In this colloquium, we would like to identify hot topics, methods, and research questions for which scientists require Telegram and 4Chan data, as well as potential hurdles they face when collecting and working with these data. Hurdles might include missing links from Telegram and 4Chan posts/topics to existing survey programs such as the German General Social Survey (ALLBUS) or the European Values Study (EVS).

Desired outcome: 

  • Feedback on existing data collections and offers, including data types and formats as well as relevant characteristics and features inferred from raw data,
  • Identification of relevant datasets/surveys to link Telegram and 4Chan data offers to,
  • Suggestions on ways to address the target group to collect web data requirements,
  • Revision of the existing survey for web data requirements collection,
  • Suggestions for relevant communities and ways to connect to them (e.g., mailing lists, working groups, conferences).

Preferred target audience: The primary target audience comprises social, political, and communication scientists interested in or already working with digital behavioral data from Twitter, Telegram or 4Chan. However, we also welcome researchers from other disciplines, e.g., economics and computer science, who are interested in collecting and working with digital behavioral data.
