This section collects publications, public deliverables, and talks and presentations showcasing HACID achievements.
Publications
Making the wisdom of crowds efficient — with confidence
Title:Making the wisdom of crowds efficient — with confidence
Author:Julian Berger, Mehdi Moussaid, Ralph Hertwig, Stefan Herzog, Ralf Kurvers
Publication:PsyArXiv Preprint
Date:12.11.2024
Efficiently allocating individuals to work on complex decision problems is a key challenge for groups, organizations, and societies. It involves a crucial trade-off: Increasing the number of individuals working on a task typically improves accuracy, but also increases costs. Research in collective intelligence has proposed a plethora of mechanisms to pool the judgments of independent decision makers in order to improve performance. However, these mechanisms are static; because they do not adjust the number of crowd members to the challenge at hand, they incur high, fixed costs for every decision problem. We develop and test three decision rules that make it possible to benefit from the wisdom of the crowd adaptively depending on a case’s difficulty. Our rules rely on decision makers’ confidence judgments to stop crowd growth. Empirical analyses in four real-world domains (cancer diagnoses, false news classification, criminology, and forecasting) using seven datasets show that our adaptive decision rules can result in equal or higher accuracy compared to widely used static crowd aggregators, while relying on fewer individuals. Our findings present easily applicable practical decision guidelines that can substantially boost the efficiency of crowds.
Ontogenia: Ontology Generation with Metacognitive Prompting in Large Language Models
Title:Ontogenia: Ontology Generation with Metacognitive Prompting in Large Language Models
Author:Anna Sofia Lippolis, Miguel Ceriani, Sara Zuppiroli, Andrea Giovanni Nuzzolese
Publication:The Semantic Web: ESWC 2024 Satellite Events. ESWC 2024. Lecture Notes in Computer Science, vol 15344. Springer, Cham
Date:28.01.2025
Recent advancements in Large Language Models (LLMs) have primarily focused on enhancing task-specific performances by experimenting with prompt design. Despite the proven effectiveness of Metacognitive Prompting (MP), its application in the field of ontology generation remains an uncharted territory. This study addresses this gap by exploring this prompting technique in supporting the ontology design process, particularly with GPT-4, where this strategy has demonstrated consistent superiority over conventional and more direct prompting methods in recent research. Our methodology, named Ontogenia, employs a gold-standard dataset of ontology competency questions translated into SPARQL-OWL queries. This approach allows us to explore various types and stages of knowledge refinement using MP, while adhering to the eXtreme Design methodology, a well-established protocol in ontology design. Finally, the quality and performance of the resulting ontologies are assessed using both standard ontology quality metrics and evaluation by an ontology expert. This research aims to enrich the discussion on methods of ontology generation driven by LLMs by presenting concrete results on the use of metacognitive prompting and ontology design patterns.
Large Language Models Assisting Ontology Evaluation
Title:Large Language Models Assisting Ontology Evaluation
Author:Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskisärkkä, Aldo Gangemi, Eva Blomqvist, Andrea Giovanni Nuzzolese
Publication:Proceedings of the International Semantic Web Conference 2025, The Semantic Web – ISWC 2025. Lecture Notes in Computer Science, vol 16140. Springer, Cham
Date:29.10.2025
Ontology evaluation through functional requirements—such as testing via competency question (CQ) verification—is a well-established yet costly, labour-intensive, and error-prone endeavour, even for ontology engineering experts. In this work, we introduce OE-Assist, a novel framework designed to assist ontology evaluation through automated and semi-automated CQ verification. By presenting and leveraging a dataset of 1,393 CQs paired with corresponding ontologies and ontology stories, our contributions present, to our knowledge, the first systematic investigation into large language model (LLM)-assisted ontology evaluation, and include: (i) evaluating the effectiveness of an LLM-based approach for automatically performing CQ verification against a manually created gold standard, and (ii) developing and assessing an LLM-powered framework to assist CQ verification with Protégé, by providing suggestions. We found that automated LLM-based evaluation with o1-preview and o3-mini performs at a similar level to the average user’s performance. Through a user study on the framework with 19 knowledge engineers from eight international institutions, we also show that LLMs can assist manual CQ verification and improve user accuracy, especially when suggestions are correct. Additionally, participants reported a marked decrease in perceived task difficulty. However, we also observed a reduction in human performance when the LLM provided incorrect guidance, showing a critical trade-off between efficiency and accuracy in assisted ontology evaluation.
Boosting collective intelligence in medical diagnostics: Leveraging decision similarity as a predictor of accuracy when answers are open-ended rankings
Title:Boosting collective intelligence in medical diagnostics: Leveraging decision similarity as a predictor of accuracy when answers are open-ended rankings
Author:Nikolas Zöller, Stefan M. Herzog, Ralf H.J.M. Kurvers
Publication:HCOMP-CI 2023 Works-in-Progress and Demonstrations
Date:09.06.2023
Leveraging collective intelligence and combining several decisions into a single one can outperform individual judgments in many domains. Additionally, distinguishing between high-performing and low-performing individuals can further boost accuracy and is especially important in high-stakes contexts. In binary decision problems, it has been shown that decision similarity to others is a predictor of accuracy and can be used to identify high performers even if no actual track record of performance is available. Here we apply and generalize this approach to open-ended medical diagnostics, where diagnoses are given as free text and as incomplete rankings of varying length. We show that selecting decision makers based on prior decision similarity to others increases the average accuracy of both individual and collective diagnoses.
Assessing the Capability of Large Language Models for Domain-Specific Ontology Generation
Title:Assessing the Capability of Large Language Models for Domain-Specific Ontology Generation
Author:Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskisarkka, Aldo Gangemi, Eva Blomqvist, Andrea Giovanni Nuzzolese
Publication:ESWC 2025 Workshops and Tutorials Joint Proceedings
Date:16.06.2025
Large Language Models (LLMs) have shown significant potential for ontology engineering. However, it is still unclear to what extent they are applicable to the task of domain-specific ontology generation. In this study, we explore the application of LLMs for automated ontology generation and evaluate their performance across different domains. Specifically, we investigate the generalizability of two state-of-the-art LLMs, DeepSeek and o1-preview, both equipped with reasoning capabilities, by generating ontologies from a set of competency questions (CQs) and related user stories. Our experimental setup comprises six distinct domains drawn from existing ontology engineering projects and a total of 95 curated CQs designed to test the models’ reasoning for ontology engineering. Our findings show that with both LLMs, the performance of the experiments is remarkably consistent across all domains, indicating that these methods are capable of generalizing ontology generation tasks irrespective of the domain. These results highlight the potential of LLM-based approaches in achieving scalable and domain-agnostic ontology construction and lay the groundwork for further research into enhancing automated reasoning and knowledge representation techniques.
Collective Intelligence Increases Diagnostic Accuracy in a General Practice Setting
Title:Collective Intelligence Increases Diagnostic Accuracy in a General Practice Setting
Author:Matthew D Blanchard, Stefan M Herzog, Juliane E Kämmer, Nikolas Zöller, Olga Kostopoulou, Ralf H J M Kurvers
Publication:Medical Decision Making 44(4)
Date:12.04.2024
Background: General practitioners (GPs) work in an ill-defined environment where diagnostic errors are prevalent. Previous research indicates that aggregating independent diagnoses can improve diagnostic accuracy in a range of settings. We examined whether aggregating independent diagnoses can also improve diagnostic accuracy for GP decision making. In addition, we investigated the potential benefit of such an approach in combination with a decision support system (DSS).
Methods: We simulated virtual groups using data sets from 2 previously published studies. In study 1, 260 GPs independently diagnosed 9 patient cases in a vignette-based study. In study 2, 30 GPs independently diagnosed 12 patient actors in a patient-facing study. In both data sets, GPs provided diagnoses in a control condition and/or DSS condition(s). Each GP’s diagnosis, confidence rating, and years of experience were entered into a computer simulation. Virtual groups of varying sizes (range: 3-9) were created, and different collective intelligence rules (plurality, confidence, and seniority) were applied to determine each group’s final diagnosis. Diagnostic accuracy was used as the performance measure.
Results: Aggregating independent diagnoses by weighing them equally (i.e., the plurality rule) substantially outperformed average individual accuracy, and this effect increased with increasing group size. Selecting diagnoses based on confidence only led to marginal improvements, while selecting based on seniority reduced accuracy. Combining the plurality rule with a DSS further boosted performance.
Discussion: Combining independent diagnoses may substantially improve a GP’s diagnostic accuracy and subsequent patient outcomes. However, this approach did not improve accuracy in all patient cases. Therefore, future work should focus on uncovering the conditions under which collective intelligence is most beneficial in general practice.
Highlights: We examined whether aggregating independent diagnoses of GPs can improve diagnostic accuracy. Using data sets of 2 previously published studies, we composed virtual groups of GPs and combined their independent diagnoses using 3 collective intelligence rules (plurality, confidence, and seniority). Aggregating independent diagnoses by weighing them equally substantially outperformed average individual GP accuracy, and this effect increased with increasing group size. Combining independent diagnoses may substantially improve a GP’s diagnostic accuracy and subsequent patient outcomes.
Human–AI collectives most accurately diagnose clinical vignettes
Title:Human–AI collectives most accurately diagnose clinical vignettes
Author:Nikolas Zöller, Julian Berger, Irving Lin, Nathan Fu, Jayanth Komarneni, Gioele Barabucci, Kyle Laskowski, Victor Shia, Benjamin Harack, Eugene A. Chu, Vito Trianni, Ralf H. J. M. Kurvers, Stefan M. Herzog
Publication:Proceedings of the National Academy of Sciences, volume 122
Date:13.06.2025
AI systems, particularly large language models (LLMs), are increasingly being employed in high-stakes decisions that impact both individuals and society at large, often without adequate safeguards to ensure safety, quality, and equity. Yet LLMs hallucinate, lack common sense, and are biased—shortcomings that may reflect LLMs’ inherent limitations and thus may not be remedied by more sophisticated architectures, more data, or more human feedback. Relying solely on LLMs for complex, high-stakes decisions is therefore problematic. Here, we present a hybrid collective intelligence system that mitigates these risks by leveraging the complementary strengths of human experience and the vast information processed by LLMs. We apply our method to open-ended medical diagnostics, combining 40,762 differential diagnoses made by physicians with the diagnoses of five state-of-the-art LLMs across 2,133 text-based medical case vignettes. We show that hybrid collectives of physicians and LLMs outperform both single physicians and physician collectives, as well as single LLMs and LLM ensembles. This result holds across a range of medical specialties and professional experience and can be attributed to humans’ and LLMs’ complementary contributions that lead to different kinds of errors. Our approach highlights the potential for collective human and machine intelligence to improve accuracy in complex, open-ended domains like medical diagnostics.
How large language models can reshape collective intelligence
Title:How large language models can reshape collective intelligence
Author:Jason W. Burton, Ezequiel Lopez-Lopez, Shahar Hechtlinger, Zoe Rahwan, Samuel Aeschbach, Michiel A. Bakker, Joshua A. Becker, Aleks Berditchevskaia, Julian Berger, Levin Brinkmann, Lucie Flek, Stefan M. Herzog, Saffron Huang, Sayash Kapoor, Arvind Narayanan, Anne-Marie Nussberger, Taha Yasseri, Pietro Nickl, Abdullah Almaatouq, Ulrike Hahn, Ralf H. J. M. Kurvers, Susan Leavy, Iyad Rahwan, Divya Siddarth, Alice Siu, Anita W. Woolley, Dirk U. Wulff, Ralph Hertwig
Publication:Nature Human Behaviour
Date:20.09.2024
Collective intelligence underpins the success of groups, organizations, markets and societies. Through distributed cognition and coordination, collectives can achieve outcomes that exceed the capabilities of individuals—even experts—resulting in improved accuracy and novel capabilities. Often, collective intelligence is supported by information technology, such as online prediction markets that elicit the ‘wisdom of crowds’, online forums that structure collective deliberation or digital platforms that crowdsource knowledge from the public. Large language models, however, are transforming how information is aggregated, accessed and transmitted online. Here we focus on the unique opportunities and challenges this transformation poses for collective intelligence. We bring together interdisciplinary perspectives from industry and academia to identify potential benefits, risks, policy-relevant considerations and open research questions, culminating in a call for a closer examination of how large language models affect humans’ ability to collectively tackle complex problems.
Logic Augmented Generation
Title:Logic Augmented Generation
Author:Aldo Gangemi, Andrea Giovanni Nuzzolese
Publication:Journal of Web Semantics, 85:100859, 2025
Date:01.05.2025
Semantic Knowledge Graphs (SKG) face challenges with scalability, flexibility, contextual understanding, and handling unstructured or ambiguous information. However, they offer formal and structured knowledge enabling highly interpretable and reliable results by means of reasoning and querying. Large Language Models (LLMs) may overcome those limitations, making them suitable in open-ended tasks and unstructured environments. Nevertheless, LLMs are hardly interpretable and often unreliable. To take the best out of LLMs and SKGs, we envision Logic Augmented Generation (LAG) to combine the benefits of the two worlds. LAG uses LLMs as Reactive Continuous Knowledge Graphs that can generate potentially infinite relations and tacit knowledge on-demand. LAG uses SKGs to inject a discrete heuristic dimension with clear logical and factual boundaries. We exemplify LAG in two tasks of collective intelligence, i.e., medical diagnostics and climate projections. Understanding the properties and limitations of LAG, which are still mostly unknown, is of utmost importance for enabling a variety of tasks involving tacit knowledge in order to provide interpretable and effective results.
Ontology Generation Using Large Language Models
Title:Ontology Generation Using Large Language Models
Author:Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskisärkkä, Sara Zuppiroli, Miguel Ceriani, Aldo Gangemi, Eva Blomqvist, Andrea Giovanni Nuzzolese
Publication:The Semantic Web. ESWC 2025. Lecture Notes in Computer Science, vol 15718. Springer, Cham.
Date:01.06.2025
The ontology engineering process is complex, time-consuming, and error-prone, even for experienced ontology engineers. In this work, we investigate the potential of Large Language Models (LLMs) to provide effective OWL ontology drafts directly from ontological requirements described using user stories and competency questions. Our main contribution is the presentation and evaluation of two new prompting techniques for automated ontology development: Memoryless CQbyCQ and Ontogenia. We also emphasize the importance of three structural criteria for ontology assessment, alongside expert qualitative evaluation, highlighting the need for a multi-dimensional evaluation in order to capture the quality and usability of the generated ontologies. Our experiments, conducted on a benchmark dataset of ten ontologies with 100 distinct Competency Questions (CQs) and 29 different user stories, compare the performance of three LLMs using the two prompting techniques. The results demonstrate improvements over the current state-of-the-art in LLM-supported ontology engineering. More specifically, the model OpenAI o1-preview with Ontogenia produces ontologies of sufficient quality to meet the requirements of ontology engineers, significantly outperforming novice ontology engineers in modelling ability. However, we still note some common mistakes and variability of result quality, which is important to take into account when using LLMs for ontology authoring support. We discuss these limitations and propose directions for future research.
Hybrid Collective Intelligence for Decision Support in Complex Open-Ended Domains
Title:Hybrid Collective Intelligence for Decision Support in Complex Open-Ended Domains
Author:Vito Trianni, Andrea Giovanni Nuzzolese, Jaron Porciello, Ralf H. J. M. Kurvers, Stefan M. Herzog, Gioele Barabucci, Aleksandra Berditchevskaia, Fai Fung
Publication:Volume 368 of Frontiers in Artificial Intelligence and Applications. HHAI 2023: Augmenting Human Intellect, pages 124-137, IOS Press
Date:26.06.2023
Human knowledge is growing exponentially, providing huge and sometimes contrasting evidence to support decision making in the realm of complex problems. To fight knowledge fragmentation, collective intelligence leverages groups of experts (possibly from diverse domains) that jointly provide solutions. However, to promote beneficial outcomes and avoid herding, it is necessary to (i) elicit diverse responses and (ii) suitably aggregate them in a collective solution. To this end, AI can help with dealing with large knowledge bases, as well as with reasoning on expert-provided knowledge to support decision-making. A hybrid human-artificial collective intelligence can leverage the complementarity of expert knowledge and machine processing to deal with complex problems. We discuss how such a hybrid human-artificial collective intelligence can be deployed to support decision processes, and we present case studies in two different domains: general medical diagnostics and climate change adaptation management.
Automating hybrid collective intelligence in open-ended medical diagnostics
Title:Automating hybrid collective intelligence in open-ended medical diagnostics
Author:Ralf H. J. M. Kurvers, Andrea Giovanni Nuzzolese, Alessandro Russo, Gioele Barabucci, Stefan M. Herzog, and Vito Trianni
Publication:PNAS 120(34):e2221473120
Date:28.08.2023
In the United States, an estimated 250,000 people die annually from preventable medical errors, many of which originate during the diagnostic process. A powerful approach to increase diagnostic accuracy is to combine the diagnoses of multiple diagnosticians. However, we lack methods to aggregate independent diagnoses in general medical diagnostics. Using knowledge engineering methods, we introduce a fully automated solution to this problem. We tested our solution on 1,333 medical cases, each of which was independently diagnosed by ten diagnosticians. Our solution substantially increases diagnostic accuracy: Single diagnosticians achieved 46% accuracy, pooling the decisions of ten diagnosticians increased this to 76%. These results demonstrate that collective intelligence can reduce diagnostic errors, promoting health services and trust in the global medical community.
Public deliverables
D8.5 — Annual report on dissemination and outreach activities – Final
Title:D8.5 — Annual report on dissemination and outreach activities – Final
Author:Nesta
Publication:HACID web
Date:25.02.2026
D4.2 — Social feedback in hybrid collective problem solving
Title:D4.2 — Social feedback in hybrid collective problem solving
Author:MPG
Publication:HACID web
Date:26.02.2026
D7.2 — Demonstration of the HACID-DSS for climate services
Title:D7.2 — Demonstration of the HACID-DSS for climate services
Author:Met Office
Publication:HACID web
Date:27.02.2026
D7.1 — Requirements from climate services
Title:D7.1 — Requirements from climate services
Author:Met Office
Publication:HACID web
Date:30.05.2023
D8.6 — Project catalogue
Title:D8.6 — Project catalogue
Author:NESTA
Publication:HACID web
Date:29.08.2025
D8.8 — Final Exploitation Plan
Title:D8.8 — Final Exploitation Plan
Author:CNR
Publication:HACID web
Date:12.02.2026
D8.7 — Exploitation Plan
Title:D8.7 — Exploitation Plan
Author:CNR
Publication:HACID web
Date:17.04.2024
D8.4 — Annual report on dissemination and outreach activities Year 2
Title:D8.4 — Annual report on dissemination and outreach activities Year 2
Author:CNR
Publication:HACID web
Date:31.08.2024
D8.3 — Annual report on dissemination and outreach activities Year 1
Title:D8.3 — Annual report on dissemination and outreach activities Year 1
Author:CNR
Publication:HACID web
Date:31.08.2023
D8.2 — Communication and Dissemination Plan
Title:D8.2 — Communication and Dissemination Plan
Author:CNR
Publication:HACID web
Date:20.12.2022
D8.1 — Project website and social media channels
Title:D8.1 — Project website and social media channels
Author:CNR
Publication:HACID web
Date:30.09.2022
D5.2 — Guidelines for participatory AI
Title:D5.2 — Guidelines for participatory AI
Author:NESTA
Publication:HACID web
Date:29.08.2025
D5.1 — Evaluation workflow and KPIs
Title:D5.1 — Evaluation workflow and KPIs
Author:NESTA
Publication:HACID web
Date:27.02.2025
D4.1 — Aggregation methods for collective solutions
Title:D4.1 — Aggregation methods for collective solutions
Author:MPG
Publication:HACID web
Date:30.05.2025
D3.2 — Dashboards for use cases
Title:D3.2 — Dashboards for use cases
Author:CNR
Publication:HACID web
Date:
D3.1 — Methods and tools for case knowledge refinement
Title:D3.1 — Methods and tools for case knowledge refinement
Author:MPG
Publication:HACID web
Date:29.02.2024
D2.2 — Domain knowledge graph instantiation and evaluation
Title:D2.2 — Domain knowledge graph instantiation and evaluation
Author:CNR
Publication:HACID web
Date:31.01.2025
D2.1 — Top-down and bottom-up approaches to domain knowledge engineering
Title:D2.1 — Top-down and bottom-up approaches to domain knowledge engineering
Author:CNR
Publication:HACID web
Date:07.03.2024
D1.9 — Ethics Check Report
Title:D1.9 — Ethics Check Report
Author:CNR
Publication:HACID web
Date:29.11.2022
D1.8 — Quality Assessment Report Year 3
Title:D1.8 — Quality Assessment Report Year 3
Author:CNR
Publication:HACID web
Date:02.09.2025
D1.7 — Quality Assessment Report Year 2
Title:D1.7 — Quality Assessment Report Year 2
Author:CNR
Publication:HACID web
Date:30.08.2024
D1.6 — Quality Assessment Report Year 1
Title:D1.6 — Quality Assessment Report Year 1
Author:CNR
Publication:HACID web
Date:31.08.2023
D1.5 — Quality Management Plan
Title:D1.5 — Quality Management Plan
Author:CNR
Publication:HACID web
Date:03.11.2022
D1.4 — Data management plan – mid term revision
Title:D1.4 — Data management plan – mid term revision
Author:MPG
Publication:HACID web
Date:21.03.2024
D1.3 — Data Management Plan
Title:D1.3 — Data Management Plan
Author:CNR
Publication:HACID web
Date:08.03.2023
D1.2 — IPR Management Plan – Mid-Term Review
Title:D1.2 — IPR Management Plan – Mid-Term Review
Author:CNR
Publication:HACID web
Date:16.02.2024
D1.1 — IPR Management Plan
Title:D1.1 — IPR Management Plan
Author:CNR
Publication:HACID web
Date:16.02.2023
Talks and presentations

Leveraging graph-structured knowledge to enhance Collective Intelligence and improve diagnostic accuracy
AI for Medicine and Healthcare in the context of the Italian National Conference on Artificial Intelligence (Ital-IA 2025)

HACID Knowledge Graph for Climate Services
AI & robotics at work: Innovations driving productivity, organized by AI on Demand

Harnessing the power of human and artificial intelligence
UKCP Science and Services network – HACID and the UKCP chatbot

Hybrid Collective Intelligence for Medical Diagnostics
Second Annual Meeting of the Italian Society for AI in Medicine (SIIAM)

Harnessing AI and Data for Enhanced Decision Support
Evaluation challenges for hybrid systems – interactive workshop as part of ADRF 2024.

Learning from expert elicitation for climate decision-making: Informing participatory AI in climate services
Presentation as part of the AI&CI Satellite session at CCS

Collective intelligence and the real world
Keynote talk as part of the AI&CI Satellite session at CCS

International Applied Science and Services team meeting
Does AI mean that we do Climate Services differently? Presentation and discussion

London Data Week Webinar
How participation can improve your AI or data project

TeSaCo project closing colloquium – invited speech
Quelle sagesse collective pour les technologies émergentes ? (What collective wisdom for emerging technologies?) Closing colloquium of the TeSaCo study cycle (Technologies émergentes et sagesse collective)

JCEEI AI Climate Services workshop
Joint Centre for Excellence for Environmental Intelligence meeting to discuss potential for using AI for climate services

Data & Society Participatory AI Practitioners Group
Participatory AI and collective intelligence

Uses of ML and AI across the weather & climate science to service value chain
Talk to the Foreign, Commonwealth and Development Office, a UK government department, by Dr Ed Pope from the Met Office

Participatory AI: Can Participation Improve the Design of AI Systems?
Workshop on Participatory AI as part of the AI & Society Forum in London

APRE: From proposal to evaluation
APRE webinar: “Dalla proposta alla valutazione: incontro con coordinatori del Cluster 4” (From proposal to evaluation: a meeting with Cluster 4 coordinators)

Shaping priorities for investment in resilient, inclusive rural transformation (RITI)
Expert Consultation: Presentation of hybrid collective intelligence at the workshop organised at FAO

IDEAL-IST webinars: Tips from a coordinator
Webinar organised by the IDEAL-IST network.
The webinar hosted three coordinators of European projects funded by the Horizon Europe programme under Cluster 4 (Digital, Industry and Space), who shared their experience with proposal preparation and project management. Vito Trianni presented the HACID project and the related challenges.