TWON Consortium Meeting – a recap 

Last week, the TWON consortium came together in Berlin to advance ongoing work on democratic online social networks and to strengthen the project’s dialogue with policy and civil society stakeholders.

Across a series of internal and public sessions, the meeting focused on how platform design and algorithmic choices shape democratic discourse, contribute to polarization, and influence the spread of disinformation. A dedicated Policy Hackathon provided space for consortium members and external experts to explore regulatory challenges and identify priorities for future policy-oriented work.

In addition, TWON hosted a public dissemination event at Publix, bringing together participants from politics, academia, and civil society to discuss responsible platform design and the broader implications of the project’s research.

The week also included TWON’s General Assembly, where the consortium reflected on progress and lessons learned and discussed how the project’s insights can support future research and collaboration beyond TWON.

The programme concluded with a visit to the German Chancellery and further exchanges with colleagues from the policy sphere, highlighting the importance of connecting research and governance perspectives in the field of online platforms.

The TWON consortium thanks all contributors for their engagement, constructive discussions, and continued collaboration.

Research on Open Source LLM Safety at HICSS 2026

From January 6-9, 2026, TWON researcher Simon Münker presented his paper at the Hawaii International Conference on System Sciences (HICSS), one of the leading international conferences in the field of information systems and digital innovation.

The paper addresses societal risks associated with open source Large Language Models (LLMs) and evaluates the effectiveness of existing safety and guardrail mechanisms. Together with his co-author Fabio Sartori, Simon Münker received the Best Paper Award for this research.

The study systematically examines guardrail vulnerabilities across seven widely used open source LLMs. Using natural language processing classifiers, it identifies recurring patterns of harmful content generation under adversarial prompting. These vulnerabilities were first observed during earlier research activities within the TWON project, where initial experiments revealed persistent weaknesses in model safety mechanisms.
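The evaluation loop behind such a study can be pictured as: send adversarial prompts to each model, classify every response, and compare harmful-output rates across models. The sketch below illustrates this shape only; the prompts, model names, and stub classifier are placeholders, not the paper's actual setup (a real study would use a trained classification model rather than a keyword check):

```python
def classify(text: str) -> str:
    """Stub classifier. In a real evaluation this would be a trained
    NLP model labelling responses as 'hateful', 'offensive', or
    'neutral'; here we only flag an obvious marker token."""
    return "offensive" if "[UNSAFE]" in text else "neutral"

def harmful_rate(responses_by_model: dict[str, list[str]]) -> dict[str, float]:
    """Fraction of each model's responses classified as harmful."""
    rates = {}
    for model, responses in responses_by_model.items():
        harmful = sum(classify(r) != "neutral" for r in responses)
        rates[model] = harmful / len(responses) if responses else 0.0
    return rates

# Toy responses standing in for real model outputs under adversarial prompting
responses = {
    "model-a": ["[UNSAFE] ...", "I can't help with that."],
    "model-b": ["I can't help with that.", "I can't help with that."],
}
print(harmful_rate(responses))  # {'model-a': 0.5, 'model-b': 0.0}
```

Aggregating per-model rates like this is what makes cross-model comparisons (e.g. "several prominent models consistently produce harmful content") possible.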

The findings show that several prominent models consistently produce content classified as hateful or offensive. This raises concerns about the potential implications of open source LLMs for democratic discourse and social cohesion. In particular, the results challenge public safety assurances by model developers and point to discrepancies between stated safeguards and observed model behavior.

The research contributes to ongoing discussions on responsible AI development and the governance of AI systems that shape online communication and public discourse. It underlines the need for more robust, transparent, and empirically tested safety mechanisms in open source AI ecosystems.

The paper was presented as part of the Digital Democracy Minitrack at HICSS 2026.