2023 is likely to be remembered as a year dominated by discussions about Artificial Intelligence (AI) and Large Language Models (LLMs). These technologies require data to be collected and used in unprecedented quantities. Large sets of language data are often owned by stakeholders who are not necessarily involved in the development of such technologies; to use these sets for AI and LLMs, it is essential to repackage and repurpose them for this endeavor. Language data, despite their intangible nature, are often subject to legal constraints that need to be addressed in order to guarantee lawful access to and re-use of these data. In recent years, considerable efforts have been made to adapt legal frameworks to advancements in technology while taking into account the interests of various stakeholders. From the technological perspective, strict consideration of legal aspects raises further questions beyond pure recording technology and participant consent. Key questions include:
What is the Intellectual Property status of large language data sets, the corresponding Large Language Models, and their potential outputs? How can identifying information used in deep learning be removed or anonymized (and is this mandatory)? How reliable are predictions and models based on anonymized data? What impact does this have on usability and computational costs?
The purpose of this full-day workshop is to build bridges between technology and legal frameworks, and to discuss current legal and ethical issues in the human language technology sector.
Extended abstracts of 1500-2000 words are required for initial submission (due by 04 March 2024). Accepted full papers will be published as workshop proceedings alongside the LREC-COLING main conference; for these, the formatting instructions of the main conference must be followed. Submit via Softconf.
Speech and language technology is a special kind of AI that easily transcends borders through online engagement. What are the challenges, risks, and opportunities for AI regulation in this domain? In light of recent efforts to address AI safety challenges through regulation, some areas of practice - including speech and language technology - had been thinking about this topic for years before the current unprecedented attention toward AI from governments and experts from other disciplines. This is due in part to speech and language experts having fundamental insight into how the technologies work, how they affect downstream users, how technical solutions are created, and what their limitations are. In this talk, I will explore how AI regulation in the United Kingdom has been evolving since the UK government first announced its "pro-innovation approach" to AI regulation in 2022 and hosted the high-profile international AI Safety Summit at Bletchley Park in 2023.
Dr Jennifer Williams is an Assistant Professor at the University of Southampton. Her work explores the creation of trustworthy, private, and secure speech/audio solutions. Dr Williams leads several large interdisciplinary projects through the UKRI Trustworthy Autonomous Systems Hub (TAS Hub), including voice anonymisation, trustworthy audio, speech paralinguistics for medical applications, and AI regulation. She also leads a Responsible AI UK International Partnership with the UK, US, and Australia on "AI Regulation Assurance for Safety-Critical Systems" across sectors. She completed her PhD at the University of Edinburgh on representation learning for speech signal disentanglement, showing that this approach is valuable for a variety of speech technology applications (voice conversion, speech synthesis, anti-spoofing, naturalness assessment, and privacy). Before that, she was a technical staff member at MIT Lincoln Laboratory for five years, where she developed rapid prototyping solutions for text and speech technology. She is Chair of the ISCA special interest group on Security and Privacy in Speech Communication (SPSC-SIG) and an affiliate member of the NIST-OSAC subcommittee on speaker recognition for forensic science.
Deadline for submission of extended abstracts
Notification of acceptance
Submission of final version of accepted papers
Workshop Day
09:00 - 09:15 | Opening Session: Welcome by Workshop Chairs
09:15 - 09:30 | Participant and Organizer Introduction
09:30 - 10:30 | Invited Talk: AI Regulation Perspectives from the UK (Jennifer Williams, University of Southampton)
10:30 - 11:00 | Coffee Break
11:00 - 13:00 | Session I: Legal Frameworks and Ethical Considerations
Compliance by Design Methodologies in the Legal Governance Schemes of European Data Spaces (Kossay Talmoudi, Khalid Choukri and Isabelle Gavanon)
A Legal Framework for Natural Language Model Training in Portugal (Ruben Almeida and Evelin Amorim)
Intellectual property rights at the training, development and generation stages of Large Language Models (Christin Kirchhübel and Georgina Brown)
Ethical Issues in Language Resources and Language Technology – New Challenges, New Perspectives (Pawel Kamocki and Andreas Witt)
13:00 - 14:00 | Lunch Break
14:00 - 16:00 | Session II: Considerations and Implications of AI
Legal and Ethical Considerations that Hinder the Use of LLMs in a Finnish Institution of Higher Education (Mika Hämäläinen)
Implications of Regulations on Large Generative AI Models in the Super-Election Year and the Impact on Disinformation (Vera Schmitt, Jakob Tesch, Eva Lopez, Tim Polzehl, Aljoscha Burchardt, Konstanze Neumann, Salar Mohtaj and Sebastian Möller)
Selling Personal Information: Data Brokers and the Limits of US Regulation (Denise DiPersio)
What can I do with this data point? Towards modeling legal and ethical aspects of linguistic data collection and (re)use as a process (Annett Jorschick, Paul T. Schrader and Hendrik Buschmeier)
16:00 - 16:30 | Coffee Break
16:30 - 17:50 | Session III: Applications and User Perspective
Data-Envelopes for Cultural Heritage: Going beyond Datasheets (Maria Eskevich and Mrinalini Luthra)
Emotional Toll and Coping Strategies: Navigating the Effects of Annotating Hate Speech Data (Maryam M. AlEmadi and Wajdi Zaghouani)
User Perspective on Anonymity in Voice Assistants – A comparison between Germany and Finland (Ingo Siegert, Silas Rech, Matthias Haase and Tom Bäckström)
17:50 - 18:00 | Wrap-Up of the Workshop and Closing Ceremony
18:00 | After-workshop on-site gathering
Ingo Siegert, Otto-von-Guericke-Universität Magdeburg, Germany
Khalid Choukri, ELRA/ELDA, France
Pawel Kamocki, IDS Mannheim, Germany
Khalid Choukri
Mickaël Rigault
Claudia Cevenini
Erik Ketzan
Prodromos Tsiavos
Andreas Witt
Paweł Kamocki
Kim Nayyer
Krister Lindén
Ingo Siegert
Tom Bäckström
Nicholas Evans
Catherine Jasserand
Isabel Trancoso