Legal and Ethical Issues in Human Language Technologies

full-day Workshop at LREC-Coling 2024, Turin, Italy, May 20, 2024


About the Workshop

LEGAL 2024

2023 is likely to be remembered as a year dominated by discussions about Artificial Intelligence (AI) and Large Language Models (LLMs). These technologies require data to be collected and used in unprecedented amounts. Large sets of language data are owned by stakeholders who are not necessarily involved in developing such technologies. To use these sets for AI and LLMs, they must be repackaged and repurposed. Language data, despite their intangible nature, are often subject to legal constraints that need to be addressed to guarantee lawful access to and re-use of these data. In recent years, considerable efforts have been made to adapt legal frameworks to technological advances while taking into account the interests of various stakeholders. From the technological perspective, strict attention to legal aspects raises further questions beyond recording technology and participant consent. These questions include:

  • What is the intellectual property status of large language datasets, the corresponding Large Language Models, and their potential outputs?
  • How can identifying information used in deep learning be removed or anonymized, and is this mandatory?
  • How reliable are predictions and models based on anonymized data?
  • What impact does this have on usability and computational costs?

The purpose of this full-day workshop is to build bridges between technology and legal frameworks, and to discuss current legal and ethical issues in the human language technology sector.

What to submit?

Initial submissions should be extended abstracts of 1500-2000 words, due by 04 March 2024. Full papers will be published as workshop proceedings along with those of the LREC-Coling main conference and must follow the main conference's instructions. Submit via Softconf.

Invited Talk: Dr Jennifer Williams (University of Southampton), "AI Regulation Perspectives from the UK"


Speech and language technology is a special kind of AI that easily transcends borders through online engagement. What are the challenges, risks, and opportunities for AI regulation in this domain? In light of recent efforts to address AI safety challenges through regulation, some areas of practice, including speech and language technology, have been thinking about this topic for years prior to the unprecedented attention toward AI from international governments and experts from other disciplines. This is due in part to speech and language experts having a fundamental insight into how the technologies work, how the technologies affect downstream users, how technical solutions are created, and their limitations. In this talk, I will explore how AI regulation in the United Kingdom has evolved since the UK government first announced its "pro-innovation approach" to AI regulation in 2022 and hosted the high-profile international AI Safety Summit at Bletchley Park in 2023.


Dr Jennifer Williams is an Assistant Professor at the University of Southampton. Her work explores the creation of trustworthy, private, and secure speech/audio solutions. Dr Williams leads several large interdisciplinary projects through the UKRI Trustworthy Autonomous Systems Hub (TAS Hub) including voice anonymisation, trustworthy audio, speech paralinguistics for medical applications, and AI regulation. She also leads a Responsible AI UK International Partnership with the UK, US, and Australia on "AI Regulation Assurance for Safety-Critical Systems" across sectors. She completed her PhD at the University of Edinburgh on representation learning for speech signal disentanglement, and showed that this approach is valuable for a variety of speech technology applications (voice conversion, speech synthesis, anti-spoofing, naturalness assessment, and privacy). Before that, she was a technical staff member at MIT Lincoln Laboratory for five years, where she developed rapid prototyping solutions for text and speech technology. She is Chair of the ISCA special interest group on Security and Privacy in Speech Communication (SPSC-SIG) and an affiliate member of the NIST-OSAC subcommittee on speaker recognition for forensic science.

Topics of interest include:

  • Impact of statutory exceptions on text and speech data mining practices in the field of Human Language Technologies.
  • Impact of the regulatory environment at the international level (e.g. EU Data Act, Data Governance Act, Digital Services Act, AI Act; the Chinese “2023 draft rules on generative AI”; the US Blueprint for an AI Bill of Rights; and other international or national regulations) on the circulation and use of language data.
  • Legal issues related to the production and use of Large Language Models (Intellectual Property, Data Governance and Data Protection aspects).
  • Concrete applications as to how language technologies can help resolve legal issues related to data collection, data sharing and data reuse.
  • Ethical considerations related to personal data collection and re-use.
  • Trust and transparency in language and speech technologies.
  • Efficient anonymization techniques, the responsibilities they entail, and their impact on usability and performance.
  • Re-identification issues and de-anonymization approaches and techniques.
  • Harmonizing the differing perspectives of data scientists and legal experts worldwide.

Important Dates


March 04, 2024: Deadline for submission of extended abstracts
March 30, 2024: Notification of acceptance
April 05, 2024: Submission of final version of accepted papers
May 20, 2024: Workshop Day


09:00 - 09:15 Opening Session: Welcome by Workshop Chairs
09:15 - 09:30 Participant and Organizer Introduction
09:30 - 10:30 Invited Talk: AI Regulation Perspectives from the UK (Jennifer Williams, University of Southampton)

10:30 - 11:00 Coffee Break
11:00 - 13:00 Session I: Legal Frameworks and Ethical Considerations
  • Compliance by Design Methodologies in the Legal Governance Schemes of European Data Spaces (Kossay Talmoudi, Khalid Choukri, and Isabelle Gavanon)
  • A Legal Framework for Natural Language Model Training in Portugal (Ruben Almeida and Evelin Amorim)
  • Intellectual property rights at the training, development and generation stages of Large Language Models (Christin Kirchhübel and Georgina Brown)
  • Ethical Issues in Language Resources and Language Technology – New Challenges, New Perspectives (Pawel Kamocki and Andreas Witt)
13:00 - 14:00 Lunch Break
14:00 - 16:00 Session II: Considerations and Implications of AI
  • Legal and Ethical Considerations that Hinder the Use of LLMs in a Finnish Institution of Higher Education (Mika Hämäläinen)
  • Implications of Regulations on Large Generative AI Models in the Super-Election Year and the Impact on Disinformation (Vera Schmitt, Jakob Tesch, Eva Lopez, Tim Polzehl, Aljoscha Burchardt, Konstanze Neumann, Salar Mohtaj and Sebastian Möller)
  • Selling Personal Information: Data Brokers and the Limits of US Regulation (Denise DiPersio)
  • What can I do with this data point? Towards modeling legal and ethical aspects of linguistic data collection and (re)use as a process (Annett Jorschick, Paul T. Schrader and Hendrik Buschmeier)
16:00 - 16:30 Coffee Break
16:30 - 17:50 Session III: Applications and User Perspective
  • Data-Envelopes for Cultural Heritage: Going beyond Datasheets (Maria Eskevich and Mrinalini Luthra)
  • Emotional Toll and Coping Strategies: Navigating the Effects of Annotating Hate Speech Data (Maryam M. AlEmadi and Wajdi Zaghouani)
  • User Perspective on Anonymity in Voice Assistants – A comparison between Germany and Finland (Ingo Siegert, Silas Rech, Matthias Haase and Tom Bäckström)
17:50 - 18:00 Wrap-Up of the Workshop and Closing Ceremony
18:00 Post-workshop on-site gathering

Organizers and Contact of the LEGAL Workshop:

Ingo Siegert, Otto-von-Guericke-Universität Magdeburg, Germany

Khalid Choukri, ELRA/ELDA, France

Pawel Kamocki, IDS Mannheim, Germany

Program Committee

Khalid Choukri

Mickaël Rigault

Claudia Cevenini

Erik Ketzan

Prodromos Tsiavos

Andreas Witt

Paweł Kamocki

Kim Nayyer

Krister Lindén

Ingo Siegert

Tom Bäckström

Nicholas Evans

Catherine Jasserand

Isabel Trancoso