Research Papers

The TRUSTS initiative aims to make an impact on everyday lives with its research. Information about our research is therefore provided on this page.

Preparing Future Business Data Sharing via a Meta-Platform for Data Marketplaces: Exploring Antecedents and Consequences of Data Sovereignty

June 2022

Authors: Antragama Ewa Abbas, Hosea Ofe, Anneke Zuiderwijk, Mark de Reuver

Meta-platforms have received considerable Information Systems scholarly attention in recent years. Meta-platforms enable platform-to-platform openness and are especially beneficial to amplifying network effects in highly-specialized markets. A promising emerging context for applying meta-platforms is data marketplaces—a special type of digital platform designed for business data sharing that is vastly fragmented. However, data providers have sovereignty concerns: the risk of losing control over the data that they share through meta-platforms. This research aims to explore antecedents and consequences of data sovereignty concerns in meta-platforms for data marketplaces. Based on interviews with fifteen potential data providers and five data marketplace experts, we identify data sovereignty antecedents, such as (potentially) less trustworthy data marketplace participants, unclear use cases, and data provenance difficulties. Data sovereignty concerns have many consequences, including knowledge spillovers to competitors and reputational damage. This study is among the first that empirically develops a pre-conceptualization for data sovereignty in this novel context, thus laying the groundwork for designing future data marketplace meta-platform solutions.

Read the full paper here

CryptoTL: Private, efficient and secure transfer learning

May 2022

Authors: Roman Walch, Samuel Sousa, Lukas Helminger, Stefanie Lindstaedt, Christian Rechberger, Andreas Trügler

Big data has been a pervasive catchphrase in recent years, but dealing with data scarcity has become a crucial question for many real-world deep learning (DL) applications. A popular methodology to efficiently enable the training of DL models to perform tasks in scenarios where only a small dataset is available is transfer learning (TL). TL allows knowledge transfer from a general domain to a specific target one; however, such a knowledge transfer may put privacy at risk when it comes to sensitive or private data. With CryptoTL we introduce a solution to this problem, and show for the first time a cryptographic privacy-preserving TL approach based on homomorphic encryption that is efficient and feasible for real-world use cases. We demonstrate this by focusing on classification tasks with small datasets and show the applicability of our approach for sentiment analysis. Additionally we highlight how our approach can be combined with differential privacy to further increase the security guarantees. Our extensive benchmarks show that using CryptoTL leads to high accuracy while still having practical fine-tuning and classification runtimes despite using homomorphic encryption. Concretely, one forward-pass through the encrypted layers of our setup takes roughly 1s on a notebook CPU.
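CryptoTL builds on homomorphic encryption, which makes it possible to compute on data while it stays encrypted. As an illustration of that underlying idea only, and not of the scheme used in the paper, here is a toy additively homomorphic Paillier sketch with deliberately tiny, insecure parameters:

```python
from math import gcd
import random

# Toy Paillier keys with tiny hardcoded primes; real keys use >= 1024-bit primes.
P_PRIME, Q_PRIME = 17, 19
N = P_PRIME * Q_PRIME                  # public modulus n = 323
N2 = N * N
LAM = (P_PRIME - 1) * (Q_PRIME - 1) // gcd(P_PRIME - 1, Q_PRIME - 1)  # lcm(p-1, q-1)
MU = pow(LAM, -1, N)                   # with g = n + 1, L(g^lam mod n^2) = lam

def encrypt(m: int) -> int:
    """Encrypt a plaintext m < N with fresh randomness r coprime to N."""
    r = random.choice([x for x in range(2, N) if gcd(x, N) == 1])
    return (pow(N + 1, m, N2) * pow(r, N, N2)) % N2

def decrypt(c: int) -> int:
    """Recover the plaintext: L(c^lam mod n^2) * mu mod n."""
    return ((pow(c, LAM, N2) - 1) // N) * MU % N

def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts mod n."""
    return (c1 * c2) % N2
```

Anyone holding only ciphertexts can run `add_encrypted` without learning the inputs; only the key holder can decrypt the result.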

Read the full paper here

Business model archetypes for data marketplaces in the automotive industry

May 2022

Authors: Rômy Bergman, Antragama Ewa Abbas, Sven Jung, Claudia Werker & Mark de Reuver 

Policymakers and analysts are heavily promoting data marketplaces to foster data trading between companies. Existing business model literature covers individually owned, multilateral data marketplaces. However, these particular types of data marketplaces hardly reach commercial exploitation. This paper develops business model archetypes for the full array of data marketplace types, ranging from private to independent ownership and from a hierarchical to a market orientation. Through exploratory interviews and case analyses, we create a business model taxonomy. Patterns in our taxonomy reveal four business model archetypes. We find that privately-owned data marketplaces with hierarchical orientation apply the aggregating data marketplace archetype. Consortium-owned data marketplaces apply the archetypes of aggregating data marketplace with additional brokering service and consulting data marketplace. Independently owned data marketplaces with market orientation apply the facilitating data marketplace archetype. Our results provide a basis for configurational theory that explains the performance of data marketplace business models. Our results also provide a basis for specifying boundary conditions for theory on data marketplace business models, as, for instance, the importance of network effects differs strongly between the archetypes.

Read the full paper here

How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing

May 2022

Authors: Samuel Sousa and Roman Kern

Deep learning (DL) models for natural language processing (NLP) tasks often handle private data, demanding protection against breaches and disclosures. Data protection laws, such as the European Union’s General Data Protection Regulation (GDPR), thereby enforce the need for privacy. Although many privacy-preserving NLP methods have been proposed in recent years, no categories to organize them have been introduced yet, making it hard to follow the progress of the literature. To close this gap, this article systematically reviews over sixty DL methods for privacy-preserving NLP published between 2016 and 2020, covering theoretical foundations, privacy-enhancing technologies, and analysis of their suitability for real-world scenarios. First, we introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods. Second, we present an extensive summary of privacy threats, datasets for applications, and metrics for privacy evaluation. Third, throughout the review, we describe privacy issues in the NLP pipeline in a holistic view. Further, we discuss open challenges in privacy-preserving NLP regarding data traceability, computation overhead, dataset size, the prevalence of human biases in embeddings, and the privacy-utility tradeoff. Finally, this review presents future research directions to guide successive research and development of privacy-preserving NLP models.

Read the full paper here

A Systematic Review of Data Management Platforms

April 2022

Authors: Michael Boch, Stefan Gindl, Alan Barnett, George Margetis, Victor Mireles, Emmanouil Adamakis, and Petr Knoth

This paper systematically reviews a set of well-established data management platforms and compares their functionality. We derived an initial criteria catalogue from existing research work and extended it based on the input gathered through several expert interviews. Finally, we applied this criteria catalogue to a set of data management platforms. The contribution of this work is (i) an up-to-date criteria catalogue to systematically assess the feature-richness of data management platforms, generalizable to related use-cases (e.g. data markets), and (ii) the systematic review of a selected set of data management platforms along these criteria. This work lays the foundation for future research in this area, being subject to periodic re-evaluation to also include developments and improvements of the platforms.

Read the full paper here

Recommendations in a Multi-Domain Setting: Adapting for Customization, Scalability and Real-Time Performance

March 2022

Authors: Dominik Kowald and Emanuel Lacic

In this industry talk at ECIR’2022, we illustrate how to build a modern recommender system that can serve recommendations in real-time for a diverse set of application domains. Specifically, we present our system architecture that utilizes popular recommendation algorithms from the literature such as Collaborative Filtering, Content-based Filtering as well as various neural embedding approaches (e.g., Doc2Vec, Autoencoders, etc.). We showcase the applicability of our system architecture using two real-world use-cases, namely providing recommendations for the domains of (i) job marketplaces, and (ii) entrepreneurial start-up founding. We strongly believe that our experiences from both research- and industry-oriented settings should be of interest for practitioners in the field of real-time multi-domain recommender systems.
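To make the collaborative filtering building block mentioned above concrete, here is a minimal, hypothetical sketch of a user-based nearest-neighbour recommender; it is our illustrative toy, not the production architecture presented in the talk:

```python
import math

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse rating profiles {item: rating}."""
    num = sum(a[i] * b[i] for i in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def recommend(target: str, ratings: dict, k: int = 2, n: int = 3) -> list:
    """User-based collaborative filtering: score items of the k most
    similar users that the target user has not interacted with yet."""
    neighbours = sorted(((cosine(ratings[target], ratings[u]), u)
                         for u in ratings if u != target), reverse=True)[:k]
    scores = {}
    for sim, u in neighbours:
        if sim <= 0:
            continue  # ignore users with no overlap with the target
        for item, r in ratings[u].items():
            if item not in ratings[target]:
                scores[item] = scores.get(item, 0.0) + sim * r
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

A real-time system would precompute neighbourhoods or embeddings offline; the scoring logic at request time stays this simple.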

Read the full paper here

Popularity Bias in Collaborative Filtering-Based Multimedia Recommender Systems

March 2022

Authors: Dominik Kowald and Emanuel Lacic

Multimedia recommender systems suggest media items, e.g., songs, (digital) books and movies, to users by utilizing concepts of traditional recommender systems such as collaborative filtering. In this paper, we investigate a potential issue of such collaborative filtering-based multimedia recommender systems, namely popularity bias, which leads to the underrepresentation of unpopular items in the recommendation lists. To this end, we study four multimedia datasets, i.e., LastFm, MovieLens, BookCrossing and MyAnimeList, which we each split into three user groups differing in their inclination to popularity, i.e., LowPop, MedPop and HighPop. Using these user groups, we evaluate four collaborative filtering-based algorithms with respect to popularity bias on the item and the user level. Our findings are three-fold: firstly, we show that users with little interest in popular items tend to have large user profiles and are thus important data sources for multimedia recommender systems. Secondly, we find that popular items are recommended more frequently than unpopular ones. Thirdly, we find that users with little interest in popular items receive significantly worse recommendations than users with medium or high interest in popularity.
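The LowPop/MedPop/HighPop split described above can be approximated in a few lines. This is a hedged sketch: the quantile thresholds and the use of mean item popularity as the inclination measure are illustrative assumptions, not the paper's exact protocol:

```python
from collections import Counter

def popularity_scores(interactions):
    """Popularity of each item: the fraction of users that interacted with it."""
    users = {u for u, _ in interactions}
    counts = Counter(i for _, i in interactions)
    return {i: c / len(users) for i, c in counts.items()}

def split_by_popularity_inclination(interactions, low=0.33, high=0.66):
    """Split users into LowPop/MedPop/HighPop by the mean popularity of
    the items in their profile (threshold quantiles are illustrative)."""
    pop = popularity_scores(interactions)
    profiles = {}
    for u, i in interactions:
        profiles.setdefault(u, []).append(pop[i])
    inclination = {u: sum(ps) / len(ps) for u, ps in profiles.items()}
    ranked = sorted(inclination, key=inclination.get)  # ascending inclination
    n = len(ranked)
    return {
        "LowPop": ranked[: int(n * low)],
        "MedPop": ranked[int(n * low): int(n * high)],
        "HighPop": ranked[int(n * high):],
    }
```

Comparing per-group recommendation accuracy on such a split is what surfaces the bias the paper reports.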

Read the full paper here

Multi-Party Computation in the GDPR

January 2022

Authors: Lukas Helminger and Christian Rechberger

The EU GDPR has two main goals: Protecting individuals from personal data abuse and simplifying the free movement of personal data. Privacy-enhancing technologies promise to fulfill both goals simultaneously. A particularly effective and versatile technology solution is multi-party computation (MPC). It allows protecting data during a computation involving multiple parties. This paper aims for a better understanding of the role of MPC in the GDPR. Although MPC is relatively mature, little research was dedicated to its GDPR compliance. First, we try to give an understanding of MPC for legal scholars and policymakers. Then, we examine the GDPR relevant provisions regarding MPC with a technical audience in mind. Finally, we devise a test that can assess the impact of a given MPC solution with regard to the GDPR. The test consists of several questions, which a controller can answer without the help of a technical or legal expert. Going through the questions will classify the MPC solution as (1) a means of avoiding the GDPR, (2) Data Protection by Design, or (3) having no legal benefits. Two concrete case studies should provide a blueprint on how to apply the test. We hope that this work also contributes to an interdisciplinary discussion of MPC certification and standardization.
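For readers unfamiliar with MPC, here is a minimal sketch of additive secret sharing, a textbook building block behind many MPC protocols. It is a toy under our own assumptions (field size, three parties), not a GDPR-assessed deployment: each party's input is split into random shares, so a joint sum can be computed without any party seeing another's raw value:

```python
import random

PRIME = 2_147_483_647  # field modulus (a Mersenne prime); illustrative choice

def share(secret: int, n_parties: int) -> list:
    """Split a secret into n additive shares that sum to it mod PRIME.
    Any n-1 shares together reveal nothing about the secret."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list) -> int:
    return sum(shares) % PRIME

def secure_sum(secrets: list, n_parties: int = 3) -> int:
    """Each input is shared across the parties; every party adds the
    shares it holds locally, and only the final sum is reconstructed."""
    all_shares = [share(s, n_parties) for s in secrets]
    partial = [sum(col) % PRIME for col in zip(*all_shares)]  # per-party sums
    return reconstruct(partial)
```

The legal test devised in the paper would then ask, for a concrete protocol like this, who controls which shares and what the output discloses.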

Read the full paper here

What Drives Readership? An Online Study on User Interface Types and Popularity Bias Mitigation in News Article Recommendations

November 2021

Authors: Emanuel Lacic, Leon Fadljevic, Franz Weissenboeck, Stefanie Lindstaedt, Dominik Kowald

Personalized news recommender systems support readers in finding the right and relevant articles in online news platforms. In this paper, we discuss the introduction of personalized, content-based news recommendations on DiePresse, a popular Austrian online news platform, focusing on two specific aspects: (i) user interface type, and (ii) popularity bias mitigation. To this end, we conducted a two-week online study that started in October 2020, in which we analyzed the impact of recommendations on two user groups, i.e., anonymous and subscribed users, and three user interface types, i.e., on a desktop, mobile and tablet device. With respect to user interface types, we find that the probability of a recommendation to be seen is the highest for desktop devices, while the probability of interacting with recommendations is the highest for mobile devices. With respect to popularity bias mitigation, we find that personalized, content-based news recommendations can lead to a more balanced distribution of news articles’ readership popularity in the case of anonymous users. Apart from that, we find that significant events (e.g., the COVID-19 lockdown announcement in Austria and the Vienna terror attack) influence the general consumption behavior of popular articles for both anonymous and subscribed users.

Read the full paper here

Privacy in Open Search: A Review of Challenges and Solutions

October 2021

Authors: Samuel Sousa, Christian Guetl, Roman Kern

Privacy is of worldwide concern regarding activities and processes that include sensitive data. For this reason, many countries and territories have been recently approving regulations controlling the extent to which organizations may exploit data provided by people. Artificial intelligence areas, such as machine learning and natural language processing, have already successfully employed privacy-preserving mechanisms in order to safeguard data privacy in a vast number of applications. Information retrieval (IR) is likewise prone to privacy threats, such as attacks and unintended disclosures of documents and search history, which may cripple the security of users and be penalized by data protection laws. This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data. Our contribution is threefold: firstly, we present an overview of privacy threats to IR tasks; secondly, we discuss applicable privacy-preserving mechanisms which may be employed in solutions to restrain privacy hazards; finally, we bring insights on the tradeoffs between privacy preservation and utility performance for IR tasks.

Read the full paper here

Position Paper on Simulating Privacy Dynamics in Recommender Systems

September 2021

Authors: Peter Müllner, Elisabeth Lex, Dominik Kowald

In this position paper, we discuss the merits of simulating privacy dynamics in recommender systems. We study this issue at hand from two perspectives: Firstly, we present a conceptual approach to integrate privacy into recommender system simulations, whose key elements are privacy agents. These agents can enhance users’ profiles with different privacy preferences, e.g., their inclination to disclose data to the recommender system. Plus, they can protect users’ privacy by guarding all actions that could be a threat to privacy. For example, agents can prohibit a user’s privacy-threatening actions or apply privacy-enhancing techniques, e.g., Differential Privacy, to make actions less threatening. Secondly, we identify three critical topics for future research in privacy-aware recommender system simulations: (i) How could we model users’ privacy preferences and protect users from performing any privacy-threatening actions? (ii) To what extent do privacy agents modify the users’ document preferences? (iii) How do privacy preferences and privacy protections impact recommendations and privacy of others? Our conceptual privacy-aware simulation approach makes it possible to investigate the impact of privacy preferences and privacy protection on the micro-level, i.e., a single user, but also on the macro-level, i.e., all recommender system users. With this work, we hope to present perspectives on how privacy-aware simulations could be realized, such that they enable researchers to study the dynamics of privacy within a recommender system.
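As a toy sketch of what such a privacy agent could look like in a simulation, the class name, the epsilon value and the Laplace mechanism below are our illustrative assumptions, not the paper's specification:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

class PrivacyAgent:
    """Guards a simulated user's actions: blocks ratings the user marked
    private and perturbs disclosed ones with Laplace noise, the classic
    differential privacy mechanism."""

    def __init__(self, epsilon: float, sensitivity: float = 4.0):
        # sensitivity 4.0 assumes ratings on a 1-5 scale (our assumption)
        self.scale = sensitivity / epsilon

    def disclose(self, rating: float, private: bool):
        if private:
            return None  # prohibit the privacy-threatening action entirely
        return rating + laplace_noise(self.scale)
```

Sweeping `epsilon` per simulated user is one way to study the micro- and macro-level privacy-utility dynamics the paper asks about.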

Read the full paper here

Context dependent trade-offs around platform-to-platform openness: The case of the Internet of Things

July 2021

Authors: Lars Mosterd, Vladimir C.M. Sobota, Geerten van de Kaa, Aaron Yi Ding, Mark de Reuver

As digital platforms are dominating the digital economy, complex ecologies of platforms are emerging. While the openness of digital platforms is an important theme in platform studies, the openness between platforms has hardly been studied. This paper explores factors that affect decisions by platform owners to open their platforms to other platforms. The focus is on Internet-of-Things platforms for automotive and healthcare applications. According to the findings, platform owners make trade-offs on whether to open up on a case-by-case basis. We identify a complex array of factors relating to direct benefits and costs (e.g., revenues from selling platform data), indirect benefits (e.g., attractiveness of the focal platform to users) as well as strategic considerations (e.g., improving bargaining power towards other actors). How businesses make trade-offs on these factors depends on market-level context (e.g., maturity of the market and standards) and organizational context (e.g., strategic focus and business objectives). Our findings provide a basis for future studies on the openness between platforms, which will become increasingly important as platforms proliferate in every layer of the digital industry.

Read the full paper here

Whitepaper: TRUSTS Technology – Equipping European Data Markets with Technological Innovations

June 2021

Authors: Ahmad Hemid, Ohad Arnon, Stefan Gindl, Alan Barnett, Victor Mireles-Chavez

The aim of this whitepaper is to give the project stakeholders – i.e. data providers, data consumers, similar EU project consortia and technology providers; in general, the European Data Ecosystem – an overview of the technological basis of the future data market or data market federator. TRUSTS maintains an open communication policy and would like to share its learnings from the project activities with all interested parties.

To provide a general overview of the technological developments in the project, this whitepaper explains which reference architectures TRUSTS builds on, how these have been further developed, and which innovations are necessary in the future to achieve the goals of the project proposal.

Read the whitepaper here

Whitepaper on the Data Governance Act

June 2021

Authors: Julie Baloup, Charlotte Ducuing, Emre Bayamlıoğlu, Aliki Benmayor, Lidia Dutkiewicz, Yuliya Miadzvetskaya, Teodora Lalova, Bert Peeters

The whitepaper offers an academic perspective to the discussion on the Data Governance Act proposal (“DGA proposal”), as adopted by the European Commission in November 2020. It contains a legal analysis of the DGA proposal and includes recommendations to amend its shortcomings. The White Paper aims to cover the full spectrum of the DGA proposal and therefore offers an in-depth analysis of its main provisions. In conclusion, the authors identify general patterns at work in the DGA proposal, namely, first, the (new) regulation of data as an object and, even more so, as an object of rights. This approach, the authors find, may exacerbate the risk of contradictions between the DGA proposal and the GDPR on the level of principles. Second, it discusses the relationship of the DGA proposal vis-à-vis the (regulation of) European data spaces and, more generally, its place in the two-pillar approach of the EC, between horizontal (sector-agnostic) and sectoral regulation of data. Finally, the DGA proposal is identified as a cornerstone of the new EU ‘digital sovereignty’ policy.

Read the whitepaper here

Business Data Sharing through Data Marketplaces: A Systematic Literature Review

June 2021

Authors: Antragama Ewa Abbas, Wirawan Agahari, Montijn Van De Ven, Anneke Zuiderwijk & Mark De Reuver

Data marketplaces are expected to play a crucial role in tomorrow’s data economy but hardly achieve commercial exploitation. Currently, there is no clear understanding of the knowledge gaps in data marketplace research, especially neglected research topics that may contribute to advancing data marketplaces towards commercialization. This study provides an overview of the state of the art of data marketplace research. We employ a Systematic Literature Review (SLR) approach and structure our analysis using the Service-Technology-Organization-Finance (STOF) model. We find that the extant data marketplace literature is primarily dominated by technical research, such as discussions about computational pricing and architecture. To move past the first stage of the platform’s lifecycle (i.e., platform design) to the second stage (i.e., platform adoption), we call for empirical research in non-technological areas, such as customer expected value and market segmentation.

Read the full paper

Creating a Taxonomy of Business Models for Data Marketplaces

June 2021

Authors: Montijn Van de Ven, Antragama Ewa Abbas, Zenlin Kwee, & Mark De Reuver

Data marketplaces can fulfil a key role in realizing the data economy by enabling the commercial trading of data between organizations. Although data marketplace research is a quickly evolving domain, there is a lack of understanding about data marketplace business models. As data marketplaces are vastly different, a taxonomy of data marketplace business models is developed in this study. A standard taxonomy development method is followed to develop the taxonomy. The final taxonomy comprises 4 meta-dimensions, 17 business model dimensions and 59 business model characteristics. The taxonomy can be used to classify data marketplace business models and sheds light on how data marketplaces are a unique type of digital platform. The results of this research provide a basis for theorizing in this rapidly evolving and increasingly important domain.

Read the full paper

Why open government data initiatives fail to achieve their objectives: categorizing and prioritizing barriers through a global survey

May 2021

Authors: Anneke Zuiderwijk & Mark de Reuver

Existing overviews of barriers for openly sharing and using government data are often conceptual or based on a limited number of cases. Furthermore, it is unclear what categories of barriers are most obstructive for attaining open data objectives. This paper aims to categorize and prioritize barriers for openly sharing and using government data based on many existing Open Government Data Initiatives (OGDIs).

Read the full paper on Emerald Insight

Robustness of Meta Matrix Factorization Against Strict Privacy Constraints

March 2021

Authors: Peter Muellner, Dominik Kowald, Elisabeth Lex

In this paper, we explore the reproducibility of MetaMF, a meta matrix factorization framework introduced by Lin et al. MetaMF employs meta learning for federated rating prediction to preserve users’ privacy. We reproduce the experiments of Lin et al. on five datasets, i.e., Douban, Hetrec-MovieLens, MovieLens 1M, Ciao, and Jester. Also, we study the impact of meta learning on the accuracy of MetaMF’s recommendations. Furthermore, in our work, we acknowledge that users may have different tolerances for revealing information about themselves. Hence, in a second strand of experiments, we investigate the robustness of MetaMF against strict privacy constraints. Our study illustrates that we can reproduce most of Lin et al.’s results. Plus, we provide strong evidence that meta learning is essential for MetaMF’s robustness against strict privacy constraints.
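MetaMF couples meta learning with federated rating prediction. As a much simpler point of reference, and purely our illustrative toy rather than the reproduced model, plain matrix factorization for rating prediction looks like this:

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Minimal matrix factorization via SGD: predict r_ui as the dot
    product of a user factor vector p_u and an item factor vector q_i."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # gradient step with L2 reg
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))
```

In the federated setting studied by the paper, the raw `(u, i, r)` triples never leave the users' devices; only model updates are exchanged.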

Read the full paper on Research Gate

The recent case law of the CJEU on (joint) controllership: have we lost the purpose of ‘purpose’?

September 2020

Authors: Charlotte Ducuing, Jessica Schroers

‘Purpose’ is part of the definition of ‘controller’ and a cornerstone of the GDPR. Although the recent case law of the CJEU on (joint) controllership, Wirtschaftsakademie, Jehovan todistajat and Fashion ID, has been much discussed in the legal literature, little has been said about how it relates to ‘purpose’. Therefore, this paper analyses whether, in ruling about (joint) controllership, the Court (sufficiently) took into account the overall nature and functions of the notion of ‘purpose’ in the GDPR.

Read the full paper via KU Leuven

Practice and Challenges of (De-)Anonymisation for Data Sharing

June 2020

Authors: Alexandros Bampoulidis, Alessandro Bruni, Ioannis Markopoulos, Mihai Lupu

Personal data is a necessity in many fields for research and innovation purposes, and when such data is shared, the data controller carries the responsibility of protecting the privacy of the individuals contained in their dataset. The removal of direct identifiers, such as full name and address, is not enough to secure the privacy of individuals as shown by de-anonymisation methods in the scientific literature. Data controllers need to become aware of the risks of de-anonymisation and apply the appropriate anonymisation measures before sharing their datasets, in order to comply with privacy regulations. To address this need, we defined a procedure that makes data controllers aware of the de-anonymisation risks and helps them in deciding the anonymisation measures that need to be taken in order to comply with the General Data Protection Regulation (GDPR). We showcase this procedure with a customer relationship management (CRM) dataset provided by a telecommunications provider. Finally, we recount the challenges we identified during the definition of this procedure and by putting existing knowledge and tools into practice.
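One of the best-known anonymisation measures such a procedure can recommend is k-anonymity: every record must share its quasi-identifier values with at least k-1 others. As a hedged sketch, where the field names and the decade-bucket generalization are our assumptions rather than the actual schema of the CRM dataset, this can be checked as follows:

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Return the k-anonymity level of a dataset: the size of the smallest
    group of records sharing the same quasi-identifier values."""
    groups = Counter(tuple(rec[q] for q in quasi_identifiers) for rec in records)
    return min(groups.values())

def generalize_age(records, width=10):
    """Simple generalization step: replace an exact age by its decade bucket."""
    out = []
    for rec in records:
        rec = dict(rec)                      # copy; leave the input untouched
        lo = (rec["age"] // width) * width
        rec["age"] = f"{lo}-{lo + width - 1}"
        out.append(rec)
    return out
```

Iterating generalization steps until the measured k reaches a target threshold mirrors the risk-then-mitigate loop of the procedure described above.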

Read the full paper on Research Gate