The TRUSTS initiative aims to make an impact on everyday lives through its research. Information about our research is therefore provided on this page.
Toward Business Models for a Meta-Platform: Exploring Value Creation in the Case of Data Marketplaces
Authors: Antragama Ewa Abbas, Hosea Ayaba Ofe, Anneke Zuiderwijk, Mark de Reuver
Investigating meta-platforms has been a continuing concern within information systems literature due to the increasingly complex constellations of platforms in ecologies of ecosystems. A meta-platform is a platform built on top of two or more platforms, hence connecting their respective ecosystems. One promising case to benefit from meta-platforms is data marketplaces: a particular type of platform that facilitates responsible (personal and non-personal) data sharing among companies. Given that business models for meta-platforms are largely unexplored in this emerging case, how they can create value for data marketplaces remains speculative. As a starting point toward business model investigations, this paper explores the value creation of a meta-platform in the case of data marketplaces. We interviewed fourteen data-sharing consultants and six meta-platform experts. We identify three potential value creation archetypes of a meta-platform. The discovery aggregator archetype emphasizes searching and dispatching value, while the brokerage archetype focuses on promoting and supporting value. Finally, the one-stop-shop archetype creates value by standardizing, regulating, sharing, and experimenting. This study is among the first to explore value creation archetypes for a meta-platform, thus identifying core value as a base for further business model investigations.
The Openness of Data Platforms: A Research Agenda
Authors: Mark de Reuver, Hosea Ofe, Wirawan Agahari, Antragama Ewa Abbas and Anneke Zuiderwijk-van Eijk
Data platforms are the keystone of the data economy. When opened up, data platforms allow data owners, data consumers and third parties to interact. Yet, openness may also harm business and societal interests. Literature on platform openness does not cover data platforms, and data economy scholars rarely study platform openness. Therefore, this paper develops a research agenda on the openness of data platforms. We explore how data platforms differ from conventional digital platforms (e.g., software platforms). From those differentiating characteristics, we identify areas for future work: (1) The specific characteristics of data require reconceptualizing the object of platform openness; (2) New ways in which data platforms can be opened should be conceptualized; (3) As data platforms are tailored to specific industries, platform-to-platform openness should be a novel unit of analysis; (4) Because opening up data platforms creates novel risks, new reasons to (not) open up data platforms should be studied.
Towards Employing Recommender Systems for Supporting Data and Algorithm Sharing
Authors: Peter Müllner, Stefan Schmerda, Dieter Theiler, Stefanie Lindstaedt, Dominik Kowald
Data and algorithm sharing is an imperative part of data and AI-driven economies. The efficient sharing of data and algorithms relies on the active interplay between users, data providers, and algorithm providers. Although recommender systems are known to effectively interconnect users and items in e-commerce settings, there is a lack of research on the applicability of recommender systems for data and algorithm sharing. To fill this gap, we identify six recommendation scenarios for supporting data and algorithm sharing, where four of these scenarios substantially differ from the traditional recommendation scenarios in e-commerce applications. We evaluate these recommendation scenarios using a novel dataset based on interaction data of the OpenML data and algorithm sharing platform, which we also provide for the scientific community. Specifically, we investigate three types of recommendation approaches, namely popularity-, collaboration-, and content-based recommendations. We find that collaboration-based recommendations provide the most accurate recommendations in all scenarios. Plus, the recommendation accuracy strongly depends on the specific scenario, e.g., algorithm recommendations for users are a more difficult problem than algorithm recommendations for datasets. Finally, the content-based approach generates the least popularity-biased recommendations that cover the most datasets and algorithms.
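To make the difference between the recommendation approaches concrete, here is a minimal toy sketch (not from the paper, and not using the OpenML dataset) contrasting popularity-based and user-based collaborative recommendations on hypothetical usage logs; all user and dataset names are invented:

```python
from collections import Counter

# Toy interaction data: which users ran which datasets (hypothetical IDs,
# loosely inspired by OpenML-style usage logs -- not the paper's dataset).
interactions = {
    "alice": {"iris", "wine", "digits"},
    "bob":   {"iris", "wine"},
    "carol": {"digits", "covertype"},
    "dave":  {"iris"},
}

def popularity_recommend(user, k=2):
    """Recommend the globally most-used items the user has not seen yet."""
    counts = Counter(i for items in interactions.values() for i in items)
    seen = interactions[user]
    ranked = [i for i, _ in counts.most_common() if i not in seen]
    return ranked[:k]

def collaborative_recommend(user, k=2):
    """User-based CF: rank unseen items by overlap-weighted neighbour votes."""
    seen = interactions[user]
    scores = Counter()
    for other, items in interactions.items():
        if other == user:
            continue
        overlap = len(seen & items)  # set overlap as a crude similarity
        if overlap == 0:
            continue
        for item in items - seen:
            scores[item] += overlap
    return [i for i, _ in scores.most_common(k)]

print(popularity_recommend("dave"))
print(collaborative_recommend("dave"))
```

The collaborative variant ranks items by how many similar users consumed them, which is why it can personalize where the popularity baseline cannot.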
Privacy-Preserving Techniques for Trustworthy Data Sharing: Opportunities and Challenges for Future Research
Authors: Lidia Dutkiewicz, Yuliya Miadzvetskaya, Hosea Ofe, Alan Barnett, Lukas Helminger, Stefanie Lindstaedt, and Andreas Trügler
One of the foundations of data sharing in the European Union (EU) is trust, especially in view of the advancing digitalization and recent developments with respect to European Data Spaces. In this chapter, we argue that privacy-preserving techniques, such as multi-party computation and fully homomorphic encryption, can play a positive role in enhancing trust in data sharing transactions. We therefore focus on an interdisciplinary perspective on how privacy-preserving techniques can facilitate trustworthy data sharing. We start by introducing the legal landscape of data sharing in the EU. Then, we discuss the different functions of third-party intermediaries, namely, data marketplaces. Before giving a legal perspective on privacy-preserving techniques for enhancing trust in data sharing, we briefly touch upon the Data Governance Act (DGA) proposal in relation to trust and its intersection with the General Data Protection Regulation (GDPR). We continue with an overview of the technical aspects of privacy-preserving methods in the latter part, where we focus on methods based on cryptography (such as homomorphic encryption, multi-party computation, private set intersection) and link them to smart contracts. We discuss the main principles behind these methods and highlight the open challenges with respect to privacy, performance bottlenecks, and a more widespread application of privacy-preserving analytics. Finally, we suggest directions for future research by highlighting that the mutual understanding of legal frameworks and technical capabilities will form an essential building block of sustainable and secure data sharing in the future.
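To give a flavour of private set intersection, here is a deliberately naive, illustrative sketch (not from the chapter): two parties share only salted hashes of their records, so each learns the intersection but not the other's remaining elements. Note that this hash-only variant is insecure for low-entropy inputs (an attacker can brute-force candidate values); the cryptographic PSI protocols the chapter refers to use stronger primitives such as oblivious PRFs. All names and values below are invented:

```python
import hashlib

def h(value: str) -> str:
    """Salted hash (the salt must be agreed out-of-band; a fixed demo salt here)."""
    return hashlib.sha256(b"demo-salt:" + value.encode()).hexdigest()

# Each party hashes its records locally and shares only the digests.
party_a = {"alice@example.org", "bob@example.org", "carol@example.org"}
party_b = {"bob@example.org", "dave@example.org"}

digests_a = {h(v): v for v in party_a}
digests_b = {h(v) for v in party_b}

# Party A learns which of ITS elements are shared, and nothing about
# B's non-matching elements beyond their hashes.
intersection = {digests_a[d] for d in digests_a.keys() & digests_b}
print(intersection)
```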
TRUSTS Whitepaper on Data Governance in Collaborative Data Environments
Authors: Stefan Gindl, Michael Boch, Gianna Avgousti, Christos Skoufis, Alan Barnett, Matthew Keating, Nina Popanton
Data governance is an emerging necessity in enterprise information management and should be a strategic initiative for all organizations. In this whitepaper, the TRUSTS Consortium provides hands-on insights into its experiences with data governance mechanisms in one of its three use case fields: anti-money laundering.
Preparing Future Business Data Sharing via a Meta-Platform for Data Marketplaces: Exploring Antecedents and Consequences of Data Sovereignty
Authors: Antragama Ewa Abbas, Hosea Ofe, Anneke Zuiderwijk, Mark de Reuver
Meta-platforms have received considerable Information Systems scholarly attention in recent years. Meta-platforms enable platform-to-platform openness and are especially beneficial to amplifying network effects in highly specialized markets. A promising emerging context for applying meta-platforms is data marketplaces—a special type of digital platform designed for business data sharing that is vastly fragmented. However, data providers have sovereignty concerns: the risk of losing control over the data that they share through meta-platforms. This research aims to explore antecedents and consequences of data sovereignty concerns in meta-platforms for data marketplaces. Based on interviews with fifteen potential data providers and five data marketplace experts, we identify data sovereignty antecedents, such as (potentially) less trustworthy data marketplace participants, unclear use cases, and data provenance difficulties. Data sovereignty concerns have many consequences, including knowledge spillovers to competitors and reputational damage. This study is among the first to empirically develop a pre-conceptualization for data sovereignty in this novel context, thus laying the groundwork for designing future data marketplace meta-platform solutions.
CryptoTL: Private, efficient and secure transfer learning
Authors: Roman Walch, Samuel Sousa, Lukas Helminger, Stefanie Lindstaedt, Christian Rechberger, Andreas Trügler
Big data has been a pervasive catchphrase in recent years, but dealing with data scarcity has become a crucial question for many real-world deep learning (DL) applications. A popular methodology to efficiently enable the training of DL models to perform tasks in scenarios where only a small dataset is available is transfer learning (TL). TL allows knowledge transfer from a general domain to a specific target one; however, such a knowledge transfer may put privacy at risk when it comes to sensitive or private data. With CryptoTL, we introduce a solution to this problem, and show for the first time a cryptographic privacy-preserving TL approach based on homomorphic encryption that is efficient and feasible for real-world use cases. We demonstrate this by focusing on classification tasks with small datasets and show the applicability of our approach for sentiment analysis. Additionally, we highlight how our approach can be combined with differential privacy to further increase the security guarantees. Our extensive benchmarks show that using CryptoTL leads to high accuracy while still having practical fine-tuning and classification runtimes despite using homomorphic encryption. Concretely, one forward-pass through the encrypted layers of our setup takes roughly 1s on a notebook CPU.
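The transfer-learning structure underlying this approach can be sketched in plaintext: freeze a pretrained feature extractor and fine-tune only a small classification head on the target data. The sketch below is a synthetic toy (a random projection stands in for the pretrained layers, and everything runs unencrypted); in CryptoTL itself, the lower layers are evaluated under homomorphic encryption, which this sketch does not attempt to reproduce:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature extractor: a frozen random projection stands in for
# the lower network layers (in CryptoTL these run under homomorphic
# encryption -- here everything is plaintext for illustration only).
W_frozen = rng.normal(size=(4, 8))
def features(x):
    return np.tanh(x @ W_frozen)

# Small labelled target dataset (synthetic).
X = rng.normal(size=(64, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Fine-tune only a logistic-regression head on top of the frozen features.
w, b = np.zeros(8), 0.0
for _ in range(300):
    p = 1 / (1 + np.exp(-(features(X) @ w + b)))   # sigmoid predictions
    grad = p - y                                   # logistic-loss gradient
    w -= 0.5 * features(X).T @ grad / len(y)
    b -= 0.5 * grad.mean()

acc = ((1 / (1 + np.exp(-(features(X) @ w + b))) > 0.5) == y).mean()
```

Because only the small head is trained, fine-tuning stays cheap even when the frozen layers are expensive to evaluate, which is what makes the encrypted-lower-layers split practical.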
Business model archetypes for data marketplaces in the automotive industry
Authors: Rômy Bergman, Antragama Ewa Abbas, Sven Jung, Claudia Werker & Mark de Reuver
Policymakers and analysts are heavily promoting data marketplaces to foster data trading between companies. Existing business model literature covers individually owned, multilateral data marketplaces. However, these particular types of data marketplaces hardly reach commercial exploitation. This paper develops business model archetypes for the full array of data marketplace types, ranging from private to independent ownership and from a hierarchical to a market orientation. Through exploratory interviews and case analyses, we create a business model taxonomy. Patterns in our taxonomy reveal four business model archetypes. We find that privately-owned data marketplaces with hierarchical orientation apply the aggregating data marketplace archetype. Consortium-owned data marketplaces apply the archetypes of aggregating data marketplace with additional brokering service and consulting data marketplace. Independently owned data marketplaces with market orientation apply the facilitating data marketplace archetype. Our results provide a basis for configurational theory that explains the performance of data marketplace business models. Our results also provide a basis for specifying boundary conditions for theory on data marketplace business models, as, for instance, the importance of network effects differs strongly between the archetypes.
How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing
Authors: Samuel Sousa and Roman Kern
Deep learning (DL) models for natural language processing (NLP) tasks often handle private data, demanding protection against breaches and disclosures. Data protection laws, such as the European Union’s General Data Protection Regulation (GDPR), thereby enforce the need for privacy. Although many privacy-preserving NLP methods have been proposed in recent years, no categories to organize them have been introduced yet, making it hard to follow the progress of the literature. To close this gap, this article systematically reviews over sixty DL methods for privacy-preserving NLP published between 2016 and 2020, covering theoretical foundations, privacy-enhancing technologies, and analysis of their suitability for real-world scenarios. First, we introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods. Second, we present an extensive summary of privacy threats, datasets for applications, and metrics for privacy evaluation. Third, throughout the review, we describe privacy issues in the NLP pipeline in a holistic view. Further, we discuss open challenges in privacy-preserving NLP regarding data traceability, computation overhead, dataset size, the prevalence of human biases in embeddings, and the privacy-utility tradeoff. Finally, this review presents future research directions to guide successive research and development of privacy-preserving NLP models.
A Systematic Review of Data Management Platforms
Authors: Michael Boch, Stefan Gindl, Alan Barnett, George Margetis, Victor Mireles, Emmanouil Adamakis, and Petr Knoth
This paper systematically reviews a set of well-established data management platforms and compares their functionality. We derived an initial criteria catalogue from existing research work and extended it based on input gathered through several expert interviews. Finally, we applied this criteria catalogue to a set of data management platforms. The contribution of this work is (i) an up-to-date criteria catalogue to systematically assess the feature-richness of data management platforms, generalizable to related use cases (e.g. data markets), and (ii) the systematic review of a selected set of data management platforms along these criteria. This work lays the foundation for future research in this area and is subject to periodic re-evaluation to also include developments and improvements of the platforms.
Recommendations in a Multi-Domain Setting: Adapting for Customization, Scalability and Real-Time Performance
Authors: Dominik Kowald and Emanuel Lacic
In this industry talk at ECIR’2022, we illustrate how to build a modern recommender system that can serve recommendations in real-time for a diverse set of application domains. Specifically, we present our system architecture that utilizes popular recommendation algorithms from the literature such as Collaborative Filtering, Content-based Filtering as well as various neural embedding approaches (e.g., Doc2Vec, Autoencoders, etc.). We showcase the applicability of our system architecture using two real-world use-cases, namely providing recommendations for the domains of (i) job marketplaces, and (ii) entrepreneurial start-up founding. We strongly believe that our experiences from both research- and industry-oriented settings should be of interest for practitioners in the field of real-time multi-domain recommender systems.
Popularity Bias in Collaborative Filtering-Based Multimedia Recommender Systems
Authors: Dominik Kowald and Emanuel Lacic
Multimedia recommender systems suggest media items, e.g., songs, (digital) books and movies, to users by utilizing concepts of traditional recommender systems such as collaborative filtering. In this paper, we investigate a potential issue of such collaborative filtering-based multimedia recommender systems, namely popularity bias, which leads to the underrepresentation of unpopular items in the recommendation lists. Therefore, we study four multimedia datasets, i.e., LastFm, MovieLens, BookCrossing and MyAnimeList, that we each split into three user groups differing in their inclination to popularity, i.e., LowPop, MedPop and HighPop. Using these user groups, we evaluate four collaborative filtering-based algorithms with respect to popularity bias on the item and the user level. Our findings are three-fold: firstly, we show that users with little interest in popular items tend to have large user profiles and thus, are important data sources for multimedia recommender systems. Secondly, we find that popular items are recommended more frequently than unpopular ones. Thirdly, we find that users with little interest in popular items receive significantly worse recommendations than users with medium or high interest in popularity.
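A common way to quantify this kind of bias is to compare the average popularity of the items a recommender serves against the average popularity of what users actually consumed. The following is an illustrative toy sketch with invented logs (not the paper's datasets or metrics):

```python
from collections import Counter

# Toy logs: each user's consumption history and the items recommended
# to them (all IDs invented).
history = {
    "u1": ["a", "a", "b"], "u2": ["a", "c"], "u3": ["a", "b", "d"],
}
recommendations = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["a", "b"]}

play_counts = Counter(i for items in history.values() for i in items)
total = sum(play_counts.values())

def avg_popularity(item_lists):
    """Mean relative popularity of the items in the given lists."""
    items = [i for lst in item_lists for i in lst]
    return sum(play_counts[i] / total for i in items) / len(items)

profile_pop = avg_popularity(history.values())
rec_pop = avg_popularity(recommendations.values())
# rec_pop exceeding profile_pop indicates the recommender over-serves
# popular items relative to what users actually consume.
```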
Multi-Party Computation in the GDPR
Authors: Lukas Helminger and Christian Rechberger
The EU GDPR has two main goals: Protecting individuals from personal data abuse and simplifying the free movement of personal data. Privacy-enhancing technologies promise to fulfill both goals simultaneously. A particularly effective and versatile technology solution is multi-party computation (MPC). It allows protecting data during a computation involving multiple parties. This paper aims for a better understanding of the role of MPC in the GDPR. Although MPC is relatively mature, little research has been dedicated to its GDPR compliance. First, we try to give an understanding of MPC for legal scholars and policymakers. Then, we examine the relevant GDPR provisions regarding MPC with a technical audience in mind. Finally, we devise a test that can assess the impact of a given MPC solution with regard to the GDPR. The test consists of several questions, which a controller can answer without the help of a technical or legal expert. Going through the questions will classify the MPC solution as (1) a means of avoiding the GDPR, (2) Data Protection by Design, or (3) having no legal benefits. Two concrete case studies should provide a blueprint on how to apply the test. We hope that this work also contributes to an interdisciplinary discussion of MPC certification and standardization.
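The core MPC idea of "protecting data during a computation" can be illustrated with additive secret sharing, the simplest MPC building block. In this toy sketch (the scenario and numbers are invented, and real protocols additionally need secure channels and malicious-party protections), three parties jointly compute a sum without any party seeing an individual input:

```python
import secrets

P = 2**61 - 1  # public prime modulus; all arithmetic is done mod P

def share(value, n=3):
    """Split a value into n additive shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % P)
    return shares

# Three hospitals jointly compute a total patient count; each input is
# split into shares, and every share alone is uniformly random.
inputs = [120, 75, 230]
all_shares = [share(v) for v in inputs]

# Each party i sums the i-th share of every input locally ...
partial_sums = [sum(col) % P for col in zip(*all_shares)]
# ... and only these partial sums are revealed and combined.
total = sum(partial_sums) % P
assert total == sum(inputs)
```

Because each individual share is uniformly random, revealing the partial sums discloses only the aggregate, which is exactly the property the paper's GDPR analysis hinges on.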
What Drives Readership? An Online Study on User Interface Types and Popularity Bias Mitigation in News Article Recommendations
Authors: Emanuel Lacic, Leon Fadljevic, Franz Weissenboeck, Stefanie Lindstaedt, Dominik Kowald
Personalized news recommender systems support readers in finding the right and relevant articles in online news platforms. In this paper, we discuss the introduction of personalized, content-based news recommendations on DiePresse, a popular Austrian online news platform, focusing on two specific aspects: (i) user interface type, and (ii) popularity bias mitigation. Therefore, we conducted a two-week online study that started in October 2020, in which we analyzed the impact of recommendations on two user groups, i.e., anonymous and subscribed users, and three user interface types, i.e., on a desktop, mobile and tablet device. With respect to user interface types, we find that the probability of a recommendation to be seen is the highest for desktop devices, while the probability of interacting with recommendations is the highest for mobile devices. With respect to popularity bias mitigation, we find that personalized, content-based news recommendations can lead to a more balanced distribution of news articles' readership popularity in the case of anonymous users. Apart from that, we find that significant events (e.g., the COVID-19 lockdown announcement in Austria and the Vienna terror attack) influence the general consumption behavior of popular articles for both anonymous and subscribed users.
Privacy in Open Search: A Review of Challenges and Solutions
Authors: Samuel Sousa, Christian Guetl, Roman Kern
Privacy is of worldwide concern regarding activities and processes that include sensitive data. For this reason, many countries and territories have been recently approving regulations controlling the extent to which organizations may exploit data provided by people. Artificial intelligence areas, such as machine learning and natural language processing, have already successfully employed privacy-preserving mechanisms in order to safeguard data privacy in a vast number of applications. Information retrieval (IR) is likewise prone to privacy threats, such as attacks and unintended disclosures of documents and search history, which may cripple the security of users and be penalized by data protection laws. This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data. Our contribution is threefold: firstly, we present an overview of privacy threats to IR tasks; secondly, we discuss applicable privacy-preserving mechanisms which may be employed in solutions to restrain privacy hazards; finally, we bring insights on the tradeoffs between privacy preservation and utility performance for IR tasks.
Position Paper on Simulating Privacy Dynamics in Recommender Systems
Authors: Peter Müllner, Elisabeth Lex, Dominik Kowald
In this position paper, we discuss the merits of simulating privacy dynamics in recommender systems. We study this issue at hand from two perspectives: Firstly, we present a conceptual approach to integrate privacy into recommender system simulations, whose key elements are privacy agents. These agents can enhance users’ profiles with different privacy preferences, e.g., their inclination to disclose data to the recommender system. Plus, they can protect users’ privacy by guarding all actions that could be a threat to privacy. For example, agents can prohibit a user’s privacy-threatening actions or apply privacy-enhancing techniques, e.g., Differential Privacy, to make actions less threatening. Secondly, we identify three critical topics for future research in privacy-aware recommender system simulations: (i) How could we model users’ privacy preferences and protect users from performing any privacy-threatening actions? (ii) To what extent do privacy agents modify the users’ document preferences? (iii) How do privacy preferences and privacy protections impact recommendations and privacy of others? Our conceptual privacy-aware simulation approach makes it possible to investigate the impact of privacy preferences and privacy protection on the micro-level, i.e., a single user, but also on the macro-level, i.e., all recommender system users. With this work, we hope to present perspectives on how privacy-aware simulations could be realized, such that they enable researchers to study the dynamics of privacy within a recommender system.
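One way such a privacy agent could be sketched (this is an illustrative toy, not the paper's conceptual model: the class name, parameters, and the choice of the Laplace mechanism are our assumptions) is as a guard that either withholds a user's rating or releases a differentially private, noisy version of it:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample from Laplace(0, scale) via inverse-transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

class PrivacyAgent:
    """Hypothetical agent guarding a user's ratings: it withholds them
    or releases Laplace-noised versions (standard Laplace mechanism)."""
    def __init__(self, epsilon, disclose, seed=0):
        self.epsilon = epsilon    # privacy budget per released rating
        self.disclose = disclose  # user's inclination to disclose at all
        self.rng = random.Random(seed)

    def release(self, rating, sensitivity=4.0):
        # Ratings on a 1-5 scale differ by at most 4 between neighbours.
        if self.rng.random() > self.disclose:
            return None           # agent withholds the rating entirely
        return rating + laplace_noise(sensitivity / self.epsilon, self.rng)

agent = PrivacyAgent(epsilon=1.0, disclose=0.8)
released = [agent.release(r) for r in [5, 3, 4, 1, 2]]
```

Varying `epsilon` and `disclose` per simulated user is one way the micro-level (one user's protection) and macro-level (overall recommendation quality) effects described above could be studied.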
Context dependent trade-offs around platform-to-platform openness: The case of the Internet of Things
Authors: Lars Mosterd, Vladimir C.M. Sobota, Geerten van de Kaa, Aaron Yi Ding, Mark de Reuver
As digital platforms are dominating the digital economy, complex ecologies of platforms are emerging. While the openness of digital platforms is an important theme in platform studies, the openness between platforms has hardly been studied. This paper explores factors that affect decisions by platform owners to open their platforms to other platforms. The focus is on Internet-of-Things platforms for automotive and healthcare applications. According to the findings, platform owners make trade-offs on whether to open up on a case-by-case basis. We identify a complex array of factors relating to direct benefits and costs (e.g., revenues from selling platform data), indirect benefits (e.g., attractiveness of the focal platform to users) as well as strategic considerations (e.g., improving bargaining power towards other actors). How businesses make trade-offs on these factors depends on market-level context (e.g., maturity of the market and standards) and organizational context (e.g., strategic focus and business objectives). Our findings provide a basis for future studies on the openness between platforms, which will become increasingly important as platforms proliferate in every layer of the digital industry.
Whitepaper: TRUSTS Technology – Equipping European Data Markets with Technological Innovations
Authors: Ahmad Hemid, Ohad Arnon, Stefan Gindl, Alan Barnett, Victor Mireles-Chavez
The aim of this whitepaper is to give the project stakeholders – i.e. data providers, data consumers, similar EU project consortia, technology providers; in general, the European Data Ecosystem – an overview of the technological basis of the future data market or data market federator. TRUSTS maintains an open communication policy and would like to share its learnings from the project activities with all interested parties.
To provide a general overview of the technological developments in the project, this whitepaper explains which reference architectures TRUSTS builds on, how these have been further developed, and which innovations are necessary for the future, and thus for achieving the goals of the project proposal.
Whitepaper on the Data Governance Act
Authors: Julie Baloup, Charlotte Ducuing, Emre Bayamlıoğlu, Aliki Benmayor, Lidia Dutkiewicz, Yuliya Miadzvetskaya, Teodora Lalova, Bert Peeters
The whitepaper offers an academic perspective to the discussion on the Data Governance Act proposal ("DGA proposal"), as adopted by the European Commission in November 2020. It contains a legal analysis of the DGA proposal and includes recommendations to amend its shortcomings. The whitepaper aims to cover the full spectrum of the DGA proposal and therefore offers an in-depth analysis of its main provisions. In conclusion, the authors identify general patterns at work in the DGA proposal, namely, first, the (new) regulation of data as an object and, even more so, as an object of rights. This approach, the authors find, may exacerbate the risk of contradictions between the DGA proposal and the GDPR on the level of principles. Second, the authors discuss the relationship of the DGA proposal vis-à-vis the (regulation of) European data spaces and, more generally, its place in the two-pillars approach of the EC, between horizontal (sector-agnostic) and sectoral regulation of data. Finally, the DGA proposal is identified as a cornerstone of the new EU 'digital sovereignty' policy.
Business Data Sharing through Data Marketplaces: A Systematic Literature Review
Authors: Antragama Ewa Abbas, Wirawan Agahari, Montijn Van De Ven, Anneke Zuiderwijk & Mark De Reuver
Data marketplaces are expected to play a crucial role in tomorrow's data economy but hardly achieve commercial exploitation. Currently, there is no clear understanding of the knowledge gaps in data marketplace research, especially neglected research topics that may contribute to advancing data marketplaces towards commercialization. This study provides an overview of the state of the art of data marketplace research. We employ a Systematic Literature Review (SLR) approach and structure our analysis using the Service-Technology-Organization-Finance (STOF) model. We find that the extant data marketplace literature is primarily dominated by technical research, such as discussions about computational pricing and architecture. To move past the first stage of the platform's lifecycle (i.e., platform design) to the second stage (i.e., platform adoption), we call for empirical research in non-technological areas, such as customer expected value and market segmentation.
Creating a Taxonomy of Business Models for Data Marketplaces
Authors: Montijn Van de Ven, Antragama Ewa Abbas, Zenlin Kwee, & Mark De Reuver
Data marketplaces can fulfil a key role in realizing the data economy by enabling the commercial trading of data between organizations. Although data marketplace research is a quickly evolving domain, there is a lack of understanding about data marketplace business models. As data marketplaces are vastly different, a taxonomy of data marketplace business models is developed in this study. A standard taxonomy development method is followed to develop the taxonomy. The final taxonomy comprises 4 meta-dimensions, 17 business model dimensions and 59 business model characteristics. The taxonomy can be used to classify data marketplace business models and sheds light on how data marketplaces are a unique type of digital platform. The results of this research provide a basis for theorizing in this rapidly evolving and increasingly important domain.
Why open government data initiatives fail to achieve their objectives: categorizing and prioritizing barriers through a global survey
Authors: Anneke Zuiderwijk & Mark de Reuver
Existing overviews of barriers for openly sharing and using government data are often conceptual or based on a limited number of cases. Furthermore, it is unclear what categories of barriers are most obstructive for attaining open data objectives. This paper aims to categorize and prioritize barriers for openly sharing and using government data based on many existing Open Government Data Initiatives (OGDIs).
Robustness of Meta Matrix Factorization Against Strict Privacy Constraints
Authors: Peter Muellner, Dominik Kowald, Elisabeth Lex
In this paper, we explore the reproducibility of MetaMF, a meta matrix factorization framework introduced by Lin et al. MetaMF employs meta learning for federated rating prediction to preserve users’ privacy. We reproduce the experiments of Lin et al. on five datasets, i.e., Douban, Hetrec-MovieLens, MovieLens 1M, Ciao, and Jester. Also, we study the impact of meta learning on the accuracy of MetaMF’s recommendations. Furthermore, in our work, we acknowledge that users may have different tolerances for revealing information about themselves. Hence, in a second strand of experiments, we investigate the robustness of MetaMF against strict privacy constraints. Our study illustrates that we can reproduce most of Lin et al.’s results. Plus, we provide strong evidence that meta learning is essential for MetaMF’s robustness against strict privacy constraints.
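For readers unfamiliar with the underlying task, the rating prediction that MetaMF performs in a federated, meta-learned fashion rests on ordinary matrix factorization. The sketch below is a minimal centralized baseline on an invented toy rating matrix (it does not reproduce MetaMF's federated or meta-learning components):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy rating matrix (0 = unobserved). MetaMF learns such factors in a
# federated, privacy-preserving way; this is the plain centralized version.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
mask = R > 0

k, lr, reg = 2, 0.01, 0.02
U = rng.normal(scale=0.5, size=(R.shape[0], k))   # user factors
V = rng.normal(scale=0.5, size=(R.shape[1], k))   # item factors

def loss():
    err = (R - U @ V.T) * mask
    return (err ** 2).sum()

initial = loss()
for _ in range(3000):
    err = (R - U @ V.T) * mask      # error on observed entries only
    U += lr * (err @ V - reg * U)   # gradient step on user factors
    V += lr * (err.T @ U - reg * V) # gradient step on item factors
final = loss()

# Unobserved cells of U @ V.T now serve as rating predictions.
```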
The recent case law of the CJEU on (joint) controllership: have we lost the purpose of ‘purpose’?
Authors: Ducuing Charlotte, Schroers Jessica
‘Purpose’ is part of the definition of ‘controller’ and a cornerstone of the GDPR. Although the recent case law of the CJEU on (joint) controllership, Wirtschaftsakademie, Jehovan todistajat and Fashion ID, has been much discussed in the legal literature, little has been said about how it relates to ‘purpose’. Therefore, this paper analyses whether, in ruling about (joint) controllership, the Court (sufficiently) took into account the overall nature and functions of the notion of ‘purpose’ in the GDPR.
Practice and Challenges of (De-)Anonymisation for Data Sharing
Authors: Alexandros Bampoulidis, Alessandro Bruni, Ioannis Markopoulos, Mihai Lupu
Personal data is a necessity in many fields for research and innovation purposes, and when such data is shared, the data controller carries the responsibility of protecting the privacy of the individuals contained in their dataset. The removal of direct identifiers, such as full name and address, is not enough to secure the privacy of individuals as shown by de-anonymisation methods in the scientific literature. Data controllers need to become aware of the risks of de-anonymisation and apply the appropriate anonymisation measures before sharing their datasets, in order to comply with privacy regulations. To address this need, we defined a procedure that makes data controllers aware of the de-anonymisation risks and helps them in deciding the anonymisation measures that need to be taken in order to comply with the General Data Protection Regulation (GDPR). We showcase this procedure with a customer relationship management (CRM) dataset provided by a telecommunications provider. Finally, we recount the challenges we identified during the definition of this procedure and by putting existing knowledge and tools into practice.
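A basic building block of such anonymisation procedures is checking k-anonymity: every combination of quasi-identifier values must be shared by at least k records. The sketch below is an illustrative toy (the records, fields, and the simple generalization rules are invented, not the paper's CRM dataset or procedure):

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size over the quasi-identifier columns."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Records with direct identifiers already removed; age and ZIP code remain
# as quasi-identifiers that could still enable re-identification.
records = [
    {"age": 34, "zip": "1010", "plan": "prepaid"},
    {"age": 36, "zip": "1015", "plan": "contract"},
    {"age": 51, "zip": "1200", "plan": "contract"},
    {"age": 58, "zip": "1230", "plan": "prepaid"},
]

def generalize(records):
    """Coarsen quasi-identifiers (10-year age bands, 2-digit ZIP prefixes)."""
    return [{**r, "age": r["age"] // 10 * 10, "zip": r["zip"][:2]}
            for r in records]

raw_k = k_anonymity(records, ["age", "zip"])          # every record unique
generalized_k = k_anonymity(generalize(records), ["age", "zip"])
```

Generalization trades detail for safety: coarser quasi-identifiers merge records into larger equivalence classes, raising k at the cost of data utility.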