Open scientific data

Open scientific data or open research data is a type of open data focused on making observations and results of scientific activities available for anyone to analyze and reuse. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to look at the reproducibility of results,[1] and to allow data from many sources to be integrated to give new knowledge.[2]

The modern concept of scientific data emerged in the second half of the 20th century, with the development of large knowledge infrastructures to compute scientific information and observations. The sharing and distribution of data was identified early on as an important issue but was impeded by the technical limitations of the infrastructure and the lack of common standards for data communication. The World Wide Web was immediately conceived as a universal protocol for the sharing of scientific data, especially data coming from high-energy physics.

Definition

Scientific data

The concept of open scientific data has developed in parallel with the concept of scientific data.

Scientific data was not formally defined until the late 20th century. Before the generalization of computational analysis, data was mostly an informal term, frequently used interchangeably with knowledge or information.[3] Institutional and epistemological discourses favored alternative concepts and outlooks on scientific activities: "Even histories of science and epistemology comments, mention data only in passing. Other foundational works on the making of meaning in science discuss facts, representations, inscriptions, and publications, with little attention to data per se."[4]

The first influential policy definition of scientific data appeared as late as 1999, when the National Academies of Science described data as "facts, letters, numbers or symbols that describe an object, condition, situation or other factors".[5] Terminologies have continued to evolve: in 2011, the National Academies updated the definition to include a wide variety of datafied objects such as "spectrographic, genomic sequencing, and electron microscopy data; observational data, such as remote sensing, geospatial, and socioeconomic data; and other forms of data either generated or compiled, by humans or machines" as well as "digital representation of literature".[5]

While the forms and shapes of data remain expansive and unsettled, standard definitions and policies have recently tended to restrict scientific data to computational or digital data.[6] The open data pilot of Horizon 2020 was deliberately restricted to digital research data: "'Digital research data' is information in digital form (in particular facts or numbers), collected to be examined and used as a basis for reasoning, discussion or calculation; this includes statistics, results of experiments, measurements, observations resulting from fieldwork, survey results, interview recordings and images".[7]

Overall, the status of scientific data remains a flexible point of discussion among individual researchers, communities and policy-makers: "In broader terms, whatever 'data' is of interest to researchers should be treated as 'research data'".[6] Important policy reports, like the 2012 collective synthesis of the National Academies of Science on data citation, have intentionally adopted a relative and nominalist definition of data: "we will devote little time to definitional issues (e.g., what are data?), except to acknowledge that data often exist in the eyes of the beholder."[8] For Christine Borgman, the main issue is not to define scientific data ("what are data") but to contextualize the point where data became a focal point of discussion within a discipline, an institution or a national research program ("when are data").[9] In the 2010s, the expansion of available data sources and the sophistication of data analysis methods have expanded the range of disciplines primarily affected by data management issues to "computational social science, digital humanities, social media data, citizen science research projects, and political science."[10]

Open scientific data

Opening and sharing have both been major topics of discussion in regard to scientific data management, but also a motivation to make data emerge as a relevant issue within an institution, a discipline or a policy framework.

For Paul Edwards, whether or not to share the data, to what extent it should be shared and with whom have been major causes of data friction, which revealed the otherwise hidden infrastructures of science: "Edwards' metaphor of data friction describes what happens at the interfaces between data 'surfaces': the points where data move between people, substrates, organizations, or machines (...) Every movement of data across an interface comes at some cost in time, energy, and human attention. Every interface between groups and organizations, as well as between machines, represents a point of resistance where data can be garbled, misinterpreted, or lost. In social systems, data friction consumes energy and produces turbulence and heat – that is, conflicts, disagreements, and inexact, unruly processes."[11] The opening of scientific data is both a data friction in itself and a way to collectively manage data frictions by weakening complex issues of data ownership. Scientific or epistemic cultures have been acknowledged as primary factors in the adoption of open data policies: "data sharing practices would be expected to be community-bound and largely determined by epistemic culture."[12]

In the 2010s, new concepts were introduced by scientists and policy-makers to define open scientific data more accurately. Since its introduction in 2016, FAIR data has become a major focus of open research policies. The acronym describes an ideal type of Findable, Accessible, Interoperable, and Reusable data. Open scientific data has been categorized as a commons or a public good, which is primarily maintained, enriched and preserved by collective rather than individual action: "What makes collective action useful in understanding scientific data sharing is its focus on how the appropriation of individual gains is determined by adjusting the costs and benefits that accrue with contributions to a common resource".[13]

History

Development of knowledge infrastructures (1945-1960)

 
Punch-card storage in the US National Weather Records Center in Asheville (early 1960s). Data holdings had expanded so much that the entrance hall had to be used as a storage facility.

The emergence of scientific data is associated with a semantic shift in the way core scientific concepts like data, information and knowledge are commonly understood.[14] Following the development of computing technologies, data and information are increasingly described as "things":[15] "Like computation, data always have a material aspect. Data are things. They are not just numbers but also numerals, with dimensionality, weight, and texture".[16]

After the Second World War, large scientific projects increasingly relied on knowledge infrastructures to collect, process and analyze large amounts of data. Punch-card systems were first used experimentally on climate data in the 1920s and were applied on a large scale in the following decade: "In one of the first Depression-era government make-work projects, Civil Works Administration workers punched some 2 million ship log observations for the period 1880–1933."[17] By 1960, the meteorological data collections of the US National Weather Records Center had expanded to 400 million cards and had a global reach. The physicality of scientific data was by then fully apparent and threatened the stability of entire buildings: "By 1966 the cards occupied so much space that the Center began to fill its main entrance hall with card storage cabinets (figure 5.4). Officials became seriously concerned that the building might collapse under their weight".[18]

By the end of the 1960s, knowledge infrastructures had been embedded in a varied set of disciplines and communities. The first initiative to create an electronic bibliographic database of open access data was the Educational Resources Information Center (ERIC) in 1966. In the same year, MEDLINE was created: a free-access online database managed by the National Library of Medicine and the National Institutes of Health (USA), with bibliographical citations from journals in the biomedical area. It would later be called PubMed and currently holds over 14 million complete articles.[19] Knowledge infrastructures were also set up in space engineering (with NASA/RECON), library search (with OCLC WorldCat) and the social sciences: "The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection".[20]

Opening and sharing data: early attempts (1960-1990)

Early discourses and policy frameworks on open scientific data emerged immediately in the wake of the creation of the first large knowledge infrastructures. The World Data Center system (now the World Data System) aimed to make observation data more readily available in preparation for the International Geophysical Year of 1957–1958.[21] The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form.[22] In 1966, the International Council for Science created CODATA, an initiative to "promote cooperation in data management and use".[23]

These early forms of open scientific data did not develop much further. There were too many data frictions and too much technical resistance to the integration of external data to implement a durable ecosystem of data sharing. Data infrastructures were mostly invisible to researchers, as most searches were performed by professional librarians. Not only were the search systems complicated to use, but searches also had to be performed very efficiently given the prohibitive cost of long-distance telecommunication.[24] While their designers had originally anticipated direct use by researchers, this could not really emerge because of technical and economic impediments:

The designers of the first online systems had presumed that searching would be done by end users; that assumption undergirded system design. MEDLINE was intended to be used by medical researchers and clinicians, NASA/RECON was designed for aerospace engineers and scientists. For many reasons, however, most users through the seventies were librarians and trained intermediaries working on behalf of end users. In fact, some professional searchers worried that even allowing eager end users to get at the terminals was a bad idea.[25]

Christine Borgman does not recall any significant policy debates over the meaning, the production and the circulation of scientific data, save for a few specific fields (like climatology), after 1966.[23] The insulated scientific infrastructures could hardly be connected before the advent of the web.[26] Projects and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols".[27] Communication between scientific infrastructures was not only challenging across space, but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated were likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".[28]

Sharing scientific data on the web (1990-1995)

The World Wide Web was originally conceived as an infrastructure for open scientific data. Sharing of data and data documentation was a major focus in the initial communication of the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data".[29]

The project stemmed from a closed knowledge infrastructure, ENQUIRE. It was an information management software commissioned from Tim Berners-Lee by CERN for the specific needs of high energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected "nodes" that "could refer to a person, a software module, etc. and that could be interlined with various relations such as made, include, describes and so forth".[30] While it "facilitated some random linkage between information", ENQUIRE was not able to "facilitate the collaboration that was desired for in the international high-energy physics research community".[31] Like any significant scientific computing infrastructure before the 1990s, the development of ENQUIRE was ultimately impeded by the lack of interoperability and the complexity of managing network communications: "although Enquire provided a way to link documents and databases, and hypertext provided a common format in which to display them, there was still the problem of getting different computers with different operating systems to communicate with each other".[27]

The web rapidly superseded pre-existing closed infrastructures for scientific data, even when they included more advanced computing features. From 1991 to 1994, users of the Worm Community System, a major biology database on worms, switched to the Web and Gopher. While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the Worm Community System could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources (…) The World-Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services."[32]

Publication on the web completely changed the economics of data publishing. While in print "the cost of reproducing large datasets is prohibitive", the storage cost of most datasets is low.[33] In this new editorial environment, the main limiting factors for data sharing are no longer technical or economic but social and cultural.

Defining open scientific data (1995-2010)

The development and the generalization of the World Wide Web lifted numerous technical barriers and frictions that had constrained the free circulation of data. Yet scientific data still had to be defined, and new research policies had to be implemented to realize the original vision of a web of data laid out by Tim Berners-Lee. At this point, scientific data was largely defined through the process of opening scientific data, as the implementation of open policies created new incentives for setting up actionable guidelines, principles and terminologies.

Climate research has been a pioneering field in the conceptual definition of open scientific data, as it was in the construction of the first large knowledge infrastructures in the 1950s and the 1960s. In 1995 the GCDIS articulated a clear commitment in On the Full and Open Exchange of Scientific Data: "International programs for global change research and environmental monitoring crucially depend on the principle of full and open data exchange (i.e., data and information are made available without restriction, on a non-discriminatory basis, for no more than the cost of reproduction and distribution)."[34] The expansion of the scope and the management of knowledge infrastructures also created incentives to share data, as the "allocation of data ownership" between a large number of individual and institutional stakeholders became increasingly complex.[35] Open data creates a simplified framework to ensure that all contributors and users of the data have access to it.[35]

Open data was rapidly identified as a key objective of the emerging open science movement. While initially focused on publications and scholarly articles, the international initiatives in favor of open access expanded their scope to all the main scientific productions.[36] In 2003 the Berlin Declaration supported the diffusion of "original scientific research results, raw data and metadata, source materials and digital representations of pictorial and graphical and scholarly multimedia materials".

After 2000, international organizations, like the OECD (Organisation for Economic Co-operation and Development), have played an instrumental role in devising generic and transdisciplinary definitions of scientific data, as open data policies have to be implemented beyond the specific scale of a discipline or a country.[5] One of the first influential definitions of scientific data was coined in 1999[5] by a report of the National Academies of Science: "Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors".[37] In 2004, the Science Ministers of all nations of the OECD signed a declaration which essentially states that all publicly funded archive data should be made publicly available.[38] In 2007 the OECD "codified the principles for access to research data from public funding"[39] through the Principles and Guidelines for Access to Research Data from Public Funding, which defined scientific data as "factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings."[40] The Principles acted as a soft-law recommendation and affirmed that "access to research data increases the returns from public investment in this area; reinforces open scientific inquiry; encourages diversity of studies and opinion; promotes new areas of work and enables the exploration of topics not envisioned by the initial investigators."[41]

Policy implementations (2010-…)

After 2010, national and supra-national institutions took a more interventionist stance. New policies have been implemented to ensure and incentivize the opening of scientific data, usually as a continuation of existing open data programs. In Europe, the "European Union Commissioner for Research, Science, and Innovation, Carlos Moedas made open research data one of the EU's priorities in 2015."[10]

First published in 2016, the FAIR Guiding Principles[2] have become an influential framework for opening scientific data.[10] The principles had originally been designed two years earlier during a policy and research workshop at Lorentz, Jointly Designing a Data FAIRport.[42] During the deliberations of the workshop, "the notion emerged that, through the definition of, and widespread support for, a minimal set of community-agreed guiding principles and practice"[43]

The principles do not attempt to define scientific data, which remains a relatively plastic concept, but strive to describe "what constitutes 'good data management'".[44] They cover four foundational principles "that serve to guide data producer": Findability, Accessibility, Interoperability, and Reusability.[44] They also aim to provide a step toward machine-actionability by making the underlying semantics of data explicit.[43] As they fully acknowledge the complexity of data management, the principles do not claim to introduce a set of rigid recommendations but rather "degrees of FAIRness", which can be adjusted depending on the organizational costs but also on external restrictions with regard to copyright or privacy.[45]
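
The idea of "degrees of FAIRness" can be illustrated with a minimal sketch. It assumes a hypothetical metadata record and reduces each of the four principles to a single yes/no check. The class name, the field names, the placeholder identifier and the scoring rule are all assumptions made for illustration; they are not part of the FAIR Guiding Principles, which are considerably richer than this.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; the field names are illustrative only."""
    identifier: Optional[str] = None   # persistent identifier such as a DOI
    access_url: Optional[str] = None   # location where the data can be retrieved
    file_format: Optional[str] = None  # documented format, e.g. "text/csv"
    license: Optional[str] = None      # reuse conditions, e.g. "CC-BY-4.0"


def fair_checks(record: DatasetRecord) -> Dict[str, bool]:
    """Map each FAIR facet to a crude yes/no check (an assumption of this sketch)."""
    return {
        "findable": bool(record.identifier),
        "accessible": bool(record.access_url),
        "interoperable": bool(record.file_format),
        "reusable": bool(record.license),
    }


def fair_degree(record: DatasetRecord) -> float:
    """Return the share of facets satisfied, between 0.0 and 1.0."""
    checks = fair_checks(record)
    return sum(checks.values()) / len(checks)


# A record with an identifier and a license but no access URL or format
# (the identifier is a made-up placeholder, not a real DOI).
record = DatasetRecord(identifier="10.1234/example", license="CC-BY-4.0")
print(fair_checks(record))
print(fair_degree(record))
```

A graded score of this kind reflects the point that a dataset restricted for privacy or copyright reasons can still be partly FAIR, for instance findable and reusable under stated conditions without being openly accessible.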

The FAIR principles were quickly adopted by major international organizations: "FAIR experienced rapid development, gaining recognition from the European Union, G7, G20 and US-based Big Data to Knowledge (BD2K)".[46] In August 2016, the European Commission set up an expert group to turn "FAIR Data into reality".[47] As of 2020, the FAIR principles remain "the most advanced technical standards for open scientific data to date".[48]

In 2022, the French Open Science Monitor started to publish an experimental survey of research data publications based on text mining tools. Retrospective analysis showed that the rate of publications mentioning sharing of their associated data had nearly doubled in 10 years, from 13% (in 2013) to 22% (in 2021).[49]

By the end of the 2010s, open data policies were well supported by scientific communities. Two large surveys commissioned by the European Commission in 2016 and 2018 found a commonly perceived benefit: "74% of researchers say that having access to other data would benefit them".[50] Yet more qualitative observations gathered in the same investigation also showed that "what scientists proclaim ideally, versus what they actually practice, reveals a more ambiguous situation."[50]

Diffusion of scientific data

Publication and edition

Until the 2010s, the publication of scientific data referred mostly to "the release of datasets associated with an individual journal article".[51] This release is documented by a Data Availability Statement or DAS. Several typologies of data availability statements have been proposed.[52][53] In 2021, Colavizza et al. identified three categories or levels of access:

  • DAS 1: "Data available on request or similar"[54]
  • DAS 2: "Data available with the paper and its supplementary files"[54]
  • DAS 3: "Data available in a repository"[54]
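
As a rough illustration of how these three levels differ in practice, the sketch below assigns a statement to a level using simple keyword matching. This is not the method used by Colavizza et al.; the function name, the keyword patterns (including the repository names) and the fallback value 0 for unclassifiable statements are assumptions made for this example.

```python
import re

# Illustrative keyword patterns (assumptions of this sketch), checked from
# the most open level to the least open one.
REPOSITORY = re.compile(r"repositor|doi\.org|zenodo|dryad|figshare")
WITH_PAPER = re.compile(r"supplementa|supporting information|within the (paper|article)")
ON_REQUEST = re.compile(r"(up)?on (reasonable )?request")


def classify_das(statement: str) -> int:
    """Return 3, 2 or 1 for the DAS levels listed above, or 0 if none matches."""
    text = statement.lower()
    if REPOSITORY.search(text):
        return 3  # data available in a repository
    if WITH_PAPER.search(text):
        return 2  # data available with the paper and its supplementary files
    if ON_REQUEST.search(text):
        return 1  # data available on request or similar
    return 0      # no recognizable statement


examples = [
    "Data are available from the corresponding author upon reasonable request.",
    "All data are in the paper and its supplementary files.",
    "Data deposited in the Dryad repository.",
]
for text in examples:
    print(classify_das(text), text)
```

Keyword matching of this kind misclassifies statements with unusual wording, so it should be read only as a way to make the three categories concrete.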

Supplementary data files appeared in the early phase of the transition to scientific digital publishing. While the format of publications has largely kept the constraints of print, additional materials could be included in "supplementary information".[33] As a publication, supplementary data files have an ambiguous status. In theory they are meant to be raw documents, giving access to the background of research. In practice, the released datasets often have to be specially curated for publication. They will usually focus on the primary data sources, not on the entire range of observations or measurements done for the purpose of the research: "Identifying what are "the data" associated with any individual article, conference paper, book, or other publication is often difficult [as] investigators collect data continually."[55] The selection of the data is further influenced by the publisher. The editorial policy of the journal largely determines what "goes in the main text, what in the supplemental information", and editors are especially wary of including large datasets which may be difficult to maintain in the long run.[55]

Scientific datasets have been increasingly acknowledged as autonomous scientific publications. The assimilation of data to academic articles aimed to increase the prestige and recognition of published datasets: "implicit in this argument is that familiarity will encourage data release".[51] This approach has been favored by several publishers and repositories, as it made it possible to easily integrate data in existing publishing infrastructure and to extensively reuse editorial concepts initially created around articles.[51] Data papers were explicitly introduced as "a mechanism to incentivize data publishing in biodiversity science".[56]

Citation and indexation

The first digital databases of the 1950s and the 1960s immediately raised issues of citability and bibliographic description.[57] The mutability of computer memory was especially challenging: in contrast with printed publications, digital data could not be expected to remain stable in the long run. In 1965, Ralph Bisco underlined that this uncertainty affected all the associated documents, like code notebooks, which may become increasingly out of date. Data management has to find a middle ground between continuous enhancements and some form of generic stability: "the concept of a fluid, changeable, continually improving data archive means that study cleaning and other processing must be carried to such a point that changes will not significantly affect prior analyses".[58]

Structured bibliographic metadata for databases has been a debated topic since the 1960s.[57] In 1977, the American Standard for Bibliographic Reference adopted a definition of "data file" with a strong focus on the materiality and the mutability of the dataset: neither dates nor authors were indicated, but the medium or "Packaging Method" had to be specified.[59] Two years later, Sue Dodd introduced an alternative convention that brought the citation of data closer to the standard of references for other scientific publications.[57] Dodd's recommendation included the use of titles, authors, editions and dates, as well as alternative mentions for sub-documentation like code notebooks.[60]

The indexation of datasets has been radically transformed by the development of the web, as barriers to data sharing were substantially reduced.[57] In this process, data archiving, sustainability and persistence have become critical issues. Permanent digital object identifiers (or DOIs) were introduced for scientific articles to avoid broken links as website structures continuously evolved. In the early 2000s, pilot programs started to allocate DOIs to datasets as well.[61] While it solves concrete issues of link sustainability, the creation of data DOIs and norms of data citation is also part of a legitimization process that assimilates datasets to standard scientific publications and can draw from similar sources of motivation (like the bibliometric indexes).[62]

Accessible and findable datasets yield a significant citation advantage. A 2021 study of 531,889 articles published by PLOS estimated that there is a "25.36% relative gain in citation counts in general" for a journal article with "a link to archived data in a public repository".[63] Diffusion of data as supplementary material does not yield a significant citation advantage, which suggests that "the citation advantage of DAS [Data Availability Statement] is not as much related to their mere presence, but to their contents".[64]

As of 2022, the recognition of open scientific data is still an ongoing process. The leading reference software Zotero does not yet have a specific item type for datasets.

Reuse and economic impact

Within academic research, storage and redundancy have proven to be a significant benefit of open scientific data. In contrast, non-open scientific data is weakly preserved and can "be retrieved only with considerable effort by the authors", if it is not completely lost.[65]

Analyses of the uses of open scientific data run into the same issues as for any open content: while free, universal and indiscriminate access has demonstrably expanded the scope, range and intensity of the reception, it has also made it harder to track, due to the lack of a transaction process.

These issues are further complicated by the novelty of data as a scientific publication: "In practice, it can be difficult to monitor data reuse, mainly because researchers rarely cite the repository".[66]

In 2018, a report of the European Commission estimated the cost of not opening scientific data in accordance with the FAIR principles: it amounted to 10.2 billion euros annually in direct impact and 16 billion euros in indirect impact over the entire innovation economy.[67] Implementing open scientific data at a global scale "would have a considerable impact on the time we spent manipulating data and the way we store data."[67]

Practices and data culture

The sharing of scientific data is rooted in scientific cultures or communities of practice. As digital tools have become widespread, the infrastructures, the practices and the common representations of research communities have increasingly relied on shared meanings of what data is and what can be done with it.[12]

Pre-existing epistemic machineries can be more or less predisposed to data sharing. Important factors may include shared values (individualistic or collective), data ownership allocation and frequent collaborations with external actors who may be reluctant to share data.[68]

The emergence of an open data culture

The development of scientific open data is not limited to scientific research. It involves a diverse set of stakeholders: "Arguments for sharing data come from many quarters: funding agencies—both public and private—policy bodies such as national academies and funding councils, journal publishers, educators, the public at large, and from researchers themselves."[69] As such, the movement for scientific open data largely intersects with more global movements for open data.[70] Standard definitions of open data used by a wide range of public and private actors have been partly elaborated by researchers around concrete scientific issues.[71] The concept of transparency has especially contributed to creating convergences between open science, open data and open government. In 2015, the OECD described transparency as a common "rationale for open science and open data".[72]

Christine Borgman has identified four major rationales for sharing data commonly used across the entire regulatory and public debate over scientific open data:[69]

  • Research reproducibility: lack of reproducibility is frequently attributed to deficiencies in research transparency and in the data analysis process. Consequently, as "a rationale for sharing research data, [research reproducibility] is powerful yet problematic".[73] Reproducibility only applies to "certain kinds of research", mostly with regard to experimental sciences.[73]
  • Public accessibility: the rationale that "products of public funding should be available to the public" is "found in arguments for open government".[74] While directly inspired by similar arguments made in favor of open access to publications, its range is more limited, as scientific open data "has direct benefits to far fewer people, and those benefits vary by stakeholder".[75]
  • Research valorization: open scientific data may bring substantial value to the private sector. This argument is especially used to support "the need for more repositories that can accept and curate research data, for better tools and services to exploit data, and for other investments in knowledge infrastructure".[75]
  • Increased research and innovation: open scientific data may significantly enhance the quality of private and public research. This argument aims for "investing in knowledge infrastructure to sustain research data, curated to high standards of professional practices".[75]

Yet collaboration between the different actors and stakeholders of the data lifecycle is partial. Even within academic institutions, cooperation remains limited: "most researchers are making [data related search] without consulting a data manager or librarian."[76]

The global open data movement partly lost its cohesiveness and identity during the 2010s, as debates over data availability and licensing were overtaken by domain-specific issues: "When the focus shifts from calling for access to data to creating data infrastructure and putting data to work, the divergent goals of those who formed an initial open data movement come clearly into view and managing the tensions that emerge can be complex."[77] The very generic scope of open data definitions, which aim to embrace a very wide set of preexisting data cultures, does not adequately take into account the higher threshold of accessibility and contextualization required by scientific research: "open data in the sense of being free for reuse is a necessary but not sufficient condition for research purposes."[78]

Ideal and implementation: the paradox of data sharing

Since the 2000s, surveys of scientific communities have underlined a consistent discrepancy between the ideals of data sharing and their implementation in practice: "When present-day researchers are asked whether they are willing to share their data, most say yes, they are willing to do so. When the same researchers are asked if they do release their data, they typically acknowledge that they have not done so".[79] Open data culture does not emerge in a vacuum and has to contend with preexisting cultures of scientific data and a range of systemic factors that can discourage data sharing: "In some fields, scholars are actively discouraged from reusing data. (…) Careers are made by charting territory that was previously uncharted."[80]

In 2011, 67% of 1,329 scientists surveyed agreed that lack of data sharing is a "major impediment to progress in science",[81] and yet "only about a third (36%) of the respondents agree that others can access their data easily".[82] In 2016, a survey of researchers in the environmental sciences found overwhelming support for easily accessible open data (99% rated it as at least somewhat important) and for institutional mandates for open data (88%).[83] Yet, "even with willingness to share data there are discrepancies with common practices, e.g. willingness to spend time and resources preparing and up-loading data".[83] A 2022 study of 1,792 data sharing statements from BioMed Central found that fewer than 7% of the authors (123) actually provided the data upon request.[84]

The prevalence of accessible and findable data is even lower: "Despite several decades of policy moves toward open access to data, the few statistics available reflect low rates of data release or deposit".[85] In a 2011 poll for Science, only 7.6% of researchers shared their data on community repositories, with local websites hosted by universities or laboratories being favored instead.[86] Consequently, "many bemoaned the lack of common metadata and archives as a main impediment to using and storing data".[86]

According to Borgman, the paradox of data sharing is partly due to the limitations of open data policies, which tend to focus on "mandating or encouraging investigators to release their data" without meeting the "expected demand for data or the infrastructure necessary to support release and reuse".[87]

Incentives and barriers to scientific open data

In 2022, Pujol Priego, Wareham and Romasanta stressed that incentives for the sharing of scientific data are primarily collective and include reproducibility, scientific efficiency and scientific quality, along with more individual rewards such as personal credit.[88] Individual benefits include increased visibility: open datasets yield a significant citation advantage, but only when they have been shared on an open repository.[63]

Important barriers include the need to publish first, legal constraints and concerns about loss of credit or recognition.[89] For individual researchers, datasets may be major assets to barter for "new jobs or new collaborations"[33] and their publication may be difficult to justify unless they "get something of value in return".[33]

Lack of familiarity with data sharing, rather than outright rejection of the principles of open science, is also ultimately a leading obstacle. Several surveys in the early 2010s showed that researchers "rarely seek data from other investigators and (…) they rarely are asked for their own data."[80] This creates a self-reinforcing cycle: researchers make little effort to ensure data sharing, which in turn discourages effective use, whereas "the heaviest demand for reusing data exists in fields with high mutual dependence."[80] The reality of data reuse may also be underestimated, as datasets are not considered a prestigious form of publication and the original sources are not cited.[90]

A 2021 empirical study of 531,889 articles published by PLOS showed that soft incentives and encouragements have a limited impact on data sharing: "journal policies that encourage rather than require or mandate DAS [Data Availability Statement] have only a small effect".[91]

Legal status

The opening of scientific data has raised a variety of legal issues with regard to ownership rights, copyright, privacy and ethics. While it is commonly considered that researchers "own the data they collect in the course of their research", this "view is incorrect":[92] the creation of a dataset potentially involves the rights of numerous additional actors such as institutions (research agencies, funders, public bodies), associated data producers, and private citizens whose personal data is included.[92] The legal situation of digital data has consequently been described as a "bundle of rights", due to the fact that the "legal category of "property" (...) is not a suitable model for dealing with the complexity of data governance problems".[93]

Copyright

Copyright was the primary focus of the legal literature on open scientific data until the 2010s. The legality of data sharing was identified early on as a crucial issue. In contrast with the sharing of scientific publications, the main impediment was not copyright but uncertainty: "the concept of 'data' [was] a new concept, created in the computer age, while copyright law emerged at the time of printed publications."[94] In theory, copyright and author rights provisions do not apply to simple collections of facts and figures. In practice, the notion of data is much more expansive and could include protected content or creative arrangements of non-copyrightable content.

The status of data in international conventions on intellectual property is ambiguous. According to Article 2 of the Berne Convention, "every production in the literary, scientific and artistic domain" is protected.[95] Yet, research data is often not an original creation entirely produced by one or several authors, but rather a "collection of facts, typically collated using automated or semiautomated instruments or scientific equipment."[95] Consequently, there is no universal convention on data copyright and debates over "the extent to which copyright applies" are still prevalent, with different outcomes depending on the jurisdiction or the specifics of the dataset.[95] This lack of harmonization stems logically from the novelty of "research data" as a key concept of scientific research: "the concept of 'data' is a new concept, created in the computer age, while copyright law emerged at the time of printed publications."[95]

In the United States, the European Union and several other jurisdictions, copyright laws have acknowledged a distinction between data itself (which can be an unprotected "fact") and the compilation of the data (which can be a creative arrangement).[95] This principle largely predates the contemporary policy debate over scientific data, as the earliest court rulings in favor of compilation rights date back to the 19th century.

In the United States, compilation rights were defined in the Copyright Act of 1976 with an explicit mention of datasets: "a work formed by the collection and assembling of pre-existing materials or of data" (§ 101).[96] In its 1991 decision Feist Publications, Inc., v. Rural Telephone Service Co., the Supreme Court clarified the extent and the limits of database copyright: the "assembling" has to be demonstrably original, and the "raw facts" contained in the compilation remain unprotected.[96]

Even in jurisdictions where the application of copyright to data outputs remains unsettled and partly theoretical, it has nevertheless created significant legal uncertainties. The frontier between a set of raw facts and an original compilation is not clearly delineated.[97] Although scientific organizations are usually well aware of copyright laws, the complexity of data rights creates unprecedented challenges.[98] After 2010, national and supra-national jurisdictions partly changed their stance with regard to the copyright protection of research data. As sharing is encouraged, scientific data has also been acknowledged as an informal public good: "policymakers, funders, and academic institutions are working to increase awareness that, while the publications and knowledge derived from research data pertain to the authors, research data needs to be considered a public good so that its potential social and scientific value can be realised".[12]

Database rights

The European Union provides one of the strongest intellectual property frameworks for data, with a double layer of rights: copyright for original compilations (similarly to the United States) and sui generis database rights.[97] Criteria for the originality of compilations have been harmonized across the member states by the 1996 Database Directive and by several major cases decided by the European Court of Justice, such as Infopaq International A/S v Danske Dagblades Forening or Football Dataco Ltd et al. v Yahoo! UK Ltd. Overall, it has been acknowledged that significant efforts in the making of the dataset are not sufficient to claim compilation rights, as the author has to "express his creativity in an original manner" in its structure.[99] The Database Directive has also introduced an original framework of protection for datasets, the sui generis rights that are conferred to any dataset that required a "substantial investment".[100] While they last 15 years, sui generis rights have the potential to become permanent, as they can be renewed for every update of the dataset.

Due to their broad scope in duration and protection, sui generis rights were initially not widely recognized in European case law, which set a high bar for their enforcement. This cautious approach was reversed in the 2010s, as the 2013 decision Innoweb BV v Wegener ICT Media BV and Wegener Mediaventions strengthened the position of database owners and condemned the reuse of non-protected data in web search engines.[101] The consolidation and expansion of database rights remain a controversial topic in European regulations, as they are partly at odds with the commitment of the European Union to a data-driven economy and open science.[101] While a few exceptions exist for scientific and pedagogic uses, they are limited in scope (no rights for further reutilization) and they have not been activated in all member states.[101]

Ownership

Copyright issues with scientific datasets have been further complicated by uncertainties regarding ownership. Research is largely a collaborative activity that involves a wide range of contributions. Initiatives like CRediT (Contributor Roles Taxonomy) have identified 14 different roles, of which 4 are explicitly related to data management (Formal Analysis, Investigation, Data curation and Visualization).[102]

In the United States, ownership of research data is usually "determined by the employer of the researcher", with the principal investigator acting as the caretaker of the data rather than the owner.[103] Until the development of open research data, US institutions were usually more reluctant to waive copyright on data than on publications, as datasets are considered strategic assets.[104] In the European Union, there is no widely agreed framework on the ownership of data.[105]

The question of additional rights for external stakeholders has also been raised, especially in the context of medical research. Since the 1970s, patients have claimed some form of ownership of the data produced in the context of clinical trials, notably with important controversies concerning "whether research subjects and patients actually own their own tissue or DNA."[104]

Privacy

Numerous scientific projects rely on the collection of data about persons, notably in medical research and the social sciences. In such cases, any policy of data sharing has to be balanced against the preservation and protection of personal data.[106]

Researchers and, more specifically, principal investigators have been subject to obligations of confidentiality in several jurisdictions.[106] Health data has been increasingly regulated since the late 20th century, either by law or by sectoral agreements. In 2014, the European Medicines Agency introduced important changes to the sharing of clinical trial data, in order to prevent the release of all personal details and all commercially relevant information. Such changes in European regulation "are likely to influence the global practice of sharing clinical trial data as open data".[107]

Research management plans and practices have to be open, transparent and confidential by design.

Free licenses

Open licenses have been the preferred legal framework to clear the restrictions and ambiguities in the legal definition of scientific data. In 2003, the Berlin Declaration called for a universal waiver of reuse rights on scientific contributions that explicitly included "raw data and metadata".[108]

In contrast with the development of open licenses for publications, which occurred over a short time frame, the creation of licenses for open scientific data has been a complicated process. Specific rights, like the sui generis database rights in the European Union, or specific legal principles, like the distinction between simple facts and original compilations, were not initially anticipated. Until the 2010s, free licenses could paradoxically add more restrictions to the reuse of datasets, especially with regard to attribution (which is not required for non-copyrighted objects like raw facts): "in such cases, when no rights are attached to research data, then there is no ground for licencing the data".[109]

To circumvent the issue, several institutions like the Harvard-MIT Data Center started to share their data in the public domain.[110] This approach ensures that no right is applied to non-copyrighted items. Yet, the public domain and some associated tools like the Public Domain Mark do not constitute a properly defined legal contract and vary significantly from one jurisdiction to another.[110] First introduced in 2009, the Creative Commons Zero (or CC0) license was immediately considered for data licensing.[111] It has since become "the recommended tool for releasing research data into the public domain".[112] In accordance with the principles of the Berlin Declaration, it is not a license but a waiver, as the producer of the data "overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights".

Alternative approaches have included the design of new free licenses to disentangle the attribution stacking specific to database rights. In 2009, the Open Knowledge Foundation published the Open Database License, which has been adopted by major online projects like OpenStreetMap. Since 2015, all the different Creative Commons licenses have been updated to become fully effective on datasets, as database rights have been explicitly anticipated in the 4.0 version.[109]

Open scientific data management

Data management has recently become a primary focus of the policy and research debate on open scientific data. The influential FAIR principles are deliberately centered on the key features of "good data management" in a scientific context.[44] In a research context, data management is frequently associated with data lifecycles. Various lifecycle models with different stages have been theorized by institutions, infrastructures and scientific communities, although "such lifecycles are a simplification of real life, which is far less linear and more iterative in practice."[113]

Integration to the research workflow

In contrast with the broad encouragement of data sharing found in the early policies in favor of open scientific data, the complexity and the underlying costs and requirements of scientific data management are increasingly acknowledged: "Data sharing is difficult to do and to justify by the return on investment."[114] Open data is not simply a supplementary task but has to be envisioned throughout the entire research process, as it "requires changes in methods and practices of research."[114]

The opening of research data creates a new settlement of costs and benefits. Public data sharing introduces a new communication setting that largely contrasts with the private exchange of data with research collaborators or partners. The collection, the purpose and the limitations of the data have to be made explicit, as it is not possible to rely on pre-existing informal knowledge: "the documentation and representations are the only means of communicating between data creator and user."[115] Lack of proper documentation means that the burden of recontextualization falls on the potential users and may ultimately render the dataset useless.[116]

Publication additionally requires further verification with regard to the ownership of the data and the potential legal liability if the data is misused. This clarification phase becomes even more complex in international research projects that may overlap several jurisdictions.[117] Data sharing and the application of open science principles also bring significant long-term advantages that may not be immediately visible. Documentation of datasets helps to clarify their chain of provenance and to ensure that the original data has not been significantly altered or, if this is the case, that all further treatments are fully documented.[118] Publication under a free license also makes it possible to delegate some tasks, such as long-term preservation, to external actors.

By the end of the 2010s, a new specialized literature on data management for research has emerged to codify the existing practices and regulatory principles.[119][120][121]

Storage and preservation

The availability of non-open scientific data decays rapidly: in 2014, a retrospective study of biological datasets showed that "the odds of a data set being reported as extant fell by 17% per year".[122] Consequently, the "proportion of data sets that still existed dropped from 100% in 2011 to 33% in 1991".[65] Data loss has also been singled out as a significant issue in major journals like Nature or Science.[123]
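
The yearly decline in odds quoted above can be turned into a rough availability estimate. The following is a minimal sketch under stated assumptions, not the method of the cited study: it assumes a constant 17% yearly drop in odds, and both the function name availability_after and the starting probability of 0.95 are hypothetical values chosen for illustration.

```python
def availability_after(years: int, start_probability: float = 0.95,
                       yearly_odds_drop: float = 0.17) -> float:
    """Estimate the probability that a dataset is still available.

    Assumes the odds of availability shrink by a constant fraction each year.
    start_probability is a hypothetical value, not taken from the study.
    """
    if not 0.0 < start_probability < 1.0:
        raise ValueError("start_probability must be strictly between 0 and 1")
    # Convert the starting probability to odds: p / (1 - p).
    odds = start_probability / (1.0 - start_probability)
    # Apply the constant yearly reduction in odds.
    odds *= (1.0 - yearly_odds_drop) ** years
    # Convert odds back to a probability: odds / (1 + odds).
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for years in (0, 5, 10, 20):
        print(years, round(availability_after(years), 3))
```

Because the decline applies to odds rather than to probabilities, the estimate falls slowly at first and faster later; with these illustrative inputs it ends a little under one third after twenty years.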

Surveys of research practices have consistently shown that storage norms, infrastructures and workflows remain unsatisfactory in most disciplines. Storage and preservation of scientific data were identified early on as critical issues, especially in relation to observational data, which are considered essential to preserve because they are the most difficult to replicate.[35] A 2017-2018 survey of 1,372 researchers contacted through the American Geophysical Union shows that only "a quarter and a fifth of the respondents" report good data storage practices.[124] Short-term and unsustainable storage remains widespread, with 61% of the respondents storing most or all of their data on personal computers.[124] Due to their ease of use at an individual scale, unsustainable storage solutions are viewed favorably in most disciplines: "This mismatch between good practices and satisfaction may show that data storage is less important to them than data collection and analysis".[124]

The 2012 edition of the reference model for an Open Archival Information System states that scientific infrastructure should aim for long-term preservation, that is "long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community".[125] Consequently, good practices of data management involve both storage (to materially preserve the data) and, even more crucially, curation, "to preserve knowledge about the data to facilitate reuse".[126]

Data sharing on public repositories has helped to mitigate preservation risks, due to the long-term commitment of data infrastructures and the potential redundancy of open data. A 2021 study of 50,000 data availability statements published in PLOS One showed that 80% of the datasets could be retrieved automatically and 98% of the datasets with a data DOI could be retrieved either automatically or manually. Moreover, accessibility did not decay significantly for older publications: "URLs and DOIs make the data and code associated with papers more likely to be available over time".[127] Significant benefits have not been found when the open data was not properly linked or documented: "Simply requiring that data be shared in some form may not have the desired impact of making scientific data FAIR, as studies have repeatedly demonstrated that many datasets that are ostensibly shared may not actually be accessible."[128]

Plan and governance

Research data management can be laid out in a data management plan or DMP.

Data management plans were introduced in 1966 for the specific needs of aeronautical and engineering research, which already faced increasingly complex data frictions.[129] These first examples were focused on material issues associated with the access, transfer and storage of the data: "Until the early 2000s, DMPs were utilised in this manner: in limited fields, for projects of great technical complexity, and for limited mid-study data collection and processing purposes".[130]

After 2000, the implementation of large research infrastructure and the development of open science have changed the scope and the purpose of data management plans. Policy-makers, rather than scientists, have been instrumental in this development: "The first publications to provide general advice and guidance to researchers around the creation of DMPs were published from 2009 following the publications from JISC and the OECD (…) DMP use, we infer, has been imposed onto the research community through external forces"[131]

Empirical studies of data practices in research have "highlighted the need for organizations to offer more formal training and assistance in data management to scientists".[132] In a 2017-2018 international survey of 1,372 scientists, most requests for help and formalization were associated with data management plans: "creating data management plans (33.3%); training on best practices in data management (31.3%); assistance on creating metadata to describe data or datasets (27.6%)".[132] The expansion of data collection and data analysis processes has increasingly strained a wide range of informal and non-codified data practices.

The involvement of external stakeholders in research projects creates significant potential tensions with the principles of sharing open data. Contributions from commercial actors can especially rely on some form of exclusivity and appropriation of the final research results. In 2022, Pujol Priego, Wareham and Romasanta outlined several accommodation strategies to overcome these issues, such as data modularity (with sharing limited to some part of the data) and time delay (with year-long embargoes before the final release of the data).[133]

Open science infrastructures

The UNESCO Recommendation on Open Science, approved in November 2021, defines open science infrastructures as "shared research infrastructures that are needed to support open science and serve the needs of different communities".[134] Open science infrastructures have been recognized as a major factor in the implementation and the development of data sharing policies.[135]

Leading forms of infrastructure for open scientific data include data repositories, data analysis platforms, indexes, digitized libraries or digitized archives.[136][137] Infrastructures ensure that the costs of publishing, maintaining, and indexing datasets are not entirely borne by individual researchers and institutions. They are additionally key stakeholders in the definition and adoption of open data standards, especially with regard to licensing or documentation.

By the end of the 1990s, the creation of public scientific computing infrastructure became a major policy issue:[138] "The lack of infrastructure to support release and reuse was acknowledged in some of the earliest policy reports on data sharing."[135] The first wave of web-based scientific projects in the 1990s and the early 2000s revealed critical issues of sustainability. As funding was allocated for a specific time period, critical databases, online tools or publishing platforms could hardly be maintained[28] and project managers were faced with a valley of death "between grant funding and ongoing operational funding".[139]

After 2010, the consolidation and expansion of commercial scientific infrastructure, such as the acquisition of the open repositories Digital Commons and SSRN by Elsevier, prompted further calls to secure "community-controlled infrastructure".[140] In 2015, Cameron Neylon, Geoffrey Bilder and Jennifer Lin defined an influential series of Principles for Open Scholarly Infrastructure[141] that has been endorsed by leading infrastructures such as Crossref,[142] OpenCitations[143] or Data Dryad.[144] By 2021, public services and infrastructures for research have largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer."[145] According to the 2021 Roadmap of the European Strategy Forum on Research Infrastructures (ESFRI), major legacy infrastructures in Europe have embraced open science principles: "Most of the Research Infrastructures on the ESFRI Roadmap are at the forefront of Open Science movement and make important contributions to the digital transformation by transforming the whole research process according to the Open Science paradigm."[146]

Open science infrastructures represent a higher level of commitment to data sharing. They rely on significant and recurrent investments to ensure that data is effectively maintained and documented and "add value to data through metadata, provenance, classification, standards for data structures, and migration".[147] Furthermore, infrastructures need to be integrated into the norms and expected uses of the scientific communities they mean to serve: "The most successful become reference collections that attract longer-term funding and can set standards for their communities".[137] Maintaining open standards is one of the main challenges identified by leading European open infrastructures, as it implies choosing among competing standards in some cases, as well as ensuring that the standards are correctly updated and accessible through APIs or other endpoints.[148]

The conceptual definition of open science infrastructures has been largely influenced by the analysis of Elinor Ostrom on the commons and more specifically on the knowledge commons. In accordance with Ostrom, Cameron Neylon stresses that open infrastructures are not only characterized by the management of a pool of common resources but also by the elaboration of common governance and norms.[149] The diffusion of open scientific data also raises pressing issues of governance. With regard to the determination of the ownership of the data, the adoption of free licenses and the enforcement of privacy regulations, "continual negotiation is necessary" and involves a wide range of stakeholders.[150]

Beyond their integration in specific scientific communities, open science infrastructures have strong ties with the open source and the open data movements. 82% of the European infrastructures surveyed by SPARC report being at least partly built on open source software and 53% have their entire technological infrastructure in open source.[151] Open science infrastructures preferably integrate standards from other open science infrastructures. Among European infrastructures: "The most commonly cited systems – and thus essential infrastructure for many – are ORCID, Crossref, DOAJ, BASE, OpenAIRE, Altmetric, and Datacite, most of which are not-for-profit".[152] Open science infrastructures are thus part of an emerging "truly interoperable Open Science commons" that holds the promise of "researcher-centric, low-cost, innovative, and interoperable tools for research, superior to the present, largely closed system."[153]

See also

References

  1. ^ Spiegelhalter, D. Open data and trust in the literature. The Scholarly Kitchen. Retrieved 7 September 2018.
  2. ^ a b Wilkinson et al. 2016.
  3. ^ Lipton 2020, p. 19.
  4. ^ Borgman 2015, p. 18.
  5. ^ a b c d Lipton 2020, p. 59.
  6. ^ a b Lipton 2020, p. 61.
  7. ^ ARTICLE 29 — DISSEMINATION OF RESULTS — OPEN ACCESS — VISIBILITY OF EU FUNDING 2022-09-13 at the Wayback Machine, Draft of the H2020 Model Grant Agreement
  8. ^ National Academies 2012, p. 1.
  9. ^ Borgman 2015, pp. 4–5.
  10. ^ a b c Pujol Priego, Wareham & Romasanta 2022, p. 220.
  11. ^ Edwards et al. 2011, p. 669.
  12. ^ a b c Pujol Priego, Wareham & Romasanta 2022, p. 224.
  13. ^ Pujol Priego, Wareham & Romasanta 2022, p. 225.
  14. ^ Rosenberg 2018, pp. 557–558.
  15. ^ Buckland 1991.
  16. ^ Edwards 2010, p. 84.
  17. ^ Edwards 2010, p. 99.
  18. ^ Edwards 2010, p. 102.
  19. ^ Machado, Jorge. "Open data and open science". In Albagli, Maciel, Abdo. "Open Science, Open Questions", 2015.
  20. ^ Shankar, Eschenfelder & Downey 2016, p. 63.
  21. ^ Committee on Scientific Accomplishments of Earth Observations from Space, National Research Council (2008). Earth Observations from Space: The First 50 Years of Scientific Achievements. The National Academies Press. p. 6. ISBN 978-0-309-11095-2. Retrieved 2010-11-24.
  22. ^ World Data Center System (2009-09-18). "About the World Data Center System". NOAA, National Geophysical Data Center. Retrieved 2010-11-24.
  23. ^ a b Borgman 2015, p. 7.
  24. ^ Regazzi 2015, p. 128.
  25. ^ Bourne & Hahn 2003, p. 397.
  26. ^ Campbell-Kelly & Garcia-Swartz 2013.
  27. ^ a b Berners-Lee & Fischetti 2008, p. 17.
  28. ^ a b Dacos 2013.
  29. ^ Tim Berners-Lee, "Qualifiers on Hypertext Links", mail sent on August 6, 1991 to the alt.hypertext newsgroup.
  30. ^ Hogan 2014, p. 20.
  31. ^ Bygrave & Bing 2009, p. 30.
  32. ^ Star & Ruhleder 1996, p. 131.
  33. ^ a b c d Borgman 2015, p. 217.
  34. ^ National Research Council (1995). On the Full and Open Exchange of Scientific Data. Washington, DC: The National Academies Press. doi:10.17226/18769. ISBN 978-0-309-30427-6.
  35. ^ a b c Pujol Priego, Wareham & Romasanta 2022, p. 223.
  36. ^ Lipton 2020, p. 16.
  37. ^ National Research Council 1999, p. 16.
  38. ^ OECD Declaration on Open Access to publicly funded data. Archived 20 April 2010 at the Wayback Machine.
  39. ^ Lipton 2020, p. 17.
  40. ^ OECD 2007, p. 13.
  41. ^ OECD 2007, p. 4.
  42. ^ Wilkinson et al. 2016, p. 8.
  43. ^ a b Wilkinson et al. 2016, p. 3.
  44. ^ a b c Wilkinson et al. 2016, p. 1.
  45. ^ Wilkinson et al. 2016, p. 4.
  46. ^ van Reisen et al. 2020.
  47. ^ Horizon 2020 Commission expert group on Turning FAIR data into reality (E03464)
  48. ^ Lipton 2020, p. 66.
  49. ^ The French Open Science Monitor, last updated on December 1st, 2022
  50. ^ a b Pujol Priego, Wareham & Romasanta 2022, p. 241.
  51. ^ a b c Borgman 2015, p. 48.
  52. ^ Federer et al. 2018.
  53. ^ Colavizza et al. 2020.
  54. ^ a b c Colavizza et al. 2020, p. 5.
  55. ^ a b Borgman 2015, p. 216.
  56. ^ Chavan & Penev 2011.
  57. ^ a b c d Crosas 2014, p. 63.
  58. ^ Bisco 1965, p. 148.
  59. ^ Dodd 1979, p. 78.
  60. ^ Dodd 1979.
  61. ^ Brase 2004.
  62. ^ Borgman 2015, p. 47.
  63. ^ a b Colavizza et al. 2020, p. 12.
  64. ^ Colavizza et al. 2020, p. 10.
  65. ^ a b Vines et al. 2014, p. 96.
  66. ^ Lipton 2020, p. 65.
  67. ^ a b European Commission 2018, p. 31.
  68. ^ Pujol Priego, Wareham & Romasanta 2022, pp. 224–225.
  69. ^ a b Borgman 2015, p. 208.
  70. ^ Davies et al. 2019, p. 1.
  71. ^ Borgman 2015, p. 44.
  72. ^ Lyon, Jeng & Mattern 2017, p. 47.
  73. ^ a b Borgman 2015, p. 209.
  74. ^ Borgman 2015, p. 211.
  75. ^ a b c Borgman 2015, p. 212.
  76. ^ Tenopir et al. 2020, p. 12.
  77. ^ Davies et al. 2019, p. 6.
  78. ^ Borgman 2015, p. 283.
  79. ^ Borgman 2015, p. 205.
  80. ^ a b c Borgman 2015, p. 213.
  81. ^ Tenopir et al. 2011, p. 7.
  82. ^ Tenopir et al. 2011, p. 9.
  83. ^ a b Schmidt, Gemeinholzer & Treloar 2016.
  84. ^ Gabelica, Bojčić & Puljak 2022.
  85. ^ Borgman 2015, p. 206.
  86. ^ a b Science 2011.
  87. ^ Borgman 2015, p. 207.
  88. ^ Pujol Priego, Wareham & Romasanta 2022, p. 226.
  89. ^ Tenopir et al. 2020, p. 5.
  90. ^ Borgman 2015, p. 223.
  91. ^ Colavizza et al. 2020, p. 13.
  92. ^ a b Lipton 2020, p. 127.
  93. ^ Kerber 2021, p. 1.
  94. ^ Lipton 2020, p. 119
  95. ^ a b c d e Lipton 2020, p. 119.
  96. ^ a b Lipton 2020, p. 122.
  97. ^ a b Lipton 2020, p. 123.
  98. ^ Lipton 2020, p. 126.
  99. ^ Article 6, Directive 2006/116/EC
  100. ^ Lipton 2020, p. 124.
  101. ^ a b c Lipton 2020, p. 125.
  102. ^ Allen, O’Connell & Kiermer 2019, p. 73.
  103. ^ Lipton 2020, p. 129.
  104. ^ a b Lipton 2020, p. 130.
  105. ^ Lipton 2020, p. 131.
  106. ^ a b Lipton 2020, p. 138.
  107. ^ Lipton 2020, p. 139.
  108. ^ Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities
  109. ^ a b Lipton 2020, p. 133.
  110. ^ a b Lipton 2020, p. 134.
  111. ^ Schofield et al. 2009.
  112. ^ Lipton 2020, p. 132.
  113. ^ Cox & Verbaan 2018, p. 26-27.
  114. ^ a b Borgman 2015, p. 214.
  115. ^ Borgman 2015, p. 220.
  116. ^ Borgman 2015, p. 222.
  117. ^ Borgman 2015, p. 218.
  118. ^ Borgman 2015, p. 221.
  119. ^ Briney 2015.
  120. ^ Cox & Verbaan 2018.
  121. ^ Tibor 2021.
  122. ^ Vines et al. 2014.
  123. ^ Tedersoo et al. 2021.
  124. ^ a b c Tenopir et al. 2020, p. 11.
  125. ^ CCSDS 2012, p. 1.
  126. ^ Lipton 2020, p. 73.
  127. ^ Federer 2022, p. 9.
  128. ^ Federer 2022, p. 11.
  129. ^ Smale et al. 2020, p. 3.
  130. ^ Smale et al. 2020, p. 4.
  131. ^ Smale et al. 2020, p. 9.
  132. ^ a b Tenopir et al. 2020, p. 13.
  133. ^ Pujol Priego, Wareham & Romasanta 2022, p. 239-240.
  134. ^ UNESCO Recommendation on Open Science, 2021, CL/4363
  135. ^ a b Borgman 2015, p. 224.
  136. ^ Ficarra et al. 2020, p. 16.
  137. ^ a b Borgman 2015, p. 225.
  138. ^ Borgman 2007, p. 21.
  139. ^ Skinner 2019, p. 6.
  140. ^ Joseph 2018, p. 1.
  141. ^ Neylon et al. 2015.
  142. ^ Crossref's Board votes to adopt the Principles of Open Scholarly Infrastructure
  143. ^ OpenCitations' compliance with the Principles of Open Scholarly Infrastructure
  144. ^ Dryad's Commitment to the Principles of Open Scholarly Infrastructure
  145. ^ Fecher et al. 2021, p. 505
  146. ^ ESFRI Roadmap 2021, p. 159.
  147. ^ Borgman 2015, p. 226.
  148. ^ Ficarra et al. 2020, p. 23.
  149. ^ Neylon 2017, p. 7.
  150. ^ Borgman 2015, p. 229.
  151. ^ Ficarra et al. 2020, p. 29.
  152. ^ Ficarra et al. 2020, p. 50.
  153. ^ Ross-Hellauer et al. 2020, p. 13.

Bibliography

Reports

  • National Research Council (1999). A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases (Report). National Academies Press. Retrieved 2022-05-18.
  • OECD (2007). OECD Principles and Guidelines for Access to Research Data from Public Funding (Report). Paris: Organisation for Economic Co-operation and Development. Retrieved 2022-05-18.
  • CCSDS (2012). Reference Model for an Open Archival Information System (OAIS) (Report). p. 135.
  • European Commission (2018). Cost-benefit analysis for FAIR research data: cost of not having FAIR research data (Report). LU: Office des publications de l'Union européenne. doi:10.2777/02999. Retrieved 2022-06-18.
  • Astell, Mathias; Hrynaszkiewicz, Iain; Allin, Katie; Penny, Dan; Mithu Lucraft; Baynes, Grace; Springer Nature Admin (2018). Practical challenges for researchers in data sharing - Springer Nature survey data (anonymised) (Report). Springer Nature. Retrieved 2022-09-11.
  • Skinner, Katherine (2019). Mapping the Scholarly Communication Landscape: 2019 Census (Report). Educopia Institute. S2CID 201314019.
  • European Commission (2019). Horizon 2020 Annotated Model Grant Agreements (Report). European Commission.
  • Ficarra, Victoria; Fosci, Mattia; Chiarelli, Andrea; Kramer, Bianca; Proudman, Vanessa (2020-10-30). Scoping the Open Science Infrastructure Landscape in Europe (Report). Retrieved 2021-10-31.
  • ESFRI (2021). ESFRI Roadmap (PDF) (Report). ESFRI.
  • Ross-Hellauer, Tony; Fecher, Benedikt; Shearer, Kathleen; Rodrigues, Eloy (2019-09-03). Pubfair: a framework for sustainable, distributed, open science publishing services (Report). Retrieved 2021-12-12.

Journal articles

  • Bisco, Ralph L. (1965-09-01). "Social Science Data Archives Technical Considerations". Social Science Information. 4 (3): 129–150. doi:10.1177/053901846500400311. ISSN 0539-0184. S2CID 144164959.
  • Dodd, Sue A. (1979). "Bibliographic references for numeric social science data files: Suggested guidelines". Journal of the American Society for Information Science. 30 (2): 77–82. doi:10.1002/asi.4630300203. ISSN 1097-4571. Retrieved 2022-05-15.
  • Buckland, Michael K. (1991). "Information as thing". Journal of the American Society for Information Science. 42 (5): 351–360. doi:10.1002/(SICI)1097-4571(199106)42:5<351::AID-ASI5>3.0.CO;2-3. ISSN 1097-4571. Retrieved 2022-03-22.
  • Star, Susan Leigh; Ruhleder, Karen (1996-03-01). "Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces". Information Systems Research. 7 (1): 111–134. doi:10.1287/isre.7.1.111. ISSN 1047-7047. S2CID 10520480. Retrieved 2021-12-22.
  • Brase, Jan (2004). "Using Digital Library Techniques – Registration of Scientific Primary Data". In Heery, Rachel; Lyon, Liz (eds.). Research and Advanced Technology for Digital Libraries. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. pp. 488–494. doi:10.1007/978-3-540-30230-8_44. ISBN 978-3-540-30230-8.
  • Barateiro, José; Antunes, Gonçalo; Cabral, Manuel; Borbinha, José; Rodrigues, Rodrigo (2008). "Digital Preservation of Scientific Data". In Christensen-Dalsgaard, Birte; Castelli, Donatella; Bolette Ammitzbøll Jurik; Lippincott, Joan (eds.). Research and Advanced Technology for Digital Libraries. Lecture Notes in Computer Science. Vol. 5173. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 388–391. doi:10.1007/978-3-540-87599-4_41. ISBN 978-3-540-87598-7. Retrieved 2022-06-21.
  • Schofield, Paul N.; Bubela, Tania; Weaver, Thomas; Portilla, Lili; Brown, Stephen D.; Hancock, John M.; Einhorn, David; Tocchini-Valentini, Glauco; Hrabe de Angelis, Martin; Rosenthal, Nadia (2009-09-10). "Post-publication sharing of data and tools". Nature. 461 (7261): 171–173. Bibcode:2009Natur.461..171.. doi:10.1038/461171a. ISSN 0028-0836. PMC 6711854. PMID 19741686.
  • Korsmo, F. L. (2010). "The Origins and Principles of the World Data Center System". Data Science Journal. 8: IGY55–IGY65. doi:10.2481/dsj.SS_IGY-011.
  • Edwards, Paul N.; Mayernik, Matthew S.; Batcheller, Archer L.; Bowker, Geoffrey C.; Borgman, Christine L. (2011-10-01). "Science friction: Data, metadata, and collaboration". Social Studies of Science. 41 (5): 667–690. doi:10.1177/0306312711413314. ISSN 0306-3127. PMID 22164720. S2CID 33973392.
  • Science Staff (2011-02-11). "Challenges and Opportunities". Science. 331 (6018): 692–693. Bibcode:2011Sci...331..692.. doi:10.1126/science.331.6018.692. PMID 21311002. S2CID 109422723.
  • Tenopir, Carol; Allard, Suzie; Douglass, Kimberly; Aydinoglu, Arsev Umur; Wu, Lei; Read, Eleanor; Manoff, Maribeth; Frame, Mike (2011). "Data Sharing by Scientists: Practices and Perceptions". PLOS ONE. 6 (6): e21101. Bibcode:2011PLoSO...621101T. doi:10.1371/journal.pone.0021101. ISSN 1932-6203. PMC 3126798. PMID 21738610.
  • Chavan, Vishwas; Penev, Lyubomir (2011-12-15). "The data paper: a mechanism to incentivize data publishing in biodiversity science". BMC Bioinformatics. 12 (Suppl 15): S2. doi:10.1186/1471-2105-12-S15-S2. ISSN 1471-2105. PMC 3287445. PMID 22373175.
  • Campbell-Kelly, Martin; Garcia-Swartz, Daniel D (2013). "The History of the Internet: The Missing Narratives". Journal of Information Technology. 28 (1): 18–33. doi:10.1057/jit.2013.4. ISSN 0268-3962. S2CID 41013. Retrieved 2022-01-04.
  • Dacos, Marin (2013). "Cyberclio : vers une cyberinfrastructure au cœur de la discipline historique". In Frédéric Clavert, Serge Noiret (eds.). L'histoire contemporaine à l'ère contemporain. Berne: Peter Lang. pp. 29–41.
  • Wallis, Jillian C.; Rolando, Elizabeth; Borgman, Christine L. (2013). "If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology". PLOS ONE. 8 (7): e67332. Bibcode:2013PLoSO...867332W. doi:10.1371/journal.pone.0067332. ISSN 1932-6203. PMC 3720779. PMID 23935830.
  • Vines, Timothy H.; Albert, Arianne Y. K.; Andrew, Rose L.; Débarre, Florence; Bock, Dan G.; Franklin, Michelle T.; Gilbert, Kimberly J.; Moore, Jean-Sébastien; Renaut, Sébastien; Rennison, Diana J. (2014-01-06). "The Availability of Research Data Declines Rapidly with Article Age". Current Biology. 24 (1): 94–97. doi:10.1016/j.cub.2013.11.014. ISSN 0960-9822. PMID 24361065. S2CID 7799662. Retrieved 2022-09-11.
  • Crosas, Mercè (2014-05-26). "The Evolution of Data Citation: From Principles to Implementation". IASSIST Quarterly. 37 (1–4): 62. doi:10.29173/iq504. ISSN 0739-1137. Retrieved 2022-05-15.
  • Tenopir, Carol; Dalton, Elizabeth D.; Allard, Suzie; Frame, Mike; Pjesivac, Ivanka; Birch, Ben; Pollock, Danielle; Dorsett, Kristina (2015). "Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide". PLOS ONE. 10 (8): e0134826. Bibcode:2015PLoSO..1034826T. doi:10.1371/journal.pone.0134826. ISSN 1932-6203. PMC 4550246. PMID 26308551.
  • Shankar, Kalpana; Eschenfelder, Kristin R.; Downey, Greg (2016-05-13). "Studying the History of Social Science Data Archives as Knowledge Infrastructure". Science & Technology Studies. 29 (2): 62–73. doi:10.23987/sts.55691. ISSN 2243-4690. Retrieved 2021-12-23.
  • Neylon, Cameron; Chan, Leslie (2016-04-18). "Exploring the opportunities and challenges of implementing open research strategies within development institutions". Research Ideas and Outcomes. 2: e8880. doi:10.3897/rio.2.e8880. ISSN 2367-7163. Retrieved 2021-11-01.
  • Schmidt, Birgit; Gemeinholzer, Birgit; Treloar, Andrew (2016-01-15). "Open Data in Global Environmental Research: The Belmont Forum's Open Data Survey". PLOS ONE. 11 (1): e0146695. Bibcode:2016PLoSO..1146695S. doi:10.1371/journal.pone.0146695. ISSN 1932-6203. PMC 4714918. PMID 26771577.
  • Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem; Santos, Luiz Bonino da Silva; Bourne, Philip E.; Bouwman, Jildau; Brookes, Anthony J.; Clark, Tim; Crosas, Mercè; Dillo, Ingrid; Dumon, Olivier; Edmunds, Scott; Evelo, Chris T.; Finkers, Richard; Gonzalez-Beltran, Alejandra; Gray, Alasdair J. G.; Groth, Paul; Goble, Carole; Grethe, Jeffrey S.; Heringa, Jaap; Hoen, Peter A. C. 't; Hooft, Rob; Kuhn, Tobias; Kok, Ruben; Kok, Joost; Lusher, Scott J.; Martone, Maryann E.; Mons, Albert; Packer, Abel L.; Persson, Bengt; Rocca-Serra, Philippe; Roos, Marco; Schaik, Rene van; Sansone, Susanna-Assunta; Schultes, Erik; Sengstag, Thierry; Slater, Ted; Strawn, George; Swertz, Morris A.; Thompson, Mark; Lei, Johan van der; Mulligen, Erik van; Velterop, Jan; Waagmeester, Andra; Wittenburg, Peter; Wolstencroft, Katherine; Zhao, Jun; Mons, Barend (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data. 3: 160018. Bibcode:2016NatSD...360018W. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244.
  • Lyon, Liz; Jeng, Wei; Mattern, Eleanor (2017-09-16). "Research Transparency: A Preliminary Study of Disciplinary Conceptualisation, Drivers, Tools and Support Services". International Journal of Digital Curation. 12 (1): 46–64. doi:10.2218/ijdc.v12i1.530. ISSN 1746-8256. Retrieved 2022-06-10.
  • Witkowski, Tomasz (2017). Skeptical Inquirer. 41 (4): 6–7. Archived from the original on 2018-09-15. "although some scientists now agree that doing so could help prevent future retractions of scientific manuscripts."
  • Besançon, Lonni; Peiffer-Smadja, Nathan; Segalas, Corentin; Jiang, Haiting; Masuzzo, Paola; Smout, Cooper; Billy, Eric; Deforet, Maxime; Leyrat, Clémence (2020). "Open Science Saves Lives: Lessons from the COVID-19 Pandemic". BMC Medical Research Methodology. 21 (1): 117. doi:10.1186/s12874-021-01304-y. PMC 8179078. PMID 34090351.
  • Rosenberg, Daniel (2018-11-01). "Data as Word". Historical Studies in the Natural Sciences. 48 (5): 557–567. doi:10.1525/hsns.2018.48.5.557. hdl:21.11116/0000-0002-C567-C. ISSN 1939-1811. S2CID 149765492. Retrieved 2022-03-21.
  • Joseph, Heather (2018-09-05). "Securing community-controlled infrastructure: SPARC's plan of action". College & Research Libraries News. 79 (8): 426. doi:10.5860/crln.79.8.426. S2CID 116057034.
  • Federer, Lisa M.; Belter, Christopher W.; Joubert, Douglas J.; Livinski, Alicia; Lu, Ya-Ling; Snyders, Lissa N.; Thompson, Holly (2018-05-02). "Data sharing in PLOS ONE: An analysis of Data Availability Statements". PLOS ONE. 13 (5): e0194768. Bibcode:2018PLoSO..1394768F. doi:10.1371/journal.pone.0194768. ISSN 1932-6203. PMC 5931451. PMID 29719004.
  • Ross-Hellauer, Tony; Schmidt, Birgit; Kramer, Bianca (2018). "Are funder Open Access platforms a good idea?". SAGE Open. 8 (4): 2158244018816717. doi:10.1177/2158244018816717. S2CID 220987901.
  • Neylon, Cameron (2017-12-27). "Sustaining Scholarly Infrastructures through Collective Action: The Lessons that Olson can Teach us". KULA: Knowledge Creation, Dissemination, and Preservation Studies. 1: 3. doi:10.5334/kula.7. ISSN 2398-4112. Retrieved 2022-01-09.
  • Allen, Liz; O’Connell, Alison; Kiermer, Veronique (2019). "How can we ensure visibility and diversity in research contributions? How the Contributor Role Taxonomy (CRediT) is helping the shift from authorship to contributorship". Learned Publishing. 32 (1): 71–74. doi:10.1002/leap.1210. ISSN 1741-4857. S2CID 67868432. Retrieved 2022-05-14.
  • Smale, Nicholas Andrew; Unsworth, Kathryn; Denyer, Gareth; Magatova, Elise; Barr, Daniel (2020-01-01). "A Review of the History, Advocacy and Efficacy of Data Management Plans". International Journal of Digital Curation. 15 (1): 30. doi:10.2218/ijdc.v15i1.525. ISSN 1746-8256. Retrieved 2022-06-21.
  • Tenopir, Carol; Rice, Natalie M.; Allard, Suzie; Baird, Lynn; Borycz, Josh; Christian, Lisa; Grant, Bruce; Olendorf, Robert; Sandusky, Robert J. (2020-03-11). "Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide". PLOS ONE. 15 (3): e0229003. Bibcode:2020PLoSO..1529003T. doi:10.1371/journal.pone.0229003. ISSN 1932-6203. PMC 7065823. PMID 32160189.
  • van Reisen, Mirjam; Stokmans, Mia; Basajja, Mariam; Ong'ayo, Antony Otieno; Kirkpatrick, Christine; Mons, Barend (2020-01-01). "Towards the Tipping Point for FAIR Implementation". Data Intelligence. 2 (1–2): 264–275. doi:10.1162/dint_a_00049. ISSN 2641-435X. S2CID 207828428.
  • Colavizza, Giovanni; Hrynaszkiewicz, Iain; Staden, Isla; Whitaker, Kirstie; McGillivray, Barbara (2020-04-22). "The citation advantage of linking publications to research data". PLOS ONE. 15 (4): e0230416. arXiv:1907.02565. Bibcode:2020PLoSO..1530416C. doi:10.1371/journal.pone.0230416. ISSN 1932-6203. PMC 7176083. PMID 32320428.
  • Kerber, Wolfgang (2021). "Specifying and Assigning "Bundles of Rights" on Data: An Economic Perspective". SSRN Electronic Journal. doi:10.2139/ssrn.3847620. hdl:10419/234876. ISSN 1556-5068. S2CID 235457824. Retrieved 2022-05-14.
  • Tedersoo, Leho; Küngas, Rainer; Oras, Ester; Köster, Kajar; Eenmaa, Helen; Leijen, Äli; Pedaste, Margus; Raju, Marju; Astapova, Anastasiya; Lukner, Heli; Kogermann, Karin; Sepp, Tuul (2021-07-27). "Data sharing practices and data availability upon request differ across scientific disciplines". Scientific Data. 8 (1): 192. Bibcode:2021NatSD...8..192T. doi:10.1038/s41597-021-00981-0. ISSN 2052-4463. PMC 8381906. PMID 34315906.
  • Fecher, Benedikt; Kahn, Rebecca; Sokolovska, Nataliia; Völker, Teresa; Nebe, Philip (2021-08-01). "Making a Research Infrastructure: Conditions and Strategies to Transform a Service into an Infrastructure". Science and Public Policy. 48 (4): 499–507. doi:10.1093/scipol/scab026. ISSN 0302-3427. Retrieved 2021-12-22.
  • Pujol Priego, Laia; Wareham, Jonathan; Romasanta, Angelo Kenneth S. (2022-02-07). "The puzzle of sharing scientific data". Industry and Innovation. 29 (2): 219–250. doi:10.1080/13662716.2022.2033178. ISSN 1366-2716. S2CID 246795400. Retrieved 2022-06-18.
  • Federer, Lisa M. (2022-08-24). "Long-term availability of data associated with articles in PLOS ONE". PLOS ONE. 17 (8): e0272845. Bibcode:2022PLoSO..1772845F. doi:10.1371/journal.pone.0272845. ISSN 1932-6203. PMC 9401135. PMID 36001577.
  • Gabelica, Mirko; Bojčić, Ružica; Puljak, Livia (2022-10-01). "Many researchers were not compliant with their published data sharing statement: a mixed-methods study". Journal of Clinical Epidemiology. 150: 33–41. doi:10.1016/j.jclinepi.2022.05.019. ISSN 0895-4356. PMID 35654271. S2CID 249213574. Retrieved 2023-09-07.

Books and theses

  • Bourne, Charles P.; Hahn, Trudi Bellardo (2003-08-01). A History of Online Information Services, 1963-1976. MIT Press. ISBN 978-0-262-26175-3.
  • Borgman, Christine L. (2007-10-12). Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, MA, USA: MIT Press. ISBN 978-0-262-02619-2.
  • Berners-Lee, Tim; Fischetti, Mark (2008). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. Paw Prints. ISBN 978-1-4395-0036-1.
  • Bygrave, Lee A.; Bing, Jon (2009-01-22). Internet Governance: Infrastructure and Institutions. OUP Oxford. ISBN 978-0-19-956113-1.
  • Edwards, Paul N. (2010-03-12). A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming. Infrastructures. Cambridge, MA, USA: MIT Press. ISBN 978-0-262-01392-5.
  • National Research Council (2012). Uhlir, Paul E. (ed.). For Attribution: Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop. Washington, DC: The National Academies Press. ISBN 978-0-309-26728-1. Retrieved 2022-03-22.
  • Gaillard, Rémi (2014). De l'Open data à l'Open research data: quelle(s) politique(s) pour les données de recherche ? (Thesis). ENSSIB.
  • Hogan, A. (2014-04-09). Reasoning Techniques for the Web of Data. IOS Press. ISBN 978-1-61499-383-4.
  • Borgman, Christine L. (2015-01-02). Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, MA, USA: MIT Press. ISBN 978-0-262-02856-1.
  • Briney, Kristin (2015-09-01). Data Management for Researchers: Organize, maintain and share your data for research success. Pelagic Publishing Ltd. ISBN 978-1-78427-013-1.
  • Regazzi, John J. (2015-02-12). Scholarly Communications: A History from Content as King to Content as Kingmaker. Rowman & Littlefield. ISBN 978-0-8108-9088-6.
  • Cox, Andrew; Verbaan, Eddy (2018-05-11). Exploring Research Data Management. Facet Publishing. ISBN 978-1-78330-280-2.
  • Davies, Tim; Walker, Stephen B.; Rubinstein, M.; Perini, F. (2019). Davies, Tim; Walker, Stephen B.; Rubinstein, Mor; Perini, Fernando (eds.). The State of Open Data: Histories and Horizons. African Minds. doi:10.5281/zenodo.2668475. S2CID 202295750. Retrieved 2022-09-11.
  • Lipton, Vera (2020-01-22). Open Scientific Data: Why Choosing and Reusing the RIGHT DATA Matters. BoD – Books on Demand. ISBN 978-1-83880-984-3.
  • Tibor, Koltay (2021-10-31). Research Data Management and Data Literacies. Chandos Publishing. ISBN 978-0-323-86002-4.

Other sources

  • Neylon, Cameron; Bilder, Geoffrey; Lin, Jennifer (2015). "Principles for Open Scholarly Infrastructures". Science in the open. Retrieved 2021-11-01.

External links

  • Research Data Canada
  • Open Data In Science article (P Murray-Rust)
  • Open Data about monitoring of deforestation in the Brazilian Amazon Rainforest
  • OpenWetWare
  • LinkedScience.org
  • Collective Mind Repository for computer engineering

open, scientific, data, open, research, data, type, open, data, focused, publishing, observations, results, scientific, activities, available, anyone, analyze, reuse, major, purpose, drive, open, data, allow, verification, scientific, claims, allowing, others,. Open scientific data or open research data is a type of open data focused on publishing observations and results of scientific activities available for anyone to analyze and reuse A major purpose of the drive for open data is to allow the verification of scientific claims by allowing others to look at the reproducibility of results 1 and to allow data from many sources to be integrated to give new knowledge 2 The modern concept of scientific data emerged in the second half of the 20th century with the development of large knowledge infrastructure to compute scientific information and observation The sharing and distribution of data has been early identified as an important stake but was impeded by the technical limitations of the infrastructure and the lack of common standards for data communication The World Wide Web was immediately conceived as a universal protocol for the sharing of scientific data especially coming from high energy physics Contents 1 Definition 1 1 Scientific data 1 2 Open scientific data 2 History 2 1 Development of knowledge infrastructures 1945 1960 2 2 Opening and sharing data early attempts 1960 1990 2 3 Sharing scientific data on the web 1990 1995 2 4 Defining open scientific data 1995 2010 2 5 Policy implementations 2010 3 Diffusion of scientific data 3 1 Publication and edition 3 2 Citation and indexation 3 3 Reuse and economic impact 4 Practices and data culture 4 1 The emergence of an open data culture 4 2 Ideal and implementation the paradox of data sharing 4 3 Incentives and barriers to scientific open data 5 Legal status 5 1 Copyright 5 2 Database rights 5 3 Ownership 5 4 Privacy 5 5 Free licenses 6 Open scientific data management 6 1 Integration to the research workflow 6 2 
Storage and preservation 6 3 Plan and governance 6 4 Open science infrastructures 7 See also 8 References 9 Bibliography 9 1 Reports 9 2 Journal articles 9 3 Books amp thesis 9 4 Other sources 10 External linksDefinition editScientific data edit The concept of open scientific data has developed in parallel with the concept of scientific data Scientific data was not formally defined until the late 20th century Before the generalization of computational analysis data has been mostly an informal terms frequently used interchangeably with knowledge or information 3 Institutional and epistemological discourses favored alternative concepts and outlooks on scientific activities Even histories of science and epistemology comments mention data only in passing Other foundational works on the making of meaning in science discuss facts representations inscriptions and publications with little attention to data per se 4 The first influential policy definition of scientific data appeared as late as 1999 when the National Academies of Science described data as facts letters numbers or symbols that describe an object condition situation or other factors 5 Terminologies have continued to evolve in 2011 the National Academies updated the definition to include a large variety of dataified objects such as spectrographic genomic sequencing and electron microscopy data observational data such as remote sensing geospatial and socioeconomic data and other forms of data either generated or compiled by humans or machines as well as digital representation of literature 5 While the forms and shapes of data remain expansive and unsettled standard definitions and policies have recently tended to restrict scientific data to computational or digital data 6 The open data pilot of Horizon 2020 has been voluntarily restricted to digital research Digital research data is information in digital form in particular facts or numbers collected to be examined and used as a basis for reasoning discussion or 
calculation this includes statistics results of experiments measurements observations resulting from fieldwork survey results interview recordings and images 7 Overall the status scientific data remains a flexible point of discussion among individual researchers communities and policy makers In broader terms whatever data is of interest to researchers should be treated as research data 6 Important policy reports like the 2012 collective synthesis of the National Academies of science on data citation have intentionally adopted a relative and nominalist definition of data we will devote little time to definitional issues e g what are data except to acknowledge that data often exist in the eyes of the beholder 8 For Christine Borgman the main issue is not to define scientific data what are data but to contextualize the point where data became a focal point of discussion within a discipline an institution or a national research program when are data 9 In the 2010s the expansion of available data sources and the sophistication of data analysis method has expanded the range of disciplines primarily affected by data management issues to computational social science digital humanities social media data citizen science research projects and political science 10 Open scientific data edit Opening and sharing have both been major topic of discussion in regard to scientific data management but also a motivation to make data emerge as a relevant issue within an institution a discipline or a policy framework For Paul Edwards whether or not to share the data to what extent it should be shared and to whom have been major causes of data friction that revealed the otherwise hidden infrastructures of science Edwards metaphor of data friction describes what happens at the interfaces between data surfaces the points where data move between people substrates organizations or machines Every movement of data across an interface comes at some cost in time energy and human attention Every 
interface between groups and organizations as well as between machines represents a point of resistance where data can be garbled misinterpreted or lost In social systems data friction consumes energy and produces turbulence and heat that is conflicts disagreements and inexact unruly processes 11 The opening of scientific data is both a data friction in itself and a way to collectively manage data frictions by weakening complex issues of data ownership Scientific or epistemic cultures have been acknowledged as primary factors in the adoption of open data policies data sharing practices would be expected to be community bound and largely determined by epistemic culture 12 In the 2010s new concepts have been introduced by scientist and policy makers to more accurately define what open scientific data Since its introduction in 2016 FAIR data has become a major focus of open research policies The acronym describe an ideal type of Findable Accessible Interoperable and Reusable data Open scientific data has been categorized as a commons or a public good which is primarily maintained enriched and preserved by collective rather than individual action What makes collective action useful in understanding scientific data sharing is its focus on how the appropriation of individual gains is determined by adjusting the costs and benefits that accrue with contributions to a common resource 13 History editDevelopment of knowledge infrastructures 1945 1960 edit nbsp Punch card storage in US National Weather Records Center in Asheville early 1960s Data holding have expanded so much that the entrance hall has to be used as a storage facility The emergence of scientific data is associated with a semantic shift in the way core scientific concepts like data information and knowledge are commonly understood 14 Following the development of computing technologies data and information are increasingly described as things 15 Like computation data always have a material aspect Data are things 
They are not just numbers but also numerals with dimensionality weight and texture 16 After the Second World War large scientific projects have increasingly relied on knowledge infrastructure to collect process and analyze important amount of data Punch cards system were first used experimentally on climate data in the 1920s and were applied on a large scale in the following decade In one of the first Depression era government make work projects Civil Works Administration workers punched some 2 million ship log observations for the period 1880 1933 17 By 1960 the meteorological data collections of the US National Weather Records Center has expanded to 400 millions cards and had a global reach The physically of scientific data was by then fully apparent and threatened the stability of entire buildings By 1966 the cards occupied so much space that the Center began to fill its main entrance hall with card storage cabinets figure 5 4 Officials became seriously concerned that the building might collapse under their weight 18 By the end of the 1960s knowledge infrastructure have been embedded in a various set of disciplines and communities The first initiative to create a database of electronic bibliography of open access data was the Educational Resources Information Center ERIC in 1966 In the same year MEDLINE was created a free access online database managed by the National Library of Medicine and the National Institute of Health USA with bibliographical citations from journals in the biomedical area which later would be called PubMed currently with over 14 million complete articles 19 Knowledge infrastructures were also set up in space engineering with NASA RECON library search with OCLC Worldcat or the social sciences The 1960s and 1970s saw the establishment of over a dozen services and professional associations to coordinate quantitative data collection 20 Opening and sharing data early attempts 1960 1990 edit Early discourses and policy frameworks on open 
scientific data emerged immediately in the wake of the creation of the first large knowledge infrastructure The World Data Center system now the World Data System aimed to make observation data more readily available in preparation for the International Geophysical Year of 1957 1958 21 The International Council of Scientific Unions now the International Council for Science established several World Data Centers to minimize the risk of data loss and to maximize data accessibility further recommending in 1955 that data be made available in machine readable form 22 In 1966 the International Council for Science created CODATA an initiative to promote cooperation in data management and use 23 These early forms of open scientific data did not develop much further There were too many data frictions and technical resistance to the integration of external data to implement a durable ecosystem of data sharing Data infrastructures were mostly invisible to researchers as most of the research was done by professional librarians Not only were the search operating systems complicated to use but the search has to be performed very efficiently given the prohibitive cost of long distance telecommunication 24 While their conceptors have originally anticipated direct uses by researcher that could not really emerge due to technical and economic impediment The designers of the first online systems had presumed that searching would be done by end users that assumption undergirded system design MEDLINE was intended to be used by medical researchers and clinicians NASA RECON was designed for aerospace engineers and scientists For many reasons however most users through the seventies were librarians and trained intermediaries working on behalf of end users In fact some professional searchers worried that even allowing eager end users to get at the terminals was a bad idea 25 Christine Borgman does not recall any significant policy debates over the meaning the production and the circulation 
of scientific data, save for a few specific fields (like climatology), after 1966.[23] The insulated scientific infrastructures could hardly be connected before the advent of the web.[26] Projects and communities relied on their own unconnected networks at a national or institutional level: "the Internet was nearly invisible in Europe because people there were pursuing a separate set of network protocols".[27] Communication between scientific infrastructures was challenging not only across space but also across time. Whenever a communication protocol was no longer maintained, the data and knowledge it disseminated were likely to disappear as well: "the relationship between historical research and computing has been durably affected by aborted projects, data loss and unrecoverable formats".[28]

Sharing scientific data on the web (1990–1995)

The World Wide Web was originally conceived as an infrastructure for open scientific data. Sharing of data and data documentation was a major focus in the initial communication of the World Wide Web when the project was first unveiled in August 1991: "The WWW project was started to allow high energy physicists to share data, news, and documentation. We are very interested in spreading the web to other areas, and having gateway servers for other data".[29]

The project stemmed from a closed knowledge infrastructure, ENQUIRE, an information management program commissioned from Tim Berners-Lee by CERN for the specific needs of high-energy physics. The structure of ENQUIRE was closer to an internal web of data: it connected nodes that could refer to a person, a software module, etc., and that could be interlinked with various relations such as made, include, describes and so forth.[30] While it facilitated some random linkage between pieces of information, ENQUIRE was not able to facilitate the collaboration desired by the international high-energy physics research community.[31] Like any significant computing scientific infrastructure before the 1990s, the
development of ENQUIRE was ultimately impeded by the lack of interoperability and the complexity of managing network communications: "although Enquire provided a way to link documents and databases, and hypertext provided a common format in which to display them, there was still the problem of getting different computers with different operating systems to communicate with each other".[27]

The web rapidly superseded pre-existing closed infrastructures for scientific data, even when they included more advanced computing features. From 1991 to 1994, users of the Worm Community System, a major biology database on worms, switched to the Web and Gopher. While the Web did not include many advanced functions for data retrieval and collaboration, it was easily accessible. Conversely, the Worm Community System could only be browsed on specific terminals shared across scientific institutions: "To take on board the custom-designed, powerful WCS (with its convenient interface) is to suffer inconvenience at the intersection of work habits, computer use, and lab resources. The World Wide Web, on the other hand, can be accessed from a broad variety of terminals and connections, and Internet computer support is readily available at most academic institutions and through relatively inexpensive commercial services."[32]

Publication on the web completely changed the economics of data publishing. While in print the cost of reproducing large datasets is prohibitive, the storage expenses of most datasets are low.[33] In this new editorial environment, the main limiting factors for data sharing are no longer technical or economic but social and cultural.

Defining open scientific data (1995–2010)

The development and generalization of the World Wide Web lifted numerous technical barriers and frictions that had constrained the free circulation of data. Yet scientific data still had to be defined, and new research policy had to be implemented to realize the original vision laid out by Tim Berners-Lee of a web of data. At this
point, scientific data was largely defined through the process of opening scientific data, as the implementation of open policies created new incentives for setting up actionable guidelines, principles and terminologies.

Climate research has been a pioneering field in the conceptual definition of open scientific data, as it was in the construction of the first large knowledge infrastructures in the 1950s and the 1960s. In 1995, the GCDIS articulated a clear commitment, On the Full and Open Exchange of Scientific Data: "International programs for global change research and environmental monitoring crucially depend on the principle of full and open data exchange (i.e., data and information are made available without restriction, on a non-discriminatory basis, for no more than the cost of reproduction and distribution)".[34] The expansion of the scope and the management of knowledge infrastructures also created incentives to share data, as the allocation of data ownership between a large number of individual and institutional stakeholders has become increasingly complex.[35] Open data creates a simplified framework to ensure that all contributors and users of the data have access to it.[35]

Open data was rapidly identified as a key objective of the emerging open science movement. While initially focused on publications and scholarly articles, the international initiatives in favor of open access expanded their scope to all the main scientific productions.[36] In 2003, the Berlin Declaration supported the diffusion of original scientific research results, raw data and metadata, source materials, and digital representations of pictorial, graphical and scholarly multimedia materials.

After 2000, international organizations like the OECD (Organisation for Economic Co-operation and Development) have played an instrumental role in devising generic and transdisciplinary definitions of scientific data, as open data policies have to be implemented beyond the specific scale of a discipline or a
country.[5] One of the first influential definitions of scientific data was coined in 1999[5] by a report of the National Academies of Science: "Data are facts, numbers, letters, and symbols that describe an object, idea, condition, situation, or other factors".[37] In 2004, the Science Ministers of all nations of the OECD signed a declaration which essentially states that all publicly funded archive data should be made publicly available.[38] In 2007, the OECD codified the principles for access to research data from public funding[39] through the Principles and Guidelines for Access to Research Data from Public Funding, which defined scientific data as "factual records (numerical scores, textual records, images and sounds) used as primary sources for scientific research, and that are commonly accepted in the scientific community as necessary to validate research findings".[40] The Principles acted as a soft-law recommendation and affirmed that "access to research data increases the returns from public investment in this area, reinforces open scientific inquiry, encourages diversity of studies and opinion, promotes new areas of work and enables the exploration of topics not envisioned by the initial investigators".[41]

Policy implementations (2010–)

After 2010, national and supra-national institutions took a more interventionist stance. New policies have been implemented to ensure and incentivize the opening of scientific data, usually in continuation of existing open data programs. In Europe, the European Union Commissioner for Research, Science and Innovation, Carlos Moedas, made open research data one of the EU's priorities in 2015.[10]

First published in 2016, the FAIR Guiding Principles[2] have become an influential framework for opening scientific data.[10] The principles were originally designed two years earlier during a policy and research workshop at Lorentz, Jointly Designing a Data FAIRport.[42] During the deliberations of the workshop, the notion emerged of defining, and building
widespread support for, a minimal set of community-agreed guiding principles and practices.[43] The principles do not attempt to define scientific data, which remains a relatively plastic concept, but strive to describe what constitutes good data management.[44] They cover four foundational principles that serve to guide data producers: Findability, Accessibility, Interoperability, and Reusability.[44] They also aim to provide a step toward machine-actionability by making the underlying semantics of data explicit.[43] As they fully acknowledge the complexity of data management, the principles do not claim to introduce a set of rigid recommendations but rather "degrees of FAIRness" that can be adjusted depending on organizational costs and on external restrictions regarding copyright or privacy.[45]

The FAIR principles were immediately adopted by major international organizations: "FAIR experienced rapid development, gaining recognition from the European Union, G7, G20 and US-based Big Data to Knowledge (BD2K)".[46] In August 2016, the European Commission set up an expert group to turn "FAIR Data into reality".[47] As of 2020, the FAIR principles remained the most advanced technical standards for open scientific data.[48]

In 2022, the French Open Science Monitor started to publish an experimental survey of research data publications based on text-mining tools. Retrospective analysis showed that the rate of publications mentioning the sharing of their associated data has nearly doubled in 10 years, from 13% in 2013 to 22% in 2021.[49]

By the end of the 2010s, open data policies were well supported by scientific communities. Two large surveys commissioned by the European Commission in 2016 and 2018 found a commonly perceived benefit: "74% of researchers say that having access to other data would benefit them".[50] Yet more qualitative observations gathered in the same investigation also showed that "what scientists proclaim ideally, versus what they actually practice, reveals a more ambiguous situation".[50]

Diffusion of scientific data

Publication and edition

See also: Data publishing

Until the 2010s, the publication of scientific data referred mostly to "the release of datasets associated with an individual journal article".[51] This release is documented by a Data Availability Statement, or DAS. Several typologies of data availability statements have been proposed.[52][53] In 2021, Colavizza et al. identified three categories or levels of access:

DAS 1: "Data available on request or similar"[54]
DAS 2: "Data available with the paper and its supplementary files"[54]
DAS 3: "Data available in a repository"[54]

Supplementary data files appeared in the early phase of the transition to digital scientific publishing. While the format of publications has largely kept the constraints of print, additional materials could be included in "supplementary information".[33] As publications, supplementary data files have an ambiguous status. In theory, they are meant to be raw documents giving access to the background of the research. In practice, the released datasets often have to be specially curated for publication. They will usually focus on the primary data sources, not on the entire range of observations or measurements made for the purpose of the research: "Identifying what are 'the data' associated with any individual article, conference paper, book, or other publication is often difficult [as] investigators collect data continually."[55] The selection of the data is further influenced by the publisher. The editorial policy of the journal largely determines what goes in the main text and what goes in the supplemental information, and editors are especially wary of including large datasets, which may be difficult to maintain in the long run.[55]

Scientific datasets have been increasingly acknowledged as autonomous scientific publications. The assimilation of data to academic articles aimed to increase the prestige and recognition of published datasets: "implicit in this argument is that familiarity will encourage data release".[51] This
approach has been favored by several publishers and repositories, as it made it possible to easily integrate data into existing publishing infrastructure and to extensively reuse editorial concepts initially created around articles.[51] Data papers were explicitly introduced as "a mechanism to incentivize data publishing in biodiversity science".[56]

Citation and indexation

The first digital databases of the 1950s and the 1960s immediately raised issues of citability and bibliographic description.[57] The mutability of computer memory was especially challenging: in contrast with printed publications, digital data could not be expected to remain stable in the long run. In 1965, Ralph Bisco underlined that this uncertainty affected all the associated documents, like code notebooks, which may become increasingly out of date. Data management has to find a middle ground between continuous enhancements and some form of generic stability: "the concept of a fluid, changeable, continually improving data archive means that study cleaning and other processing must be carried to such a point that changes will not significantly affect prior analyses".[58]

Structured bibliographic metadata for databases has been a debated topic since the 1960s.[57] In 1977, the American Standard for Bibliographic Reference adopted a definition of "data file" with a strong focus on the materiality and the mutability of the dataset: neither dates nor authors were indicated, but the medium or "Packaging Method" had to be specified.[59] Two years later, Sue Dodd introduced an alternative convention that brought the citation of data closer to the standard of references for other scientific publications.[57] Dodd's recommendation included the use of titles, authors, editions and dates, as well as alternative mentions for sub-documentation like code notebooks.[60]

The indexation of datasets has been radically transformed by the development of the web, as barriers to data sharing were substantially reduced.[57] In this process, data
archiving, sustainability and persistence have become critical issues. Permanent digital object identifiers (or DOIs) were introduced for scientific articles to avoid broken links as website structures continuously evolved. In the early 2000s, pilot programs started to allocate DOIs to datasets as well.[61] While it solves concrete issues of link sustainability, the creation of data DOIs and norms of data citation is also part of a legitimization process that assimilates datasets to standard scientific publications and can draw on similar sources of motivation (like bibliometric indexes).[62]

Accessible and findable datasets yield a significant citation advantage. A 2021 study of 531,889 articles published by PLOS estimated that there is a 25.36% relative gain in citation counts in general for a journal article with a link to archived data in a public repository.[63] Diffusion of data as supplementary material does not yield a significant citation advantage, which suggests that the citation advantage of a DAS (Data Availability Statement) is related less to its mere presence than to its contents.[64]

As of 2022, the recognition of open scientific data was still an ongoing process. The leading reference software Zotero did not yet have a specific item type for datasets.

Reuse and economic impact

Within academic research, storage and redundancy have proven to be a significant benefit of open scientific data. In contrast, non-open scientific data is weakly preserved and can be retrieved only with considerable effort by the authors, if it is not completely lost.[65]

Analysis of the uses of open scientific data runs into the same issues as for any open content: while free, universal and indiscriminate access has demonstrably expanded the scope, range and intensity of reception, it has also made that reception harder to track, because no transaction takes place. These issues are further complicated by the novelty of data as a scientific publication. In practice, it can be difficult to monitor data reuse
mainly because researchers rarely cite the repository.[66]

In 2018, a report of the European Commission estimated the cost of not opening scientific data in accordance with the FAIR principles: it amounted to €10.2 billion annually in direct impact and €16 billion in indirect impact over the entire innovation economy.[67] Implementing open scientific data at a global scale "would have a considerable impact on the time we spent manipulating data and the way we store data".[67]

Practices and data culture

The sharing of scientific data is rooted in scientific cultures or communities of practice. As digital tools have become widespread, the infrastructures, the practices and the common representations of research communities have increasingly relied on shared meanings of what data is and what can be done with it.[12]

Pre-existing epistemic machineries can be more or less predisposed to data sharing. Important factors may include shared values (individualistic or collective), the allocation of data ownership, and frequent collaborations with external actors, which may be reluctant to share data.[68]

The emergence of an open data culture

The development of scientific open data is not limited to scientific research. It involves a diverse set of stakeholders: "Arguments for sharing data come from many quarters: funding agencies (both public and private), policy bodies such as national academies and funding councils, journal publishers, educators, the public at large, and from researchers themselves."[69] As such, the movement for scientific open data largely intersects with more global movements for open data.[70] Standard definitions of open data used by a wide range of public and private actors have been partly elaborated by researchers around concrete scientific issues.[71] The concept of transparency has especially contributed to creating convergences between open science, open data and open government. In 2015, the OECD described transparency as a common "rationale for open science and open data".[72]

Christine
Borgman has identified four major rationales for sharing data that are commonly used across the entire regulatory and public debate over scientific open data:[69]

Research reproducibility: lack of reproducibility is frequently attributed to deficiencies in research transparency and in the data analysis process. Consequently, as "a rationale for sharing research data, [research reproducibility] is powerful yet problematic".[73] Reproducibility only applies to "certain kinds of research", mostly in the experimental sciences.[73]
Public accessibility: this rationale, that "products of public funding should be available to the public", is "found in arguments for open government".[74] While directly inspired by similar arguments made in favor of open access to publications, its range is more limited, as scientific open data "has direct benefits to far fewer people, and those benefits vary by stakeholder".[75]
Research valorization: open scientific data may bring substantial value to the private sector. This argument is especially used to support "the need for more repositories that can accept and curate research data, for better tools and services to exploit data, and for other investments in knowledge infrastructure".[75]
Increased research and innovation: open scientific data may significantly enhance the quality of private and public research. This argument calls for "investing in knowledge infrastructure to sustain research data, curated to high standards of professional practices".[75]

Yet collaboration between the different actors and stakeholders of the data lifecycle is partial. Even within academic institutions, cooperation remains limited: most researchers search for data without consulting a data manager or librarian.[76]

The global open data movement partly lost its cohesiveness and identity during the 2010s, as debates over data availability and licensing were overtaken by domain-specific issues: "When the focus shifts from calling for access to data to creating data infrastructure and putting data to work, the divergent goals of those who formed an initial open data movement come clearly into view and managing the tensions that emerge can be complex."[77] The very generic scope of the open data definition, which aims to embrace a very wide set of pre-existing data cultures, does not take sufficient account of the higher threshold of accessibility and contextualization required by scientific research: "open data in the sense of being free for reuse is a necessary but not sufficient condition for research purposes."[78]

Ideal and implementation: the paradox of data sharing

Since the 2000s, surveys of scientific communities have underlined a consistent discrepancy between the ideals of data sharing and their implementation in practice: "When present-day researchers are asked whether they are willing to share their data, most say yes, they are willing to do so. When the same researchers are asked if they do release their data, they typically acknowledge that they have not done so."[79] Open data culture does not emerge in a vacuum and has to contend with pre-existing cultures of scientific data and with a range of systemic factors that can discourage data sharing: "In some fields, scholars are actively discouraged from reusing data. Careers are made by charting territory that was previously uncharted."[80]

In 2011, 67% of 1,329 scientists agreed that lack of data sharing is a "major impediment to progress in science",[81] and yet "only about a third (36%) of the respondents agree that others can access their data easily".[82] In 2016, a survey of researchers in the environmental sciences found overwhelming support for easily accessible open data (99% rated it at least somewhat important) and for institutional mandates for open data (88%).[83] Yet "even with willingness to share data there are discrepancies with common practices, e.g. willingness to spend time and resources preparing and up-loading data".[83] A 2022 study of 1,792 data sharing statements from BioMed Central found that less than 7% of the authors (123) actually provided the data upon
request.[84]

The prevalence of accessible and findable data is even lower: "Despite several decades of policy moves toward open access to data, the few statistics available reflect low rates of data release or deposit."[85] In a 2011 poll for Science, only 7.6% of researchers shared their data on community repositories, with local websites hosted by universities or laboratories being favored instead.[86] Consequently, "many bemoaned the lack of common metadata and archives as a main impediment to using and storing data".[86]

According to Borgman, the paradox of data sharing is partly due to the limitations of open data policies, which tend to focus on "mandating or encouraging investigators to release their data" without meeting the "expected demand for data or the infrastructure necessary to support release and reuse".[87]

Incentives and barriers to scientific open data

In 2022, Pujol Priego, Wareham and Romasanta stressed that incentives for the sharing of scientific data were primarily collective and include reproducibility, scientific efficiency and scientific quality, along with more individual rewards such as personal credit.[88] Individual benefits include increased visibility: open datasets yield a significant citation advantage, but only when they have been shared on an open repository.[63]

Important barriers include the need to publish first, legal constraints, and concerns about loss of credit or recognition.[89] For individual researchers, datasets may be major assets to barter for "new jobs or new collaborations",[33] and their publication may be difficult to justify unless they "get something of value in return".[33]

Lack of familiarity with data sharing, rather than a straight rejection of the principles of open science, is also ultimately a leading obstacle. Several surveys in the early 2010s showed that researchers "rarely seek data from other investigators and (…) they rarely are asked for their own data".[80] This creates a negative feedback loop, as researchers make little effort to ensure data
sharing, which in turn discourages effective use, whereas "the heaviest demand for reusing data exists in fields with high mutual dependence".[80] The reality of data reuse may also be underestimated, as a dataset is not considered to be a prestigious publication and the original sources are not cited.[90]

A 2021 empirical study of 531,889 articles published by PLOS showed that soft incentives and encouragements have a limited impact on data sharing: "journal policies that encourage rather than require or mandate DAS [Data Availability Statement] have only a small effect".[91]

Legal status

The opening of scientific data has raised a variety of legal issues regarding ownership rights, copyright, privacy and ethics. While it is commonly considered that researchers "own the data they collect in the course of their research", this "view is incorrect":[92] the creation of a dataset potentially involves the rights of numerous additional actors, such as institutions, research agencies, funders, public bodies, associated data producers, and private citizens whose personal data are included.[92] The legal situation of digital data has consequently been described as a "bundle of rights", because the "legal category of property (...) is not a suitable model for dealing with the complexity of data governance problems".[93]

Copyright

Copyright was the primary focus of the legal literature on open scientific data until the 2010s. The legality of data sharing was identified early on as a crucial issue. In contrast with the sharing of scientific publications, the main impediment was not copyright but uncertainty: "the concept of 'data' [was] a new concept, created in the computer age, while copyright law emerged at the time of printed publications."[94] In theory, copyright and author rights provisions do not apply to simple collections of facts and figures. In practice, the notion of data is much more expansive and could include protected content or creative arrangements of non-copyrightable content.

The status of data in
international conventions on intellectual property is ambiguous. According to Article 2 of the Berne Convention, "every production in the literary, scientific and artistic domain" is protected.[95] Yet research data is often not an original creation entirely produced by one or several authors, but rather a "collection of facts, typically collated using automated or semiautomated instruments or scientific equipment".[95] Consequently, there is no universal convention on data copyright, and debates over "the extent to which copyright applies" are still prevalent, with different outcomes depending on the jurisdiction or the specifics of the dataset.[95] This lack of harmonization stems logically from the novelty of "research data" as a key concept of scientific research: "the concept of 'data' is a new concept, created in the computer age, while copyright law emerged at the time of printed publications."[95]

In the United States, the European Union and several other jurisdictions, copyright laws have acknowledged a distinction between the data itself (which can be an unprotected "fact") and the compilation of the data (which can be a creative arrangement).[95] This principle largely predates the contemporary policy debate over scientific data, as the earliest court cases ruling in favor of compilation rights go back to the 19th century.

In the United States, compilation rights were defined in the Copyright Act of 1976 with an explicit mention of datasets: "a work formed by the collection and assembling of pre-existing materials or of data" (§ 101).[96] In its 1991 decision Feist Publications, Inc., v. Rural Telephone Service Co., the Supreme Court clarified the extent and the limitations of database copyrights: the "assembling" should be demonstrably original, and the "raw facts" contained in the compilation remain unprotected.[96]

Even in jurisdictions where the application of copyright to data outputs remains unsettled and partly theoretical, it has nevertheless created significant legal uncertainties. The
frontier between a set of raw facts and an original compilation is not clearly delineated.[97] Although scientific organizations are usually well aware of copyright laws, the complexity of data rights creates unprecedented challenges.[98] After 2010, national and supra-national jurisdictions partly changed their stance regarding the copyright protection of research data. As sharing is encouraged, scientific data has also been acknowledged as an informal public good: "policymakers, funders, and academic institutions are working to increase awareness that, while the publications and knowledge derived from research data pertain to the authors, research data needs to be considered a public good so that its potential social and scientific value can be realised".[12]

Database rights

The European Union provides one of the strongest intellectual property frameworks for data, with a double layer of rights: copyright for original compilations (similarly to the United States) and sui generis database rights.[97] Criteria for the originality of compilations have been harmonized across the member states by the 1996 Database Directive and by several major cases settled by the European Court of Justice, such as Infopaq International A/S v Danske Dagblades Forening or Football Dataco Ltd et al. v Yahoo! UK Ltd. Overall, it has been acknowledged that significant effort in the making of the dataset is not sufficient to claim compilation rights, as the structure has to "express his creativity in an original manner".[99] The Database Directive also introduced an original framework of protection for datasets, the sui generis rights, which are conferred on any dataset that required a "substantial investment".[100] While they last 15 years, sui generis rights have the potential to become permanent, as they can be renewed with every update of the dataset.

Because of their large scope in duration and protection, sui generis rights were initially not widely upheld by European jurisprudence, which
has raised a high bar for their enforcement. This cautious approach was reversed in the 2010s, as the 2013 decision Innoweb BV v Wegener ICT Media BV and Wegener Mediaventions strengthened the position of database owners and condemned the reuse of non-protected data in web search engines.[101] The consolidation and expansion of database rights remains a controversial topic in European regulation, as it is partly at odds with the commitment of the European Union in favor of a data-driven economy and open science.[101] While a few exceptions exist for scientific and pedagogic uses, they are limited in scope (no rights for further reutilization) and they have not been activated in all member states.[101]

Ownership

Copyright issues with scientific datasets have been further complicated by uncertainties regarding ownership. Research is largely a collaborative activity that involves a wide range of contributions. Initiatives like CRediT (Contributor Roles Taxonomy) have identified 14 different roles, of which 4 are explicitly related to data management (Formal Analysis, Investigation, Data curation and Visualization).[102]

In the United States, ownership of research data is usually "determined by the employer of the researcher", with the principal investigator acting as the caretaker of the data rather than the owner.[103] Until the development of research open data, US institutions were usually more reluctant to waive copyright on data than on publications, as data are considered strategic assets.[104] In the European Union, there is no largely agreed framework on the ownership of data.[105]

The additional rights of external stakeholders have also been raised, especially in the context of medical research. Since the 1970s, patients have claimed some form of ownership of the data produced in the context of clinical trials, notably with important controversies concerning "whether research subjects and patients actually own their own tissue or DNA".[104]

Privacy

Numerous scientific projects rely on
the collection of data on persons, notably in medical research and the social sciences. In such cases, any policy of data sharing necessarily has to be balanced with the preservation and protection of personal data.[106]

Researchers, and most specifically principal investigators, are subject to obligations of confidentiality in several jurisdictions.[106] Health data has been increasingly regulated since the late 20th century, either by law or by sectoral agreements. In 2014, the European Medicines Agency introduced important changes to the sharing of clinical trial data, in order to prevent the release of all personal details and all commercially relevant information. This evolution of European regulation is likely to influence the global practice of sharing clinical trial data as open data.[107]

Research management plans and practices have to be open, transparent and confidential by design.

Free licenses

Open licenses have been the preferred legal framework to clear the restrictions and ambiguities in the legal definition of scientific data. In 2003, the Berlin Declaration called for a universal waiver of reuse rights on scientific contributions that explicitly included "raw data and metadata".[108]

In contrast with the development of open licenses for publications, which occurred within a short time frame, the creation of licenses for open scientific data has been a complicated process. Specific rights, like the sui generis database rights in the European Union, or specific legal principles, like the distinction between simple facts and original compilations, were not initially anticipated. Until the 2010s, free licenses could paradoxically add more restrictions to the reuse of datasets, especially with regard to attribution (which is not required for non-copyrighted objects like raw facts): "in such cases, when no rights are attached to research data, then there is no ground for licencing the data".[109]

To circumvent the issue, several institutions like the Harvard-MIT Data Center
started to share the data in the public domain.[110] This approach ensures that no right is applied to non-copyrighted items. Yet the public domain and some associated tools, like the Public Domain Mark, are not properly defined legal contracts and vary significantly from one jurisdiction to another.[110] First introduced in 2009, the Creative Commons Zero (CC0) license was immediately contemplated for data licensing.[111] It has since become the recommended tool for releasing research data into the public domain.[112] In accordance with the principles of the Berlin Declaration, it is not a license but a waiver, as the producer of the data "overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer's Copyright and Related Rights".

Alternative approaches have included the design of new free licenses to disentangle the attribution stacking specific to database rights. In 2009, the Open Knowledge Foundation published the Open Database License, which has been adopted by major online projects like OpenStreetMap. Since 2015, all the Creative Commons licenses have been updated to become fully effective on datasets, as database rights are explicitly anticipated in the 4.0 version.[109]

Open scientific data management

Data management has recently become a primary focus of the policy and research debate on open scientific data. The influential FAIR principles are deliberately centered on the key features of good data management in a scientific context.[44] In a research context, data management is frequently associated with data lifecycles. Various lifecycle models with different stages have been theorized by institutions, infrastructures and scientific communities, although such lifecycles are a simplification of real life, which is far less linear and more iterative in practice.[113]

Integration into the research workflow

In contrast with the broad incentives for data sharing included in the early policies in favor of open scientific
data, the complexity and the underlying costs and requirements of scientific data management are increasingly acknowledged: data sharing is difficult to do and to justify by the return on investment.[114] Open data is not simply a supplementary task but has to be envisioned throughout the entire research process, as it requires changes in the methods and practices of research.[114]

The opening of research data creates a new settlement of costs and benefits. Public data sharing introduces a new communication setting that largely contrasts with the private exchange of data with research collaborators or partners. The collection, the purpose and the limitations of the data have to be made explicit, as it is not possible to rely on pre-existing informal knowledge: the documentation and representations are the only means of communicating between data creator and user.[115] Lack of proper documentation means that the burden of recontextualization falls on potential users and may render the dataset ultimately useless.[116]

Publication additionally requires further verification with regard to the ownership of the data and the potential legal liability if the data is misused. This clarification phase becomes even more complex in international research projects that may span several jurisdictions.[117]

Data sharing and the application of open science principles also bring significant long-term advantages that may not be immediately visible. Documentation of datasets helps to clarify their chain of provenance and to ensure that the original data has not been significantly altered or, if it has, that all further treatments are fully documented.[118] Publication under a free license also makes it possible to delegate some tasks, such as long-term preservation, to external actors.

By the end of the 2010s, a new specialized literature on data management for research had emerged to codify existing practices and regulatory principles.[119][120][121]

Storage and preservation

The availability of non-open scientific data decays rapidly: in 2014, a retrospective study of biological datasets showed that "the odds of a data set being reported as extant fell by 17% per year".[122] Consequently, the proportion of data sets that still existed dropped from 100% for data from 2011 to 33% for data from 1991.[65] Data loss has also been singled out as a significant issue in major journals like Nature or Science.[123]

Surveys of research practices have consistently shown that storage norms, infrastructures and workflows remain unsatisfying in most disciplines. The storage and preservation of scientific data were identified early on as critical issues, especially in relation to observational data, which are considered essential to preserve because they are the most difficult to replicate.[35] A 2017-2018 survey of 1,372 researchers contacted through the American Geophysical Union shows that only a quarter and a fifth of the respondents report good data storage practices.[124] Short-term and unsustainable storage remains widespread, with 61% of the respondents storing most or all of their data on personal computers.[124] Due to their ease of use at an individual scale, unsustainable storage solutions are viewed favorably in most disciplines: "This mismatch between good practices and satisfaction may show that data storage is less important to them than data collection and analysis."[124]

In its 2012 edition, the reference model of the Open Archival Information System (OAIS) states that scientific infrastructure should seek long-term preservation, that is, preservation long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community.[125] Consequently, good practices of data management rely both on storage, to materially preserve the data, and, even more crucially, on curation, to preserve knowledge about the data to facilitate reuse.[126]

Data sharing on public repositories has contributed to mitigating preservation risks, due to the long-term commitment of data
infrastructures and the potential redundancy of open data. A 2021 study of 50,000 data availability statements published in PLOS One showed that 80% of the datasets could be retrieved automatically, and 98% of the datasets with a data DOI could be retrieved either automatically or manually. Moreover, accessibility did not decay significantly for older publications: "URLs and DOIs make the data and code associated with papers more likely to be available over time."[127] Significant benefits have not been found when the open data was not properly linked or documented: "Simply requiring that data be shared in some form may not have the desired impact of making scientific data FAIR, as studies have repeatedly demonstrated that many datasets that are ostensibly shared may not actually be accessible."[128]

Plan and governance

See also: Data Management Plan

Research data management can be laid out in a data management plan (DMP).

Data management plans were first introduced in 1966 for the specific needs of aeronautic and engineering research, which already faced increasingly complex data frictions.[129] These first examples focused on material issues associated with the access, transfer and storage of the data: "Until the early 2000s, DMPs were utilised in this manner: in limited fields, for projects of great technical complexity, and for limited mid-study data collection and processing purposes."[130]

After 2000, the implementation of large research infrastructures and the development of open science changed the scope and the purpose of data management plans. Policy-makers, rather than scientists, have been instrumental in this development. The first publications to provide general advice and guidance to researchers around the creation of DMPs were published from 2009, following the publications from JISC and the OECD: "DMP use, we infer, has been imposed onto the research community through external forces."[131]

Empirical studies of data practices in research have highlighted the need for organizations to offer
more formal training and assistance in data management to scientists.[132] In a 2017-2018 international survey of 1,372 scientists, most requests for help and formalization were associated with data management plans: creating data management plans (33.3%), training on best practices in data management (31.3%) and assistance with creating metadata to describe data or datasets (27.6%).[132] The expansion of data collection and data analysis processes has increasingly strained a large range of informal and non-codified data practices.

The involvement of external stakeholders in research projects creates significant potential tensions with the principles of sharing open data. Contributions from commercial actors can especially rely on some form of exclusivity and appropriation of the final research results. In 2022, Pujol Priego, Wareham and Romasanta described several accommodation strategies to overcome these issues, such as data modularity (with sharing limited to some part of the data) and time delay (with year-long embargoes before the final release of the data).[133]

Open science infrastructures

See also: Open science infrastructure

The UNESCO Recommendation on Open Science, approved in November 2021, defines open science infrastructures as shared research infrastructures that are needed to support open science and serve the needs of different communities.[134] Open science infrastructures have been recognized as a major factor in the implementation and development of data sharing policies.[135]

Leading forms of infrastructure for open scientific data include data repositories, data analysis platforms, indexes, digitized libraries and digitized archives.[136][137] Infrastructures ensure that the costs of publishing, maintaining and indexing datasets are not entirely borne by individual researchers and institutions. They are additionally key stakeholders in the definition and adoption of open data standards, especially with regard to licensing and documentation.

By the end of the 1990s, the creation of public
scientific computing infrastructure became a major policy issue.[138] The lack of infrastructure to support release and reuse was acknowledged in some of the earliest policy reports on data sharing.[135] The first wave of web-based scientific projects in the 1990s and the early 2000s revealed critical issues of sustainability. As funding was allocated for a specific time period, critical databases, online tools or publishing platforms could hardly be maintained,[28] and project managers were faced with a "valley of death" between grant funding and ongoing operational funding.[139] After 2010, the consolidation and expansion of commercial scientific infrastructure, such as the acquisition of the open repositories Digital Commons and SSRN by Elsevier, prompted further calls to secure community-controlled infrastructure.[140] In 2015, Cameron Neylon, Geoffrey Bilder and Jennifer Lin defined an influential series of Principles for Open Scholarly Infrastructure[141] that has been endorsed by leading infrastructures such as Crossref,[142] OpenCitations[143] and Dryad.[144] By 2021, public services and infrastructures for research had largely endorsed open science as an integral part of their activity and identity: "open science is the dominant discourse to which new online services for research refer."[145] According to the 2021 Roadmap of the European Strategy Forum on Research Infrastructures (ESFRI), major legacy infrastructures in Europe have embraced open science principles: "Most of the Research Infrastructures on the ESFRI Roadmap are at the forefront of Open Science movement and make important contributions to the digital transformation by transforming the whole research process according to the Open Science paradigm."[146]

Open science infrastructures represent a higher level of commitment to data sharing. They rely on significant and recurrent investments to ensure that data is effectively maintained and documented, and they add value to data through metadata, provenance, classification, standards for
data structures and migration.[147] Furthermore, infrastructures need to be integrated with the norms and expected uses of the scientific communities they mean to serve: "The most successful become reference collections that attract longer-term funding and can set standards for their communities."[137] Maintaining open standards is one of the main challenges identified by leading European open infrastructures, as it implies choosing among competing standards in some cases, as well as ensuring that the standards are correctly updated and accessible through APIs or other endpoints.[148]

The conceptual definition of open science infrastructures has been largely influenced by Elinor Ostrom's analysis of the commons, and more specifically of the knowledge commons. In accordance with Ostrom, Cameron Neylon stresses that open infrastructures are characterized not only by the management of a pool of common resources but also by the elaboration of common governance and norms.[149] The diffusion of open scientific data also raises stringent issues of governance. With regard to the determination of the ownership of the data, the adoption of free licenses and the enforcement of privacy regulations, continual negotiation is necessary and involves a wide range of stakeholders.[150]

Beyond their integration into specific scientific communities, open science infrastructures have strong ties with the open source and open data movements. 82% of the European infrastructures surveyed by SPARC claim to have partially built open source software, and 53% have their entire technological infrastructure in open source.[151] Open science infrastructures preferably integrate standards from other open science infrastructures. Among European infrastructures: "The most commonly cited systems, and thus essential infrastructure for many, are ORCID, Crossref, DOAJ, BASE, OpenAIRE, Altmetric, and Datacite, most of which are not-for-profit."[152] Open science infrastructures are thus part of an emerging truly interoperable Open
Science commons that holds the promise of researcher-centric, low-cost, innovative and interoperable tools for research, superior to the present, largely closed system.[153]

See also

CODATA
Data archive
Data publishing
Dataverse
Journal Article Tag Suite (JATS)
Open science
Science Commons
Open Standard

References

1. Spiegelhalter, D. "Open data and trust in the literature". The Scholarly Kitchen. Retrieved 7 September 2018.
2. Wilkinson et al. 2016.
3. Lipton 2020, p. 19.
4. Borgman 2015, p. 18.
5. Lipton 2020, p. 59.
6. Lipton 2020, p. 61.
7. "ARTICLE 29 - DISSEMINATION OF RESULTS - OPEN ACCESS - VISIBILITY OF EU FUNDING". Archived 2022-09-13 at the Wayback Machine. Draft of the H2020 Model Grant Agreement.
8. National Academies 2012, p. 1.
9. Borgman 2015, pp. 4-5.
10. Pujol Priego, Wareham & Romasanta 2022, p. 220.
11. Edwards et al. 2011, p. 669.
12. Pujol Priego, Wareham & Romasanta 2022, p. 224.
13. Pujol Priego, Wareham & Romasanta 2022, p. 225.
14. Rosenberg 2018, pp. 557-558.
15. Buckland 1991.
16. Edwards 2010, p. 84.
17. Edwards 2010, p. 99.
18. Edwards 2010, p. 102.
19. Machado, Jorge. "Open data and open science". In Albagli, Maciel & Abdo. Open Science, Open Questions, 2015. [dead link]
20. Shankar, Eschenfelder & Downey 2016, p. 63.
21. Committee on Scientific Accomplishments of Earth Observations from Space, National Research Council (2008). Earth Observations from Space: The First 50 Years of Scientific Achievements. The National Academies Press. p. 6. ISBN 978-0-309-11095-2. Retrieved 2010-11-24.
22. World Data Center System (2009-09-18). "About the World Data Center System". NOAA, National Geophysical Data Center. Retrieved 2010-11-24.
23. Borgman 2015, p. 7.
24. Regazzi 2015, p. 128.
25. Bourne & Hahn 2003, p. 397.
26. Campbell-Kelly & Garcia-Swartz 2013.
27. Berners-Lee & Fischetti 2008, p. 17.
28. Dacos 2013.
29. Tim Berners-Lee, "Qualifiers on Hypertext Links", mail sent on August 6, 1991 to alt.hypertext.
30. Hogan 2014, p. 20.
31. Bygrave & Bing 2009, p. 30.
32. Star & Ruhleder 1996, p. 131.
33. Borgman 2015, p. 217.
34. National Research Council (1995). On the Full and Open Exchange of Scientific Data. Washington, DC: The National
Academies Press. doi:10.17226/18769. ISBN 978-0-309-30427-6.
35. Pujol Priego, Wareham & Romasanta 2022, p. 223.
36. Lipton 2020, p. 16.
37. National Research Council 1999, p. 16.
38. OECD Declaration on Open Access to publicly funded data. Archived 20 April 2010 at the Wayback Machine.
39. Lipton 2020, p. 17.
40. OECD 2007, p. 13.
41. OECD 2007, p. 4.
42. Wilkinson et al. 2016, p. 8.
43. Wilkinson et al. 2016, p. 3.
44. Wilkinson et al. 2016, p. 1.
45. Wilkinson et al. 2016, p. 4.
46. van Reisen et al. 2020.
47. Horizon 2020 Commission expert group on Turning FAIR data into reality (E03464).
48. Lipton 2020, p. 66.
49. The French Open Science Monitor, last updated on December 1st, 2022.
50. Pujol Priego, Wareham & Romasanta 2022, p. 241.
51. Borgman 2015, p. 48.
52. Federer et al. 2018.
53. Colavizza et al. 2020.
54. Colavizza et al. 2020, p. 5.
55. Borgman 2015, p. 216.
56. Chavan & Penev 2011.
57. Crosas 2014, p. 63.
58. Bisco 1965, p. 148.
59. Dodd 1979, p. 78.
60. Dodd 1979.
61. Brase 2004.
62. Borgman 2015, p. 47.
63. Colavizza et al. 2020, p. 12.
64. Colavizza et al. 2020, p. 10.
65. Vines et al. 2014, p. 96.
66. Lipton 2020, p. 65.
67. European Commission 2018, p. 31.
68. Pujol Priego, Wareham & Romasanta 2022, pp. 224-225.
69. Borgman 2015, p. 208.
70. Davies et al. 2019, p. 1.
71. Borgman 2015, p. 44.
72. Lyon, Jeng & Mattern 2017, p. 47.
73. Borgman 2015, p. 209.
74. Borgman 2015, p. 211.
75. Borgman 2015, p. 212.
76. Tenopir et al. 2020, p. 12.
77. Davies et al. 2019, p. 6.
78. Borgman 2015, p. 283.
79. Borgman 2015, p. 205.
80. Borgman 2015, p. 213.
81. Tenopir et al. 2011, p. 7.
82. Tenopir et al. 2011, p. 9.
83. Schmidt, Gemeinholzer & Treloar 2016.
84. Gabelica, Bojcic & Puljak 2022.
85. Borgman 2015, p. 206.
86. Science 2011.
87. Borgman 2015, p. 207.
88. Pujol Priego, Wareham & Romasanta 2022, p. 226.
89. Tenopir et al. 2020, p. 5.
90. Borgman 2015, p. 223.
91. Colavizza et al. 2020, p. 13.
92. Lipton 2020, p. 127.
93. Kerber 2021, p. 1.
94. Lipton 2020, p. 119.
95. Lipton 2020, p. 119.
96. Lipton 2020, p. 122.
97. Lipton 2020, p. 123.
98. Lipton 2020, p. 126.
99. Article 6, Directive 2006/116/EC.
100. Lipton 2020, p. 124.
101. Lipton 2020, p. 125.
102. Allen, O'Connell & Kiermer 2019, p. 73.
103. Lipton 2020, p. 129.
104. Lipton 2020, p. 130.
105. Lipton 2020, p. 131.
106. Lipton 2020, p. 138.
107. Lipton 2020, p. 139.
108. Berlin
Declaration on Open Access to Knowledge in the Sciences and Humanities.
109. Lipton 2020, p. 133.
110. Lipton 2020, p. 134.
111. Schofield et al. 2009.
112. Lipton 2020, p. 132.
113. Cox & Verbaan 2018, pp. 26-27.
114. Borgman 2015, p. 214.
115. Borgman 2015, p. 220.
116. Borgman 2015, p. 222.
117. Borgman 2015, p. 218.
118. Borgman 2015, p. 221.
119. Briney 2015.
120. Cox & Verbaan 2018.
121. Tibor 2021.
122. Vines et al. 2014.
123. Tedersoo et al. 2021.
124. Tenopir et al. 2020, p. 11.
125. CCSDS 2012, p. 1.
126. Lipton 2020, p. 73.
127. Federer 2022, p. 9.
128. Federer 2022, p. 11.
129. Smale et al. 2020, p. 3.
130. Smale et al. 2020, p. 4.
131. Smale et al. 2020, p. 9.
132. Tenopir et al. 2020, p. 13.
133. Pujol Priego, Wareham & Romasanta 2022, pp. 239-240.
134. UNESCO Recommendation on Open Science. 2021. CL/4363.
135. Borgman 2015, p. 224.
136. Ficarra et al. 2020, p. 16.
137. Borgman 2015, p. 225.
138. Borgman 2007, p. 21.
139. Skinner 2019, p. 6.
140. Joseph 2018, p. 1.
141. Neylon et al. 2015.
142. Crossref's Board votes to adopt the Principles of Open Scholarly Infrastructure.
143. OpenCitations' compliance with the Principles of Open Scholarly Infrastructure.
144. Dryad's Commitment to the Principles of Open Scholarly Infrastructure.
145. Fecher et al. 2021, p. 505.
146. ESFRI Roadmap 2021, p. 159.
147. Borgman 2015, p. 226.
148. Ficarra et al. 2020, p. 23.
149. Neylon 2017, p. 7.
150. Borgman 2015, p. 229.
151. Ficarra et al. 2020, p. 29.
152. Ficarra et al. 2020, p. 50.
153. Ross-Hellauer et al. 2020, p. 13.

Bibliography

Reports

National Research Council (1999). A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases (Report). National Academies Press. Retrieved 2022-05-18.
OECD (2007). OECD Principles and Guidelines for Access to Research Data from Public Funding (Report). Paris: Organisation for Economic Co-operation and Development. Retrieved 2022-05-18.
CCSDS (2012). Reference Model for an Open Archival Information System (OAIS) (Report). p. 135.
European Commission (2018). Cost-benefit analysis for FAIR research data: cost of not having FAIR research data (Report). LU: Office des publications de l'Union européenne. doi:10.2777/02999. Retrieved 2022-06-18.
Astell, Mathias; Hrynaszkiewicz, Iain; Allin, Katie; Penny, Dan; Mithu, Lucraft; Baynes, Grace; Springer
Nature Admin (2018). Practical challenges for researchers in data sharing: Springer Nature survey data (anonymised) (Report). Springer Nature. Retrieved 2022-09-11.
Skinner, Katherine (2019). Mapping the Scholarly Communication Landscape: 2019 Census (Report). Educopia Institute. S2CID 201314019.
European Commission (2019). Horizon 2020 Annotated Model Grant Agreements (Report). European Commission.
Ficarra, Victoria; Fosci, Mattia; Chiarelli, Andrea; Kramer, Bianca; Proudman, Vanessa (2020-10-30). Scoping the Open Science Infrastructure Landscape in Europe (Report). Retrieved 2021-10-31.
ESFRI (2021). ESFRI Roadmap (PDF) (Report). ESFRI.
Ross-Hellauer, Tony; Fecher, Benedikt; Shearer, Kathleen; Rodrigues, Eloy (2019-09-03). Pubfair: a framework for sustainable, distributed, open science publishing services (Report). Retrieved 2021-12-12.

Journal articles

Bisco, Ralph L. (1965-09-01). "Social Science Data Archives: Technical Considerations". Social Science Information. 4 (3): 129-150. doi:10.1177/053901846500400311. ISSN 0539-0184. S2CID 144164959.
Dodd, Sue A. (1979). "Bibliographic references for numeric social science data files: Suggested guidelines". Journal of the American Society for Information Science. 30 (2): 77-82. doi:10.1002/asi.4630300203. ISSN 1097-4571. Retrieved 2022-05-15.
Buckland, Michael K. (1991). "Information as thing". Journal of the American Society for Information Science. 42 (5): 351-360. doi:10.1002/(SICI)1097-4571(199106)42:5<351::AID-ASI5>3.0.CO;2-3. ISSN 1097-4571. Retrieved 2022-03-22.
Star, Susan Leigh; Ruhleder, Karen (1996-03-01). "Steps Toward an Ecology of Infrastructure: Design and Access for Large Information Spaces". Information Systems Research. 7 (1): 111-134. doi:10.1287/isre.7.1.111. ISSN 1047-7047. S2CID 10520480. Retrieved 2021-12-22.
Brase, Jan (2004). "Using Digital Library Techniques - Registration of Scientific Primary Data". In Heery, Rachel; Lyon, Liz (eds.). Research and Advanced Technology for Digital Libraries. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. pp. 488-494. doi:10.1007/978-3-540-30230-8_44. ISBN 978-3-540-30230-8.
Barateiro, Jose;
Antunes, Goncalo; Cabral, Manuel; Borbinha, Jose; Rodrigues, Rodrigo (2008). "Digital Preservation of Scientific Data". In Christensen-Dalsgaard, Birte; Castelli, Donatella; Bolette, Ammitzboll Jurik; Lippincott, Joan (eds.). Research and Advanced Technology for Digital Libraries. Lecture Notes in Computer Science. Vol. 5173. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 388-391. doi:10.1007/978-3-540-87599-4_41. ISBN 978-3-540-87598-7. Retrieved 2022-06-21.
Schofield, Paul N.; Bubela, Tania; Weaver, Thomas; Portilla, Lili; Brown, Stephen D.; Hancock, John M.; Einhorn, David; Tocchini-Valentini, Glauco; Hrabe de Angelis, Martin; Rosenthal, Nadia (2009-09-10). "Post-publication sharing of data and tools". Nature. 461 (7261): 171-173. Bibcode:2009Natur.461..171.. doi:10.1038/461171a. ISSN 0028-0836. PMC 6711854. PMID 19741686.
Korsmo, F. L. (2010). "The Origins and Principles of the World Data Center System". Data Science Journal. 8: 55-IGY65. doi:10.2481/dsj.SS_IGY-011.
Edwards, Paul N.; Mayernik, Matthew S.; Batcheller, Archer L.; Bowker, Geoffrey C.; Borgman, Christine L. (2011-10-01). "Science friction: Data, metadata, and collaboration". Social Studies of Science. 41 (5): 667-690. doi:10.1177/0306312711413314. ISSN 0306-3127. PMID 22164720. S2CID 33973392.
Science Staff (2011-02-11). "Challenges and Opportunities". Science. 331 (6018): 692-693. Bibcode:2011Sci...331..692.. doi:10.1126/science.331.6018.692. PMID 21311002. S2CID 109422723.
Tenopir, Carol; Allard, Suzie; Douglass, Kimberly; Aydinoglu, Arsev Umur; Wu, Lei; Read, Eleanor; Manoff, Maribeth; Frame, Mike (2011). "Data Sharing by Scientists: Practices and Perceptions". PLOS ONE. 6 (6): e21101. Bibcode:2011PLoSO...621101T. doi:10.1371/journal.pone.0021101. ISSN 1932-6203. PMC 3126798. PMID 21738610.
Chavan, Vishwas; Penev, Lyubomir (2011-12-15). "The data paper: a mechanism to incentivize data publishing in biodiversity science". BMC Bioinformatics. 12 (Suppl 15): S2. doi:10.1186/1471-2105-12-S15-S2. ISSN 1471-2105. PMC 3287445. PMID 22373175.
Campbell-Kelly, Martin; Garcia-Swartz, Daniel D. (2013). "The History of the Internet: The Missing Narratives". Journal of Information Technology. 28 (1): 18-33. doi:
10.1057/jit.2013.4. ISSN 0268-3962. S2CID 41013. Retrieved 2022-01-04.
Dacos, Marin (2013). "Cyberclio: vers une cyberinfrastructure au cœur de la discipline historique". In Frederic Clavert; Serge Noiret (eds.). L'histoire contemporaine à l'ère numérique. Berne: Peter Lang. pp. 29-41.
Wallis, Jillian C.; Rolando, Elizabeth; Borgman, Christine L. (2013). "If We Share Data, Will Anyone Use Them? Data Sharing and Reuse in the Long Tail of Science and Technology". PLOS ONE. 8 (7): e67332. Bibcode:2013PLoSO...867332W. doi:10.1371/journal.pone.0067332. ISSN 1932-6203. PMC 3720779. PMID 23935830.
Vines, Timothy H.; Albert, Arianne Y.K.; Andrew, Rose L.; Debarre, Florence; Bock, Dan G.; Franklin, Michelle T.; Gilbert, Kimberly J.; Moore, Jean-Sebastien; Renaut, Sebastien; Rennison, Diana J. (2014-01-06). "The Availability of Research Data Declines Rapidly with Article Age". Current Biology. 24 (1): 94-97. doi:10.1016/j.cub.2013.11.014. ISSN 0960-9822. PMID 24361065. S2CID 7799662. Retrieved 2022-09-11.
Crosas, Merce (2014-05-26). "The Evolution of Data Citation: From Principles to Implementation". IASSIST Quarterly. 37 (1-4): 62. doi:10.29173/iq504. ISSN 0739-1137. Retrieved 2022-05-15.
Tenopir, Carol; Dalton, Elizabeth D.; Allard, Suzie; Frame, Mike; Pjesivac, Ivanka; Birch, Ben; Pollock, Danielle; Dorsett, Kristina (2015). "Changes in Data Sharing and Data Reuse Practices and Perceptions among Scientists Worldwide". PLOS ONE. 10 (8): e0134826. Bibcode:2015PLoSO..1034826T. doi:10.1371/journal.pone.0134826. ISSN 1932-6203. PMC 4550246. PMID 26308551.
Shankar, Kalpana; Eschenfelder, Kristin R.; Downey, Greg (2016-05-13). "Studying the History of Social Science Data Archives as Knowledge Infrastructure". Science & Technology Studies. 29 (2): 62-73. doi:10.23987/sts.55691. ISSN 2243-4690. Retrieved 2021-12-23.
Neylon, Cameron; Chan, Leslie (2016-04-18). "Exploring the opportunities and challenges of implementing open research strategies within development institutions". Research Ideas and Outcomes. 2: e8880. doi:10.3897/rio.2.e8880.
ISSN 2367-7163. Retrieved 2021-11-01.
Schmidt, Birgit; Gemeinholzer, Birgit; Treloar, Andrew (2016-01-15). "Open Data in Global Environmental Research: The Belmont Forum's Open Data Survey". PLOS ONE. 11 (1): e0146695. Bibcode:2016PLoSO..1146695S. doi:10.1371/journal.pone.0146695. ISSN 1932-6203. PMC 4714918. PMID 26771577.
Wilkinson, Mark D.; Dumontier, Michel; Aalbersberg, IJsbrand Jan; Appleton, Gabrielle; Axton, Myles; Baak, Arie; Blomberg, Niklas; Boiten, Jan-Willem; Santos, Luiz Bonino da Silva; Bourne, Philip E.; Bouwman, Jildau; Brookes, Anthony J.; Clark, Tim; Crosas, Merce; Dillo, Ingrid; Dumon, Olivier; Edmunds, Scott; Evelo, Chris T.; Finkers, Richard; Gonzalez-Beltran, Alejandra; Gray, Alasdair J.G.; Groth, Paul; Goble, Carole; Grethe, Jeffrey S.; Heringa, Jaap; Hoen, Peter A.C. 't; Hooft, Rob; Kuhn, Tobias; Kok, Ruben; Kok, Joost; Lusher, Scott J.; Martone, Maryann E.; Mons, Albert; Packer, Abel L.; Persson, Bengt; Rocca-Serra, Philippe; Roos, Marco; Schaik, Rene van; Sansone, Susanna-Assunta; Schultes, Erik; Sengstag, Thierry; Slater, Ted; Strawn, George; Swertz, Morris A.; Thompson, Mark; Lei, Johan van der; Mulligen, Erik van; Velterop, Jan; Waagmeester, Andra; Wittenburg, Peter; Wolstencroft, Katherine; Zhao, Jun; Mons, Barend (2016). "The FAIR Guiding Principles for scientific data management and stewardship". Scientific Data. 3: 160018. Bibcode:2016NatSD...360018W. doi:10.1038/sdata.2016.18. PMC 4792175. PMID 26978244.
Lyon, Liz; Jeng, Wei; Mattern, Eleanor (2017-09-16). "Research Transparency: A Preliminary Study of Disciplinary Conceptualisation, Drivers, Tools and Support Services". International Journal of Digital Curation. 12 (1): 46-64. doi:10.2218/ijdc.v12i1.530. ISSN 1746-8256. Retrieved 2022-06-10.
Witkowski, Tomasz (2017). "A Scientist Pushes Psychology Journals toward Open Data". Skeptical Inquirer. 41 (4): 6-7. Archived from the original on 2018-09-15. "although some scientists now agree that doing so could help prevent future retractions of scientific manuscripts"
Besancon, Lonni; Peiffer-Smadja, Nathan; Segalas, Corentin; Jiang, Haiting; Masuzzo, Paola; Smout, Cooper; Billy, Eric; Deforet, Maxime; Leyrat, Clemence (2020). "Open Science Saves
Lives: Lessons from the COVID-19 Pandemic". BMC Medical Research Methodology. 21 (1): 117. doi:10.1186/s12874-021-01304-y. PMC 8179078. PMID 34090351.
Rosenberg, Daniel (2018-11-01). "Data as Word". Historical Studies in the Natural Sciences. 48 (5): 557-567. doi:10.1525/hsns.2018.48.5.557. hdl:21.11116/0000-0002-C567-C. ISSN 1939-1811. S2CID 149765492. Retrieved 2022-03-21.
Joseph, Heather (2018-09-05). "Securing community-controlled infrastructure: SPARC's plan of action". College & Research Libraries News. 79 (8): 426. doi:10.5860/crln.79.8.426. S2CID 116057034.
Federer, Lisa M.; Belter, Christopher W.; Joubert, Douglas J.; Livinski, Alicia; Lu, Ya-Ling; Snyders, Lissa N.; Thompson, Holly (2018-05-02). "Data sharing in PLOS ONE: An analysis of Data Availability Statements". PLOS ONE. 13 (5): e0194768. Bibcode:2018PLoSO..1394768F. doi:10.1371/journal.pone.0194768. ISSN 1932-6203. PMC 5931451. PMID 29719004.
Ross-Hellauer, Tony; Schmidt, Birgit; Kramer, Bianca (2018). "Are funder Open Access platforms a good idea?". SAGE Open. 8 (4): 2158244018816717. doi:10.1177/2158244018816717. S2CID 220987901.
Neylon, Cameron (2017-12-27). "Sustaining Scholarly Infrastructures through Collective Action: The Lessons that Olson can Teach us". KULA: Knowledge Creation, Dissemination, and Preservation Studies. 1: 3. doi:10.5334/kula.7. ISSN 2398-4112. Retrieved 2022-01-09.
Allen, Liz; O'Connell, Alison; Kiermer, Veronique (2019). "How can we ensure visibility and diversity in research contributions? How the Contributor Role Taxonomy (CRediT) is helping the shift from authorship to contributorship". Learned Publishing. 32 (1): 71-74. doi:10.1002/leap.1210. ISSN 1741-4857. S2CID 67868432. Retrieved 2022-05-14.
Smale, Nicholas Andrew; Unsworth, Kathryn; Denyer, Gareth; Magatova, Elise; Barr, Daniel (2020-01-01). "A Review of the History, Advocacy and Efficacy of Data Management Plans". International Journal of Digital Curation. 15 (1): 30. doi:10.2218/ijdc.v15i1.525. ISSN 1746-8256. Retrieved 2022-06-21.
Tenopir, Carol; Rice, Natalie M.; Allard, Suzie; Baird, Lynn; Borycz, Josh; Christian, Lisa; Grant, Bruce; Olendorf, Robert; Sandusky, Robert J. (2020-03-11).
"Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide". PLOS ONE. 15 (3): e0229003. Bibcode:2020PLoSO..1529003T. doi:10.1371/journal.pone.0229003. ISSN 1932-6203. PMC 7065823. PMID 32160189.
van Reisen, Mirjam; Stokmans, Mia; Basajja, Mariam; Ong'ayo, Antony Otieno; Kirkpatrick, Christine; Mons, Barend (2020-01-01). "Towards the Tipping Point for FAIR Implementation". Data Intelligence. 2 (1-2): 264-275. doi:10.1162/dint_a_00049. ISSN 2641-435X. S2CID 207828428.
Colavizza, Giovanni; Hrynaszkiewicz, Iain; Staden, Isla; Whitaker, Kirstie; McGillivray, Barbara (2020-04-22). "The citation advantage of linking publications to research data". PLOS ONE. 15 (4): e0230416. arXiv:1907.02565. Bibcode:2020PLoSO..1530416C. doi:10.1371/journal.pone.0230416. ISSN 1932-6203. PMC 7176083. PMID 32320428.
Kerber, Wolfgang (2021). "Specifying and Assigning Bundles of Rights on Data: An Economic Perspective". SSRN Electronic Journal. doi:10.2139/ssrn.3847620. hdl:10419/234876. ISSN 1556-5068. S2CID 235457824. Retrieved 2022-05-14.
Tedersoo, Leho; Kungas, Rainer; Oras, Ester; Koster, Kajar; Eenmaa, Helen; Leijen, Ali; Pedaste, Margus; Raju, Marju; Astapova, Anastasiya; Lukner, Heli; Kogermann, Karin; Sepp, Tuul (2021-07-27). "Data sharing practices and data availability upon request differ across scientific disciplines". Scientific Data. 8 (1): 192. Bibcode:2021NatSD...8..192T. doi:10.1038/s41597-021-00981-0. ISSN 2052-4463. PMC 8381906. PMID 34315906.
Fecher, Benedikt; Kahn, Rebecca; Sokolovska, Nataliia; Volker, Teresa; Nebe, Philip (2021-08-01). "Making a Research Infrastructure: Conditions and Strategies to Transform a Service into an Infrastructure". Science and Public Policy. 48 (4): 499-507. doi:10.1093/scipol/scab026. ISSN 0302-3427. Retrieved 2021-12-22.
Pujol Priego, Laia; Wareham, Jonathan; Romasanta, Angelo Kenneth S. (2022-02-07). "The puzzle of sharing scientific data". Industry and Innovation. 29 (2): 219-250. doi:10.1080/13662716.2022.2033178. ISSN 1366-2716. S2CID 246795400. Retrieved 2022-06-18.
Federer, Lisa M. (2022-08-24). "Long-term availability of data associated with articles in PLOS ONE". PLOS ONE. 17
(8): e0272845. Bibcode:2022PLoSO..1772845F. doi:10.1371/journal.pone.0272845. ISSN 1932-6203. PMC 9401135. PMID 36001577.
Gabelica, Mirko; Bojcic, Ruzica; Puljak, Livia (2022-10-01). "Many researchers were not compliant with their published data sharing statement: a mixed-methods study". Journal of Clinical Epidemiology. 150: 33-41. doi:10.1016/j.jclinepi.2022.05.019. ISSN 0895-4356. PMID 35654271. S2CID 249213574. Retrieved 2023-09-07.

Books & thesis

Bourne, Charles P.; Hahn, Trudi Bellardo (2003-08-01). A History of Online Information Services, 1963-1976. MIT Press. ISBN 978-0-262-26175-3.
Borgman, Christine L. (2007-10-12). Scholarship in the Digital Age: Information, Infrastructure, and the Internet. Cambridge, MA, USA: MIT Press. ISBN 978-0-262-02619-2.
Berners-Lee, Tim; Fischetti, Mark (2008). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. Paw Prints. ISBN 978-1-4395-0036-1.
Bygrave, Lee A.; Bing, Jon (2009-01-22). Internet Governance: Infrastructure and Institutions. OUP Oxford. ISBN 978-0-19-956113-1.
Edwards, Paul N. (2010-03-12). A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming. Infrastructures. Cambridge, MA, USA: MIT Press. ISBN 978-0-262-01392-5.
National Research Council (2012). Uhlir, Paul E. (ed.). For Attribution: Developing Data Attribution and Citation Practices and Standards: Summary of an International Workshop. Washington, DC: The National Academies Press. ISBN 978-0-309-26728-1. Retrieved 2022-03-22.
Gaillard, Remi (2014). De l'Open data à l'Open research data: quelle(s) politique(s) pour les données de recherche? (Thesis). ENSSIB.
Hogan, A. (2014-04-09). Reasoning Techniques for the Web of Data. IOS Press. ISBN 978-1-61499-383-4.
Borgman, Christine L. (2015-01-02). Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, MA, USA: MIT Press. ISBN 978-0-262-02856-1.
Briney, Kristin (2015-09-01). Data Management for Researchers: Organize, maintain and share your data for research success. Pelagic Publishing Ltd. ISBN 978-1-78427-013-1.
Regazzi, John J. (2015-02-12). Scholarly
Communications: A History from Content as King to Content as Kingmaker. Rowman & Littlefield. ISBN 978-0-8108-9088-6.
Cox, Andrew; Verbaan, Eddy (2018-05-11). Exploring Research Data Management. Facet Publishing. ISBN 978-1-78330-280-2.
Davies, Tim; Walker, Stephen B.; Rubinstein, M.; Perini, F. (2019). Davies, Tim; Walker, Stephen B.; Rubinstein, Mor; Perini, Fernando (eds.). The State of Open Data: Histories and Horizons. African Minds. doi:10.5281/zenodo.2668475. S2CID 202295750. Retrieved 2022-09-11.
Lipton, Vera (2020-01-22). Open Scientific Data: Why Choosing and Reusing the RIGHT DATA Matters. BoD - Books on Demand. ISBN 978-1-83880-984-3. [unreliable source?]
Tibor, Koltay (2021-10-31). Research Data Management and Data Literacies. Chandos Publishing. ISBN 978-0-323-86002-4.

Other sources

Neylon, Cameron; Bilder, Geoffrey; Lin, Jennifer (2015). "Principles for Open Scholarly Infrastructures". Science in the open. Retrieved 2021-11-01.

External links

Research Data Canada
Open Data In Science (article), P. Murray-Rust
Open Data about monitoring of deforestation in the Brazilian Amazon Rainforest
OpenWetWare
Open Connectome Project
LinkedScience.org
Collective Mind Repository for computer engineering
