Wikipedia

Data virtualization

Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located,[1] and can provide a single customer view (or single view of any other entity) of the overall data.[2]

Unlike the traditional extract, transform, load ("ETL") process, the data remains in place, and real-time access is given to the source system for the data. This reduces the risk of data errors and the workload of moving around data that may never be used, and the approach does not attempt to impose a single data model on the data (a federated database system is one example of such handling of heterogeneous data). The technology also supports writing transaction data updates back to the source systems.[3] To resolve differences in source and consumer formats and semantics, various abstraction and transformation techniques are used. The concept and its supporting software are a subset of data integration and are commonly used within business intelligence, service-oriented architecture data services, cloud computing, enterprise search, and master data management.
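A minimal sketch of the idea, in Python: two hypothetical in-memory lists stand in for separate source systems that keep their own field names and units, and a function builds a single customer view from them at query time. Nothing is copied into a new store, so a change in a source is visible on the next query. All names and records are invented for illustration; a real product would reach the sources through connectors rather than shared memory.

```python
# Hypothetical source systems that keep their own formats:
# the CRM keys customers by "cust_id", the billing system uses
# "customerId" and stores amounts in cents.
crm_rows = [
    {"cust_id": 1, "full_name": "Ada Lovelace"},
    {"cust_id": 2, "full_name": "Alan Turing"},
]
billing_rows = [
    {"customerId": 1, "amount_cents": 1250},
    {"customerId": 1, "amount_cents": 750},
    {"customerId": 2, "amount_cents": 300},
]


def single_customer_view(crm, billing):
    """Build a unified customer view at query time; nothing is copied or stored."""
    totals_cents = {}
    for row in billing:
        key = row["customerId"]  # map the billing key onto the CRM key
        totals_cents[key] = totals_cents.get(key, 0) + row["amount_cents"]
    return [
        {
            "id": c["cust_id"],
            "name": c["full_name"],
            "total": totals_cents.get(c["cust_id"], 0) / 100,  # cents -> units
        }
        for c in crm
    ]


before = single_customer_view(crm_rows, billing_rows)
billing_rows.append({"customerId": 2, "amount_cents": 200})  # new invoice lands in the source
after = single_customer_view(crm_rows, billing_rows)
print(before[1]["total"], after[1]["total"])  # 3.0 5.0
```

The second call reflects the new invoice without any reload step, which is the "data remains in place" property; the field renaming and unit conversion inside the function play the role of the abstraction and transformation techniques.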

Applications, benefits and drawbacks

The defining feature of data virtualization is that the data used remains in its original locations and real-time access is established to allow analytics across multiple sources. This aids in resolving some technical difficulties such as compatibility problems when combining data from various platforms, lowering the risk of error caused by faulty data, and guaranteeing that the newest data is used. Furthermore, avoiding the creation of a new database containing personal information can make it easier to comply with privacy regulations. As a result, data virtualization creates new possibilities for data use.[4]

Building on this, the real value of data virtualization, particularly for users, is its declarative approach. Traditional data integration methods require specifying every step of the integration, which is tedious, especially when requirements change and several steps must be modified. Data virtualization, in contrast, lets users simply describe the desired outcome, and the software automatically generates the steps needed to achieve it. If the desired outcome changes, updating the description suffices, and the software adjusts the intermediate steps accordingly. This can make the approach less error-prone and more efficient; according to the cited author, it can accelerate processes by up to five times, which the author considers the primary advantage of data virtualization.[5]
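The declarative idea can be illustrated with a toy planner. The sketch below assumes a hypothetical dictionary-based description format, not any product's actual syntax: the user states which source, filter and columns they want, and `plan` derives the concrete steps. Removing the filter from the description removes the filter step with no other change. Real platforms typically accept SQL or view definitions and rely on a query optimizer instead.

```python
from typing import Any, Callable

Row = dict[str, Any]
Step = Callable[[list[Row]], list[Row]]


def _filter_step(where: dict[str, Any]) -> Step:
    # Keep rows whose fields equal every value in the "where" description.
    def step(rows: list[Row]) -> list[Row]:
        return [r for r in rows if all(r.get(k) == v for k, v in where.items())]
    return step


def _project_step(columns: list[str]) -> Step:
    # Keep only the requested columns.
    def step(rows: list[Row]) -> list[Row]:
        return [{c: r[c] for c in columns} for r in rows]
    return step


def plan(spec: dict[str, Any]) -> list[tuple[str, Step]]:
    """Derive ordered steps from a declarative description of the result."""
    steps: list[tuple[str, Step]] = []
    if spec.get("where"):
        steps.append(("filter", _filter_step(spec["where"])))
    if spec.get("select"):
        steps.append(("project", _project_step(spec["select"])))
    return steps


def run(spec: dict[str, Any], sources: dict[str, list[Row]]) -> list[Row]:
    rows = list(sources[spec["from"]])  # scan the named source
    for _name, step in plan(spec):
        rows = step(rows)
    return rows


sources = {"orders": [
    {"id": 1, "country": "FI", "amount": 10},
    {"id": 2, "country": "SE", "amount": 20},
    {"id": 3, "country": "FI", "amount": 5},
]}
spec = {"from": "orders", "where": {"country": "FI"}, "select": ["id", "amount"]}
print(run(spec, sources))  # [{'id': 1, 'amount': 10}, {'id': 3, 'amount': 5}]
```

The user only edits `spec`; the sequence of steps is regenerated from it on every run.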

However, with data virtualization, the connection to all necessary data sources must be operational as there is no local copy of the data, which is one of the main drawbacks of the approach. Connection problems occur more often in complex systems where one or more crucial sources will occasionally be unavailable. Smart data buffering, such as keeping the data from the most recent few requests in the virtualization system buffer can help to mitigate this issue.[4]
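A sketch of such buffering, assuming a hypothetical connector that raises `SourceUnavailable` when its source is down: the wrapper always tries the live source first, keeps the results of the most recent few queries in a least-recently-used buffer, and falls back to them only when the source cannot be reached. It returns a flag with each result because a buffered answer may be stale, which works against the goal of always using the newest data.

```python
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Any


class SourceUnavailable(Exception):
    """Raised by a (hypothetical) connector when its source cannot be reached."""


class BufferedSource:
    """Serve live results, keeping the last few in an LRU buffer as a fallback."""

    def __init__(self, fetch: Callable[[Hashable], Any], capacity: int = 3) -> None:
        self._fetch = fetch
        self._capacity = capacity
        self._buffer: OrderedDict = OrderedDict()

    def query(self, q: Hashable) -> tuple[Any, bool]:
        """Return (result, served_from_buffer)."""
        try:
            result = self._fetch(q)  # always prefer fresh data from the source
        except SourceUnavailable:
            if q in self._buffer:
                self._buffer.move_to_end(q)  # mark as recently used
                return self._buffer[q], True  # may be stale
            raise  # nothing buffered for this query
        self._buffer[q] = result
        self._buffer.move_to_end(q)
        if len(self._buffer) > self._capacity:
            self._buffer.popitem(last=False)  # evict the least recently used entry
        return result, False
```

For example, with `capacity=2`, after queries `"a"`, `"b"` and `"c"` only the last two are buffered, so during an outage `"c"` can still be answered but `"a"` cannot.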

Moreover, because data virtualization solutions may use large numbers of network connections to read the original data and to serve virtualized tables to other solutions over the network, system security requires more consideration than it does with traditional data lakes. In a conventional data lake system, data can be imported into the lake by following specific procedures in a single environment. When using a virtualization system, the environment must separately establish secure connections with each data source, which is typically located in a different environment from the virtualization system itself.[4]

Security of personal data and compliance with regulations can be a major issue when introducing new services or attempting to combine various data sources. When data is delivered for analysis, data virtualization can help to resolve privacy-related problems. Virtualization makes it possible to combine personal data from different sources without physically copying them to another location, while also restricting access to all other collected variables. However, virtualization does not eliminate the need to confirm the security and privacy of the analysis results before making them more widely available. Regardless of the chosen data integration method, all results based on personal-level data should be protected in line with the appropriate privacy requirements.[4]
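The following sketch, with invented sample records and a hypothetical `restricted_view` helper, shows the pattern described above: two personal-level sources are joined on a shared identifier at query time, and only an explicitly allowed set of columns is exposed. It does not anonymize anything; the results still need their own privacy review before wider release.

```python
from typing import Any

Row = dict[str, Any]

# Invented sample records; each list stands in for a separate source system.
health_records = [
    {"person_id": "p1", "name": "A. Example", "diagnosis": "J45", "visits": 3},
    {"person_id": "p2", "name": "B. Example", "diagnosis": "E11", "visits": 1},
]
benefit_records = [
    {"person_id": "p1", "address": "Example St 1", "benefit_eur": 420},
]


def restricted_view(left: list[Row], right: list[Row], key: str, allowed: set[str]) -> list[Row]:
    """Join two sources on `key` at query time, exposing only `allowed` columns."""
    right_by_key = {row[key]: row for row in right}
    result = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # inner join: skip persons missing from either source
        merged = {**row, **match}
        # Everything not explicitly allowed (names, addresses, the key itself) is dropped.
        result.append({c: merged[c] for c in sorted(allowed) if c in merged})
    return result


print(restricted_view(health_records, benefit_records, "person_id", {"visits", "benefit_eur"}))
# [{'benefit_eur': 420, 'visits': 3}]
```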

Data virtualization and data warehousing

Some enterprise landscapes contain disparate data sources, including multiple data warehouses, data marts, and/or data lakes, even though a data warehouse, if implemented correctly, should be unique and a single source of truth. Data virtualization can efficiently bridge data across data warehouses, data marts, and data lakes without having to create a whole new integrated physical data platform. Existing data infrastructure can continue performing its core functions while the data virtualization layer leverages the data from those sources. This makes data virtualization complementary to all existing data sources and increases the availability and usage of enterprise data.[citation needed]

Data virtualization may also be considered as an alternative to ETL and data warehousing, but for performance reasons it is generally not recommended for a very large data warehouse. Data virtualization is inherently aimed at producing quick and timely insights from multiple sources without having to embark on a major data project with extensive ETL and data storage. However, data virtualization may also be extended and adapted to serve data warehousing requirements. This requires an understanding of the data storage and history requirements, along with planning and design to incorporate the right type of data virtualization, integration, and storage strategies, and infrastructure and performance optimizations (e.g., streaming, in-memory, hybrid storage).[citation needed]

Examples

  • The Phone House—the trading name for the European operations of UK-based mobile phone retail chain Carphone Warehouse—implemented Denodo’s data virtualization technology between its Spanish subsidiary’s transactional systems and the Web-based systems of mobile operators.[3]
  • Novartis implemented TIBCO's data virtualization tool to enable its researchers to quickly combine data from both internal and external sources into a searchable virtual data store.[3]
  • The storage-agnostic Primary Data (defunct, reincarnated as Hammerspace) was a data virtualization platform that enabled applications, servers, and clients to transparently access data while it was migrated between direct-attached, network-attached, private and public cloud storage.[6]
  • Linked Data can use a single hyperlink-based Data Source Name (DSN) to provide a connection to a virtual database layer that is internally connected to a variety of back-end data sources using ODBC, JDBC, OLE DB, ADO.NET, SOA-style services, and/or REST patterns.[citation needed]
  • Database virtualization may use a single ODBC-based DSN to provide a connection to a similar virtual database layer.[clarification needed]
  • Alluxio, an open-source virtual distributed file system (VDFS), started at the University of California, Berkeley's AMPLab. The system abstracts data from various file systems and object stores.[citation needed]

Functionality

Data virtualization software provides some or all of the following capabilities:[7]

  • Abstraction – Abstract the technical aspects of stored data, such as location, storage structure, API, access language, and storage technology.
  • Virtualized Data Access – Connect to different data sources and make them accessible from a common logical data access point.
  • Data transformation – Transform, improve the quality of, reformat, aggregate, etc. source data for consumer use.
  • Data federation – Combine result sets from across multiple source systems.
  • Data delivery – Publish result sets as views and/or data services executed by client applications or users when requested.
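Several of these capabilities can be seen in one small Python example using the standard-library `sqlite3` module. Two attached in-memory SQLite databases stand in for separate source systems (an assumption made to keep the example self-contained; a real platform would connect to different engines over the network). A view federates them with a JOIN, transforms cents into currency units, and delivers the result to any client that queries it, without copying rows.

```python
import sqlite3

# One connection plays the virtualization layer; two attached in-memory
# databases stand in for separate source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount_cents INTEGER)")
conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Ada"), (2, "Alan")])
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)", [(1, 1250), (1, 750), (2, 300)]
)

# Federation + transformation + delivery: a view joining both sources and
# converting cents to units. It is TEMP because SQLite only lets TEMP views
# reference tables in other attached databases.
conn.execute(
    """
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(i.amount_cents) / 100.0 AS total
    FROM crm.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.id, c.name
    """
)

rows = conn.execute("SELECT name, total FROM customer_totals ORDER BY name").fetchall()
print(rows)  # [('Ada', 20.0), ('Alan', 3.0)]
```

Because the view is evaluated on each query, inserting a new invoice into `billing.invoices` changes the next result. A TEMP view disappears when the connection closes, so a persistent deployment would recreate it on connect.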

Data virtualization software may include functions for development, operation, and/or management.[citation needed]

A metadata engine collects, stores and analyzes information about data and metadata (data about data) in use within a domain.[8][clarification needed]

Benefits include:

  • Reduce risk of data errors[dubious – discuss]
  • Reduce system workload by not moving data around[dubious – discuss]
  • Increase speed of access to data on a real-time basis
  • Allow query processing to be pushed down to the data source instead of being performed in the middle tier
  • Most systems enable self-service creation of virtual databases by end users with access to source systems
  • Increase governance and reduce risk through the use of policies[9]
  • Reduce data storage required[10]
  • Accelerate processes up to five times through the declarative approach[5]
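Query pushdown, listed above, can be illustrated with a stand-in source that counts how many rows it ships to the middle tier. The `CountingSource` class and both query functions are hypothetical; in a real system, pushdown usually means translating the filter into the source's own query language, such as a SQL `WHERE` clause, rather than passing a Python callable.

```python
from typing import Any, Callable, Optional

Row = dict[str, Any]
Predicate = Callable[[Row], bool]


class CountingSource:
    """Stand-in data source that counts rows shipped to the middle tier."""

    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows
        self.rows_shipped = 0

    def scan(self, predicate: Optional[Predicate] = None) -> list[Row]:
        # When a predicate is pushed down, the source evaluates it itself,
        # so only matching rows are shipped.
        result = [r for r in self._rows if predicate is None or predicate(r)]
        self.rows_shipped += len(result)
        return result


def query_in_middle_tier(source: CountingSource, predicate: Predicate) -> list[Row]:
    return [r for r in source.scan() if predicate(r)]  # fetch everything, filter here


def query_with_pushdown(source: CountingSource, predicate: Predicate) -> list[Row]:
    return source.scan(predicate)  # let the source do the filtering


def is_eu(row: Row) -> bool:
    return row["region"] == "EU"


rows = [{"id": i, "region": "EU" if i % 4 == 0 else "US"} for i in range(100)]
a, b = CountingSource(rows), CountingSource(rows)
same = query_in_middle_tier(a, is_eu) == query_with_pushdown(b, is_eu)
print(same, a.rows_shipped, b.rows_shipped)  # True 100 25
```

With 100 rows of which 25 match, filtering in the middle tier ships all 100 rows, while pushing the predicate down ships only the 25 matching ones; the query result is the same.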

Drawbacks include:

  • May impact operational systems' response time, particularly if under-scaled to cope with unanticipated user queries or not tuned early on.[11]
  • Does not impose a homogeneous data model, meaning the user has to interpret the data, unless combined with data federation and business understanding of the data[12]
  • Requires a defined governance approach to avoid budgeting issues with the shared services
  • Not suitable for recording historical snapshots of data; a data warehouse is better for this[12]
  • Change management "is a huge overhead, as any changes need to be accepted by all applications and users sharing the same virtualization kit"[12]
  • Designers should always keep performance considerations in mind

Avoid using data virtualization:

  • For accessing operational data systems (performance and operational integrity issues)
  • For federating or centralizing all data of the organization (security and hacking issues)
  • For building a very large virtual data warehouse (performance issues)
  • As an ETL process (governance and performance issues)
  • If you have only one or two data sources to virtualize

History

Enterprise information integration (EII), a term first coined by MetaMatrix (whose product is now known as Red Hat JBoss Data Virtualization), and federated database systems are terms used by some vendors to describe a core element of data virtualization: the capability to create relational JOINs in a federated VIEW.[citation needed][clarification needed]

Technology

Some data virtualization solutions and vendors:

  • IBM Data Virtualization[13]
  • Actifio Copy Data Virtualization[14]
  • Capsenta Ultrawrap,[15] acquired by data.world in 2019
  • Data Virtuality[16]
  • DataWerks[17]
  • Delphix Data Virtualization Platform[18]
  • Denodo Data Virtualization and Data Fabric Platform[19]
  • Microsoft Gluent Data Platform[20]
  • Querona[21]
  • Red Hat JBoss Enterprise Application Platform Data Virtualization[22] (discontinued)
  • Teiid, part of JBoss Developer Studio[23]
  • Stone Bond Technologies Enterprise Enabler Data Virtualization Platform[24]
  • TIBCO Data Virtualization
  • Veritas Provisioning File System[25] / Data Virtualization Veritas Technologies
  • XAware[26]

A more up-to-date list, with user rankings, is compiled by Gartner.[27]

See also

  • Data integration – Combining data from different sources and providing a unified view
  • Enterprise information integration – Support a unified view of data and information for an entire organization (EII)
  • Master data management – Practice for controlling corporate data
  • Federated database system – Type of meta-database management system which transparently maps multiple autonomous database systems into a single federated database
  • Disparate system – Data processing system without interaction with other computer data processing systems

References

  1. ^ "What is Data Virtualization?", Margaret Rouse, TechTarget.com, retrieved 19 August 2013
  2. ^ Streamlining Customer Data
  3. ^ a b c "Data virtualisation on rise as ETL alternative for data integration" Gareth Morgan, Computer Weekly, retrieved 19 August 2013
  4. ^ a b c d Paiho, Satu; Tuominen, Pekka; Rökman, Jyri; Ylikerälä, Markus; Pajula, Juha; Siikavirta, Hanne (2022). "Opportunities of collected city data for smart cities". IET Smart Cities. 4 (4): 275–291. doi:10.1049/smc2.12044. S2CID 253467923.
  5. ^ a b "The True Value of Data Virtualization: Beyond Marketing Buzzwords", Nick Golovin, medium.com, retrieved 14 November 2023
  6. ^ "Hammerspace - A True Global File System". Hammerspace. Retrieved 2021-10-31.
  7. ^ Summan, Jesse; Handmaker, Leslie (2022-12-20). "Data Federation vs. Data Virtualization". StreamSets. Retrieved 2024-02-08.
  8. ^ Kendall, Aaron. "Metadata-Driven Design: Designing a Flexible Engine for API Data Retrieval". InfoQ. Retrieved 25 April 2017.
  9. ^ "Rapid Access to Disparate Data Across Projects Without Rework" Informatica, retrieved 19 August 2013
  10. ^ Data virtualization: 6 best practices to help the business 'get it' Joe McKendrick, ZDNet, 27 October 2011
  11. ^ "IT pros reveal benefits, drawbacks of data virtualization software" Mark Brunelli, SearchDataManagement, 11 October 2012
  12. ^ a b c "The Pros and Cons of Data Virtualization" Archived 2014-08-05 at the Wayback Machine Loraine Lawson, BusinessEdge, 7 October 2011
  13. ^ https://www.ibm.com/products/watson-query
  14. ^ https://www.actifio.com/company/blog/post/enterprise-data-service-new-copy-data-virtualization/
  15. ^ https://www.w3.org/2001/sw/wiki/Ultrawrap
  16. ^ https://datavirtuality.com/en/
  17. ^ https://datawerks.com/
  18. ^ https://www.delphix.com/
  19. ^ https://www.denodo.com/
  20. ^ https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RWJFdq
  21. ^ https://www.querona.io/
  22. ^ https://access.redhat.com/documentation/en-us/red_hat_jboss_data_virtualization/6.4/html-single/getting_started_guide/index
  23. ^ https://teiid.io/
  24. ^ https://stonebond.com/
  25. ^ https://www.veritas.com/support/en_US/doc/141196447-161587232-0/v160534095-161587232
  26. ^ https://sourceforge.net/projects/xaware/
  27. ^ "Best Data Virtualization Reviews". Gartner. 2024. Retrieved 2024-02-07.

Further reading

  • Judith R. Davis; Robert Eve (2011). Data Virtualization: Going Beyond Traditional Data Integration to Achieve Business Agility. ISBN 978-0979930416.
  • Rick van der Lans (2012). Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses. ISBN 9780123944252.
  • Anthony Giordano (2010). Data Integration Blueprint and Modeling: Techniques for a Scalable and Sustainable Architecture. IBM Press. ISBN 9780137085309.

Retrieved from https://en.wikipedia.org/w/index.php?title=Data_virtualization&oldid=1208850037