Wikipedia

Data classification (data management)

In the field of data management, data classification, as part of the Information Lifecycle Management (ILM) process, can be defined as a tool for categorizing data that helps organizations effectively answer the following questions:

  • What data types are available?
  • Where are certain data located?
  • What access levels are implemented?
  • What protection level is implemented, and does it adhere to compliance regulations?[1]

When implemented, it provides a bridge between IT professionals and process or application owners. IT staff are informed about the value of the data, and management (usually the application owners) better understands which parts of the data centre need investment to keep operations running effectively. This can be of particular importance in risk management, legal discovery, and compliance with government regulations. Data classification is typically a manual process; however, many tools from different vendors can help gather information about the data.

Data classification needs to take into account the following:

  • Regulatory requirements
  • Strategic or proprietary worth
  • Organization-specific policies
  • Ethical and privacy considerations
  • Contractual agreements[2]

How to start the process of data classification

Note that this classification structure is written from a data management perspective and therefore focuses on text and text-convertible binary data sources. Images, videos, and audio files are highly structured formats built for industry-standard APIs and do not readily fit within the classification scheme outlined below.

To start the data classification process, the various data applications and data must be evaluated and divided into their respective categories. For example, the process may look like this:

  • Relational or Tabular data (around 15% of non-audio/video data)
    • Generally describes proprietary data that is accessible only through an application or its application programming interfaces (APIs).
    • Applications that produce structured data are usually database applications.
    • This type of data usually requires complex procedures for data evaluation and migration between storage tiers.
    • To ensure adequate quality standards, the classification process has to be monitored by subject matter experts.
  • Semi-structured or Poly-structured data (all other non-audio/video data that does not conform to a system- or platform-defined Relational or Tabular form)
    • Generally describes data files that have a dynamic or non-relational semantic structure (e.g. documents, XML, JSON, device or system log output, sensor output).
    • Classification is a relatively simple process of criteria assignment.
    • Data migration between assigned segments of predefined storage tiers is also simple.
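As a minimal sketch of this first division step, the Python snippet below sorts data-source descriptors into the two categories. The `DataSource` fields, the `MEDIA_FORMATS` set, and the rule that a source counts as relational only when a system or platform defines its tabular form are illustrative assumptions, not part of any standard scheme.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    RELATIONAL = "relational/tabular"
    SEMI_STRUCTURED = "semi-structured/poly-structured"
    OUT_OF_SCOPE = "out of scope (image/audio/video)"


# Hypothetical set of media formats that the scheme above does not cover.
MEDIA_FORMATS = {"jpg", "png", "gif", "mp3", "wav", "mp4", "mov"}


@dataclass
class DataSource:
    name: str
    fmt: str                 # hypothetical format tag, e.g. "table", "json", "log"
    relational_schema: bool  # True if a system or platform defines a relational/tabular form


def categorize(source: DataSource) -> Category:
    """Assign a data source to one of the categories described above."""
    if source.fmt.lower() in MEDIA_FORMATS:
        return Category.OUT_OF_SCOPE
    if source.relational_schema:
        return Category.RELATIONAL
    # Everything else that does not conform to a defined relational or tabular form.
    return Category.SEMI_STRUCTURED
```

For example, a database table would be categorized as `Category.RELATIONAL`, while an application log file would fall into `Category.SEMI_STRUCTURED`.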

Several types of data classification are in use. This designation is entirely orthogonal to the application-centric designation outlined above: regardless of the structure inherited from an application, data may be of a certain type, such as:

  1. Geographical
  2. Chronological
  3. Qualitative
  4. Quantitative

Data should also be evaluated across three dimensions:

  1. Identifiability: how easily can this data be used to identify an individual?
  2. Sensitivity: how much damage could be done if this data reached the wrong hands?
  3. Scarcity: how readily available is this data?[3]
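One way to make the three dimensions operational is to score each one and map the combined score to a label. The sketch below assumes a hypothetical 1 to 3 scale per dimension, a simple sum, and hypothetical thresholds and label names; a real scheme would weigh the dimensions according to its own policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    identifiability: int  # 1 = hard to link to a person, 3 = directly identifies one
    sensitivity: int      # 1 = little harm if leaked, 3 = severe harm
    scarcity: int         # 1 = widely available elsewhere, 3 = hard to obtain

    def __post_init__(self) -> None:
        # Reject scores outside the hypothetical 1-3 scale.
        for name in ("identifiability", "sensitivity", "scarcity"):
            value = getattr(self, name)
            if value not in (1, 2, 3):
                raise ValueError(f"{name} must be 1, 2 or 3, got {value}")


def label(a: Assessment) -> str:
    """Map the three scores to a label using hypothetical thresholds."""
    total = a.identifiability + a.sensitivity + a.scarcity  # ranges from 3 to 9
    if total >= 8:
        return "restricted"
    if total >= 6:
        return "confidential"
    if total >= 4:
        return "internal"
    return "public"
```

A sum treats the three dimensions as equally important; an organization that considers sensitivity decisive might instead take the maximum score, or enforce a minimum label whenever sensitivity is high.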

Basic criteria for semi-structured or poly-structured data classification

  • Time criteria are the simplest and most commonly used: data is evaluated by time of creation, time of access, time of update, and so on.
  • Metadata criteria such as type, name, owner, and location can be used to create a more advanced classification policy.
  • Content criteria, which involve advanced content classification algorithms, are the most advanced form of unstructured data classification.
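The time and metadata criteria can be combined in a few lines. The sketch below reads file timestamps with `pathlib.Path.stat()` and assigns a storage tier and a file category; the tier names, the 30-day and 365-day boundaries, and the extension-to-category mapping are hypothetical choices for illustration.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple

DAY = 86_400  # seconds per day

# Hypothetical metadata criterion: file category by extension.
CATEGORY_BY_SUFFIX = {
    ".log": "log", ".jsonl": "log",
    ".doc": "document", ".docx": "document", ".pdf": "document", ".txt": "document",
    ".xml": "structured-text", ".json": "structured-text",
}


@dataclass
class FileInfo:
    path: Path
    modified: float  # POSIX timestamp of last update
    accessed: float  # POSIX timestamp of last access


def file_info(path: Path) -> FileInfo:
    """Collect the timestamps used by the time criterion."""
    st = path.stat()
    return FileInfo(path, st.st_mtime, st.st_atime)


def classify(info: FileInfo, now: Optional[float] = None) -> Tuple[str, str]:
    """Return (storage tier, file category) for one file."""
    now = time.time() if now is None else now
    # Time criterion: days since the file was last updated or accessed.
    idle_days = (now - max(info.modified, info.accessed)) / DAY
    if idle_days <= 30:
        tier = "tier1"
    elif idle_days <= 365:
        tier = "tier2"
    else:
        tier = "archive"
    # Metadata criterion: category from the case-insensitive file extension.
    category = CATEGORY_BY_SUFFIX.get(info.path.suffix.lower(), "other")
    return tier, category
```

Access times are not always reliable (some file systems are mounted without access-time updates), so the sketch uses the later of the access and modification times rather than the access time alone.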

Note that any of these criteria may also apply to Tabular or Relational data as "Basic Criteria." These criteria are application-specific rather than inherent aspects of the form in which the data is presented.

Basic criteria for relational or Tabular data classification

These criteria are usually driven by application requirements such as:

  • Disaster recovery and Business Continuity rules
  • Data centre resource optimization and consolidation
  • Hardware performance limitations and possible improvements by reorganization

Note that any of these criteria may also apply to semi-structured or poly-structured data as "Basic Criteria." These criteria are application-specific rather than inherent aspects of the form in which the data is presented.

Benefits of data classification

Effective implementation of appropriate data classification can significantly improve the ILM process and save data centre storage resources. If implemented systematically, it can improve data centre performance and utilization. Data classification can also reduce costs and administrative overhead. "Good enough" data classification can produce these results:

  • Data compliance and easier risk management: data is located where expected, on a predefined storage tier and at a defined "point in time".
  • Simpler data encryption, because not all data needs to be encrypted. This saves valuable processor cycles and the associated overhead.
  • Data indexing to improve user access times.
  • Redefined data protection with an improved RTO (Recovery Time Objective).

Business data classification approaches

There are three approaches to data classification within a business environment: paper-based classification, automated classification, and user-driven (or user-applied) classification.[4] Each has its own benefits and pitfalls.

Paper-Based Classification Policy

A corporate data classification policy sets out how employees are required to treat the different types of data they handle, in line with the organisation's overall data security policy and strategy. A well-written policy enables users to make fast, intuitive decisions about the value of a piece of information and the appropriate handling rules, for example who can access the data and whether a rights management template should be invoked. The challenge, without any supporting technology, is ensuring that everyone is aware of the policy and implements it correctly.

Automated Classification Policy

This technique bypasses user involvement and applies a classification policy consistently across all touchpoints, without the need for major communication and education programmes.

Classifications are applied by software that analyses the content using algorithms based on keywords or phrases. This approach comes into its own where data is created with no user involvement, for example reports generated by ERP systems, or where the data includes easily identified personal information such as credit card details.
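A minimal sketch of keyword- and pattern-based automated classification, in the spirit of the credit card example above. The keyword list and label names are hypothetical. Candidate card numbers are found with a regular expression and then filtered with the Luhn checksum, which payment card numbers satisfy, to reduce false positives.

```python
import re

# Hypothetical keyword list; a real policy would define its own.
KEYWORDS = {"confidential", "salary", "merger"}

# 13 to 19 digits, each optionally followed by a single space or hyphen.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """True if the text holds a digit run that passes the Luhn check."""
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            return True
    return False


def auto_classify(text: str) -> str:
    """Apply a label from content alone, with no user involvement."""
    if contains_card_number(text):
        return "restricted"
    words = set(re.findall(r"[a-z]+", text.lower()))
    if words & KEYWORDS:
        return "confidential"
    return "unclassified"
```

The keyword rule also shows the limitation described below: a text such as "no merger is planned" is still labelled `confidential`, because the match ignores context.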

However, automated solutions do not understand context and are therefore susceptible to inaccuracies, giving false positive results that can frustrate users and impede business processes, as well as false negative errors that expose organisations to sensitive data loss.

User-Driven Classification Policy

The data classification process can be completely automated, but it is most effective when the user is placed in the driving seat.

The user-driven classification technique makes employees themselves responsible for deciding which label is appropriate, and attaching it using a software tool at the point of creating, editing, sending or saving. The advantage of involving the user in the process is that their insight into the context, business value and sensitivity of a piece of data enables them to make informed and accurate decisions about which label to apply. User-driven classification is an additional security layer often used to complement automated classification.

Involving users in classification also leads to other organisational benefits including increased security awareness, an improved culture and the ability to monitor user behaviour which aids reporting and provides the ability to demonstrate compliance. Furthermore, managers can use this behavioural data to identify a possible insider threat, and address any concerns by providing additional guidance to users as appropriate, for example through additional training or by tightening up policy.
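The sketch below illustrates how user-applied labels can complement an automated suggestion while also producing the behavioural data mentioned above. The label names and their ordering, and the choice to record (rather than block) a user's downgrade for later review, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical label scheme, ordered from least to most sensitive.
LABELS = ["public", "internal", "confidential", "restricted"]


@dataclass
class LabelDecision:
    document: str
    user: str
    suggested: str  # label proposed by automated classification
    chosen: str     # label the user attached when saving or sending

    @property
    def downgraded(self) -> bool:
        """True when the user picked a less sensitive label than suggested."""
        return LABELS.index(self.chosen) < LABELS.index(self.suggested)


@dataclass
class ClassificationLog:
    decisions: List[LabelDecision] = field(default_factory=list)

    def record(self, document: str, user: str, suggested: str, chosen: str) -> LabelDecision:
        """Store the user's decision; the user's choice is kept as the label."""
        for name in (suggested, chosen):
            if name not in LABELS:
                raise ValueError(f"unknown label: {name}")
        decision = LabelDecision(document, user, suggested, chosen)
        self.decisions.append(decision)
        return decision

    def downgrades_by_user(self) -> Dict[str, int]:
        """Count downgrades per user, as input for reporting or training."""
        counts: Dict[str, int] = {}
        for d in self.decisions:
            if d.downgraded:
                counts[d.user] = counts.get(d.user, 0) + 1
        return counts
```

Leaving the final decision with the user keeps the benefit of their context, while the log gives managers the reporting data to spot patterns that may call for additional guidance.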

See also

  • Data classification (business intelligence)

References

  1. Knight, Michelle (2021-08-26). "What Are Data Regulations?". DATAVERSITY. Retrieved 2022-10-26.
  2. "Get the scoop on data classification and GDPR before you're too late - LightsOnData". LightsOnData. 2018-05-23. Retrieved 2018-05-23.
  3. Khatibloo, Fatemeh (May 2017). "How Dirty Is Your Data? Strategic Plan: The Customer Trust And Privacy Playbook". The Customer Trust and Privacy Playbook for 2018.
  4. "What Is Data Classification And What Can It Do For My Business? | Boldon James". www.boldonjames.com. Retrieved 2019-03-05.
