Wikipedia

Data extraction

Data extraction is the act or process of retrieving data out of (usually unstructured or poorly structured) data sources for further data processing or data storage (data migration). The import into the intermediate extracting system is thus usually followed by data transformation and possibly the addition of metadata prior to export to another stage in the data workflow.
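
As a rough illustration of the workflow described above (extract, then transform, then attach metadata before export), the following Python sketch may help. The raw input format, the field names, and the function names (extract, transform, add_metadata, run_pipeline) are assumptions invented for this example and are not part of any standard.

```python
import json
from datetime import datetime, timezone

# Hypothetical raw input: loosely structured "key: value" lines from some source.
RAW = """name: Sensor A
reading: 21.5 C
name: Sensor B
reading: 19.0 C
"""

def extract(raw_text):
    """Pull key/value pairs out of poorly structured text, grouped into records."""
    records, current = [], {}
    for line in raw_text.splitlines():
        if ":" not in line:
            continue  # skip lines that carry no recognizable field
        key, value = (part.strip() for part in line.split(":", 1))
        if key in current:  # a repeated key starts a new record
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records

def transform(record):
    """Convert the free-text reading into a number plus an explicit unit."""
    value, unit = record["reading"].split()
    return {"name": record["name"], "reading": float(value), "unit": unit}

def add_metadata(record, source):
    """Attach provenance metadata before the record is exported."""
    return {**record, "source": source,
            "extracted_at": datetime.now(timezone.utc).isoformat()}

def run_pipeline(raw_text, source):
    """Extract, transform, and annotate every record found in raw_text."""
    return [add_metadata(transform(r), source) for r in extract(raw_text)]

if __name__ == "__main__":
    # Export step: here simply serialized as JSON for the next stage.
    print(json.dumps(run_pipeline(RAW, "example-device"), indent=2))
```

Keeping the three stages as separate functions means each can be replaced independently, for instance when the source format changes but the export format does not.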

Usually, the term data extraction is applied when (experimental) data is first imported into a computer from primary sources, like measuring or recording devices. Today's electronic devices will usually present an electrical connector (e.g. USB) through which 'raw data' can be streamed into a personal computer.

Data sources

Typical unstructured data sources include web pages, emails, documents, PDFs, scanned text, mainframe reports, spool files, and classifieds; the extracted data is often used further, for example for sales or marketing leads. Extracting data from these unstructured sources has grown into a considerable technical challenge. Whereas historically data extraction has had to deal with changes in physical hardware formats, the majority of current data extraction deals with extracting data from these unstructured data sources and from different software formats. The growing practice of extracting data[1] from the web is referred to as "Web data extraction" or "Web scraping".
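
To make "Web data extraction" a little more concrete, here is a minimal sketch that uses only Python's standard html.parser module. The page content, the class names ("ad", "price"), and the AdParser and scrape names are hypothetical; a real scraper would first download the page and would have to cope with far messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a site.
PAGE = """
<html><body>
  <div class="ad"><h2>Bicycle for sale</h2><span class="price">120 EUR</span></div>
  <div class="ad"><h2>Used laptop</h2><span class="price">300 EUR</span></div>
</body></html>
"""

class AdParser(HTMLParser):
    """Collect the title (<h2>) and price (<span class="price">) of each ad."""

    def __init__(self):
        super().__init__()
        self.ads = []        # extracted records, one dict per ad
        self._field = None   # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "ad":
            self.ads.append({})  # a new ad block starts a new record
        elif tag == "h2":
            self._field = "title"
        elif tag == "span" and attrs.get("class") == "price":
            self._field = "price"

    def handle_data(self, data):
        # Only keep text that appears inside a field we are tracking.
        if self._field and self.ads:
            self.ads[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag in ("h2", "span"):
            self._field = None

def scrape(html):
    """Return a list of {'title': ..., 'price': ...} dicts found in html."""
    parser = AdParser()
    parser.feed(html)
    parser.close()
    return parser.ads

if __name__ == "__main__":
    for ad in scrape(PAGE):
        print(ad)
```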

Imposing structure

The act of adding structure to unstructured data takes a number of forms:

  • Using text pattern matching, such as regular expressions, to identify small- or large-scale structure, e.g. records in a report and their associated data from headers and footers;
  • Using a table-based approach to identify common sections within a limited domain, e.g. identifying skills, previous work experience, and qualifications in emailed resumes by means of a standard set of commonly used headings (these would differ from language to language); for instance, education might be found under Education/Qualification/Courses;
  • Using text analytics to attempt to understand the text and link it to other information.
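
The first two approaches in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report layout, the regular expressions, the heading table, and the function names (parse_report, split_sections) are all made up for the example. Text analytics is not covered here.

```python
import re

# Hypothetical mainframe-style report: a header line, detail lines, a footer line.
REPORT = """\
REPORT: SALES   DATE: 2020-08-01
1001  Widget        12   4.50
1002  Gadget         3  19.99
END OF REPORT   RECORDS: 2
"""

# Pattern for the header line and for one detail record
# (id, item name, quantity, unit price).
HEADER = re.compile(r"^REPORT:\s+(?P<report>\S+)\s+DATE:\s+(?P<date>\d{4}-\d{2}-\d{2})$")
RECORD = re.compile(r"^(?P<id>\d+)\s+(?P<item>\S+)\s+(?P<qty>\d+)\s+(?P<price>\d+\.\d{2})$")

def parse_report(text):
    """Return detail records, each tagged with data taken from the header."""
    header, rows = {}, []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            header = m.groupdict()
            continue
        m = RECORD.match(line.strip())
        if m:
            rows.append({**header, "id": int(m["id"]), "item": m["item"],
                         "qty": int(m["qty"]), "price": float(m["price"])})
    return rows  # footer and other unmatched lines are ignored

# Table of known headings (assumed, English only) mapped to canonical sections.
HEADINGS = {
    "education": "education", "qualification": "education", "courses": "education",
    "skills": "skills", "work experience": "experience", "experience": "experience",
}

def split_sections(text):
    """Group lines of a resume-like text under the canonical section they follow."""
    sections, current = {}, None
    for line in text.splitlines():
        key = line.strip().rstrip(":").lower()
        if key in HEADINGS:
            current = HEADINGS[key]
            sections.setdefault(current, [])
        elif current and line.strip():
            sections[current].append(line.strip())
    return sections

if __name__ == "__main__":
    print(parse_report(REPORT))
    print(split_sections("Skills:\nPython\nSQL\n\nQualification\nBSc Physics\n"))
```

A heading table like this would need one set of entries per language, which matches the caveat in the list that headings differ from language to language.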

See also

  • Information extraction
  • Data retrieval
  • Extract, transform, load (ETL)
  • Data mining

References

  1. data extraction.
