Tab-separated values
Tab-separated values (TSV) is a simple, text-based file format for storing tabular data.[3] Records are separated by newlines, and values within a record are separated by tab characters. The TSV format is thus a delimiter-separated values format, similar to comma-separated values.
Filename extension | .tsv, .tab[1] |
---|---|
Internet media type | text/tab-separated-values |
Uniform Type Identifier (UTI) | public.tab-separated-values-text[2] |
UTI conformation | public.delimited-values-text[2] |
Developed by | University of Minnesota Internet Gopher Team; Internet Assigned Numbers Authority |
Initial release | c. June 1993 |
Type of format | Delimiter-separated values format |
Container for | database information organized as field separated lists |
Standard | IANA MIME type |
TSV is simple and widely supported, so it is often used to exchange tabular data between different computer programs. For example, a TSV file might be used to transfer information from a database to a spreadsheet.
Example
The head of the Iris flower data set can be stored as a TSV using the following plain text (note that the HTML rendering may convert tabs to spaces):
Sepal length	Sepal width	Petal length	Petal width	Species
5.1	3.5	1.4	0.2	I. setosa
4.9	3.0	1.4	0.2	I. setosa
4.7	3.2	1.3	0.2	I. setosa
4.6	3.1	1.5	0.2	I. setosa
5.0	3.6	1.4	0.2	I. setosa
The TSV plain text above corresponds to the following tabular data:
Sepal length | Sepal width | Petal length | Petal width | Species |
---|---|---|---|---|
5.1 | 3.5 | 1.4 | 0.2 | I. setosa |
4.9 | 3.0 | 1.4 | 0.2 | I. setosa |
4.7 | 3.2 | 1.3 | 0.2 | I. setosa |
4.6 | 3.1 | 1.5 | 0.2 | I. setosa |
5.0 | 3.6 | 1.4 | 0.2 | I. setosa |
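As an illustration of this correspondence, the following minimal Python sketch splits TSV text into a header and records. The function name `parse_tsv` and the sample string are hypothetical, and the sketch assumes line feed record separators and fields without escapes or quoting.

```python
# Sample data: the header and first two records of the Iris example,
# written with explicit tab (\t) and line feed (\n) characters.
tsv_text = (
    "Sepal length\tSepal width\tPetal length\tPetal width\tSpecies\n"
    "5.1\t3.5\t1.4\t0.2\tI. setosa\n"
    "4.9\t3.0\t1.4\t0.2\tI. setosa\n"
)


def parse_tsv(text):
    """Return (header, records) for TSV text that uses LF line endings.

    Hypothetical helper: values are kept as strings, and no escape or
    quoting convention is applied.
    """
    lines = [line for line in text.split("\n") if line]  # skip empty lines
    header = lines[0].split("\t")
    records = [dict(zip(header, line.split("\t"))) for line in lines[1:]]
    return header, records


header, records = parse_tsv(tsv_text)
print(header)
print(records[0]["Species"])
```

Each record becomes a dictionary keyed by column name. A program that needs numbers would still have to convert values such as "5.1" itself, since TSV stores every value as text.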
Character escaping
The IANA media type standard for TSV keeps the format simple by disallowing tabs within fields.[4]
Since the values in the TSV format cannot contain literal tabs or newline characters, a convention is necessary for lossless conversion of text values with these characters. A common convention is to perform the following escapes:[5][6]
escape sequence | meaning |
---|---|
\n | line feed |
\t | tab |
\r | carriage return |
\\ | backslash |
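The escape table above can be sketched in Python as a pair of functions. The names `escape_field` and `unescape_field` are hypothetical, and leaving unrecognized backslash sequences unchanged is an assumption of this sketch rather than part of any cited convention.

```python
import re

# Mapping from raw characters to their escape sequences, per the table above.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
# Inverse mapping from escape sequences back to raw characters.
_UNESCAPES = {seq: ch for ch, seq in _ESCAPES.items()}


def escape_field(value):
    """Replace backslash, tab, LF and CR with their escape sequences."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(field):
    """Reverse escape_field in a single left-to-right pass.

    A single pass matters: an escaped backslash followed by the letter n
    must decode to a backslash and an "n", not to a line feed.
    Unrecognized sequences are left unchanged (an assumption of this sketch).
    """
    return re.sub(r"\\[\\tnr]", lambda m: _UNESCAPES[m.group(0)], field)


original = "a\tb\\n\nc"  # contains a tab, a literal backslash + n, and a line feed
encoded = escape_field(original)
print(encoded)
print(unescape_field(encoded) == original)
```

After escaping, a value contains no literal tab or newline characters, so it can be written as a single TSV field and recovered exactly on reading.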
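Another common convention is to follow the CSV rules of RFC 4180 and enclose values that contain tabs or newlines in double quotes. This can lead to ambiguities,[7][8] for example when a reader that does not apply the quoting rules splits a quoted value at its embedded tab.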
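One way to see how quoting can be ambiguous is to read the same line with two readers. The sketch below uses Python's standard `csv` module, which applies CSV-style quoting by default, alongside a plain split on tabs. The sample line is hypothetical.

```python
import csv

# One line of text: a double-quoted value containing a tab, then a second value.
line = '"a\tb"\tc'

# Reader 1: plain split on tabs, treating quotes as ordinary characters.
plain_fields = line.split("\t")

# Reader 2: csv.reader with a tab delimiter, which honors double quotes.
quoted_fields = next(csv.reader([line], delimiter="\t"))

print(plain_fields)   # ['"a', 'b"', 'c']  (three fields, quotes kept)
print(quoted_fields)  # ['a\tb', 'c']      (two fields, tab kept inside the first)
```

The two readers disagree on both the number of fields and their contents, so the writer and the reader of a file need to agree on whether quoting is in use.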
Line endings
Records are usually separated by a line feed (LF), the convention on Unix platforms, or by a carriage return followed by a line feed (CRLF), the convention on Microsoft platforms. Some programs may expect the latter. The de facto specification[9] states that records are separated by an EOL but does not say which newline sequence that is.
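Because the newline sequence is not fixed, a reader may want to accept both forms. The following Python sketch shows one possible approach. The function name `split_records` is hypothetical, and treating a single trailing carriage return as part of the line ending is an assumption of this sketch.

```python
def split_records(text):
    """Split TSV text into record strings, accepting LF or CRLF endings.

    Hypothetical helper. str.splitlines() is avoided because it also
    splits on other characters, such as form feed.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after a final newline
    # Remove one trailing carriage return left over from a CRLF ending.
    return [line[:-1] if line.endswith("\r") else line for line in lines]


mixed = "a\tb\r\nc\td\n"  # first record ends in CRLF, second in LF
print(split_records(mixed))
```

When writing a file, the choice depends on the consumer: a program that expects CRLF may not read LF-only files correctly.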
References
- ^ U of Edin. Research Data Support Team. "Choose the best file formats". University of Edinburgh. § Formats we recommend. Retrieved 23 May 2023.
- ^ a b "tabSeparatedText". Apple Developer Documentation: Uniform Type Identifiers. Apple Inc. Retrieved 23 May 2023.
- ^ "How To Use Tab Separated Value (TSV) files". International Monetary Fund. Retrieved 1 February 2023.
- ^ Lindner 1993.
- ^ Dusek, Jason (6 May 2014). "Linear TSV: simple, line-oriented, tabular data". Data Protocols - Open Knowledge Foundation (v1.0β ed.).
- ^ Dolan, Stephen (1 November 2018). "jq Manual". jq. Retrieved 23 May 2023.
- ^ Miller, Rob (22 September 2015). Text Processing with Ruby: Extract Value from the Data That Surrounds You. Pragmatic Bookshelf. p. 94. ISBN 978-1-68050-492-7.
- ^ Giuseppini, Gabriele; Burnett, Mark (10 February 2005). Microsoft Log Parser Toolkit: A Complete Toolkit for Microsoft's Undocumented Log Analysis Tool. Elsevier. p. 311. ISBN 978-0-08-048939-1.
- ^ "IANA: text/tab-separated-values".
Sources
- "TSV — Tab-Separated Values" (11 February 2021 ed.). Library of Congress. fdd000533. Retrieved 23 May 2023.
- Lindner, Paul (June 1993). "Definition of tab-separated-values (tsv)". text/tab-separated-values. Internet Assigned Numbers Authority. Minnesota: University of Minnesota Internet Gopher Team. Retrieved 23 May 2023.
Further reading
- Korpela, Jukka (1 September 2000). "Tab Separated Values (TSV): a format for tabular data exchange" (12 February 2005 ed.). Retrieved 23 May 2023.
- Welinder, Morten (19 December 2012). "§14.2.3 — Text File Formats". The Gnumeric Manual (v1.12 ed.). Retrieved 23 May 2023.