Comma-separated values

Comma-separated values (CSV) is a text file format that uses commas to separate values. A CSV file stores tabular data (numbers and text) in plain text, where each line of the file typically represents one data record. Each record consists of the same number of fields, and these are separated by commas in the CSV file. If the field delimiter itself may appear within a field, fields can be surrounded with quotation marks.[3]

Comma-separated values
A simple CSV file listing three people and the companies they work for
Filename extension: .csv
Internet media type: text/csv[1]
Uniform Type Identifier (UTI): public.comma-separated-values-text[2]
UTI conformation: public.delimited-values-text[2]
Type of format: multi-platform, serial data streams
Container for: database information organized as field-separated lists
Standard: RFC 4180

The CSV file format is one type of delimiter-separated file format.[4] Delimiters frequently used include the comma, tab, space, and semicolon. Delimiter-separated files are often given a ".csv" extension even when the field separator is not a comma. Many applications or libraries that consume or produce CSV files have options to specify an alternative delimiter.[5]
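As a minimal sketch of the "alternative delimiter" option mentioned above, Python's standard csv module accepts a delimiter argument. The sample text and the helper name read_delimited are made up for illustration.

```python
import csv
import io

# Hypothetical sample: a ".csv" file that actually uses semicolons.
SAMPLE = "name;company\nAlice;Acme\nBob;Initech\n"

def read_delimited(text, delimiter=","):
    """Parse delimiter-separated text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# With the right delimiter each record splits into two fields.
rows = read_delimited(SAMPLE, delimiter=";")

# With the default comma, every line is read as a single field.
misread = read_delimited(SAMPLE)
```

Reading the same text with the wrong delimiter does not raise an error; it silently yields one field per record, which is why many tools ask the user to confirm the separator.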

The lack of adherence to the CSV standard RFC 4180 necessitates the support for a variety of CSV formats in data input software. Despite this drawback, CSV remains widespread in data applications and is widely supported by a variety of software, including common spreadsheet applications such as Microsoft Excel.[6] Benefits cited in favor of CSV include human readability and the simplicity of the format.[7]

Applications

CSV is a common data exchange format that is widely supported by consumer, business, and scientific applications. Among its most common uses is moving tabular data[8][9] between programs that natively operate on incompatible (often proprietary or undocumented) formats.[1] For example, a user may need to transfer information from a database program that stores data in a proprietary format, to a spreadsheet that uses a completely different format. Most database programs can export data as CSV. Most spreadsheet programs can read CSV data, allowing CSV to be used as an intermediate format when transferring data from a database to a spreadsheet.

CSV is also used for storing data. Common data science tools such as Pandas include the option to export data to CSV for long-term storage.[10] Benefits of CSV for data storage include its simplicity, which makes parsing and creating CSV files easy to implement and fast compared to other data formats; human readability, which makes editing or fixing data simpler;[11] and high compressibility, which leads to smaller data files.[12] On the other hand, CSV does not support more complex data relations and makes no distinction between null and empty values; in applications where these features are needed, other formats are preferred.
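The loss of the null/empty distinction can be illustrated with a short round trip through Python's standard csv module. This is only a sketch; the helper name round_trip and the sample row are assumptions for illustration.

```python
import csv
import io

def round_trip(row):
    """Write one record as CSV, then read it back."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    buf.seek(0)
    return next(csv.reader(buf))

# None (a null) and "" (an empty string) are different values going in...
original = ["Alice", None, ""]

# ...but both come back as empty strings, so the difference is lost.
restored = round_trip(original)
```

Applications that need to keep nulls usually agree on a sentinel (for example a fixed marker string) outside of the CSV format itself.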

Specification

RFC 4180 proposes a specification for the CSV format; however, actual practice often does not follow the RFC and the term "CSV" might refer to any file that:[1][13]

  1. is plain text using a character encoding such as ASCII, various Unicode character encodings (e.g. UTF-8), EBCDIC, or Shift JIS,
  2. consists of records (typically one record per line),
  3. with the records divided into fields separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),
  4. where every record has the same sequence of fields.

Within these general constraints, many variations are in use. Therefore, without additional information (such as whether RFC 4180 is honored), a file claimed simply to be in "CSV" format is not fully specified. As a result, some applications supporting CSV files have text import wizards that allow users to preview the first few lines of the file and then specify the delimiter character(s), quoting rules, and field trimming.
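Some libraries try to automate part of what an import wizard does. The sketch below uses csv.Sniffer from Python's standard library to guess the delimiter from a sample; the sample data is invented, and the candidate delimiters are restricted explicitly because such guessing is heuristic and can fail on unusual files.

```python
import csv
import io

# Hypothetical sample whose dialect is not known in advance.
SAMPLE = "name;company\nAlice;Acme\nBob;Initech\nCarol;Globex\n"

# Ask the sniffer to choose among a few common delimiters.
dialect = csv.Sniffer().sniff(SAMPLE, delimiters=";,\t")

# Parse the text using the detected dialect.
rows = list(csv.reader(io.StringIO(SAMPLE), dialect))
```

A robust importer would still let the user override the detected dialect, as the text import wizards described above do.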

History

Comma-separated values is a data format that predates personal computers by more than a decade: the IBM Fortran (level H extended) compiler under OS/360 supported CSV in 1972.[14] List-directed ("free form") input/output was defined in FORTRAN 77, approved in 1978. List-directed input used commas or spaces for delimiters, so unquoted character strings could not contain commas or spaces.[15]

The term "comma-separated value" and the "CSV" abbreviation were in use by 1983.[16] The manual for the Osborne Executive computer, which bundled the SuperCalc spreadsheet, documents the CSV quoting convention that allows strings to contain embedded commas, but the manual does not specify a convention for embedding quotation marks within quoted strings.[17]

Comma-separated value lists were easier to type (for example onto punched cards) than fixed-column-aligned data, and they were less prone to producing incorrect results if a value was punched one column off from its intended location.

Comma separated files are used for the interchange of database information between machines of two different architectures. The plain-text character of CSV files largely avoids incompatibilities such as byte-order and word size. The files are largely human-readable, so it is easier to deal with them in the absence of perfect documentation or communication.[18]

The main standardization initiative—transforming "de facto fuzzy definition" into a more precise and de jure one—was in 2005, with RFC 4180, defining CSV as a MIME Content Type.[19] Later, in 2013, some of RFC 4180's deficiencies were tackled by a W3C recommendation.[20]

In 2014 IETF published RFC 7111 describing the application of URI fragments to CSV documents. RFC 7111 specifies how row, column, and cell ranges can be selected from a CSV document using position indexes.[21]

In 2015 the W3C, in an attempt to enhance CSV with formal semantics, published the first drafts of recommendations for CSV metadata standards, which became W3C Recommendations in December of the same year.[22]

General functionality

CSV formats are best used to represent sets or sequences of records in which each record has an identical list of fields. This corresponds to a single relation in a relational database, or to data (though not calculations) in a typical spreadsheet.

The format dates back to the early days of business computing and is widely used to pass data between computers with different internal word sizes, data formatting needs, and so forth. For this reason, CSV files are common on all computer platforms.

CSV is a delimited text file that uses a comma to separate values (many implementations of CSV import/export tools allow other separators to be used; for example, the use of a "Sep=^" row as the first row in the *.csv file will cause Excel to open the file expecting caret "^" to be the separator instead of comma ","). Simple CSV implementations may prohibit field values that contain a comma or other special characters such as newlines. More sophisticated CSV implementations permit them, often by requiring " (double quote) characters around values that contain reserved characters (such as commas, double quotes, or less commonly, newlines). Embedded double quote characters may then be represented by a pair of consecutive double quotes,[23] or by prefixing a double quote with an escape character such as a backslash (for example in Sybase Central).
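The two conventions for embedded double quotes can be compared with Python's standard csv module, which supports both through its doublequote and escapechar parameters. This is a sketch; the helper names and the sample row are assumptions for illustration.

```python
import csv
import io

row = ["1997", "Ford", 'Super, "luxurious" truck']

def write_row(fields, **fmt):
    """Serialize one record using the given csv formatting options."""
    buf = io.StringIO()
    csv.writer(buf, **fmt).writerow(fields)
    return buf.getvalue()

def read_row(text, **fmt):
    """Parse one record using the given csv formatting options."""
    return next(csv.reader(io.StringIO(text), **fmt))

# Convention 1: an embedded quote is written as two consecutive quotes.
doubled = write_row(row)

# Convention 2: an embedded quote is prefixed with a backslash.
escape_fmt = {"doublequote": False, "escapechar": "\\"}
escaped = write_row(row, **escape_fmt)

# Each form parses back to the same values when read with matching options.
same = read_row(doubled) == read_row(escaped, **escape_fmt) == row
```

The reader has to know which convention the writer used; reading backslash-escaped output with the default options would not recover the original field.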

CSV formats are not limited to a particular character set.[1] They work just as well with Unicode character sets (such as UTF-8 or UTF-16) as with ASCII (although particular programs that support CSV may have their own limitations). CSV files normally will even survive naïve translation from one character set to another (unlike nearly all proprietary data formats). CSV does not, however, provide any way to indicate what character set is in use, so that must be communicated separately, or determined at the receiving end (if possible).
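Because the file itself carries no encoding label, the reader has to be told (or has to guess) how to decode the bytes. The following sketch decodes the same invented bytes twice, once correctly and once with a wrong assumption; the helper name read_csv_bytes is hypothetical.

```python
import csv
import io

# Hypothetical file content, encoded as UTF-8 with a byte order mark.
RAW = "city,country\nZürich,Switzerland\n".encode("utf-8-sig")

def read_csv_bytes(raw, encoding):
    """Decode raw bytes with the given encoding, then parse them as CSV."""
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))

# Correct assumption: the BOM is stripped and the text is intact.
right = read_csv_bytes(RAW, "utf-8-sig")

# Wrong assumption: parsing still "works", but the BOM leaks into the
# first header name and the non-ASCII characters are garbled.
wrong = read_csv_bytes(RAW, "latin-1")
```

Note that the wrong decoding raises no error, which is why the encoding needs to be communicated alongside the file.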

Databases that include multiple relations cannot be exported as a single CSV file[citation needed]. Similarly, CSV cannot naturally represent hierarchical or object-oriented data. This is because every CSV record is expected to have the same structure. CSV is therefore rarely appropriate for documents created with HTML, XML, or other markup or word-processing technologies.

Statistical databases in various fields often have a generally relation-like structure, but with some repeatable groups of fields. For example, health databases such as the Demographic and Health Survey typically repeat some questions for each child of a given parent (perhaps up to a fixed maximum number of children). Statistical analysis systems often include utilities that can "rotate" such data; for example, a "parent" record that includes information about five children can be split into five separate records, each containing (a) the information on one child, and (b) a copy of all the non-child-specific information. CSV can represent either the "vertical" or "horizontal" form of such data.
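The "rotation" described above can be sketched in a few lines. The column names (parent_id, region, child1_age, child2_age) and the function to_vertical are invented for this example and do not come from any particular survey or statistics package.

```python
import csv
import io

# Hypothetical "horizontal" data: one record per parent, with a
# repeated group of child fields (up to two children here).
HORIZONTAL = (
    "parent_id,region,child1_age,child2_age\n"
    "P1,North,4,7\n"
    "P2,South,2,\n"
)

def to_vertical(text, max_children=2):
    """Split each parent record into one record per child."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        for n in range(1, max_children + 1):
            age = rec[f"child{n}_age"]
            if age == "":
                continue  # this parent has no n-th child
            rows.append({
                "parent_id": rec["parent_id"],  # copied parent data
                "region": rec["region"],        # copied parent data
                "child_no": str(n),
                "child_age": age,
            })
    return rows

vertical = to_vertical(HORIZONTAL)
```

Both forms are regular tables, so either can be stored as CSV; the vertical form repeats the parent fields once per child.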

In a relational database, similar issues are readily handled by creating a separate relation for each such group, and connecting "child" records to the related "parent" records using a foreign key (such as an ID number or name for the parent). In markup languages such as XML, such groups are typically enclosed within a parent element and repeated as necessary (for example, multiple <child> nodes within a single <parent> node). With CSV there is no widely accepted single-file solution.

Standardization

The name "CSV" indicates the use of the comma to separate data fields. Nevertheless, the term "CSV" is widely used to refer to a large family of formats that differ in many ways. Some implementations allow or require single or double quotation marks around some or all fields; and some reserve the first record as a header containing a list of field names. The character set being used is undefined: some applications require a Unicode byte order mark (BOM) to enforce Unicode interpretation (sometimes even a UTF-8 BOM).[1] Files that use the tab character instead of comma can be more precisely referred to as "TSV" for tab-separated values.

Other implementation differences include the handling of more commonplace field separators (such as space or semicolon) and newline characters inside text fields. One more subtlety is the interpretation of a blank line: it can equally be the result of writing a record of zero fields, or a record of one field of zero length; thus decoding it is ambiguous.
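The blank-line ambiguity can be shown with a naive writer, and compared with how one particular implementation (Python's standard csv module) resolves it. The helper names below are assumptions for illustration, and other libraries may make different choices.

```python
import csv
import io

def naive_line(fields):
    """Join fields with commas, with no quoting at all."""
    return ",".join(fields) + "\r\n"

def csv_line(fields):
    """Serialize one record with Python's csv module."""
    buf = io.StringIO()
    csv.writer(buf).writerow(fields)
    return buf.getvalue()

# A naive writer emits the same text for a record of zero fields
# and a record of one empty field.
ambiguous = naive_line([]) == naive_line([""])

# The csv module writes a lone empty field as a quoted empty string,
# so the two records stay distinguishable in its own output.
zero_fields = csv_line([])
one_empty_field = csv_line([""])

# On input, the csv module reads a blank line as a record of zero fields.
blank_line_record = next(csv.reader(io.StringIO("\n")))
```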

RFC 4180 and MIME standards

The 2005 technical standard RFC 4180 formalizes the CSV file format and defines the MIME type "text/csv" for the handling of text-based fields. However, the interpretation of the text of each field is still application-specific. Files that follow the RFC 4180 standard can simplify CSV exchange and should be widely portable. Among its requirements:

  • MS-DOS-style lines that end with (CR/LF) characters (optional for the last line).
  • An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
  • Each record should contain the same number of comma-separated fields.
  • Any field may be quoted (with double quotes).
  • Fields containing a line-break, double-quote or commas should be quoted. (If they are not, the file will likely be impossible to process correctly.)
  • If double-quotes are used to enclose fields, then a double-quote in a field must be represented by two double-quote characters.

The format can be processed by most programs that claim to read CSV files. The exceptions are (a) programs may not support line-breaks within quoted fields, (b) programs may confuse the optional header with data or interpret the first data line as an optional header, and (c) double-quotes in a field may not be parsed correctly automatically.
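A writer that follows the requirements listed above (CRLF line endings, quoting of fields that contain commas, quotes, or line breaks) might be sketched as follows with Python's standard csv module. The records are invented, and the sketch is not a complete conformance check against the RFC.

```python
import csv
import io

records = [
    ["Year", "Make", "Description"],
    ["1996", "Jeep", "MUST SELL!\nair, moon roof, loaded"],
]

# newline="" stops the text layer from altering line endings;
# the writer itself terminates each record with CR/LF.
buf = io.StringIO(newline="")
csv.writer(buf, lineterminator="\r\n").writerows(records)
text = buf.getvalue()

# A CSV-aware reader reassembles the quoted, multi-line field.
parsed = list(csv.reader(io.StringIO(text, newline="")))

# A program that treats every line break as a record boundary sees
# three "records" instead of two (exception (a) above).
naive_record_count = len(text.splitlines())
```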

OKF frictionless tabular data package

In 2011 Open Knowledge Foundation (OKF) and various partners created a data protocols working group, which later evolved into the Frictionless Data initiative. One of the main formats they released was the Tabular Data Package. Tabular Data package was heavily based on CSV, using it as the main data transport format and adding basic type and schema metadata (CSV lacks any type information to distinguish the string "1" from the number 1).[24]
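The missing type information can be illustrated by applying a separate schema after parsing. The SCHEMA mapping below is a made-up, in-code stand-in for schema metadata; it is not the Tabular Data Package descriptor format.

```python
import csv
import io

TEXT = "id,name,price\n1,Widget,3000.00\n"

# Hypothetical schema: column name -> converter function.
SCHEMA = {"id": int, "name": str, "price": float}

# Plain CSV parsing yields strings only.
raw = list(csv.DictReader(io.StringIO(TEXT)))

# Applying the schema turns the string "1" into the number 1.
typed = [{key: SCHEMA[key](value) for key, value in row.items()} for row in raw]
```

Keeping the schema next to the data, rather than inside it, leaves the CSV file readable by tools that know nothing about the schema.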

The Frictionless Data Initiative has also provided a standard CSV Dialect Description Format for describing different dialects of CSV, for example specifying the field separator or quoting rules.[25]

W3C tabular data standard

In 2013 the W3C "CSV on the Web" working group began to specify technologies providing higher interoperability for web applications using CSV or similar formats.[26] The working group completed its work in February 2016 and was officially closed in March 2016 with the release of a set of documents and W3C recommendations[27] for modeling "Tabular Data",[28] and enhancing CSV with metadata and semantics.

Basic rules

Many informal documents exist that describe "CSV" formats. IETF RFC 4180 (summarized above) defines the format for the "text/csv" MIME type registered with the IANA.

Rules typical of these and other "CSV" specifications and implementations are as follows:

  • CSV is a delimited data format that has fields/columns separated by the comma character and records/rows terminated by newlines.
  • A CSV file does not require a specific character encoding, byte order, or line terminator format (some software does not support all line-end variations).
  • A record ends at a line terminator. However, line terminators can be embedded as data within fields, so software must recognize quoted line-separators (see below) in order to correctly assemble an entire record from perhaps multiple lines.
  • All records should have the same number of fields, in the same order.
  • Data within fields is interpreted as a sequence of characters, not as a sequence of bits or bytes (see RFC 2046, section 4.1). For example, the numeric quantity 65535 may be represented as the 5 ASCII characters "65535" (or perhaps other forms such as "0xFFFF", "000065535.000E+00", etc.); but not as a sequence of 2 bytes intended to be treated as a single binary integer rather than as two characters (e.g. the numbers 11264–11519 have a comma as their high order byte: ord(',')*256..ord(',')*256+255). If this "plain text" convention is not followed, then the CSV file no longer contains sufficient information to interpret it correctly, the CSV file will not likely survive transmission across differing computer architectures, and will not conform to the text/csv MIME type.
  • Adjacent fields must be separated by a single comma. However, "CSV" formats vary greatly in this choice of separator character. In particular, in locales where the comma is used as a decimal separator, a semicolon, TAB, or other character is used instead.
    1997,Ford,E350
  • Any field may be quoted (that is, enclosed within double-quote characters), while some fields must be quoted, as specified in the following rules and examples:
    "1997","Ford","E350"
  • Fields with embedded commas or double-quote characters must be quoted.
    1997,Ford,E350,"Super, luxurious truck"
  • Each of the embedded double-quote characters must be represented by a pair of double-quote characters.
    1997,Ford,E350,"Super, ""luxurious"" truck"
  • Fields with embedded line breaks must be quoted (however, many CSV implementations do not support embedded line breaks).
    1997,Ford,E350,"Go get one now
    they are going fast"
  • In some CSV implementations[which?], leading and trailing spaces and tabs are trimmed (ignored). Such trimming is forbidden by RFC 4180, which states "Spaces are considered part of a field and should not be ignored."
    1997, Ford, E350
    not same as
    1997,Ford,E350
  • According to RFC 4180, spaces outside quotes in a field are not allowed; however, the RFC also says that "Spaces are considered part of a field and should not be ignored." and "Implementers should 'be conservative in what you do, be liberal in what you accept from others' (RFC 793, section 2.10) when processing CSV files."
    1997, "Ford" ,E350
  • In CSV implementations that do trim leading or trailing spaces, fields with such spaces as meaningful data must be quoted.
    1997,Ford,E350," Super luxurious truck "
  • Double quote processing need only apply if the field starts with a double quote. Note, however, that double quotes are not allowed in unquoted fields according to RFC 4180.
    Los Angeles,34°03′N,118°15′W
    New York City,40°42′46″N,74°00′21″W
    Paris,48°51′24″N,2°21′03″E
  • The first record may be a "header", which contains column names in each of the fields (there is no reliable way to tell whether a file does this or not; however, it is uncommon to use characters other than letters, digits, and underscores in such column names).
    Year,Make,Model
    1997,Ford,E350
    2000,Mercury,Cougar
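The rules on leading and trailing spaces can be tried out with Python's standard csv module, whose skipinitialspace option is one example of the trimming behaviour described above. This is a sketch of one implementation's behaviour, not a statement about all CSV software; the helper name parse_line is an assumption.

```python
import csv
import io

def parse_line(line, **fmt):
    """Parse a single CSV record with the given formatting options."""
    return next(csv.reader(io.StringIO(line), **fmt))

# Default behaviour: spaces are part of the field, as RFC 4180 states.
strict = parse_line("1997, Ford, E350")

# Optional trimming of spaces that follow a delimiter.
trimmed = parse_line("1997, Ford, E350", skipinitialspace=True)

# Quoting protects meaningful spaces even when trimming is enabled.
quoted = parse_line(
    '1997,Ford,E350," Super luxurious truck "', skipinitialspace=True
)
```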

Example

Year | Make  | Model                                  | Description            | Price
1997 | Ford  | E350                                   | ac, abs, moon          | 3000.00
1999 | Chevy | Venture "Extended Edition"             |                        | 4900.00
1999 | Chevy | Venture "Extended Edition, Very Large" |                        | 5000.00
1996 | Jeep  | Grand Cherokee                         | MUST SELL!             | 4799.00
     |       |                                        | air, moon roof, loaded |

The above table of data may be represented in CSV format as follows:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
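As a quick check, the CSV representation of the table can be parsed back with Python's standard csv module. The sketch assumes one record per line, except for the last record, whose quoted Description field contains a line break.

```python
import csv
import io

# Triple single quotes are used because the data contains runs of
# double quotes.
EXAMPLE = '''Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
'''

# newline="" preserves the line break embedded in the quoted field.
rows = list(csv.reader(io.StringIO(EXAMPLE, newline="")))

header, records = rows[0], rows[1:]
```

Every parsed record has five fields, including the one whose Description spans two lines of the file.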

Example of a USA/UK CSV file (where the decimal separator is a period/full stop and the value separator is a comma):

Year,Make,Model,Length
1997,Ford,E350,2.35
2000,Mercury,Cougar,2.38

Example of an analogous European CSV/DSV file (where the decimal separator is a comma and the value separator is a semicolon):

Year;Make;Model;Length
1997;Ford;E350;2,35
2000;Mercury;Cougar;2,38

The latter format is not RFC 4180 compliant.[29] Compliance could be achieved by the use of a comma instead of a semicolon as a separator and either the international notation for the representation of the decimal mark or the practice of quoting all numbers that have a decimal mark.
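One of the routes to compliance mentioned above (comma as separator, period as decimal mark) could be sketched as a small conversion. The function name to_comma_separated and the choice of which columns hold decimals are assumptions; a real converter would need to know the column types.

```python
import csv
import io

EUROPEAN = (
    "Year;Make;Model;Length\n"
    "1997;Ford;E350;2,35\n"
    "2000;Mercury;Cougar;2,38\n"
)

def to_comma_separated(text, decimal_columns):
    """Re-write semicolon-separated text with commas and decimal points."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO(newline="")
    writer = csv.writer(out)  # default dialect: comma, CR/LF line ends
    header = next(reader)
    writer.writerow(header)
    indexes = [header.index(name) for name in decimal_columns]
    for row in reader:
        for i in indexes:
            # Only the named columns are treated as decimal numbers.
            row[i] = row[i].replace(",", ".")
        writer.writerow(row)
    return out.getvalue()

converted = to_comma_separated(EUROPEAN, ["Length"])
```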

Application support

Some applications use CSV as a data interchange format to enhance its interoperability, exporting and importing CSV. Others use CSV as an internal format.

As a data interchange format, the CSV file format is supported by almost all spreadsheets and database management systems:

  • Spreadsheets including Apple Numbers, LibreOffice Calc, and Apache OpenOffice Calc. Microsoft Excel also supports a dialect of CSV with restrictions in comparison to other spreadsheet software (e.g., as of 2019 Excel still cannot export CSV files in the commonly used UTF-8 character encoding, and the separator is not required to be a comma). The LibreOffice Calc CSV importer is actually a more generic delimited text importer, supporting multiple separators at the same time as well as field trimming.
  • Relational databases can often export and import CSV directly. For example, PostgreSQL provides the COPY command, a PostgreSQL extension rather than part of standard SQL: COPY t TO 'file.csv' CSV and COPY t FROM 'file.csv' CSV are both valid.[30]
  • Many utility programs on Unix-style systems (such as cut, paste, join, sort, uniq, awk) can split files on a comma delimiter, and can therefore process simple CSV files. However, this method does not correctly handle commas or new lines within quoted strings.
  • Some code and text editors, such as Visual Studio Code, IntelliJ, Notepad++, and CudaText, support syntax highlighting for CSV files, making them easier to read and edit.
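The limitation of splitting on a comma delimiter, noted above for the Unix-style utilities, can be reproduced with a plain string split and compared with a CSV-aware parser. The sample line is taken from the basic rules; Python's standard csv module stands in for "a CSV-aware tool" here.

```python
import csv
import io

line = '1997,Ford,E350,"Super, luxurious truck"'

# Splitting on every comma breaks the quoted field in two
# and leaves the quote characters in the data.
naive = line.split(",")

# A CSV-aware parser keeps the quoted field together.
parsed = next(csv.reader(io.StringIO(line)))
```

The naive split yields five pieces for a four-field record, so column positions shift for any row that contains a quoted comma.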

As a (main or optional) internal representation, which can be native or foreign. This differs from use as an interchange format ("export/import only") because it is not necessary to create a copy in another format:

  • Some spreadsheets, including LibreOffice Calc, offer this option without forcing the user to adopt another format.
  • Some relational databases, when using standard SQL, offer a foreign-data wrapper (FDW). For example, PostgreSQL offers CREATE FOREIGN TABLE[31] and CREATE EXTENSION file_fdw[32] to configure any variant of CSV.
  • Databases like Apache Hive offer the option to express CSV or .csv.gz as an internal table format.
  • The emacs editor can operate on CSV files using csv-nav mode.[33]

CSV format is supported by libraries available for many programming languages. Most provide some way to specify the field delimiter, decimal separator, character encoding, quoting conventions, date format, etc.

Software and row limits

Each program that works with CSV has its own limit on the maximum number of rows a CSV file can have. Below is a list of common software and the corresponding limits:[34]

  • Microsoft Excel: 1,048,576 row limit;
  • Apple Numbers: 1,000,000 row limit;
  • Google Sheets: 5,000,000 cell limit (the product of columns and rows);
  • OpenOffice and LibreOffice: 1,048,576 row limit;
  • Text Editors (such as WordPad, TextEdit, Vim, etc.): no row or cell limit;
  • Databases (COPY command and FDW): no row or cell limit.

References

  1. ^ a b c d e Shafranovich, Y. (October 2005). Common Format and MIME Type for CSV Files. IETF. p. 1. doi:10.17487/RFC4180. RFC 4180.
  2. ^ a b "commaSeparatedText". Apple Developer Documentation: Uniform Type Identifiers. Apple Inc.
  3. ^ "CSV Comma Separated Value File Format - How To - Creativyst - Explored,Designed,Delivered.(sm)". Creativyst Software. Archived from the original on 1 April 2021. Retrieved 22 August 2023.
  4. ^ IBM DB2 Administration Guide. IBM.
  5. ^ "Which are the available formats". Eurostat. Archived from the original on 26 July 2023. Retrieved 22 August 2023.
  6. ^ "Import or export text (.txt or .csv) files - Microsoft Support". support.microsoft.com. Retrieved 2023-08-16.
  7. ^ "CSV Files: Use cases, Benefits, and Limitations". www.oneschema.co. Retrieved 2023-08-16.
  8. ^ "CSV - Comma Separated Values". Retrieved 2017-12-02.
  9. ^ "CSV Files". Retrieved June 4, 2014.
  10. ^ "pandas.DataFrame.to_csv — pandas 2.0.3 documentation". pandas.pydata.org. Retrieved 2023-08-16.
  11. ^ "CSV Format: History, Advantages and Why It Is Still Popular". ByteScout. 2021-09-15. Retrieved 2023-08-16.
  12. ^ "Comparison of different file formats in Big Data". www.adaltas.com. 2020-07-23. Retrieved 2023-08-16.
  13. ^ "Comma Separated Values (CSV) Standard File Format". Edoceo, Inc. Retrieved June 4, 2014.
  14. ^ IBM FORTRAN Program Products for OS and the CMS Component of VM/370 General Information (PDF) (first ed.), July 1972, p. 17, GC28-6884-0, retrieved February 5, 2016, For users familiar with the predecessor FORTRAN IV G and H processors, these are the major new language capabilities
  15. ^ "List-Directed I/O", Fortran 77 Language Reference, Oracle
  16. ^ "SuperCalc², spreadsheet package for IBM, CP/M". Retrieved December 11, 2017.
  17. ^ "Comma-Separated-Value Format File Structure". 1983. Retrieved December 11, 2017.
  18. ^ "CSV, Comma Separated Values (RFC 4180)". Retrieved June 4, 2014.
  19. ^ RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files. doi:10.17487/RFC4180. RFC 4180. Retrieved December 22, 2020.
  20. ^ See sparql11-results-csv-tsv, the first W3C recommendation scoped in CSV and filling some of RFC 4180's deficiencies.
  21. ^ RFC 7111: URI Fragment Identifiers for the text/csv Media Type. doi:10.17487/RFC7111. RFC 7111. Retrieved December 22, 2020.
  22. ^ "Model for Tabular Data and Metadata on the Web – W3C Recommendation 17 December 2015". Retrieved March 23, 2016.
  23. ^ Creativyst (2010), How To: The Comma Separated Value (CSV) File Format, creativyst.com, retrieved May 24, 2010
  24. ^ "Tabular Data Package". Frictionless Data Specs.
  25. ^ "CSV Dialect". Frictionless Data Specs.
  26. ^ "CSV on the Web Working Group". W3C CSV WG. 2013. Retrieved 2015-04-22.
  27. ^ CSV on the Web Repository (on GitHub)
  28. ^ Model for Tabular Data and Metadata on the Web (W3C Recommendation)
  29. ^ Shafranovich (2005) states, "Within the header and each record, there may be one or more fields, separated by commas."
  30. ^ "Documentation: 14: COPY". PostgreSQL. 2022-02-10. Retrieved 2022-03-04.
  31. ^ "Documentation: 14: F.35. postgres_fdw". PostgreSQL. 2022-02-10. Retrieved 2022-03-04.
  32. ^ "Documentation: 14: F.14. file_fdw". PostgreSQL. 2022-02-10. Retrieved 2022-03-04.
  33. ^ "EmacsWiki: Csv Nav". www.emacswiki.org.
  34. ^ "Understanding CSV and row limits". Retrieved Feb 28, 2021.

Further reading

  • "IBM DB2 Administration Guide - LOAD, IMPORT, and EXPORT File Formats". IBM. Archived from the original on 2016-12-13. Retrieved 2016-12-12. (Has file descriptions of delimited ASCII (.DEL) (including comma- and semicolon-separated) and non-delimited ASCII (.ASC) files for data transfer.)

comma, separated, values, text, file, format, that, uses, commas, separate, values, file, stores, tabular, data, numbers, text, plain, text, where, each, line, file, typically, represents, data, record, each, record, consists, same, number, fields, these, sepa. Comma separated values CSV is a text file format that uses commas to separate values A CSV file stores tabular data numbers and text in plain text where each line of the file typically represents one data record Each record consists of the same number of fields and these are separated by commas in the CSV file If the field delimiter itself may appear within a field fields can be surrounded with quotation marks 3 Comma separated valuesA simple CSV file listing three people and the companies they work forFilename extension csvInternet media typetext csv 1 Uniform Type Identifier UTI public comma separated values text 2 UTI conformationpublic delimited values text 2 Type of formatmulti platform serial data streamsContainer fordatabase information organized as field separated listsStandardRFC 4180The CSV file format is one type of delimiter separated file format 4 Delimiters frequently used include the comma tab space and semicolon Delimiter separated files are often given a csv extension even when the field separator is not a comma Many applications or libraries that consume or produce CSV files have options to specify an alternative delimiter 5 The lack of adherence to the CSV standard RFC 4180 necessitates the support for a variety of CSV formats in data input software Despite this drawback CSV remains widespread in data applications and is widely supported by a variety of software including common spreadsheet applications such as Microsoft Excel 6 Benefits cited in favor of CSV include human readability and the simplicity of the format 7 Contents 1 Applications 2 Specification 3 History 4 General functionality 5 Standardization 5 1 RFC 4180 and MIME standards 5 2 OKF frictionless tabular data package 5 3 W3C 
tabular data standard 6 Basic rules 7 Example 8 Application support 8 1 Software and row limits 9 See also 10 References 11 Further readingApplications EditCSV is a common data exchange format that is widely supported by consumer business and scientific applications Among its most common uses is moving tabular data 8 9 between programs that natively operate on incompatible often proprietary or undocumented formats 1 For example a user may need to transfer information from a database program that stores data in a proprietary format to a spreadsheet that uses a completely different format Most database programs can export data as CSV Most spreadsheet programs can read CSV data allowing CSV to be used as an intermediate format when transferring data from a database to a spreadsheet CSV is also used for storing data Common data science tools such as Pandas include the option to export data to CSV for long term storage 10 Benefits of CSV for data storage include the simplicity of CSV makes parsing and creating CSV files easy to implement and fast compared to other data formats human readability making editing or fixing data simpler 11 and high compressibility leading to smaller data files 12 Alternatively CSV does not support more complex data relations and makes no distinction between null and empty values and in applications where these features are needed other formats are preferred Specification EditRFC 4180 proposes a specification for the CSV format however actual practice often does not follow the RFC and the term CSV might refer to any file that 1 13 is plain text using a character encoding such as ASCII various Unicode character encodings e g UTF 8 EBCDIC or Shift JIS consists of records typically one record per line with the records divided into fields separated by delimiters typically a single reserved character such as comma semicolon or tab sometimes the delimiter may include optional spaces where every record has the same sequence of fields Within these 
general constraints many variations are in use Therefore without additional information such as whether RFC 4180 is honored a file claimed simply to be in CSV format is not fully specified As a result some applications supporting CSV files have text import wizards that allow users to preview the first few lines of the file and then specify the delimiter character s quoting rules and field trimming History EditComma separated values is a data format that predates personal computers by more than a decade the IBM Fortran level H extended compiler under OS 360 supported CSV in 1972 14 List directed free form input output was defined in FORTRAN 77 approved in 1978 List directed input used commas or spaces for delimiters so unquoted character strings could not contain commas or spaces 15 The term comma separated value and the CSV abbreviation were in use by 1983 16 The manual for the Osborne Executive computer which bundled the SuperCalc spreadsheet documents the CSV quoting convention that allows strings to contain embedded commas but the manual does not specify a convention for embedding quotation marks within quoted strings 17 Comma separated value lists are easier to type for example into punched cards than fixed column aligned data and they were less prone to producing incorrect results if a value was punched one column off from its intended location Comma separated files are used for the interchange of database information between machines of two different architectures The plain text character of CSV files largely avoids incompatibilities such as byte order and word size The files are largely human readable so it is easier to deal with them in the absence of perfect documentation or communication 18 The main standardization initiative transforming de facto fuzzy definition into a more precise and de jure one was in 2005 with RFC 4180 defining CSV as a MIME Content Type 19 Later in 2013 some of RFC 4180 s deficiencies were tackled by a W3C recommendation 20 In 2014 
IETF published RFC 7111 describing the application of URI fragments to CSV documents RFC 7111 specifies how row column and cell ranges can be selected from a CSV document using position indexes 21 In 2015 W3C in an attempt to enhance CSV with formal semantics publicized the first drafts of recommendations for CSV metadata standards which began as recommendations in December of the same year 22 General functionality EditCSV formats are best used to represent sets or sequences of records in which each record has an identical list of fields This corresponds to a single relation in a relational database or to data though not calculations in a typical spreadsheet The format dates back to the early days of business computing and is widely used to pass data between computers with different internal word sizes data formatting needs and so forth For this reason CSV files are common on all computer platforms CSV is a delimited text file that uses a comma to separate values many implementations of CSV import export tools allow other separators to be used for example the use of a Sep row as the first row in the csv file will cause Excel to open the file expecting caret to be the separator instead of comma Simple CSV implementations may prohibit field values that contain a comma or other special characters such as newlines More sophisticated CSV implementations permit them often by requiring double quote characters around values that contain reserved characters such as commas double quotes or less commonly newlines Embedded double quote characters may then be represented by a pair of consecutive double quotes 23 or by prefixing a double quote with an escape character such as a backslash for example in Sybase Central CSV formats are not limited to a particular character set 1 They work just as well with Unicode character sets such as UTF 8 or UTF 16 as with ASCII although particular programs that support CSV may have their own limitations CSV files normally will even survive 
naive translation from one character set to another (unlike nearly all proprietary data formats). CSV does not, however, provide any way to indicate what character set is in use, so that must be communicated separately, or determined at the receiving end (if possible).

Databases that include multiple relations cannot be exported as a single CSV file.[citation needed] Similarly, CSV cannot naturally represent hierarchical or object-oriented data. This is because every CSV record is expected to have the same structure. CSV is therefore rarely appropriate for documents created with HTML, XML, or other markup or word-processing technologies.

Statistical databases in various fields often have a generally relation-like structure, but with some repeatable groups of fields. For example, health databases such as the Demographic and Health Survey typically repeat some questions for each child of a given parent (perhaps up to a fixed maximum number of children). Statistical analysis systems often include utilities that can "rotate" such data; for example, a "parent" record that includes information about five children can be split into five separate records, each containing (a) the information on one child, and (b) a copy of all the non-child-specific information. CSV can represent either the "vertical" or "horizontal" form of such data.

In a relational database, similar issues are readily handled by creating a separate relation for each such group, and connecting "child" records to the related "parent" records using a foreign key (such as an ID number or name for the parent). In markup languages such as XML, such groups are typically enclosed within a parent element and repeated as necessary (for example, multiple <child> nodes within a single <parent> node). With CSV there is no widely accepted single-file solution.

Standardization

The name "CSV" indicates the use of the comma to separate data fields. Nevertheless, the term "CSV" is widely used to refer to a large family of formats that differ in many ways. Some
implementations allow or require single or double quotation marks around some or all fields, and some reserve the first record as a header containing a list of field names. The character set being used is undefined: some applications require a Unicode byte order mark (BOM) to enforce Unicode interpretation (sometimes even a UTF-8 BOM).[1] Files that use the tab character instead of comma can be more precisely referred to as "TSV", for tab-separated values.

Other implementation differences include the handling of more commonplace field separators (such as space or semicolon) and newline characters inside text fields. One more subtlety is the interpretation of a blank line: it can equally be the result of writing a record of zero fields, or a record of one field of zero length; thus decoding it is ambiguous.

RFC 4180 and MIME standards

The 2005 technical standard RFC 4180 formalizes the CSV file format and defines the MIME type "text/csv" for the handling of text-based fields. However, the interpretation of the text of each field is still application-specific. Files that follow the RFC 4180 standard can simplify CSV exchange and should be widely portable. Among its requirements:

- MS-DOS-style lines that end with (CR/LF) characters (optional for the last line).
- An optional header record (there is no sure way to detect whether it is present, so care is required when importing).
- Each record should contain the same number of comma-separated fields.
- Any field may be quoted (with double quotes).
- Fields containing a line break, double quote, or commas should be quoted. (If they are not, the file will likely be impossible to process correctly.)
- If double quotes are used to enclose fields, then a double quote in a field must be represented by two double-quote characters.

The format can be processed by most programs that claim to read CSV files. The exceptions are (a) programs may not support line breaks within quoted fields, (b) programs may confuse the optional header with data or interpret the first data line as an optional
header, and (c) double quotes in a field may not be parsed correctly automatically.

OKF frictionless tabular data package

In 2011, Open Knowledge Foundation (OKF) and various partners created a data protocols working group, which later evolved into the Frictionless Data initiative. One of the main formats they released was the Tabular Data Package. The Tabular Data Package was heavily based on CSV, using it as the main data transport format and adding basic type and schema metadata (CSV lacks any type information to distinguish the string "1" from the number 1).[24]

The Frictionless Data initiative has also provided a standard CSV Dialect Description Format for describing different dialects of CSV, for example specifying the field separator or quoting rules.[25]

W3C tabular data standard

In 2013, the W3C "CSV on the Web" working group began to specify technologies providing higher interoperability for web applications using CSV or similar formats.[26] The working group completed its work in February 2016 and was officially closed in March 2016 with the release of a set of documents and W3C recommendations[27] for modeling "Tabular Data"[28] and enhancing CSV with metadata and semantics.

Basic rules

Many informal documents exist that describe "CSV" formats. IETF RFC 4180 (summarized above) defines the format for the "text/csv" MIME type registered with the IANA.

Rules typical of these and other "CSV" specifications and implementations are as follows:

- CSV is a delimited data format that has fields/columns (separated by the comma character) and records/rows (terminated by newlines).
- A CSV file does not require a specific character encoding, byte order, or line terminator format (some software does not support all line-end variations).
- A record ends at a line terminator. However, line terminators can be embedded as data within fields, so software must recognize quoted line separators (see below) in order to correctly assemble an entire record from perhaps multiple lines.
- All records should have the same number of
fields, in the same order.
- Data within fields is interpreted as a sequence of characters, not as a sequence of bits or bytes (see RFC 2046, section 4.1). For example, the numeric quantity 65535 may be represented as the 5 ASCII characters "65535" (or perhaps other forms such as "0xFFFF", "000065535.000E+00", etc.), but not as a sequence of 2 bytes intended to be treated as a single binary integer rather than as two characters (e.g. the numbers 11264 to 11519 have a comma as their high-order byte: ord(",")*256..ord(",")*256+255). If this "plain text" convention is not followed, then the CSV file no longer contains sufficient information to interpret it correctly, the CSV file will not likely survive transmission across differing computer architectures, and it will not conform to the text/csv MIME type.
- Adjacent fields must be separated by a single comma. However, "CSV" formats vary greatly in this choice of separator character. In particular, in locales where the comma is used as a decimal separator, a semicolon, TAB, or other character is used instead.
    1997,Ford,E350
- Any field may be quoted (that is, enclosed within double-quote characters), while some fields must be quoted, as specified in the following rules and examples:
    "1997","Ford","E350"
- Fields with embedded commas or double-quote characters must be quoted.
    1997,Ford,E350,"Super, luxurious truck"
- Each of the embedded double-quote characters must be represented by a pair of double-quote characters.
    1997,Ford,E350,"Super, ""luxurious"" truck"
- Fields with embedded line breaks must be quoted (however, many CSV implementations do not support embedded line breaks).
    1997,Ford,E350,"Go get one now
    they are going fast"
- In some CSV implementations, leading and trailing spaces and tabs are trimmed (ignored). Such trimming is forbidden
by RFC 4180, which states "Spaces are considered part of a field and should not be ignored."
    1997, Ford, E350
  is not the same as
    1997,Ford,E350
- According to RFC 4180, spaces outside quotes in a field are not allowed; however, the RFC also says that "Spaces are considered part of a field and should not be ignored." and "Implementers should 'be conservative in what you do, be liberal in what you accept from others' (RFC 793, section 2.10) when processing CSV files."
    1997, "Ford" ,E350
- In CSV implementations that do trim leading or trailing spaces, fields with such spaces as meaningful data must be quoted.
    1997,Ford,E350," Super luxurious truck "
- Double quote processing need only apply if the field starts with a double quote. Note, however, that double quotes are not allowed in unquoted fields according to RFC 4180.
    Los Angeles,34°03′N,118°15′W
    New York City,40°42′46″N,74°00′21″W
    Paris,48°51′24″N,2°21′03″E
- The first record may be a "header", which contains column names in each of the fields (there is no reliable way to tell whether a file does this or not; however, it is uncommon to use characters other than letters, digits, and underscores in such column names).
    Year,Make,Model
    1997,Ford,E350
    2000,Mercury,Cougar

Example

Year | Make  | Model                                  | Description                       | Price
1997 | Ford  | E350                                   | ac, abs, moon                     | 3000.00
1999 | Chevy | Venture "Extended Edition"             |                                   | 4900.00
1999 | Chevy | Venture "Extended Edition, Very Large" |                                   | 5000.00
1996 | Jeep  | Grand Cherokee                         | MUST SELL! air, moon roof, loaded | 4799.00

The above table of data may be represented in CSV format as follows:

    Year,Make,Model,Description,Price
    1997,Ford,E350,"ac, abs, moon",3000.00
    1999,Chevy,"Venture ""Extended Edition""","",4900.00
    1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
    1996,Jeep,Grand Cherokee,"MUST SELL!
    air, moon roof, loaded",4799.00

Example of a USA/UK CSV file (where the decimal separator is a period/full stop and the value separator is a comma):

    Year,Make,Model,Length
    1997,Ford,E350,2.35
    2000,Mercury,Cougar,2.38

Example of an analogous European CSV/DSV file (where the decimal separator is a comma and the value
separator is a semicolon):

    Year;Make;Model;Length
    1997;Ford;E350;2,35
    2000;Mercury;Cougar;2,38

The latter format is not RFC 4180 compliant.[29] Compliance could be achieved by the use of a comma instead of a semicolon as a separator and either the international notation for the representation of the decimal mark or the practice of quoting all numbers that have a decimal mark.

Application support

Some applications use CSV as a data interchange format to enhance their interoperability, exporting and importing CSV. Others use CSV as an internal format.

As a data interchange format: the CSV file format is supported by almost all spreadsheets and database management systems.

- Spreadsheets, including Apple Numbers, LibreOffice Calc, and Apache OpenOffice Calc. Microsoft Excel also supports a dialect of CSV, with restrictions in comparison to other spreadsheet software (e.g., as of 2019, Excel still cannot export CSV files in the commonly used UTF-8 character encoding, and the separator is not enforced to be the comma). The LibreOffice Calc CSV importer is actually a more generic delimited-text importer, supporting multiple separators at the same time as well as field trimming.
- Relational databases can often export and import CSV directly. For example, PostgreSQL provides the COPY command (a PostgreSQL command rather than part of standard SQL): COPY t TO 'file.csv' CSV and COPY t FROM 'file.csv' CSV are both valid.[30]
- Many utility programs on Unix-style systems (such as cut, paste, join, sort, uniq, awk) can split files on a comma delimiter, and can therefore process simple CSV files. However, this method does not correctly handle commas or new lines within quoted strings.
- Some code and text editors, such as Visual Studio Code, IntelliJ, Notepad, CudaText,
and others, support syntax highlighting for CSV files, making them easier to read and edit.

As a main or optional internal representation: CSV can be native or foreign, but this differs from its use as an interchange format (export/import only) because it is not necessary to create a copy in another format.

- Some spreadsheets, including LibreOffice Calc, offer this option without forcing the user to adopt another format.
- Some relational databases, when using standard SQL, offer a foreign-data wrapper (FDW). For example, PostgreSQL offers CREATE FOREIGN TABLE[31] and CREATE EXTENSION file_fdw[32] to configure any variant of CSV.
- Databases like Apache Hive offer the option to express CSV or .csv.gz as an internal table format.
- The emacs editor can operate on CSV files using csv-nav mode.[33]

CSV format is supported by libraries available for many programming languages. Most provide some way to specify the field delimiter, decimal separator, character encoding, quoting conventions, date format, etc.

Software and row limits

Each program that works with CSV has its own limit on the maximum number of rows a CSV file can have. Below is a list of common software and its limitations:[34]

- Microsoft Excel: 1,048,576 row limit
- Apple Numbers: 1,000,000 row limit
- Google Sheets: 5,000,000 cell limit (the product of columns and rows)
- OpenOffice and LibreOffice: 1,048,576 row limit
- Text editors (such as WordPad, TextEdit, Vim, etc.): no row or cell limit
- Databases (COPY command and FDW): no row or cell limit

See also

- Tab-separated values
- Comparison of data serialization formats
- Delimiter-separated values
- Delimiter collision
- Flat-file database
- Simple Data Format
- Substitute character
- Null character
- Invisible comma (U+2063)

References

1. Shafranovich, Y. (October 2005). Common Format and MIME Type for CSV Files. IETF. p. 1. doi:10.17487/RFC4180. RFC 4180.
2. "commaSeparatedText".
Apple Developer Documentation: Uniform Type Identifiers. Apple Inc.
3. "CSV - Comma Separated Value File Format - How To - Creativyst - Explored, Designed, Delivered.(sm)". Creativyst Software. Archived from the original on 1 April 2021. Retrieved 22 August 2023.
4. "IBM DB2 Administration Guide". IBM.
5. "Which are the available formats?". Eurostat. Archived from the original on 26 July 2023. Retrieved 22 August 2023.
6. "Import or export text (.txt or .csv) files - Microsoft Support". support.microsoft.com. Retrieved 2023-08-16.
7. "CSV Files: Use cases, Benefits, and Limitations". www.oneschema.co. Retrieved 2023-08-16.
8. "CSV - Comma Separated Values". Retrieved 2017-12-02.
9. "CSV Files". Retrieved June 4, 2014.
10. "pandas.DataFrame.to_csv - pandas 2.0.3 documentation". pandas.pydata.org. Retrieved 2023-08-16.
11. "CSV Format: History, Advantages and Why It Is Still Popular". ByteScout. 2021-09-15. Retrieved 2023-08-16.
12. "Comparison of different file formats in Big Data". www.adaltas.com. 2020-07-23. Retrieved 2023-08-16.
13. "Comma Separated Values (CSV) Standard File Format". Edoceo, Inc. Retrieved June 4, 2014.
14. IBM FORTRAN Program Products for OS and the CMS Component of VM/370 General Information (PDF) (first ed.), July 1972, p. 17, GC28-6884-0, retrieved February 5, 2016. "For users familiar with the predecessor FORTRAN IV G and H processors, these are the major new language capabilities".
15. "List-Directed I/O", Fortran 77 Language Reference, Oracle.
16. "SuperCalc spreadsheet package for IBM, CP/M". Retrieved December 11, 2017.
17. "Comma-Separated-Value Format File Structure". 1983. Retrieved December 11, 2017.
18. "CSV, Comma Separated Values (RFC 4180)". Retrieved June 4, 2014.
19. "RFC 4180 - Common Format and MIME Type for Comma-Separated Values (CSV) Files". doi:10.17487/RFC4180. RFC 4180. Retrieved December 22, 2020.
20. See sparql11-results-csv-tsv, the first W3C recommendation scoped in CSV and filling some of RFC 4180's deficiencies.
21. "RFC 7111 - URI Fragment Identifiers for the text/csv Media Type". doi:10.17487/RFC7111. RFC 7111. Retrieved December 22, 2020.
22. "Model for Tabular Data and Metadata on the Web: W3C Recommendation 17 December 2015".
Retrieved March 23, 2016.
23. Creativyst (2010), How To: The Comma Separated Value (CSV) File Format, creativyst.com, retrieved May 24, 2010.
24. "Tabular Data Package". Frictionless Data Specs.
25. "CSV Dialect". Frictionless Data Specs.
26. "CSV on the Web Working Group". W3C CSV WG. 2013. Retrieved 2015-04-22.
27. CSV on the Web Repository (on GitHub).
28. Model for Tabular Data and Metadata on the Web (W3C Recommendation).
29. Shafranovich (2005) states, "Within the header and each record, there may be one or more fields, separated by commas."
30. "Documentation: 14: COPY". PostgreSQL. 2022-02-10. Retrieved 2022-03-04.
31. "Documentation: 14: F.35. postgres_fdw". PostgreSQL. 2022-02-10. Retrieved 2022-03-04.
32. "Documentation: 14: F.14. file_fdw". PostgreSQL. 2022-02-10. Retrieved 2022-03-04.
33. "EmacsWiki: Csv Nav". www.emacswiki.org.
34. "Understanding CSV and row limits". Retrieved Feb 28, 2021.

Further reading

- "IBM DB2 Administration Guide - LOAD, IMPORT, and EXPORT File Formats". IBM. Archived from the original on 2016-12-13. Retrieved 2016-12-12. Has file descriptions of delimited ASCII (.DEL) (including comma- and semicolon-separated) and non-delimited ASCII (.ASC) files for data transfer.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Comma-separated_values&oldid=1178564113"
