CSV format

CSV 檔全名 Comma Seprated Values 是一種純 Text File.


以 Tab 字元分隔的文字格式
以記錄為基礎的 ASCII 文字格式,其中任務或資源記錄的每個欄位都是由清單分隔字元 (通常是定位停駐點) 分隔的。每個任務或資源記錄都是以換行符號結尾。

逗號分隔值 (CSV)


注意事項
  1. 留意在轉存成 CSV 時, 因逗號 , (尤其是在中文資料)所造成的誤判為欄位斷隔符

CSV 格式

這個格式用於輸入和輸出逗號分隔數值(CSV)檔案格式,許多其它程序都用這個檔案格式,比如電子報表。這個模式下產生並識別逗號分隔的 CSV 逃逸機制,而不是使用 PostgreSQL 標準文本模式的逃逸。

每條記錄的值都是用 DELIMITER 字元分隔的。如果數值本身包含分隔字元、QUOTE 字元、NULL 字元串、回車符、換行符,那麼整個數值用 QUOTE 字元前綴和後綴(包圍),並且數值裡任何 QUOTE 字元或 ESCAPE 字元都前導逃逸字元。你也可以使用 FORCE QUOTE 在輸出非 NULL 的指定欄位值時強制引號包圍。

CSV 格式沒有標準的辦法區分一個 NULL 值和一個空字元串。PostgreSQL 的 COPY 通過引號包圍來處理這些。一個當作 NULL 輸出的 NULL 值是沒有引號包圍的,而匹配 NULL 字元串的資料值是用引號包圍的。因此,使用缺省設置時,一個 NULL 是寫做一個無引號包圍的空字元串,而一個空字元串寫做雙引號包圍("")。讀取數值也遵循類似的規則。你可以使用 FORCE NOT NULL 來避免為特定欄位進行 NULL 比較。

因為對於 CSV 格式而言,反斜槓不是特殊字元,資料的結束標誌(\.)可以作為資料值出現。為了避免任何可能的歧意,一個單獨的 \. 資料值在輸出中將被自動使用引號包圍;在輸入中,如果被引號界定,那麼將不會當作資料結束標誌。如果你要載入其它程序創建的、有未用引號界定欄位的檔案,並且其中含有 \. 值,你就必須用引號進行界定。

【注意】CSV 模式下,所有字元都是有效的。一個被空白包圍的引號界定數值,或者任何非 DELIMITER 字元,都會被包含這些字元。如果你從會給 CSV 行填充空白的系統裡導入資料到定長欄位,那麼可能會導致錯誤。如果出現這種情況,你可能需要先處理一下 CSV 檔案,刪除結尾空白,然後再向 PostgreSQL 裡導入資料。

【注意】CSV 模式可以識別和產生引號包圍的回車和換行的 CVS 檔案。因此這些檔案並不像文本模式的檔案那樣嚴格地每條記錄一行。

【注意】許多程序產生奇怪的並且有時候不正確的 CVS 檔案,所以這個檔案格式更像一種慣用格式,而不是一種標準。因此你可能碰到一些不能使用這個機制輸入的檔案,而 COPY 也可能產生一些其它程序不能處理的檔案。



CSV Format

This format is used for importing and exporting the Comma Separated Value (CSV) file format used by many other programs, such as spreadsheets. Instead of the escaping used by PostgreSQL's standard text mode, it produces and recognizes the common CSV escaping mechanism.

The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character. You can also use FORCE QUOTE to force quotes when outputting non-NULL values in specific columns.

The CSV format has no standard way to distinguish a NULL value from an empty string. PostgreSQL's COPY handles this by quoting. A NULL is output as the NULL string and is not quoted, while a data value matching the NULL string is quoted. Therefore, using the default settings, a NULL is written as an unquoted empty string, while an empty string is written with double quotes (""). Reading values follows similar rules. You can use FORCE NOT NULL to prevent NULL input comparisons for specific columns.

Because backslash is not a special character in the CSV format, \., the end-of-data marker, could also appear as a data value. To avoid any misinterpretation, a \. data value appearing as a lone entry on a line is automatically quoted on output, and on input, if quoted, is not interpreted as the end-of-data marker. If you are loading a file created by another application that has a single unquoted column and might have a value of \., you might need to quote that value in the input file.

Note: In CSV mode, all characters are significant. A quoted value surrounded by white space, or any characters other than DELIMITER, will include those characters. This can cause errors if you import data from a system that pads CSV lines with white space out to some fixed width. If such a situation arises you might need to preprocess the CSV file to remove the trailing white space, before importing the data into PostgreSQL.

Note: CSV mode will both recognize and produce CSV files with quoted values containing embedded carriage returns and line feeds. Thus the files are not strictly one line per table row like text-mode files.

Note: Many programs produce strange and occasionally perverse CSV files, so the file format is more a convention than a standard. Thus you might encounter some files that cannot be imported using this mechanism, and COPY might produce files that other programs cannot process.




Comments