
Miscellaneous notes on CSV encodings and the BOM

CSV variants

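Each file was converted to the target encoding and saved with Notepad++ on Windows, then inspected with hexdump -C. The first three rows of each dump are shown.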
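If hexdump isn't installed, the same 16-bytes-per-row layout (offset, hex bytes in two groups of 8, printable ASCII) can be produced with a few lines of Python. This is a minimal sketch: hexdump_c is a hypothetical helper name, and unlike the real hexdump -C it does not collapse repeated rows into * or print a trailing length line.

```python
def hexdump_c(data: bytes) -> list[str]:
    """Format bytes in the style of `hexdump -C`: offset, 8+8 hex bytes, |ASCII|."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        hex_bytes = [f"{b:02x}" for b in chunk]
        left = " ".join(hex_bytes[:8])
        right = " ".join(hex_bytes[8:])
        # Non-printable bytes (BOM bytes, the 00 halves of UTF-16) are shown as "."
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad both hex groups to 23 chars so the ASCII column stays aligned on short rows
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    return lines


if __name__ == "__main__":
    sample = b'"dataID","eventID","eventDate","'
    for line in hexdump_c(sample):
        print(line)
```

On the sample bytes this prints the same two rows as the first dump below; to inspect a real file, pass it the result of open(path, "rb").read().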

Field names wrapped in double quotes

Encoding of d: 64

00000000  22 64 61 74 61 49 44 22  2c 22 65 76 65 6e 74 49  |"dataID","eventI|
00000010  44 22 2c 22 65 76 65 6e  74 44 61 74 65 22 2c 22  |D","eventDate","|
00000020  73 65 61 73 6f 6e 22 2c  22 79 65 61 72 22 2c 22  |season","year","|
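A quoted header like this one can be reproduced with Python's csv module by asking it to quote every field (csv.QUOTE_ALL). A minimal sketch, assuming only the first five field names visible in the dump (the real file has more columns) and UTF-8 without a BOM:

```python
import csv
import io

# Assumed: only the field names visible in the hexdump above; the real file has more.
FIELDS = ["dataID", "eventID", "eventDate", "season", "year"]

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(FIELDS)  # quote every field
raw = buf.getvalue().encode("utf-8")  # plain UTF-8, no BOM

print(raw.decode("utf-8"), end="")  # "dataID","eventID","eventDate","season","year"
print(raw[:16].hex(" "))            # 22 64 61 74 61 49 44 22 2c 22 65 76 65 6e 74 49
```

Note that csv.writer ends rows with \r\n by default, which is not visible in the first rows of the dump.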

Field names without double quotes, no BOM header

Encoding of d: 64

00000000  64 61 74 61 49 44 2c 65  76 65 6e 74 49 44 2c 65  |dataID,eventID,e|
00000010  76 65 6e 74 44 61 74 65  2c 73 65 61 73 6f 6e 2c  |ventDate,season,|
00000020  79 65 61 72 2c 72 65 67  69 6f 6e 2c 6c 6f 63 61  |year,region,loca|
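The unquoted form is what Python's csv.writer produces by default: csv.QUOTE_MINIMAL only quotes fields that contain the delimiter, the quote character, or a line-break character. A sketch, again assuming just the field names visible in the dump; it also checks that a CSV reader parses the quoted and unquoted headers to the same names, so the quotes by themselves don't change the data.

```python
import csv
import io

# Assumed: only the field names visible in the hexdump above.
FIELDS = ["dataID", "eventID", "eventDate", "season", "year", "region"]

buf = io.StringIO()
csv.writer(buf).writerow(FIELDS)  # default quoting is csv.QUOTE_MINIMAL
raw = buf.getvalue().encode("utf-8")
print(raw[:16].hex(" "))  # 64 61 74 61 49 44 2c 65 76 65 6e 74 49 44 2c 65

# Quoted and unquoted headers parse to the same field names.
quoted = next(csv.reader(io.StringIO('"dataID","eventID"\r\n')))
unquoted = next(csv.reader(io.StringIO("dataID,eventID\r\n")))
print(quoted == unquoted == ["dataID", "eventID"])  # True
```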

Converted to UTF-8 with BOM

Starts with: EF BB BF

Encoding of d: 64 (same as ASCII)

00000000  ef bb bf 22 64 61 74 61  49 44 22 2c 22 65 76 65  |..."dataID","eve|
00000010  6e 74 49 44 22 2c 22 65  76 65 6e 74 44 61 74 65  |ntID","eventDate|
00000020  22 2c 22 73 65 61 73 6f  6e 22 2c 22 79 65 61 72  |","season","year|
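The three leading bytes are simply the character U+FEFF encoded in UTF-8. In Python the utf-8-sig codec writes it on encode and strips it on decode. The sketch below (header text assumed from the dump above) also shows a common pitfall: decoding such a file as plain utf-8 leaves U+FEFF in front of the first field, and because the opening quote is then no longer the first character of that field, the csv module treats it as unquoted and keeps the quotes in the name.

```python
import codecs
import csv
import io

# Assumed header text, taken from the dump above.
text = '"dataID","eventID","eventDate"\r\n'

raw = text.encode("utf-8-sig")  # utf-8-sig prepends the BOM on encode
assert raw[:3] == codecs.BOM_UTF8 == "\ufeff".encode("utf-8") == b"\xef\xbb\xbf"
print(raw[:8].hex(" "))  # ef bb bf 22 64 61 74 61

# Pitfall: plain "utf-8" keeps U+FEFF, so the first field does not start with a quote
# and ends up containing the BOM plus the literal double quotes.
wrong = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
print(repr(wrong[0]))

# "utf-8-sig" strips a leading BOM (and decodes BOM-less UTF-8 unchanged).
right = next(csv.reader(io.StringIO(raw.decode("utf-8-sig"))))
print(right)  # ['dataID', 'eventID', 'eventDate']
```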

Converted to UTF-16 LE with BOM (Little Endian)

Starts with: FF FE (11111111 11111110)

Encoding of d: 64 00

00000000  ff fe 22 00 64 00 61 00  74 00 61 00 49 00 44 00  |..".d.a.t.a.I.D.|
00000010  22 00 2c 00 22 00 65 00  76 00 65 00 6e 00 74 00  |".,.".e.v.e.n.t.|
00000020  49 00 44 00 22 00 2c 00  22 00 65 00 76 00 65 00  |I.D.".,.".e.v.e.|
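FF FE is again U+FEFF, this time written as a little-endian 16-bit unit, and each ASCII character becomes its ASCII byte followed by 00. Python's utf-16-le codec never writes a BOM, so this sketch prepends codecs.BOM_UTF16_LE explicitly; the generic utf-16 codec reads the BOM back and picks the byte order from it. (The header fragment is assumed from the dump above.)

```python
import codecs

text = '"dataID","eventID"'  # assumed header fragment from the dump above

# "utf-16-le" itself never writes a BOM, so prepend FF FE explicitly.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")
assert codecs.BOM_UTF16_LE == "\ufeff".encode("utf-16-le") == b"\xff\xfe"
print(raw[:16].hex(" "))  # ff fe 22 00 64 00 61 00 74 00 61 00 49 00 44 00

print("d".encode("utf-16-le").hex(" "))  # 64 00

# The generic "utf-16" codec consumes the BOM and uses it to pick the byte order.
assert raw.decode("utf-16") == text
```

The plain "utf-16" codec also writes a BOM when encoding, but in the machine's native byte order, so naming the byte order explicitly as above is the more predictable choice.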

Converted to UTF-16 BE with BOM (Big Endian)

Starts with: FE FF (11111110 11111111)

Encoding of d: 00 64

00000000  fe ff 00 22 00 64 00 61  00 74 00 61 00 49 00 44  |...".d.a.t.a.I.D|
00000010  00 22 00 2c 00 22 00 65  00 76 00 65 00 6e 00 74  |.".,.".e.v.e.n.t|
00000020  00 49 00 44 00 22 00 2c  00 22 00 65 00 76 00 65  |.I.D.".,.".e.v.e|
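Big-endian is the same text with each 16-bit unit's bytes swapped: FE FF is U+FEFF in big-endian order, and each ASCII character becomes 00 followed by its ASCII byte. A sketch mirroring the little-endian one, with the header fragment again assumed from the dump:

```python
import codecs

text = '"dataID","eventID"'  # assumed header fragment from the dump above

raw = codecs.BOM_UTF16_BE + text.encode("utf-16-be")  # FE FF, then big-endian units
assert codecs.BOM_UTF16_BE == "\ufeff".encode("utf-16-be") == b"\xfe\xff"
print(raw[:16].hex(" "))  # fe ff 00 22 00 64 00 61 00 74 00 61 00 49 00 44

print("d".encode("utf-16-be").hex(" "))  # 00 64

# Same decoding path as the LE file: "utf-16" reads the BOM first.
assert raw.decode("utf-16") == text
```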

Other notes:

  • CSV files exported from LibreOffice and Google Spreadsheet with default settings have no double quotes around the field names and no BOM header.
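To tell which of the variants above a given file is, one rough approach is to look for one of the three BOMs, decode accordingly, and check whether the header starts with a double quote. sniff_csv_header is a hypothetical helper, not part of any library. It falls back to UTF-8 when there is no BOM, which is an assumption (a BOM-less file could also be in a legacy code page), and it does not handle UTF-32, whose little-endian BOM FF FE 00 00 begins with the UTF-16 LE one.

```python
import codecs

# BOMs covered in this note, mapped to a Python codec that strips them when decoding.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_csv_header(raw: bytes) -> tuple[str, bool]:
    """Return (codec to decode with, whether the header starts with a double quote).

    raw should be the whole file, or a prefix that ends on a character boundary.
    """
    encoding = "utf-8"  # assumption: no BOM means plain UTF-8
    for bom, codec in BOMS:
        if raw.startswith(bom):
            encoding = codec
            break
    header = raw.decode(encoding)  # utf-8-sig / utf-16 drop the BOM here
    return encoding, header.startswith('"')


print(sniff_csv_header(b"dataID,eventID\r\n"))                  # ('utf-8', False)
print(sniff_csv_header(b'\xef\xbb\xbf"dataID","eventID"\r\n'))  # ('utf-8-sig', True)
print(sniff_csv_header(b'\xff\xfe"\x00d\x00'))                  # ('utf-16', True)
```

Going by the observation above, default LibreOffice and Google Spreadsheet exports would take the first path: no BOM and an unquoted header.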

(Exports from the various Excel versions will be added when I have time.)