What is the cause for the degradation of environment?
Capitalism, corruption, consuming society? - OVERPOPULATION!
Please, save the Planet - kill yourself...
Wednesday, March 28, 2012
Sunday, March 11, 2012
QGIS and GDAL>=1.9 Encoding Issue: a Workaround
The cause of the issue is described here (Rus). Briefly: GDAL>=1.9 attempts to re-encode the .dbf file to UTF-8 based on the LDID (Language Driver ID) written in the .dbf header. Unfortunately, the LDID is usually missing; QGIS in particular does not write it to the .dbf files it creates. When the LDID is missing, GDAL>=1.9 assumes that the .dbf file is encoded in ISO8859_1 (Latin-1), which makes non-Latin characters unreadable.
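To make the mechanism concrete, here is a minimal Python sketch. It assumes the usual dBASE layout, where the LDID is the single byte at offset 29 of the 32-byte file header and 0 means "not set". The function names `read_ldid` and `misread_as_latin1` are made up for this illustration and are not part of GDAL or QGIS.

```python
def read_ldid(dbf_path):
    """Return the Language Driver ID byte of a .dbf file (0 means not set).

    Assumption: the LDID sits at byte offset 29 of the 32-byte dBASE header.
    """
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a .dbf header")
    return header[29]


def misread_as_latin1(text, real_encoding):
    """Show what text stored in real_encoding looks like when read as Latin-1."""
    return text.encode(real_encoding).decode("latin-1")


if __name__ == "__main__":
    # Cyrillic text stored as Windows-1251 but interpreted as Latin-1.
    print(misread_as_latin1("Москва", "cp1251"))  # prints "Ìîñêâà"
```

If `read_ldid` returns 0 for your file, you are most likely in the situation described above.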
The workaround I'm currently using is to create an additional .cpg file, placed next to the .dbf file and sharing its base name, that contains the ID of the encoding used. For example, if the encoding is Windows-1251, the .cpg file contains the following record: "1251" (without quotes). When the .cpg file is present, GDAL>=1.9 + QGIS works just fine.
UPD: on some operating systems you will need to use the ID from the Additional ID column instead of the one from the Encoding ID column.
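The .cpg workaround is easy to script when you have many layers. Below is a small Python sketch; `write_cpg` is a hypothetical helper name, and the file name `roads.dbf` in the usage note is only an example.

```python
from pathlib import Path


def write_cpg(layer_path, encoding_id="1251"):
    """Create <basename>.cpg next to a shapefile component and return its path.

    encoding_id is written as-is, so it can be a value from either the
    Encoding ID column ("1251") or the Additional ID column ("Windows-1251").
    """
    cpg_path = Path(layer_path).with_suffix(".cpg")
    cpg_path.write_text(encoding_id, encoding="ascii")
    return cpg_path
```

For example, `write_cpg("roads.dbf", "1251")` creates `roads.cpg` containing `1251`. If that does not help on your system, try `write_cpg("roads.dbf", "Windows-1251")`.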
UPD3: There is another workaround. You can open the .dbf file in LibreOffice Calc (or OpenOffice Calc), specifying the required encoding, and save it from there. This writes the necessary header to the .dbf file, and QGIS will then display the attributes correctly. Note that this also converts the field names to upper case.
UPD4: a plugin that fixes the encoding is available.
Here is a table of the encoding IDs (taken from here):
Encoding ID | Encoding name | Additional ID | Other names |
1252 | Western | iso-8859-1 (if bytes 128-159 are used, use "Windows-1252") | iso8859-1, iso_8859-1, iso-8859-1, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO_646, irv:1991, ISO646-US, us, IBM367, cp367, csASCII, latin1, iso_8859-1:1987, iso-ir-100, ibm819, cp819, Windows-1252 |
20105 | us-ascii | us-ascii, ascii | |
28592 | Central European (ISO) | iso-8859-2 | iso8859-2, iso-8859-2, iso_8859-2, latin2, iso_8859-2:1987, iso-ir-101, l2, csISOLatin2 |
1250 | Central European (Windows) | Windows-1250 | Windows-1250, x-cp1250 |
1251 | Cyrillic (Windows) | Windows-1251 | Windows-1251, x-cp1251 |
1253 | Greek (Windows) | Windows-1253 | Windows-1253 |
1254 | Turkish (Windows) | Windows-1254 | Windows-1254 |
932 | Japanese (Shift-JIS) | shift_jis | shift_jis, x-sjis, ms_Kanji, csShiftJIS, x-ms-cp932 |
51932 | Japanese (EUC) | x-euc-jp | Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese, x-euc-jp, x-euc |
50220 | Japanese (JIS) | iso-2022-jp | csISO2022JP, iso-2022-jp |
1257 | Baltic (Windows) | Windows-1257 | windows-1257 |
950 | Traditional Chinese (BIG5) | big5 | big5, csbig5, x-x-big5 |
936 | Simplified Chinese (GB2312) | gb2312 | GB_2312-80, iso-ir-58, chinese, csISO58GB231280, csGB2312, gb2312 |
20866 | Cyrillic (KOI8-R) | koi8-r | csKOI8R, koi8-r |
949 | Korean (KSC5601) | ks_c_5601 | ks_c_5601, ks_c_5601-1987, korean, csKSC56011987 |
1255 (logical) | Hebrew (ISO-logical) | Windows-1255 | iso-8859-8i |
1255 (visual) | Hebrew (ISO-Visual) | iso-8859-8 | ISO-8859-8 Visual, ISO-8859-8 , ISO_8859-8, visual |
862 | Hebrew (DOS) | dos-862 | dos-862 |
1256 | Arabic (Windows) | Windows-1256 | Windows-1256 |
720 | Arabic (DOS) | dos-720 | dos-720 |
874 | Thai | Windows-874 | Windows-874 |
1258 | Vietnamese | Windows-1258 | Windows-1258 |
65001 | Unicode UTF-8 | UTF-8 | UTF-8, unicode-1-1-utf-8, unicode-2-0-utf-8 |
65000 | Unicode UTF-7 | UNICODE-1-1-UTF-7 | utf-7, UNICODE-1-1-UTF-7, csUnicode11UTF7, utf-7 |
50225 | Korean (ISO) | ISO-2022-KR | ISO-2022-KR, csISO2022KR |
52936 | Simplified Chinese (HZ) | HZ-GB-2312 | HZ-GB-2312 |
28594 | Baltic (ISO) | iso-8859-4 | ISO_8859-4:1988, iso-ir-110, ISO_8859-4, ISO-8859-4, latin4, l4, csISOLatin4 |
28595 | Cyrillic (ISO) | iso_8859-5 | ISO_8859-5:1988, iso-ir-144, ISO_8859-5, ISO-8859-5, cyrillic, csISOLatinCyrillic, csISOLatin5 |
28597 | Greek (ISO) | iso-8859-7 | ISO_8859-7:1987, iso-ir-126, ISO_8859-7, ISO-8859-7, ELOT_928, ECMA-118, greek, greek8, csISOLatinGreek |
28599 | Turkish (ISO) | iso-8859-9 | ISO_8859-9:1989, iso-ir-148, ISO_8859-9, ISO-8859-9, latin5, l5, csISOLatin5 |
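Before writing an ID to the .cpg file, you can check that it really decodes your data. The sketch below copies a few rows of the table above into a dictionary and resolves the Additional ID through Python's standard `codecs` module. The names `ADDITIONAL_ID` and `python_codec` are made up for this example, and the codec names Python reports are its own, not something GDAL or QGIS uses.

```python
import codecs

# A few rows from the table above: Encoding ID -> Additional ID.
ADDITIONAL_ID = {
    "1250": "Windows-1250",
    "1251": "Windows-1251",
    "1253": "Windows-1253",
    "20866": "koi8-r",
    "65001": "UTF-8",
}


def python_codec(encoding_id):
    """Return Python's codec name for a table row, looked up by its Additional ID."""
    return codecs.lookup(ADDITIONAL_ID[encoding_id]).name


if __name__ == "__main__":
    # Bytes as they would be stored in a Windows-1251 .dbf file.
    raw = "Москва".encode("cp1251")
    print(raw.decode(python_codec("1251")))  # prints "Москва"
```

If the decoded sample is readable, that ID is a reasonable candidate for the .cpg file.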