The cause of the issue is described here (Rus). Briefly: GDAL>=1.9 attempts to re-encode the .dbf file to UTF-8 based on the LDID (Language Driver ID) written in the .dbf header. Unfortunately, the LDID is usually missing; QGIS in particular does not write it to the .dbf files it creates. When the LDID is missing, GDAL>=1.9 assumes that the .dbf file is encoded in ISO8859_1 (Latin-1), which makes non-Latin characters unreadable.
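To see what this looks like in practice, here is a minimal Python sketch. It assumes the common dBASE header layout, in which the LDID is the single byte at offset 29 of the 32-byte header and a value of 0 means that no language driver is set. The function name `read_ldid` and the sample word are made up for illustration. The second part shows how Windows-1251 bytes turn into garbage when they are decoded as Latin-1.

```python
def read_ldid(dbf_path):
    """Return the Language Driver ID byte of a .dbf file.

    Assumes the usual dBASE layout: a 32-byte header with the LDID at offset 29.
    """
    with open(dbf_path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a .dbf header")
    return header[29]


# Cyrillic text stored as Windows-1251 bytes, as it would sit in the .dbf file
raw = "Привет".encode("cp1251")

# Decoding the same bytes as Latin-1 (the fallback described above) gives mojibake
garbled = raw.decode("latin-1")

# Decoding with the right encoding restores the text
restored = raw.decode("cp1251")
```

If `read_ldid` returns 0 for your file, that would be consistent with the missing-LDID case described above.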
The workaround I'm currently using is to create an additional .cpg file that contains the ID of the encoding used. For example, if the encoding is Windows-1251, the .cpg file contains the single record "1251" (without quotes). When the .cpg file is present, GDAL>=1.9 + QGIS works just fine.
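The .cpg file can be created in any text editor, but if you have many shapefiles a small script may be handier. A minimal Python sketch, assuming the .cpg file must sit next to the shapefile and share its base name; the function name `write_cpg` and the file name `roads.shp` are hypothetical:

```python
from pathlib import Path


def write_cpg(shapefile_path, encoding_id="1251"):
    """Create a .cpg sidecar file next to the shapefile.

    The file contains only the encoding ID, e.g. "1251" for Windows-1251.
    """
    cpg_path = Path(shapefile_path).with_suffix(".cpg")
    # The ID itself is plain ASCII, so the .cpg file needs no special encoding
    cpg_path.write_text(encoding_id, encoding="ascii")
    return cpg_path
```

For example, `write_cpg("roads.shp", "1251")` creates `roads.cpg` containing `1251`. If the numeric ID does not help on your system, you can pass a name such as `"Windows-1251"` instead.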
UPD: on some operating systems you will need to use the ID from the Additional ID column instead of the one from the Encoding ID column.
UPD3: There is another workaround. You can open the .dbf file in LibreOffice Calc (or OpenOffice Calc), specify the required encoding, and save it from there. This writes the necessary header to the .dbf file, and QGIS will then display the attributes correctly. Note that this also converts the field names to upper case.
UPD4: there is a plugin available for fixing the encoding.
Here is a table of the encoding IDs (taken from here):
Encoding ID | Encoding name | Additional ID | Other names |
1252 | Western | iso-8859-1 (if bytes 128-159 are used, use "Windows-1252") | iso8859-1, iso_8859-1, iso-8859-1, ANSI_X3.4-1968, iso-ir-6, ANSI_X3.4-1986, ISO_646, irv:1991, ISO646-US, us, IBM367, cp367, csASCII, latin1, iso_8859-1:1987, iso-ir-100, ibm819, cp819, Windows-1252 |
20105 | us-ascii | us-ascii, ascii | |
28592 | Central European (ISO) | iso-8859-2 | iso8859-2, iso-8859-2, iso_8859-2, latin2, iso_8859-2:1987, iso-ir-101, l2, csISOLatin2 |
1250 | Central European (Windows) | Windows-1250 | Windows-1250, x-cp1250 |
1251 | Cyrillic (Windows) | Windows-1251 | Windows-1251, x-cp1251 |
1253 | Greek (Windows) | Windows-1253 | Windows-1253 |
1254 | Turkish (Windows) | Windows-1254 | Windows-1254 |
932 | Japanese (Shift-JIS) | shift_jis | shift_jis, x-sjis, ms_Kanji, csShiftJIS, x-ms-cp932 |
51932 | Japanese (EUC) | x-euc-jp | Extended_UNIX_Code_Packed_Format_for_Japanese, csEUCPkdFmtJapanese, x-euc-jp, x-euc |
50220 | Japanese (JIS) | iso-2022-jp | csISO2022JP, iso-2022-jp |
1257 | Baltic (Windows) | Windows-1257 | windows-1257 |
950 | Traditional Chinese (BIG5) | big5 | big5, csbig5, x-x-big5 |
936 | Simplified Chinese (GB2312) | gb2312 | GB_2312-80, iso-ir-58, chinese, csISO58GB231280, csGB2312, gb2312 |
20866 | Cyrillic (KOI8-R) | koi8-r | csKOI8R, koi8-r |
949 | Korean (KSC5601) | ks_c_5601 | ks_c_5601, ks_c_5601-1987, korean, csKSC56011987 |
1255 (logical) | Hebrew (ISO-logical) | Windows-1255 | iso-8859-8i |
1255 (visual) | Hebrew (ISO-Visual) | iso-8859-8 | ISO-8859-8 Visual, ISO-8859-8 , ISO_8859-8, visual |
862 | Hebrew (DOS) | dos-862 | dos-862 |
1256 | Arabic (Windows) | Windows-1256 | Windows-1256 |
720 | Arabic (DOS) | dos-720 | dos-720 |
874 | Thai | Windows-874 | Windows-874 |
1258 | Vietnamese | Windows-1258 | Windows-1258 |
65001 | Unicode UTF-8 | UTF-8 | UTF-8, unicode-1-1-utf-8, unicode-2-0-utf-8 |
65000 | Unicode UTF-7 | UNICODE-1-1-UTF-7 | utf-7, UNICODE-1-1-UTF-7, csUnicode11UTF7, utf-7 |
50225 | Korean (ISO) | ISO-2022-KR | ISO-2022-KR, csISO2022KR |
52936 | Simplified Chinese (HZ) | HZ-GB-2312 | HZ-GB-2312 |
28594 | Baltic (ISO) | iso-8859-4 | ISO_8859-4:1988, iso-ir-110, ISO_8859-4, ISO-8859-4, latin4, l4, csISOLatin4 |
28595 | Cyrillic (ISO) | iso_8859-5 | ISO_8859-5:1988, iso-ir-144, ISO_8859-5, ISO-8859-5, cyrillic, csISOLatinCyrillic, csISOLatin5 |
28597 | Greek (ISO) | iso-8859-7 | ISO_8859-7:1987, iso-ir-126, ISO_8859-7, ISO-8859-7, ELOT_928, ECMA-118, greek, greek8, csISOLatinGreek |
28599 | Turkish (ISO) | iso-8859-9 | ISO_8859-9:1989, iso-ir-148, ISO_8859-9, ISO-8859-9, latin5, l5, csISOLatin5 |
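If you are not sure which row of the table matches your data, one option is to decode a sample of the raw attribute bytes with several candidate encodings and see which result is readable. A minimal Python sketch; note that `cp1251`, `koi8_r` and `iso8859_5` are Python codec names, not GDAL or QGIS identifiers, and that the sample bytes are generated here purely for illustration:

```python
def decode_candidates(raw, candidates=("cp1251", "koi8_r", "iso8859_5")):
    """Decode the same bytes with each candidate encoding.

    Returns a dict mapping codec name to decoded text, or None on failure.
    """
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Illustrative sample: a Cyrillic word stored as Windows-1251 bytes
sample = "Привет".encode("cp1251")
results = decode_candidates(sample)
```

Here only `results["cp1251"]` gives back the original word; the other candidates decode without an error but produce different, unreadable text, so the check has to be done by eye.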
Encoding ID = 1251 does not work in QGIS! :(
It seems that you are working on Windows, because I've heard about that problem only from Windows users. On Linux everything works fine.
My final guess is that the encoding of the *.cpg file itself should be UTF-8.
Woooow !! :o) The workaround with the .cpg file saved my day ! :o)
Thank you very much!
You are welcome!
That Russian QGIS really saved me, thank you :)
I created a .cpg file (just in the editor, naming it .cpg) and entered 1251 for Cyrillic, but it didn't change anything. When I entered UTF-8, there was finally at least a change in the attribute table. After that I could enter 1251, which led to Cyrillic being displayed correctly! Maybe the .cpg had to be recognized as such first or something...
Have you tried "Windows-1251" or "cp1251" instead of just "1251"? The first UPD in this post suggests trying either the Encoding ID or the Additional ID.
How do I convert a font to cp1251 encoding and send it to a fiscal printer?
I have no idea what you are talking about. It seems that you found the wrong place to ask.