[FIXED] Cyrillic languages + entities

Submitted by Anonymous on Sun, 02/20/2005 - 12:06

htmlentities must be replaced with htmlspecialchars in order to be compatible with Cyrillic languages. I have already tested this and it all works well. You could also add a charset definition to the language files.
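To illustrate the difference, here is a minimal sketch rather than the actual UseBB code. It assumes the text arrives as raw windows-1251 bytes and passes 'ISO-8859-1' explicitly to mimic the old PHP default; the sample string and variable names are made up for the example.

```php
<?php
// "Привет" as raw windows-1251 bytes, followed by a tag (assumed sample input).
$text = "\xCF\xF0\xE8\xE2\xE5\xF2 <b>";

// htmlentities() reads the bytes as ISO-8859-1 here, so every high byte
// becomes a Latin-1 entity and the Cyrillic text is lost.
$mangled = htmlentities($text, ENT_QUOTES, 'ISO-8859-1');

// htmlspecialchars() only escapes &, <, > and quotes, so the Cyrillic
// bytes pass through untouched while the markup is still neutralised.
$safe = htmlspecialchars($text, ENT_QUOTES, 'ISO-8859-1');

echo $mangled, "\n";
echo bin2hex($safe), "\n";
```

The second call still protects against injected markup, which is all the board needs; the character set itself is left for the browser to interpret.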

I tried this locally but it doesn't seem to work. Some copied Russian text still appears as HTML entities. I guess this is due to the server and/or browser configuration.

Indeed, just found it. :)

This should be fixed now in CVS (see the link in my signature for the just updated tarball). Thanks for reporting and helping out with this.

I made my own patch for this. But defining $lang_encoding in the language file and showing it in the meta tag is a critical feature for a multi-language project.

That's present now in the CVS version. Just set

$lang['character_encoding'] = 'windows-1251';

in your language file.
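As a rough sketch of how such a setting might end up in the page head: only the `character_encoding` key comes from this thread. The helper name `charset_meta_tag` is hypothetical, and the iso-8859-1 fallback is assumed from the default charset mentioned further down.

```php
<?php
// Language file content; only this key is taken from the thread.
$lang = array();
$lang['character_encoding'] = 'windows-1251';

// Hypothetical helper: build the meta tag from the language setting,
// falling back to iso-8859-1 when the language file does not set one.
function charset_meta_tag(array $lang)
{
    $charset = isset($lang['character_encoding'])
        ? $lang['character_encoding']
        : 'iso-8859-1';

    return '<meta http-equiv="Content-Type" content="text/html; charset='
        . htmlspecialchars($charset) . '" />';
}

echo charset_meta_tag($lang), "\n";
```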

OK. I will also add more questions to the FAQ :)


On the one hand, I hope this means that UseBB sends

header('Content-Type: text/html; charset='.$lang['character_encoding']);

in addition to the <meta> tag declaring it, since the meta tag alone is not enough in many recent browsers.

... and on the other, I wanted to know whether it takes care of the data stored in the db.
I tried entering UTF-8 encoded posts there but couldn't get them to display correctly when viewing the topic. If the data received from the browser isn't explicitly converted to the proper encoding before it gets stored in the db, then the matter of encoding goes beyond the display of characters. There is this article on the subject to see ... (discussing ways to get the issue fixed)
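The kind of explicit conversion meant here could look roughly like this. It is only a sketch, not what UseBB does: it assumes the mbstring extension is available and that the source encoding of the submitted text is known, and `to_storage_encoding` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: convert submitted text to the encoding used for
// storage. Returns false when the input is not valid in the claimed
// source encoding, so bad bytes never reach the database.
function to_storage_encoding($input, $from, $to = 'UTF-8')
{
    if (!mb_check_encoding($input, $from)) {
        return false;
    }
    return mb_convert_encoding($input, $to, $from);
}

// "Привет" as windows-1251 bytes, converted to UTF-8 before storing.
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";
$utf8 = to_storage_encoding($cp1251, 'Windows-1251');

echo bin2hex($utf8), "\n";
```

Converting once on input means the db holds a single known encoding, and the output side only has to declare that same encoding in the header and meta tag.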

OK, I searched some more on this and found this on the Wikka site.
Now it's time for bed. It's past 2:30, and tomorrow I have an early start ...

Back up again, another brave day to waste :)

I apologise for the repeated posting, but I feel it is vital to define the terms i18n and multi-language.

Internationalization, Localization:
Often referred to as i18n and l10n respectively, these mean that the package can be displayed in many languages, so the software is usable in more than one language.

There is a small difference between i18n and multi-language, though.
With multi-language, you can have characters from many different character sets on the same page, like this. For that, I think the only way is to use UTF-8 encoding by default.

The header has already been put into the sources. I'll check the rest of your links later; however, UTF-8 is a real pain in the ass with PHP.

This bug has been fixed again in 0.4.1-CVS. It was not possible to write Russian properly (or anything else that needs another character set) with the default charset setting, iso-8859-1.