Nov 9, 2008

How to set the NLS_LANG in Unix Environments

Subject: The correct NLS_LANG setting in Unix Environments

In this Document
Goal
Solution
References
--------------------------------------------------------------------------------
Applies to: Oracle Server - Enterprise Edition - Version: 8.1.7.4 to 10.2.0.0
Information in this document applies to any platform.
GoalHow to set the NLS_LANG correctly in Unix environments:

-1- Check which locale is set and correct this if needed.
-2- Check if the choosen locale is indeed installed and install this if needed.
-3- Check if you telnet/ssh software is properly set up.
-4- Set the NLS_LANG and Test.
-5- What to do if this is not working?
-6- More in depth debugging.

Solution

-1- Check which locale is set and correct this if needed.
---------------------------------------------------------
Please note that the ouput of this command is not exactly the same on all the Unix environments.
Contact your Unix vendor if you have questions.
The example used here is to configure your unix environment so that you can use Unicode (UTF8) with sqlplus on your unix box.
To see your current setup, use the "locale" command like this:

$ locale

example of output:

LANG=fr_FR
LC_CTYPE="fr_FR.iso885915@euro"
LC_COLLATE="fr_FR.iso885915@euro"
LC_MONETARY="fr_FR.iso885915@euro"
LC_NUMERIC="fr_FR.iso885915@euro"
LC_TIME="fr_FR.iso885915@euro"
LC_MESSAGES="fr_FR.iso885915@euro"
LC_ALL=fr_FR.iso885915@euro

Most Unix versions have this as default:
$ locale

LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_MESSAGES="C"
LC_ALL=

"C" means US7ASCII , this implies that you can only display a-z, A-Z and 0-9 and nothing else.

We recommend to use UTF-8 when possible, this should look like:

$ locale

LANG=en_US
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=en_US.UTF-8

When you have choosen a value, for example "en_US.UTF-8" on Linux, you can set it like this:

$ export LC_ALL=en_US.UTF-8
or
% setenv LC_ALL en_US.UTF-8

However you should set the needed variables in the user ".profile" file and then log out an in agian and check if the "locale" command returns the wanted output.
If you have questions on how to configure a user environment on your specific Unix/Linux flavor please consult your OS vendor.

-2- Check if the choosen locale is indeed installed and install this if needed.
-------------------------------------------------------------------------------
Please *do* check that you have the defined locale really installed.
To see all installed locales issue:
$ locale -a
sample output:
$ locale -a
POSIX
common
en_US.UTF-8
C
iso_8859_1
iso_8859_15
en_CA
en_CA.ISO8859-1
en_US
en_US.ISO8859-1
en_US.ISO8859-15
en_US.ISO8859-15@euro
fr_CA
fr_CA.ISO8859-1
th
th_TH
th_TH.TIS620
ja

This lists all installed locales for the unix box, for example "fr_FR.iso885915@euro" is missing in the list, so if you want to use it then you need to install this first.
If you set the user environment to a locale that is not installed you will not get an error but it will not work.
Please also note that you need to have installed the exact locale , if you have "fr_FR.UTF8" or "UTF-8" installed but want to use "en_US.UTF8" you need to install "en_US.UTF8".
In the example you see that "en_US.UTF-8" is listed , so we can use it on this server.

-3- Check if you telnet/ssh software is properly set up.
--------------------------------------------------------
You will need *also* to check that your TELNET/SSH software is correctly configured.
It's your telnet software is responible for the conversion of the Unix locale to the client's environment (most likly a windows system).
We suggest that you try first with the free PUTTY client.
Putty download site - http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
When using the "en_US.UTF-8" like in our example on the unix side change the following setting in Putty:
Open the configuration window, navigate to "Window" then to "Translation" and set "Received data assumed to be in which character set" to "UTF-8".
Then navigate to "Window" then to "Appearance" and choose a font in the "Font used in the terminal window" that supports the languages you want to use/see.
For non-asian languages you can use the default "Courier New".
On windows you can use the windows tool "character map" to see what characters are know in a font.
Putty UTF8 how to - http://www.laug.org/2004/11/putty-utf8-howto.html
If it works with Putty but not with your telnet/ssh package then please consult the vendor of your telnet/ssh software.
If you do not use telnet but a "real" unix display please see Note 265090.1 - How to check Unix terminal Environments for the capability to display extended characters.

-4- Set the NLS_LANG and Test.
------------------------------
Once you have 1) configured the locale correctly , 2) have checked that the used locale is really installed and 3) configured your telnet/ssh client, then you can use a NLS_LANG set to LC_TYPE values.
In our "en_US.UTF-8" example this means setting NLS_LANG=AMERICAN_AMERICA.UTF8 (note the difference in notation between UTF-8 (unix) and UTF8 (oracle) )
so, login with your Unix user and then
a) check with locale if it's correctly set (assuming here en_US.UTF8)
b) set the NLS_LANG to match the locale settings
$ export NLS_LANG=AMERICAN_AMERICA.UTF8
or
% setenv NLS_LANG AMERICAN_AMERICA.UTF8
c) connect with sqlplus to your database and select some data.
For example "select UNISTR('\20AC') from dual;" will give the euro symbol when selecting using a 9i or up database that can handle the euro (UTF8,WE8MSWIN1252,...) and a correct UTF-8 or ISO8859-15 Unix enviroment.
If this works, then set the NLS_LANG also in the profile of your user.

-5- What to do if this is not working?
--------------------------------------
If you do not see the expected characters then please double check your settings.
However it's also possible that you have wrong data in your database.
The easy way to check:
Use a windows client, download and install SQL developer http://www.oracle.com/technology/products/database/project_raptor/, connect to your database and see if your data is correctly displayed in that tool.
If it is visible in SQL developer then the data is correct in the database, if not then you have wrong data in your database.

The harder way:
If for example "select ename from scott.emp where empno='7369';" is a select that returns one row of your data then do "select dump(ename,1016),ename from scott.emp where empno='7369';".
You can then look up if the codes match the characters you expect for your database characterset
( select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET'; )
in NOTE.282336 Charts of most current mono-byte Character sets or in case you have a (AL32)UTF8 database use Note:69518.1
Determining the codepoint for UTF8 characters.

If you can't figure out what's wrong log a tar, refer to this note and provide:
* the info asked in Note 226692.1 Finding out your NLS Setup.
* a spool (!not copy paste!) of the output of your version of the
"select dump(ename,1016),ename from scott.emp where empno='7369';" select .

It might also be good to follow Database Character Set Healthcheck,
same here, if you are stuck, log a tar and provide all gatherd information.
note to support: if a ct log's a TAR about this please *do* request a ARE review.

-6- More in depth debugging.
----------------------------
The steps 1-4 should be enough for 99% of the cases, the rest of the note is more in depth debugging
On some platforms, it can be usefull to use the following syntax to have more details about the codepage really used:

$ locale LC_CTYPE | head
example of output in a HP-UX env:
""
""
"iso885915"
""
example of output in a Linux env:
upper;lowe ;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
1
ISO-8859-15
70
84
1
0
1

$ locale LC_CTYPE | head
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
6
UTF-8
70
84
1
0
1

On Solaris, AIX, TRU64, this syntax doesn't give interesting complementary information.
To find more details about these settings:
on Solaris, have a look in /usr/lib/locale.
on AIX, have a look in /usr/lib/nls/README
on TRU64, have a look in /usr/lib/nls
on HP-UX, have a look in /usr/lib/nls/config
on Linux, have a look in /usr/share/locale/locale.alias

How to check the codepoints managed by the O.S.:
To know which code point is generated for a character in a Unix Environment,
you can use the "od" command like this (examples with a iso-8859-1 locale):

$ od -xc
é^D^D
0000000 00e9
351 \0
0000001

as you can see the hexa-decimal code point e9 is corresponding to the "é" (lower e acute)
351 is the corresponding Octal value (Octal is the native mode of the od command).
You can also check the character corresponding to a code point using the "echo" command like this:
for Solaris, AIX, HP-UX, TRU64:
$echo '\0351'
é

for Linux:
$echo -e '\0351'
é

As you can see, echo uses the Octal value, so you need to convert in octal the value you want to check.

ReferencesNote 158577.1 - NLS_LANG Explained (How does Client-Server Character Conversion Work?)
Note 179133.1 - The correct NLS_LANG in a Windows Environment
Note 265090.1 - How to check Unix terminal Environments for the capability to display extended characters.
http://www.xfree86.org/4.2.0/xfontsel.1.html - to see which fonts are known in the xwin env
http://www.xfree86.org/4.2.0/xlsfonts.1.html - to see what fonts are installed on the system
http://publib.boulder.ibm.com/infocenter/pseries/index.jsp?topic=/com.ibm.aix.doc/aixprggd/nlsgdrf/locale_env.htm - IBM locale settings reference
http://docs.hp.com/en/5187-3605/ch02.html - HP Encoding Characters Reference
http://docs.hp.com/en/UXL10N-90302/UXL10N-90302.pdf - HP Language configuration reference
http://developers.sun.com/dev/gadc/faq/locale.html - Sun Solaris locale reference
http://www.opengroup.org/onlinepubs/007908799/xbd/locale.html - UNIX Open Group locale specifications reference
Note 119119.1 - AL32UTF8 / UTF8 (unicode) Database Character Set Implications



1 comment:

  1. Rogério Aguilar SilvaFebruary 4, 2021 at 12:40 PM

    That's pretty accurate, hope this work on my project!
    Thanks for posting it!

    ReplyDelete