Unicode


Most Java program text consists of ASCII characters, but any Unicode character can be used in comments and in character and string literals, and Unicode letters can be used in identifier names. For example, π (the Greek small letter pi) is a valid Java identifier:

Example Code section 3.100: Pi.
double π = Math.PI;

and in a string literal:

Example Code section 3.101: Pi literal.
String pi = "π";

Unicode escape sequences

Unicode characters can also be expressed through Unicode escape sequences. The compiler translates these before any other lexical processing, so they may appear anywhere in a Java source file (including inside identifiers, comments, and string literals).

Unicode escape sequences consist of

  1. a backslash '\' (ASCII character 92, hex 0x5c),
  2. a 'u' (ASCII character 117, hex 0x75),
  3. optionally one or more additional 'u' characters, and
  4. four hexadecimal digits (the characters '0' through '9' or 'a' through 'f' or 'A' through 'F').
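
The structure listed above can be checked with a small program. The following is a minimal sketch (the class and method names are illustrative and not part of the original text) that compares several spellings of the same escape; it assumes nothing beyond the standard language rules described in the list.

```java
// Minimal sketch: different spellings of the same Unicode escape.
// Class and method names are illustrative only.
class EscapeForms {
    static char plain()     { return 'a'; }
    static char oneU()      { return '\u0061'; }   // backslash, one u, four hex digits
    static char severalUs() { return '\uuu0061'; } // additional u characters are allowed
    static char lowerHex()  { return '\u00e9'; }   // hex digits in lower case
    static char upperHex()  { return '\u00E9'; }   // same digits in upper case

    public static void main(String[] args) {
        System.out.println(plain() == oneU());
        System.out.println(oneU() == severalUs());
        System.out.println(lowerHex() == upperHex());
    }
}
```

All three comparisons should hold, since neither the number of 'u' characters nor the case of the hexadecimal digits changes the character that the escape denotes.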

Each such sequence represents one UTF-16 code unit. For example, 'a' is equivalent to '\u0061'. A single escape therefore cannot express a character beyond U+FFFF; such a character has to be written as a surrogate pair, that is, as two consecutive escapes.[1]
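
As an illustration of the surrogate-pair case, the sketch below writes one character beyond U+FFFF as two escapes. The code point U+1F600 and the class name are chosen only as examples; they do not come from the original text.

```java
// Minimal sketch: a character beyond U+FFFF written as two Unicode escapes.
// U+1F600 is used only as an example code point.
class SurrogatePairDemo {
    // High surrogate D83D followed by low surrogate DE00 encodes U+1F600.
    static final String FACE = "\uD83D\uDE00";

    static int codeUnits()      { return FACE.length(); }
    static int characters()     { return FACE.codePointCount(0, FACE.length()); }
    static int firstCodePoint() { return FACE.codePointAt(0); }

    public static void main(String[] args) {
        System.out.println(codeUnits());   // UTF-16 code units in the string
        System.out.println(characters());  // Unicode characters in the string
        System.out.println(Integer.toHexString(firstCodePoint()));
    }
}
```

The string holds two `char` values but only one Unicode character, which is why `length()` and `codePointCount` disagree for text outside the Basic Multilingual Plane.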

Every character in a program may be written as a Unicode escape, but such programs are neither compact nor readable to anyone but the Java compiler.

A full list of the characters is published in the Unicode Consortium's code charts.

π may also be represented in Java as the Unicode escape sequence \u03C0. Thus, the following is a valid, but not very readable, declaration and assignment:

Example Code section 3.102: Unicode escape sequences for Pi.
double \u03C0 = Math.PI;

The following demonstrates the use of Unicode escape sequences in other Java syntax:

Example Code section 3.103: Unicode escape sequences in a string literal.
// Declare Strings pi and quote which contain \u03C0 and \u0027 respectively:
String pi = "\u03C0";
String quote = "\u0027";
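
To confirm what these escapes produce, the declarations can be wrapped in a small class and inspected at run time. This is only a sketch; the class name `PiEscapes` and its helper methods are hypothetical.

```java
// Minimal sketch: Unicode escapes in an identifier and in string literals.
// Class and method names are illustrative only.
class PiEscapes {
    // The field name is the single letter U+03C0, written as an escape.
    static final double \u03C0 = Math.PI;

    static double piValue()     { return \u03C0; }
    static String piString()    { return "\u03C0"; }
    static String quoteString() { return "\u0027"; }

    public static void main(String[] args) {
        System.out.println(piValue() == Math.PI);
        System.out.println(piString().length());
        System.out.println(Integer.toHexString(piString().charAt(0)));
        System.out.println(quoteString().equals("'"));
    }
}
```

Each string contains exactly one character: the escape is replaced by the character it names before the literal is read.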

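Note that a Unicode escape sequence functions just like the character it represents in the source code. For example, \u0022 (double quote, ") must be escaped inside a string literal just like " itself. The escaping backslash cannot simply be typed in front of it: in \\u0022 the second backslash follows another backslash, so it does not begin a Unicode escape.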

Example Code section 3.104: Double quote.
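// Declare Strings doubleQuote1 and doubleQuote2 which both contain " (double quote):
String doubleQuote1 = "\"";
String doubleQuote2 = "\u005c\u0022"; // \u005c is the backslash, so this reads as "\"".
// "\u0022" alone doesn't work since """ doesn't work,
// and "\\u0022" is not an escape at all: it holds a backslash followed by u0022.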
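
The behaviour of these forms can be inspected by looking at the resulting strings. The sketch below (with illustrative class and method names) relies on two language rules: a double quote needs no escaping inside a char literal, and a backslash that directly follows another backslash is not eligible to start a Unicode escape, so `"\\u0022"` holds six characters rather than one.

```java
// Minimal sketch: how a double quote can (or cannot) end up in a String.
// Class and method names are illustrative only.
class DoubleQuoteDemo {
    // Ordinary escape sequence for a double quote.
    static String escaped() { return "\""; }

    // In a char literal the double quote needs no escaping, so the Unicode escape is fine.
    static String viaCharLiteral() { return String.valueOf('\u0022'); }

    // Escaped backslash followed by the letters u0022: no Unicode escape takes place.
    static String doubledBackslash() { return "\\u0022"; }

    public static void main(String[] args) {
        System.out.println(escaped().equals(viaCharLiteral()));
        System.out.println(doubledBackslash().length());
        System.out.println(doubledBackslash());
    }
}
```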

International language support

The language distinguishes between bytes and characters. A char is a 16-bit value: originally a UCS-2 character and, as of J2SE 5.0, a UTF-16 code unit, with characters beyond U+FFFF represented by surrogate pairs. Java program source may therefore contain any Unicode character.
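
The difference between bytes and characters shows up as soon as a string is encoded. The following sketch compares the number of `char` values in a string with the number of bytes in two encodings; the class name, the helper methods and the sample strings are chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch: char count versus encoded byte count.
// Class name, method names and sample strings are illustrative only.
class BytesVersusChars {
    static final String PI = "\u03C0";         // one character in the BMP
    static final String FACE = "\uD83D\uDE00"; // one character beyond U+FFFF

    static int chars(String s)      { return s.length(); }
    static int utf8Bytes(String s)  { return s.getBytes(StandardCharsets.UTF_8).length; }
    static int utf16Bytes(String s) { return s.getBytes(StandardCharsets.UTF_16BE).length; }

    public static void main(String[] args) {
        for (String s : new String[] { "a", PI, FACE }) {
            System.out.println(chars(s) + " chars, "
                    + utf8Bytes(s) + " UTF-8 bytes, "
                    + utf16Bytes(s) + " UTF-16 bytes");
        }
    }
}
```

The byte count depends on the chosen encoding, while `length()` always counts UTF-16 code units, so neither number on its own says how many Unicode characters a string holds.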

The following is thus perfectly valid Java code; it contains Chinese characters in the class and variable names as well as in a string literal:

Code listing 3.50: 哈嘍世界.java
public class 哈嘍世界 {
    private String 文本 = "哈嘍世界";
}
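
Whether a particular character may appear in an identifier can also be checked at run time with `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`. The sketch below tests a few code points, written in hexadecimal so that the example does not depend on the source file encoding; the two CJK ideographs U+4E16 and U+754C and the class name are assumptions chosen for illustration.

```java
// Minimal sketch: asking the runtime which characters may form identifiers.
// The sample code points and the class name are illustrative only.
class IdentifierCheck {
    static boolean canStart(int codePoint)    { return Character.isJavaIdentifierStart(codePoint); }
    static boolean canContinue(int codePoint) { return Character.isJavaIdentifierPart(codePoint); }

    public static void main(String[] args) {
        System.out.println(canStart(0x4E16));    // a CJK ideograph
        System.out.println(canContinue(0x754C)); // another CJK ideograph
        System.out.println(canStart(0x03C0));    // Greek small letter pi
        System.out.println(canStart('1'));       // a digit cannot start an identifier
        System.out.println(canContinue('1'));    // but it may appear later in one
    }
}
```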

References

  1. "3.1 Unicode", The Java(TM) Language Specification, Java SE 7 Edition, pp. 15-16.



  This article uses material from the Wikipedia page available here. It is released under the Creative Commons Attribution-Share-Alike License 3.0.
