including letters, digits, punctuation, and other symbols. You
can include string literals in programs by enclosing strings in
matching pairs of single or double quotes. JavaScript has no character
datatype. A single character can be represented by a string of
length 1.
String literals must be contained within a single line. To represent a literal double quote, enclose it within a pair of matching single quotes, and vice versa for single quotes. The page simple_literals.html gives an example of how string literals are declared and used.
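The quoting rules above can be sketched as follows. This is only an illustration, not the code from simple_literals.html; the variable names are made up for the example.

```javascript
// Either quote style delimits a string literal.
var greeting = "你好";
var sameGreeting = '你好';

// Use the opposite quote style to embed a literal quote character.
var withDouble = 'She said "你好" to me.';
var withSingle = "It's a string.";

// There is no separate character type: a character is a string of length 1.
var oneChar = "好";
```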
The backslash character \
is used to represent characters that are difficult or impossible
to type directly in a string literal. For example, \n
represents a new line. The backslash is also useful for
representing single quotes \'
and double quotes \", and \\
represents the backslash itself. The \u
escape represents a Unicode character with the four-digit
hexadecimal code following. For example, \u4e00
represents the Chinese character 一 and \u4e94
represents 五. Unicode Tables are available at [
The concatenation operator +
works as expected and the property string.length gives the value
2 for the string "你好", also as expected. Some more simple
examples are given on the page
Again as expected, the first character of "你好" is "你" and the index of "好" is 1.
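A short sketch of the escapes and string properties described above. The variable names are invented for illustration, and the expected values in the comments follow from the Unicode code points quoted in the text.

```javascript
// \u escapes and literal characters produce identical strings.
var yi = "\u4e00";          // 一
var wu = "\u4e94";          // 五

// Other backslash escapes.
var twoLines = "第一行\n第二行";               // \n is one newline character
var quotes = 'single \' and double \" quotes';
var backslash = "\\";                          // a single backslash

// Concatenation, length, and indexing behave the same as for ASCII text.
var hello = "你" + "好";
var len = hello.length;            // 2
var first = hello.charAt(0);       // "你"
var position = hello.indexOf("好"); // 1
```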
You can find a comprehensive list of fundamental string methods at [
]. They include
valueOf(). The methods are mostly useful, but there are many other things
that you may want to do with Chinese text. For example, you may want to
determine whether a given character is a Chinese character or
not, or whether it is a simplified character or not, or to convert
simplified to traditional and vice versa.
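As a rough sketch of the first of these tasks, a character can be tested against a Unicode range. The function name isCjkIdeograph is hypothetical, and the range is an assumption: it covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), not the extension blocks. The same block is shared by Chinese, Japanese, and Korean text, and it says nothing about simplified versus traditional forms, which would need a mapping table.

```javascript
// Rough check: is the first character of ch in the basic
// CJK Unified Ideographs block (U+4E00 to U+9FFF)?
// Assumption: extension blocks and punctuation are ignored.
function isCjkIdeograph(ch) {
  var code = ch.charCodeAt(0);
  return code >= 0x4e00 && code <= 0x9fff;
}

var chineseChar = isCjkIdeograph("你");   // true
var latinChar = isCjkIdeograph("a");      // false
var cjkPunctuation = isCjkIdeograph("。"); // false: U+3002 is outside the block
```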
The String method
s1.localeCompare(s2) is meant to do a locale-sensitive comparison. It
returns a negative number (typically -1) if s1 sorts before s2, 0 if s1 and s2 are equal, and a positive number (typically 1) if s1 sorts after s2.
The page string_compare.html tests this out with the strings
a compared with
一 (Unicode 4e00) compared with
我 (Unicode 6211). In mainland Chinese dictionaries words are ordered by their pinyin, so we would expect
我 (wǒ) to be before
一 (yī). However, s1.localeCompare(s2) gives the
opposite in Firefox, IE, and Safari. It appears that each browser uses Unicode order for comparison.
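The comparison can be sketched as below. This is not the code from string_compare.html, and the variable names are made up. No expected value is shown for localeCompare because the result depends on the engine and its locale data.

```javascript
var s1 = "一";  // U+4E00, pinyin yī
var s2 = "我";  // U+6211, pinyin wǒ

// Locale-sensitive comparison; the sign of the result is engine dependent.
var localeResult = s1.localeCompare(s2);

// Relational operators compare UTF-16 code units, which for these
// characters is the same as Unicode order.
var codeUnitResult = s1 < s2 ? -1 : (s1 > s2 ? 1 : 0);  // -1, since 0x4e00 < 0x6211
```

If localeResult has the same sign as codeUnitResult, the engine is effectively using Unicode order for these strings. Newer engines that implement the ECMAScript Internationalization API also accept a locale argument, as in s1.localeCompare(s2, "zh"); whether that yields pinyin order depends on the locale data the engine ships with.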
Regular expressions are represented by
RegExp objects. They can be created either with the
RegExp() constructor or with the literal syntax, placing the regular expression pattern
between slash (/) characters. Some of the special symbols in regular expressions match only ASCII characters
and cannot be used with Chinese. In particular,
\w only matches ASCII word characters and
\W matches any character that is not an ASCII word character. The word boundary symbol
\b will not work with Chinese words.
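The following sketch illustrates these limitations. The explicit character range at the end is one possible workaround and is an assumption of this example: it covers only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```javascript
var wordChar = /\w/;
var asciiMatches = wordChar.test("a");      // true
var chineseMatches = wordChar.test("你");   // false: not an ASCII word character
var nonWordMatches = /\W/.test("你");       // true

// \b needs an ASCII word character on one side, so it never
// matches next to Chinese characters, even at a space.
var boundaryAscii = /\bhao\b/.test("ni hao");  // true
var boundaryChinese = /\b好\b/.test("你 好");   // false

// Literal syntax and the RegExp() constructor build equivalent patterns.
// Note the doubled backslashes inside the constructor's string argument.
var cjkRun = /[\u4e00-\u9fff]+/;
var sameRun = new RegExp("[\\u4e00-\\u9fff]+");
var found = "abc你好def".match(cjkRun)[0];      // "你好"
```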
The page pinyin_format.html uses a regular expression to match Pinyin written in the form nin2hao3.
The regular expression
/[a-zA-Z]+[1-4]/ matches one or more ASCII letters followed by a digit
in the range 1 to 4. The code uses the String
gsub (global substitute) method to
replace every occurrence of the nin2hao3 style with the nínhǎo style.
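A minimal sketch of this conversion follows. It is not the code from pinyin_format.html. gsub is not part of standard JavaScript (libraries such as Prototype provide it), so the sketch substitutes the built-in String replace method with a global-flagged regular expression. The names TONE_MARKS, markSyllable, and numberedToMarked are hypothetical, and the tone placement rule is a simplification: a or e takes the mark if present, then the o in "ou", otherwise the last vowel. Only lowercase vowels are handled, and ü is not.

```javascript
// Tone-marked forms of each vowel, tones 1 to 4, written as \u escapes.
var TONE_MARKS = {
  a: "\u0101\u00e1\u01ce\u00e0",  // ā á ǎ à
  e: "\u0113\u00e9\u011b\u00e8",  // ē é ě è
  i: "\u012b\u00ed\u01d0\u00ec",  // ī í ǐ ì
  o: "\u014d\u00f3\u01d2\u00f2",  // ō ó ǒ ò
  u: "\u016b\u00fa\u01d4\u00f9"   // ū ú ǔ ù
};

// Convert one numbered syllable such as "hao3" to "hǎo".
function markSyllable(syllable) {
  var tone = parseInt(syllable.charAt(syllable.length - 1), 10);
  var letters = syllable.substring(0, syllable.length - 1);

  // Find the vowel that carries the tone mark.
  var pos = letters.search(/[ae]/);
  if (pos < 0) pos = letters.indexOf("ou");
  if (pos < 0) {
    for (var i = letters.length - 1; i >= 0; i--) {
      if ("iou".indexOf(letters.charAt(i)) >= 0) { pos = i; break; }
    }
  }
  if (pos < 0) return syllable;  // no lowercase vowel found: leave unchanged

  var marked = TONE_MARKS[letters.charAt(pos)].charAt(tone - 1);
  return letters.substring(0, pos) + marked + letters.substring(pos + 1);
}

// Replace every numbered syllable in the text.
function numberedToMarked(text) {
  return text.replace(/[a-zA-Z]+[1-4]/g, markSyllable);
}

var converted = numberedToMarked("nin2hao3");  // "nínhǎo"
```

Because the letter class is followed by a required digit, the pattern splits nin2hao3 into the two matches nin2 and hao3, and each is converted separately.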
Copyright Alex Amies 2008. Please send comments to email@example.com.