Written by Felicia Heng
Updated over a week ago


GSM Charset: The GSM 03.38 charset is the standard character set for text messaging on GSM-based cell phones. All GSM handsets and network elements support the GSM 7-bit alphabet. The basic GSM charset contains the letters A to Z (uppercase and lowercase), numbers, special symbols and several symbols from the Greek alphabet.

Escape Characters: Characters in the GSM 03.38 extension table cost two characters each. A basic GSM character is encoded in 7 bits, but an extension-table character is sent as an escape code followed by the character's own code, so it requires 14 bits and takes up two characters. These symbols are: |, ^, {, }, €, [, ~, ] and \.
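The two-for-one cost described above can be illustrated with a minimal Python sketch. The names `GSM_EXTENDED` and `gsm_length` are illustrative, not part of any library, and the function assumes the text contains only GSM characters.

```python
# The nine extension-table symbols listed above (illustrative constant).
GSM_EXTENDED = set("|^{}€[~]\\")

def gsm_length(text: str) -> int:
    """Count how many GSM characters a text uses up.

    Assumes every character in `text` belongs to the GSM charset;
    extension-table symbols count as two, everything else as one.
    """
    return sum(2 if ch in GSM_EXTENDED else 1 for ch in text)

print(gsm_length("Price: 10"))   # 9
print(gsm_length("Price: 10€"))  # 11 (the euro sign counts twice)
print(gsm_length("[ok]"))        # 6  (both brackets count twice)
```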

Unicode Symbols: Unicode is a standard for encoding, handling and representing text in most of the world’s writing systems. Version 8.0 of the standard already contained over 120,000 characters from multiple symbol sets and 129 historic and modern scripts, and later versions have added more.

Unicode Encoding: Compared to the GSM charset, Unicode encoding supports a huge range of languages and symbols. This comes at a cost: if your text message contains even one symbol that isn’t in the 7-bit alphabet, the whole message must be sent using UCS-2 encoding. UCS-2 uses 16 bits per character instead of 7, which reduces the number of characters allowed in a message to 70.
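The 160 and 70 limits follow from simple arithmetic. The sketch below assumes the usual single-SMS payload of 140 octets, a figure that is not stated in this article; `PAYLOAD_BITS` and `max_chars` are illustrative names.

```python
# Assumption: a single SMS carries a 140-octet payload.
PAYLOAD_BITS = 140 * 8  # 1120 bits

def max_chars(bits_per_char: int) -> int:
    """Whole characters that fit in one message at a given width."""
    return PAYLOAD_BITS // bits_per_char

gsm_limit = max_chars(7)    # 7-bit GSM alphabet
ucs2_limit = max_chars(16)  # 16-bit UCS-2

print(gsm_limit)   # 160
print(ucs2_limit)  # 70
```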

Count the number of characters in your text message. The standard length for a text message is 160 characters, but using the Unicode charset decreases this to 70 characters. In addition, certain characters from the GSM 03.38 charset require ‘escape characters’ and take up two characters (14 bits) each. So a message of exactly 160 GSM characters will still be split if even one of them is such a symbol.
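The counting rules above can be sketched in Python as follows. This is a rough estimate under stated assumptions, not a definitive implementation: the per-part limits of 153 (GSM) and 67 (UCS-2) for split messages are commonly used values that are not given in this article, the function name `sms_info` is illustrative, and the sketch ignores the detail that a two-character escape sequence cannot straddle two parts.

```python
import math

# Basic GSM 03.38 alphabet (the escape code itself is omitted).
GSM_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table symbols that cost two characters each.
GSM_EXTENDED = set("|^{}€[~]\\")

def sms_info(text: str):
    """Return (encoding, characters used, estimated number of parts)."""
    if all(ch in GSM_BASIC or ch in GSM_EXTENDED for ch in text):
        encoding = "GSM-7"
        used = sum(2 if ch in GSM_EXTENDED else 1 for ch in text)
        single, multi = 160, 153  # multi-part limit is an assumption
    else:
        encoding = "UCS-2"
        # Count 16-bit units, so symbols such as emoji count as two.
        used = len(text.encode("utf-16-be")) // 2
        single, multi = 70, 67    # multi-part limit is an assumption
    parts = 1 if used <= single else math.ceil(used / multi)
    return encoding, used, parts

print(sms_info("a" * 160))        # ('GSM-7', 160, 1)
print(sms_info("a" * 159 + "€"))  # ('GSM-7', 161, 2)
print(sms_info("ж" * 71))         # ('UCS-2', 71, 2)
```

The second call shows the case described above: 160 visible characters, one of which is the euro sign, no longer fit in a single message.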

The best way to reduce the length of an SMS message is to replace such characters (usually Unicode characters that force UCS-2 encoding, or GSM characters that require an escape character) with an equivalent from the basic GSM charset.
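One way to apply this is a small substitution table, sketched below in Python. The mapping is a hypothetical starting point covering punctuation that word processors often insert automatically; which replacements are acceptable depends on your content, and `REPLACEMENTS` and `to_gsm_friendly` are illustrative names.

```python
# Hypothetical mapping from common non-GSM punctuation to GSM equivalents.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / curly apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}

_TABLE = str.maketrans(REPLACEMENTS)

def to_gsm_friendly(text: str) -> str:
    """Swap listed Unicode punctuation for basic GSM characters."""
    return text.translate(_TABLE)

message = "\u201cSale\u201d ends today \u2013 don\u2019t miss it\u2026"
print(to_gsm_friendly(message))  # "Sale" ends today - don't miss it...
```

Characters that are not in the table pass through unchanged, so a message can still end up in UCS-2 if it contains other non-GSM symbols.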
