String handling was a bit different in the C world, where everything was an ASCII C string.
But things have changed: text is Unicode now. In most languages there is a good chance you will find an API that treats strings as sequences of UTF-16 code units. NSString is also a sequence of UTF-16 units, which we call unichars.
So the first thing that comes to mind when dealing with NSStrings is that they are like old C strings, except that the characters are a little wider, and that the right thing to do is to iterate through the string character by character. But that is not how Unicode works.
A user-visible character can be composed of more than one unichar. User-visible characters are also called:
- Character clusters
- Grapheme clusters
- Composed character sequences (the term Apple uses in its documentation)
So if you want to iterate over user-visible characters, you can't simply walk through each unichar of the NSString. Some user-visible characters are made up of a sequence of several unichars.
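To make the gap between unichars and user-visible characters concrete, here is a minimal sketch, assuming a Foundation environment. The helper names (`UnicharCount`, `DecomposedEAcute`, `SmilingEmoji`) and the sample strings are made up for illustration; only `length` and `stringWithCharacters:length:` are NSString API.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the number of UTF-16 units (unichars) in a string.
// -length counts unichars, not user-visible characters.
static NSUInteger UnicharCount(NSString *string) {
    return string.length;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT: displayed as a single "é".
static NSString *DecomposedEAcute(void) {
    const unichar units[] = {0x0065, 0x0301};
    return [NSString stringWithCharacters:units length:2];
}

// U+1F603, which UTF-16 stores as the surrogate pair 0xD83D 0xDE03.
static NSString *SmilingEmoji(void) {
    const unichar units[] = {0xD83D, 0xDE03};
    return [NSString stringWithCharacters:units length:2];
}
```

Both strings are displayed as a single character, yet `UnicharCount` reports 2 for each of them.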
For example (from Apple's video):
So in order to iterate through user-visible characters, you need to use the more Unicode-savvy APIs provided by iOS.
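One such API is `enumerateSubstringsInRange:options:usingBlock:` with the `NSStringEnumerationByComposedCharacterSequences` option. The following is a minimal sketch, assuming Foundation is available; the helper name `ComposedCharacters` is hypothetical and only there for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits a string into its user-visible characters
// (composed character sequences) instead of individual unichars.
static NSArray *ComposedCharacters(NSString *string) {
    NSMutableArray *result = [NSMutableArray array];
    [string enumerateSubstringsInRange:NSMakeRange(0, string.length)
                               options:NSStringEnumerationByComposedCharacterSequences
                            usingBlock:^(NSString *substring, NSRange substringRange,
                                         NSRange enclosingRange, BOOL *stop) {
        // Each substring is one complete composed character sequence.
        [result addObject:substring];
    }];
    return result;
}
```

For a string made of "a", the emoji 0xD83D 0xDE03, and "e" followed by U+0301, this yields three elements even though the string's `length` is 5.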
Suppose you want to find the first character of an NSString that contains emojis. Many emojis are also examples of characters that consist of more than one unichar. The first thing that comes to mind is:
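A first attempt along these lines might look like the sketch below. The function name `NaiveFirstCharacter` is hypothetical, and the code assumes a non-empty string; `characterAtIndex:` is the real NSString method.

```objc
#import <Foundation/Foundation.h>

// Naive approach: treat the first unichar as the first character.
// Assumes the string is not empty (characterAtIndex: raises otherwise).
static NSString *NaiveFirstCharacter(NSString *string) {
    unichar firstUnit = [string characterAtIndex:0];
    // Wrap the single unichar in a string so it can be displayed.
    NSString *firstCharacter = [NSString stringWithCharacters:&firstUnit length:1];
    return firstCharacter;
}
```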
But when you run this, firstCharacter is not what you expect. Instead of the first laughing emoji, you get some other, seemingly random character. This happens because the first user-visible character in the string is not a single unichar but a composition of two unichars (0xD83D 0xDE03).
characterAtIndex: returns only a single unichar (0xD83D), which is just the first half of that pair, so what gets shown is whatever that lone unichar renders as.
To correctly get the first visible character (i.e. both 0xD83D and 0xDE03), we should do it like this:
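A minimal sketch using `rangeOfComposedCharacterSequenceAtIndex:` and `substringWithRange:`, both of which are NSString methods. The wrapper name `FirstVisibleCharacter` and the choice to return an empty string for empty input are assumptions made for this example.

```objc
#import <Foundation/Foundation.h>

// Returns the first user-visible character (composed character sequence).
// Returning @"" for an empty string is an assumption made for this sketch.
static NSString *FirstVisibleCharacter(NSString *string) {
    if (string.length == 0) {
        return @"";
    }
    // Ask NSString for the full range of the sequence that contains index 0,
    // e.g. both halves of a surrogate pair.
    NSRange range = [string rangeOfComposedCharacterSequenceAtIndex:0];
    return [string substringWithRange:range];
}
```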
This correctly gets the first smiling emoji. Apple provides various other APIs for correctly iterating through user-visible characters; you can find them in the Apple docs.
The most robust way is to use the NSString methods that are aware of these characters. You would probably be interested in the WWDC2011 - Session 128 - Advanced Text Processing video, which talks extensively about just this subject. Pay attention to the part about "Composed Character Sequences".