How to do a true reverse of a String in Java, including code points outside the BMP?


One of the primary goals of Java was to represent each and every character of a language using a basic primitive type. When Java was born there was Unicode, and at that time every character Unicode defined had a number less than or equal to 65535.

Hence char was born in Java: an unsigned, 16-bit integer type.

However, things have changed in the Unicode world: there are now numerous characters whose code points are greater than 65535.
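The mismatch can be seen directly by comparing the range of char with the range of Unicode code points, using constants from java.lang.Character. A small sketch; the class name CharVersusCodePoint is only illustrative:

```java
final class CharVersusCodePoint {
    public static void main(final String... args) {
        // char is an unsigned 16-bit type, so its range is 0..65535.
        System.out.println((int) Character.MIN_VALUE); // 0
        System.out.println((int) Character.MAX_VALUE); // 65535

        // Unicode code points run from U+0000 to U+10FFFF; everything from
        // U+10000 upward (the supplementary planes) cannot fit in a single char.
        System.out.println(Character.MIN_SUPPLEMENTARY_CODE_POINT); // 65536
        System.out.println(Character.MAX_CODE_POINT);               // 1114111
    }
}
```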

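While Java has acknowledged this and represents such code points using surrogate pairs (essentially, a char is a UTF-16 code unit), most of the String API works on chars rather than code points, and naively reversing a string char by char splits every surrogate pair. (StringBuilder#reverse is worth knowing about here: its Javadoc states that surrogate pairs are treated as single characters for the reverse operation, so the high-low order of a pair is preserved.)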
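To see why char-wise reversal is a problem, here is a minimal sketch that swaps chars by hand, the way a naive implementation would, and shows the surrogate pair turning into two unpaired surrogates. The class NaiveReverseDemo and its reverseChars helper are hypothetical names:

```java
final class NaiveReverseDemo {
    // Reverses the UTF-16 code units (chars) one by one, ignoring surrogate pairs.
    static String reverseChars(final String s) {
        final char[] chars = s.toCharArray();
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            final char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }
        return new String(chars);
    }

    public static void main(final String... args) {
        final String poo = "\ud83d\udca9"; // one code point (U+1F4A9) stored as two chars
        System.out.println(poo.length());                        // 2 (UTF-16 code units)
        System.out.println(poo.codePointCount(0, poo.length())); // 1 (code points)

        final String reversed = reverseChars(poo);
        // Low surrogate now comes first, so the pair no longer forms one code point.
        System.out.println(Character.isLowSurrogate(reversed.charAt(0)));  // true
        System.out.println(reversed.codePointCount(0, reversed.length())); // 2
    }
}
```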

Assuming Java 8, how can one write a method that does a true string reversal, that is, one that takes code points outside the BMP (Basic Multilingual Plane, U+0000 to U+FFFF) into account?

One such method is as follows:

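```java
import java.util.ArrayDeque;
import java.util.Deque;

public static String trueReverse(final String input) {
    // Push every code point onto the front of the deque, which reverses their order.
    final Deque<Integer> queue = new ArrayDeque<>();
    input.codePoints().forEach(queue::addFirst);

    // Iterating from the head now yields the code points last-to-first.
    final StringBuilder sb = new StringBuilder();
    queue.forEach(sb::appendCodePoint);

    return sb.toString();
}
```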

It is not optimized, but it is functional. Try this, for instance:

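```java
import java.util.ArrayDeque;
import java.util.Deque;

public final class Test
{
    public static String trueReverse(final String input)
    {
        // Reverse the order of the code points (not the chars) via a deque.
        final Deque<Integer> queue = new ArrayDeque<>();
        input.codePoints().forEach(queue::addFirst);

        final StringBuilder sb = new StringBuilder();
        queue.forEach(sb::appendCodePoint);

        return sb.toString();
    }

    public static void main(final String... args)
    {
        final String input = "abc\ud83d\udca9de";

        // Prints "ed", then U+1F4A9 with its surrogate pair intact, then "cba".
        System.out.println(trueReverse(input));
    }
}
```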

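Yes, it happens to use a defined character (U+1F4A9, PILE OF POO)... The program should print ed, followed by that character, followed by cba. Now, your font may or may not display the character correctly.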
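If the font does not render the character, one way to check what a string actually contains is to print each code point's number and Unicode name instead of the glyph. A minimal sketch, assuming Java 8 or later (Character.getName exists since Java 7); the class IdentifyCodePoint and its describe helper are hypothetical names:

```java
import java.util.Locale;

final class IdentifyCodePoint {
    // Builds a font-independent description: number, Unicode name, and two properties.
    static String describe(final int codePoint) {
        return String.format(Locale.ROOT, "U+%04X %s defined=%b supplementary=%b",
                codePoint,
                Character.getName(codePoint), // null if the code point is unassigned
                Character.isDefined(codePoint),
                Character.isSupplementaryCodePoint(codePoint));
    }

    public static void main(final String... args) {
        final String input = "abc\ud83d\udca9de";
        // Walk code points rather than chars, so U+1F4A9 is reported once, not as two halves.
        input.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```

For the supplementary character this reports U+1F4A9 PILE OF POO, whatever the terminal font does with the glyph.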

Note how the Unicode character is encoded in the Java string literal: \ud83d\udca9, that is, as a surrogate pair of two UTF-16 code units.
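The two escapes are not arbitrary. UTF-16 subtracts 0x10000 from the code point, puts the upper 10 bits of the 20-bit result into a high surrogate (0xD800 plus those bits) and the lower 10 bits into a low surrogate (0xDC00 plus those bits). The sketch below does this by hand for U+1F4A9 and checks the result against Character.toChars and Character.toCodePoint; SurrogateMath.encode is a hypothetical helper that is only valid for supplementary code points (U+10000 to U+10FFFF):

```java
final class SurrogateMath {
    // Hand-rolled UTF-16 encoding of one supplementary code point.
    static char[] encode(final int codePoint) {
        final int offset = codePoint - 0x10000;              // 20-bit value
        final char high = (char) (0xD800 + (offset >>> 10)); // upper 10 bits
        final char low = (char) (0xDC00 + (offset & 0x3FF)); // lower 10 bits
        return new char[] { high, low };
    }

    public static void main(final String... args) {
        final char[] pair = encode(0x1F4A9);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D83D DCA9

        // The JDK performs the same computation in both directions.
        System.out.println(new String(pair).equals(new String(Character.toChars(0x1F4A9)))); // true
        System.out.println(Character.toCodePoint(pair[0], pair[1]) == 0x1F4A9);            // true
    }
}
```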

