Reading a Unicode File (C++)
I’m having a problem getting my program to properly read a Unicode file. What I’m currently doing is opening a Unicode file in binary mode and reading it with the getline function. In most cases this has worked fine.
However, getline seems to read only up to the ASCII newline character 0x0a. Because it isn’t looking for a Unicode endline (0x000a), it sometimes stops where it isn’t supposed to.
For example, a Chinese character is represented in hex as 0x0a4e, so instead of reading the character, getline interprets its first half as a newline.
How would I go about fixing this problem?
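To see the failure concretely, the sketch below splits a byte string exactly as getline() does on a char stream opened in binary mode: at every single 0x0A byte, wherever it falls. It assumes the file is little-endian UTF-16; the sample bytes and the helper name split_like_getline are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits raw bytes the way std::getline does on a char stream:
// at every byte equal to '\n' (0x0A), with no notion of 2-byte code units.
std::vector<std::string> split_like_getline(const std::string& bytes) {
    std::istringstream in(bytes);
    std::vector<std::string> pieces;
    std::string piece;
    while (std::getline(in, piece, '\n'))
        pieces.push_back(piece);
    return pieces;
}
```

With the bytes {'a', 0x00, 0x0A, 0x4E, 0x0A, 0x00, 'b', 0x00}, i.e. "a", the character stored as 0x0A 0x4E, a newline and "b", it returns three pieces instead of two lines, and the middle piece is the lone byte 0x4E: the first half of the character was consumed as a line break.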
It sounds like your data is actually UTF-16. Unlike the more common UTF-8, which C++ can decode with wchar_t/wifstream/wstring/wcout and related facilities (given a UTF-8 locale), UTF-16 is handled only by recent C++ compilers, because char16_t was added to the language standard only recently, in C++11.
If your compiler is new enough, you can use
#include <fstream>
#include <string>
std::basic_ifstream<char16_t> file("test.txt"); // not binary mode; bytes are decoded by the locale's codecvt<char16_t, char, mbstate_t>
std::basic_string<char16_t> line;
std::getline(file, line, u'\n');
Test:
https://ideone.com/IZ2nc
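A caveat on the char16_t snippet: std::basic_ifstream<char16_t> decodes the file through the locale's std::codecvt<char16_t, char, std::mbstate_t> facet, which in standard C++ converts from UTF-8, so a file that is UTF-16 on disk needs a different facet. One option the standard library itself offers is std::codecvt_utf16 from <codecvt> (C++11; deprecated since C++17 but still shipped by the major implementations). This sketch assumes a little-endian file without a byte-order mark; read_lines_with_codecvt is a hypothetical helper name.

```cpp
#include <cassert>
#include <codecvt>
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Reads a little-endian UTF-16 file line by line into wide strings.
std::vector<std::wstring> read_lines_with_codecvt(const std::string& path) {
    std::wifstream file;
    // Install the UTF-16 facet before opening so it applies from the first byte.
    file.imbue(std::locale(file.getloc(),
        new std::codecvt_utf16<wchar_t, 0x10FFFF, std::little_endian>));
    // Binary mode: the facet, not the platform's text-mode translation, sees the raw bytes.
    file.open(path, std::ios::binary);
    std::vector<std::wstring> lines;
    std::wstring line;
    // The delimiter is the decoded code unit U+000A, never a lone 0x0A byte.
    while (std::getline(file, line))
        lines.push_back(line);
    return lines;
}
```

If the files start with a byte-order mark, passing std::codecvt_mode(std::little_endian | std::consume_header) as the third template argument tells the facet to consume it instead of returning it as a character.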
If you don’t have char16_t support yet, you will need a third-party library; the most popular is ICU, from
http://icu-project.org/
If you prefer to read the file in binary mode and deal with each byte individually, then obviously you cannot use getline(). Following this approach, I would read the whole file, in binary mode, into a byte string and use std::string’s functions to find the two-byte endline substring, or simply run a loop that reads exactly two bytes at a time: while (file.read(data, 2)) // check whether data holds the two-byte endline 0x000a, and do whatever you need to do.
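The two-bytes-at-a-time loop can be sketched as follows, assuming little-endian UTF-16 (low byte first) and a stream opened with std::ios::binary; read_utf16le_lines is a hypothetical name. Because the bytes are consumed in aligned pairs, the 0x0a inside a character stored as 0x0a 0x4e is never mistaken for a newline: only a complete code unit equal to 0x000a ends a line.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Reads UTF-16LE text two bytes at a time and splits it into lines.
// `in` should be opened in binary mode, e.g. std::ifstream(path, std::ios::binary).
std::vector<std::u16string> read_utf16le_lines(std::istream& in) {
    std::vector<std::u16string> lines;
    std::u16string line;
    char data[2];
    // A trailing odd byte makes read() fail and is silently dropped.
    while (in.read(data, 2)) {
        // Assemble one UTF-16 code unit: low byte first (little-endian).
        char16_t unit = static_cast<char16_t>(
            static_cast<unsigned char>(data[0]) |
            (static_cast<unsigned char>(data[1]) << 8));
        if (unit == u'\n') {  // the two-byte endline 0x000A
            lines.push_back(line);
            line.clear();
        } else {
            line.push_back(unit);
        }
    }
    if (!line.empty())  // last line without a trailing newline
        lines.push_back(line);
    return lines;
}
```

If you take the std::string::find route instead, check that each match of the endline substring starts at an even offset; otherwise the high byte of one character and the low byte of the next can together look like an endline.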