reading binary files C++

2023-02-15 16:11 Q&A author:

I would like to ask for help. I am just starting out in C++ and got this homework at school: we have to write a function bool UTF8toUTF16(const char * src, const char * dst); which reads the file src, encoded in UTF-8, and writes its contents to the file dst encoded in UTF-16. We also mustn't use any libraries other than the ones in my code below.

So the first thing I am trying to do is this: I create a file "xx.txt" in classic Windows Notepad and type a single character into it, for example 'š'. Then I try to write a program that reads this file in binary mode, byte by byte (or several bytes at a time), and prints the value of each character... but my program doesn't work like that.

So the file 'xx.txt' contains only 'š', whose UTF-8 encoding is 'c5 a1', whose UTF-16 code unit is '0161' and whose Unicode code point is U+0161, and I expect the program to print i = 161 (hex), or at least something close to that.
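The connection between the bytes c5 a1 and U+0161 can be checked by hand. A two-byte UTF-8 sequence has the bit pattern 110xxxxx 10yyyyyy, and the code point is the 11 payload bits xxxxxyyyyyy: here c5 = 110 00101 and a1 = 10 100001, giving 00101 100001 = 0x161. A minimal sketch of that arithmetic (decodeTwoByte is a hypothetical name, and the function assumes its input really is a valid two-byte sequence):

```cpp
// Hypothetical helper: decodes a two-byte UTF-8 sequence 110xxxxx 10yyyyyy.
// Assumes b0 is a valid lead byte (c2..df) and b1 a continuation byte (80..bf).
unsigned int decodeTwoByte(unsigned char b0, unsigned char b1) {
    unsigned int high = b0 & 0x1F;  // low 5 bits of the lead byte
    unsigned int low  = b1 & 0x3F;  // low 6 bits of the continuation byte
    return (high << 6) | low;       // 5 + 6 = 11 payload bits
}
```

Because U+0161 lies below U+10000, its UTF-16 form is the single code unit 0161.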

Here is my code so far:

#include <stdio.h>
#include <stdlib.h>
#include <iomanip>
#include <iostream>
#include <fstream>

using namespace std;

int main ( void ) {
    char name[] = "xx.txt";
    fstream F ( name, ios::in | ios::binary );
    unsigned int i;
    while( F.read ((char *) & i, 2))
        /* I don't know what size to put here. I would guess '2', because I need 2 bytes for the char with UTF-16 code '0161', but 2 doesn't work */
        cout << "i = " << hex << i << " (hex) ";
    cout << endl;
    F.close();
    system("PAUSE");
    return 0;
}

Thanks in advance

Nikolas Jíša
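A sketch of what the question describes, reading xx.txt in binary mode one byte at a time into an unsigned char and printing each byte in hex, could look like the following. It sticks to headers from the question's list; the name dumpBytes, the buffer parameters and the -1 return for a file that cannot be opened are illustrative choices, not part of the assignment.

```cpp
#include <fstream>
#include <iomanip>
#include <iostream>

using namespace std;

// Hypothetical helper: prints every byte of the file at `path` in hex and
// stores up to `cap` of them in `buf`. Returns the number of bytes read,
// or -1 if the file could not be opened.
int dumpBytes(const char * path, unsigned char * buf, int cap) {
    ifstream F(path, ios::in | ios::binary);
    if (!F)
        return -1;
    int n = 0;
    char c;
    // get() extracts exactly one byte, so there is no size to guess
    while (F.get(c)) {
        unsigned char b = static_cast<unsigned char>(c);  // avoid sign extension
        if (n < cap)
            buf[n] = b;
        ++n;
        cout << "byte " << n << " = " << hex << setw(2) << setfill('0')
             << static_cast<unsigned int>(b) << dec << setfill(' ')
             << " (hex)" << endl;
    }
    return n;
}
```

For a file containing only 'š' this prints c5 and a1. If Notepad saved the file as UTF-8 it may also have written the byte order mark ef bb bf in front of them.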

You don't know how many bytes a character takes in UTF-8 until you have looked at its first byte, so you need to read "chars" (bytes) one at a time until you have a complete UTF-8 character.
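The lead byte of each UTF-8 sequence encodes the sequence length, so a reader can take one byte, work out how many continuation bytes must follow, and then read exactly that many. A sketch of the lookup (utf8SequenceLength is a hypothetical name; it returns 0 for bytes that cannot start a sequence):

```cpp
// Hypothetical helper: number of bytes in the UTF-8 sequence that starts
// with `lead`, or 0 if `lead` cannot begin a sequence.
int utf8SequenceLength(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx (continuation) or f8..ff
}
```

This only checks bit patterns; a strict decoder additionally rejects the lead bytes c0, c1 and f5 through ff, which never appear in valid UTF-8.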

Edit: you don't say what output you are getting, but I suspect it's a byte-ordering issue.
If you know the input is always a 16-bit value, you might be better off reading it into a char array and then looking at the individual bytes.

See http://www.joelonsoftware.com/articles/Unicode.html
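The byte-order effect can be made concrete. The file holds c5 followed by a1. F.read((char *) & i, 2) copies those two bytes into the first two bytes of i in memory, so on a little-endian CPU such as x86 the low 16 bits of i become 0xa1c5, and the remaining bytes of i (two, for a 4-byte unsigned int) keep whatever garbage they held. Combining bytes from a char array explicitly removes the dependence on the machine. The helper names below are hypothetical:

```cpp
// Hypothetical helpers: combine two bytes read from a file into a 16-bit
// value with an explicit byte order, independent of the CPU's endianness.
unsigned int bigEndian16(const unsigned char b[2]) {
    return (static_cast<unsigned int>(b[0]) << 8) | b[1];  // first byte is high
}

unsigned int littleEndian16(const unsigned char b[2]) {
    return (static_cast<unsigned int>(b[1]) << 8) | b[0];  // first byte is low
}
```

Neither 0xc5a1 nor 0xa1c5 is 0x161: the bytes are UTF-8, so they have to be decoded, not just reinterpreted as a 16-bit number.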

If your input is UTF-8, you need to read one byte at a time, not two (you'll want i to have type unsigned char). This gives you a stream of binary data, which you need to decode according to the UTF-8 specification; that yields a stream of unsigned ints (Unicode code points), which you then need to re-encode according to the UTF-16 specification.
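Putting the three steps together gives one possible shape for the function the assignment asks for. This is a sketch under assumptions the question does not state: dst is written as UTF-16 little-endian preceded by the byte order mark ff fe (the format classic Windows Notepad calls "Unicode"), a UTF-8 byte order mark at the start of src is dropped, and malformed input makes the function return false. Only <fstream>, from the question's header list, is used; putUnit and getByte are hypothetical helper names.

```cpp
#include <fstream>

using namespace std;

// Hypothetical helper: writes one 16-bit code unit, low byte first (UTF-16LE).
static void putUnit(ofstream & out, unsigned int u) {
    out.put(static_cast<char>(u & 0xFF));
    out.put(static_cast<char>((u >> 8) & 0xFF));
}

// Hypothetical helper: returns the next byte (0..255), or -1 at end of file.
static int getByte(ifstream & in) {
    char c;
    if (!in.get(c))
        return -1;
    return static_cast<unsigned char>(c);
}

bool UTF8toUTF16(const char * src, const char * dst) {
    ifstream in(src, ios::in | ios::binary);
    if (!in)
        return false;
    ofstream out(dst, ios::out | ios::binary | ios::trunc);
    if (!out)
        return false;

    putUnit(out, 0xFEFF);  // assumed output format: BOM, bytes ff fe

    // Smallest code point allowed for each sequence length (rejects overlong forms).
    const unsigned int minCp[4] = { 0, 0x80, 0x800, 0x10000 };
    bool first = true;
    int lead;
    while ((lead = getByte(in)) != -1) {
        unsigned int cp;  // code point being decoded
        int extra;        // continuation bytes still expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else return false;  // stray continuation byte or invalid lead byte

        for (int k = 0; k < extra; ++k) {
            int b = getByte(in);
            if (b == -1 || (b & 0xC0) != 0x80)
                return false;  // truncated sequence or missing 10xxxxxx byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;  // overlong, out of range, or a surrogate

        if (first && cp == 0xFEFF) {  // skip a UTF-8 BOM at the start of src
            first = false;
            continue;
        }
        first = false;

        if (cp < 0x10000) {
            putUnit(out, cp);  // BMP character: one code unit
        } else {
            cp -= 0x10000;     // 20 bits split into a surrogate pair
            putUnit(out, 0xD800 | (cp >> 10));
            putUnit(out, 0xDC00 | (cp & 0x3FF));
        }
    }
    out.close();
    return !out.fail();
}
```

Code points up to U+FFFF become one code unit, so 'š' is written as the bytes 61 01; code points above U+FFFF become a surrogate pair, for example U+1F600 becomes d83d de00. On failure dst may be left partially written.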

