View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002126 | FreeCAD | Bug | public | 2015-05-29 11:56 | 2015-12-15 13:15 |
Reporter | okamotom | Assigned To | wmayer | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Product Version | 0.15 | ||||
Fixed in Version | 0.16 | ||||
Summary | 0002126: Fails to load .FCStd file which contains many MultiByte-Char strings | ||||
Description | An error is shown in the report view while loading an FCStd file: Fatal Error at file xxxx/xxxx.FCStd, line xxx, char xx Invalid Document.xml: invalid byte 'xx' at position 1 of a 1-byte sequence. The restored document is broken (some information is missing). | ||||
Steps To Reproduce | 1. Start FreeCAD. 2. Open JapaneseLabel2.FCStd. 3. Wait some time. 4. An error occurs (in the report view): Fatal Error at file xxxxxxxxxxxxx/JapaneseLabel2.FCStd, line 1038, char 19 Invalid Document.xml: invalid byte '?E' at position 1 of a 1-byte sequence 5. After the FCStd file is loaded, the objects Shape to Shape057 have Japanese labels "??????????.....", while Shape058 to Shape230 have English labels (same as the object name). Originally, all objects had Japanese labels. OS: Windows 8.1 Word size of OS: 64-bit Word size of FreeCAD: 64-bit Version: 0.16.5005 (Git) Build type: Release Branch: master Hash: a4441f2a41672cf1e39269dc5ce0c5dc8608acb3 Python version: 2.7.8 Qt version: 4.8.6 Coin version: 4.0.0a OCC version: 6.8.0.oce-0.17 The binary release of 0.15 (Win64) behaves the same way. | ||||
Additional Information | In Base/InputSource.cpp, StdInputStream::readBytes( XMLByte* const toFill, const XMLSize_t maxToRead ) changes some bytes near the end of the buffer to '?' (to avoid an invalid multibyte character?), and then XMLUTF8Transcoder::transcodeFrom() in xerces/util/XMLUTF8Transcoder.cpp raises an 'invalid sequence of UTF8' exception. Does readBytes() really need to fix incomplete multibyte characters? I think xerces may handle them itself, so readBytes() could leave them alone. | ||||
Tags | No tags attached. | ||||
FreeCAD Information | |||||
|
Here is my patch.
|
|
The check was added more than four years ago: http://free-cad.svn.sourceforge.net/viewvc/free-cad/trunk/src/Base/InputSource.cpp?r1=3431&r2=4539 I can't remember why it was added, but if I am not totally wrong, xerces had problems with invalid characters in XML, leading to strange behaviour or even a crash. |
|
If a project file contains a broken (Gui)Document.xml, an exception is raised with no chance to continue loading the rest. The check is supposed to replace invalid characters with '?' so that xerces (or whoever) doesn't raise an exception and the project can be fully loaded. In bug 0000412 I found an example that doesn't load with the check commented out. |
|
Here is an example to check if utf-8 input has errors: http://stackoverflow.com/questions/18227530/check-if-utf-8-string-is-valid-in-qt When using QTextCodec::toUnicode it reports a couple of invalid characters. |
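For comparison, a Qt-free version of such a check could look roughly like the sketch below. It is only an illustration: Utf8Report, continuationCount and checkUtf8 are hypothetical names (not FreeCAD or xerces code), and the check is structural only, so it does not reject every overlong or surrogate encoding. It counts invalid bytes and, separately, continuation bytes that are missing at the very end of the input, loosely mirroring the invalidChars and remainingChars counters of QTextCodec::ConverterState.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type, loosely mirroring Qt's ConverterState counters.
struct Utf8Report {
    std::size_t invalidBytes = 0;  // bytes that cannot start or complete a sequence
    std::size_t missingBytes = 0;  // continuation bytes missing at the end of the input
};

// Number of continuation bytes expected after lead byte c,
// or -1 if c cannot start a UTF-8 sequence.
static int continuationCount(unsigned char c)
{
    if (c < 0x80) return 0;               // ASCII
    if (c >= 0xC2 && c <= 0xDF) return 1; // 2-byte sequence
    if (c >= 0xE0 && c <= 0xEF) return 2; // 3-byte sequence
    if (c >= 0xF0 && c <= 0xF4) return 3; // 4-byte sequence
    return -1;                            // stray continuation or invalid lead byte
}

// Simplified structural check (assumption: overlong forms and surrogates
// are not examined beyond the lead-byte ranges above).
Utf8Report checkUtf8(const std::string& data)
{
    Utf8Report report;
    std::size_t i = 0;
    while (i < data.size()) {
        const int need = continuationCount(static_cast<unsigned char>(data[i]));
        if (need < 0) {
            ++report.invalidBytes;
            ++i;
            continue;
        }
        std::size_t j = i + 1;
        int got = 0;
        while (got < need && j < data.size()
               && (static_cast<unsigned char>(data[j]) & 0xC0) == 0x80) {
            ++j;
            ++got;
        }
        if (got < need) {
            if (j == data.size()) {
                // The input ends inside a sequence: incomplete, not invalid.
                report.missingBytes = static_cast<std::size_t>(need - got);
            }
            else {
                // The sequence is interrupted by a non-continuation byte.
                report.invalidBytes += static_cast<std::size_t>(1 + got);
            }
        }
        i = j;
    }
    return report;
}
```

With the six bytes of a two-character Japanese string, the full input should report no errors, the first four bytes alone should report two missing bytes, and the last two bytes alone should report two invalid bytes.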
|
The point is that the method StdInputStream::readBytes doesn't get the whole XML file but only chunks of 49152 bytes. The problem arises when the buffer is split in the middle of a multi-byte character: the next time the method is called, the check starts at the beginning of the current buffer, and of course the first character then seems broken. So, to fix this issue the check has to remember how many bytes of the character are still missing and continue from there on the next call. |
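The idea of carrying the state over from one chunk to the next can be sketched as follows. This is a minimal illustration under assumptions, not the committed fix: ChunkedUtf8Sanitizer and its members are hypothetical names, the check is structural only, and bytes that were already handed on with an earlier chunk cannot be corrected afterwards.

```cpp
#include <cstddef>

// Hypothetical helper (not FreeCAD's actual code): replaces malformed UTF-8
// bytes with '?' and remembers between calls how many continuation bytes
// the sequence at the end of the previous chunk still needs.
class ChunkedUtf8Sanitizer {
public:
    // Sanitizes buf[0..len) in place. Call once per chunk, in stream order.
    void sanitize(unsigned char* buf, std::size_t len)
    {
        std::size_t start = 0;    // index of the current lead byte in this chunk
        bool leadInChunk = false; // false while finishing a sequence from the previous chunk
        for (std::size_t i = 0; i < len; ++i) {
            const unsigned char c = buf[i];
            if (pending_ > 0) {
                if ((c & 0xC0) == 0x80) { // expected continuation byte
                    --pending_;
                    continue;
                }
                // Sequence interrupted: blank out the part that lies in this chunk.
                if (leadInChunk) {
                    for (std::size_t k = start; k < i; ++k)
                        buf[k] = '?';
                }
                pending_ = 0; // then examine c as the start of a new sequence
            }
            if (c < 0x80)
                continue; // ASCII
            const std::size_t need = expected(c);
            if (need == 0) {
                buf[i] = '?'; // stray continuation or invalid lead byte
                continue;
            }
            pending_ = need;
            start = i;
            leadInChunk = true;
        }
    }

    // Continuation bytes still expected at the start of the next chunk.
    std::size_t pending() const { return pending_; }

private:
    // Continuation bytes required after lead byte c, 0 if c is not a valid lead byte.
    static std::size_t expected(unsigned char c)
    {
        if (c >= 0xC2 && c <= 0xDF) return 1;
        if (c >= 0xE0 && c <= 0xEF) return 2;
        if (c >= 0xF0 && c <= 0xF4) return 3;
        return 0;
    }

    std::size_t pending_ = 0;
};
```

If a three-byte character is split so that its lead byte ends one chunk and its two continuation bytes start the next, a single sanitizer object used for both calls should leave both chunks untouched, whereas a fresh object applied to the second chunk alone would turn the two continuation bytes into '?'.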
|
Note to myself: Example to check buffer with Qt functions:
    // Fragment for the body of StdInputStream::readBytes; len is the number of bytes read into toFill.
    QTextCodec::ConverterState state;
    state.flags |= QTextCodec::IgnoreHeader;
    state.flags |= QTextCodec::ConvertInvalidToNull;
    QTextCodec *codec = QTextCodec::codecForName("UTF-8");
    const QString text = codec->toUnicode((char *)toFill, len, &state);
    if (state.remainingChars > 0) {
        // The buffer ends in the middle of a multi-byte sequence.
        std::cerr << "Not a complete UTF-8 sequence." << state.remainingChars << std::endl;
    }
    if (state.invalidChars > 0) {
        std::cerr << "Not a valid UTF-8 sequence." << state.invalidChars << std::endl;
        // Write the converted text back and replace the null placeholders with '?'.
        QByteArray ba = codec->fromUnicode(text);
        for (int i=0; i<ba.length(); i++) {
            toFill[i] = ba[i];
            if (toFill[i] == '\0')
                toFill[i] = '?';
        }
        return ba.length();
    }
    return len;
|
Date Modified | Username | Field | Change |
---|---|---|---|
2015-05-29 11:56 | okamotom | New Issue | |
2015-05-29 11:56 | okamotom | File Added: JapaneseLabel2.FCStd | |
2015-05-29 12:00 | okamotom | File Added: AfterLoadFails.png | |
2015-06-09 06:35 | okamotom | Note Added: 0006197 | |
2015-06-09 06:36 | okamotom | Note Edited: 0006197 | |
2015-06-09 06:40 | okamotom | Note Edited: 0006197 | |
2015-06-09 06:44 | okamotom | Note Edited: 0006197 | |
2015-06-09 06:44 | okamotom | Note Edited: 0006197 | |
2015-09-19 20:57 | wmayer | Note Added: 0006436 | |
2015-09-19 21:21 | wmayer | Note Added: 0006438 | |
2015-09-20 07:32 | wmayer | Note Added: 0006439 | |
2015-09-20 09:16 | wmayer | Note Added: 0006440 | |
2015-09-20 09:21 | wmayer | Note Added: 0006441 | |
2015-09-20 10:29 | wmayer | Changeset attached | => FreeCAD Master master 1dad42c2 |
2015-09-20 10:29 | wmayer | Assigned To | => wmayer |
2015-09-20 10:29 | wmayer | Status | new => closed |
2015-09-20 10:29 | wmayer | Resolution | open => fixed |
2015-12-15 13:15 | yorik | Fixed in Version | => 0.16 |