View Issue Details

ID: 0002126
Project: FreeCAD
Category: Bug
View Status: public
Last Update: 2015-12-15 13:15
Reporter: okamotom
Assigned To: wmayer
Priority: normal
Severity: minor
Reproducibility: always
Status: closed
Resolution: fixed
Product Version: 0.15
Fixed in Version: 0.16
Summary: 0002126: Fails to load .FCStd file which contains many MultiByte-Char strings
Description: An error is shown in the report view while loading an FCStd file:
    Fatal Error at file xxxx/xxxx.FCStd, line xxx, char xx
    Invalid Document.xml: invalid byte 'xx' at position 1 of a 1-byte sequence

The restored document is broken (some information is missing).
Steps To Reproduce:
1. Start FreeCAD.
2. Open JapaneseLabel2.FCStd.
3. Wait for some time.
4. An error occurs (in the report view):
    Fatal Error at file xxxxxxxxxxxxx/JapaneseLabel2.FCStd, line 1038, char 19
    Invalid Document.xml: invalid byte '?E' at position 1 of a 1-byte sequence
5. After the FCStd file is loaded, the objects in the document are:
     Shape to Shape057 have a Label in Japanese: "??????????....."
     Shape058 to Shape230 have a Label in English (same as the ObjectName)
   Originally, however, all objects had a Japanese Label.

OS: Windows 8.1
Word size of OS: 64-bit
Word size of FreeCAD: 64-bit
Version: 0.16.5005 (Git)
Build type: Release
Branch: master
Hash: a4441f2a41672cf1e39269dc5ce0c5dc8608acb3
Python version: 2.7.8
Qt version: 4.8.6
Coin version: 4.0.0a
OCC version: 6.8.0.oce-0.17

The binary release of 0.15 (Win64) behaves the same way.
Additional Information: In Base/InputSource.cpp, StdInputStream::readBytes( XMLByte* const toFill, const XMLSize_t maxToRead )

changes some bytes near the end of the buffer to '?' (to avoid an invalid multibyte character?),
and then

 xerces/util/XMLUTF8Transcoder.cpp XMLUTF8Transcoder::transcodeFrom()

raises an 'invalid sequence of UTF8' exception.

Does readBytes() really need to fix an incomplete multibyte character?
I think xerces may handle that itself, so readBytes() could leave the bytes alone.
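
A buffer that ends partway through a multibyte character is not invalid by itself; it is only incomplete. As a rough sketch (the function name `incompleteTailLength` is hypothetical and is not part of FreeCAD or xerces), the number of trailing bytes that belong to an unfinished UTF-8 sequence can be computed without modifying the buffer:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper for illustration only.
// Returns how many bytes at the end of buf[0..len) start a UTF-8 sequence
// that is not finished inside the buffer (0 if the buffer ends cleanly).
// Only lead/continuation bit patterns are inspected.
std::size_t incompleteTailLength(const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    // The lead byte of an unfinished sequence is at most 3 bytes from the end.
    for (std::size_t back = 1; back <= 3 && back <= len; ++back) {
        const unsigned char b = buf[len - back];
        if ((b & 0xC0) == 0x80)
            continue;                       // continuation byte: keep looking for the lead
        std::size_t need = 1;               // ASCII or an invalid lead byte
        if ((b & 0xE0) == 0xC0) need = 2;
        else if ((b & 0xF0) == 0xE0) need = 3;
        else if ((b & 0xF8) == 0xF0) need = 4;
        // The sequence is unfinished if it needs more bytes than are left.
        return need > back ? back : 0;
    }
    return 0;
}
```

A caller could, for example, hold back that many bytes and hand them over together with the next chunk instead of overwriting them with '?'. Whether that fits the readBytes() contract is an assumption that would need checking.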
Tags: No tags attached.
FreeCAD Information

Activities

okamotom

2015-05-29 11:56

reporter  

JapaneseLabel2.FCStd (Attachment missing)

okamotom

2015-05-29 12:00

reporter  

AfterLoadFails.png (Attachment missing)

okamotom

2015-06-09 06:35

reporter   ~0006197

Last edited: 2015-06-09 06:44

Here is my patch.

--- freecad-0.15.4671\src\Base\InputSource.org.cpp	Sun Apr 5 17:25:02 2015 UTC
+++ freecad-0.15.4671\src\Base\InputSource.cpp	Mon Jun 1 08:16:36 2015 UTC
@@ -128,21 +128,21 @@
 
 XMLSize_t StdInputStream::readBytes( XMLByte* const  toFill, const XMLSize_t maxToRead )
 {
   //
   //  Read up to the maximum bytes requested. We return the number
   //  actually read.
   //
   
   stream.read((char *)toFill,maxToRead);
   XMLSize_t len = stream.gcount();
-
+#if 0
   // See http://de.wikipedia.org/wiki/UTF-8#Kodierung
   for (XMLSize_t i=0; i<len; i++) {
       XMLByte& b = toFill[i];
       int seqlen = 0;
 
       if ((b & 0x80) == 0) {
           seqlen = 1;
       }
       else if ((b & 0xE0) == 0xC0) {
           seqlen = 2;
@@ -162,21 +162,21 @@
       for(int j = 1; j < seqlen; ++j) {
           i++;
           XMLByte& c = toFill[i];
           // range of second, third or fourth byte
           if ((c & 0xC0) != 0x80) {
               b = '?';
               c = '?';
           }
       }
   }
-
+#endif
   return len;
 }
 #endif

wmayer

2015-09-19 20:57

administrator   ~0006436

The check was added more than four years ago: http://free-cad.svn.sourceforge.net/viewvc/free-cad/trunk/src/Base/InputSource.cpp?r1=3431&r2=4539

I can't remember why it was added, but if I recall correctly, xerces had problems with invalid characters in XML, which led to strange behaviour or even a crash.

wmayer

2015-09-19 21:21

administrator   ~0006438

If a project file contains a broken (Gui)Document.xml, an exception is raised and there is no chance to continue loading the rest.

The check is supposed to replace invalid characters with '?' so that xerces (or whoever) doesn't raise an exception and the project can be fully loaded.

In bug 0000412 I found an example that doesn't load with the check commented out.

wmayer

2015-09-20 07:32

administrator   ~0006439

Here is an example to check if utf-8 input has errors: http://stackoverflow.com/questions/18227530/check-if-utf-8-string-is-valid-in-qt

When using QTextCodec::toUnicode it reports a couple of invalid characters.

wmayer

2015-09-20 09:16

administrator   ~0006440

The point is that the method StdInputStream::readBytes doesn't get the whole XML file but only chunks of 49152 bytes. The problem is that a buffer boundary can fall in the middle of a multi-byte character. The next time the method is called, the check simply starts at the beginning of the new buffer, so the first character appears to be broken.

So, to fix this issue, the check has to remember how many bytes of the current sequence are still missing at the end of one call and continue from there on the next call.
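
The idea of carrying the number of missing bytes from one call to the next could be sketched as follows. This is only an illustration under assumptions, not the code from the attached changeset: the class `Utf8ChunkChecker` and its members are hypothetical names, and the check only looks at lead/continuation bit patterns (it does not reject overlong encodings or surrogates).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: a UTF-8 check that survives chunk boundaries.
class Utf8ChunkChecker {
public:
    // Replaces malformed bytes in buf[0..len) with '?' and returns how many
    // bytes were replaced. pending_ is kept between calls, so a sequence that
    // started in the previous chunk may continue at the start of this one.
    std::size_t sanitize(unsigned char* buf, std::size_t len)
    {
        assert(buf != nullptr || len == 0);
        std::size_t replaced = 0;
        std::size_t start = 0;  // first byte of the open sequence within this chunk
        for (std::size_t i = 0; i < len; ++i) {
            const unsigned char b = buf[i];
            if (pending_ > 0) {
                if ((b & 0xC0) == 0x80) {   // expected continuation byte
                    --pending_;
                    continue;
                }
                // Sequence cut short: blank the part that lies in this chunk.
                // Bytes already handed out with an earlier chunk cannot be fixed.
                for (std::size_t k = start; k < i; ++k) {
                    buf[k] = '?';
                    ++replaced;
                }
                pending_ = 0;
            }
            const int n = sequenceLength(b);
            if (n == 0) {                   // stray continuation or invalid lead byte
                buf[i] = '?';
                ++replaced;
            }
            else {
                pending_ = n - 1;
                start = i;
            }
        }
        return replaced;
    }

    // Number of continuation bytes still expected from the next chunk.
    int pending() const { return pending_; }

private:
    static int sequenceLength(unsigned char b)
    {
        if ((b & 0x80) == 0x00) return 1;
        if ((b & 0xE0) == 0xC0) return 2;
        if ((b & 0xF0) == 0xE0) return 3;
        if ((b & 0xF8) == 0xF0) return 4;
        return 0;
    }

    int pending_ = 0;
};
```

With one checker instance per stream, a chunk ending in the lead byte of a 3-byte character leaves `pending()` at 2, and the two continuation bytes at the start of the next chunk are accepted. A fresh checker applied to that second chunk alone would replace them with '?', which matches the behaviour reported in this issue.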

wmayer

2015-09-20 09:21

administrator   ~0006441

Note to myself:
Example to check buffer with Qt functions:

  // Decode the buffer as UTF-8 and keep the decoder state for inspection.
  QTextCodec::ConverterState state;
  state.flags |= QTextCodec::IgnoreHeader;          // ignore a byte-order mark
  state.flags |= QTextCodec::ConvertInvalidToNull;  // invalid input becomes '\0'
  QTextCodec *codec = QTextCodec::codecForName("UTF-8");
  const QString text = codec->toUnicode((char *)toFill, len, &state);
  if (state.remainingChars > 0) {
      // The buffer ends inside a multi-byte sequence.
      std::cerr << "Not a complete UTF-8 sequence: " << state.remainingChars << std::endl;
  }
  if (state.invalidChars > 0) {
      std::cerr << "Not a valid UTF-8 sequence: " << state.invalidChars << std::endl;
      // Re-encode the text and replace the null placeholders with '?'.
      QByteArray ba = codec->fromUnicode(text);
      for (int i=0; i<ba.length(); i++) {
          toFill[i] = ba[i];
          if (toFill[i] == '\0')
              toFill[i] = '?';
      }
      return ba.length();
  }
  return len;

Related Changesets

FreeCAD: master 1dad42c2

2015-09-20 12:29:05

wmayer

+ fixes 0002126: Fails to load .FCStd file which contains many MultiByte-Char strings
Affected Issues: 0002126
mod - src/Base/InputSource.cpp
mod - src/Base/InputSource.h
Issue History

Date Modified Username Field Change
2015-05-29 11:56 okamotom New Issue
2015-05-29 11:56 okamotom File Added: JapaneseLabel2.FCStd
2015-05-29 12:00 okamotom File Added: AfterLoadFails.png
2015-06-09 06:35 okamotom Note Added: 0006197
2015-06-09 06:36 okamotom Note Edited: 0006197
2015-06-09 06:40 okamotom Note Edited: 0006197
2015-06-09 06:44 okamotom Note Edited: 0006197
2015-06-09 06:44 okamotom Note Edited: 0006197
2015-09-19 20:57 wmayer Note Added: 0006436
2015-09-19 21:21 wmayer Note Added: 0006438
2015-09-20 07:32 wmayer Note Added: 0006439
2015-09-20 09:16 wmayer Note Added: 0006440
2015-09-20 09:21 wmayer Note Added: 0006441
2015-09-20 10:29 wmayer Changeset attached => FreeCAD Master master 1dad42c2
2015-09-20 10:29 wmayer Assigned To => wmayer
2015-09-20 10:29 wmayer Status new => closed
2015-09-20 10:29 wmayer Resolution open => fixed
2015-12-15 13:15 yorik Fixed in Version => 0.16