Support divelogs.de exports that include Cyrillic characters

divelogs.de sends us XML files that explicitly state that they are in
ISO-8859-1 encoding (which is true). These files contain the HTML encoded
Cyrillic characters. Once we decode those characters the resulting file is
actually UTF-8 encoded (which is a superset of ISO-8859-1). That seriously
confuses libxml when it tries to parse things.

So instead recognize divelogs.de files and skip the encoding declaration
for them before decoding the HTML encoded non-ISO-8859-1 characters.

This does show, however, that divelogs.de incorrectly truncates the
encoded strings (at least in some sample data that I created the parsing
throws errors because of that).

Reported-by: Sergey Starosek <sergey.starosek@gmail.com>
Based-on-code-by: Miika Turkia <miika.turkia@gmail.com>
Signed-off-by: Dirk Hohndel <dirk@hohndel.org>
This commit is contained in:
Miika Turkia 2013-03-15 19:02:14 +02:00 committed by Dirk Hohndel
parent 98d769a02f
commit 757791335f
2 changed files with 24 additions and 2 deletions

View file

@ -1,7 +1,7 @@
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:strip-space elements="*"/>
<xsl:output method="xml" indent="yes"/>
<xsl:output method="xml" indent="no" encoding="UTF-8" omit-xml-declaration="yes"/>
<xsl:template match="/">
<divelog program='subsurface-import' version='2'>