mirror of https://github.com/subsurface/subsurface.git (synced 2024-11-30 22:20:21 +00:00)
git parser: handle left-over multi-line quoted strings better

The git save format is designed to be entirely line-based, where all
the dive data is on individual lines that are independent.

That is very much by design, so that you can merge these files
automatically, and not worry about what it does to the context
(contrast this to structured files like JSON or XML, where you have
multiple levels of indentation, and the context of a line matters).

So the parser can just ignore any conflict markers, and parse
everything one line at a time.

Well, almost.

We do have *one* special form of multi-line context, where flowed text
(think things like dive notes) will have one "header line" that starts
the note, and then it can continue for several lines until the final
line that ends the quote.

In such a situation, the dive merging can result in a partially merged
string note, which has the ending line from one dive, and then
continues with more string data from the other dive.

That will confuse our parser mightily, because it will have seen the
end of the string, and parsed the rest of those string comments as
garbage lines. That part in itself is fine - the garbage lines won't
pass as any real data (because they don't start with a proper
keyword), but while parsing that garbage the *next* end of the string
will be seen as a start of a new string.

And *that* then confuses the git parser to think that the line after
that is now part of the string, and so it won't correctly parse the
non-string line that follows.

To give a more concrete example, the git dive data (here indented and
abbreviated) might look like this:

    suit "5mm long + 3mm hooded vest"
    notes "First boat dive.
        Giant-stride entry."
        Saw a turtle."
    cylinder vol=10.0l description="10.0ℓ" depth=66.019m

where the two notes from the two dives were

    notes "First boat dive.
        Giant-stride entry"

and

    notes "First boat dive.
        Saw a turtle."

respectively, and the merged result contained parts of both.

When we parse this, we will parse the 'notes' line as having the
string

    First boat dive.
    Giant-stride entry

which is fine. But then the next line will be that

        Saw a turtle."

and now the ending double quote character on that line will be seen as
the beginning of a new string, and the cylinder information on the
next line will then be mixed up.

The resulting mess will be ignored, but in the process the data on the
"cylinder" line will basically have been lost.

There are several ways to deal with this, but this particular fix
depends on the fact that we can recognize stale string continuation
lines: they are either empty (for an empty line), or they start with a
TAB character.

So to solve the problem with the mis-identified end quote, this
recognizes that we're in such a "stale left-over comment line"
context, and will just skip such lines entirely.

That does mean that when you have conflicts in dive note sections due
to having edited the dive concurrently on different machines, you may
just lose some of the edits. But this way at least you shouldn't lose
any other data due to the merge conflict.

NOTE! We could try to improve on this by instead noticing that an "end
of multi-line string has a continuation entry on the next line", and
just say "ok, that wasn't a real end after all".

But that would be an independent thing anyway - this "ignore stale
text comment lines" logic would be required anyway, in case those
stale text comments ended up somewhere *else* than right after another
text line. So do this more important fix first.

Reported-by: Michael Werle
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
parent d68fd2922c
commit 05e1294be9
1 changed file with 16 additions and 0 deletions
@@ -1365,6 +1365,22 @@ static unsigned parse_one_line(const char *buf, unsigned size, line_fn_t *fn, st
 	char line[MAXLINE + 1];
 	int off = 0;
 
+	// Check the first character of a line: an empty line
+	// or a line starting with a TAB is invalid, and likely
+	// due to an early string end quote due to a merge
+	// conflict. Ignore such a line.
+	switch (*p) {
+	case '\n': case '\t':
+		do {
+			if (*p++ == '\n')
+				break;
+		} while (p < end);
+		SSRF_INFO("git storage: Ignoring line '%.*s'", (int)(p-buf-1), buf);
+		return p - buf;
+	default:
+		break;
+	}
+
 	while (p < end) {
 		char c = *p++;
 		if (c == '\n')