git parser: handle left-over multi-line quoted strings better

The git save format is designed to be entirely line-based, where all the
dive data is on individual lines that are independent.

That is very much by design, so that you can merge these files
automatically, and not worry about what it does to the context (contrast
this to structured files like JSON or XML, where you have multiple
levels of indentation, and the context of a line matters).

So the parser can just ignore any conflict markers, and parse everything
one line at a time.

Well, almost.

We do have *one* special form of multi-line context, where flowed text
(think things like dive notes) will have one "header line" that starts
the note, and then it can continue for several lines until the final
line that ends the quote.

In such a situation, the dive merging can result in a partially merged
string note, which has the ending line from one dive, and then continues
with more string data from the other dive.

That will confuse our parser mightily, because it will have seen the end
of the string, and parsed the rest of those string comments as garbage lines.

That part in itself is fine - the garbage lines won't pass as any real
data (because they don't start with a proper keyword), but while parsing
that garbage the *next* end of the string will be seen as a start of a
new string.

And *that* then confuses the git parser to think that the line after
that is now part of the string, and so it won't correctly parse the
non-string line that follows.

To give a more concrete example, the git dive data (here indented and
abbreviated) might look like this:

	suit "5mm long + 3mm hooded vest"
	notes "First boat dive.
		Giant-stride entry."
		Saw a turtle."
	cylinder vol=10.0l description="10.0ℓ" depth=66.019m

where the two notes from the two dives were

	notes "First boat dive.
		Giant-stride entry"

and

	notes "First boat dive.
		Saw a turtle."

respectively, and the merged result contained parts of both.

When we parse this, we will parse the 'notes' line as having the string

	First boat dive.
	Giant-stride entry

which is fine. But then the next line will be that

	Saw a turtle."

and now the ending double quote character on that line will be seen as
the beginning of a new string, and the cylinder information on the next
line will then be mixed up.  The resulting mess will be ignored, but in
the process the data on the "cylinder" line will basically have been
lost.

There are several ways to deal with this, but this particular fix
depends on the fact that we can recognize stale string continuation
lines: they are either empty (for an empty line), or they start with a
TAB character.

So to solve the problem with the mis-identified end quote, this
recognizes that we're in such a "stale left-over comment line" context,
and will just skip such lines entirely.

That does mean that when you have conflicts in dive note sections due to
having edited the dive concurrently on different machines, you may just
lose some of the edits.

But this way at least you shouldn't lose any other data due to the merge
conflict.

NOTE! We could try to improve on this by instead noticing that a "end of
multi-line string has a continuation entry on the next line", and just
say "ok, that wasn't a real end after all".

But that would be an independent thing anyway - this "ignore stale text
comment lines" logic would be required anyway, in case those stale text
comments ended up somewhere *else* than right after another text line.

So do this more important fix first.

Reported-by: Michael Werle
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This commit is contained in:
Linus Torvalds 2022-07-30 20:47:43 -07:00 committed by Dirk Hohndel
parent d68fd2922c
commit 05e1294be9

View file

@ -1365,6 +1365,22 @@ static unsigned parse_one_line(const char *buf, unsigned size, line_fn_t *fn, st
char line[MAXLINE + 1];
int off = 0;
// Check the first character of a line: an empty line
// or a line starting with a TAB is invalid, and likely
// due to an early string end quote due to a merge
// conflict. Ignore such a line.
switch (*p) {
case '\n': case '\t':
do {
if (*p++ == '\n')
break;
} while (p < end);
SSRF_INFO("git storage: Ignoring line '%.*s'", (int)(p-buf-1), buf);
return p - buf;
default:
break;
}
while (p < end) {
char c = *p++;
if (c == '\n')