Metadata: rudimentary support for XMP metadata in MP4-based videos

XMP is a media-metadata standard based on XML which may be used
across a variety of media formats. Some video-processing software
writes XMP data without updating the native metadata fields.
Therefore, we should aim to read XMP metadata and give XMP data
priority over native fields.

Pros:
	- Support for *all* common media formats.
Cons:
	- XML (complex, verbose, chaotic).
	- Does not even come close to fulfilling its promise of being
	  well defined (see below).

Implement a simple XMP-parser using libxml2. Connect the XMP-parser to
the existing Quicktime/MP4 parser.

First problem encountered: According to the spec, XMP data is supposed
to be put in the 'XMP_' atom. But exiftool, for example, instead
writes a 'uuid' atom with a special 16-byte uid. Implement both;
more options will probably follow.
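
A minimal sketch of the two container variants described above, not the
code of this commit. The helper name xmp_payload_offset and its return
convention are hypothetical; only the atom names and the 16-byte
identifier are taken from the commit itself.

```cpp
#include <cstddef>
#include <cstring>

// The 16-byte identifier that marks a 'uuid' atom as carrying XMP data.
static const unsigned char xmp_uuid[16] = {
	0xBE, 0x7A, 0xCF, 0xCB, 0x97, 0xA9, 0x42, 0xE8,
	0x9C, 0x71, 0x99, 0x94, 0x91, 0xE3, 0xAF, 0xAC
};

// Hypothetical helper: given an atom's 4-byte type and its payload,
// return the offset at which the XMP packet starts, or -1 if the atom
// is neither of the two known XMP containers.
static long xmp_payload_offset(const char *type, const unsigned char *payload, size_t size)
{
	// Variant 1: the 'XMP_' atom, whose payload is the XMP packet itself.
	if (!std::memcmp(type, "XMP_", 4))
		return 0;
	// Variant 2: a 'uuid' atom whose first 16 bytes are the XMP identifier.
	if (!std::memcmp(type, "uuid", 4) && size > 16 && !std::memcmp(payload, xmp_uuid, 16))
		return 16;
	return -1;
}
```

Returning an offset instead of parsing in place keeps the container
detection separate from the XML parsing, so further container variants
would only need another branch here.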

Second problem: two ways of recording the creation date were found:
  1) The content of an <exif:DateTimeOriginal> tag.
  2) The xmp:CreateDate attribute of an <rdf:Description> tag.

Here too, more versions are expected to surface and will have
to be supported in due course (with an obvious priority problem).
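
The priority problem can be sketched as follows. This is an
illustration under assumptions, not the parser added by this commit:
the struct xmp_dates, both function names, and the choice to prefer
exif:DateTimeOriginal over xmp:CreateDate are hypothetical, and the date
parser assumes the form "YYYY-MM-DDThh:mm:ss" and ignores any fractional
seconds or time-zone suffix.

```cpp
#include <cstdio>
#include <string>

// Days between 1970-01-01 and the given civil date (proleptic Gregorian).
static long long days_from_civil(int y, int m, int d)
{
	y -= m <= 2;	// treat January and February as part of the previous year
	const long long era = (y >= 0 ? y : y - 399) / 400;
	const long long yoe = y - era * 400;	// year of era, 0..399
	const long long doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	const long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
	return era * 146097 + doe - 719468;
}

// Parse "YYYY-MM-DDThh:mm:ss" into seconds since the UNIX epoch.
// Returns 0 on failure; trailing characters are ignored (assumption).
static long long parse_iso_datetime(const char *s)
{
	int y, mo, d, h, mi, se;
	if (std::sscanf(s, "%d-%d-%dT%d:%d:%d", &y, &mo, &d, &h, &mi, &se) != 6)
		return 0;
	if (mo < 1 || mo > 12 || d < 1 || d > 31 ||
	    h < 0 || h > 23 || mi < 0 || mi > 59 || se < 0 || se > 60)
		return 0;
	return days_from_civil(y, mo, d) * 86400 + h * 3600 + mi * 60 + se;
}

// Hypothetical container for the two date sources; empty means "not found".
struct xmp_dates {
	std::string exif_date_time_original;	// content of <exif:DateTimeOriginal>
	std::string xmp_create_date;		// xmp:CreateDate attribute
};

// Try the sources in a fixed order and return the first that parses.
// The order used here is an assumption made for this sketch.
static long long pick_creation_time(const xmp_dates &d)
{
	if (long long t = parse_iso_datetime(d.exif_date_time_original.c_str()))
		return t;
	if (long long t = parse_iso_datetime(d.xmp_create_date.c_str()))
		return t;
	return 0;
}
```

Keeping the order in one function means that supporting a further
variant only requires one more field and one more line in
pick_creation_time.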

Signed-off-by: Berthold Stoeger <bstoeger@mail.tuwien.ac.at>
Berthold Stoeger 2018-09-15 19:11:01 +02:00 committed by Dirk Hohndel
parent 0aab39b35d
commit cc4f48be3f
4 changed files with 185 additions and 4 deletions

@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: GPL-2.0
 #include "metadata.h"
+#include "xmp_parser.h"
 #include "exif.h"
 #include "qthelper.h"
 #include <QString>
@@ -111,6 +112,15 @@ static bool parseExif(QFile &f, struct metadata *metadata)
 	}
 }
+// Parse an embedded XMP block. Note that this is likely generated by
+// external tools and therefore we give XMP data priority over
+// native metadata.
+static void parseXMP(const char *data, size_t size, metadata *metadata)
+{
+	if (timestamp_t timestamp = parse_xmp(data, size))
+		metadata->timestamp = timestamp;
+}
 static bool parseMP4(QFile &f, metadata *metadata)
 {
 	f.seek(0);
@@ -170,8 +180,9 @@ static bool parseMP4(QFile &f, metadata *metadata)
 		if (!memcmp(type, "moov", 4) ||
 		    !memcmp(type, "trak", 4) ||
-		    !memcmp(type, "mdia", 4)) {
-			// Recurse into "moov", "trak" and "mdia" atoms
+		    !memcmp(type, "mdia", 4) ||
+		    !memcmp(type, "udta", 4)) {
+			// Recurse into "moov", "trak", "mdia" and "udta" atoms
 			atom_stack.push_back(atom_size);
 			continue;
 		} else if (!memcmp(type, "mdhd", 4) && atom_size >= 24 && atom_size < 4096) {
@@ -203,10 +214,30 @@ static bool parseMP4(QFile &f, metadata *metadata)
 			metadata->duration.seconds = lrint((double)duration / timescale);
 			// Timestamp is given as seconds since midnight 1904/1/1. To be convertible to the UNIX epoch
 			// it must be larger than 2082844800.
-			if (timestamp >= 2082844800) {
+			// Note that we only set timestamp if not already set, because we give priority to XMP data.
+			if (!metadata->timestamp && timestamp >= 2082844800) {
 				metadata->timestamp = timestamp - 2082844800;
-				// Currently, we only know how to extract timestamps, so we might just quit parsing here.
+				// We got our timestamp and duration. Nevertheless, we continue
+				// parsing, as there might still be an XMP atom.
 			}
+		} else if (!memcmp(type, "XMP_", 4) && atom_size > 32 && atom_size < 100000) {
+			// Parse embedded XMP data.
+			std::vector<char> d(atom_size);
+			if (f.read(&d[0], atom_size) != static_cast<int>(atom_size))
+				break;
+			parseXMP(&d[0], atom_size, metadata);
+		} else if (!memcmp(type, "uuid", 4) && atom_size > 32 && atom_size < 100000) {
+			// UUID atoms with uid "BE7ACFCB97A942E89C71999491E3AFAC" contain XMP blocks
+			// according to the JPEG 2000 standard. exiftool produces mp4-style videos with such
+			// a UUID atom.
+			std::vector<char> d(atom_size);
+			if (f.read(&d[0], atom_size) != static_cast<int>(atom_size))
+				break;
+			static const char xmp_uid[17] = "\xBE\x7A\xCF\xCB\x97\xA9\x42\xE8\x9C\x71\x99\x94\x91\xE3\xAF\xAC";
+			if (!memcmp(&d[0], xmp_uid, 16)) {
+				parseXMP(&d[16], atom_size - 16, metadata);
+			}
 		} else {
 			// Jump over unknown atom