Meta Tags to Microformats

Earlier today, Jamie Tanna announced the opengraph-mf2 library and hosted project. It takes OpenGraph meta tags and converts them to microformats.

I do the same thing as one of the many pieces of my somewhat messy Parse This library. Parse This, which is designed to feed WordPress plugins, forms the basis of the reply-contexts in the Post Kinds plugin, the parsing for the Yarns Microsub plugin, and my newly released bookmarks plugin. In all cases, it tries to extract as much data as it can about the URL sent to it and return it as microformats2 JSON or the simplified jf2 format.

Jamie’s code is a simple 80 lines that takes a few tags and tries to convert them. I went through every meta tag I could find by looking at dozens of different sites, so I was inspired to document them.

First of all, if you look at MDN’s definition of the meta element, it states that if the name attribute is set, the meta element provides document-level metadata that applies to the entire page, but if the itemprop attribute is set, it provides user-defined metadata. The content attribute contains the value for the name attribute. There is no mention of the property attribute in the HTML spec, but it is used by the OpenGraph protocol.

I take the name, property, or itemprop attribute and use it as the key in an associative array, with content as the value. For names with CURIE-style prefixes (separated by a colon, as in og:image:width), I split on the colon to create a nested array, which is what I use to map properties.
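A minimal Python sketch of that key/value step, using only the standard library's html.parser. It is not Parse This itself (which is PHP); the class name MetaTagParser, the lookup helper, and the _value key that holds a tag's own content are hypothetical. The _value key is needed because og:image and og:image:width both live under the same og, image branch. Repeated tags are kept as lists.

```python
from html.parser import HTMLParser

VALUE = "_value"  # hypothetical key holding the content of the exact tag name


class MetaTagParser(HTMLParser):
    """Collect <meta> tags into a nested dict, splitting names on ':'."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # name, property (OpenGraph) or itemprop supplies the key
        key = attrs.get("name") or attrs.get("property") or attrs.get("itemprop")
        content = attrs.get("content")
        if not key or content is None:
            return
        # Walk (or create) one nested level per colon-separated part
        node = self.meta
        for part in key.strip().lower().split(":"):
            node = node.setdefault(part, {})
        # Repeated tags (several og:image or article:tag) accumulate in a list
        node.setdefault(VALUE, []).append(content)


def lookup(meta, key):
    """Return every content value stored for a full name such as og:image."""
    node = meta
    for part in key.strip().lower().split(":"):
        node = node.get(part)
        if node is None:
            return []
    return node.get(VALUE, [])
```

To use it, feed the page's HTML, call close(), then read values with lookup(parser.meta, "og:image").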

There are also classic meta names that are longstanding and defined in the HTML specification, such as author, description, and keywords. If nothing else, these can provide some basic information.

Moving up a level to OpenGraph, there are several common metadata fields, namespaced with og.

  • og:title – this would map to p-name
  • Media – Some media properties have a :secure_url variant for the HTTPS version of the URL. This is still used, although its usefulness today is sometimes questionable.
    • og:image – this would map to u-photo
    • og:video – this would map to u-video
    • og:audio – this would map to u-audio
  • og:url – this would map to u-url
  • og:description – this would map to p-summary
  • og:latitude and og:longitude map to the equivalent location properties
  • og:type – The type is harder to map directly, but can be used as a hint. An article type would be considered an h-entry, profile would be an h-card, and music and video types would be h-cite.
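A hedged Python sketch of the og mapping in the list above. It assumes the tags have already been collected into a flat dict mapping each full name (such as og:title) to a list of content values. The table names, the preference for the :secure_url variant, and the h-entry fallback for types that give no hint (such as website) are assumptions for illustration, not something OGP or microformats define.

```python
# Direct OpenGraph -> microformats2 property names (mf2 names without prefix)
OG_SIMPLE = {
    "og:title": "name",
    "og:url": "url",
    "og:description": "summary",
    "og:latitude": "latitude",
    "og:longitude": "longitude",
}
# Media properties, each of which may also carry a :secure_url variant
OG_MEDIA = {"image": "photo", "video": "video", "audio": "audio"}


def og_type_to_mf2(og_type):
    """Use og:type as a hint for the microformats2 object type."""
    if og_type == "article":
        return "h-entry"
    if og_type == "profile":
        return "h-card"
    # music.song, music.album, video.movie, video.episode, ...
    if og_type.split(".")[0] in ("music", "video"):
        return "h-cite"
    return "h-entry"  # assumption: default when the type gives no useful hint


def og_to_mf2(meta):
    """meta maps full names (e.g. 'og:title') to lists of content values."""
    props = {}
    for og_key, mf2_key in OG_SIMPLE.items():
        if meta.get(og_key):
            props[mf2_key] = list(meta[og_key])
    for og_name, mf2_key in OG_MEDIA.items():
        # Prefer the HTTPS variant when the page provides one
        urls = meta.get(f"og:{og_name}:secure_url") or meta.get(f"og:{og_name}")
        if urls:
            props[mf2_key] = list(urls)
    og_type = (meta.get("og:type") or [""])[0]
    return {"type": [og_type_to_mf2(og_type)], "properties": props}
```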

Of the various types, music and video types are not really represented well in Microformats. So let’s focus on article first.

  • article:published_time – mapped to the dt-published property
  • article:modified_time – mapped to the dt-updated property
  • article:author – mapped to the author property

Many of the types have a tag property that can hold one or more tags, which get mapped to category.
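The article properties and tags can be handled the same way. This sketch assumes the same kind of flat dict of full names to lists of values; matching every key ending in :tag is an assumption meant to cover article:tag as well as the book:tag and video:tag properties other OGP types define.

```python
ARTICLE_TO_MF2 = {
    "article:published_time": "published",
    "article:modified_time": "updated",
    "article:author": "author",
}


def article_to_mf2(meta):
    """meta maps full names (e.g. 'article:tag') to lists of content values."""
    props = {}
    for og_key, mf2_key in ARTICLE_TO_MF2.items():
        if meta.get(og_key):
            props[mf2_key] = list(meta[og_key])
    # Any <type>:tag property (article:tag, book:tag, video:tag) feeds category
    tags = [tag for key, values in meta.items() if key.endswith(":tag") for tag in values]
    if tags:
        props["category"] = tags
    return props
```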

Jamie opted to map the Twitter namespace properties as a secondary source. I opted not to. The namespace comes from Twitter’s Cards specification, which is really just another OGP namespace. The problem is that it doesn’t provide an author name or website, only a Twitter handle. The majority of sites I viewed had both the og and the twitter namespaces, and I never got anything from the twitter namespace that wasn’t in the og namespace except Twitter-specific details, which I wasn’t interested in. Facebook was responsible for OGP, and most people want to cover both sites, so they include both.

I did opt to look for the custom namespace for FourSquare venues, playfoursquare, for latitude and longitude. I also treated the presence of the namespace as an indication of a FourSquare venue, and therefore an h-card.

  • playfoursquare:location:latitude – maps to p-latitude
  • playfoursquare:location:longitude – maps to p-longitude
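A small sketch of the FourSquare handling, again assuming a flat dict of full tag names to lists of values. Returning None when the namespace is absent is a choice made for illustration.

```python
def foursquare_to_mf2(meta):
    """Return an h-card if any playfoursquare: tag is present, else None."""
    # The namespace itself is treated as the signal that this is a venue
    if not any(key.startswith("playfoursquare:") for key in meta):
        return None
    props = {}
    for coord in ("latitude", "longitude"):
        values = meta.get(f"playfoursquare:location:{coord}")
        if values:
            props[coord] = list(values)
    return {"type": ["h-card"], "properties": props}
```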

After the OGP tags, I also looked for some other common meta tag names.

Some academic sources use Dublin Core properties in meta tags:

  • DC.Creator – p-author
  • DC.Title – p-name
  • DC.Date – dt-published
  • DC.Date.modified – dt-updated
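The Dublin Core names can go through a similar table. In this sketch the match is case-insensitive, an assumption made because capitalization of these names is not guaranteed to be consistent between sites; only the four names listed above are mapped.

```python
# Lowercased Dublin Core names -> microformats2 property names
DC_TO_MF2 = {
    "dc.creator": "author",
    "dc.title": "name",
    "dc.date": "published",
    "dc.date.modified": "updated",
}


def dublin_core_to_mf2(meta):
    """meta maps full names (e.g. 'DC.Creator') to lists of content values."""
    props = {}
    for key, values in meta.items():
        mf2_key = DC_TO_MF2.get(key.lower())
        if mf2_key and values:
            props.setdefault(mf2_key, []).extend(values)
    return props
```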

Parse.ly, which is part of WordPress VIP, has its own markup.

  • parsely-title – p-name
  • parsely-link – u-url
  • parsely-image-url – u-photo
  • parsely-type – post is h-entry, index would be h-feed
  • parsely-pub-date – publication date (dt-published)
  • parsely-author – p-author
  • parsely-tags – p-category
  • parsely-metadata – a JSON-encoded property they also offer for other fields
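A sketch of the Parse.ly mapping, assuming the same flat dict of names to lists of values. Two further assumptions are labeled in the code: parsely-tags is treated as a single comma-separated string, and a missing or unrecognized parsely-type falls back to h-entry. The decoded parsely-metadata JSON is returned separately rather than mapped, since its fields vary from site to site.

```python
import json

PARSELY_TO_MF2 = {
    "parsely-title": "name",
    "parsely-link": "url",
    "parsely-image-url": "photo",
    "parsely-pub-date": "published",
    "parsely-author": "author",
}
PARSELY_TYPES = {"post": "h-entry", "index": "h-feed"}


def parsely_to_mf2(meta):
    """Return (mf2 item, decoded parsely-metadata) for a flat meta dict."""
    props = {}
    for key, mf2_key in PARSELY_TO_MF2.items():
        if meta.get(key):
            props[mf2_key] = list(meta[key])
    if meta.get("parsely-tags"):
        # Assumption: tags arrive as one comma-separated string
        tags = meta["parsely-tags"][0].split(",")
        props["category"] = [tag.strip() for tag in tags if tag.strip()]
    extra = {}
    if meta.get("parsely-metadata"):
        try:
            extra = json.loads(meta["parsely-metadata"][0])
        except json.JSONDecodeError:
            pass  # malformed JSON: ignore the extra fields
    parsely_type = (meta.get("parsely-type") or [""])[0]
    # Assumption: treat unknown or missing types as h-entry
    item = {"type": [PARSELY_TYPES.get(parsely_type, "h-entry")], "properties": props}
    return item, extra
```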

I also convert JSON-LD to microformats, but that’s another story.

 

Chapel Trail Nature Preserve
450-acre passive park that was established in the 1990s. The wetlands have become home to 120 species of birds, deer, marsh rabbits, alligators, snakes, turtles, largemouth bass, and insects. This nature preserve includes a 1,650-foot long boardwalk, a pavilion for observation, and canoe rentals on Saturdays.
Been hanging around my grandparent’s apartment, where I’m visiting, setting up the infrastructure I had to move out of my parent’s apartment due to renovation. Added some new tricks: running a WireGuard gateway off a travel router, and pumping my DVR back at home through it so I can watch TV. Also relocated the weather station, which has been offline since May; I still need to get the sensors back online.
Going through my list of itineraries over the last decade to add limited records of where I’ve been to my site. I think I also have paper tickets and old boarding passes somewhere that I can merge in. Right now, these are just simple posts with the location, with the time keyed to the departure time of the flight.
Replied to https://twitter.com/mterenzio/status/1470064609876975618 by Matt (Twitter)

Well, the IndieWeb folks and the standards folks like more complex solutions (not saying they are bad) to a lot of the simple things that were working then and you lose some of the charm https://indieweb.org/Webmention

How is Webmention more complicated than Trackback? Doesn’t it just add verification? Trackback lacked any spam control.

IndieAuth for WordPress 4.2.0 Released

Decided to dive into the unknown with the IndieAuth spec. The WordPress plugin now supports the latest version of the standard, some of which has been merged and some of which is still pending merge. The changes are visible in the spec repository but have not yet been deployed to the spec page.

The first change is the introduction of the metadata endpoint. Instead of a Link header for every endpoint, there is one endpoint that lists all the other endpoints. So even if an extension like Ticket Auth (which requires another endpoint) is optional, it won’t require another header.

Micropub already works this way: the media endpoint does not have its own Link header (although there is a proposal to change that). The tradeoff is that discovery takes two requests (caching aside) instead of one.

The metadata endpoint also provides some configuration information on what the endpoints support, such as which scopes are supported, which can be useful.
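A rough Python sketch of the two-step discovery: find the metadata URL in the profile’s Link header, then read the endpoints and scopes_supported out of the metadata JSON. The rel value indieauth-metadata and the key names follow the IndieAuth spec; the function names are hypothetical, the Link header parser is deliberately simplified, the HTML link element fallback is omitted, and the network requests themselves are left out so the parsing can be shown on its own.

```python
import json
import re
from urllib.parse import urljoin


def find_link(link_header, rel, base_url):
    """Return the absolute URL of the first Link header entry with this rel."""
    # Simplified parser: assumes URLs contain no '<' or '>' characters
    for match in re.finditer(r"<([^>]*)>([^<]*)", link_header):
        url, params = match.group(1), match.group(2)
        rel_match = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        # rel may hold several space-separated values
        if rel_match and rel in rel_match.group(1).split():
            return urljoin(base_url, url)
    return None


def parse_metadata(body):
    """Split an IndieAuth metadata document into endpoints and supported scopes."""
    metadata = json.loads(body)
    endpoints = {key: value for key, value in metadata.items() if key.endswith("_endpoint")}
    return endpoints, metadata.get("scopes_supported", [])
```

The first function handles the request to the profile URL; the second handles the request to the metadata URL it returns, which is where the extra round trip comes from.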

As a result, the introspection endpoint introduced in 4.1.0 no longer shares a URL with the token endpoint. This is a side effect of needing to implement a proof of concept, as the introspection proposal has yet to be merged. Until it is, the endpoint is considered experimental.
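A sketch of an introspection call, roughly as the current spec describes it: a form-encoded token parameter POSTed to the introspection endpoint, answered with JSON containing an active flag. The spec requires the caller to be authorized in some way; using a bearer token for that is an assumption here, and the function names are hypothetical. The request is built with urllib.request.Request but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_introspection_request(endpoint, token, caller_token):
    """Build (but do not send) a POST asking whether `token` is still valid."""
    body = urlencode({"token": token}).encode()
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
            # Assumption: the caller authorizes itself with its own bearer token
            "Authorization": f"Bearer {caller_token}",
        },
    )


def token_is_active(response_body):
    """Return (active flag, full response) from an introspection JSON body."""
    info = json.loads(response_body)
    return info.get("active") is True, info
```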

The new revocation endpoint allows this feature to be separated from the token endpoint as well. The old method, revoking through the token endpoint, will still work for the foreseeable future.
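A sketch of choosing between the two revocation methods: POST the token to revocation_endpoint when the metadata advertises one, otherwise fall back to the older request to the token endpoint with action=revoke. The function name and the plain-dict shape of the metadata are assumptions; the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_revocation_request(metadata, token):
    """Build (but do not send) a revocation request for `token`."""
    if metadata.get("revocation_endpoint"):
        # Newer method: a dedicated revocation endpoint
        url, fields = metadata["revocation_endpoint"], {"token": token}
    else:
        # Older method: ask the token endpoint itself to revoke
        url, fields = metadata["token_endpoint"], {"action": "revoke", "token": token}
    return Request(
        url,
        data=urlencode(fields).encode(),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```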

The final endpoint added, the userinfo endpoint, is just a way of getting a refreshed version of the profile information returned when you make the initial request. This is also experimental until merged.
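A sketch of a userinfo request: a GET with the access token sent as a bearer token, returning JSON profile fields. The field names (name, url, photo, email) follow the userinfo section of the IndieAuth spec, which also expects the token to carry the profile scope; the function names are hypothetical and the request is built but not sent.

```python
import json
from urllib.request import Request


def build_userinfo_request(endpoint, access_token):
    """Build (but do not send) a GET for the refreshed profile information."""
    return Request(
        endpoint,
        headers={"Authorization": f"Bearer {access_token}", "Accept": "application/json"},
    )


def parse_userinfo(body):
    """Keep only the profile fields the spec defines."""
    info = json.loads(body)
    return {key: info[key] for key in ("name", "url", "photo", "email") if key in info}
```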

All of this, as well as some minor tweaks and optimizations, works and is fully backward compatible. At some point in the future, as adoption changes, I will be looking to deprecate the older methods.

All of this is a step toward making IndieAuth less a separate protocol and more what it is described as: an identity layer on top of OAuth 2.0 (or, increasingly, on top of the proposed OAuth 2.1), with the changes meaning less custom code.