Most tools that scrape web pages use the OpenGraph metadata embedded in those pages. Some fall back to the older, more general metadata tags, like description or the <title> element, but this leads to a rather limited embedding. Almost no one extracts pictures from pages unless the metadata explicitly points to them.
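To make the fallback order concrete, here is a minimal sketch of how a scraper might pick preview data. It is written in Python with only the standard library; the names PreviewParser and extract_preview are made up for this illustration and don't correspond to any particular tool.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects og:* properties, the description meta tag, and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.description = None
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            content = attrs.get("content")
            if prop.startswith("og:") and content is not None:
                self.og[prop] = content
            elif attrs.get("name") == "description":
                self.description = content
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


def extract_preview(html_text):
    """Prefer OpenGraph values; fall back to <title> and the description tag."""
    parser = PreviewParser()
    parser.feed(html_text)
    parser.close()
    return {
        "title": parser.og.get("og:title") or parser.title,
        "description": parser.og.get("og:description") or parser.description,
        # No fallback here: without og:image, this sketch finds no picture.
        "image": parser.og.get("og:image"),
    }
```

A page with only a <title> and a description tag still gets a title and a description, but the image stays empty, which is the limited embedding described above.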
Until recently, earthli didn’t include this metadata, leading to somewhat substandard rendering of any links pasted to social media.
As an example, the article NY Times Spelling Bee now includes the following OpenGraph metadata:
<meta name="twitter:image"
content="https://…/forthwith.png" />
<meta property="og:url"
content="https://…view_article.php?id=3974" />
<meta property="og:title" content="NY Times Spelling Bee" />
<meta property="og:type" content="website" />
<meta property="og:description" content="I recently
wrote that Kath and I have a one-year streak going in the
NYT Crossword Puzzle. While that is still ongoing,
we've also recently discovered a little gem
called Spelling Bee. The concept is …" />
<meta property="article:author" content="marco" />
<meta property="article:published_time"
content="2020-05-16 20:39:52" />
<meta property="article:modified_time"
content="2020-05-21 21:15:08" />
<meta property="og:image"
content="https://…/forthwith.png" />
<meta property="og:image:width" content="2562" />
<meta property="og:image:height" content="1566" />
The same article also now has Twitter metadata:
<meta name="twitter:card" content="summary_large_image" />
<meta name="twitter:site" content="" />
<meta name="twitter:creator" content="@mvonballmo" />
<meta name="twitter:title" content="NY Times Spelling Bee" />
<meta name="twitter:description" content="I recently
wrote that Kath and I have a one-year streak going in the
NYT Crossword Puzzle. While that is still ongoing,
we've also recently discovered a little gem
called Spelling Bee. The concept is …" />
Twitter needs at least its own twitter:card tag, and I didn’t see it use any of the OpenGraph information, so you really need to include both copies.
Some of the properties aren’t necessarily required, but it was easy enough to generate them all from earthli’s general facilities.
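The idea of generating both sets of tags from one set of values can be sketched as follows. This is Python, not earthli's actual PHP code, and the function name social_meta_tags and the option keys are assumptions invented for the example.

```python
from html import escape


def social_meta_tags(options):
    """Render OpenGraph and Twitter tags from one dict of options (hypothetical keys)."""
    og = [
        ("og:url", options.get("url")),
        ("og:title", options.get("title")),
        ("og:type", options.get("type", "website")),
        ("og:description", options.get("description")),
        ("og:image", options.get("image")),
    ]
    twitter = [
        # Use the large-image card only when there is an image to show.
        ("twitter:card", "summary_large_image" if options.get("image") else "summary"),
        ("twitter:creator", options.get("twitter_creator")),
        ("twitter:title", options.get("title")),
        ("twitter:description", options.get("description")),
        ("twitter:image", options.get("image")),
    ]
    lines = []
    # OpenGraph tags use the "property" attribute; Twitter tags use "name".
    for attr, pairs in (("property", og), ("name", twitter)):
        for key, value in pairs:
            if value:  # skip anything the page did not provide
                content = escape(str(value), quote=True)
                lines.append(f'<meta {attr}="{key}" content="{content}" />')
    return "\n".join(lines)
```

Escaping the content matters because titles and descriptions can contain quotes, which would otherwise end the attribute early.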
In addition, I added support for SOCIAL_PAGE_OPTIONS and introduced a method on the data hierarchy called set_social_options(), which allows the data objects to enrich the social options before they’re formatted into the metadata area of the page header. A page must enable the social options and explicitly request to generate them, a feature I only enabled from the view_entry.php and view_folder.php pages.
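The shape of that hook can be sketched roughly like this. Only the method name set_social_options comes from the description above; the classes SocialOptions, Entry and Picture, the function build_social_options, and the use of Python instead of PHP are all assumptions for illustration, not earthli's real code.

```python
from dataclasses import dataclass, field


@dataclass
class SocialOptions:
    """Hypothetical container that a page fills before rendering its header."""
    enabled: bool = False
    values: dict = field(default_factory=dict)


class Entry:
    """Hypothetical base data object."""

    def __init__(self, title, description):
        self.title = title
        self.description = description

    def set_social_options(self, options):
        options.values["title"] = self.title
        options.values["description"] = self.description


class Picture(Entry):
    """Hypothetical subclass that enriches the options with an image."""

    def __init__(self, title, description, image_url):
        super().__init__(title, description)
        self.image_url = image_url

    def set_social_options(self, options):
        super().set_social_options(options)
        options.values["image"] = self.image_url


def build_social_options(page_enabled, obj):
    """Only pages that opt in let the data object fill in the options."""
    options = SocialOptions(enabled=page_enabled)
    if options.enabled:
        obj.set_social_options(options)
    return options
```

Each level of the hierarchy adds what it knows, and a page that hasn't opted in ends up with empty options and therefore no social metadata.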
The results are shown below.
Apple Messages uses the OpenGraph tags to make nicely formatted previews now.
I haven’t actually posted anything to Facebook, but was able to use the Social-graph Testing tools to see how posts would look.
I only tested articles with Twitter because I don’t anticipate ever tweeting photos or albums. The tweet is nicely formatted now, with or without an attachment. Previously, Twitter displayed links to earthli as only a simple title, with no description or image.