NELKINDA SOFTWARE CRAFT

Workaround for Bug in LinkedInBot

LinkedInBot doesn't support content served as Content-Type: application/xhtml+xml. This article shows how to work around this LinkedIn bug in your Apache configuration.

Author:
Christian Hujer, Software Crafter and CEO / CTO of Nelkinda Software Craft Private Limited
First Published:
by Nelkinda Software Craft Private Limited
Last Modified:
by Christian Hujer
Approximate reading time:
⊖ Cannot display preview. You can post as is, or try another link.
Figure -1: Bug in LinkedInBot

1 The Specification

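The HTML 5.2 specification explicitly defines two syntaxes for transmitting HTML resources: the HTML syntax, served as Content-Type: text/html, and the XHTML syntax, served as Content-Type: application/xhtml+xml [HTML5.2 Syntax].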

The XHTML syntax has a couple of benefits for me. For example, I can use XSLT, XInclude, and other XML technologies. With those, I have set up a document preparation system comparable to LaTeX. The consequence is that my website is one of those rare websites served as Content-Type: application/xhtml+xml instead of Content-Type: text/html. That shouldn't be a problem, though, as what I do is 100% compliant with the HTML 5.2 specification [HTML5.2 Syntax].

2 The Bug in LinkedInBot

LinkedInBot has a bug related to the Content-Type: it seems to be unable to process content served as Content-Type: application/xhtml+xml. When a post on LinkedIn includes a URL that points to such a resource, LinkedIn cannot generate a preview. Instead, the user sees the error message ⊖ Cannot display preview. You can post as is, or try another link.

I have reported this bug to LinkedIn a couple of times over the past years. But the company behind LinkedIn, Microsoft, is not known to give a shit about standards, correctness, and interoperability. So, if they ain't gonna fix LinkedInBot, I have to work around it in my web server instead.

LinkedInBot's entries in the Apache access log look like this:

144.2.2.50 - - [09/May/2018:07:15:14 +0000] "GET /blog/ HTTP/1.1" 200 3940 "-" "LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)"
Listing 2-1: Sample Apache log file entry from LinkedInBot

At least LinkedInBot is easy to identify in the log files via its unique User-Agent string LinkedInBot/1.0 (compatible; Mozilla/5.0; Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com).
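To find out how often LinkedInBot visits, the access log can be filtered by that User-Agent string. The following Python sketch assumes that the log uses Apache's "combined" format, as the sample entry above appears to; the names COMBINED_LOG and is_linkedinbot are made up for this illustration.

```python
import re

# Apache "combined" log format; the last quoted field is the User-Agent.
COMBINED_LOG = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"$'
)


def is_linkedinbot(log_line: str) -> bool:
    """Return True if the User-Agent field of the log line mentions LinkedInBot."""
    match = COMBINED_LOG.match(log_line.strip())
    return bool(match) and "LinkedInBot" in match.group("user_agent")


# The sample entry from Listing 2-1.
sample = (
    '144.2.2.50 - - [09/May/2018:07:15:14 +0000] "GET /blog/ HTTP/1.1" 200 3940 '
    '"-" "LinkedInBot/1.0 (compatible; Mozilla/5.0; '
    'Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)"'
)
print(is_linkedinbot(sample))  # prints True
```

Lines that do not match the combined format are treated as not coming from LinkedInBot, so a differently configured LogFormat would need an adjusted pattern.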

3 The Workaround for Apache httpd

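The web server that I use is Apache httpd. The following addition to the .htaccess file provides a workaround for the issue in LinkedInBot. It needs the modules mod_setenvif (for BrowserMatch) and mod_headers (for Header) to be enabled.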

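# Set the environment variable is_shit if the User-Agent contains "LinkedIn".
BrowserMatch .*LinkedIn.* is_shit
# Only for those requests, replace the XHTML media type with text/html.
Header edit Content-Type application/xhtml\+xml text/html env=is_shit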
Listing 3-1: Code for .htaccess which serves Content-Type: text/html instead of application/xhtml+xml to LinkedInBot.

With this workaround, the Apache web server will serve content to LinkedInBot as Content-Type: text/html instead of Content-Type: application/xhtml+xml. That way, LinkedInBot can process the content. Note that not everything might work, though. LinkedInBot will use an HTML parser, not an XML parser. This works best if your pages, as delivered, do not rely on features that only an XML parser provides. So, do not use entities, XInclude, XSLT, or anything like that.
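One way to check whether the workaround is active is to request the same page with two different User-Agent strings and compare the Content-Type of the responses. The following Python sketch is only an illustration under assumptions: expected_content_type is a simplified model of Listing 3-1, not Apache's actual implementation, and all function names as well as the URL in the usage note are hypothetical.

```python
import re
import urllib.request

# User-Agent string taken from the log entry in Listing 2-1.
LINKEDINBOT_UA = (
    "LinkedInBot/1.0 (compatible; Mozilla/5.0; "
    "Jakarta Commons-HttpClient/3.1 +http://www.linkedin.com)"
)


def expected_content_type(user_agent: str,
                          original: str = "application/xhtml+xml") -> str:
    """Simplified model of Listing 3-1: the Content-Type a client should get."""
    if re.search(r"LinkedIn", user_agent):
        # Like "Header edit", replace only the first occurrence.
        return re.sub(r"application/xhtml\+xml", "text/html", original, count=1)
    return original


def fetch_content_type(url: str, user_agent: str) -> str:
    """Request url with the given User-Agent and return the response media type."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        # get_content_type() drops parameters such as "; charset=UTF-8".
        return response.headers.get_content_type()


def workaround_active(url: str) -> bool:
    """Return True if url is served as text/html to LinkedInBot."""
    return fetch_content_type(url, LINKEDINBOT_UA) == expected_content_type(LINKEDINBOT_UA)
```

For a page that is normally served as application/xhtml+xml, a call such as workaround_active("https://www.example.org/blog/") should return True once the .htaccess change is in place, while fetch_content_type with a regular browser User-Agent should still report application/xhtml+xml.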