Protect Shortcodes from wpautop and the likes

It is typical for any WordPress blog owner to have a few (if not many) shortcodes defined in his or her installed theme or plugins. While some shortcodes work perfectly, others, especially enclosing ones, do not. Most of the time the problem lies in WordPress’s built-in formatting functions, the most famous of which is wpautop().

There are countless support questions and articles about wpautop() and other formatting functions mangling the contents of shortcodes, but there has yet to be a definitive solution. As far as I know, there are two ways to protect shortcode contents from those functions without hacking WordPress core, and in this article I will describe both.

The Problems

Let us make a simple shortcode[1] first:

add_shortcode('copyright', 'bwp_copyright');
function bwp_copyright()
{
	return '<div class="copyright">Please cite this website if you copy any contents.</div>';
}

This is called a self-closing shortcode, i.e. a shortcode that doesn’t need a closing tag; you use it by placing [copyright] in your post content. Now try adding [copyright] to a post, hit Update or Publish, and view the post. Where you placed the shortcode you should see the correct output, <div class="copyright">Please cite this website if you copy any contents.</div>. So far, so good.

Let’s try changing this shortcode into an enclosing shortcode:

add_shortcode('copyright', 'bwp_copyright');
function bwp_copyright($atts, $content = '')
{
	return "<div>$content</div>";
}

and then put the new shortcode into the post content:

[copyright]Please cite this website if you copy any contents.[/copyright]

This shortcode will once again output what we want: <div>Please cite this website if you copy any contents.</div>. Of course, such simple tests cause no problems whatsoever, but what about more complicated ones, for example shortcodes containing HTML and special characters?
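To make the two forms concrete, here is a minimal standalone sketch of how a single tag could be expanded in both its enclosing and self-closing forms. toy_expand() is a hypothetical helper written only for this illustration; it is not WordPress’s do_shortcode(), which relies on get_shortcode_regex() and also handles attributes, escaped [[tags]] and every registered shortcode in one pass.

```php
<?php
// Hypothetical toy expander for ONE tag, not WordPress's do_shortcode().
function toy_expand($tag, callable $callback, $content)
{
    $t = preg_quote($tag, '/');
    // Enclosing form first: [tag]...[/tag], matched non-greedily across lines.
    $content = preg_replace_callback(
        '/\[' . $t . '\](.*?)\[\/' . $t . '\]/s',
        function ($m) use ($callback) { return $callback(array(), $m[1]); },
        $content
    );
    // Whatever is left is the self-closing form: [tag] with empty content.
    return preg_replace_callback(
        '/\[' . $t . '\]/',
        function ($m) use ($callback) { return $callback(array(), ''); },
        $content
    );
}

// Same body as the enclosing bwp_copyright() above.
function bwp_copyright($atts, $content = '')
{
    return "<div>$content</div>";
}

echo toy_expand('copyright', 'bwp_copyright', '[copyright]Please cite us.[/copyright]'), "\n";
echo toy_expand('copyright', 'bwp_copyright', 'A [copyright] B'), "\n";
```

Either way the callback receives ($atts, $content); for the self-closing form $content is simply empty.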

We will keep the enclosing shortcode above and only change what we type into the editor (I suggest using the HTML editor to get the expected result):

[copyright]
Please cite this website if you copy any contents.
In case you do not want to; please "contact" us at ...
[/copyright]

Now if you view your post again and look at the source code instead, you will see this:

<div><br />
Please cite this website if you copy any contents.<br />
In case you do not want to; please &#8220;contact&#8221; us at &#8230;<br />
</div>

See all those weird things added to your shortcode’s contents? This might look fine if you only need to print the contents, but what if you want to actually process them? For example, to convert all semicolons (;) to commas (,), you might write your shortcode like this:

add_shortcode('copyright', 'bwp_copyright');
function bwp_copyright($atts, $content = '')
{
	return '<div>' . str_replace(';', ',', $content) . '</div>';
}

but the result is obviously not what you want (the semicolons that terminate the HTML entities got replaced too):

<div><br />
Please cite this website if you copy any contents.<br />
In case you do not want to, please &#8220,contact&#8221, us at &#8230,<br />
</div>

Whether you view the post normally or look at its source, your shortcode’s contents are mangled. This is the result of two filters, wptexturize()[2] and wpautop()[3], hooked to the_content()[4] (they also affect the_excerpt()[5], in case you use it).
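To see why the order of the filters matters, here is a standalone sketch with hypothetical stand-ins. toy_texturize() only converts double quotes and "..." the way the output above shows (the real wptexturize() does much more), and toy_run_chain() merely mimics the idea that callbacks with lower priority numbers run first. Running the semicolon replacement after the formatter reproduces the broken entities; running it before leaves them intact.

```php
<?php
// Toy stand-in for wptexturize(): curly double quotes and ellipsis as entities.
function toy_texturize($text)
{
    $text = preg_replace('/"([^"]*)"/', '&#8220;$1&#8221;', $text);
    return str_replace('...', '&#8230;', $text);
}

// The processing our shortcode wants to do: semicolons to commas.
function toy_copyright($text)
{
    return str_replace(';', ',', $text);
}

// Hypothetical filter chain: array(priority => array(callbacks...)).
// Lower priority numbers run first; ties keep their insertion order.
function toy_run_chain(array $filters, $value)
{
    ksort($filters);
    foreach ($filters as $callbacks) {
        foreach ($callbacks as $callback) {
            $value = $callback($value);
        }
    }
    return $value;
}

$raw = 'In case you do not want to; please "contact" us at ...';

// Formatting at 10, shortcode at 11 (the default situation described above).
$late = toy_run_chain(array(10 => array('toy_texturize'), 11 => array('toy_copyright')), $raw);

// Shortcode moved to priority 7, i.e. before the formatting.
$early = toy_run_chain(array(10 => array('toy_texturize'), 7 => array('toy_copyright')), $raw);

echo $late, "\n";
echo $early, "\n";
```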

The Solutions

Enough with the problems, you say? Alright, let’s fix this annoying issue then! As stated at the beginning, there are two approaches we can take to protect shortcode contents from those formatting functions[6].

Process Shortcodes sooner

By default, shortcodes are processed after the formatting functions[7]: those are registered at the default priority of 10, while do_shortcode()[8] is registered at priority 11, which explains the weird results we got. What if we process our shortcodes before priority 10? This approach was introduced by Viper, a well-known WordPress developer, on his blog. Merging his code into ours (with some minor modifications) gives us this:

function bwp_copyright($atts, $content = '')
{
    return '<div>' . str_replace(';', ',', $content) . '</div>';
}

function pre_process_shortcode($content) {
    global $shortcode_tags;

    // Backup current registered shortcodes and clear them all out
    $orig_shortcode_tags = $shortcode_tags;
    $shortcode_tags = array();

    add_shortcode('copyright', 'bwp_copyright');

    // Do the shortcode (only the one above is registered)
    $content = do_shortcode($content);

    // Put the original shortcodes back
    $shortcode_tags = $orig_shortcode_tags;

    return $content;
}

add_filter('the_content', 'pre_process_shortcode', 7);

The above snippet should be self-explanatory, but note the use of $shortcode_tags, a global array containing all currently registered shortcodes. Now let’s test the shortcode again:

[copyright]
Please cite this website if you copy any contents.
In case you do not want to; please "contact" us at ...
[/copyright]

And this is the result when you view your post’s source:

<div><br />
Please cite this website if you copy any contents.<br />
In case you do not want to, please &#8220;contact&#8221; us at &#8230;<br />
</div>

Exactly what we want: only the semicolon after ‘want to’ got replaced.

At this point you might think this is the perfect solution, but unfortunately it has one major drawback: you will not be able to strip this shortcode later on using strip_shortcodes()[9]. The reason lies in the last step of pre_process_shortcode(): when we put the original shortcodes back, we are effectively removing the [copyright] shortcode from the registry, which makes it unstrippable and unusable from then on.

Fortunately, we can make it strippable again by registering the same shortcode with a dummy callback function, like so:

add_filter('the_content', 'bwp_add_dummy_shortcode', 12);
// the_content filter callbacks receive a single argument: the post content
function bwp_add_dummy_shortcode($content)
{
	add_shortcode('copyright', 'bwp_dummy_shortcode');
	return $content;
}
function bwp_dummy_shortcode($atts, $content = '')
{
	return $content;
}

If you try strip_shortcodes() now, it should be able to strip the [copyright] shortcode. Priority 12 ensures that the dummy shortcode is registered only after the normal do_shortcode() (priority 11) has run, which avoids some shortcode nesting problems.
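Both the drawback and the fix can be reproduced outside WordPress with a small toy model. Everything here is hypothetical: $toy_tags plays the role of the global $shortcode_tags, toy_do() and toy_strip() are simplified stand-ins for do_shortcode() and strip_shortcodes() that only understand enclosing tags, and toy_pre_process() follows the same backup, clear, register, run and restore steps as pre_process_shortcode(). The one property borrowed from the real strip_shortcodes() is that it only removes tags that are currently registered.

```php
<?php
// Toy registry standing in for the global $shortcode_tags.
$toy_tags = array('gallery' => 'toy_gallery');

function toy_gallery($atts, $content = '') { return '<div class="gallery"></div>'; }
function toy_copyright($atts, $content = '') { return '<div>' . str_replace(';', ',', $content) . '</div>'; }
function toy_dummy($atts, $content = '') { return $content; } // like bwp_dummy_shortcode()

// Pattern for the enclosing form [tag]...[/tag].
function toy_pattern($tag)
{
    $t = preg_quote($tag, '/');
    return '/\[' . $t . '\](.*?)\[\/' . $t . '\]/s';
}

// Expands every registered tag (simplified do_shortcode()).
function toy_do($content)
{
    global $toy_tags;
    foreach ($toy_tags as $tag => $callback) {
        $content = preg_replace_callback(toy_pattern($tag), function ($m) use ($callback) {
            return $callback(array(), $m[1]);
        }, $content);
    }
    return $content;
}

// Removes registered tags and their contents; unregistered tags are left alone.
function toy_strip($content)
{
    global $toy_tags;
    foreach (array_keys($toy_tags) as $tag) {
        $content = preg_replace(toy_pattern($tag), '', $content);
    }
    return $content;
}

// Backup, clear, register one tag, run, restore: same steps as pre_process_shortcode().
function toy_pre_process($content)
{
    global $toy_tags;
    $orig = $toy_tags;
    $toy_tags = array('copyright' => 'toy_copyright');
    $content = toy_do($content);
    $toy_tags = $orig; // 'copyright' is no longer registered after this line
    return $content;
}

$post = 'Hi [gallery][/gallery][copyright]a; b[/copyright]';

$early = toy_pre_process($post);      // only [copyright] is expanded
$not_stripped = toy_strip($post);     // [copyright] survives: not registered
$toy_tags['copyright'] = 'toy_dummy'; // re-register with a pass-through callback
$stripped = toy_strip($post);         // now both tags are removed
```

Note that the strip never calls the dummy callback; only the registration matters, which is why a pass-through function is enough.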

Process the Shortcode independently

This is the approach used by the majority of syntax highlighting plugins out there. Basically, we pull the contents of the shortcodes out of the post content, save them into an object or array, process them and then put them back. This approach is riskier and much more complicated, but it is well worth a look anyway.
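Here is a compact, self-contained sketch of that extract, format and restore cycle. It is an illustration under stated assumptions rather than the plugin code: the hypothetical ShortcodeStash class keeps the stored contents in an object instead of globals, toy_format() stands in for the real formatting filters, and the placeholder token comes from random_bytes() (PHP 7 or later) rather than md5(rand(0, 1000)).

```php
<?php
// Hypothetical helper: stash [copyright] bodies, leave placeholders, restore later.
class ShortcodeStash
{
    private $bodies = array();
    private $token;

    public function __construct()
    {
        // Random hex token so placeholders are unlikely to clash with post text.
        $this->token = bin2hex(random_bytes(8));
    }

    // Replace each [copyright]...[/copyright] with a numbered placeholder paragraph.
    public function extract($content)
    {
        return preg_replace_callback(
            '/\[copyright\](.*?)\[\/copyright\]/s',
            function ($m) {
                $this->bodies[] = $m[1];
                return "\n<p>" . $this->token . sprintf('%03d', count($this->bodies) - 1) . "</p>\n";
            },
            $content
        );
    }

    // Swap each placeholder for the processed body (semicolons to commas).
    public function restore($content)
    {
        return preg_replace_callback(
            '/<p>' . $this->token . '(\d{3})<\/p>/',
            function ($m) {
                $i = (int) $m[1];
                $body = isset($this->bodies[$i]) ? $this->bodies[$i] : '';
                return '<div>' . str_replace(';', ',', $body) . '</div>';
            },
            $content
        );
    }
}

// Toy formatter standing in for wptexturize(): double quotes become entities.
function toy_format($text)
{
    return preg_replace('/"([^"]*)"/', '&#8220;$1&#8221;', $text);
}

$stash = new ShortcodeStash();
$post = 'Say "hi". [copyright]Cite us; "thanks"[/copyright]';
$html = $stash->restore(toy_format($stash->extract($post)));
echo $html, "\n";
```

The quotes outside the shortcode get formatted while the stashed body keeps its plain quotes. In WordPress, extract() would run on the_content at an early priority and restore() at a late one.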

To get the contents of the [copyright] shortcode out of the post contents you will need this:

add_filter('the_content', 'bwp_before_format', 7); // 7 is simply a lucky number, nothing more =)

function bwp_build_shortcode_match($match)
{
	global $bwp_shortcode_matches, $bwp_shortcode_hash;

	$bwp_shortcode_matches[] = $match[1];

	return "\n<p>" . $bwp_shortcode_hash . sprintf("%03d", sizeof($bwp_shortcode_matches) - 1) . "</p>\n";
}

function bwp_before_format($content)
{
	return preg_replace_callback(
		"/\[copyright\](.*?)\[\/copyright\]/siu",
		"bwp_build_shortcode_match",
		$content
	);
}

Complicated, eh? You can still give up now if you want :P. The snippet above uses two global variables[10]: $bwp_shortcode_matches, which stores all shortcode contents, and $bwp_shortcode_hash, which identifies the placeholders. You will need to give them initial values:

$bwp_shortcode_matches = array();
$bwp_shortcode_hash = md5(rand(0, 1000));

The snippet below will process the shortcode contents and put them back into the placeholders:

add_filter('the_content', 'bwp_after_format', 1000); // high enough to make this the last filter ever on the_content()

function bwp_process_shortcode($identifier)
{
	global $bwp_shortcode_matches;

	$identifier = (int) $identifier[1];
	$content = (isset($bwp_shortcode_matches[$identifier])) ? $bwp_shortcode_matches[$identifier] : '';

	return '<div>' . str_replace(';', ',', $content) . '</div>';
}

function bwp_after_format($content)
{
	global $bwp_shortcode_hash;

	return preg_replace_callback(
		"/<p>" . $bwp_shortcode_hash . "(\d{3})<\/p>/siu",
		"bwp_process_shortcode",
		$content
	);
}
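A quick standalone check of the placeholder format used above. make_placeholder() and read_placeholder() are hypothetical helpers that reuse the same sprintf("%03d") format and "(\d{3})" pattern as bwp_build_shortcode_match() and bwp_after_format(). They also expose a limit worth knowing: only indexes 000 to 999 round-trip, so a 1001st [copyright] block in a single post would get a four-digit placeholder that bwp_after_format() never replaces.

```php
<?php
// Builds a placeholder the same way bwp_build_shortcode_match() does.
function make_placeholder($hash, $index)
{
    return "\n<p>" . $hash . sprintf("%03d", $index) . "</p>\n";
}

// Reads the index back with the same pattern bwp_after_format() uses.
function read_placeholder($hash, $html)
{
    return preg_match("/<p>" . $hash . "(\d{3})<\/p>/siu", $html, $m) ? (int) $m[1] : null;
}

$hash = md5(rand(0, 1000));

var_dump(read_placeholder($hash, make_placeholder($hash, 7)));    // int(7)
var_dump(read_placeholder($hash, make_placeholder($hash, 1000))); // NULL: "1000" has four digits
```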

Time to test this with our shortcode:

[copyright]
Please cite this website if you copy any contents.
In case you do not want to; please "contact" us at ...
[/copyright]

and the result when you view the post’s source is:

<div>
Please cite this website if you copy any contents.
In case you do not want to, please "contact" us at ...
</div>

It looks like no formatting function has touched your shortcode’s contents, which is simply amazing!

Of course, a great thing like this comes at a price: since no formatting function is applied to our shortcode, we lose not only the unwanted formatting but also the formatting we need, which is not good at all. To overcome this, you can modify bwp_process_shortcode() to apply whichever formatting functions you need, like so:

function bwp_process_shortcode($identifier)
{
	global $bwp_shortcode_matches;

	$identifier = (int) $identifier[1];
	$content = (isset($bwp_shortcode_matches[$identifier])) ? $bwp_shortcode_matches[$identifier] : '';

	return '<div>' . wpautop($content) . '</div>';
}

The real drawback of this approach is its complexity, along with the solid regular expression[11] knowledge it requires. If you wish to extend it (for example, to support shortcodes with attributes), I suggest you visit this page: http://www.regular-expressions.info/reference.html. Oh, and don’t forget to add the dummy shortcode as in the first approach, so you can still strip it normally later ;).
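As a starting point for the attribute case, here is a hedged sketch of how the extraction regex could be widened. toy_extract_with_atts() is hypothetical: group 1 captures the raw attribute string, group 2 the enclosed content, and only key="value" pairs are parsed (WordPress’s own shortcode_parse_atts() accepts more forms). The placeholder is simplified to an HTML comment for brevity.

```php
<?php
// Hypothetical widened extractor: [copyright key="value" ...]content[/copyright].
function toy_extract_with_atts($content, array &$found)
{
    return preg_replace_callback(
        '/\[copyright((?:\s+[a-z_]+="[^"]*")*)\s*\](.*?)\[\/copyright\]/s',
        function ($m) use (&$found) {
            // Parse only double-quoted key="value" pairs from the attribute string.
            preg_match_all('/([a-z_]+)="([^"]*)"/', $m[1], $pairs, PREG_SET_ORDER);
            $atts = array();
            foreach ($pairs as $pair) {
                $atts[$pair[1]] = $pair[2];
            }
            $found[] = array('atts' => $atts, 'content' => $m[2]);
            return '<!--copyright-' . (count($found) - 1) . '-->';
        },
        $content
    );
}

$found = array();
$out = toy_extract_with_atts('[copyright year="2012" owner="BWP"]Cite us.[/copyright]', $found);
print_r($found);
```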

That’s it for now, folks. I hope you find this article useful. See you soon!

References

  1. http://codex.wordpress.org/Shortcode_API
  2. http://codex.wordpress.org/Function_Reference/wptexturi ... ptexturize
  3. http://codex.wordpress.org/Function_Reference/wpautop
  4. http://codex.wordpress.org/Function_Reference/the_conte ... he_content
  5. http://codex.wordpress.org/Function_Reference/the_excer ... he_excerpt
  6. http://codex.wordpress.org/How_WordPress_Processes_Post ... st_Content
  7. http://codex.wordpress.org/Shortcode_API#Output
  8. http://codex.wordpress.org/Function_Reference/do_shortc ... _shortcode
  9. http://codex.wordpress.org/Function_Reference/strip_sho ... shortcodes
  10. 8-using-global-variables-in-wordpress/
  11. http://www.regular-expressions.info/

3 Opinions for Protect Shortcodes from wpautop and the likes (2 Trackbacks)

  1. Jonny, October 31, 2012 at 6:52 pm

    There’s a tiny mistake in your code in the first solution.

    add_filter('the_content', 'bwp_add_dummy_shortcode', 12);
    function bwp_add_dummy_shortcode($atts, $content = '')
    {
        add_shortcode('copyright', 'bwp_dummy_shortcode');
        return $content;
    }

    the_content filter callback should only have one argument $content.

  1. WordPress BugNet - Part Two - Better WordPress

    [...] been following my blog for some time, you would know that I had already written an article about protecting shortcodes’ contents from wpautop and the likes. Check that out for a not-so-simple [...]

  2. Wordpress Shortcode p tags - (and HTML escape plugins)

    [...] This is by far the most interesting and robust method I've come across so far, and is apparently the method used by most syntax highlighting plugins that use enclosing shortcodes. There's a complete walkthrough of this (and some other interesting possibilities) over at the Better WP blog. [...]
