Better WordPress Google XML Sitemaps

The first WordPress XML Sitemap plugin that comes with comprehensive support for Google News sitemap, Sitemap Index and Multi-site. Extend functionality via flexible modules, not just hooks!

A WordPress XML sitemap plugin with support for Google News sitemaps, Sitemap Index and Multi-site. You no longer have to worry about the 50,000 URL limit or the time it takes to generate a sitemap. This plugin is fast, consumes far fewer resources, and can be extended via your very own modules (yes, no hooks needed).

Before moving on to the documentation, which can be rather long and boring, I think it’s better to show you the actual sitemapindex this plugin can generate ;).

Plugin Features

Google News Sitemap support (since 1.2.0)

Add a Google News sitemap to your sitemapindex easily. News sitemap can be used to ping search engines individually if you want. And of course, whenever you publish a new post in a news category, all selected search engines will be pinged.

Sitemapindex Support

A sitemapindex, as its name suggests, is a kind of sitemap that lets you group multiple sitemap files inside it. It therefore gives you several benefits: you can bypass the 50,000 URL limit (for example, with 10 custom sitemaps of 10,000 URLs each), and generation is much faster because each sitemap is requested separately and built by its own module.
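
To make the grouping idea concrete, here is a minimal sketch of what a sitemapindex document looks like when built by hand. It follows the public sitemaps.org format; the function name example_build_sitemapindex() and the input array shape are my own assumptions for illustration and are not part of this plugin.

```php
<?php
/**
 * Build a sitemapindex document from a list of child sitemaps.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param array $sitemaps Each item: array('loc' => URL, 'lastmod' => optional date string)
 * @return string
 */
function example_build_sitemapindex(array $sitemaps)
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

    foreach ($sitemaps as $sitemap) {
        $xml .= "\t<sitemap>\n";
        // Escape the URL so characters such as & stay valid inside XML
        $xml .= "\t\t<loc>" . htmlspecialchars($sitemap['loc'], ENT_QUOTES, 'UTF-8') . "</loc>\n";
        // <lastmod> is optional in the protocol, so skip it when unknown
        if (!empty($sitemap['lastmod'])) {
            $xml .= "\t\t<lastmod>" . $sitemap['lastmod'] . "</lastmod>\n";
        }
        $xml .= "\t</sitemap>\n";
    }

    return $xml . '</sitemapindex>' . "\n";
}
```

Each child sitemap is just one small entry in the index, which is why the index itself stays cheap to produce no matter how large the child sitemaps are.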

Splitting post-based sitemaps (since 1.1.0)

As of version 1.1.0, this plugin can automatically split large post sitemaps into smaller ones when the limit is reached. For example, if you have 200K posts and would like 10K posts in each sitemap, BWP GXS will split post.xml into 20 parts (i.e. from post_part1.xml to post_part20.xml).

This not only helps you bypass the 50,000 URL limit without having to build custom modules, but also makes your sitemaps smaller, lighter, and of course faster to generate. This plugin has been tested on sites with nearly 200K posts, where it took less than 1 second to generate the sitemapindex.

Furthermore, you can set a separate limit for split sitemaps or simply use the global limit.
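
The arithmetic behind the split can be sketched as follows. This is only an illustration of the 200K / 10K example above; example_split_sitemap_names() is a hypothetical helper, and the assumption that an unsplit sitemap keeps the plain post.xml name comes from the example, not from the plugin's source.

```php
<?php
/**
 * Work out the sitemap file names for a post type, given a split limit.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param int    $total_posts Number of posts of this type
 * @param int    $split_limit Maximum number of posts per sitemap part
 * @param string $post_type   Post type name, used as the file name prefix
 * @return array List of file names
 */
function example_split_sitemap_names($total_posts, $split_limit, $post_type = 'post')
{
    // Nothing to split when the limit is invalid or everything fits in one file
    if ($split_limit < 1 || $total_posts <= $split_limit) {
        return array($post_type . '.xml');
    }

    $parts = (int) ceil($total_posts / $split_limit);
    $names = array();
    for ($i = 1; $i <= $parts; $i++) {
        $names[] = $post_type . '_part' . $i . '.xml';
    }

    return $names;
}
```

Calling example_split_sitemap_names(200000, 10000) yields 20 names, from post_part1.xml to post_part20.xml.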

Multi-site Support

Each website within your network will have its own sitemapindex and sitemaps. For sub-domain installation, your sitemapindex will appear at http://sub-domain.example.com/sitemapindex.xml. For sub-folder installation, your sitemapindex will appear at http://example.com/sub-folder/sitemapindex.xml.

There’s always a sitemapindex for your main site, available at http://example.com/sitemapindex.xml. If you choose the sub-domain approach, each sub-domain can also have its own robots.txt. More on that in the Robots.txt section.

Custom sitemaps using modules

The unrivaled flexibility this plugin offers is the ability to define your custom sitemaps using modules. Each module is actually a .php file that tells BWP Google XML Sitemaps how to build a sitemap file. You can extend default modules or create completely new ones.

This plugin also comes with a convenient base class for developing modules, with an easy-to-use and thoroughly documented API. Since modules can be defined by you, there’s no limit to what a sitemap can contain (for example, you can bypass the 50,000 URL limit, as stated above). There’s one limitation, though: your imagination ;). Oh, did I mention that you can even use a module to create another sitemapindex?

Detailed Sitemap Log and Debug

Developing modules requires debugging, and this plugin makes that easy for any developer.

There are two kinds of logs: sitemap item log and sitemap generation log. Sitemap item log tells you what and when sitemaps are generated while sitemap generation log tells you how they are generated.

As of version 1.3.0 there are two debug modes, namely “Debug” and “Debug extra”. Read on if you want to learn more about these two useful modes.

Now for a more complete feature list

  • New in 1.1.0!
    • This plugin can automatically split large post sitemaps into smaller ones. You can set a limit for each small sitemap.
    • You now have an External Pages’ sitemap, using which you can easily add links to pages that do not belong to WordPress to the Sitemap Index.
    • Exclude certain post types, taxonomies without having to use filters.
    • Hooks to default post-based and taxonomy-based modules to allow easier SQL query customization (you don’t have to develop custom modules anymore just to change minor things).
  • By default, this plugin allows you to create a sitemapindex that contains the following sitemaps: posts (including custom post types), static pages, taxonomy archives (including custom taxonomies) and date archives. You can of course enable or disable any of them.
  • Provide all basic options for creating sitemaps, such as:
    • Maximum number of items per sitemap
    • Default change frequency
    • Default priority
    • Minimum priority
  • Allows you to add the sitemap to WordPress’s virtual robots.txt. If you have a sub-domain Multi-site installation, each blog will have its own robots.txt. Of course this only works if you don’t have a physical robots.txt in the main site’s root.
  • Have full support for WPMU Domain Mapping plugin.
  • Allows you to style your sitemaps using a built-in XSLT style sheet, or custom-made ones.
  • Allows you to compress your sitemaps, thus making them approximately 70% smaller.
  • Allows you to selectively ping search engines (Google, Bing) as well as set ping limit when you:
    • Publish a new post
    • Publish a draft
    • Publish a pending post
    • Publish a scheduled (future) post
  • Other advanced features:
    • Allows you to cache the sitemap for a certain period of time. You can choose to automatically or manually generate new cache sitemaps.
    • SQL cycling: if you have a lot of URLs, e.g. 30000, in a single sitemap, it is recommended that you do not query for 30000 items in just one query as it will result in a very heavy one. We use SQL cycling to split such query into smaller queries, i.e. we will do 30 queries, with 1000 items queried each.
    • Modules: a module is actually a generator that tells this plugin how to build a sitemap. Modules give you the ultimate flexibility when you want to make custom sitemaps or sitemapindexes. This will be covered in great detail in the Module API section.
    • To support modules, this plugin provides detailed logging system with two debug modes that help you trace errors.

Plugin Usage

Basic Usage

Using this plugin is super easy, which basically requires two steps:

Step 1: Select what sitemaps to produce

After a successful activation (remember that you need WordPress 3.0 or higher for this plugin to work), navigate to BWP Sitemaps >> XML Sitemaps and you should see something similar to:

BWP Sitemaps – Sitemaps to generate

In the Sitemaps to generate section you should select which sitemaps you want BWP GXS to produce. Most of the time it is the Taxonomy sitemaps (categories, tags, etc.) and Site address sitemap that you want to enable.

For post-based sitemaps and taxonomy-based sitemaps, you also have the option to exclude any post types or taxonomies you don’t want generated.

After you finish choosing which sitemaps to build, you can of course view the sitemapindex right away by clicking on the sitemapindex link in the Your sitemaps section. You should then see the default sitemapindex with a simple yet nice XSLT stylesheet attached to it. This is, however, completely optional.

Important note: When the sitemapindex is generated for the first time, you won’t see any Last modified date for any child sitemaps because none of them have been generated yet. This is expected and adheres to the official sitemap protocol.

Step 2: Submit your sitemap

For all your sitemaps to be crawled by search engine bots, you only have to copy the URL (e.g. http://example.com/sitemapindex.xml) and paste it into your webmaster tool of choice.

Please note that sitemaps are only updated when ‘something’ or ‘someone’ requests them. In other words, when you publish new/draft/future/pending posts you will not notice any slowdown at all, as this plugin will only notify (or ping) search engines about the fact that you have just updated your website/blog. When those search engines actually download the sitemaps, they will be updated.

Now just sit back and relax: search engines will crawl your sitemapindex and all the sitemaps inside it within hours. 90% of URLs on this website were indexed by Google by the third day it was online, just FYI ;).

Advanced Usage

Google News Sitemap

A Google News Sitemap is yet another sitemap that allows you to control which content you submit to Google News. By creating and submitting a Google News Sitemap, you’re able to help Google News discover and crawl your site’s articles.

With this module, you have an option to either include or exclude posts in certain categories. Let’s say you have 4 categories (A, B, C, and D) in which A and D are the ones you want to use as news categories. You can select category A and D, and choose to include, or select category B and C, and choose to exclude, simple as that!
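
The include/exclude choice boils down to a simple category test, sketched below. example_is_news_post() is a hypothetical helper and not the plugin's actual code. One assumption to note: for a post that sits in several categories at once (say A and B), the two modes in the example above would disagree, and this sketch treats a post as excluded as soon as any of its categories is on the exclude list.

```php
<?php
/**
 * Decide whether a post belongs in the news sitemap.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param array  $post_categories Categories the post is filed under
 * @param array  $selected        Categories ticked in the news sitemap settings
 * @param string $mode            'include' or 'exclude'
 * @return bool
 */
function example_is_news_post(array $post_categories, array $selected, $mode = 'include')
{
    // True when the post shares at least one category with the selection
    $matches = count(array_intersect($post_categories, $selected)) > 0;

    return 'include' === $mode ? $matches : !$matches;
}
```

With categories A to D, example_is_news_post(array('A'), array('A', 'D'), 'include') and example_is_news_post(array('A'), array('B', 'C'), 'exclude') both return true.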

Each category can be assigned any of the five pre-defined genres, as per Google News’s guidelines (http://support.google.com/news/publisher/bin/answer.py? ... swer=93992). You can select none or all of them; it’s totally up to you.

Last but not least, it is possible to map some categories in your language to Google News’s suggested keywords in English. To do that, use the following filter:

add_filter('bwp_gxs_news_keyword_map', 'bwp_gxs_news_keyword_map');

function bwp_gxs_news_keyword_map($map_array)
{
	$map_array = array(
		// Use this structure: 'category in your language' => 'Google News suggested keyword in English'
		'電視台' => 'television',
		'名人' => 'celebrities'
	);
	return $map_array;
}

Robots.txt

WordPress by default comes with a virtual robots.txt whose contents can be filtered by plugins or themes.

BWP Google XML Sitemaps allows you to dynamically add a Sitemap: http://example.com/sitemapindex.xml entry to that file, thus allowing search engine crawlers to detect your sitemapindex automatically. If you, however, have a real robots.txt file in your website’s root, the sitemap entry won’t be added, so please keep that in mind.

If you’re on a Sub-domain Multi-site installation you will notice that each blog in your network can have its own robots.txt. So with the robots option enabled, if you browse to http://example.com/robots.txt, you will see something similar to this:

User-agent: *
Disallow:

Sitemap: http://example.com/sitemapindex.xml

and if you browse to http://sub-domain.example.com/robots.txt, you will see something like this:

User-agent: *
Disallow:

Sitemap: http://sub-domain.example.com/sitemapindex.xml

Please note that there’s no http://example.com/sub-folder/robots.txt, so you won’t be able to add the sitemap entry dynamically in a Sub-folder Multi-site installation.
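
For the curious, the sketch below shows roughly how a Sitemap entry can be appended to the virtual robots.txt output. WordPress core exposes a 'robots_txt' filter for this; example_add_sitemap_to_robots() is a hypothetical helper of mine, and the way this plugin actually hooks in may well differ.

```php
<?php
/**
 * Append a Sitemap entry to robots.txt contents, once.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param string $output           Current robots.txt contents
 * @param string $sitemapindex_url Absolute URL to the sitemapindex
 * @return string
 */
function example_add_sitemap_to_robots($output, $sitemapindex_url)
{
    // Avoid adding the same entry twice
    if (false !== stripos($output, 'Sitemap: ' . $sitemapindex_url)) {
        return $output;
    }

    return rtrim($output) . "\n\nSitemap: " . $sitemapindex_url . "\n";
}

// Inside WordPress this could be attached to the core 'robots_txt' filter, e.g.:
// add_filter('robots_txt', function ($output) {
//     return example_add_sitemap_to_robots($output, home_url('sitemapindex.xml'));
// });
```

Because the filter only runs for the virtual file, a physical robots.txt in the site root bypasses it entirely, which matches the behaviour described above.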

Sitemap Cache

When enabled, all sitemaps are cached for a default period of one hour after they are generated. As of version 1.3.0 this feature is turned off by default.

When a sitemap is requested, this plugin will check whether the cache is still valid, and will serve the cached sitemap file if so. Otherwise the sitemap will be re-generated, but only if you enable “Enable auto cache re-generation”. You can manually flush the cache as well by clicking on the “Flush cache” button.
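
The decision described above can be sketched as a small function. example_cache_action() is hypothetical and the return values are labels I made up. One thing the text does not spell out is what happens to an expired cache when auto re-generation is off; the sketch assumes the old file keeps being served until you flush it, which is only my assumption.

```php
<?php
/**
 * Decide what to do when a sitemap is requested and caching is enabled.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param int|null $cached_at       Unix timestamp of the cached file, null if none
 * @param int      $now             Current Unix timestamp
 * @param int      $cache_ttl       Cache lifetime in seconds (one hour by default)
 * @param bool     $auto_regenerate Whether "Enable auto cache re-generation" is on
 * @return string 'serve_cache' or 'generate'
 */
function example_cache_action($cached_at, $now, $cache_ttl = 3600, $auto_regenerate = true)
{
    // No cached file yet: the sitemap has to be generated
    if (null === $cached_at) {
        return 'generate';
    }

    // Cache is still within its lifetime
    if (($now - $cached_at) < $cache_ttl) {
        return 'serve_cache';
    }

    // Expired cache: re-generate only when allowed (assumption: otherwise keep serving it)
    return $auto_regenerate ? 'generate' : 'serve_cache';
}
```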

As of version 1.3.0 you can also specify a custom cache directory, e.g. /path/to/wordpress/wp-content/cache/sitemaps/. Instead of setting a cache directory via admin setting, it is possible to use a PHP constant:

// put this in wp-config.php
define('BWP_GXS_CACHE_DIR', '/path/to/wordpress/wp-content/cache/sitemaps/');

or a filter:

add_filter('bwp_gxs_cache_dir', 'bwp_gxs_my_cache_dir');
function bwp_gxs_my_cache_dir()
{
    // if you want different cache dir per blog, add the logic here
    return '/path/to/wordpress/wp-content/cache/sitemaps/';
}

The two alternative methods make it easy for you to programmatically apply a custom cache directory site-wise.

As always, the cache folder must be writable, i.e. you will have to CHMOD it to either 755 or 777.

Sitemap log & debug

Although completely optional, it is strongly recommended that you always enable the sitemap log as it tells you in detail how each sitemap was generated. Potential errors such as sitemap not found, empty sitemap, or headers already sent are all logged for your convenience.

When there’s an issue with generating a sitemap, or you’re developing a new module, it is a good idea to enable “Debug mode”. In this mode, the plugin will not make use of any caching mechanism, and if you have WP_DEBUG enabled, error messages will be shown as well.

As of version 1.3.0, there’s also a “Debug extra mode”. This mode is the same as “Debug mode” except that no headers are sent and no compression is used, i.e. you should see a raw text file that contains sitemap contents. This is especially useful when you encounter “Content Encoding Error” or related ones, because when such errors occur, you can’t easily identify the causes if headers are sent or compression is in use.

SQL Cycling

As mentioned above, it is better to query for items using a light query rather than a heavy one. This plugin comes with a feature called SQL cycling, which means we will do small queries several times instead of doing a heavy query one time.

The only setting you can use to customize this feature is the number of items each cycle queries for. The default is 1000, but it can be increased to handle a larger number of items. Since it’s not a good idea to go over 30 queries either, if you have 50,000 items in one sitemap, you should raise the SQL query limit to at least 1,667 (50,000 / 30, rounded up).
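
Here is a small sketch of the cycling arithmetic, in case you want to check your own numbers. Both function names are hypothetical, and the LIMIT strings only illustrate how one heavy query turns into several light ones; they are not the exact SQL this plugin runs.

```php
<?php
/**
 * List the LIMIT clauses needed to fetch $total_items in cycles.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 */
function example_cycle_limits($total_items, $per_cycle = 1000)
{
    $clauses = array();
    if ($per_cycle < 1) {
        return $clauses;
    }

    for ($offset = 0; $offset < $total_items; $offset += $per_cycle) {
        // The last cycle may need fewer rows than a full cycle
        $count     = min($per_cycle, $total_items - $offset);
        $clauses[] = sprintf('LIMIT %d, %d', $offset, $count);
    }

    return $clauses;
}

/**
 * Smallest per-cycle size that keeps the number of queries at or below $max_queries.
 */
function example_min_cycle_size($total_items, $max_queries = 30)
{
    return (int) ceil($total_items / $max_queries);
}
```

For 30,000 items at 1000 per cycle this gives 30 clauses, from LIMIT 0, 1000 to LIMIT 29000, 1000. For 50,000 items, example_min_cycle_size(50000) returns 1667, the smallest cycle size that stays within 30 queries.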

Customization

Custom XSLT stylesheet

The default XSLT stylesheet should look OK in most cases, but if you do require a custom one, simply navigate to BWP Sitemaps >> XML Sitemaps >> Look and feel and set “Custom XSLT stylesheet URL” to an absolute URL containing the desired stylesheet.

Please make sure that you also have an XSLT style sheet for the sitemapindex in the same directory where the above custom stylesheet is found. For example, if your custom XSLT URL looks like this: http://example.com/my-xslt.xsl then http://example.com/my-xsltindex.xsl must be publicly accessible, too.
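
Judging from the example URLs above, the index stylesheet URL is the custom stylesheet URL with "index" inserted before the .xsl extension. The sketch below encodes that naming rule; example_xslt_index_url() is a hypothetical helper, and the rule itself is inferred from the single example, so treat it as an assumption.

```php
<?php
/**
 * Derive the sitemapindex stylesheet URL from a custom stylesheet URL.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param string $xslt_url Absolute URL ending in .xsl
 * @return string
 */
function example_xslt_index_url($xslt_url)
{
    // my-xslt.xsl becomes my-xsltindex.xsl; URLs without .xsl are returned unchanged
    return preg_replace('/\.xsl$/i', 'index.xsl', $xslt_url);
}
```

You can use it to double check which second file needs to exist before switching stylesheets.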

Exclude specific posts, terms, etc.

In version 1.1.0 more hooks have been added to default modules to allow easier customization of SQL queries used to build your sitemaps. For example, to exclude certain posts using IDs, you can do this:

add_filter('bwp_gxs_excluded_posts', 'bwp_gxs_exclude_posts', 10, 2);

function bwp_gxs_exclude_posts($excluded_posts, $post_type)
{
	// $post_type lets you easily exclude posts from specific post types
	switch ($post_type)
	{
		case 'post': return array(1,2,3,4); // the default post type
		case 'movie': return array(5,6,7,8); // the 'movie' post type
	}

	return array();
}

This is the preferred method to exclude posts as of version 1.3.0 as it allows the plugin to correctly split post-based sitemaps in the sitemapindex while respecting all excluded post IDs.

However, you can still do this the ancient way, which gives you more control over the actual query:

add_filter('bwp_gxs_post_where', 'bwp_gxs_exclude_posts_where', 10, 2);

// Named differently from the previous callback so both snippets can be loaded together
function bwp_gxs_exclude_posts_where($query_where_part, $post_type)
{
	// $post_type lets you easily exclude posts from specific post types
	switch ($post_type)
	{
		case 'post': return ' AND p.ID NOT IN (1,2,3,4) '; // the default post type
		case 'movie': return ' AND p.ID NOT IN (5,6,7,8) '; // the 'movie' post type
	}

	return '';
}

Remember to use p as the table alias and have some spaces before and after what you return, just to make sure it won’t corrupt the module’s query. Prior to version 1.3.0 the table alias was wposts so it is recommended to update your modules to use the new p alias, though not required.

Similarly, to exclude terms from a specific taxonomy, you can basically do the same thing:

add_filter('bwp_gxs_term_exclude', 'bwp_gxs_exclude_terms', 10, 2);

function bwp_gxs_exclude_terms($excluded, $taxonomy)
{
	// $taxonomy lets you easily exclude terms from specific taxonomies
	switch ($taxonomy)
	{
		case 'category': return array('cat-slug1', 'cat-slug2');
		case 'post_tag': return array('tag-slug1', 'tag-slug2');
	}

	return array();
}

External pages sitemap

As of version 1.1.0, it will be easier for you to add external pages (links to pages from the same domain but not from WordPress) to the Sitemap Index. To do this, all you have to do is enable the “External pages” sitemap and then add this to your theme’s functions.php:

add_filter('bwp_gxs_external_pages', 'my_external_sitemap');

function my_external_sitemap()
{
	$external_pages = array(
		array('location' => home_url('link-to-page.html'), 'lastmod' => '06/02/2011', 'priority' => '1.0'),
		array('location' => home_url('another-page.html'), 'lastmod' => '05/02/2011', 'priority' => '0.8')
		// repeat this for any other pages you would like to add
	);
	return $external_pages;
}

You can set each page’s location, last modified date and priority. Change frequency will be calculated automatically using the last modified date you provide. By default, you will get something similar to this:

BWP Sitemap – External pages sitemap

Module API – Customize your sitemaps like never before

What is a sitemap module?

Before we start building a custom sitemap, let’s talk about how the module system works.

A module is simply a .php file that contains a module class with a unique name, i.e. just one module for one sitemap. In the module class you can use all API functions provided by the base module (BWP_GXS_MODULE) if you extend it.

Using API functions you will be able to do SQL cycling, calculate priority, calculate change frequency, etc., but the body of the module class needs to be your own code, i.e. you fetch the needed data in your own way and pass it back to the plugin to handle. It is therefore very easy to create a new module, because most of the time all you have to do is change the SQL query that is used to get content from the database.

Now if you open the module folder in bwp-google-xml-sitemaps/includes/modules you will notice that each module’s filename is similar to its corresponding sitemap. For example post.php is used to build all post-based sitemaps and taxonomy.php is used to build all taxonomy-based sitemaps.

It is important to understand that a module can be either a parent or a child module. A child module will be used first, and if it is not found, the parent module will be used instead. So if you have a custom sitemap named post_most_popular.xml, this plugin will request post_most_popular.php first, and if that fails, it will request post.php.

You might ask: “If I add more modules, what will happen when this plugin gets updated?” No worries, you have the option to set a custom module directory that will take precedence over the default module directory. Simply speaking, this plugin will look for modules in the custom directory first, and when it cannot find the requested module file, it will look for modules in the default one, in the exact same way described above.
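
The lookup order can be pictured as an ordered list of candidate files, as in the sketch below. example_module_candidates() is a hypothetical helper. It reflects my reading of the description (custom directory before default directory, child module before parent module within each directory); the exact order in which the plugin interleaves the two rules is an assumption.

```php
<?php
/**
 * List the module files to try, in order, for a requested sitemap.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param string $module      Parent module name, e.g. 'post'
 * @param string $sub_module  Sub-module name, e.g. 'most_popular', or '' for none
 * @param string $custom_dir  Custom module directory, or '' if not set
 * @param string $default_dir Default module directory shipped with the plugin
 * @return array Candidate file paths, first match wins
 */
function example_module_candidates($module, $sub_module, $custom_dir, $default_dir)
{
    // Child module file first, then the parent module file as a fallback
    $files = array();
    if ('' !== $sub_module) {
        $files[] = $module . '_' . $sub_module . '.php';
    }
    $files[] = $module . '.php';

    // Custom directory takes precedence over the default one
    $paths = array();
    foreach (array($custom_dir, $default_dir) as $dir) {
        if ('' === $dir) {
            continue;
        }
        foreach ($files as $file) {
            $paths[] = rtrim($dir, '/') . '/' . $file;
        }
    }

    return $paths;
}
```

For post_most_popular.xml with both directories set, the first candidate is post_most_popular.php in the custom directory and the last is post.php in the default one. Because the custom directory lives outside the plugin folder, a plugin update leaves your modules untouched.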

By this point I believe you have a better understanding of what a module is and how it operates, so how about we get to the real thing now ;)?

Basic API functions

If you do not like the default sitemapindex, you can always add or remove sitemaps (or modules) from it. Adding or removing a module is easy, using two basic module API functions, namely add_module() and remove_module(), respectively, like so:

add_action('bwp_gxs_modules_built', 'bwp_gxs_add_modules');
function bwp_gxs_add_modules()
{
	global $bwp_gxs;
	$bwp_gxs->add_module('post', 'most popular');
}

In the above snippet, I’m adding a new sub-module named ‘most popular’ to the built-in module ‘post’.

Below’s a list of built-in modules:

post
page
taxonomy
archive
author
site

and some built-in sub-modules:

archive_monthly
archive_yearly
taxonomy_category
taxonomy_post_tag
...

Please note that the name you use for any module can only contain alphanumeric characters plus hyphens, underscores, and spaces. Spaces will be converted to underscores anyway, so it’s best to just use underscores (spaces are still allowed because some people might find them easier to work with).
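
The naming rule can be sketched as follows. example_normalize_module_name() is a hypothetical helper; in particular, returning false for a name with other characters is my own choice for the sketch, since the text does not say how the plugin reacts to an invalid name.

```php
<?php
/**
 * Validate a module name and convert spaces to underscores.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param string $name Module or sub-module name as passed to add_module()
 * @return string|false Normalized name, or false when the name has other characters
 */
function example_normalize_module_name($name)
{
    // Only alphanumeric characters, hyphens, underscores and spaces are allowed
    if (!preg_match('/^[A-Za-z0-9_\- ]+$/', $name)) {
        return false;
    }

    return str_replace(' ', '_', $name);
}
```

So 'most popular' becomes 'most_popular', which is why the sub-module added above ends up being served as post_most_popular.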

So now you have a new module named post_most_popular; what are you supposed to do? Just create a custom module directory and put a new module file named post_most_popular.php there, with your own code, of course.

For the sake of simplicity, post_most_popular is actually included with this plugin as a sample module, so after you add the module, you should see post_most_popular.xml’s contents right away (FYI: it lists posts with at least 2 comments, ordered by comment_count and post_modified).

If you are adding a parent module, you will also have to add a new rewrite rule. For example if you want a parent module named ‘most_popular’ (instead of a sub-module like ‘post_most_popular’), you will also need this:

add_filter('bwp_gxs_rewrite_rules', 'bwp_gxs_add_rewrite_rules');
function bwp_gxs_add_rewrite_rules()
{
	// A prefixed function name avoids clashing with other plugins or themes
	$my_rules = array(
		'popular\.xml' => 'index.php?gxs_module=most_popular'
	);
	return $my_rules;
}

You can have popular.xml, or most_popular.xml, or anything you see fit. After you have added the above snippet, make sure you pay a visit to your Permalink Settings and click Save Changes so that WordPress will recognize your new rewrite rules. Otherwise, you will be greeted with a 404 error when you try to visit http://yourdomain.com/popular.xml.

Now if you want to remove a module, the process is similar:

add_action('bwp_gxs_modules_built', 'bwp_gxs_remove_modules');
function bwp_gxs_remove_modules()
{
	global $bwp_gxs;
	// This will remove all modules that have 'taxonomy' as their parent
	$bwp_gxs->remove_module('taxonomy');
	// This will remove 'taxonomy_post_tag' only
	$bwp_gxs->remove_module('taxonomy', 'post_tag');
}

Keep in mind that your sitemap’s name, your module’s name and your module’s filename must be the same when you add a new module.

Advanced API functions

For this section I will take the module file post.php (which is documented rather thoroughly) as an example so you will be able to learn all the advanced API functions easily.

Basically, developing a module from scratch involves three steps:

Step 1: Initialize required properties for the module class

Before you can initialize anything, you must define your class, and the class’ name must start with BWP_GXS_MODULE_, followed by the module’s name. For example the module post will have BWP_GXS_MODULE_POST as its class name.

If you happen to have a post type/taxonomy that has hyphens in its name, they will be automatically converted to underscores when the plugin looks for the child module class (e.g. BWP_GXS_MODULE_POST_THIS_HAS_HYPHEN will be used to serve post_this-has-hyphen).
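
The class naming convention can be sketched like this. example_module_class_name() is a hypothetical helper that only mirrors the two examples given above (BWP_GXS_MODULE_POST and BWP_GXS_MODULE_POST_THIS_HAS_HYPHEN); it is not how the plugin itself resolves class names.

```php
<?php
/**
 * Work out the expected class name for a module or sub-module.
 * Hypothetical helper for illustration only; not part of BWP GXS.
 *
 * @param string $module     Parent module name, e.g. 'post'
 * @param string $sub_module Sub-module name, e.g. 'this-has-hyphen', or '' for none
 * @return string
 */
function example_module_class_name($module, $sub_module = '')
{
    $name = '' === $sub_module ? $module : $module . '_' . $sub_module;

    // Hyphens become underscores, and the whole name is upper-cased
    return 'BWP_GXS_MODULE_' . strtoupper(str_replace('-', '_', $name));
}
```

This is handy when a custom post type has hyphens in its name and you are not sure what to call the child module class.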

To actually use the API functions, you have to extend the base module class, like so:

<?php
/**
 * Some info about you and the module here would be nice
 */

class BWP_GXS_MODULE_POST extends BWP_GXS_MODULE
{
    function __construct()
    {
    }

    function init_module_properties()
    {
    }

    function generate_data()
    {
    }
}

The body of the module class is currently empty, but this is the expected structure for a module file.

Now it’s time to decide what this module will do, and what it will need to build data. The idea is that we will use this module to display all kinds of post type sitemaps, for example post.xml, post_movie.xml, etc. So what this module needs is the sub-module (to know which post type is being requested), and if the sub-module is not found, we display the default post type, which is ‘post’.

To achieve that, you can make use of a property named $requested, which will hold the requested post type (or the sub-module). As of version 1.3.0 all module-related data such as what module/sub-module is being requested are automatically assigned to the module class and ready to be used.

For any properties that are dependent on module data you must init them inside init_module_properties. For any other properties that are not dependent on module data such as some basic get_option functions, you can put them inside __construct() (a function that gets called when the module class is initialized).

Step 2: Build the actual data

Once all necessary properties have been assigned their expected values, the module can start building its data. Prior to version 1.3.0 you had to call $this->build_data() inside __construct() to do that.

As of version 1.3.0 you don’t have to call build_data() inside __construct() anymore as it is called automatically by the plugin.

What’s important now is which builder function you will use, which can be either build_data() or generate_data(). Why the heck are there two similar functions for just one task? If you remember the SQL Cycling feature I talked about earlier, you will understand why we have two options here.

Simply speaking, the build_data() function ignores SQL Cycling while the generate_data() function allows you to make use of SQL Cycling. build_data() is recommended when you’re developing modules for sitemaps that do not have many items, which of course do not require SQL Cycling at all. generate_data() should be used in the opposite situation. Since there might be a lot of posts for a website/blog, for post.php we will use generate_data().

When you use generate_data(), you will have to query for posts using two DB API functions, namely $this->get_results() and $this->query_posts(). As you might have guessed, they’re no different from the two functions provided by WordPress: $wpdb->get_results() and query_posts(). The same parameters and syntax apply. Remember to always escape your query string with either $wpdb->escape() or $wpdb->prepare(), as shown in the actual code:

// A standard custom query to fetch posts from database, sorted by their lastmod
// You can use any type of queries for your modules
$latest_post_query = '
			SELECT * FROM ' . $wpdb->posts . "
				WHERE post_status = 'publish' AND post_type = %s" . '
			ORDER BY post_modified DESC';
// Use $this->get_results instead of $wpdb->get_results, remember to escape your query
// using $wpdb->prepare or $wpdb->escape
$latest_posts = $this->get_results($wpdb->prepare($latest_post_query, $requested));

Now you’ve got the $latest_posts data set that contains all information about your posts. It would be pointless to continue the loop if the query returns nothing, so it is a good idea to have a simple check like below:

// This check helps you stop the cycling sooner
// It basically means if there is nothing to loop through anymore we return false so the cycling can stop.
if (!isset($latest_posts) || 0 == sizeof($latest_posts))
	return false;

This check makes sure the cycling stops as soon as a query returns no rows, so you won’t run into an endless loop (which should not happen anyway).

Building each item is straightforward:

// Always init your $data
$data = array();
for ($i = 0; $i < sizeof($latest_posts); $i++)
{
	$post = $latest_posts[$i];

	// Init your $data with the previous item's data. This makes sure no item is mal-formed.
	$data = $this->init_data($data);

	if ($using_permalinks && empty($post->post_name))
		$data['location'] = '';
	else
		$data['location'] = get_permalink($post);

	$data['lastmod']  = $this->get_lastmod($post);
	$data['freq']     = $this->cal_frequency($post);
	$data['priority'] = $this->cal_priority($post, $data['freq']);

	$this->data[] = $data;
}

$this->init_data() initializes the current item with the previous item’s data (except for the location, of course), so that every item starts with a complete set of fields and none ends up malformed. This function takes one parameter: $data.

$this->get_lastmod() (introduced in version 1.3.0) allows you to get the proper last modified date from a post object. This function takes one parameter: $post object.

$this->cal_frequency() allows you to calculate change frequency based on item’s last modified time. This function takes two parameters: $post object, and last modified date (optional, used only when you can’t have a $post object).

$this->cal_priority() allows you to calculate priority based on item’s freshness, comment count, and change frequency. This function takes two parameters: $post object, and the current item’s change frequency (should be $data['freq']).
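If the built-in frequency calculation does not suit your content, the bwp_gxs_freq filter listed under Hook References lets you substitute your own. The sketch below is only an illustration and is not the plugin's algorithm: the function name my_gxs_freq, the age thresholds, and the optional $now parameter are all made up for this example. It also assumes that the filter passes the calculated frequency first and the $post object second, and that $post carries a post_modified_gmt field.

```php
<?php
// Hypothetical callback for the bwp_gxs_freq filter.
// Assumption: $freq is the plugin's calculated value, $post is the post object.
function my_gxs_freq($freq, $post, $now = null)
{
	// $now is only here to make the function easy to test
	$now = (null === $now) ? time() : $now;

	$modified = strtotime($post->post_modified_gmt . ' UTC');
	if (false === $modified)
		return $freq; // keep the plugin's value when the date is unusable

	// Illustrative thresholds only, pick whatever suits your site
	$age_in_days = ($now - $modified) / 86400;
	if ($age_in_days < 1)
		return 'daily';
	if ($age_in_days < 30)
		return 'weekly';
	if ($age_in_days < 365)
		return 'monthly';

	return 'yearly';
}

// Register only when running inside WordPress; 2 = number of accepted arguments
if (function_exists('add_filter'))
	add_filter('bwp_gxs_freq', 'my_gxs_freq', 10, 2);
```

The returned strings are values allowed for changefreq by the sitemap protocol, so the result can be used for $data['freq'] as is.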

Step 3: Pass the built data back

To pass the data you just built in your module back to the plugin, simply use $this->data[] = $data; at the end of each loop iteration, like so:

for ($i = 0; $i < sizeof($latest_posts); $i++)
{
	// ... build data
	$this->data[] = $data;
}

Since we’re still using SQL Cycling, you will have to add this at the end of generate_data():

return true;

This tells the module to continue its cycling process. Otherwise, the module will only loop one time.
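To make the role of the two return values clearer, here is a minimal, self-contained sketch of the cycling idea. It is not the plugin's implementation: Demo_Cycling_Module, demo_run_cycles() and the in-memory $rows array are hypothetical stand-ins for a real module, the plugin's internal loop and the database.

```php
<?php
// Hypothetical stand-in for a module; NOT the plugin's real base class.
class Demo_Cycling_Module
{
	public $data = array();

	private $rows;
	private $batch_size;
	private $offset = 0;

	public function __construct(array $rows, $batch_size)
	{
		$this->rows       = $rows;
		$this->batch_size = $batch_size;
	}

	// One cycle: fetch the next batch, build items, report whether to go on
	public function generate_data()
	{
		$batch = array_slice($this->rows, $this->offset, $this->batch_size);

		// Nothing left to loop through, so stop the cycling
		if (0 == sizeof($batch))
			return false;

		foreach ($batch as $row)
			$this->data[] = array('location' => $row);

		$this->offset += $this->batch_size;

		// Ask the caller for another cycle
		return true;
	}
}

// Hypothetical driver: keeps calling generate_data() until it returns false
function demo_run_cycles($module)
{
	$cycles = 0;
	while ($module->generate_data())
		$cycles++;

	return $cycles;
}
```

If generate_data() in this sketch never returned true, the driver would stop after the first batch, which mirrors the "loop one time" behaviour described above.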

Step 4: (Surprised!) Visit the configuration page and browse to your newly created sitemap. Make sure you enable debug mode so that errors are shown rather than hidden. If you encounter something like ‘XML Parsing Error’ or ‘Content Encoding Error’, you have two options: 1) enable the debug extra mode, or 2) go to your module file and place exit; at the end of the build_data() or generate_data() function (outside any loop) to see the actual errors in your module.

That’s it! Congratulations on your first module!

Create another sitemapindex

Creating a custom sitemapindex is similar to creating a custom sitemap. First you will have to add a new module, for example:

add_action('bwp_gxs_modules_built', 'bwp_gxs_add_modules');
function bwp_gxs_add_modules()
{
	global $bwp_gxs;
	$bwp_gxs->add_module('mysitemapindex');
}

Then, similar to adding a parent module, you must add a new rewrite rule, like so:

add_filter('bwp_gxs_rewrite_rules', 'add_rewrite_rules');
function add_rewrite_rules()
{
	$my_rules = array(
		'mysitemapindex\.xml' => 'index.php?gxs_module=mysitemapindex'
	);

	return $my_rules;
}

Make sure you flush all rewrite rules by visiting Settings >> Permalinks and pressing Save Changes.

Next, in mysitemapindex.php, you need to set the sitemap’s type to “index”:

function __construct()
{
	$this->type = 'index';
}

Since a sitemapindex’s item does not need priority or change frequency, in your builder function, be it build_data() or generate_data(), make sure you use something like this:

$data = array();
foreach ($items as $item)
{
	$data = $this->init_data($data);

	$data['location'] = $this->get_sitemap_url($slug);
	$data['lastmod'] = $this->format_lastmod($int_timestamp); // use your own function to construct the last modified date

	$this->data[] = $data;
}

$this->get_sitemap_url() (previously get_xml_link) is yet another API function that will get the correct sitemap URL for you. It accepts one parameter: the sitemap slug, which is expected to be the same as your module’s name (i.e. ‘mysitemapindex’ in the above example).
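The snippet above leaves it to you to come up with the integer timestamp that is passed to $this->format_lastmod(). One possible approach, sketched here under assumptions, is to take the newest modification time among the items a child sitemap contains. The helper name my_newest_timestamp() and the idea of feeding it GMT datetime strings (such as post_modified_gmt values) are hypothetical and not part of the plugin's API.

```php
<?php
// Hypothetical helper: returns the newest Unix timestamp found in a list of
// GMT datetime strings (e.g. '2013-05-01 10:00:00'), or 0 if none is usable.
function my_newest_timestamp(array $gmt_datetimes)
{
	$newest = 0;

	foreach ($gmt_datetimes as $datetime)
	{
		// Append ' UTC' so the string is not read in the server's timezone
		$timestamp = strtotime($datetime . ' UTC');

		// strtotime() returns false for strings it cannot parse
		if (false !== $timestamp && $timestamp > $newest)
			$newest = $timestamp;
	}

	return $newest;
}
```

The returned value could then be used as $int_timestamp in the loop above.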

Now try browsing to http://example.com/mysitemapindex.xml and you should see your new sitemapindex ready to be crawled!

Other Notes

URLs to all generated sitemaps are affected by the current permalink settings on your website/blog. If you don’t use pretty permalinks, your sitemaps’ URLs will be similar to this: http://example.com/?bwpsitemap=module. For example, your sitemapindex will become http://example.com/?bwpsitemap=sitemapindex.

Search engines can crawl such URLs just fine, and this plugin should be able to change all sitemap URLs for you every time you change your permalink settings. Please note, however, that cached sitemaps won’t be updated when you change permalink settings. You will have to flush the cache manually, or simply wait for the cached sitemaps to be refreshed.

In addition, if you don’t like the word bwpsitemap, you can change that using a filter. Refer to the Hook References section for more details.
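As a rough illustration, changing the query var could look like the sketch below. The callback name my_gxs_query_var and the replacement value 'mysitemap' are made up for this example, and the sketch assumes the bwp_gxs_query_var_non_perma filter passes the default value in and expects a string back.

```php
<?php
// Hypothetical callback: swaps the default 'bwpsitemap' query var for another word
function my_gxs_query_var($query_var)
{
	return 'mysitemap';
}

// Register only when running inside WordPress
if (function_exists('add_filter'))
	add_filter('bwp_gxs_query_var_non_perma', 'my_gxs_query_var');
```

With such a filter in place, and under the assumptions above, a non-permalink sitemap URL would look like http://example.com/?mysitemap=module.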

Known Issues

All sitemaps are dynamically generated, so there are no actual sitemap files created in your website’s root (except for those in the cache folder). This could lead to a very common error called ‘Content Encoding Error’, i.e. the content of an XML sitemap is corrupted. This might be caused by minor bugs in this plugin itself or by bugs in other plugins or themes. If you encounter such an error, please visit the FAQ section for some possible solutions.

To-do List

  • Add Image sitemap (1.4.0)
  • Add Video sitemap (1.4.0)
  • Review X-Robots tags (1.4.0)
  • Support custom taxonomies for news sitemap (1.4.x)

Hook References

  • bwp_gxs_query_var_non_perma – Used to change the default bwpsitemap query var when pretty permalink is not set. (filter)
  • bwp_gxs_xslt – Used to define the custom XSLT style sheet’s URL (filter)
  • bwp_gxs_cache_dir – Used to define the custom cache directory (filter)
  • bwp_gxs_module_dir – Used to define the custom module directory (filter)
  • bwp_gxs_module_mapping – Used to map a module to another module, for example you can map post_format to post_tag. This will be explained in more detail later. (filter)
  • bwp_gxs_rewrite_rules – Used to define your own rewrite rules. This should be used when you add a custom sitemapindex. Example above. (filter)
  • bwp_gxs_excluded_posts – Allows you to return an array of post IDs to exclude from a specific post-based sitemap. (filter, additional variable: $post_type – the currently requested post type)
  • bwp_gxs_post_where – Allows you to filter the ‘where’ part in post modules’ queries. (filter, additional variable: $post_type – the currently requested post type)
  • bwp_gxs_excluded_terms (previously bwp_gxs_term_exclude) – Allows you to return an array of term slugs to exclude from a specific taxonomy-based sitemap. (filter, additional variable: $taxonomy – the currently requested taxonomy)
  • bwp_gxs_sitemap_lastmod – Allows you to modify last modified dates of sitemap entries in a sitemapindex programmatically. (filter, expects an ISO 8601 date, additional variable: $lastmod – Unix timestamp of a sitemap’s modification time, $item – the sitemap entry data, $part – part of the sitemap entry if split)
  • bwp_gxs_freq – Allows you to use your own algorithm to calculate change frequency. (filter, additional variable: $post object)
  • bwp_gxs_priority_score – Allows you to use your own algorithm to calculate priority. (filter, additional variables: $post object, $freq – calculated change frequency)
  • bwp_gxs_news_name – Allows you to set a custom sitename for your news sitemap without having to change the sitename setting inside WordPress (filter).
  • bwp_gxs_news_keyword_map – Allows you to map your categories in your language to Google News’s suggested categories in English (filter).
  • bwp_gxs_modules_built – Fires after all default modules are defined. Use this to add or remove modules from the default sitemapindex. (action)
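
As an example of how one of these filters might be used, the sketch below excludes two pages from the page sitemap via bwp_gxs_excluded_posts. The callback name my_gxs_excluded_posts and the post IDs 12 and 34 are hypothetical. The sketch also assumes the filter passes the current list of excluded IDs as its first argument and $post_type as its second.

```php
<?php
// Hypothetical callback: adds two page IDs to the exclusion list
function my_gxs_excluded_posts($excluded_posts, $post_type)
{
	// Only touch the sitemap for the 'page' post type
	if ('page' == $post_type)
		$excluded_posts = array_merge((array) $excluded_posts, array(12, 34));

	return $excluded_posts;
}

// Register only when running inside WordPress; 2 = number of accepted arguments
if (function_exists('add_filter'))
	add_filter('bwp_gxs_excluded_posts', 'my_gxs_excluded_posts', 10, 2);
```

The fourth argument to add_filter() matters here: without it, WordPress passes only the first value to the callback and $post_type would be missing.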

Contribute to this Plugin

This plugin is licensed under GPL version 3, and it needs contributions from the community.

Buy me some special coffees!

My plugins and support for them are free. If you like my work and could buy me some (special) coffees, I would much appreciate it! They might help with some overnight sessions debugging my plugins, you know.

Module Submission

You can help the development of this plugin by either:

  • Making a cool module and submitting it!
  • Improving a default module and submitting it (if you know how to use Git, also check the Git Repository below).

Support, Feedback, and Code Improvement

i18n (Translate the plugin)

If you are a translator, please help translate this plugin. Even if you aren't, you can become one; it is very easy and fun! If you want to know how, please read here: Create a .pot or .po File using Poedit.

References

  1. http://www.robotstxt.org/robotstxt.html
  2. http://php.net/manual/en/keyword.extends.php
  3. http://codex.wordpress.org/Function_Reference/wpdb_Clas ... wpdb_Class
  4. http://codex.wordpress.org/Function_Reference/query_pos ... uery_posts