Automating Indexing

Tutorial: Core Index - Manual & Cron job

XTDir makes it easy to index SobiPro sections in your Joomla!™ site. In the default Page Load mode, however, indexing only happens when someone visits the site's front-end; otherwise you have to log in and click the Index button to complete the indexing operation.

Our job is to automate your life, making repeated and time-consuming procedures a breeze. To this end, we offer different indexing possibilities for XTDir:

  1. Page Load mode
  2. Cron Job mode
  3. Manually (click on the Index button)

Cron Job mode is only available in XTDir for SobiPro.

For XTDir to run smoothly, you need to set up a CRON job that executes periodically. By offloading this processing task to a scheduled job, the extension can run faster and more predictably.

A cron job task can be executed every minute. XTDir only performs a full index update when there is new information.

The CRON job is recommended in order to:

  • Avoid excessive processing during a normal page load
  • Process new and updated entries
  • Regenerate XTDir's Tree Index
  • Update Promoted Entries positions
  • Update Primary and Second orders

There are two ways to execute the CRON job:

  • Web CRON job script
  • Command Line Interface (CLI) - Native CRON script (recommended)

The front-end processing feature is intended to provide unattended, scheduled indexing of the SobiPro sections in your site.

The front-end indexing URL performs a single indexing step. You will only see a message upon completion, whether it was successful or not. There are a few limitations, though:

  • It is not designed to be run from a normal web browser, but from an unattended CRON script, using a tool such as wget to access the URL.
  • The script is not capable of showing progress messages.

Do you want to automate your indexing despite your host not supporting CRON? Webcron.org fully supports XTDir's front-end indexing feature and is dirt cheap - you need to spend about 1 Euro for 1000 indexing runs.

Before you begin using this feature, you must set up XTDir to support the front-end indexing option. First, go to XTDir's main page and click on the Component Configuration / Core Index of SobiPro Entries menu item. Find the option titled Core Index Mode and select Manual - Cron job Task. Below it, you will find the option named Secret word. In that box, enter a password which lets your CRON job prove to XTDir that it has the right to trigger the indexing. Think of it as the password required to enter the VIP area of a night club. After you are done, click the Save button on top to save the settings and close the dialog.

Enable Cron mode

Use only lower- and upper-case alphanumeric characters (0-9, a-z, A-Z) in your secret key. Other characters may need to be manually URL-encoded in the CRON job's command line. This is error prone and can cause the indexing operation to never start even though you'll be quite sure that you have done everything correctly.
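The advice above can be sketched as two small shell helpers: one that generates a random secret using only 0-9, a-z and A-Z, and one that checks whether an existing secret sticks to that character set. The function names generate_secret and is_safe_secret are hypothetical and not part of XTDir; this is only a minimal sketch, assuming a UNIX-style host with /dev/urandom available.

```shell
#!/bin/sh
# Hypothetical helper: print a random secret made only of 0-9, a-z, A-Z,
# so it never needs URL-encoding in the CRON command line.
generate_secret() {
    length="${1:-24}"
    LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
}

# Hypothetical helper: succeed only if the key is non-empty and purely
# alphanumeric, which is the character set recommended above.
is_safe_secret() {
    case "$1" in
        ''|*[!0-9A-Za-z]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

For example, is_safe_secret "ak33b4s3cRet" succeeds, while a key containing a hyphen or a space is rejected, which tells you it falls outside the recommended character set.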

Most hosts offer a control panel of some kind, such as cPanel. It should have a section named something like "CRON Jobs" or "Scheduled Tasks". The help screen in there describes how to set up a scheduled job. The one missing part is the command to issue. Simply putting the URL in there is not going to work.

If your host only supports entering a URL in their "CRON" feature, this will most likely not work with XTDir. There is no workaround on the host's side; it is a hard limitation imposed by your host. We would like to help you, but we can't. In that case, an external service such as webcron.org is the practical alternative.

If you are on a UNIX-style OS host (usually, a Linux host) you most probably have access to a command line utility called wget. It is almost trivial to use:

wget "http://www.yoursite.com/index.php?option=com_xtdir&view=cron&task=run&key=YourSecretKey"

Do not forget to surround the URL in double quotes. If you don't, the command will fail and it will be your fault! The reason is that the shell treats an unquoted ampersand as a command separator: it ends the current command and runs it in the background. If you don't use the double quotes at the start and end of the indexing URL, your host will think that you tried to run multiple commands, and wget will request a truncated URL instead of the front-end indexing URL.
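To see what the shell does with the ampersands, here is a small demonstration that never touches the network. The function show_url is a hypothetical stand-in for wget that simply prints the one argument it receives; everything else is plain POSIX shell behaviour.

```shell
#!/bin/sh
# Stand-in for wget (hypothetical): prints the single URL argument it receives.
show_url() {
    printf '%s\n' "$1"
}

# Quoted: the whole URL, ampersands included, arrives as one argument.
quoted="$(show_url "http://www.yoursite.com/index.php?option=com_xtdir&view=cron&task=run&key=YourSecretKey")"

# Unquoted: each & ends a command and runs it in the background, so the
# stand-in only receives the URL up to the first ampersand. The remaining
# pieces (view=cron, task=run, key=...) are treated as variable assignments.
unquoted="$(show_url http://www.yoursite.com/index.php?option=com_xtdir&view=cron&task=run&key=YourSecretKey)"

printf 'quoted:   %s\n' "$quoted"
printf 'unquoted: %s\n' "$unquoted"
```

In the unquoted case the secret key never reaches the command at all, which is why the indexing does not start.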

If you are unsure, check with your host. Sometimes you have to get from them the full path to wget in order for CRON to work, thus turning the above command line to something like:

/usr/bin/wget "http://www.yoursite.com/index.php?option=com_xtdir&view=cron&task=run&key=YourSecretKey"

Contact your host; they usually have a nifty help page for all this stuff. Read also the section on CRON jobs below.
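If you prefer to keep the CRON command line short, the wget call can be wrapped in a small script. The sketch below is an assumption-laden example, not something shipped with XTDir: the function names build_index_url and run_index and the WGET_BIN variable are hypothetical, and the 180 second timeout is only a starting point. The wget options used (-q, -O, --tries, --timeout) are standard GNU wget options.

```shell
#!/bin/sh
# Hypothetical helper: build the front-end indexing URL from the site's base
# URL and the secret key.
build_index_url() {
    base_url="${1%/}"   # drop a trailing slash, if any
    secret="$2"
    printf '%s/index.php?option=com_xtdir&view=cron&task=run&key=%s\n' "$base_url" "$secret"
}

# Hypothetical helper: request the URL once, quietly, discarding the page body.
# WGET_BIN is an assumed override for hosts that require the full path to wget,
# e.g. WGET_BIN=/usr/bin/wget.
run_index() {
    url="$(build_index_url "$1" "$2")"
    "${WGET_BIN:-wget}" -q -O /dev/null --tries=1 --timeout=180 "$url"
}
```

A CRON job would then source this file and call run_index "http://www.yoursite.com" "YourSecretKey". Because the URL is passed as a quoted variable, the ampersand problem described above cannot occur.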

wget is a multi-platform command line utility which is not included with all operating systems. If your system does not include the wget command, it can be downloaded at this address: http://wget.addictivecode.org/FrequentlyAskedQuestions#download. The wget homepage is here: http://www.gnu.org/software/wget/wget.html.

The ampersands above should be written as a single ampersand, not as an HTML entity (&amp;). Failure to do so will result in a 403: Forbidden error message and no indexing will occur. This is not a bug, it is the way wget works.

Assuming that you have already bought some credits on webcron.org, here is how to automate your indexing using their service.

As with the wget method, XTDir must first be set up for front-end indexing: Core Index Mode must be set to Manual - Cron job Task and a Secret word must be entered, as described above.

We strongly recommend using only alphanumeric characters in the secret word, i.e. 0-9, a-z and A-Z. For the sake of this example, we will assume that you have entered ak33b4s3cRet in the Secret word field. We will also assume that your site is accessible through the URL http://www.example.com.

Log in to webcron.org. In the CRON area, click on the New Cron button. Here's what you have to enter at webcron.org's interface:

Name of Cron job: anything you like, e.g. "XTDir www.example.com"

Timeout: 180sec; if the indexing doesn't complete, increase it. Most sites will work with a setting of 180 or 600 here. If you have many sections and entries that take more than 5 minutes to process, you might consider using XTDir's native CRON script (xtdir_indexer.php) instead, as it is much more cost-effective.

Url you want to execute: http://www.example.com/index.php?option=com_xtdir&view=cron&task=run&key=ak33b4s3cRet

Login and Password: Leave them blank

Execution time (the grid below the other settings): Select when you want your CRON job to run

Alerts: If you have already set up alert methods in webcron.org's interface, we recommend choosing an alert method here and not checking the "Only on error" so that you always get a notification when the indexing CRON job runs.

Now click on Submit and you are all set up!

If you are still having trouble with the above options and your host does not provide any Cron job support, you can use any free online Cron job scheduler service available on the web. Do not worry about security: all these sites do is access the URL at a set interval, and that URL is reachable by anyone anyway.


SiteGround and other hosts using cURL instead of wget

Finding the correct command to issue for the CRON job is tricky. This recipe applies not only to SiteGround, but many other commercial hosts as well.

SiteGround's cPanel has a Cron job option. Create a Cron job there and use the following as your command:

curl -b /tmp/cookies.txt -c /tmp/cookies.txt -L -v "<url>"

Replace <url> with your front-end indexing URL.
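The -v option in the command above produces verbose output on every run. For an unattended job, a quieter variant that only reports the HTTP status may be easier to follow in CRON mail. The sketch below assumes a standard curl binary; the function names is_success_code and index_with_curl are hypothetical. Note that a 2xx status only tells you the request was accepted (a wrong key typically shows up as 403); it is an assumption that this is a good enough signal for your setup.

```shell
#!/bin/sh
# Hypothetical helper: treat any 2xx HTTP status as success; everything else
# (403, 500, 000 for connection errors, ...) counts as a failure.
is_success_code() {
    case "$1" in
        2[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: call the indexing URL and report the result in one line.
# -sS hides the progress meter but keeps error messages, -L follows redirects,
# -o /dev/null discards the body, -w prints the final HTTP status code.
index_with_curl() {
    url="$1"
    code="$(curl -sS -L -o /dev/null --max-time 180 -w '%{http_code}' "$url")" || code="000"
    if is_success_code "$code"; then
        echo "XTDir indexing request returned HTTP $code"
    else
        echo "XTDir indexing request failed with HTTP $code" >&2
        return 1
    fi
}
```

The CRON command would then call index_with_curl with the quoted indexing URL as its only argument.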

SiteGround and other hosts using lynx instead of wget

Lynx is a text-based browser that is installed in most hosting environments.

On most Linux systems, you can simply run the command below. We recommend running the CRON job at least every thirty (30) minutes. On a busy site, you might want to run it every ten (10) minutes. The more frequently you run it, the less work each run has to do.

lynx -source "http://your.domain.com/index.php?option=com_xtdir&view=cron&task=run&key=My-Secret" > /dev/null

If you do not have Lynx installed, you can use other alternatives such as wget, detailed above.

Don't worry, this operation actually runs very fast and has very little impact on the server, equivalent to a normal single page load.

To add a new CronJob in cPanel 11, log in to your cPanel and click Cronjobs under the Advanced section.

After clicking Cronjobs, click Standard to proceed.

Enter the following command in the Command to run field:

lynx -source "http://your.domain.com/index.php?option=com_xtdir&view=cron&task=run&key=My-Secret" > /dev/null

Select Every Five Minutes, Every Hour, Every Day, Every Month, and Every Week Day so that the action above will be executed every five (5) minutes perpetually.


Users report that they get no joy using this script on GoDaddy hosting.

If you have access to the command-line version of PHP, XTDir for SobiPro includes an even better - and faster - way of indexing your entries. All XTDir for SobiPro releases include the file cli/xtdir_indexer.php, which can be run from the command-line PHP interface (PHP CLI). In contrast with previous releases, it doesn't require the front-end indexing in order to work; it is self-contained, native indexing for your Joomla!™ site, and works even if your web server is down!

In order to schedule an indexing run, use the following command line in your host's CRON interface:

/usr/local/bin/php /home/USER/webroot/cli/xtdir_indexer.php

where /usr/local/bin/php is the path to your PHP CLI executable and /home/USER/webroot is the absolute path to your web site's root. You can get this information from your host.

In order to give some examples, I will assume that your PHP CLI binary is located in /usr/local/bin/php - a common setting among hosts - and that your web site's root is located at /home/johndoe/httpdocs.

/usr/local/bin/php /home/johndoe/httpdocs/cli/xtdir_indexer.php
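If the job is scheduled very frequently, one run may still be in progress when the next one starts. One common way to guard against that is a lock directory. This is a generic shell technique, not a feature of XTDir, and it is only an assumption that overlapping runs are undesirable on your site. The function name with_lock and the lock path in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command while holding a lock directory, so that
# an overlapping CRON run skips instead of starting a second indexer.
# mkdir is atomic: only one process can create the directory.
with_lock() {
    lock_dir="$1"
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "Another indexing run seems to be active; skipping." >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}
```

With the example paths above, the CRON command would call with_lock /tmp/xtdir_indexer.lock /usr/local/bin/php /home/johndoe/httpdocs/cli/xtdir_indexer.php. If a run is killed before it can remove the lock directory, the directory has to be deleted by hand before indexing resumes.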

Special considerations:

  • Most hosts do not impose a time limit on scripts running from the command-line. If your host does and the limit is less than the time required to index your site, the job will fail.
  • This script is not meant to run from a web interface. If your host only provides access to the CGI or FastCGI PHP binaries, xtdir_indexer.php will not work with them. The solution to this issue is tied to the time constraint detailed above.
  • Some servers do not fully support this indexing method. The usual symptoms will be a job which starts but is intermittently or consistently aborted in mid-process without any further error messages and no indication of something going wrong. In such a case, running the indexing from the back-end of your site should work properly.

Go to your cPanel main page and choose the CRON Jobs icon from the Advanced pane. In the Add New CRON Job box on the page which loads, enter the following information:

  • Common Settings: Choose the frequency of your indexing, for example once per day.
  • Command: Enter your indexing command. Usually, you have to use something like:

    /usr/bin/php5-cli /home/myusername/public_html/cli/xtdir_indexer.php

where myusername is your account's user name (most probably the same one you use to log in to cPanel). Do note the path for the PHP command line executable: /usr/bin/php5-cli. This is the default location of the correct executable file for cPanel 11 and later. Your host may use a different path to the executable. If the command never runs, ask them. We can't help you with that; only those who have set up the server know the changes they have made to the default setup.

Finally, click the Add New Cron Job button to activate the CRON job.

On some hosts, the PHP CLI binary is located at /usr/bin/php-cli instead. In that case, your CRON command line should look like:

/usr/bin/php-cli /home/myusername/public_html/cli/xtdir_indexer.php

Finally, note that you can use the command-line override feature for more advanced configuration overrides. If it is something you can do in the Configuration page of the component, you can also do it using command line overrides.

There is an alternative to wget, as long as your PHP installation has the cURL extension installed and enabled. For starters, you need to save the following PHP script as xtdir.php somewhere your host's CRON feature can find it. Please note that this is a command-line script and does not need to be located in your site's root; it should be preferably located above your site's root, in a non web-accessible directory.

In order to configure it for your server, you only have to change the two define() lines at the top.

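    <?php
        // Base URL of your site (no trailing slash)
        define('SITEURL', 'http://www.example.com');

        // Your secret key, as entered in XTDir's configuration
        define('SECRETKEY', 'MySecretKey');

        // ====================== DO NOT MODIFY BELOW THIS LINE ======================
        // Build the front-end indexing URL; the secret is passed as the "key" parameter
        $url = SITEURL . '/index.php?option=com_xtdir&view=cron&task=run&key=' . urlencode(SECRETKEY);

        $curl_handle = curl_init();
        curl_setopt($curl_handle, CURLOPT_URL, $url);
        curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);   // follow redirects
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);   // return the response instead of printing it
        $buffer = curl_exec($curl_handle);
        curl_close($curl_handle);

        // curl_exec() returns false on failure; an empty body is also treated as a failure
        if (empty($buffer))
        {
            echo "Sorry, the indexing didn't work.";
        }
        else
        {
            echo $buffer;
        }
    ?>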

Here, http://www.example.com and MySecretKey are placeholders; replace them with your site's URL and the secret key you set up as discussed in the previous section.

The ampersands above should be written as a single ampersand, not as an HTML entity (&amp;). Failure to do so will result in a 403: Forbidden error message and no indexing will occur. This is not a bug, it is the way wget and PHP work.

In order to call this script on a schedule, you need to put something like this in your crontab (or use your host's CRON feature to set it up):

0 3 * * 6 /usr/local/bin/php /home/USER/xtdir_indexing/xtdir.php

Where /usr/local/bin/php is the absolute path to your PHP command-line executable and /home/USER/xtdir_indexing/xtdir.php is the absolute path to the script above.

If you set up your CRON schedule with a visual tool (for example, a web interface), the command to execute part is "/usr/local/bin/php /home/USER/xtdir_indexing/xtdir.php".
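In the crontab entry above, the five leading fields are minute, hour, day of month, month and day of week, so 0 3 * * 6 runs the script every Saturday at 03:00. If you manage your crontab from the command line, the sketch below shows one way to add the entry without creating duplicates. The function names merge_cron_line and install_cron_line are hypothetical, and the sketch assumes a cron implementation whose crontab command accepts "-" to read the new table from standard input, as the common Linux implementations do.

```shell
#!/bin/sh
# Hypothetical helper: print the existing crontab text with the new entry
# appended, unless that exact line is already present.
merge_cron_line() {
    existing="$1"
    new_line="$2"
    if printf '%s\n' "$existing" | grep -Fqx -- "$new_line"; then
        printf '%s\n' "$existing"
    elif [ -z "$existing" ]; then
        printf '%s\n' "$new_line"
    else
        printf '%s\n%s\n' "$existing" "$new_line"
    fi
}

# Hypothetical installer: read the current crontab (empty if none exists),
# merge in the new line, and write the result back.
install_cron_line() {
    current="$(crontab -l 2>/dev/null || true)"
    merge_cron_line "$current" "$1" | crontab -
}
```

Calling install_cron_line '0 3 * * 6 /usr/local/bin/php /home/USER/xtdir_indexing/xtdir.php' twice leaves a single entry in the crontab. Keep the schedule in single quotes so the shell does not expand the asterisks.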