Category Archives: Webmaster

DW Questions Answers WordPress Plugin and Multisite

Although the DWQA Questions and Answers plugin for WordPress has been updated to work with Multisite, it probably won’t work as expected. At the time of this writing, the problem appears to be new user creation and Multisite’s default behavior. To use DWQA’s default answering model, in which only registered users can post answers to questions asked by anonymous users, you have to allow those users to register. On my Multisite subsite, however, there was no option to allow user registration. Even after enabling registration on the main blog as network administrator, the option was still not available under the subsite options.

Other users have reported that once you are able to allow registration on a subsite, clicking the registration link redirects the user to the network’s main site and lets them register there, which is probably not the desired behavior for a DW Questions and Answers site. It also allows that user to log in to all other sites on the network.

What would be ideal here is either a separate login that is not tied to the WordPress registered-users login, or the use of social APIs so users could post and answer questions with an existing social account.

WordPress Automatic Downloads and Plugin Updates Permissions

Going crazy following posts on the Internet trying to set permissions on your WordPress directory so that automatic plugin downloads and updates work?  So was I, several times on several sites.  With security in mind, I didn’t want to grant more than necessary.  After applying the DIRECT method in wp-config.php, I had to do this to get plugins to download, create their directory and install.  Keep in mind all other permissions were left at the default:

chmod 775 wp-content/
chgrp nobody wp-content/
chgrp nobody -R wp-content/plugins/

'Nobody' should be replaced with the user the Apache process runs as (often www-data on Debian/Ubuntu or apache on Red Hat). You do not need to set permissions to 777.
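
For reference, the DIRECT method mentioned above is set with a single define in wp-config.php (place it above the “That’s all, stop editing!” comment):

define('FS_METHOD', 'direct');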

MySQL too many connections – solution

When greeted with a MySQL error this morning stating that a database connection could not be made from our PHP web application, I had to do some testing.  First, try connecting to the database directly.  I did this from a remote host and was thrown back an “Error: too many connections.”

From there, log in as root on the localhost and issue:

SHOW FULL PROCESSLIST;

This will give a list of the connections. Take a good look at the big offenders because you’ll need to troubleshoot that application later and find out why it has so many connections.  Now that you know the problem, restart your MySQL and connections should resume as normal (with the excessive stale connections released) and you can go about the repair.

Also, in your my.cnf you can reduce wait_timeout, which reduces the amount of time a connection can be held open by an application. The default is 28800 seconds, or 8 hours. I reduced mine to 30 minutes. You could go lower, but we have some apps that do their own connection pooling and I don’t want to mess that up. In my.cnf:

wait_timeout = 1800

If you are absolutely sure you need more connections, you can increase max_connections from the default of 151 in your my.cnf as well:

# default max_connections is 151 (1 spare for super user)
max_connections = 201
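
To confirm the new settings and keep an eye on usage, a few standard checks from the mysql client are handy:

SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'max_connections';
SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Max_used_connections';

Note that values set in my.cnf take effect after a restart; wait_timeout can also be applied to new connections on the fly with SET GLOBAL wait_timeout = 1800;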

Now you need to keep an eye on the situation by issuing SHOW FULL PROCESSLIST once in a while to see if connections are filling up fast. And don’t forget to visit any applications that show up in there frequently.  I found two applications that were using persistent connections, held open for two hours, that just didn’t need to be.

PHP APC config syntax causes [apc-error] apc_mmap: mmap failed: Invalid argument

After upgrading an Ubuntu server from 9.10 to 10.04 LTS, PHP’s APC cache wasn’t functioning.  Apache wouldn’t start; it hung in the process list and printed this error to /var/log/apache2/error.log:

[apc-error] apc_mmap: mmap failed: Invalid argument

The Apache process would show up in the process list like this:

apc@hostname# ps aux | grep apache
www-data 6958 104 0.0 139044 3624 ? R 12:47 0:19 /usr/sbin/apache2 -k start

This process would then have to be killed, APC commented out, and then the web server restarted just to continue on without APC until a solution was found.

The PHP manual states this regarding MMAP support in APC:

http://php.net/manual/en/apc.configuration.php

When APC is compiled with mmap support (Memory Mapping), it will use only one memory segment, unlike when APC is built with SHM (SysV Shared Memory) support that uses multiple memory segments. MMAP does not have a maximum limit like SHM does in /proc/sys/kernel/shmmax. In general MMAP support is recommended because it will reclaim the memory faster when the webserver is restarted and all in all reduces memory allocation impact at startup.

APC was made to run by commenting out all lines from the PHP config file except for:

extension=apc.so
apc.enabled = 1

This config can exist in a number of places. In 9.10 APC had been compiled by PECL so it was in our /etc/php5/apache2/php.ini file. However, in 10.04 APC is a package so we removed the PECL version, installed the version using apt-get install php-apc and moved the configuration to /etc/php5/conf.d/apc.ini for better consistency.

pecl uninstall apc
apt-get install php-apc

As I began to uncomment lines one by one, it turned out the culprit was the apc.shm_size directive. The default size is 30M, but as soon as the directive was uncommented it crashed Apache. I was unable to specify any value at all, even the same or a lesser value, with or without quotes. That’s when I started thinking syntax might be the problem: it works with the default value (shm_size commented out) but fails with an “invalid argument” error, which suggests APC is passing an invalid argument to mmap. This post confirms the suspicion:

http://stackoverflow.com/questions/6716929/apc-configuration-on-ubuntu-10-4-problem-with-apc-shm-size-apc-shm-segments-an

It turns out that the “M” for Megabytes cannot be specified in the shm_size directive for APC in Ubuntu server 10.04 because it is using APC version 3.1.3p1. However, on 9.10 APC wasn’t included as a package so it was installed with PEAR PECL which installed a more recent version of APC (3.1.9) which did allow specifying the “M” in the shm_size directive.

If you wish this to work in your config file, it should read like this in older versions of APC:

apc.shm_size = 100

This would specify a 100 MB shared memory segment, and would be equivalent to this in newer versions:

apc.shm_size = 100M

And you can also put quotes around the “100M” if you like.
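
Putting it together, a working /etc/php5/conf.d/apc.ini on 10.04 would look roughly like this sketch (size in plain megabytes, no suffix, for APC 3.1.3p1):

extension=apc.so
apc.enabled = 1
; 100 means 100 MB here; newer APC versions also accept "100M"
apc.shm_size = 100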

After these changes I had Apache up and running again, the APC cache helping PHP along, and some of the quickest loading pages I’ve seen in a while.

How To Disable WordPress Plugins Manually

If you get stuck being unable to login to WordPress because of a misbehaving plugin, you have a couple options.

You can log in to your WordPress database using the command-line MySQL client or phpMyAdmin, and from there look for the wp_options table. If you are using a security plugin that changes table prefixes, replace wp_ with your prefix. Once there, look for a value in the option_name column called active_plugins. You can find it like this:

SELECT * FROM wp_options WHERE option_name like '%plugin%';

This will give you a list of plugin options which you can examine. If you are using WPMU or WP3 Multisite and this is a globally enabled plugin, you’ll then need to use the table with the site’s ID in the prefix, such as wp_2_options instead of wp_options. It used to be that option_name active_plugins would contain all plugins, but from examining this output it appears that option_name transient_plugin_slugs contains plugins as well. If you choose to modify a value, copy and paste the original somewhere for easy editing and restoration once fixed.
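
If you do go the database route, clearing active_plugins deactivates every plugin at once; an empty serialized array is what WordPress stores when no plugins are active. A sketch (back up the original value first, and adjust the table prefix if needed):

UPDATE wp_options SET option_value = 'a:0:{}' WHERE option_name = 'active_plugins';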

The option I chose for this Multisite plugin that prevented login was simply to rename its folder in the plugins directory. That causes WordPress not to load the plugin’s files, because they are no longer in the expected location.
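
From a shell, that rename is a one-liner (the plugin folder name here is just an example):

mv wp-content/plugins/misbehaving-plugin wp-content/plugins/misbehaving-plugin.disabled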

Once into the admin dashboard page, you can then disable the plugin completely and take whatever actions you need to repair, replace or trash it for good.

Creating MySQL Users and Granting Permissions

When should you create a new user in MySQL? That’s a question that you have to ask whenever you add a new application or restructure your data. I’ve been in places where every application connects as root, and obviously that’s very bad.

Only you know your situation. When I build web applications, sometimes I’ll have a shared database, such as geographic information or lists of musical artists, that I want multiple applications to have access to. But only read access; they can’t change it. So root is out. But they also have their own database (in MySQL) or schema (in PostgreSQL) in which they will need to INSERT, UPDATE, DELETE and SELECT from tables. They may even need to CREATE tables, such as temp tables, depending on the situation. In these cases, such applications require a dedicated user, because there are no commonalities that would allow one user account and its privileges to be reused by another application.

In this guide I’ll only deal with command-line actions from the MySQL client, not cPanel actions, but you’ll be able to transfer the concept to a different interface.

To create users you’ll most likely have to be root:

mysql -u root -p

Create the new user for this application:

CREATE USER 'username'@'hostname' IDENTIFIED BY 'password';

You’ll have to think about the hostname setting. If the only one accessing your database with this username is your application, and your MySQL process is on your local machine, then ‘localhost’ will do as a hostname. If you have a separate machine running MySQL as I do, then you’ll need to enter the hostname or IP address of your web server machine, as that is where the application will connect from.

CREATE USER 'username'@'webserver' IDENTIFIED BY 'password';

It is possible to use wildcards for the hostname, although you should think this through as generally it is a bad idea. If you are on an internal network and you have your web server cluster and SQL machine on the same subnet, you could do this:

CREATE USER 'username'@'10.20.30.%' IDENTIFIED BY 'password';

Which will allow 256 machines to connect to your MySQL server as that user. Think about the future use of your network before you do this.

GRANTing PRIVILEGES

Now you have a user who can’t do much. You’ll need to give them access to something. Since the user I’m creating will be the one populating the database with information and not just reading it, they will need greater privileges than the average web application that just reads and reports. The GRANT syntax can be found in the MySQL manual. My specific requirement allowed me to enter:

mysql> GRANT ALTER, CREATE TEMPORARY TABLES, DELETE, INSERT, LOCK TABLES, SELECT, UPDATE ON dbname.* TO 'userguy'@'10.20.30.40';

  • This GRANT is followed by a list of actions the user can take, as seen in the manual page linked above (SELECT, DELETE, etc).
  • ON a certain database and tables – dbname.* means these privileges apply to ALL tables on the database called dbname.
  • TO username @ IP address – meaning if this user connects from another address they won’t receive these privileges

DON’T enter:
mysql> GRANT ALL ON *.* TO 'userguy'@'%';

This is the equivalent of giving them root privileges on all tables of all databases when connecting from anywhere. It is pointless to try to manage users this way. And never grant *.* to a user unless you are aware of everything those privileges include. Most people would be mortified to find out they used *.* and granted their web user the ability to DROP all their databases. Of course, they don’t find out until after it has happened, if they didn’t know the user had that ability to start with.
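
For contrast, the shared read-only database described earlier needs nothing more than SELECT. A sketch (the database and user names are examples):

CREATE USER 'reporting'@'webserver' IDENTIFIED BY 'password';
GRANT SELECT ON geodata.* TO 'reporting'@'webserver';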

Now, just to be sure everything is in effect, issue the command:

FLUSH PRIVILEGES;

Keep in mind that in newer versions, MySQL will create a user when a GRANT is issued if the user doesn’t already exist. However, I prefer to explicitly CREATE a user and keep track of it in a file.

Connecting remotely requires configuration settings changes. If you are not connecting locally, you’ll need to modify your MySQL’s my.cnf file to allow remote connections. This is part of your extended configuration and security, and I won’t discuss it in this article.

Now you can go about creating users and granting privileges with a total security plan in mind. Everyone can access only what they need, no one can access or change what they shouldn’t, and users can only connect from specific locations.

Optimizing MySQL Data Types

Optimizing MySQL datatypes is an important task for anyone, whether you are running a single website or a cluster of geographically distributed MySQL machines. Efficiency in your database driven application all begins with selecting the right datatypes for your columns.

In the process of a recent project I had to get data from a third party source and store it in a local MySQL database, for which we chose to build a dedicated MySQL server machine with a RAID 10 disk array. I had no idea what the data we retrieve would look like. All we had was a list of columns and nothing else. In order to get started we had to guess at the length of the data based on the column names assuming they were descriptive of the data they held. An initial table contained column names followed by simple VARCHAR(255) declarations. Totally unacceptable, but it was a start.

What you got there?

Once the data came back, I had to examine each column in each table to see what it held. There are many ways of doing this, including using the MAX() and MIN() functions like so:


mysql> SELECT MAX(LENGTH(first_name)) FROM emp;
+-------------------------+
| MAX(LENGTH(first_name)) |
+-------------------------+
| 14 |
+-------------------------+

This tells me that the maximum length of an employee’s first name is only 14 characters. So why a VARCHAR(255) or a CHAR(100) or anything else? Let’s look at an example of the longest names in the table:


mysql> SELECT DISTINCT(first_name) FROM emp WHERE LENGTH(first_name) = 14;
+------------------+
| first_name |
+------------------+
| Thomas Michael |
+------------------+

That query gives us only the distinct uses of first_name that are 14 characters long. An examination of the real data is important when selecting datatypes, so you may want to do this again using > 10 in place of = 14, and you can even include a count of how many employees have large first_names, and any other queries that may help you understand the data you are working with.

The Average Joe

Another useful query is to determine the average length of the first name column:


mysql> SELECT AVG(LENGTH(first_name)) FROM emp;
+-------------------------+
| AVG(LENGTH(first_name)) |
+-------------------------+
| 5.9771 |
+-------------------------+

With the average length of the column being only 6 characters, does it make sense to define a fixed CHAR(14) field to accommodate the longest value when only one employee has a first name that long? Would it be better to use a VARCHAR column anyway? A quick count of how many names have each character length is required:


SELECT LENGTH(first_name) AS size,
COUNT(*)
FROM emp
GROUP BY LENGTH(first_name);

+------+----------+
| size | COUNT(*) |
+------+----------+
|    1 |        0 |
|    2 |       21 |
|    3 |       85 |
|    4 |      626 |
|    5 |     1259 |
|    6 |     1626 |
|    7 |      958 |
|    8 |      345 |
|    9 |      183 |
|   10 |       38 |
|   11 |       37 |
|   12 |        9 |
|   13 |        9 |
|   14 |        1 |
+------+----------+

This tells me that most people in our system have first names between 4 and 7 characters long. Possessing and understanding this information allows me to make decisions on the best column type. It is also valuable when creating prefix (partial) indexes on VARCHAR columns, as shown below.
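
For example, a prefix index on the first 7 characters already distinguishes the bulk of these names while keeping the index small (a sketch; the index name is arbitrary):

ALTER TABLE emp ADD INDEX idx_first_name (first_name(7));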

Procedure Analyse

One function available with MySQL is PROCEDURE ANALYSE. The manual for it is here:

http://dev.mysql.com/doc/refman/5.0/en/procedure-analyse.html

Replace 5.0 in the URL with your version.

The description from the manual for ANALYSE is:

ANALYSE() examines the result from a query and returns an analysis of the results that suggests optimal data types for each column that may help reduce table sizes.

That just took the guesswork out of your column declaration optimizations! Knowing your data is still critical because ANALYSE is still capable of guessing wrong, but it is another tool to assist you. To view the output of ANALYSE for the first name column, issue a query as follows:


SELECT first_name
FROM emp
PROCEDURE ANALYSE();

+------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+----------------------+
| Field_name             | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std  | Optimal_fieldtype    |
+------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+----------------------+
| schema.emp.first_name  | A         | Zyrida    |          1 |         16 |                0 |     0 | 5.9771                  | NULL | VARCHAR(16) NOT NULL |
+------------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+------+----------------------+
1 row in set (0.01 sec)

As you can see in the output, PROCEDURE ANALYSE() returns a single row with:

  • the column name
  • the MIN and MAX values (not important for strings)
  • the MIN and MAX lengths (queries we ran manually above)
  • how many values are empty or zero, and how many are NULL
  • the AVG value (for numbers) or AVG length (for strings)
  • the population standard deviation (STD)
  • a recommended column definition called Optimal_fieldtype

Now, you have to be careful about accepting the Optimal_fieldtype when passing the default arguments to PROCEDURE ANALYSE(). In a lot of cases, even for what is effectively a BOOLEAN type, it will return an ENUM, such as ENUM(‘0’, ‘1’). Close attention to detail is important in these cases. In the following example the column holds only one possible value at this time. It is an unfortunate case, but the data being returned to me uses this as a foreign key and as part of a primary key on many tables.


SELECT id
FROM company
PROCEDURE ANALYSE();
+--------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
| Field_name | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std | Optimal_fieldtype |
+--------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
| schema.company.id | 20000 | 20000 | 5 | 5 | 0 | 0 | 20000.0000 | 0.0000 | ENUM('20000') NOT NULL |
+--------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+------------------------+
1 row in set (0.00 sec)

In actuality, this ID is a numeric value, and further numeric ID values may be added over time, so a numeric data type would be more appropriate than an ENUM. In order for PROCEDURE ANALYSE to return a numeric suggestion, we’ll have to pass some arguments to it. According to the manual, the arguments are as follows:

max_elements (default 256) is the maximum number of distinct values that ANALYSE() notices per column. This is used by ANALYSE() to check whether the optimal data type should be of type ENUM; if there are more than max_elements distinct values, then ENUM is not a suggested type.

max_memory (default 8192) is the maximum amount of memory that ANALYSE() should allocate per column while trying to find all distinct values.

Retrying that query with arguments (1,24) we get:


SELECT id
FROM company
PROCEDURE ANALYSE(1,24);

+-------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+-------------------------------+
| Field_name        | Min_value | Max_value | Min_length | Max_length | Empties_or_zeros | Nulls | Avg_value_or_avg_length | Std    | Optimal_fieldtype             |
+-------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+-------------------------------+
| schema.company.id | 20000     | 20000     |          5 |          5 |                0 |     0 | 20000.0000              | 0.0000 | SMALLINT(5) UNSIGNED NOT NULL |
+-------------------+-----------+-----------+------------+------------+------------------+-------+-------------------------+--------+-------------------------------+
1 row in set (0.00 sec)

As you can see, ANALYSE is now suggesting an Optimal_fieldtype of SMALLINT(5) UNSIGNED NOT NULL, which has a numeric range of 0 to 65535 and is adequate for our current and future ID allocation in this column.

Planning for the future

One thing to remember is that the data you are seeing today isn’t necessarily representative of the data you’ll get in the future, unless you have a really large set defined by someone else. You’ll have to be careful with the data types you define. Don’t make them unnecessarily large, but you can plan some wiggle room from the PROCEDURE ANALYSE optimal value. Use your mind here.

If you are creating your own data from scratch such as with a form entered on a website, then this technique will be most usefully applied by you after you’ve been up and running for some time and have some real data to analyze. If you are working with a small schema you’ll be able to ALTER your columns to the new data types after the fact.

Apache, Postgres and Drupal on a VPS

I really would prefer to have my own server but sticking a box in colo is expensive. Where I live, getting access to the physical colo space would be nearly impossible too. As a result I run on a VPS. Unfortunately VPS has some horrible limitations depending on who is on the box with you.

Recently I decided to move my personal blog off of b2evolution and onto Drupal. Too bad Drupal is such a resource hog. Most CMS and blog software is, though, and it is really hard to find a minimal, optimized blog package that uses any form of caching. Today it hit the skids and my patience hit the wall. Argh!

I was converting my personal blog by hand because I only have about 30 entries, so it didn’t pay to write a conversion script. Every time I hit the ‘post’ button in Drupal I wound up with a blank screen, a “could not connect” error or, worse, any command in an SSH terminal window returned “could not allocate memory”. As a result, I had to do some fast tuning of something, because I had to reboot the server after every blog post!

I chose to tackle Apache first because they have an Apache Performance Tuning Guide that helps a bunch. Postgres, well I’m running an old version that I really need to upgrade before I tackle configuration and optimization of it. That’s not a 30 minute job.

VPS, Apache and Low Memory

A VPS is low on memory for sure. Even though you can sometimes burst higher, in general the allotment is pretty low. The very first thing the Apache performance tuning guide tackles is memory.


You can, and should, control the MaxClients setting so that your server does not spawn so many children it starts swapping. This procedure for doing this is simple: determine the size of your average Apache process, by looking at your process list via a tool such as top, and divide this into your total available memory, leaving some room for other processes.

Using top with the thread display toggled (‘H’), I can see that the average Apache process is using WAY TOO MUCH memory at 25M apiece – and there are 10 processes running. I don’t have time to tune the process size now, so I’ll tune the number of servers using some very simple configuration parameters. Since we are using the prefork MPM, the directives can be found in extra/httpd-mpm.conf under mpm_prefork_module.

Since I am supposed to be guaranteed 256M of memory, burstable to 1G, I’ll optimize for the lower number. 256M / 25M is 10, and that’s not including room for other processes. The current setting is 150!
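
If you prefer to measure rather than eyeball the per-process size, something like this prints the average resident size in kilobytes (a sketch; it assumes GNU ps and that the processes are named apache2):

ps -ylC apache2 --sort=rss | awk '$8 ~ /^[0-9]+$/ {sum+=$8; n++} END {if (n) print sum/n " KB average across " n " processes"}'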

From:


StartServers 5
MinSpareServers 5
MaxSpareServers 10
MaxClients 150
MaxRequestsPerChild 0

To:


StartServers 2
MinSpareServers 2
MaxSpareServers 5
MaxClients 10
MaxRequestsPerChild 0

I also changed it to only start 2 servers, only keep 2 spare servers instead of 5, and only allow 10 clients instead of 150. This will essentially create a queue if anyone is waiting but it shouldn’t dip into swap space and it will save a bunch of memory usage. This will of course be monitored. Once time permits and I am able to minimize the size of each Apache2 process (the other factor) then I will revisit this and likely increase the MaxClients accordingly.

Drupal multisite configuration problems

Drupal multisite setup configuration

I’ve adopted Drupal for a majority of my online activities in the past year. It has enough of the things I like and is much faster and more stable than previous CMS or forum software I’ve used. However, some documentation seems to be lacking. Multisite configuration with Apache using a single Drupal codebase is one area.

I finally decided to try a project in which I would use a common codebase for Drupal across all of my websites. That is, rather than having a separate Drupal installation in a directory for each website, I thought I would take advantage of Drupal’s multisite functionality by having one Drupal installation (codebase) in a central directory and pointing all the project websites to it.

Drupal multisite advantages

The advantages of Drupal’s multisite feature are easy to spot.

First, a common codebase means only one codebase to change during an update – even though Drupal requires you to update each website’s database separately.
Second, using an intermediary cache like APC for PHP means less server memory is spent caching files across multiple websites, because the core files are the same.

Despite this great feature and its advantages, there is much confusion at the Drupal.org website from patrons as to the correct way to set this up. The Drupal install file covers it pretty well but leaves out one important detail – the web server configuration. Some purists say that is an Apache issue and should be left to the Apache mailing list, but PHP does a good job of detailing the Apache configuration in its instructions, so Drupal could too.

Most people who administer their own websites follow a ritual when creating a new website: create a directory for the new website, create the www subdirectory and then configure the Apache DocumentRoot directive. With a single codebase used by multiple domains and websites you need a different approach.

The missing multisite ‘key’

The most important thing here is to make sure each website’s DocumentRoot in the Apache configuration points to the common drupal directory. This is an unusual configuration for most people but it works well.

The idea is that Drupal receives information from PHP (and Apache) to tell it which website it is supposed to serve. When it determines that, it will retrieve the correct configuration file (database passwords and URL base) from the Drupal ‘sites/’ subdirectory. That’s how it can determine which website to display.

Here is a quick breakdown.

Install Drupal in a common directory. Here I chose /var/lib/drupal-4.7.4
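
Each website then gets its own folder under sites/ containing its settings.php; Drupal matches the incoming hostname against those folder names and falls back to sites/default. The layout looks like this (domains are examples):

/var/lib/drupal-4.7.4/sites/default/settings.php
/var/lib/drupal-4.7.4/sites/www.example.com/settings.php
/var/lib/drupal-4.7.4/sites/www.example.org/settings.php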

Caveats

Some people use a symbolic link to point Drupal to the correct distribution. I’ve done this, but I suggest not doing it. Instead, install Drupal in a directory named according to its version number. That is, use the same directory structure and names that Drupal gave when you extracted the zip or tarball.

Why? If you have a couple sites and it comes time to upgrade you can run into trouble. It is easy to just recreate a symbolic link pointing ‘drupal/’ to ‘drupal-4.7.4’ but unfortunately that affects ALL your websites instantly. Not good on a production server. If you have 50 websites this means that you have 50 sites using the new codebase and the same 50 websites awaiting your hand to manually update their databases using Drupal’s supplied script. If any of those 50 are accessed during this wait period you’ll be in trouble.

The other disadvantage of the symbolic link comes if you are using third-party supplied modules. You forgot that 5 out of your 50 websites are using a module that isn’t being maintained on the same schedule as the Drupal project and guess what? It breaks your website. I’ve found that frequently a misbehaving module or even an errant ‘files’ path in the Drupal settings will disable all other Drupal modules. Best to avoid this altogether.

You can save yourself from both of these negative scenarios by simply putting Drupal in the supplied directory name and then adjusting your Apache DocumentRoot directive as you update them. It is an extra step but very easy.

The final advantage of not using a symbolic link is that you can hold a website stable. That is, you can have a couple of different versions of Drupal on your server used by different websites. Several hosting providers do this with PHP, and that is a very good example: if you upgrade to PHP 6 for one project you’ll find it doesn’t work with Drupal, so you need to keep an old version of PHP installed for non-compliant websites.

With Drupal I’ve found that once in a while a website gets working just right and I don’t want to update it. Or I quit monitoring it. Or it is an informational, content-only website with no subscriptions allowed. Or a million other reasons. Basically I will ‘freeze’ that website and not allow any more code updates. Or say you have a module that you use but the author abandoned it long ago, and it won’t work with Drupal 6, for example. Simply freeze that website and only update your other websites with the new Drupal codebase.

Keeping a couple of distributions around can be handy, but it means that you can’t point them all at the codebase with a symbolic link.

Unzip the new codebase in parallel to the old.

# tar zxvf drupal-4.7.x.tar.gz

Back up the database for the website you are about to update.
Exact commands vary with which database you are using.
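
For MySQL, for example, something like this does the job (the database name is a placeholder):

mysqldump -u root -p example_db > example_db-$(date +%F).sql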

Edit the Apache config file to change one website at a time.

# vi extra/httpd-vhosts.conf

Change:


DocumentRoot /var/www/example.com/www
ServerName www.example.com

To:


DocumentRoot /var/lib/drupal-4.7.4
ServerName www.example.com
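
If your sites are defined as full virtual hosts, the whole block ends up looking something like this sketch (ServerName and log paths are examples):

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/lib/drupal-4.7.4
    ErrorLog /var/log/apache2/example.com-error.log
    CustomLog /var/log/apache2/example.com-access.log combined
</VirtualHost>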

Restart Apache so the config takes effect

# apachectl graceful

Visit the Drupal supplied database update URL for your website.

http://www.example.com/update.php

Watch for errors and check your logs. Visit the website and check the settings to make sure all modules still show up under the admin -> settings menu.

If successful, continue on with the next website on your server.

Smarty for PHP – Caching

Here’s where I get happy. I never thought I would need to worry about caching. Hardware these days is fast. But the more I thought about efficiency the more I realized that my pages don’t change that frequently so why go through the overhead of connecting to a database on every page load? Or for that matter, why load a static file, unserialize something stored, recalculate the ‘year’ in the copyright tag. For that matter, why perform any of these tasks every time the page is loaded when they may change infrequently? Well, luckily Smarty has caching built-in and I found a use for it.

From the Smarty web site:

Caching: Smarty provides fine-grained caching features for caching all or parts of a rendered web page, or leaving parts uncached. Programmers can register template functions as cacheable or non-cachable, group cached pages into logical units for easier management, etc.

Wow. Seems like a lot is covered. I can cache the whole page if I want. I can cache the whole page EXCEPT for a dynamic element like a banner ad, or I can register a dynamic function that picks a random Amazon.com book in the midst of an otherwise static page – all while keeping to Smarty’s idea of separating the logic and presentation.

Database queries and caching with Smarty

In a recent project I had to put a full-text search form on a web site. This wasn’t easy because it was searching through HTML files – a mini Google if you will. The trouble was the function that creates the ‘headline’ clip from the page (that highlights the search terms) seemed to be very slow. After asking mailing lists and spending days in the manual I could not get the problem solved. I had a deadline to meet and could not change it.

What I did was implement the current version of the search – problem and all – and use Smarty caching to save load on the system (and save my bottom). Luckily there was an HTML list of links to popular queries, and a mechanism to record queries. I used a script to regenerate those queries and let Smarty cache the results. The only time we had to connect to the DB and perform the query was the first time a given search was run. This was a lifesaver, and it dropped the query time down to a value that was barely perceptible – mainly because it rarely had to query the db at all!
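
In code, the pattern is the usual Smarty one: only hit the database on a cache miss. A minimal sketch (the template name, cache id and run_search() helper are hypothetical):

require_once 'Smarty.class.php';

$smarty = new Smarty();
$smarty->caching = true;           // turn on output caching
$smarty->cache_lifetime = 3600;    // keep a cached result for an hour

$query = $_GET['q'];               // validate/escape this in real code
$cache_id = md5($query);
if (!$smarty->is_cached('search_results.tpl', $cache_id)) {
    // cache miss: this is the only time we connect and run the search
    $results = run_search($query); // hypothetical search function
    $smarty->assign('results', $results);
}
$smarty->display('search_results.tpl', $cache_id);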

Further Database Queries and Smarty

Another benefit of Smarty caching can be recognized when you have complex queries. We must all work within boundaries. Sometimes you don’t get the chance to select a better DB system, or add hardware. In other words, you work with what you have and find a way to make it better.

I once had to write a page for an e-commerce store that would display the top sellers. Not just top sellers, but three columns of top sellers. There was the top sellers this week, top sellers this month, and top sellers of all time. These changed because of frequent specials and of course, the time spent in the store made a difference in the rank. It was hard to push a product introduced a month ago and put on special this week unless it showed up in the list of top sellers for this week.

But how do you go about that? Performing the SQL queries is easy enough, but getting a result back quickly is hard. Really, there is no reason to perform this query 2000 times a day as people drift in and out of the store. The problem is, the store was on a shared host with a dedicated remote database. There was time associated with the connect, of course, but even more important were the 3 complex queries. And because they chose a shared host, they could not just throw money at hardware to speed up their queries.

The only problem query in this scenario was the all-time best sellers list. Of course we have no trouble with the others, because indices on the right columns make those a snap. But how about sorting through sales records for 6 years and ranking the top sellers? Ouch. That can take a long time. Well, a long time from the perspective of a web surfer.

Four seconds to be exact. Can you imagine that? Can you imagine a few people at the same time making that query? That’s the kind of thing that makes machines grind to a halt. You’ll see that type of thing in some of the recent content management systems (CMS) that don’t use any caching. I recently tested one that could only support about 4 users at a time on modest hardware because of their extensive use of, in my opinion, many unnecessary queries.

With Smarty though, this query becomes a non-issue. I ran a script early in the morning that erased the cache for that page and requested it again which regenerated the cache page. You can do this with a simple call to wget or lynx. Problem solved. One query a day is all it took. And because the page showed top sellers for this week and this month there was no need to have it in real time. Even if it was in real time there is no reason to think a sale today would affect sales from the last 6 years in such a dramatic way that it would push an item up the list all the way from the bottom to the top. But even if you wanted the ‘this week’ top sellers list in real time, or a list for the best sellers today, you would just include a dynamic block in Smarty, and only that section would be generated on that page load, the rest of the real hard queries would be cached.
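
The morning refresh can be as simple as a cron job that requests the page with a flag telling the script to clear its cache first (URL, template name and parameter are examples):

# crontab entry: rebuild the top sellers cache at 5:00 every morning
0 5 * * * wget -q -O /dev/null "http://www.example.com/topsellers.php?refresh=1"

Inside topsellers.php, the refresh flag would just trigger $smarty->clear_cache('topsellers.tpl'); before display() runs, so that one request regenerates the cached page for everyone else.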

With that in mind, Smarty caching took a script that I wasn’t even able to execute due to the time involved, and turned it into a vital part of the store that helped increase visibility for the top selling items. It also took that 4 second query down to a page load time of 0.4 seconds.

Database connects

As mentioned above, the shared host used a remote database. That’s a great thing to do actually, but it can kill you on the connect. Of course the connect is what kills you on most databases as long as everything else is up to par.

But why connect to the database at all? If your content is cached you don’t need to, again saving that overhead and speeding up your display that much faster. If you do need to connect to a database to record some statistics or similar, you can do it after the page is sent to the browser by putting your database connect and your subsequent queries in the part of your script AFTER the call to $smarty->display() – meaning the page has already been sent to the browser.
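
As a sketch of that ordering (the stats table and connection details are made up):

$smarty->display('page.tpl', $cache_id);  // the page is sent to the browser here

// non-critical work happens after the visitor already has the page
$db = mysql_connect('dbhost', 'statsuser', 'password');
mysql_select_db('stats', $db);
mysql_query("INSERT INTO page_views (page, viewed_at) VALUES ('home', NOW())", $db);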

Summary

Smarty has a multi-faceted approach to caching. Smarty first takes your template page and caches that as a PHP file. That means it doesn’t have to parse the template on every request. If you choose to use the other caching features, Smarty will then cache the actual output of your page, saving all of the overhead of dynamic files, but allowing you the ability to update the cache whenever you want.

Even if you don’t intend to use caching in any large way, it is still a good idea to use Smarty to get acquainted with the idea of an intermediate cache. The Smarty cache will reduce the load on your hardware, enabling you to enhance the end-user experience and decrease your dependence on bigger, more expensive hardware. It also allows you to grow your site(s) and traffic without worrying about hitting the wall and taking the system down during a growth spurt, which can potentially harm revenue. In other words, a Smarty cache lets you plan for the future. And as you’ve seen in my recent case, a Smarty cache can help you get out of a bind when you are dependent on other developers and a timeline.

In short, there simply is no reason not to use a cache. Smarty allows you to implement only as much or as little caching as you like. As a professional PHP or web programmer you need to be efficient as well as effective, and part of being efficient means caching infrequently updated dynamic content.