how_to:repairing_corrupted_database

Repairing corrupted database

With PostgreSQL (as with most high-end databases) the database system will automatically recover after power failures etc, if it can. If it can't then there is no manual 'repair' function, because the data cannot totally be recovered (or it would automatically have been recovered).

Common causes for unrecoverable data errors are: incorrect disk caching, virus scanners or faulty memory modules on the database server.

If you do get an unrecoverable error, the best solution is to restore the latest database backup. However, if you do this, you will lose any data between the database backup and the time when you restore it, so it should be done as soon as possible to avoid excessive data loss.

If that is not possible, then in some cases, you may be able to remove the damaged parts of the database and recover from there while keeping most of the important data intact. This article explains some possible ways to do this. Note that none of these methods are guaranteed, and are done at your own risk.

We can help you with these under our chargeable service which costs £40/hr (or part of) for this type of problem, but there are no guarantees we will be able to recover anything.

Before you start you should:

  1. stop the VPOP3 service
  2. stop the VPOP3DB (or PostgreSQL-VPOP3) service
  3. take a full copy of the VPOP3\pgsql\data folder
  4. restart the VPOP3DB (or PostgreSQL-VPOP3) service

Do not restart the VPOP3 service yet.

Once you have repaired the database you should ideally perform a manual database backup & restore to ensure there are no other problems before restarting VPOP3.

Finding the problem table/index

Often the error message will say something like:

Invalid page header in block x in relation base/16385/512312

These numbers are 'random' so will be different from one installation to another. To determine which database table/index the relation refers to, follow these instructions

Go to a command prompt in the VPOP3 directory and run 'psql' (in version 5 only). The default password is 'vpop3pass'

Then type (or copy/paste)

select n.nspname AS schema, c.relname AS tablename, c.relkind as kind from pg_class c inner join pg_namespace n on (c.relnamespace=n.oid) where c.relfilenode=<filename>;

In the above example error message <filename> would be replaced by 512312

If 'kind' is 'i' then the damaged relation is an index, so a database reindex should fix the problem. If it is 'r' then it is a normal table, or if it is 't' then it is a 'toast' table (see below).

If the problem is in a normal table, then depending on the table name, you may be able to delete the table without losing critical data. Please contact support@pscs.co.uk with the table name and we will be able to tell you what is stored in that table, and how to delete it if that is an appropriate action.

unexpected chunk number

If you get an 'unexpected chunk number error for toast value..' in the messages.msgdata table then this means that one (or more) message records have been corrupted.

'toast' records in PostgreSQL are where it stores large data fields. This error means that the links between the main table data and the 'toast' values, or within the 'toast' value itself has been damaged.

A similar problem is an “Invalid page header in block x” error. The same procedure to resolve the problem should be followed in that case as well.

The only solution we know of is to delete the relevant record (message) totally.

Unfortunately, there is no easy way to find which record is damaged, so you need to scan the entire table.

Go to a command prompt in the VPOP3\pgsql\bin directory and run 'psql -U postgres -p 5433 vpop3' (in version 3 or 4 omit the '-p 5433'). The default password is 'pgsqlpass'

Finding the problem message

Copy/paste the following into the psql prompt:

DO $f$
declare
    curid INT := 0;
    vcontent TEXT;
    badid INT;
begin
DROP TABLE IF EXISTS badids;
CREATE TEMP TABLE badids (msgdataid BIGINT);
FOR badid IN SELECT msgdataid FROM messages.msgdata ORDER BY msgdataid LOOP
    curid = curid + 1;
    if curid % 1000 = 0 then
        raise notice '% rows inspected', curid;
    end if;
    begin
        SELECT msgdata
        INTO vcontent
        FROM messages.msgdata where msgdataid = badid;
        vcontent := md5(vcontent);
    exception
        when others then
            INSERT INTO badids (msgdataid) VALUES(badid);
            raise notice 'data for message % is corrupt', badid;
            continue;
    end;
end loop;
end;
$f$;

This will read all the messages from the database, and tell you which message(s) it encountered an error with

However, sometimes it won't find all problem messages, so it is a good idea to run it again after fixing any problems, until this script doesn't find any more errors.

This script stores the bad message IDs in a temporary table called 'badids', so you can look at the values using 'SELECT * FROM badids;' or use the table to help with other functions if it helps (and you know SQL)

Getting message summary info

Once you find the problem message, if you type:

SELECT subject, messagetime, fromaddr, tolist FROM messages.msgdata WHERE msgdataid = <problem message id>;

and

SELECT username, folder FROM messages.foldermessages INNER JOIN messages.folders USING(folderid) INNER JOIN users.users ON messages.folders.userid=users.users.usernumber WHERE msgdataid = <problem message id>;

this will tell you summary information about the message which may help know which message will been deleted

Delete the problem message

Then, type:

DELETE FROM messages.foldermessages WHERE msgdataid= <problem message id>;
DELETE FROM messages.msgdata WHERE msgdataid= <problem message id>;

This will delete the message from the database, so it should now work OK, but without the message you have just deleted.

In some cases you may need to disable database triggers before doing this. To do this, stop VPOP3 first, then, before doing the DELETE commands do:

ALTER TABLE messages.foldermessages DISABLE TRIGGER USER;

After doing the deletions, re-enable the triggers by doing:

ALTER TABLE messages.foldermessages ENABLE TRIGGER USER;

Failure to re-enable the triggers will cause big problems for VPOP3!

In rare cases the database files may be even more corrupted, so you need to run this command before deleting the records:

SET zero_damaged_pages=on;

This should be considered a 'last resort' option - http://www.postgresql.org/docs/9.1/static/runtime-config-developer.html#GUC-ZERO-DAMAGED-PAGES

Rebuild database

After you have done this, we strongly recommend doing a full backup/restore of the database as there will be other problems which will cause problems at a later date, such as the automatic clean-up processes (meaning the database size will increase uncontrollably).

Also See

how_to/repairing_corrupted_database.txt · Last modified: 2024/01/15 11:35 by paul