User Tools

Site Tools


how_to:repairing_corrupted_database

This is an old revision of the document!


Repairing corrupted database

With PostgreSQL (as with most high-end databases) the database system will automatically recover after power failures etc, if it can. If it can't then there is no manual 'repair' function, because the data cannot totally be recovered (or it would automatically have been recovered).

Common causes for unrecoverable data errors are: incorrect disk caching, virus scanners or faulty memory modules on the database server.

If you do get an unrecoverable error, the best solution is to restore the latest database backup. However, if you do this, you will lose any data between the database backup and the time when you restore it, so it should be done as soon as possible to avoid excessive data loss.

If that is not possible, then in some cases, you may be able to remove the damaged parts of the database and recover from there while keeping most of the important data intact. This article explains some possible ways to do this. Note that none of these methods are guaranteed, and are done at your own risk.

We can help you with these under our chargeable service, but there are no guarantees we will be able to recover anything.

Before you start you should:

  1. stop the VPOP3 service
  2. stop the VPOP3DB (or PostgreSQL-VPOP3) service
  3. take a full copy of the VPOP3\pgsql\data folder
  4. restart the VPOP3DB (or PostgreSQL-VPOP3) service

Do not restart the VPOP3 service yet.

Once you have repaired the database you should ideally perform a manual database backup & restore to ensure there are no other problems before restarting VPOP3.

unexpected chunk number

If you get an 'unexpected chunk number error for toast value..' in the messages.msgdata table then this means that one (or more) message records have been corrupted.

'toast' records in PostgreSQL are where it stores large data fields. This error means that the links between the main table data and the 'toast' values, or within the 'toast' value itself has been damaged.

The only solution we know of is to delete the relevant record (message) totally.

Unfortunately, there is no easy way to find which record is damaged, so you need to use a 'divide & conquer' approach.

Go to a command prompt in the VPOP3 directory and run 'psql' (in version 5 only). The default password is 'vpop3pass'

Then type:

SELECT MAX(msgdataid) FROM messages.msgdata;

This will tell you the biggest msgdataid value. Take a note of that. For this example, assume this is 12000.

Then, divide the biggest number by 2 (6000 in this sample) and type

SELECT SUM(LENGTH(msgdata)) FROM messages.msgdata WHERE msgdataid BETWEEN 1 AND 6000;

If that works OK, then you know the bad record is from 6001 to 12000, but if it gives an error, you know it is between 1 and 6000.

Then, divide the range appropriately down until you have isolated the damaged message. For example if we check

  • BETWEEN 1 and 6000 - error (so must be 1 to 6000)
  • BETWEEN 1 and 3000 - OK (so must be 3001 to 6000)
  • BETWEEN 3001 and 4500 - error (so must be 3001 to 4500)
  • BETWEEN 3001 and 3750 - OK (so must be 3751 to 4500)
  • BETWEEN 3751 and 4125 - OK (so must be 4126 to 4500)
  • BETWEEN 4126 and 4313 - error (so must be 4126 to 4313)

etc

Once you find the problem message, if you type:

SELECT subject, messagetime, fromaddr, tolist FROM messages.msgdata WHERE msgdataid = <problem message id>;

and

SELECT username, folder FROM messages.foldermessages INNER JOIN users.users ON messages.foldermessages.userid=users.users.username WHERE msgdataid= <problem message id>;

this will tell you summary information about the message which may help know which message will been deleted

Sometimes the above queries won't find the problem message, because they are just retrieving the lengths of messages, so if that metadata is still valid, the database server may not read the actual message content, and thus not encounter the message. In that case, it will take longer to find the problem as you will need to get the entire message contents, eg:

SELECT msgdata FROM messages.msgdata WHERE msgdataid BETWEEN 1 AND 6000;

Then, type:

DELETE FROM messages.foldermessages WHERE msgdataid= <problem message id>;
DELETE FROM messages.msgdata WHERE msgdataid= <problem message id>;

This will delete the message from the database, so it should now work OK, but without the message you have just deleted.

After you have done this, we strongly recommend doing a full backup/restore of the database in case there is any other damage to the database files.

Also See

how_to/repairing_corrupted_database.1370961785.txt.gz · Last modified: 2018/11/14 10:44 (external edit)