how_to:repairing_corrupted_database

Differences

This shows you the differences between two versions of the page.

how_to:repairing_corrupted_database [2014/07/17 11:25] – [unexpected chunk number] paul
how_to:repairing_corrupted_database [2024/01/15 11:35] (current) paul
Line 9: Line 9:
 If that is not possible, then in some cases you may be able to remove the damaged parts of the database and recover from there while keeping most of the important data intact. This article explains some possible ways to do this. Note that none of these methods is guaranteed, and all are done at your own risk.
  
-We can help you with these under our chargeable service, but there are no guarantees we will be able to recover anything.
 +We can help you with these under our chargeable service, which costs £40/hr (or part thereof) for this type of problem, but there are no guarantees we will be able to recover anything.
  
 Before you start you should:
Line 20: Line 20:
  
 Once you have repaired the database you should ideally perform a manual database backup & restore to ensure there are no other problems before restarting VPOP3.
 +
 +=====Finding the problem table/index=====
 +Often the error message will say something like:
 +
 +  Invalid page header in block x in relation base/16385/512312
 +  
 +These numbers are 'random', so they will differ from one installation to another. To determine which database table or index the relation refers to, follow these instructions:
 +
 +Go to a command prompt in the VPOP3 directory and run 'psql' (in version 5 only). The default password is 'vpop3pass'.
 +
 +Then type (or copy/paste):
 +
 +  select n.nspname AS schema, c.relname AS tablename, c.relkind as kind from pg_class c inner join pg_namespace n on (c.relnamespace=n.oid) where c.relfilenode=<filename>;
 +  
 +For the example error message above, <filename> would be replaced by 512312.
 +
 +If 'kind' is 'i' then the damaged relation is an index, so a database reindex should fix the problem. If it is 'r' then it is a normal table, or if it is 't' then it is a 'toast' table (see below).
 +
 +If the problem is in a normal table, then depending on the table name, you may be able to delete the table without losing critical data. Please contact support@pscs.co.uk with the table name and we will be able to tell you what is stored in that table, and how to delete it if that is an appropriate action.
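
As a rough sketch of the lookup described above, the statements below use the numbers from the sample error message (base/16385/512312). The first query relies on the standard PostgreSQL layout, where the middle number in the path is the OID of the database that holds the file. The REINDEX line assumes the database is called vpop3, as on the psql command line shown in the "unexpected chunk number" section. Adjust these values to match your own error message and installation.

```sql
-- The middle number in the path (16385 in the sample message) is the
-- OID of the database that holds the damaged file.
SELECT datname FROM pg_database WHERE oid = 16385;

-- While connected to that database, find which table or index uses
-- file 512312 (the last number in the sample message).
SELECT n.nspname AS schema, c.relname AS tablename, c.relkind AS kind
FROM pg_class c
INNER JOIN pg_namespace n ON (c.relnamespace = n.oid)
WHERE c.relfilenode = 512312;

-- If kind is 'i', rebuilding the indexes may be enough.
-- ASSUMPTION: the database is named vpop3; run this while connected to it.
REINDEX DATABASE vpop3;
```

If the second query returns no rows, check that you are connected to the database named by the first query, because pg_class only lists relations in the current database.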
  
 =====unexpected chunk number=====
Line 31: Line 50:
 The only solution we know of is to delete the relevant record (message) totally.
  
-Unfortunately, there is no easy way to find which record is damaged, so you need to use a 'divide & conquer' approach
 +Unfortunately, there is no easy way to find which record is damaged, so you need to scan the entire table.
  
-Go to a command prompt in the VPOP3 directory and run 'psql' (in version 5 only). The default password is 'vpop3pass'
 +Go to a command prompt in the VPOP3\pgsql\bin directory and run 'psql -U postgres -p 5433 vpop3' (in version 3 or 4 omit the '-p 5433'). The default password is 'pgsqlpass'.
  
 ====Finding the problem message====
-Type:
 
-  SELECT MAX(msgdataid) FROM messages.msgdata;
-
-This will tell you the biggest msgdataid value. Take a note of that. For this example, assume this is 12000.
 
-Then, divide the biggest number by 2 (6000 in this sample) and type
 +Copy/paste the following into the psql prompt:
 +
 +<code>
 +DO $f$ 
 +declare 
 +    curid INT := 0; 
 +    vcontent TEXT; 
 +    badid INT; 
 +begin 
 +DROP TABLE IF EXISTS badids; 
 +CREATE TEMP TABLE badids (msgdataid BIGINT); 
 +FOR badid IN SELECT msgdataid FROM messages.msgdata ORDER BY msgdataid LOOP 
 +    curid = curid + 1; 
 +    if curid % 1000 = 0 then 
 +        raise notice '% rows inspected', curid;
 +    end if; 
 +    begin 
 +        SELECT msgdata 
 +        INTO vcontent 
 +        FROM messages.msgdata where msgdataid = badid; 
 +        vcontent := md5(vcontent);
 +    exception 
 +        when others then 
 +            INSERT INTO badids (msgdataid) VALUES(badid); 
 +            raise notice 'data for message % is corrupt', badid; 
 +            continue; 
 +    end; 
 +end loop; 
 +end; 
 +$f$; 
 +</code>
  
-  SELECT SUM(LENGTH(msgdata)) FROM messages.msgdata WHERE msgdataid BETWEEN 1 AND 6000;
-
-If that works OK, then you know the bad record is from 6001 to 12000, but if it gives an error, you know it is between 1 and 6000.
 
-Then, divide the range appropriately down until you have isolated the damaged message. For example if we check
-  * BETWEEN 1 and 6000 - error (so must be 1 to 6000)
-  * BETWEEN 1 and 3000 - OK (so must be 3001 to 6000)
-  * BETWEEN 3001 and 4500 - error (so must be 3001 to 4500)
-  * BETWEEN 3001 and 3750 - OK (so must be 3751 to 4500)
-  * BETWEEN 3751 and 4125 - OK (so must be 4126 to 4500)
-  * BETWEEN 4126 and 4313 - error (so must be 4126 to 4313)
-etc
 +This will read all the messages from the database, and tell you which message(s) it encountered an error with.
 +
 +However, sometimes it won't find all problem messages, so it is a good idea to run it again after fixing any problems, until this script doesn't find any more errors.
 +
 +This script stores the bad message IDs in a temporary table called 'badids', so you can look at the values using 'SELECT * FROM badids;' or use the table to help with other functions if it helps (and you know SQL).
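
As one possible way of using the 'badids' table, the queries below list the IDs recorded by the scan and then look up which users and folders hold those messages. The second query is only a sketch: it reuses the join from the "Getting message summary info" section and assumes the msgdataid column lives in messages.foldermessages, as that query implies. Because 'badids' is a temporary table, these must be run in the same psql session as the scan script.

```sql
-- List the message IDs that the scan recorded as unreadable, lowest first.
SELECT msgdataid FROM badids ORDER BY msgdataid;

-- Count them, to judge how widespread the damage is.
SELECT count(*) AS bad_messages FROM badids;

-- Show which user and folder each bad message belongs to.
-- ASSUMPTION: messages.foldermessages has a msgdataid column, as implied
-- by the summary query in the "Getting message summary info" section.
SELECT b.msgdataid, username, folder
FROM badids b
INNER JOIN messages.foldermessages USING (msgdataid)
INNER JOIN messages.folders USING (folderid)
INNER JOIN users.users ON messages.folders.userid = users.users.usernumber
ORDER BY b.msgdataid;
```

The temporary table disappears when the psql session ends, so note down the IDs if you need them later.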
  
 ===Getting message summary info=== ===Getting message summary info===
Line 62: Line 99:
   SELECT subject, messagetime, fromaddr, tolist FROM messages.msgdata WHERE msgdataid = <problem message id>;
 and
-  SELECT username, folder FROM messages.foldermessages INNER JOIN users.users ON messages.foldermessages.userid=users.users.username WHERE msgdataid= <problem message id>;
 +  SELECT username, folder FROM messages.foldermessages INNER JOIN messages.folders USING(folderid) INNER JOIN users.users ON messages.folders.userid=users.users.usernumber WHERE msgdataid = <problem message id>;
 
 This will tell you summary information about the message, which may help you identify which message will be deleted.
- 
-===If you can't find the problem message=== 
-Sometimes the above queries won't find the problem message, because they are just retrieving the lengths of messages, so if that metadata is still valid, the database server may not read the actual message content, and thus not encounter the message. In that case, it will take longer to find the problem as you will need to get the entire message contents, eg: 
- 
-  SELECT msgdata FROM messages.msgdata WHERE msgdataid BETWEEN 1 AND 6000; 
  
  
Line 80: Line 112:
 This will delete the message from the database, so it should now work OK, but without the message you have just deleted.
  
 +In some cases you may need to disable database triggers before doing this. To do this, stop VPOP3 first; then, before running the DELETE commands, run:
 +
 +  ALTER TABLE messages.foldermessages DISABLE TRIGGER USER;
 +
 +After doing the deletions, re-enable the triggers by doing:
 +
 +  ALTER TABLE messages.foldermessages ENABLE TRIGGER USER;
 +  
 +Failure to re-enable the triggers will cause big problems for VPOP3!
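
One way to confirm that the triggers really are back on is to query the standard PostgreSQL pg_trigger catalog, as sketched below. This check is a suggestion and is not part of the original procedure. The trigger names returned depend on your VPOP3 version.

```sql
-- List the user-defined triggers on messages.foldermessages and their state.
-- tgenabled is 'O' when a trigger is enabled normally and 'D' when it is disabled.
SELECT tgname, tgenabled
FROM pg_trigger
WHERE tgrelid = 'messages.foldermessages'::regclass
  AND NOT tgisinternal;
```

If any row still shows 'D' after the deletions, run the ENABLE TRIGGER USER command again before restarting VPOP3.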
 +  
 +In rare cases the database files may be even more corrupted, so you need to run this command before deleting the records:
 +
 +  SET zero_damaged_pages=on;
 +  
 +This should be considered a 'last resort' option - http://www.postgresql.org/docs/9.1/static/runtime-config-developer.html#GUC-ZERO-DAMAGED-PAGES
 ====Rebuild database====
-After you have done this, we strongly recommend doing a full [[backup_vpop3#manual_database_backup|backup]]/[[restore_a_backup_of_vpop3|restore]] of the database in case there is any other damage to the database files.
 +After you have done this, we strongly recommend doing a full [[backup_vpop3#manual_database_backup|backup]]/[[restore_a_backup_of_vpop3|restore]] of the database as there will be other problems which will cause problems at a later date, such as the automatic clean-up processes (meaning the database size will increase uncontrollably).
 ====Also See====
 [[PostgreSQL Server won't start due to damaged log files]]
how_to/repairing_corrupted_database.1405592730.txt.gz · Last modified: 2018/11/14 10:44 (external edit)