While working on SQL Server database, you might encounter various errors. In this post, we will discuss three known error codes – 823, 824, and 825, the causes behind these errors, and the solutions to fix them.
SQL Server is mostly written in C++ language. Like other programs, it uses Windows APIs to perform file I/O operations. When these APIs fail with an operating system error, SQL Server returns the error message 823.
The complete error message reads as follows:
The following information contained in this message:
- OS Error 1117 means that an error occurred while writing.
- The physical file with the problem is tpcc.MDF (and not the .ldf file).
- The offset of the file where the IO operation was carried out. Dividing this number by 8192, you will get the logical page number.
You can see more information on the OS Error 1117 by using the command NET HELPMSG 1117.
Knowing exactly which logical page is having problems is really important.
This error message usually shows a problem with the storage system, a hardware, or a driver. Usually, you get this error when there are inconsistencies in the file system or if the database file is damaged.
Let’s go deep!
In the case of a file read, SQL Server will have already retried the read request four times before it returns the error 823. If the retry operation succeeds, the query will not fail but error message 825 will be written into the ERRORLOG and Event Log.
You might get the error 824 when a logical consistency check fails after reading or writing a database page.
The following message is a real case I found. I faced a database (tpcc) with the corrupt transaction log. This was the message:
In the error message, Windows reported that the page is successfully read from the disk (so we have no error 823), however, SQL Server has discovered something wrong with the page read.
SQL Server, in fact, performs a series of logical consistency checks to ensure that certain key aspects of the data engaged in the data transfer were kept throughout the I/O operation. Logical consistency checks include checksum, Torn Page, Short Transfer, Bad Page ID, Stale Read, and Page Audit Failure.
This error indicates that a read operation had to be reissued at least once but eventually the read is successful.
From the error message, we can see that SQL Server tried to read from the disk twice and got an error. However, during the third time, it was read without problems. It is a big problem because it shows a severe problem with the disk hardware that might cause data loss and database corruption.
Checks to be Performed and Actions to be Taken
- Now that we have detailed information on errors 823, 824, and 825, let’s discuss what actions to take to resolve the issues.
- When we are faced with problems that generate operational blocks, a document with a list of actions to be performed is certainly useful.
- This is my personal checklist with the actions to take:
- Perform a check on the storage system
You need to make sure that the storage system is fine. It is too dangerous to have a database on a disk in bad state. A hardware technician can evaluate all the components of the disk storage system and also of controller and drivers. The technician can run tests and certify that the storage system is fine.
Check the Database
Run DBCC CHECKDB on all databases (both system database and user database) to verify the logical and physical consistency of the database.
Make sure you have a valid backup. This means that you must verify not only if the backups are present but also try to restore the database from backup.
Remember that Backups are Your Safety
For safety, in case of a server breakdown, backups must be copied to a server other than the one hosting the SQL Server. In case of fire, the backups should be stored in a building other than the one hosting the SQL Server. You can even copy them to a different location in case of major disasters, such as earthquake.
Move the Database to Another Disk
While waiting for the system checks on the storage system, move the database to another disk that possibly has some redundancy (RAID). Furthermore, a particularly good practice is to set an alert for this type of error.
You can do it by using this script:
The alternative solution to fix these errors is Stellar Repair for MS SQL. It is third-party SQL database repair software, which repairs corrupt MDF and NDF files. It maintains the database integrity without hampering the original data.
It is also recommended by Microsoft MVPs worldwide. It repairs all the database objects like tables, triggers, indexes, stored procedures, etc.
In this post, we have discussed three major errors – 823, 824, and 825. These errors are related to the stage in which data is read and written from the disk. The errors 823 and 825 are hardware related and can occur when you try to read data from the disk but the disk returns a read or write error.
Whenever a read or write API returns an error, the error 823 is generated. If after a maximum of four reads you can access the data, the error 825 is written to the log. This means that the disks are not very reliable.
The error 824 is generated when you read data from disk but reading data fails the consistency check. For example, a database has wrong values in the page header.
Finally, we have discussed the checks to be performed and the actions to be taken in case of these errors.