RECO: Distributed Database Recovery

RECO has a very focused job: it recovers transactions that are left in a prepared state because of a crash or loss of connection during a two-phase commit (2PC). A 2PC is a distributed protocol that allows for a modification that affects many disparate databases to be committed atomically. It attempts to close the window for distributed failure as much as possible before committing. In a 2PC between N databases, one of the databases—typically (but not always) the one the client logged into initially—will be the coordinator.

The coordinator asks the other N-1 sites whether they are ready to commit; in effect, it asks each of them to become prepared to commit. Each of the N-1 sites reports back its prepared state as YES or NO. If any one of the sites votes NO, the entire transaction is rolled back. If all sites vote YES, the coordinator broadcasts a message to make the commit permanent on each of the N-1 sites.
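The voting and broadcast steps just described can be sketched in a few lines of Python. This is a toy, in-memory model under stated assumptions, not Oracle's implementation: the Participant class, its state names, and the two_phase_commit function are hypothetical names invented for illustration, and network failures are not modeled here.

```python
from enum import Enum


class Vote(Enum):
    YES = "YES"
    NO = "NO"


class Participant:
    """Hypothetical stand-in for one of the N-1 non-coordinator sites."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "ACTIVE"

    def prepare(self):
        # Phase 1: the site either promises it can commit or refuses.
        if self.can_commit:
            self.state = "PREPARED"
            return Vote.YES
        self.state = "ROLLED BACK"
        return Vote.NO

    def commit(self):
        self.state = "COMMITTED"

    def rollback(self):
        self.state = "ROLLED BACK"


def two_phase_commit(participants):
    """Run both phases from the coordinator's side and return the outcome."""
    # Phase 1: ask every site whether it is prepared to commit.
    votes = [p.prepare() for p in participants]

    # Phase 2: a single NO vote rolls back the whole transaction.
    if all(v is Vote.YES for v in votes):
        for p in participants:
            p.commit()
        return "COMMIT"
    for p in participants:
        p.rollback()
    return "ROLLBACK"
```

Calling two_phase_commit([Participant("sales"), Participant("hr")]) leaves both sites COMMITTED, while passing can_commit=False for any one site rolls back all of them. The point of the sketch is that no site commits until every site has voted YES.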

Say a site votes YES and is prepared to commit, but before it gets the directive from the coordinator to actually commit, the network fails or some other error occurs. The transaction then becomes an in-doubt distributed transaction.

The 2PC tries to limit the window of time in which this can occur, but cannot remove it. If there is a failure right then and there, the transaction will become the responsibility of RECO. RECO will try to contact the coordinator of the transaction to discover its outcome.

Until it does that, the transaction will remain in its uncommitted state. When the transaction coordinator can be reached again, RECO will either commit the transaction or roll it back.
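The recovery behavior described above can be illustrated with a small simulation. It is only a sketch of the idea, assuming a hypothetical Coordinator object that remembers each transaction's outcome and a hypothetical recover_in_doubt function standing in for one pass of a RECO-like process; none of these names come from Oracle.

```python
class CoordinatorUnreachable(Exception):
    """Raised when the coordinator cannot be contacted."""


class Coordinator:
    """Hypothetical coordinator that remembers each transaction's outcome."""

    def __init__(self):
        self.outcomes = {}      # transaction id -> "COMMIT" or "ROLLBACK"
        self.reachable = True   # flip to False to simulate an outage

    def outcome_of(self, txn_id):
        if not self.reachable:
            raise CoordinatorUnreachable(txn_id)
        return self.outcomes[txn_id]


def recover_in_doubt(pending, coordinator):
    """Make one recovery pass over a site's prepared transactions.

    `pending` maps a transaction id to its local state. A PREPARED
    transaction stays PREPARED (in doubt) until the coordinator can be
    reached; it then takes on whatever outcome the coordinator reports.
    """
    for txn_id, state in pending.items():
        if state != "PREPARED":
            continue
        try:
            outcome = coordinator.outcome_of(txn_id)
        except CoordinatorUnreachable:
            continue  # still in doubt; a later pass will try again
        pending[txn_id] = "COMMITTED" if outcome == "COMMIT" else "ROLLED BACK"
    return pending
```

While coordinator.reachable is False, repeated calls leave every PREPARED entry untouched. Once it is set back to True, the next pass resolves each entry according to the coordinator's recorded outcome.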

Note that if the outage persists for an extended period of time and you have outstanding in-doubt transactions, you can commit them or roll them back manually.

You might want to do this since an in-doubt distributed transaction can cause writers to block readers—this is the one time this can happen in Oracle.

Your DBA could call the DBA of the other database and ask them to query the status of those in-doubt transactions. Your DBA can then commit or roll them back, relieving RECO of this task.

CKPT: Checkpoint Process

Despite its name, the checkpoint process doesn’t perform checkpoints (checkpoints were discussed in Chapter 3 in the section on redo logs); that’s mostly the job of DBWn. It simply assists with the checkpointing process by updating the file headers of the datafiles.

CKPT is a mandatory process and is always started, so if you do a ps on UNIX/Linux, you’ll normally see it there (I say “normally” because as of Oracle 12c, it’s possible for the checkpoint process to run within an operating system thread, in which case it won’t be displayed as a separate process).

The job of updating datafiles’ headers with checkpoint information used to belong to the LGWR; however, as the number of files increased along with the size of a database over time, this additional task for LGWR became too much of a burden. If LGWR had to update dozens, or hundreds, or even thousands, of files, there would be a good chance sessions waiting to commit these transactions would have to wait far too long. CKPT removes this responsibility from LGWR.
