In this post, let us look at the core concept of RAC: Cache Fusion.
Before going into Cache Fusion, let us first recall how a normal
single database instance behaves when there is a request for a data block.
Suppose a user process requests a data block. Since the process cannot read
directly from disk, the requested block must first be read (Physical Read)
into the Buffer Cache of the SGA. Once it is read into the buffer cache, it
remains there for further requests (until it is aged out). Whenever there is
another request for the same data block, it can be read directly from the
buffer (Buffer Read), thus avoiding another Physical Read. If a data block is
found in the buffer cache it is called a ‘Cache Hit’, and if it is not found
it is called a ‘Cache Miss’.
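The hit and miss behaviour described above can be sketched in a few lines of
Python. This is only a toy model, not how Oracle implements its buffer cache:
the class name BufferCache, the dictionary standing in for the disk, and the
block ids are all made up for illustration.

```python
class BufferCache:
    """Toy model of a buffer cache in front of a slow disk (illustrative only)."""

    def __init__(self, disk):
        self.disk = disk      # hypothetical "disk": block_id -> block contents
        self.buffers = {}     # blocks currently held in memory
        self.hits = 0
        self.misses = 0

    def get(self, block_id):
        if block_id in self.buffers:
            self.hits += 1    # cache hit: served from memory (buffer read)
        else:
            self.misses += 1  # cache miss: physical read from "disk"
            self.buffers[block_id] = self.disk[block_id]
        return self.buffers[block_id]


disk = {101: "row data A", 102: "row data B"}
cache = BufferCache(disk)
cache.get(101)  # miss, physical read
cache.get(101)  # hit, buffer read
cache.get(102)  # miss, physical read
print(cache.hits, cache.misses)
```

The second request for block 101 never touches the disk dictionary, which is
the whole point of keeping the block in the cache.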
To maintain data integrity when there are concurrent requests for the same
data block, Oracle uses a locking mechanism and multi-version concurrency
control. A data block can reside in several buffers in different versions.
For example, when a block is dirtied, the information needed to rebuild its
previous version is kept in UNDO, while the change itself is recorded in
REDO. Whenever a user requests a block that is already in the buffer cache
and has been dirtied, the UNDO segment provides the information required to
construct a read-consistent (CR) image of the block. In this way,
multi-version data blocks help to achieve read consistency.
The read consistency model guarantees that the data seen by a statement is
consistent with respect to a single point in time and does not change during
the statement's execution. Readers do not wait for writers or for other
readers of the same data. Likewise, writers do not wait for readers of the
same data. Writers wait only for other writers, and only when they attempt to
modify the same data.
In a single instance, the following happens when reading a block:
* When a reader reads a recently modified block, it might find an active
transaction in the block.
* The reader will need to read the undo segment header to decide whether
the transaction has been committed or not.
* If the transaction is not committed, the process creates a consistent
read (CR) version of the block in the buffer cache using the data in the
block and the data stored in the undo segment.
* If the undo segment shows the transaction is committed, the process has
to revisit the block, clean out the block, and generate the redo for the
changes.
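The decision a reader makes in the steps above can be sketched as a small
Python function. Everything here is a simplification for illustration: the
classes UndoSegment and Block, the function read_block, and the idea of
storing whole before-images per transaction are assumptions, not Oracle's
actual structures, and redo generation during cleanout is only mentioned in
a comment.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UndoSegment:
    committed: set = field(default_factory=set)        # ids of committed transactions
    before_images: dict = field(default_factory=dict)  # txn id -> {column: old value}


@dataclass
class Block:
    rows: dict                        # current contents of the block
    active_txn: Optional[int] = None  # transaction still marked in the block, if any


def read_block(block, undo):
    """Return (data visible to the reader, action taken)."""
    txn = block.active_txn
    if txn is None:
        return dict(block.rows), "current"   # nothing to resolve
    if txn in undo.committed:
        block.active_txn = None              # block cleanout (real Oracle also logs redo)
        return dict(block.rows), "cleanout"
    cr = dict(block.rows)                    # build a CR copy, leave the block itself alone
    cr.update(undo.before_images[txn])       # roll back the uncommitted change
    return cr, "cr"


undo = UndoSegment(before_images={7: {"salary": 1000}})
block = Block(rows={"salary": 1500}, active_txn=7)

rows, action = read_block(block, undo)       # txn 7 not committed yet
undo.committed.add(7)                        # txn 7 commits
rows2, action2 = read_block(block, undo)     # next reader cleans out the block
print(rows, action, rows2, action2)
```

The first reader sees the old value through a CR copy, while the second
reader, arriving after the commit, sees the current value and clears the
transaction marker.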
Now let us see how this works in a RAC environment.
In RAC, two or more instances access the same database files, which reside
on shared storage. Each instance has its own SGA and background processes,
which means each instance has its own buffer cache (local to that instance).
These buffer caches act individually at the instance level and fuse together
at the database level to form a single entity (the Global Cache), so that
data blocks can be shared between them. This is what we call ‘Cache Fusion’.
Cache Fusion uses a high-speed IPC interconnect to provide cache-to-cache
transfers of data blocks between instances in a cluster. Shipping blocks this
way avoids disk I/O for those transfers and improves read/write concurrency.
Now the question is how data integrity is maintained in a RAC environment
when there are concurrent requests for the same data block. Here too, Oracle
uses locking and queuing mechanisms to coordinate lock resources, data, and
inter-instance data requests.
Cache Fusion is implemented by a controlling mechanism called the Global
Cache Service (GCS), which is responsible for block transfers between
instances. GCS, together with the Global Enqueue Service (GES) described
below, is implemented by background processes such as:
Global Cache Service Processes (LMSn)
Global Enqueue Service Daemon (LMD)
[Before going into those processes, let us see how Oracle treats data blocks
and how it manages them.
Oracle treats data blocks as resources. Each resource can be held in
different modes, which is an important mechanism for maintaining data
integrity. There are three modes, depending on whether the resource holder
intends to modify the data or only read it.
They are:
Null (N) mode – Null mode is usually held as a placeholder.
Shared (S) mode – In this mode, the data block cannot be modified by any session, but concurrent shared (read) access is allowed.
Exclusive (X) mode – This mode grants the holding process exclusive access. Other processes cannot write to the resource, although consistent read versions of the block may still exist.
Furthermore, these resources act in one of two roles: Local (L) and Global (G).
A resource (data block) is assigned a Local role when the block is first read
into the cache and no other instance has requested the same data block.
A resource is assigned a Global role when the block is dirtied locally and
then transmitted to another instance.]
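The three modes can be summarised as a small compatibility check. This is a
sketch derived only from the descriptions above (shared allows concurrent
shared access, exclusive allows no other writer, null is a placeholder); the
enum Mode and the function compatible are hypothetical names, not part of any
Oracle interface.

```python
from enum import Enum


class Mode(Enum):
    NULL = "N"
    SHARED = "S"
    EXCLUSIVE = "X"


def compatible(held, requested):
    """True if `requested` can coexist with a mode another holder already has."""
    if Mode.NULL in (held, requested):
        return True   # null is only a placeholder, so it conflicts with nothing
    # Of the remaining combinations, only shared + shared can coexist.
    return held is Mode.SHARED and requested is Mode.SHARED


for held in Mode:
    for requested in Mode:
        print(held.value, requested.value, compatible(held, requested))
```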
Now let us see what those daemons do:
Global Cache Service Processes (LMSn)
Upon a request from an instance, GCS organizes the shipping of the block to
the other instance while a copy of the block is retained in memory. Each such
copy is called a past image (PI). In the event of a node failure, Oracle can
reconstruct the current version of a block by using a saved PI. It is also
possible to have more than one PI of a data block, depending on how many
times the block was requested while in a dirty state.
Do not confuse a read-consistent (CR) image with a past image (PI). They are
not the same, even though they look similar. A PI is not a read-consistent
image of the data block; to make it one, UNDO has to be applied to it, which
turns it into a CR image.
Keep in mind that if you want to read a data block, it must be in a
read-consistent state. You are not allowed to read uncommitted changes made
by others.
Global Enqueue Service Daemon (LMD)
The Global Enqueue Service (GES) tracks the status of all Oracle enqueuing
mechanisms. It performs concurrency control on dictionary cache locks,
library cache locks, and transactions, for resources that are accessed by
more than one instance. GES controls access to data files and control files,
but not to data blocks; in other words, it coordinates enqueues for
everything other than data blocks. The resources managed by GES include the
following:
Transaction locks – A transaction lock is acquired in exclusive mode when a
transaction initiates its first row-level change. The lock is held until the
transaction is committed or rolled back.
Library Cache locks – When a database object (such as a table, view, procedure, function, package, package body, trigger, index, cluster, or synonym) is referenced during parsing or compiling of a SQL, DML or DDL, PL/SQL, or Java statement, the process parsing or compiling the statement acquires the library cache lock in the correct mode.
Dictionary Cache locks – Global enqueues are used in cluster database mode. The data dictionary structure is the same for all Oracle instances in a cluster database, as it is for instances in a single-instance database. However, in Real Application Clusters, Oracle synchronizes all the dictionary caches throughout the cluster. Real Application Clusters uses latches to do this, just as in the case of a single-instance Oracle database.
Table locks – These are the GES locks that protect the entire table(s). A transaction acquires a table lock when a table is modified. A table lock can be held in any of several modes: null (N), row share (RS), row exclusive (RX), share lock (S), share row exclusive (SRX), or exclusive (X).
GCS and GES (LMSn and LMD) keep track of the resources, their locations, and
their statuses (mode, role). This information is recorded in the Global
Resource Directory (GRD). The GRD is distributed: each instance holds and
manages a portion of the directory. Whenever a block is transferred out of a
local cache to another instance’s cache, the GRD is updated. The GRD
therefore knows exactly where the most recent version of a data block is
available.
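To make the idea of a directory split across instances more concrete, here is
a toy Python sketch. It assumes a simple modulo mapping from block id to the
instance that manages that block's entry; this mapping is purely an
illustrative assumption and is not claimed to be how Oracle assigns
resources. The class DistributedDirectory and its methods are hypothetical
names.

```python
class DistributedDirectory:
    """Toy directory whose entries are spread over the instances (illustrative)."""

    def __init__(self, instance_ids):
        self.instance_ids = sorted(instance_ids)
        # Each instance manages only its own portion of the directory.
        self.portions = {inst: {} for inst in self.instance_ids}

    def master_of(self, block_id):
        # ASSUMPTION: modulo mapping, chosen only to keep the sketch short.
        return self.instance_ids[block_id % len(self.instance_ids)]

    def record(self, block_id, holder, mode, role):
        """Store where the block is and in which mode and role it is held."""
        self.portions[self.master_of(block_id)][block_id] = (holder, mode, role)

    def locate(self, block_id):
        """Return (holder, mode, role), or None if the block is not cached anywhere."""
        return self.portions[self.master_of(block_id)].get(block_id)


grd = DistributedDirectory([1, 2])
grd.record(100, holder=1, mode="S", role="L")  # block 100 read into instance 1
grd.record(100, holder=2, mode="X", role="G")  # later shipped to instance 2
print(grd.master_of(100), grd.locate(100))
```

Each update goes to the portion owned by the managing instance, so a lookup
needs to consult only one portion of the directory to find the current
holder.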
To perform any operation on a data block, we need to know its current state.
To know its current state, three things are required:
1. What is its current role? (Local (L) or Global (G))
2. What is its current mode? (Null (N), Shared (S), or Exclusive (X))
3. Does the requested block have any Past Images (PI)? (0 or 1)
But where can you get this information from? Yes, you are right: from the
GRD.
The state of a data block is represented by a three-character code in the
order (mode, role, PI), for example NL0, SL0, XL1, and so on.
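The state code can be built and parsed with a small helper, which makes the
(mode, role, PI) order explicit. The names BlockState and parse_state are
hypothetical, and X is used for exclusive mode, as in the mode list earlier
in this post.

```python
from typing import NamedTuple


class BlockState(NamedTuple):
    mode: str         # "N", "S" or "X"
    role: str         # "L" or "G"
    past_images: int  # number of past images

    def code(self):
        return f"{self.mode}{self.role}{self.past_images}"


def parse_state(code):
    """Split a code such as 'SL0' into its mode, role and PI count."""
    if len(code) < 3:
        raise ValueError(f"state code too short: {code!r}")
    mode, role, pi = code[0], code[1], code[2:]
    if mode not in "NSX" or role not in "LG" or not pi.isdigit():
        raise ValueError(f"not a valid state code: {code!r}")
    return BlockState(mode, role, int(pi))


state = parse_state("SL0")
print(state, state.code())
print(BlockState("N", "G", 1).code())
```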
Now let me show the different scenarios of data block transfer:
1. Reading the data block from the disk – In this scenario, the data block
is initially read from the data file (disk read), since no copy of it is
currently available in any of the instances. Once the block is read into the
buffer cache, it is in state SL0. This indicates that the block is now in
shared mode with a local role and has no past images. The resource
information is updated accordingly in the GRD.
2. Reading the data block from the cache – In this scenario, the data block
is currently available in the buffer cache of one of the instances, so there
is no need to read it from disk, thus avoiding a physical read. Say instance
2 needs the data block and sends a request to GCS. GCS in turn passes the
request to the owning instance (instance 1). Upon receiving the request,
instance 1 forwards the data block to the requesting instance (instance 2),
keeps its own copy in shared mode, and retains its Local role. No past image
is created on instance 1, because the data block has not been dirtied. The
state of the data block on instance 2 is now SL0 (the same as after reading
from disk).
3. Modifying the data block – In this scenario, instance 2 requests a block
in order to modify it and passes the request to GCS. GCS in turn passes the
request to instance 1 (the owner of that data block). Instance 1 has modified
this data block but has not committed yet. Upon receiving the request,
instance 1 sends the data block to instance 2. Before sending, instance 1
downgrades its resource to NULL mode and keeps a copy of the current version
of the block as a past image (PI). The role now becomes Global, since the
block is dirty and has been shipped. Instance 1 also informs instance 2 that
it has retained a PI copy and a NULL resource, which tells instance 2 that it
can hold the block in exclusive mode (X) with a global role (G). Upon receipt
of the block, instance 2 informs GCS about the mode and role of the block
(X, G).
4. Writing dirty buffers to disk – In this scenario, instance 1 wants to
write the buffer to disk, so a request is sent to GCS. GCS forwards the
request to instance 2 (the current holder of the block). Upon receiving the
request, instance 2 writes the block to disk and informs GCS. Instance 2 also
informs GCS that the resource role has now become local, because the instance
has completed the write of the current block. Upon receiving the message, GCS
orders all PI holders to discard their PIs, which are no longer needed for
recovery now that the current block has been written, and the buffers are
released.
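The four scenarios can be replayed as state transitions on a single block.
The sketch below is a toy model, not Oracle's protocol: the class
BlockDirectory and its method names are hypothetical, the messages through
GCS are left out, and two points are assumptions made to keep the example
runnable: the starting state XL0 for instance 1 before scenario 3, and the
state NL0 for a former PI holder after scenario 4.

```python
class BlockDirectory:
    """Toy GRD entry for ONE block: maps instance id -> state code."""

    def __init__(self):
        self.states = {}

    def read_from_disk(self, inst):
        # Scenario 1: nobody caches the block, so it is read from disk.
        self.states[inst] = "SL0"

    def read_from_cache(self, requester, owner):
        # Scenario 2: the owner ships a clean block and keeps its own state.
        if self.states.get(owner) != "SL0":
            raise ValueError("owner does not hold a clean shared copy")
        self.states[requester] = "SL0"

    def transfer_for_write(self, requester, owner):
        # Scenario 3: owner keeps a PI, downgrades to null, role becomes global.
        self.states[owner] = "NG1"
        self.states[requester] = "XG0"

    def write_to_disk(self, holder):
        # Scenario 4: current holder writes; role is local again, PIs discarded.
        self.states[holder] = "XL0"
        for inst, state in self.states.items():
            if inst != holder and int(state[2:]) > 0:
                self.states[inst] = "NL0"  # ASSUMPTION: former PI holder ends as NL0


grd = BlockDirectory()
history = []
grd.read_from_disk(1)
history.append(dict(grd.states))
grd.read_from_cache(2, owner=1)
history.append(dict(grd.states))
grd.states = {1: "XL0"}  # ASSUMPTION: instance 1 has since dirtied the block
grd.transfer_for_write(2, owner=1)
history.append(dict(grd.states))
grd.write_to_disk(2)
history.append(dict(grd.states))
for step, snapshot in enumerate(history, start=1):
    print(step, snapshot)
```

Each snapshot shows the state of the block on every instance after the
corresponding scenario, which makes it easy to follow how the mode, role, and
PI count change from one step to the next.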