Tuesday, 9 July 2013

RAC QA

Oracle RAC Interview Questions (10g)

What are Oracle Clusterware processes for 10g on Unix and Linux
Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource’s configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user
Event manager daemon (evmd) —A background process that publishes events that crs creates.
Process Monitor Daemon (OPROCD) —This process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.
RACG (racgmain, racgimon) —Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.
What are Oracle database background processes specific to RAC
•LMS—Global Cache Service Process
•LMD—Global Enqueue Service Daemon
•LMON—Global Enqueue Service Monitor
•LCK0—Instance Enqueue Process
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.
________________________________________
What are Oracle Clusterware Components
Voting Disk —
Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
Central reference of all nodes and keeps the heart beat information between nodes in terms of votes
Oracle Cluster Registry (OCR) — Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster
Oracle Cluster Interconnect — In other words also called as private interconnect, with high speed gigabit channel or infiniband. which ishigh speed , low latency network connection attached to private switches rather the public interfaces. This will be used by cluster to provide node membership and also inter cluster communication (aka global cache services , transfer of blocks from one node to another)
________________________________________
How do you backup the OCR
There is an automatic backup mechanism for OCR. The default location is :
$ORA_CRS_HOMEcdata”clustername”
To display Backups: 
#ocrconfig -showbackup
To restore a backup :
#ocrconfig –restore
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.

With Oracle RAC 11g Release 1, you can do a manaual backup of the OCR with the command:
# ocrconfig –manualbackup

How do you backup voting disk & do you need to take a backup of voting Disk.
#dd if=voting_disk_name of=backup_file_name
Voting disk backup is not required since this is non-persistent data and just records about the node (cluster membership) availability, when you start the cluster it is empty.
________________________________________
How do I identify the voting disk location 
#crsctl query css votedisk
________________________________________
How do I identify the OCR file location 
check /var/opt/oracle/ocr.loc or /etc/ocr.loc ( depends upon platform)
or
#ocrcheck
________________________________________
How to identify which version of Clusterware is running & active.
#crsctl query  –activeversion
________________________________________
Is ssh required for normal Oracle RAC operation ?
“ssh” are not required for normal Oracle RAC operation. However “ssh” should be enabled for Oracle RAC and patchset installation.
________________________________________
What is the purpose of Private Interconnect ?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.
________________________________________
Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don’t really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.
________________________________________
What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report? 
This is most likely due to a fault in interconnect network.
Check netstat -s
if you see “fragments dropped” or “packet reassemblies failed” , Work with your system administrator find the fault with network.
________________________________________
How many nodes are supported in a RAC Database?
10g Release 2, support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.
________________________________________
How to debug srvctl?
Set the environmental variable SRVM_TRACE to true.. And start the instance with srvctl. Now you will get detailed error stack.
________________________________________
what is the purpose of the ONS daemon?
The Oracle Notification Service (ONS) daemon is an daemon started by the CRS clusterware as part of the nodeapps. There is one ons daemon started per clustered node.
The Oracle Notification Service daemon receive a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forward those events to application subscribers and to the local listeners.
This in order to facilitate:
a. the FAN or Fast Application Notification feature or allowing applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permit load balancing accross different rac nodes dependent of the load on the different nodes. The rdbms MMON is creating an advisory for distribution of work every 30seconds and forward it via racgimon and ONS to listeners and applications.
________________________________________
what is the ora.rac1.GSD in crs_stat?
It is called Global Service Daemon, which is there for backward compatibility, in other words in Oracle 9i, the RAC is running with Veritas cluster and the GSD will be synchronizing the instance details to Cluster. You can add the 9i instances to Oracle 10g or 11g RAC Clusters
________________________________________
What is GRD?
Global Resource directory , a common memory structures across SGA’s, in other words this is the combination of GCS/GES memory structures (infact synchronizing all the times through  cluster interconnect messages). All the resources in the cluster group form a central repository called GRD. which is integrated and distributed across the nodes  memory structures. Each instance masters some of the resources (buffer) based on their weightage and accessibility) and together all formed called GRD.  Basically a combination of GES and GCS.
________________________________________
What is Dynamic Remastering?
Every instance (node) owns the block(buffers) called master node for that blocks when accessed first time. If the other node requires the same, it has to send a request to the owning node to get the blocks (CR/CUR modes). If the requests are more than 50 in an hour the mastership will be transferred to the other node.
________________________________________
What is Node Mastering?
The node which starts first in the cluster acts a master node, If this crashes the other nodes who detects first will be the master node for that cluster.
________________________________________
What is cache fusion, Which background process manages it?
Other words called as Global cache services, Cache fusion is to manage the buffers with in local or the global and transfers from instance to another , basically LMS process take cares of the transfers, upon  receive the message (BST/AST) it manages the lock escalation and descalation with help of Global Enqueue services (LMD/LMON/LCK )processes, process the buffer and made a copy (if required) and then transfer that buffer to requested instance via cluster interconnect.
________________________________________
What is cache coherency which background process manages it?
Cache coherency means that the contents of the caches in different nodes are in well defined state with respect to each other, is a technique of keeping multiple copies of buffers consistent between different Oracle instances. LMS process manages this in terms of local/global lock levels or with Message queues (AST/BST)
How does instance recovery works in RAC databases?


________________________________________
Can you change Private interconnect IP address or Ethernet interface after cluster installation?
Yes, we can, before to that we should stop the clusterware and use oifcfg tool to delete the old and set the new interfaces, for ex:
#oifcfg –getif
#oifcfg –delif
#oifcfg –setif
________________________________________
Can you change the Virutal IP after you install clusterware installation?
Yes, its possible, but you have to run vipca or using srvctl, please see here for complete steps
________________________________________
What is the Load Balancing Advisory?
To assist in the balancing of application workload across designated resources, Oracle Database 10g Release 2 provides the Load Balancing Advisory. This Advisory monitors the current workload activity across the cluster and for each instance where a service is active; it provides a percentage value of how much of the total workload should be sent to this instance as well as service quality flag.
________________________________________
What is hangcheck timer used for ? 
The hangcheck timer checks regularly the health of the system. If the system hangs or stop the node will be restarted automatically.
There are 2 key parameters for this module:
-> hangcheck-tick: this parameter defines the period of time between checks of system health. The default value is 60 seconds; Oracle recommends setting it to 30seconds.
-> hangcheck-margin: this defines the maximum hang delay that should be tolerated before hangcheck-timer resets the RAC node.
________________________________________
Is the hangcheck timer still needed with Oracle RAC 10g?
Yes

________________________________________
What files can I put on Linux OCFS2?
For optimal performance, you should only put the following files on Linux OCFS2:
- Datafiles
- Control Files
- Redo Logs
- Archive Logs
- Shared Configuration File (OCR)
- Voting File
- SPFILE
________________________________________
What are various cluster file systems
OCFS2, Veritas Cluster Filesystems, NFS, ASM, etc.
________________________________________
Does ASM cannot be used as Cluster filesystem.
Well yes, ASM is storage management provides storage mirroring,striping along with cluster filesystem capabilities, it also acts as volume manager eliminating the need of disk cache I/O layer from OS.
________________________________________
Can I change the name of my cluster after I have created it when I am using Oracle Clusterware?
No, you must properly uninstall Oracle Clusterware and then re-install.
________________________________________
What command would you use to check the availability of the RAC system?
crs_stat -t -v (-t -v are optional)
________________________________________
Can you have many database versions in the same RAC?
Yes, but Clusterware version must be greater than the greater database version.
________________________________________
How do I identify the OCR file location
check /var/opt/oracle/ocr.loc or /etc/ocr.loc ( depends upon platform)
or
#ocrcheck
________________________________________
How do I identify the voting disk location
#crsctl query css votedisk
– this works only when the clusterware is up in 10g, where in CRS start up is not required

________________________________________
What is TAF?  How does it works.
Transparent application failover is the mechansim to provide high availability of the database services to the connections happens via TNS along with RAC specific scripts called racgmon  etc.
For this we
o Configure TNS entry for that service in both nodes
o Update the remote listener parameter with the service name
o Add service name to high availability stack i.e to CRS using srvctl
srvctl add service –d dbname –servicename –p <preferrednode> -r<availablenode>
________________________________________
What is FAN? How does it works
Fast application notification is different from the TAF, which commonly used to trap the failures of the services using ONS (oracle notification services) for example JDBC connection does not failover on TAF events, where ONS can be used here.

What are RAC based services? What are difference between normal database service and RAC services?
•Is a means of grouping sessions that are doing the same kind of work
•Provides single-system image instead of multiple instances image
•Is a part of the regular administration tasks that provide dynamic service-to-instance allocation
•Is the base for high availability of connections
•Provides a new performance-tuning dimension
•Normal services not maintained in data dictionaries where the Rac services maintained in data dictionary.
________________________________________
What happens if one of the node is not able to access the voting disk?
The master node OCSSD in the instance verifies the votes in the voting disk periodically and ensure quorum is matched, if quorum is not matched then it posts the failure nodes OCSSD to evict the cluster.
Or Node OCSSD recognises it and evicts the node
________________________________________
What happens if all of the nodes not able to access the voting disk?
This can be lead to split brain syndrome, each node acts as master node, with the disktimeout setting clusterware waits until that period can be delayed using disktimeout setting to reasonable value using crsctl. all nodes reboot.
________________________________________
What happens if one of the node is not able to communicate via private interconnect?
Node OCSSD recognises it and evicts the node
________________________________________
What happens if all of the nodes not able to communicate via private interconnect?
This can be lead to split brain syndrome, each node acts as master node, with the csstimeout setting clusterware waits until that period , can be delayed using csstimeout setting to reasonable value using crsctl. all nodes reboot.
________________________________________
What is split brain syndrome?
Each node acts as master since there is communication or common storage access break down.
________________________________________
Which background daemons initiates node eviction?
ocssd
________________________________________
Which background daemon starts clusterware or the resources?
crsd
________________________________________
Can you tell me under which user crs,cssd,oprocd, evmd, start and what is anything fails.
Process Functionality Failure of the Process Run as
CRSD Resource monitoring, failover, node recovery Restarts automatically, does not cause node restart root
OCSSD Basic node membership, Group Services, and Basic locking Node restart, evicts node oracle
EVMD Spawns child process event logger and generate callouts Automaitcally restarted does not cause reboot oracle
OPROCD Provides basic cluster integrity services, I/O fencing to disk Node restart root
Work Load Balancing & Failover questions:-
________________________________________
What are my options for load balancing with Oracle RAC? Why do I get an uneven number of connections on my instances?
All the types of load balancing available currently (9i-10g) occur at connect time.
This means that it is very important how one balances connections and what these connections do on a long term basis.
Since establishing connections can be very expensive for your application, it is good programming practice to connect once and stay connected. This means one needs to be careful as to what option one uses. Oracle Net Services provides load balancing or you can use external methods such as hardware based or clusterware solutions.
The following options exist prior to Oracle RAC 10g Release 2 (for 10g Release 2 see Load Balancing Advisory):
Random 
Either client side load balancing or hardware based methods will randomize the connections to the instances.
On the negative side this method is unaware of load on the connections or even if they are up meaning they might cause waits on TCP/IP timeouts.
Load Based
Server side load balancing (by the listener) redirects connections by default depending on the RunQ length of each of the instances. This is great for short lived connections. Terrible for persistent connections or login storms. Do not use this method for connections from connection pools or applicaton servers
Session Based
Server side load balancing can also be used to balance the number of connections to each instance. Session count balancing is method used when you set a listener parameter, prefer_least_loaded_node_listener-name=off. Note listener name is the actual name of the listener which is different on each node in your cluster and by default is listener_nodename.
Session based load balancing takes into account the number of sessions connected to each node and then distributes the connections to balance the number of sessions across the different nodes.
________________________________________
How can a customer mask the change in their clustered database configuration from their client or application? (I.E. So I do not have to change the connection string when I add a node to the Oracle RAC database)
The combination of Server Side load balancing and Services allows you to easily mask cluster database configuration changes. As long as all instances register with all listeners (use the LOCAL_LISTENER and REMOTE_LISTENER parameters), server side load balancing will allow clients to connect to the service on currently available instances at connect time.
The load balancing advisory (setting a goal on the service) will give advice as to how many connections to send to each instance currently providing a service. When a service is enabled on an instance, as long as the instance registers with the listeners, the clients can start getting connections to the service and the load balancing advisory will include that instance is its advice.
With Oracle RAC 11g Release 2, the Single Client Access Name (SCAN) provides a single name to be put in the client connection string (as the address). Clients using SCAN never have to change even if the cluster configuration changes such as adding nodes.
________________________________________
What is the Load Balancing Advisory?
To assist in the balancing of application workload across designated resources, Oracle Database 10g Release 2 provides the Load Balancing Advisory. This Advisory monitors the current workload activity across the cluster and for each instance where a service is active; it provides a percentage value of how much of the total workload should be sent to this instance as well as service quality flag. The feedback is provided as an entry in the Automatic Workload Repository and a FAN event is published. The easiest way for an application to take advantage of the load balancing advisory, is to enable Runtime Connection Load Balancing with an integrated client.
________________________________________
How do I enable the load balancing advisory?
The load balancing advisory requires the use of services and Oracle Net connection load balancing.
To enable it, on the server: set a goal (service_time or throughput, and set CLB_GOAL=SHORT ) on your service.
For client, you must be using the connection pool.
For JDBC, enable the datasource parameter FastConnectionFailoverEnabled.
For ODP.NET enable the datasource parameter Load Balancing=true.
________________________________________
Why do we have a Virtual IP (VIP) in Oracle RAC 10g or 11g? Why does it just return a dead connection when its primary node fails?
The goal is application availability.
When a node fails, the VIP associated with it is automatically failed over to some other node. When this occurs, the following things happen.
(1) VIP detects public network failure which generates a FAN event.
(2) the new node re-arps the world indicating a new MAC address for the IP.
(3) connected clients subscribing to FAN immediately receive ORA-3113 error or equivalent. Those not subscribing to FAN will eventually time out.
(4) New connection requests rapidly traverse the tnsnames.ora address list skipping over the dead nodes, instead of having to wait on TCP-IP timeouts
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error.
As a result, you don’t really have a good HA solution without using VIPs and FAN. The easiest way to use FAN is to use an integrated client with Fast Connection Failover (FCF) such as JDBC, OCI, or ODP.NET.
________________________________________
What are my options for setting the Load Balancing Advisory GOAL on a Service?
The load balancing advisory is enabled by setting the GOAL on your service either through PL/SQL DBMS_SERVICE package or EM DBControl Clustered Database Services page. There are 3 options for GOAL:
None – Default setting, turn off advisory
THROUGHPUT – Work requests are directed based on throughput. This should be used when the work in a service completes at homogenous rates. An example is a trading system where work requests are similar lengths.
SERVICE_TIME – Work requests are directed based on response time. This should be used when the work in a service completes at various rates. An example is as internet shopping system where work requests are various lengths
Note: If using GOAL, you should set CLB_GOAL=SHORT

Model - 2
What is RAC?
RAC stands for Real Application cluster. It is a clustering solution from Oracle Corporation that ensures high availability of databases by providing instance failover, media failover features.

What is RAC and how is it different from non RAC databases?  
RAC stands for Real Application Cluster, you have n number of instances running in their own separate nodes and based on the shared storage. Cluster is the key component and is a collection of servers operations as one unit. RAC is the best solution for high performance and high availably. Non RAC databases has single point of failure in case of hardware failure or server crash.
Give the usage of srvctl :
srvctl start instance -d db_name -i "inst_name_list" [-o start_options]
srvctl stop instance -d name -i "inst_name_list" [-o stop_options]
srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate
srvctl start database -d name [-o start_options]
srvctl stop database -d name [-o stop_options]
srvctl start database -d orcl -o mount
Mention the Oracle RAC software components :
Oracle RAC is composed of two or more database instances. They are composed of Memory structures and background processes same as the single instance database.Oracle RAC instances use two processes GES(Global Enqueue Service), GCS(Global Cache Service) that enable cache fusion.Oracle RAC instances are composed of following background processes:
ACMS—Atomic Controlfile to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor

What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintains records of the statuses of each datafile and each cahed block using global resource directory.This process is referred to as cache fusion and helps in data integrity.
What are the different network components are in 10g RAC?
public, private, and vip components
Private interfaces is for intra node communication. VIP is all about availability of application. When a node fails then the VIP component fail over to some other node, this is the reason that all applications should based on vip components means tns entries should have vip entry in the host list

Give Details on ACMS:
ACMS stands for Atomic Controlfile Memory Service.In an Oracle RAC environment ACMS is an agent that ensures a distributed SGA memory update(ie)SGA updates are globally committed on success or globally aborted in event of a failure.
What is Cache Fusion?
Cache fusion is the mechanism to transfer the data block from memory to memory of one node to the other.If two nodes require the same block for query or update, the block must be transfered from the cache of one node to the other. RAC system must equipped with low-latency and high speed inter-connect to make it happen.
Give Details on Cache Fusion:
Oracle RAC is composed of two or more instances. When a block of data is read from datafile by an instance within the cluster and another instance is in need of the same block,it is easy to get the block image from the insatnce which has the block in its SGA rather than reading from the disk. To enable inter instance communication Oracle RAC makes use of interconnects. The Global Enqueue Service(GES) monitors and Instance enqueue process manages the cahce fusion.
Cache Fusion is essentially a memory-to-memory transfer of data between the nodes in the RAC environment. Before Cache Fusion, a node was required to write some of the data to disk before it could be transferred to the next node in the cluster. Cache Fusion does a straight memory-to-memory transfer. In addition, each node's SGA has a map of what data is contained in the other node's data caches.
The performance improvement is phenomenal. Oracle leverages the vendor's high speed interconnects between the nodes to achieve the cache-to-cache data transfers. Before Cache Fusion, when you added a node to the cluster to increase performance of the application, it didn't always provide you with the performance improvement that you hoped for. With Cache Fusion, you can easily cost justify the addition of another node into a RAC cluster to increase the performance of the application running on it. Oracle sales pitches describe it as 'near linear horizontal scalability'.
What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing differs.The most common wait events related to this are gc cr request and gc buffer busy

GC CR request :the time it takes to retrieve the data from the remote cache

Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned queries will increase the amount of data blocks requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.)

GC BUFFER BUSY: It is the time the remote instance locally spends accessing the requested data block.

Give details on GTX0-j :
The process provides transparent support for XA global transactions in a RAC environment.The database autotunes the number of these processes based on the workload of XA global transactions.

Give details on LMON:
This process monitors global enques and resources across the cluster and performs global enqueue recovery operations.This is called as Global Enqueue Service Monitor.

Give details on LMD:
This process is called as global enqueue service daemon. This process manages incoming remote resource requests within each instance.

Give details on LMS:
This process is called as Global Cache service process.This process maintains statuses of datafiles and each cahed block by recording information in a Global Resource Dectory(GRD).This process also controls the flow of messages to remote instances and manages global data block access and transmits block images between the buffer caches of different instances.This processing is a part of cache fusion feature.

Give details on LCK0:
This process is called as Instance enqueue process.This process manages non-cache fusion resource requests such as libry and row cache requests.

Give details on RMSn:
This process is called as Oracle RAC management process.These pocesses perform managability tasks for Oracle RAC.Tasks include creation of resources related Oracle RAC when new instances are added to the cluster.

Give details on RSMN:
This process is called as Remote Slave Monitor.This process manages background slave process creation andd communication on remote instances. This is a background slave process.This process performs tasks on behalf of a co-ordinating process running in another instance.

What components in RAC must reside in shared storage?
All datafiles, controlfiles, SPFIles, redo log files must reside on cluster-aware shred storage.

What is the significance of using cluster-aware shared storage in an Oracle RAC environment?
All instances of an Oracle RAC can access all the datafiles,control files, SPFILE's, redolog files when these files are hosted out of cluster-aware shared storage which are group of shared disks.

Give few examples for solutions that support cluster storage:
ASM(automatic storage management),raw disk devices,network file system(NFS), OCFS2 and OCFS(Oracle Cluster Fie systems).

What is an interconnect network?
An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses a switch/multiple switches that only the nodes in the cluster can access.

How can we configure the cluster interconnect?
Configure User Datagram Protocol(UDP) on Gigabit ethernet for cluster interconnect.On unix and linux systems we use UDP and RDS(Reliable data socket) protocols to be used by Oracle Clusterware.Windows clusters use the TCP protocol.

Can we use crossover cables with Oracle Clusterware interconnects?
No, crossover cables are not supported with Oracle Clusterware intercnects.

What is the use of cluster interconnect?
Cluster interconnect is used by the Cache fusion for inter instance communication.

How do users connect to database in an Oracle RAC environment?
Users can access a RAC database using a client/server configuration or through one or more middle tiers ,with or without connection pooling.Users can use oracle services feature to connect to database.

What is the use of a service in Oracle RAC environment?
Applications should use the services feature to connect to the Oracle database.Services enable us to define rules and characteristics to control how users and applications connect to database instances.

What are the characteristics controlled by Oracle services feature?
The charateristics include a unique name, workload balancing and failover options,and high availability characteristics.
What enables the load balancing of applications in RAC?
Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

What is a virtual IP address or VIP?
A virtl IP address or VIP is an alternate IP address that the client connectins use instead of the standard public IP address. To configureVIP address, we need to reserve a spare IP address for each node, and the IP addresses must use the same subnet as the public network.

What is the use of VIP?
If a node fails, then the node's VIP address fails over to another node on which the VIP address can accept TCP connections but it cannot accept Oracle connections.

Give situations under which VIP address failover happens:
VIP addresses failover happens when the node on which the VIP address runs fails, all interfaces for the VIP address fails,all interfaces for the VIP address are disconnected from the network.

What is the significance of VIP address failover?
When a VIP address failover happens, Clients that attempt to connect to the VIP address receive a rapid connection refused error .They don't have to wait for TCP connection timeout messages.

What are the administrative tools used for Oracle RAC environments?
Oracle RAC cluster can be administered as a single image using OEM(Enterprise Manager),SQL*PLUS,Servercontrol(SRVCTL),clusterverificationutility(cvu),DBCA,NETCA

How do we verify that RAC instances are running?
Issue the following query from any one node connecting through SQL*PLUS.
$connect sys/sys as sysdba
SQL>select * from V$ACTIVE_INSTANCES;
The query gives the instance number under INST_NUMBER column,host_:instancename under INST_NAME column.

What is FAN?
Fast application Notification as it abbreviates to FAN relates to the events related to instances,services and nodes.This is a notification mechanism that Oracle RAc uses to notify other processes about the configuration and service level information that includes service status changes such as,UP or DOWN events.Applications can respond to FAN events and take immediate action.

Where can we apply FAN UP and DOWN events?
FAN UP and FAN DOWN events can be applied to instances,services and nodes.
State the use of FAN events in case of a cluster configuration change?
During times of cluster configuration changes,Oracle RAC high availability framework publishes a FAN event immediately when a state change occurs in the cluster.So applications can receive FAN events and react immediately.This prevents applications from polling database and detecting a problem after such a state change.

Why should we have seperate homes for ASm instance?
It is a good practice to have ASM home seperate from the database hom(ORACLE_HOME).This helps in upgrading and patching ASM and the Oracle database software independent of each other.Also,we can deinstall the Oracle database software independent of the ASM instance.

What is the advantage of using ASM?
Having ASM is the Oracle recommended storage option for RAC databases as the ASM maximizes performance by managing the storage configuration across the disks.ASM does this by distributing the database file across all of the available storage within our cluster database environment.

What is rolling upgrade?
It is a new ASM feature from Database 11g.ASM instances in Oracle database 11g release(from 11.1) can be upgraded or patched using rolling upgrade feature. This enables us to patch or upgrade ASM nodes in a clustered environment without affecting database availability.During a rolling upgrade we can maintain a functional cluster while one or more of the nodes in the cluster are running in different software versions.

Can rolling upgrade be used to upgrade from 10g to 11g database?
No,it can be used only for Oracle database 11g releases(from 11.1).

State the initialization parameters that must have same value for every instance in an Oracle RAC database:
Some initialization parameters are critical at the database creation time and must have same values.Their value must be specified in SPFILE or PFILE for every instance.The list of parameters that must be identical on every instance are given below:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCE
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE (RDBMS or ASM)
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_PASSWORD_FILE
UNDO_MANAGEMENT
What is ORA-00603: ORACLE server session terminated by fatal error or ORA-29702: error occurred in Cluster Group Service operation?
RAC node name was listed in the loopback address...

Can the DML_LOCKS and RESULT_CACHE_MAX_SIZE be identical on all instances?
These parameters can be identical on all instances only if these parameter values are set to zero.
What two parameters must be set at the time of starting up an ASM instance in a RAC environment?The parameters CLUSTER_DATABASE and INSTANCE_TYPE must be set.

Mention the components of Oracle clusterware:
Oracle clusterware is made up of components like voting disk and Oracle Cluster Registry(OCR).
What is a CRS resource?
Oracle clusterware is used to manage high-availability operations in a cluster.Anything that Oracle Clusterware manages is known as a CRS resource.Some examples of CRS resources are database,an instance,a service,a listener,a VIP address,an application process etc.

What is the use of OCR?
Oracle clusterware manages CRS resources based on the configuration information of CRS resources stored in OCR(Oracle Cluster Registry).

How does a Oracle Clusterware manage CRS resources?
Oracle clusterware manages CRS resources based on the configuration information of CRS resources stored in OCR(Oracle Cluster Registry).

Name some Oracle clusterware tools and their uses?
OIFCFG - allocating and deallocating network interfaces
OCRCONFIG - Command-line tool for managing Oracle Cluster Registry
OCRDUMP - Identify the interconnect being used
CVU - Cluster verification utility to get status of CRS resources

What are the modes of deleting instances from ORacle Real Application cluster Databases?
We can delete instances using silent mode or interactive mode using DBCA(Database Configuration Assistant).

How do we remove ASM from a Oracle RAC environment?
We need to stop and delete the instance in the node first in interactive or silent mode.After that asm can be removed using srvctl tool as follows:
srvctl stop asm -n node_name
srvctl remove asm -n node_name
We can verify if ASM has been removed by issuing the following command:
srvctl config asm -n node_name

How do we verify that an instance has been removed from OCR after deleting an instance?
Issue the following srvctl command:
srvctl config database -d database_name
cd CRS_HOME/bin
./crs_stat

How do we verify an existing current backup of OCR?
We can verify the current backup of OCR using the following command : ocrconfig -showbackup
What are the performance views in an Oracle RAC environment?
We have v$ views that are instance specific. In addition we have GV$ views called as global views that has an INST_ID column of numeric data type.GV$ views obtain information from individual V$ views.
What are the types of connection load-balancing?
There are two types of connection load-balancing:server-side load balancing and client-side load balancing.

What is the difference between server-side and client-side connection load balancing?
Client-side balancing happens at client side where load balancing is done using listener.In case of server-side load balancing listener uses a load-balancing advisory to redirect connections to the instance providing best service.

What are the three greatest benefits that RAC provides??
The three main benefits are availability, scalability, and the ability to use low cost commodity hardware. RAC allows an application to scale vertically, by adding CPU, disk and memory resources to an individual server. But RAC also provides horizontal scalability, which is achieved by adding new nodes into the cluster. RAC also allows an organization to bring these resources online as they are needed. This can save a small or midsize organization a lot of money in the early stages of a project.
In a RAC environment, if a node in the cluster fails, the application continues to run on the surviving nodes contained in the cluster. If your application is configured correctly, most users won't even know that the node they were running on became unavailable.

What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing
differs.The most common wait events related to this are gc cr request and gc buffer busy

GC CR request: the time it takes to retrieve the data from the remote cache

Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned queries will increase the amount of data blocks
requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.)
GC BUFFER BUSY: It is the time the remote instance locally spends accessing the requested data block.

What are the different network components in Oracle 10g RAC?
We have public, private, and VIP components. Private interfaces is for intra node communication. VIP is all about availability of application. When a node fails then the VIP component will fail over to some other node, this is the reason that all applications should be based on VIP components.  This means that tns entries should have VIP entry in the host list.
Model – 3
Q What is SCAN?

Single Client Access Name (SCAN) is s a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not need to change if you add or remove nodes in the cluster.

________________________________________
Q what is dynamic remastering ? When will the dynamic remastering happens?
dynamic remastering is ability to move the ownership of resource from one instance to another instance in RAC. dynamic resource remastering is used to implement for resource affinity for increased performance. resource affinity optimized the system in situation where update transactions are being executed in one instance. when activity shift to another instance the resource affinity correspondingly move to another instance. If activity is not localized then resource ownership is hashed to the instance.

In 10g dynamic remastering happens in file+object level.the process of remastering is very stringent. For one instance should touch more than 50 times than the other instance in particular period(say 10 mints). this touch ratio and time can be tuned by gc_affinity_limit and _gc_affinity_time parameter.

________________________________________
Q why we required to maintain odd number of voting disks?
Odd number of disk are to avoid split brain, When Nodes in cluster can't talk to each other they run to lock the Voting disk and whoever lock the more disk will survive, if disk number are even there are chances that node might lock 50% of disk (2 out of 4) then how to decide which node to evict. 
whereas when number is odd, one will be higher than other and each for cluster to evict the node with less number

________________________________________
Q How you check the health of Your RAC Database?
 'crsctl' command from root or oracle user can be used to check the clusterware health But for starting or stopping we have to use root user or any privilege user.

[oracle@TEST_NODE1 ~]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy

________________________________________
Q How you check the services in RAC Node?
 We can check the service or start the services with 'srvctl' command.load balanced/TAF service named RAC online.

[oracle@TEST_NODE1 ~]$ srvctl start service -d orcl -s RAC
[oracle@TEST_NODE1 ~]$ crsstat

________________________________________
Q If there is some issue with virtual IP how will you troubleshoot it?How will you change virtual ip?
 To change the VIP (virtual IP) on a RAC node, use the command

[oracle@testnode oracle]$ srvctl modify nodeapps -A new_address


________________________________________
Q How you will backup your RAC Database?
 Backup strategy of RAC Database:
An RAC Database consists of
1)OCR
2)Voting disk &
3)Database files, controlfiles, redolog files & Archive log files

________________________________________
Q Do you have any idea of load balancing in application?How load balancing is done?
http://practicalappsdba.wordpress.com/category/for-master-apps-dbas/

________________________________________
Q What is RAC?
RAC stands for Real Application cluster. It is a clustering solution from Oracle Corporation that ensures high availability of databases by providing instance failover, media failover features.

________________________________________
Q What is RAC and how is it different from non RAC databases?
RAC stands for Real Application Cluster, you have n number of instances running in their own separate nodes and based on the shared storage. Cluster is the key component and is a collection of servers operations as one unit. RAC is the best solution for high performance and high availably. Non RAC databases has single point of failure in case of hardware failure or server crash.

________________________________________
Q Give the usage of srvctl ?
srvctl start instance -d db_name -i "inst_name_list" [-o start_options]
srvctl stop instance -d name -i "inst_name_list" [-o stop_options]
srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate
srvctl start database -d name [-o start_options]
srvctl stop database -d name [-o stop_options]
srvctl start database -d orcl -o mount

________________________________________
Q Mention the Oracle RAC software components ?
Oracle RAC is composed of two or more database instances. They are composed of Memory structures and background processes same as the single instance database.Oracle RAC instances use two processes GES(Global Enqueue Service), GCS(Global Cache Service) that enable cache fusion.Oracle RAC instances are composed of following background processes:
ACMS—Atomic Controlfile to Memory Service (ACMS)
GTX0-j—Global Transaction Process
LMON—Global Enqueue Service Monitor
LMD—Global Enqueue Service Daemon
LMS—Global Cache Service Process
LCK0—Instance Enqueue Process
RMSn—Oracle RAC Management Processes (RMSn)
RSMN—Remote Slave Monitor

________________________________________

Q What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintains records of the statuses of each datafile and each cahed block using global resource directory.This process is referred to as cache fusion and helps in data integrity.

________________________________________
Q What are the different network components are in 10g RAC?
public, private, and vip components
Private interfaces is for intra node communication. VIP is all about availability of application. When a node fails then the VIP component fail over to some other node, this is the reason that all applications should based on vip components means tns entries should have vip entry in the host list

________________________________________

Q Give Details on ACMS:
ACMS stands for Atomic Controlfile Memory Service.In an Oracle RAC environment ACMS is an agent that ensures a distributed SGA memory update(ie)SGA updates are globally committed on success or globally aborted in event of a failure.

________________________________________
Q What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing differs.The most common wait events related to this are gc cr request and gc buffer busy

GC CR request :the time it takes to retrieve the data from the remote cache

Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned queries will increase the amount of data blocks requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.)

GC BUFFER BUSY: It is the time the remote instance locally spends accessing the requested data block.

________________________________________

Q Give details on GTX0-j 
The process provides transparent support for XA global transactions in a RAC environment.The database autotunes the number of these processes based on the workload of XA global transactions.

________________________________________

Q Give details on LMON
This process monitors global enques and resources across the cluster and performs global enqueue recovery operations.This is called as Global Enqueue Service Monitor.

________________________________________

Q Give details on LMD
This process is called as global enqueue service daemon. This process manages incoming remote resource requests within each instance.

________________________________________

Q Give details on LMS
This process is called as Global Cache service process.This process maintains statuses of datafiles and each cahed block by recording information in a Global Resource Dectory(GRD).This process also controls the flow of messages to remote instances and manages global data block access and transmits block images between the buffer caches of different instances.This processing is a part of cache fusion feature.

________________________________________

Q Give details on LCK0
This process is called as Instance enqueue process.This process manages non-cache fusion resource requests such as libry and row cache requests.

________________________________________

Q Give details on RMSn
This process is called as Oracle RAC management process.These pocesses perform managability tasks for Oracle RAC.Tasks include creation of resources related Oracle RAC when new instances are added to the cluster.

________________________________________
Q How to export and import crs resources while migrating Oracle RAC to new server.
Below script generate svrctl add script for database, instance, service and 11G listeners from OCR from current RAC.
Save the result of the script and run it at new RAC.

for DBNAME in $(srvctl config database)
do

# Generate DB resource

srvctl config database -d $DBNAME -a | awk -v dbname="$DBNAME" \
'BEGIN { FS=":" }
$1~/Oracle home/ || $1~/ORACLE_HOME/ {dbhome = "-o" $2}
$1~/Spfile/ || $1~/SPFILE/ {spfile = "-p" $2}
$1~/Disk Groups/ {dg = "-a" $2}
END { if (avail == "-a ") {avail = ""}; printf "%s %s %s %s %s\n", "srvctl add database -d ", dbname, dbhome, spfile, dg }'

# Generate Instance resource

srvctl status database -d $DBNAME | awk -v dbname="$DBNAME" \
'$4~/running/ { printf "%s %s %s %s %s %s\n", "srvctl add instance -d ",dbname, " -i ", $2 ," -n ", $7 }
$5~/running/ { printf "%s %s %s %s %s %s \n", "srvctl add instance -d ",dbname, " -i ", $2 ," -n ", $8 }'

# Modify instance for 10G - ASM dependency

if [ $(echo $ORACLE_HOME | grep "1020" | wc -l ) -eq 1 ]
then
srvctl status database -d $DBNAME | awk -v dbname="$DBNAME" \
'$2~/1$/ { printf "%s %s %s %s %s \n", "srvctl modify instance -d ",dbname, " -i ", $2 ," -s +ASM1" }
$2~/2$/ { printf "%s %s %s %s %s \n", "srvctl modify instance -d ",dbname, " -i ", $2 ," -s +ASM2" }
$2~/3$/ { printf "%s %s %s %s %s \n", "srvctl modify instance -d ",dbname, " -i ", $2 ," -s +ASM3" }
$2~/4$/ { printf "%s %s %s %s %s \n", "srvctl modify instance -d ",dbname, " -i ", $2 ," -s +ASM4" }'
fi

echo "srvctl start database -d $DBNAME"

# Generate Service resource

snamelist=$(srvctl status service -d $DBNAME | awk '{print $2}')

for sname in $snamelist
do
srvctl config service -d $DBNAME -s $sname| awk -v dbname="$DBNAME" -v sname=$sname \
'BEGIN { FS=":"}
$1~/Preferred instances/ {pref = "-r" $2}
$1~/PREF/ {pref = "-r" $2; sub(/AVAIL/, "", pref) }
$1~/Available instances/ {avail = "-a" $2}
$2~/AVAIL/ {avail = "-a" $3}
$1~/Failover type/ {ft = "-e" $2}
$1~/Failover method/ {fm = "-m" $2}
$1~/Runtime Load Balancing Goal/ {g = "-B" $2}
END { if (avail == "-a ") {avail = ""}; printf "%s %s %s %s %s %s %s %s %s %s\n", "srvctl add service -d ",dbname, "-s ", sname, pref, avail ,ft, fm,g, "-P BASIC"}'
echo "srvctl start service -d $DBNAME -s $sname"
done
done

# Listener at 11G Home. 10G listener can't ba added with srvctl.


srvctl config listener | awk \
'BEGIN { FS=":"; state = 0; }
$1~/Name/ {lname = "-l" $2; state=1};
$1~/Home/ && state == 1 {ohome = "-o" $2; state=2;}
$1~/End points/ && state == 2 {lport = "-p " $3; state=3;}
state == 3 {if (ohome != "-o ") {printf "%s %s %s %s\n", "srvctl add listener ", lname, ohome, lport;} state=0;}'

________________________________________
 Q Give details on RSMN
This process is called as Remote Slave Monitor.This process manages background slave process creation andd communication on remote instances. This is a background slave process.This process performs tasks on behalf of a co-ordinating process running in another instance.

________________________________________
Q What components in RAC must reside in shared storage?
All datafiles, controlfiles, SPFIles, redo log files must reside on cluster-aware shred storage.

________________________________________

Q What is the significance of using cluster-aware shared storage in an Oracle RAC environment?
All instances of an Oracle RAC can access all the datafiles,control files, SPFILE's, redolog files when these files are hosted out of cluster-aware shared storage which are group of shared disks.

________________________________________

Q Give few examples for solutions that support cluster storage
ASM(automatic storage management),raw disk devices,network file system(NFS), OCFS2 and OCFS(Oracle Cluster Fie systems).

________________________________________

Q What is an interconnect network?
An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses a switch/multiple switches that only the nodes in the cluster can access.

________________________________________

Q How can we configure the cluster interconnect?
Configure User Datagram Protocol(UDP) on Gigabit ethernet for cluster interconnect.On unix and linux systems we use UDP and RDS(Reliable data socket) protocols to be used by Oracle Clusterware.Windows clusters use the TCP protocol.

________________________________________

Q Can we use crossover cables with Oracle Clusterware interconnects?
No, crossover cables are not supported with Oracle Clusterware intercnects.

________________________________________

Q What is the use of cluster interconnect?
Cluster interconnect is used by the Cache fusion for inter instance communication.

________________________________________

Q How do users connect to database in an Oracle RAC environment?
Users can access a RAC database using a client/server configuration or through one or more middle tiers ,with or without connection pooling.Users can use oracle services feature to connect to database.

________________________________________

Q What is the use of a service in Oracle RAC environment?
Applications should use the services feature to connect to the Oracle database.Services enable us to define rules and characteristics to control how users and applications connect to database instances.

________________________________________

Q What are the characteristics controlled by Oracle services feature?
The charateristics include a unique name, workload balancing and failover options,and high availability characteristics.

________________________________________
Q What enables the load balancing of applications in RAC?
Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

________________________________________

Q What is a virtual IP address or VIP?
A virtl IP address or VIP is an alternate IP address that the client connectins use instead of the standard public IP address. To configureVIP address, we need to reserve a spare IP address for each node, and the IP addresses must use the same subnet as the public network.

________________________________________

Q What is the use of VIP?
If a node fails, then the node's VIP address fails over to another node on which the VIP address can accept TCP connections but it cannot accept Oracle connections.

________________________________________

Q Give situations under which VIP address failover happens
VIP addresses failover happens when the node on which the VIP address runs fails, all interfaces for the VIP address fails, all interfaces for the VIP address are disconnected from the network.

________________________________________

Q What is the significance of VIP address failover?
When a VIP address failover happens, Clients that attempt to connect to the VIP address receive a rapid connection refused error .They don't have to wait for TCP connection timeout messages.

________________________________________

Q What are the administrative tools used for Oracle RAC environments?
Oracle RAC cluster can be administered as a single image using OEM(Enterprise Manager),SQL*PLUS,Servercontrol(SRVCTL),clusterverificationutility(cvu),DBCA,NETCA

________________________________________

Q How do we verify that RAC instances are running?
Issue the following query from any one node connecting through SQL*PLUS.
$connect sys/sys as sysdba
SQL>select * from V$ACTIVE_INSTANCES;
The query gives the instance number under INST_NUMBER column,host_:instancename under INST_NAME column.

________________________________________

Q What is FAN?
Fast application Notification as it abbreviates to FAN relates to the events related to instances,services and nodes.This is a notification mechanism that Oracle RAc uses to notify other processes about the configuration and service level information that includes service status changes such as,UP or DOWN events.Applications can respond to FAN events and take immediate action.

________________________________________

Q Where can we apply FAN UP and DOWN events?
FAN UP and FAN DOWN events can be applied to instances,services and nodes.
State the use of FAN events in case of a cluster configuration change?
During times of cluster configuration changes,Oracle RAC high availability framework publishes a FAN event immediately when a state change occurs in the cluster.So applications can receive FAN events and react immediately.This prevents applications from polling database and detecting a problem after such a state change.

________________________________________

Q Why should we have seperate homes for ASm instance?
It is a good practice to have ASM home seperate from the database hom(ORACLE_HOME).This helps in upgrading and patching ASM and the Oracle database software independent of each other.Also,we can deinstall the Oracle database software independent of the ASM instance.

________________________________________

Q What is the advantage of using ASM?
Having ASM is the Oracle recommended storage option for RAC databases as the ASM maximizes performance by managing the storage configuration across the disks.ASM does this by distributing the database file across all of the available storage within our cluster database environment.

________________________________________

Q What is rolling upgrade?
It is a new ASM feature from Database 11g.ASM instances in Oracle database 11g release(from 11.1) can be upgraded or patched using rolling upgrade feature. This enables us to patch or upgrade ASM nodes in a clustered environment without affecting database availability.During a rolling upgrade we can maintain a functional cluster while one or more of the nodes in the cluster are running in different software versions.

________________________________________

Q Can rolling upgrade be used to upgrade from 10g to 11g database?
No,it can be used only for Oracle database 11g releases(from 11.1).

________________________________________

Q State the initialization parameters that must have same value for every instance in an Oracle RAC database
Some initialization parameters are critical at the database creation time and must have same values.Their value must be specified in SPFILE or PFILE for every instance.The list of parameters that must be identical on every instance are given below:
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCE
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE (RDBMS or ASM)
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_passWORD_FILE
UNDO_MANAGEMENT

________________________________________
Q What is ORA-00603: ORACLE server session terminated by fatal error or ORA-29702: error occurred in Cluster Group Service operation?
RAC node name was listed in the loopback address...

________________________________________

Q Can the DML_LOCKS and RESULT_CACHE_MAX_SIZE be identical on all instances?
These parameters can be identical on all instances only if these parameter values are set to zero.
What two parameters must be set at the time of starting up an ASM instance in a RAC environment?The parameters CLUSTER_DATABASE and INSTANCE_TYPE must be set.

________________________________________

Q Mention the components of Oracle clusterware
Oracle clusterware is made up of components like voting disk and Oracle Cluster Registry(OCR).

________________________________________
Q What is a CRS resource?
Oracle clusterware is used to manage high-availability operations in a cluster.Anything that Oracle Clusterware manages is known as a CRS resource.Some examples of CRS resources are database,an instance,a service,a listener,a VIP address,an application process etc.

________________________________________

Q What is the use of OCR?
Oracle clusterware manages CRS resources based on the configuration information of CRS resources stored in OCR(Oracle Cluster Registry).

________________________________________

Q How does a Oracle Clusterware manage CRS resources?
Oracle clusterware manages CRS resources based on the configuration information of CRS resources stored in OCR(Oracle Cluster Registry).

________________________________________

Q Name some Oracle clusterware tools and their uses?
OIFCFG - allocating and deallocating network interfaces
OCRCONFIG - Command-line tool for managing Oracle Cluster Registry
OCRDUMP - Identify the interconnect being used
CVU - Cluster verification utility to get status of CRS resources

________________________________________

Q What are the modes of deleting instances from ORacle Real Application cluster Databases?
We can delete instances using silent mode or interactive mode using DBCA(Database Configuration Assistant).

________________________________________

Q How do we remove ASM from a Oracle RAC environment?
We need to stop and delete the instance in the node first in interactive or silent mode.After that asm can be removed using srvctl tool as follows:
srvctl stop asm -n node_name
srvctl remove asm -n node_name
We can verify if ASM has been removed by issuing the following command:
srvctl config asm -n node_name

________________________________________

Q How do we verify that an instance has been removed from OCR after deleting an instance?
Issue the following srvctl command:
srvctl config database -d database_name
cd CRS_HOME/bin
./crs_stat

________________________________________

Q How do we verify an existing current backup of OCR?
We can verify the current backup of OCR using the following command : ocrconfig -showbackup
What are the performance views in an Oracle RAC environment?
We have v$ views that are instance specific. In addition we have GV$ views called as global views that has an INST_ID column of numeric data type.GV$ views obtain information from individual V$ views.
What are the types of connection load-balancing?
There are two types of connection load-balancing:server-side load balancing and client-side load balancing.

________________________________________

Q What is the difference between server-side and client-side connection load balancing?
Client-side balancing happens at client side where load balancing is done using listener.In case of server-side load balancing listener uses a load-balancing advisory to redirect connections to the instance providing best service.

________________________________________

Q What are the three greatest benefits that RAC provides??
The three main benefits are availability, scalability, and the ability to use low cost commodity hardware. RAC allows an application to scale vertically, by adding CPU, disk and memory resources to an individual server. But RAC also provides horizontal scalability, which is achieved by adding new nodes into the cluster. RAC also allows an organization to bring these resources online as they are needed. This can save a small or midsize organization a lot of money in the early stages of a project.
In a RAC environment, if a node in the cluster fails, the application continues to run on the surviving nodes contained in the cluster. If your application is configured correctly, most users won't even know that the node they were running on became unavailable.

________________________________________
Q What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing differs.The most common wait events related to this are gc cr request and gc buffer busy

GC CR request: the time it takes to retrieve the data from the remote cache

Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned queries will increase the amount of data blocks
requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.)
GC BUFFER BUSY: It is the time the remote instance locally spends accessing the requested data block.

________________________________________

Q What are the different network components in Oracle 10g RAC?

We have public, private, and VIP components. Private interfaces is for intra node communication. VIP is all about availability of application. When a node fails then the VIP component will fail over to some other node, this is the reason that all applications should be based on VIP components.  This means that tns entries should have VIP entry in the host list.

________________________________________
Q Tune the following RAC DATABASE (DBNAME=PROD) which is 3 node RAC.

PROD1             PROD2                     PROD3
CPU 8               CPU 15                    CPU 8
32 GB RAM       12 GB RAM             16 GB RAM


What are you looking for here? What tuning information do you expect?
It is a 3 node cluster with different hardware configuration running RAC.
I would put 20% of the memory for Oracle in each node. So that would mean that the SGA is different in each of the nodes.
Also since the CPU's are different PROD2 can have more number of max number of processes as compared to the rest of them.

But as I said this is just configuration, this is not tuning. Question is not clear.


________________________________________
Q Write a sample script for RMAN for the recovery if all the instance are down.(First explain the procedure how you will restore)
Bring all nodes down.
Start one Node
Restore all datafiles and archive logs.
Recover 1 Node.
Open the database.
bring other nodes up.
Confirm that all nodes are operational.

________________________________________
Q. Clients are performing some operation and suddenly one of the datafile is experiencing problem what do you do? The cluster is a two node one.
A. Bring the datafile offline recover the datafile.

________________________________________
Q. How can you connect to a specific node in a RAC environment?
A. tnsnames.ora ensure that you have INSTANCE_NAME specified in it.

________________________________________
Q How to move OCR and Voting disk to new storage device?
Moving OCR
==========
You must be logged in as the root user, because root owns the OCR files. Also an ocrmirror must be in place before trying to replace the OCR device.

Make sure there is a recent backup of the OCR file before making any changes:

ocrconfig –showbackup

If there is not a recent backup copy of the OCR file, an export can be taken for the current OCR file. Use the following command to generate an export of the online OCR file:

In 10.2

# ocrconfig –export -s online

In 11g

# ocrconfig -manualbackup

The new OCR disk must be owned by root, must be in the oinstall group, and must have permissions set to 640. Provide at least 100 MB disk space for the OCR.

On one node as root run:

# ocrconfig -replace ocr 
# ocrconfig -replace ocrmirror 

Now run ocrcheck to verify if the OCR is pointing to the new file

Moving Voting Disk
==================

Note: crsctl votedisk commands must be run as root

Shutdown the Oracle Clusterware (crsctl stop crs as root) on all nodes before making any modification to the voting disk. Determine the current voting disk location using:

crsctl query css votedisk

Take a backup of all voting disk:

dd if=voting_disk_name of=backup_file_name

To move a Voting Disk, provide the full path including file name:

crsctl delete css votedisk –force
crsctl add css votedisk –force

After modifying the voting disk, start the Oracle Clusterware stack on all nodes

# crsctl start crs

Verify the voting disk location using

crsctl query css votedisk

________________________________________
Q What is runfixup.sh script in Oracle Clusterware 11g release 2 installation
With Oracle Clusterware 11g release 2, Oracle Universal Installer (OUI) detects when the minimum requirements for an installation are not met, and creates shell scripts, called fixup scripts, to finish incomplete system configuration steps. If OUI detects an incomplete task, then it generates fixup scripts (runfixup.sh). You can run the fixup script after you click the Fix and Check Again Button.
The Fixup script does the following:
■ If necessary sets kernel parameters to values required for successful installation,
including:
– Shared memory parameters.
– Open file descriptor and UDP send/receive parameters.
■ Sets permissions on the Oracle Inventory (central inventory) directory.
■ Reconfigures primary and secondary group memberships for the installation
owner, if necessary, for the Oracle Inventory directory and the operating system
privileges groups.
■ Sets shell limits if necessary to required values.

________________________________________
Q When exactly during the installation process are clusterware components created?

After fulfilling the pre-installation requirements, the basic installation steps to follow are:

1. Invoke the Oracle Universal Installer (OUI)

2. Enter the different information for some components like:
- name of the cluster
- public and private node names
- location for OCR and Voting Disks
- network interfaces used for RAC instances
-etc.

3. After the Summary screen, OUI will start copying under the $CRS_HOME (this is the $ORACLE_HOME for Oracle Clusterware) in the local node the libraries and executables.
- here we will have the daemons and scripts init.* created and configured properly.

Oracle Clusterware is formed of several daemons, each one of which have a special function inside the stack. Daemons are executed via the init.* scripts (init.cssd, init.crsd and init.evmd).

- note that for CRS only some client libraries are recreated, but not all the executables (as for the RDBMS).

4. Later the software is propagated to the rest of the nodes in the cluster and the oraInventory is updated.

5. The installer will ask to execute root.sh on each node. Until this step the software for Oracle Clusterware is inside the $CRS_HOME.

Running root.sh will create several components outside the $CRS_HOME:

- OCR and VD will be formated.

- control files (or SCLS_SRC files ) will be created with the correct contents to start Oracle Clusterware.

These files are used to control some aspects of Oracle Clusterware like:
- enable/disable processes from the CSSD family (Eg. oprocd, oslsvmon)
- stop the daemons (ocssd.bin, crsd.bin, etc).
- prevent Oracle Clusterware from being started when the machine boots.
- etc.

- /etc/inittab will be updated and the init process is notified.

In order to start the Oracle Clusterware daemons, the init.* scripts first need to be run. These scripts are executed by the daemon init. To accomplish this some entries must be created in the file /etc/inittab.

- the different processes init.* (init.cssd, init.crsd, etc) will start the daemons (ocssd.bin, crsd.bin, etc). When all the daemons are running then we can say that the installation was successful

- On 10.2 and later, running root.sh on the last node in the cluster also will create the nodeapps (VIP, GSD and ONS). On 10.1, VIPCA is executed as part of the RAC installation.

6. After running root.sh on each node, we need to continue with the OUI session. After pressing the 'OK' button OUI will include the information for the public and cluster_interconnect interfaces. Also CVU (Cluster Verification Utility) will be executed.


________________________________________
Q What are Oracle Clusterware processes for 10g on Unix and Linux

Cluster Synchronization Services (ocssd) — Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.

Cluster Ready Services (crsd) — The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user

Event manager daemon (evmd) —A background process that publishes events that crs creates.

Process Monitor Daemon (OPROCD) —This process monitor the cluster and provide I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

RACG (racgmain, racgimon) —Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.

________________________________________

Q What are Oracle database background processes specific to RAC

•LMS—Global Cache Service Process

•LMD—Global Enqueue Service Daemon

•LMON—Global Enqueue Service Monitor

•LCK0—Instance Enqueue Process

To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

________________________________________
Q What are Oracle Clusterware Components

Voting Disk — Oracle RAC uses the voting disk to manage cluster membership by way of a health check and arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.

Oracle Cluster Registry (OCR) — Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster

________________________________________
Q How do you troubleshoot node reboot 

Please check metalink ...

Note 265769.1 Troubleshooting CRS Reboots
Note.559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.

________________________________________
Q How do you backup the OCR

There is an automatic backup mechanism for OCR. The default location is : $ORA_CRS_HOME\cdata\"clustername"\

To display backups :
#ocrconfig -showbackup
To restore a backup :
#ocrconfig -restore

With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export -s online, and use -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manaual backup of the OCR with the command:
# ocrconfig -manualbackup 


________________________________________
Q How do you backup voting disk

#dd if=voting_disk_name of=backup_file_name

________________________________________
Q How do I identify the voting disk location 

#crsctl query css votedisk

________________________________________
Q How do I identify the OCR file location 

check /var/opt/oracle/ocr.loc or /etc/ocr.loc ( depends upon platform)
or
#ocrcheck

________________________________________
Q Is ssh required for normal Oracle RAC operation ?

"ssh" are not required for normal Oracle RAC operation. However "ssh" should be enabled for Oracle RAC and patchset installation.

________________________________________
Q What is SCAN?

Single Client Access Name (SCAN) is s a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is clients using SCAN do not need to change if you add or remove nodes in the cluster.


________________________________________
Q What is the purpose of Private Interconnect ?

Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the the clustered nodes. This communication is based on the TCP protocol.
RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster. 

________________________________________
Q Why do we have a Virtual IP (VIP) in Oracle RAC?

Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs.
When a node fails, the VIP associated with it is automatically failed over to some other node and new node re-arps the world indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately

________________________________________
Q What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report? 

This is most likely due to a fault in interconnect network.
Check netstat -s
if you see "fragments dropped" or "packet reassemblies failed" , Work with your system administrator find the fault with network. 

________________________________________
Q How many nodes are supported in a RAC Database?

10g Release 2, support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.

________________________________________

Q Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however sqlplus can start it on both nodes? How do you identify the problem?

Set the environmental variable SRVM_TRACE to true.. And start the instance with srvctl. Now you will get detailed error stack.

________________________________________

Q what is the purpose of the ONS daemon?

The Oracle Notification Service (ONS) daemon is an daemon started by the CRS clusterware as part of the nodeapps. There is one ons daemon started per clustered node.
The Oracle Notification Service daemon receive a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forward those events to application subscribers and to the local listeners.

This in order to facilitate:

a. the FAN or Fast Application Notification feature or allowing applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permit load balancing accross different rac nodes dependent of the load on the different nodes. The rdbms MMON is creating an advisory for distribution of work every 30seconds and forward it via racgimon and ONS to listeners and applications.

________________________________________
Q How do users connect to database in an Oracle RAC environment?
Users can access a RAC database using a client/server configuration or through one or more middle tiers, with or without connection pooling. Users can use oracle services feature to connect to database.

________________________________________
Q What is the use of a service in Oracle RAC environment?
Applications should use the services feature to connect to the Oracle database. Services enable us to define rules and characteristics to control how users and applications connect to database instances.

________________________________________
Q What are the characteristics controlled by Oracle services feature?
The characteristics include a unique name, workload balancing and failover options, and high availability characteristics.

________________________________________
Q What is a voting disk?
A voting disk is a file that manages information about node membership.

________________________________________
Q What are the administrative tasks involved with voting disk?
Following administrative tasks are performed with the voting disk :
1) Backing up voting disks
2) Recovering Voting disks
3) Adding voting disks
4) Deleting voting disks
5) Moving voting disks

________________________________________
Q How do we backup voting disks?
1) Oracle recommends that you back up your voting disk after the initial cluster creation and after we complete any node addition or deletion procedures.
2) First, as root user, stop Oracle Clusterware (with the crsctl stop crs command) on all nodes. Then, determine the current voting disk by issuing the following command:
crsctl query votedisk css
3) Then, issue the dd or ocopy command to back up a voting disk, as appropriate.
Give the syntax of backing up voting disks:-
On Linux or UNIX systems:
dd if=voting_disk_name of=backup_file_name
where,
voting_disk_name is the name of the active voting disk
backup_file_name is the name of the file to which we want to back up the voting disk contents
On Windows systems, use the ocopy command:
ocopy voting_disk_name backup_file_name

________________________________________
Q What is the Oracle Recommendation for backing up voting disk?
Oracle recommends us to use the dd command to backup the voting disk with a minimum block size of 4KB.

________________________________________
Q How do you restore a voting disk?
To restore the backup of your voting disk, issue the dd or ocopy command for Linux and UNIX systems or ocopy for Windows systems respectively.
On Linux or UNIX systems:
dd if=backup_file_name of=voting_disk_name
On Windows systems, use the ocopy command:
ocopy backup_file_name voting_disk_name
where,
backup_file_name is the name of the voting disk backup file
voting_disk_name is the name of the active voting disk

________________________________________
Q How can we add and remove multiple voting disks?
If we have multiple voting disks, then we can remove the voting disks and add them back into our environment using the following commands, where path is the complete path of the location where the voting disk resides:
crsctl delete css votedisk path
crsctl add css votedisk path


________________________________________
Q How do we stop Oracle Clusterware?When do we stop it?
Before making any modification to the voting disk, as root user, stop Oracle Clusterware using the crsctl stop crs command on all nodes.


________________________________________
Q How do we add voting disk?
To add a voting disk, issue the following command as the root user, replacing the path variable with the fully qualified path name for the voting disk we want to add:
crsctl add css votedisk path -force

________________________________________
Q How do we move voting disks?
To move a voting disk, issue the following commands as the root user, replacing the path variable with the fully qualified path name for the voting disk we want to move:
crsctl delete css votedisk path -force
crsctl add css votedisk path -force

________________________________________
Q How do we remove voting disks?
To remove a voting disk, issue the following command as the root user, replacing the path variable with the fully qualified path name for the voting disk we want to remove:
crsctl delete css votedisk path -force

________________________________________
Q What should we do after modifying voting disks?
After modifying the voting disk, restart Oracle Clusterware using the crsctl start crs command on all nodes, and verify the voting disk location using the following command:
crsctl query css votedisk

________________________________________
Q When can we use -force option?
If our cluster is down, then we can include the -force option to modify the voting disk configuration, without interacting with active Oracle Clusterware daemons. However, using the -force option while any cluster node is active may corrupt our configuration.

No comments:

Post a Comment