Correct Instrumentation Is Key
In the mid 1980s realized that no matter how many counters and ratios they looked at. it was still pure guesswork (hence luck or thereof) whether a person managed to identify and remove the correct (in other words, the biggest) bottleneck of a given application or business unit. So they instrumented the whole mainframe environment, including DB2, MVS(later OS/390, at present z/OS), and other components. The instrumentation aimed at providing time-based measurements on the session level, and proved so powerful that today, many years later, it's possible to predict within a very small margin what, say, a CPU-upgrade will mean in terms of response time for each application.
FUN FACTThe DB2 database code for AIX and Windows were written by different teams with little or no contact to the mainframe coders. Consequently, DB2 on AIX and Windows are not instrumented. Amazing, sad and true.
Around 1991 or 1992 Juan Loaiza and others from Oracle development were forced to instrument the Oracle kernel in the same way. Here's the story, as told to tribute to one of the truly great minds inside Oracle Development.
I think what you are referring to are the wait statistics that were implemented in 7.0. This stuff was developed because we were running a benchmark that we could not get to perform. We had spent several weeks trying to figure out what was happening with no success. The symptoms were clear -- the system was mostly idle -- we just couldn't figure out why.
We looked the statistics and ratios and kept coming up theories, the module was that none of them were right. So we wasted weeks tuning and fixing things that were not the problem. Finally we ran out of ideas and were forced to go back and instrument the code to figure out what the problem was.
Once the waits were instrumented the problem was diagnosed in minutes. We were having "free buffer" waits because the DBWR was not writing blocks fast enough, It's amazing how hard that was to figure out with statistics, and how easy it was to figure out once the waits were instrumented.
The "credit" for this should go to a number of people. I remember that Mark Porter was involved, and Keshevan Srinivasan did most of the actual instrumentation of the code. There were probably others involved but it has been so many years that I don't remember it clearly anymore.
It is worth noticing that if any of numerous suggestions from "Guess&Grimacing" sessions held in Oracle Development had helped, the instrumentation of the kernel may never have taken place. In the mid 1990s Anjo Kolk invented the YAPP method(as he describes in Chapter 4). In the process, he became the first human on Earth to take full advantage of the instrumentation
The History of SQL Trace
Let me switch gears for a while and tell the story of Oracle's SQL trace feature, which would eventually become the extended SQL trace feature used today. The Oracle extended SQL trace feature exists today because, fortunately, Oracle customers aren't the only ones who have to wrestle with Oracle application performance problems. The men and women who build the Oracle kernel fight the same problems too, both in customer situations and in competitive benchmarks.The following quote is from Juan Loaiza, an Oracle kernel architect, and at the time of this writing a vice president at Oracle Corporation(you can read the full story in Chapter 2, where Juan tells of the motives for inventing SQL trace and then extending it to the state in which we know today):
I've always thought that diagnosing performance issues boils down to figuring out where the time is going in the system. If you can attribute the time correctly, then the source of the problem becomes obvious
Certainly our professional community is indebted to Juan Loaiza and his team for giving us the microscope to see how the Oracle kernel is spending all of our user's time.
SQL trace is least as old as Oracle version 5, which was released in 1986. The syntax to turn it on and off was simple:
select trace('sql', 1) from dual
select trace('sql', 0) from dual
Apparently, not too many people knew about SQL trace in Oracle version 5, and probably fewer than that actually used it. In an old Oracle internal document called "Optimizing", Oracle described the trace function as follows (quoted verbatim from source):
The Kernel provides a trace function to provide information about internal operations. The trace function is essentially a debug aid for Software Development
-- It is not documented.
-- it is not supported.
-- [It] will not be a function in ORACLE version 6.
Version 6 In Oracle version 6, SQL tracing become a core, documented feature for everyone to use. In version 6, Oracle introduced the syntax that also works in 7, 8, 9, and 10: