Wednesday, February 16, 2011

Parts 1 and 2 of Statistical Sampling

Is Statistics
a Criminal Act?
OR
Remember X? That Letter You Learned to Hate in College?

Posted to AuditSkills.com In July 2009

By Bruce Truitt

I am pushing 60. Recovering auditor. Working on my addictions to criteria, findings, and paper. Still play amateur baseball and professional rock 'n' roll. No idea what I want to be if I grow up. I also teach statistics. WAIT! WAIT! DON'T STOP READING! Don't judge me TOO harshly! I am NOT a criminal!

Sorry, didn't mean to yell. Thanks for not leaving the room. No, really. I truly appreciate it.

You see, I often find that reaction to this small confession fuses catatonia, revulsion, and prayer into an awkward social ooze. Its effect is deadly, not unlike hour-long renditions of "Lush Life," "The Chicken Dance," or "Feelings." Maybe I should just tell folks "I am a doctor" or even "I am an accountant." At least the conversation would continue, albeit with a turn to what ails you physically or fiscally.

I have assiduously researched the genesis of these abysses in social discourse. I have discovered direct and robust correlations (sorry, can't help it) between several ordered factors:

  1. The lion's share of my students, associates, colleagues, and friends are college graduates.
  2. As college graduates, virtually all of them had to take statistics. (Now there's a motivator!)
  3. Fewer than 5% of them remember anything from their statistics courses. (How's that for ROI?)
  4. Almost all of them put their statistics book at the top of the stack of items sold back to their collegiate bookstore. (Well, there was some ROI, I guess.)
  5. Those who did not had no idea where their stats book was. (Out of mind, out of sight.)
  6. All of them felt a strong sense of lower digestive tract cathartic relief when their "sadistics" final was over. (!!!)

So, bastioned by these "whats," and, being a card-carrying nerd, I probed the "whys." The results were, again, curiously linear (oops, I did it again):

  1. Their statistics course was taught by a non-native English speaker who mumbled some Klingon dialect while scribbling incomprehensible Sumerian hieroglyphics on the blackboard, with their backs to their defenseless prey. (Ye shall know the truth and the truth shall make you sleep!)
  2. Periodically they would turn around to wax ecstatic over the divine design of the theory and the eternal elegance of the mathematics. (Birth, death, infinity, standard deviation... It's all the same right?)
  3. All those present were absent yet recalled a misty, eerie out-of-body after-life aura baby-you-can-drive-my-Karma sorta thingy. (It was everything I could do to not mention past-life regression here. You're quite welcome.)
  4. Accompanying these euphoric transcendences was an odd weightlessness akin to "floating on a sea of Greek symbols and Latin letters." (Who says these languages are dead!)

I remember sharing their pain. Perhaps the most statistically significant (yuk, yuk!) recollection was that each chapter in the book was "an unknowable universe unto itself." Let me explain. Imagine moving the Berlin Wall to the Amazon jungle and then transforming (arrrgh!) that construct into a Mobius strip. You know there's something on the other side, but you can't go over or around it to see if there really is an "other" side or just a word-problem-induced hallucination, much less check your lack of understanding.

We really are going somewhere with this, I promise.

Then it hit me. The "crime" was that statistics was made more complicated than it had to be. But why?!?

I realized that in virtually all statistics books, the seed, the primum mobile, the Alpha-Omega, the thread, the DNA, the glue that ties it all together was revealed not at the beginning but somewhere between chapters 7 and 11. Not a convenient story. "The Formula," in fact—in many ways the only formula you really need—was so tardy in its presentation that it was buried under the collapsed Berlin Wall in the Amazon jungle, in a fetid, Faulknerian rot.

Wanna know what that formula is? Here 'tis, though, given the foregoing, not in mathematical terms:

$$\text{Sample Size} = \frac{\text{Confidence} \times \text{Variation}}{\text{Precision}}$$

Are you transformed now? I hope so, but what's it all about, Alfie? Stay tuned. Same bat-time. Same bat-channel!

Nerds of the world unite!

Part 2

Confessions of a Recovering Auditor
OR
Ye Shall Know the Formula, and the Formula Shall Set Ye Free!

August 2009

By Bruce Truitt

Welcome back, Dear Reader.

Our first installment in these tales of woe and intrigue (q.v., "Is Statistics a Criminal Act?") ended with your humble scribe positing that "the seed, the primum mobile, the Alpha-Omega, the thread, the DNA, the glue that ties it all together" was the following, AKA "The Formula":

$$\text{Sample Size} = \frac{\text{Confidence} \times \text{Variation}}{\text{Precision}}$$

In Part One, we also established that a whole passel of folks sprinted from their final "Sadistics" exam to the nearest trading post to turn their textbook into cash (and immediately thereafter into either anti-inflammatories or an adult beverage), assuming that the dumpster wasn't too seductive en route, and noting that "passel" really should be officially recognized by the National Bureau of Standards.

Yet, those of you who avoided this rush to rubles and somehow found the Herculean stamina required to take Statistics in multiple collegiate departments ("Inconceivable!" quoth Wallace Shawn) found yourselves in Dante's eighth level of academic Hell. You likely discovered that statistics terminology varies by department and—will the fun never stop?—by textbook or author. Next time the Ambien fails, compare statistical revelations from engineering, physics, psychology, sociology, math, education, and business textbooks. Vertigo by variation is assured! Even limiting ourselves to the parlance of auditing texts is daunting:

  • "Risk of over-reliance" AKA "Confidence level" AKA "Alpha"
  • "Estimated deviation rate" AKA "Expected population attribute error percentage" AKA "Standard deviation" AKA "Standard error, née Relative error"
  • "Tolerable error rate" AKA "Upper error limit" AKA "Desired precision" AKA "Margin of error" AKA "Confidence interval"

With this many aliases, crimes must be afoot, or, at least, quests for immortality in a footnote.

(This part of the page deliberately blank for deep, cleansing breaths.)

Anyhoo, fact is all these terminologies do collapse into one of the three words in "The Formula"—confidence, variation, or precision. Repeat with me: Confidence. Variation. Precision.

Or, even more logically:

$$\text{Sample Size} = \frac{\text{How Confident Must I Be In My Results} \times \text{How Much Variation Is In The Population}}{\text{How Precise Do I Want To Be}}$$

Or, to totally eliminate the lingering stench of math:

$$\text{The Amount Of Work I Gotta Do} = \frac{\text{How Often I Wanna Be Right} \times \text{How Screwy The World Is}}{\text{How Close I Wanna Be To The Bullseye}}$$
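For readers who want proof that real math is hiding behind those words, here is one common textbook version of The Formula. It is an assumption brought in for illustration, not something the article itself writes out. It applies when estimating an average from a simple random sample of a large population. The symbols are also assumptions: $z$ is the z-score tied to the chosen confidence level, $\sigma$ is the population standard deviation (the variation), and $E$ is the margin of error you are willing to accept (the precision).

$$
n = \frac{z^{2}\,\sigma^{2}}{E^{2}}
$$

The squares are new, but the shape is the same: confidence and variation sit on top, precision sits on the bottom. Because of the squares, cutting $E$ in half multiplies the sample size by four.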

So, let's play with The Formula a bit. For example, what happens to sample size if confidence goes up, i.e., if you want to be right more often?

$$\text{Sample Size} \updownarrow ? = \frac{\text{Confidence} \uparrow \times \text{Variation}}{\text{Precision}}$$

Right you are! Sample size goes up if you want to be more confident in your results. That's reasonable with or without math. You gotta do more work to be right more often.

Okus dokus. What happens to sample size if variation in the population increases, that is, if the world gets more screwy?

$$\text{Sample Size} \updownarrow ? = \frac{\text{Confidence} \times \text{Variation} \uparrow}{\text{Precision}}$$

Right again—you're good! The sample size must again go up. This also makes sense. If the stuff you are sampling turns out to be more messed up than you thought it would be, you have to sample more to figure out how screwy it really is.

Then, what happens to sample size if you want to be more precise, i.e., the numeric value of precision goes down? Does sample size go up or down if you want to get closer to the bull's eye?

$$\text{Sample Size} \updownarrow ? = \frac{\text{Confidence} \times \text{Variation}}{\text{Precision} \downarrow}$$

No math needed here either. Sample size has to rise. If you wanna get closer to the bull's eye, you gotta fire more arrows (sample more). Take it from one who rarely hit the blasted bull's eye and often missed the whole target.

All this makes perfect logical sense and is easily understood without those pesky Greek and Latin letters and {formulas [formulas (formulas) formulas] formulas}.

More confidence—more work. More variation—more work. More precise—more work. Period.
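If you would rather let a computer chant the mantra, here is a minimal Python sketch of the three "more work" rules. It assumes the common textbook formula n = (z x sigma / E)^2 for estimating an average from a large population, which the article itself never states. The function name `sample_size` and all the numbers fed to it are made up purely for illustration.

```python
import math
from statistics import NormalDist


def sample_size(confidence, std_dev, margin):
    """Hypothetical helper: textbook n = (z * sigma / E)^2, rounded up.

    Assumes simple random sampling from a large population
    (no finite population correction).
    """
    # Two-sided z-score for the requested confidence level (e.g. 0.90).
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * std_dev / margin) ** 2)


# Illustrative baseline: 90% confidence, std. dev. of 50, margin of +/- 10.
baseline = sample_size(confidence=0.90, std_dev=50.0, margin=10.0)

# Change one ingredient at a time and watch the workload.
more_confident = sample_size(confidence=0.95, std_dev=50.0, margin=10.0)
more_variation = sample_size(confidence=0.90, std_dev=75.0, margin=10.0)
more_precise = sample_size(confidence=0.90, std_dev=50.0, margin=5.0)

print("baseline:       ", baseline)
print("more confidence:", more_confident)
print("more variation: ", more_variation)
print("more precision: ", more_precise)
```

Each of the last three numbers comes out larger than the baseline, which is The Formula's whole point. A real audit sample would need the formula that matches its own design (attribute versus variable sampling, finite populations, and so on), so treat this as a toy, not a tool.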

It does not matter whether you think about The Formula logically, linguistically, or mathematically, as long as you think about it, speak it, share it, dream it, mantra it ... without ceasing until next we speak ...

Confidence - Variation - Precision

Confidence - Variation - Precision

Confidence - Variation - Precision

Confidence - Variation - Precision

Confidence - Variation - Precision

Confidence - Variation - Precision

Confidence - Variation - Precision

Confidence - Variation - Precision

Confidence - Variation - Precision ...

----------

Bruce Truitt has 25+ years' experience in applied statistics and government auditing, with particular focus on quantitative methods and reporting in health and human services fraud, waste, and abuse. His tools and methods are used by public and private sector entities in all 50 states and 33 foreign countries and have been recognized by the National State Auditors Association for Excellence in Accountability.

He also teaches the US Government Auditor's Training Institute's "Practical Statistical Sampling for Auditors" course, is on the National Medicaid Integrity Institute's faculty, and taught Quantitative Methods in Saint Edward's University's Graduate School of Business.

Bruce holds a Master of Public Affairs from the LBJ School of Public Affairs, as well as master's degrees in Foreign Language Education and Russian and East European Studies from The University of Texas at Austin.
