Common Accountics Science and Econometric Science Statistical Mistakes

Bob Jensen at Trinity University

Accountics is the mathematical science of values.
Charles Sprague [1887] as quoted by McMillan [1998, p. 1]
http://www.trinity.edu/rjensen/395wpTAR/Web/TAR395wp.htm#_msocom_1

Tom Lehrer on Mathematical Models and Statistics ---
http://www.youtube.com/watch?v=gfZWyUXn3So
You must watch this to the ending to appreciate it.

David Johnstone asked me to write a paper on the following:
"A Scrapbook on What's Wrong with the Past, Present and Future of Accountics Science"
Bob Jensen
February 19, 2014
SSRN Download:  http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2398296 

Abstract

For operational convenience I define accountics science as research that features equations and/or statistical inference. Historically, there was a heated debate in the 1920s as to whether the main research journal of academic accounting, The Accounting Review (TAR) that commenced in 1926, should be an accountics journal with articles that mostly featured equations. Practitioners and teachers of college accounting won that debate.

TAR articles and accountancy doctoral dissertations prior to the 1970s seldom had equations.  For reasons summarized below, doctoral programs and TAR evolved to the point where, by the 1990s, having equations became virtually a necessary condition for a doctoral dissertation or acceptance of a TAR article. Qualitative, normative, and case-method methodologies disappeared from doctoral programs.

What’s really meant by “featured equations” in doctoral programs is merely symbolic of the fact that North American accounting doctoral programs pushed out most of the accounting to make way for econometrics and statistics that are now keys to the kingdom for promotion and tenure in accounting schools ---
http://www.trinity.edu/rjensen/Theory01.htm#DoctoralPrograms

The purpose of this paper is to make a case that the accountics science monopoly of our doctoral programs and published research is seriously flawed, especially its lack of concern about replication and its focus on simplified artificial worlds that differ too much from reality to yield findings of much relevance to teachers and practitioners of accounting. Accountics scientists themselves became a Cargo Cult.

How to Mislead With Charts
"How to Lie with Charts," Harvard Business Review, December 2014 ---
https://hbr.org/2014/12/vision-statement-how-to-lie-with-charts
The above link is only a teaser. You have to pay to see the rest of the article.

"BP Misleads You With Charts," by Andrew Price, Good Blog, May 27, 2010 --- Click Here
http://www.good.is/post/bp-misleads-you-with-charts/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+good%2Flbvp+%28GOOD+Main+RSS+Feed%29

"Correlation or Causation? Need to prove something you already believe? Statistics are easy: All you need are two graphs and a leading question," by Vali Chandrasekaran, Business Week, December 1, 2011 ---
http://www.businessweek.com/magazine/correlation-or-causation-12012011-gfx.html

How to Mislead With Statistics
"Reminder: The FBI’s ‘Police Homicide’ Count Is Wrong," by Reuben Fischer-Baum, Nate Silver's 5:38 Blog, November 12, 2014 ---
http://fivethirtyeight.com/datalab/reminder-the-fbis-police-homicide-count-is-wrong/ 

How to Mislead With Statistics
"Some Stats Are Just Nonsense
," by Cullen Roche, Pragmatic Capitalism via Business Insider, November 15, 2014 ---
http://www.businessinsider.com/historical-statistical-and-nonsensical-2014-11

How to Mislead With Statistics
Common Accountics Science and Econometric Science Statistical Mistakes ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm

Kurtosis --- https://en.wikipedia.org/wiki/Kurtosis
W. S. Gosset (Student) provided this useful aid to help us remember the difference between platykurtic and leptokurtic distributions ---
http://davegiles.blogspot.com/2015/07/student-on-kurtosis.html
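To make the platykurtic/leptokurtic distinction concrete, here is a minimal Python sketch (my own illustration, not from Giles' post; the distributions and sample size are arbitrary choices) that computes sample excess kurtosis for a flat-topped, a normal, and a heavy-tailed distribution.

import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n = 100_000

samples = {
    "uniform (platykurtic)": rng.uniform(-1, 1, n),   # theoretical excess kurtosis -1.2
    "normal (mesokurtic)":   rng.normal(0, 1, n),     # theoretical excess kurtosis  0.0
    "Laplace (leptokurtic)": rng.laplace(0, 1, n),    # theoretical excess kurtosis +3.0
}
for name, x in samples.items():
    # fisher=True reports excess kurtosis, i.e., kurtosis minus 3
    print(f"{name:22s}  sample excess kurtosis = {kurtosis(x, fisher=True):6.2f}")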


How to Mislead With Statistics and Visualization
"I'm Business Insider's math reporter, and these 10 everyday things drive me insane, by Andy Kiersz, Business Insider, August 2, 2015 ---
http://www.businessinsider.com/things-annoying-for-a-quant-reporter-2015-4 

Bob Jensen's threads on common statistical analysis and reporting mistakes ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm

Bob Jensen's threads on multivariate data visualization ---
http://www.trinity.edu/rjensen/352wpvisual/000datavisualization.htm 


Size (Type 1, or alpha, error) and Power of a Statistical Test --- https://en.wikipedia.org/wiki/Statistical_power

From Econometrics Beat by David Giles on

Questions About the Size and Power of a Test ---
http://davegiles.blogspot.com/2015/07/questions-about-size-and-power-of-test.html

Type 2 Error --- https://en.wikipedia.org/wiki/Type_I_and_type_II_errors#Type_II_error
Jensen Comment
In most cases testing for Type 2 error would be more informative than testing only for Type 1 error, but Type 2 error tests are generally not robust to imprecise knowledge of the underlying error distribution of the process. Type 2 error is sometimes tested in quality control in manufacturing, where the underlying distribution of a process that meets specifications is well known (with operating characteristic curves).
See Operating Characteristics --- https://en.wikipedia.org/wiki/False_positives_and_false_negatives

It's relatively rare in academic studies to see tests of Type 2 error. I can't recall a single accountancy study of real-world data that tests for Type 2 error.
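For readers who want the mechanics, here is a minimal Python sketch (my own illustration; the one-sided z-test, known sigma, sample size, and candidate true means are all assumptions) that tabulates the Type 2 error (beta) and the power against several assumed alternatives. These are essentially the numbers behind an operating characteristic curve.

import numpy as np
from scipy.stats import norm

# Illustrative one-sided z-test of H0: mu = 0 with known sigma (assumed setup).
alpha, sigma, n = 0.05, 1.0, 25
crit = norm.ppf(1 - alpha)                 # reject H0 when z > crit, fixing Type 1 error at alpha

for true_mu in [0.0, 0.1, 0.2, 0.3, 0.5]:
    shift = true_mu * np.sqrt(n) / sigma   # mean of the z statistic under this alternative
    power = 1 - norm.cdf(crit - shift)     # P(reject H0 | true mean = true_mu)
    beta = 1 - power                       # Type 2 error probability
    print(f"true mu = {true_mu:.1f}:  beta = {beta:.3f}   power = {power:.3f}")

Notice that beta can only be computed against an assumed alternative, which is why Type 2 error is so seldom reported outside well-specified settings such as quality control.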


DATABASE BIASES AND ERRORS
My casual studies of accountics science articles suggest that over 90% of those studies rely exclusively on one or more public databases whenever the studies use data. I find little accountics science research into the biases and errors of those databases. Here's a short listing of research into these biases and errors, some of it published by accountics scientists ---
 

DATABASE BIASES AND ERRORS ---
http://www.kellogg.northwestern.edu/rc/crsp-cstat-references.htm

This page provides references for articles that study specific aspects of CRSP, Compustat and other popular sources of data used by researchers at Kellogg. If you know of any additional references, please e-mail researchcomputing-help@kellogg.northwestern.edu.

What went wrong with accountics science?
http://www.trinity.edu/rjensen/Theory01.htm#WhatWentWrong


Nonparametric (Distribution-Free) Statistics --- http://en.wikipedia.org/wiki/Nonparametric_statistics

Nonparametric Regression --- http://en.wikipedia.org/wiki/Nonparametric_regression

Nonparametric regression is a form of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data. Nonparametric regression requires larger sample sizes than regression based on parametric models because the data must supply the model structure as well as the model estimates.
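Here is a minimal Python sketch of one classic nonparametric regression estimator, the Nadaraya-Watson kernel smoother (my own illustration; the data, bandwidth, and function names are assumptions, and the code is not taken from the book or packages discussed below). The fitted curve is built from the data rather than from a pre-specified functional form.

import numpy as np

def nw_kernel_regression(x_train, y_train, x_eval, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimator of E[y | x]."""
    # weights[i, j] = K((x_eval[i] - x_train[j]) / h)
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    return (weights @ y_train) / weights.sum(axis=1)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.3, size=x.size)   # true curve is sin(x), unknown to the estimator

grid = np.linspace(0, 10, 11)
fit = nw_kernel_regression(x, y, grid, bandwidth=0.5)
for g, f in zip(grid, fit):
    print(f"x = {g:4.1f}   fitted = {f:6.3f}   true = {np.sin(g):6.3f}")

The bandwidth plays the role that the functional form plays in parametric regression, which is one reason nonparametric methods need larger samples.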

From the Econometrics Beat blog of David Giles on February 19, 2015
http://davegiles.blogspot.com/2015/02/applied-nonparametric-econometrics.html

Applied Nonparametric Econometrics

 
Recently, I received a copy of a new econometrics book, Applied Nonparametric Econometrics, by Daniel Henderson and Christopher Parmeter.

 
The title is pretty self-explanatory and, as you'd expect with any book published by CUP, this is a high-quality item.

 
The book's Introduction begins as follows:
"The goal of this book is to help bridge the gap between applied economists and theoretical econometricians/statisticians. The majority of empirical research in economics ignores the potential benefits of nonparametric methods and many theoretical nonparametric advances ignore the problems faced by practitioners. We do not believe that applied economists dismiss these methods because they do not like them.  We believe that they do not employ them because they do not understand how to use them or lack formal training on kernel smoothing."
The authors provide a very readable, but careful, treatment of the main topics in nonparametric econometrics, and a feature of this book is the set of empirical examples. The book's website provides the data that are used (for replication purposes), as well as a number of routines in R. The latter provide useful additions to those that are available in the np package for R (Hayfield and Racine, 2008).

Jensen Comment
In general, nonparametric analyses of data make fewer assumptions about model structures and error distributions. The hope is that the resulting models will be more robust ---
http://en.wikipedia.org/wiki/Robust_statistics

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normal. Robust statistical methods have been developed for many common problems, such as estimating location, scale and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from parametric distributions. For example, robust methods work well for mixtures of two normal distributions with different standard-deviations, for example, one and three; under this model, non-robust methods like a t-test work badly.

Added Jensen Comment
To illustrate one of the possible sacrifices of nonparametric versus parametric analysis, compare the information content of an ordinal scale with that of ratio scales (which have a common zero point) and interval scales (which do not). For example, 100 students in a class are commonly given ordinal grades of A, B, C, D, or F. We can convert them to numerics such as 4, 3, 2, 1, or 0 that are still only ordinal. There are no outliers that identify the best student in the class as long as multiple students receive A grades in a given course.

Ordinal scales are not usually conducive to parametric analyses with assumed distributions like the normal distribution. But there are nonparametric tests for ordinal scales.

Temperature is recorded in one of two common interval scales --- Celsius or Fahrenheit. These scales can identify outliers but they are not ratio scales because they do not have a common zero point. Parametric analyses can be conducted but care must be taken since the choice of a scale can affect the outcomes.

Weight and distance and many other things can be measured on ratio scales such as the Metric System. Ratio scales have a common zero point no matter what the units of measurement. This greatly increases the types of parametric alternatives for statistical analysis.

My point is that nonparametric analysis is more useful when the measurements are crude, such as with binary or ordinal scales. Parametric analyses add power when the measurements are refined enough to identify degrees of separation between points and degrees of exceptionalism (outliers).

Both forms of analysis can be misleading when applied to data derived from non-stationary processes. For example, a grade point average derived from ordinal grades in 40 courses spread across four years of study can be misleading for students who are trending sharply upward or downward. A 2.83 grade point average for a male student who was very immature when entering college can be very misleading when rotten grades in the first two years are offset by outstanding grades in the last two years. Comparing this outcome with a 2.83 student who started out with all top grades and then became very troubled, with terrible grades in the last two years, suggests that the two students are similar when in fact they are very dissimilar in the nonstationarities of their academic records.

A very common way that analysts mislead is to apply parametric analyses to data measured on binary or ordinal scales. For example, a least-squares regression analysis might be applied to models with dummy (binary) variables. Or least-squares regression might be conducted on Likert-scale responses where respondents are given only five choices (1 for lowest through 5 for highest) that are really only ordinal but are treated as if they were interval or ratio scaled ---
http://en.wikipedia.org/wiki/Likert_scale
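Here is a minimal Python sketch of the contrast (my own illustration with made-up Likert response probabilities, not drawn from any study cited here): the parametric t-test treats the 1-5 codes as if they were interval data, while the rank-based Mann-Whitney test uses only their ordering.

import numpy as np
from scipy.stats import ttest_ind, mannwhitneyu

rng = np.random.default_rng(42)
levels = np.array([1, 2, 3, 4, 5])

# Two hypothetical groups of survey respondents with different response profiles
group_a = rng.choice(levels, size=150, p=[0.05, 0.15, 0.30, 0.30, 0.20])
group_b = rng.choice(levels, size=150, p=[0.15, 0.30, 0.30, 0.15, 0.10])

t_stat, t_p = ttest_ind(group_a, group_b, equal_var=False)              # parametric, assumes interval scale
u_stat, u_p = mannwhitneyu(group_a, group_b, alternative="two-sided")   # rank-based, ordinal-friendly

print(f"Welch t-test:    t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"Mann-Whitney U:  U = {u_stat:.0f}, p = {u_p:.4f}")

When the two tests disagree, the disagreement is usually a warning that the interval-scale assumption behind the parametric test is doing real work.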

Bob Jensen's threads:
Common Accountics Science and Econometric Science Statistical Mistakes ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm
---
 


"Statistical Controls Are Great - Except When They're Not!" by David Giles, Econometrics Beat, December 1, 2014 ---
http://davegiles.blogspot.com/2014/12/statistical-controls-are-great-except_1.html

A blog post today, titled "How Race Discrimination in Law Enforcement Actually Works", caught my eye. It seemed like an important topic. The post, by Ezra Klein, appeared on Vox.
 
I'm not going to discuss it in any detail, but I think that some readers of this blog will enjoy reading it. Here are a few selected passages, to whet your collective appetite:
 
"You see it all the time in studies. "We controlled for..." And then the list starts. The longer the better." (Oh boy, can I associate with that. Think of all of those seminars you've sat through.......)
"The problem with controls is that it's often hard to tell the difference between a variable that's obscuring the thing you're studying and a variable that is the thing you're studying."
"The papers brag about their controls. They dismiss past research because it had too few controls." (How many seminars was that?)
"Statistical Controls Are Great - Except When They're Not!"

 


Eight Econometrics Multiple-Choice Quiz Sets from David Giles
You might have to go to his site to get the quizzes to work.
Note that there are multiple questions for each quiz set.
Click on the arrow button to go to a subsequent question.

Would You Like Some Hot Potatoes?
http://davegiles.blogspot.com/2014/10/would-you-like-some-hot-potatoes.html

 
O.K., I know - that was a really cheap way of getting your attention.

 
However, it worked, and this post really is about Hot Potatoes - not the edible variety, but some teaching apps. from "Half-Baked Software" here at the University of Victoria.

 
To quote: 
"The Hot Potatoes suite includes six applications, enabling you to create interactive multiple-choice, short-answer, jumbled-sentence, crossword, matching/ordering and gap-fill exercises for the World Wide Web. Hot Potatoes is freeware, and you may use it for any purpose or project you like."
I've included some Hot Potatoes multiple choice exercises on the web pages for several of my courses for some years now. Recently, some of the students in my introductory graduate econometrics course mentioned that these exercises were quite helpful. So, I thought I'd share the Hot Potatoes apps. for that course with readers of this blog.

 
There are eight multiple-choice exercise sets in total, and you can run  them from here:

 
Quiz 1 ; Quiz 2 ; Quiz 3 ; Quiz 4 ; Quiz 5 ; Quiz 6 ; Quiz 7 ; Quiz 8 .

 
I've also put the HTML and associated PDF files on the code page for this blog. If you're going to download them and use them on your own computer or website, just make sure that the PDF files are located in the same folder (directory) as the HTML files.
 
I plan to extend and update these Hot Potatoes exercises in the near future, but hopefully some readers will find them useful in the meantime.
 
From my "Recently Read" list:

"Statistical Inference: The Big Picture," by Robert E. Kass, Statistical Science 2011, Vol. 26, No. 1, 1–9 DOI: 10.1214/10-STS337 © Institute of Mathematical Statistics ---
http://www.stat.cmu.edu/~kass/papers/bigpic.pdf

Abstract.
Statistics has moved beyond the frequentist-Bayesian controversies of the past. Where does this leave our ability to interpret results? I suggest that a philosophy compatible with statistical practice, labeled here statistical pragmatism, serves as a foundation for inference. Statistical pragmatism is inclusive and emphasizes the assumptions that connect statistical models with observed data. I argue that introductory courses often mischaracterize the process of statistical inference and I propose an alternative “big picture” depiction.

Common Accountics Science and Econometric Science Statistical Mistakes ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm

Statistical Science Reading List for June 2014 Compiled by David Giles in Canada ---
http://davegiles.blogspot.com/2014/05/june-reading-list.html

Put away that novel! Here's some really fun June reading:

The Cult of Statistical Significance: How Standard Error Costs Us Jobs, Justice, and Lives ---
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm

Common Accountics Science and Econometric Science Statistical Mistakes ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm


November 7, 2014 posting by David Giles in his Econometrics Beat blog.

The Econometrics of Temporal Aggregation V - Testing for Normality

 
This post is one of a sequence of posts, the earlier members of which can be found here, here, here, and here. These posts are based on Giles (2014).

Some of the standard tests that we perform in econometrics can be affected by the level of aggregation of the data. Here, I'm concerned only with time-series data, and with temporal aggregation. I'm going to show you some preliminary results from work that I have in progress with
Ryan Godwin. Although these results relate to just one test, our work covers a range of testing problems.

I'm not supplying the EViews program code that was used to obtain the results below - at least, not for now. That's because what I'm reporting is based on work in progress. Sorry!

 
As in the earlier posts, let's suppose that the aggregation is over "m" high-frequency periods. A lower case symbol will represent a high-frequency observation on a variable of interest; and an upper-case symbol will denote the aggregated series.

So,
               Y_t = y_t + y_{t-1} + ...... + y_{t-m+1} .

If we're aggregating monthly (flow) data to quarterly data, then m = 3. In the case of aggregation from quarterly to annual data, m = 4, etc.

Now, let's investigate how such aggregation affects the performance of the well-known Jarque-Bera (1987) (J-B) test for the normality of the errors in a regression model. I've discussed some of the limitations of this test in an
earlier post, and you might find it helpful to look at that post (and this one) at this point. However, the J-B test is very widely used by econometricians, and it warrants some further consideration.

Consider the following small Monte Carlo experiment.

Continued at
http://davegiles.blogspot.com/2014/11/the-econometrics-of-temporal.html#more

Jensen Comment
Perhaps an even bigger problem in aggregation is the assumption of stationarity.
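Here is a hedged Monte Carlo sketch in Python of the general aggregation issue discussed above (my own illustration applied to a raw skewed series rather than to regression errors; it is not the EViews code that Giles and Godwin used). The Jarque-Bera test easily detects non-normality in the high-frequency data but far less often in the temporally aggregated data, because aggregation averages the non-normality away.

import numpy as np
from scipy.stats import jarque_bera

rng = np.random.default_rng(7)
n_months, m, reps = 600, 12, 2000       # assumed: monthly flow data aggregated to annual totals
reject_high = reject_low = 0

for _ in range(reps):
    y = rng.exponential(scale=1.0, size=n_months)   # skewed high-frequency series
    Y = y.reshape(-1, m).sum(axis=1)                # temporally aggregated series (m = 12)
    _, p_high = jarque_bera(y)
    _, p_low = jarque_bera(Y)
    reject_high += p_high < 0.05
    reject_low += p_low < 0.05

print(f"J-B rejects normality, high-frequency data: {reject_high / reps:.1%}")
print(f"J-B rejects normality, aggregated data:     {reject_low / reps:.1%}")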

"The Error Term in the History of Time Series Econometrics," , by David Giles, Econometrics Beat, December 12, 2014 ---
http://davegiles.blogspot.com/2014/12/the-error-term-in-history-of-time.html

 
While we're on the subject of the history of econometrics ......... blog-reader Mark Leeds kindly drew my attention to this interesting paper published by Duo Qin and Christopher Gilbert in Econometric Theory in 2001.

 
I don't recall reading this paper before - my loss.

 
Mark supplied me with a pre-publication version of the paper, which you can download here if you don't have access to Econometric Theory.

 
Here's the abstract:
"We argue that many methodological confusions in time-series econometrics may be seen as arising out of ambivalence or confusion about the error terms. Relationships between macroeconomic time series are inexact and, inevitably, the early econometricians found that any estimated relationship would only fit with errors. Slutsky interpreted these errors as shocks that constitute the motive force behind business cycles. Frisch tried to dissect further the errors into two parts: stimuli, which are analogous to shocks, and nuisance aberrations. However, he failed to provide a statistical framework to make this distinction operational. Haavelmo, and subsequent researchers at the Cowles Commission, saw errors in equations as providing the statistical foundations for econometric models, and required that they conform to a priori distributional assumptions specified in structural models of the general equilibrium type, later known as simultaneous-equations models (SEM). Since theoretical models were at that time mostly static, the structural modelling strategy relegated the dynamics in time-series data frequently to nuisance, atheoretical complications. Revival of the shock interpretation in theoretical models came about through the rational expectations movement and development of the VAR (Vector AutoRegression) modelling approach. The so-called LSE (London School of Economics) dynamic specification approach decomposes the dynamics of modelled variable into three parts: short-run shocks, disequilibrium shocks and innovative residuals, with only the first two of these sustaining an economic interpretation."

Jensen Comment
Note that this problem can arise in what we often do not think of as "time series" econometrics.

From Two Former Presidents of the AAA
"Some Methodological Deficiencies in Empirical Research Articles in Accounting." by Thomas R. Dyckman and Stephen A. Zeff , Accounting Horizons: September 2014, Vol. 28, No. 3, pp. 695-712 ---
http://aaajournals.org/doi/full/10.2308/acch-50818   (not free)

This paper uses a sample of the regression and behavioral papers published in The Accounting Review and the Journal of Accounting Research from September 2012 through May 2013. We argue first that the current research results reported in empirical regression papers fail adequately to justify the time period adopted for the study. Second, we maintain that the statistical analyses used in these papers as well as in the behavioral papers have produced flawed results. We further maintain that their tests of statistical significance are not appropriate and, more importantly, that these studies do not - and cannot - properly address the economic significance of the work. In other words, significance tests are not tests of the economic meaningfulness of the results. We suggest ways to avoid some but not all of these problems. We also argue that replication studies, which have been essentially abandoned by accounting researchers, can contribute to our search for truth, but few will be forthcoming unless the academic reward system is modified.

The free SSRN version of this paper is at
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2324266

This Dyckman and Zeff paper is indirectly related to the following technical econometrics research:
"The Econometrics of Temporal Aggregation - IV - Cointegration," by David Giles, Econometrics Blog, September 13, 2014 ---
http://davegiles.blogspot.com/2014/09/the-econometrics-of-temporal.html 

Common Accountics Science and Econometric Science Statistical Mistakes ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm

David Johnstone asked me to write a paper on the following:
"A Scrapbook on What's Wrong with the Past, Present and Future of Accountics Science"
Bob Jensen
February 19, 2014
SSRN Download:  http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2398296 

The Cult of Statistical Significance: How Standard Error Costs Us Jobs, Justice, and Lives ---
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm

Common Accountics Science and Econometric Science Statistical Mistakes ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm


Question about stationarity from a reader of the Econometrics Beat blog by David Giles on April 7, 2015 ---
http://davegiles.blogspot.com/2015/04/question-from-reader.html

"I’ve a simple but not explicitly answered question within the text books on stationary series. I’m estimating a model with separate single equations (I don’t take into account the interactions among them ). I’ve only non-stationary series in some equations (type 1), only stationary in some (type 2), and a combination of the both in the others (type 3). For the first two cases I apply the usual procedures and for the last case the Pesaran (2011) test. I want to find the short term effects of some variables on the others. I’ve two questions: 
1) If the Pesaran test turns out inconclusive or rejects cointegration, what’s the next step ? Differencing  all the series and applying an OLS? Or differencing only the non-stationary ones? Or another method?
2) As I mentioned I’m looking for the short-run effects. In the type 2 equations, I guess running an OLS in levels gives the long-run effects. Therefore I run an OLS in differences. Some claim that differencing an already stationary series causes problems. I’m confused. What do you think?"
Let's start out by making sure what Ozan means by "the usual procedures" for his "Type 1" and "Type 2" equations.

I'm presuming he means:

Type 1: All of the series are I(1). Then:

(i) If the variables are not cointegrated, estimate a model using the first-differences of the data (or, perhaps, the log-differences of the data), using OLS or IV.

(ii) If the variables are cointegrated:

(a) Estimate an error-correction model to determine the short-run effects.

(b) Estimate a model in the levels (or, perhaps, log-levels) of the variables to determine the long-run cointegrating relationship between them

Type 2: All of the series are I(0). Then you can:

(i) Model the variables in the levels of the data (or, perhaps, the log-levels) of the data, using OLS or IV estimation.

(ii) Estimate the model using the first-differences (or, perhaps, the log-differences) of the variables. The transformed variables won't be I(0), but they will still be stationary. There is nothing wrong with this. However, one possible down-side is that you may have "over-differenced" the data, and this may show up in the form of an error term that follows an MA process, rather than being serially independent. On this point, see the discussion below.

Now, what about the "Type 3" equations?

In this case, Ozan uses the ARDL/Bounds testing methodology, which I've discussed in some detail here, and in earlier posts. Now, in response to his two questions:

(1) In this case, you could apply either of the two approaches that you mention. However, I'd lean towards the option of differencing all of the variables. The reason for saying this is that if the tests that you've used to test for stationarity / non-stationarity have led you to a wrong conclusion, differencing everything is a conservative, but relatively safe, way to proceed. You don't want to unwittingly fail to difference a variable that is I(1). The "costs" of doing so are substantial. On the other hand, unnecessarily differencing a variable that is actually I(0) incurs a relatively low "cost". (See the comments for Type 2 (ii), above.)

(2) See the discussion for Type 2 (ii) above. However, to get at the short-run effects (and avoid the over-differencing issue), I'd be more inclined to explore some simple dynamic models of the old-fashioned ARDL type - not the Pesaran type. (See here.) That is, consider models of the form:

           y_t = α + β_0 x_t + β_1 x_{t-1} + β_2 x_{t-2} + ..... + β_k x_{t-k} + γ_1 y_{t-1} + γ_2 y_{t-2} + ..... + γ_p y_{t-p} + ε_t .

I'd start with a fairly general specification (with large values for k and p), and then simplify the model using AIC or SIC, to get a parsimonious dynamic model.

Then, for instance, if I were to end up with a model of the form:

         y_t = α + β_0 x_t + γ_1 y_{t-1} + u_t ,

the short-run marginal effect of x on y would be β_0; while the long-run effect would be given by [β_0 / (1 - γ_1)], etc.
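Here is a minimal Python sketch of that last step (my own simulated illustration, not Giles' code): estimate an ARDL(1, 0) equation by OLS and back out the short-run effect β_0 and the long-run effect β_0 / (1 - γ_1).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
T, a, b0, g1 = 500, 1.0, 0.5, 0.6        # true parameters, assumed for the simulation

x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = a + b0 * x[t] + g1 * y[t - 1] + rng.normal(scale=0.5)

# Regressors: constant, current x, lagged y
X = sm.add_constant(np.column_stack([x[1:], y[:-1]]))
res = sm.OLS(y[1:], X).fit()

b0_hat, g1_hat = res.params[1], res.params[2]
print(f"short-run effect (b0):         {b0_hat:.3f}")
print(f"long-run effect b0/(1 - g1):   {b0_hat / (1 - g1_hat):.3f}   (true value: {b0 / (1 - g1):.3f})")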

Here is an article by two former presidents of the American Accounting Association that accuses accountics scientists of being naive regarding the use of non-stationary data in (usually) multivariate linear-model empirical studies.

"Some Methodological Deficiencies in Empirical Research Articles in Accounting." by Thomas R. Dyckman and Stephen A. Zeff , Accounting Horizons: September 2014, Vol. 28, No. 3, pp. 695-712 ---
http://aaajournals.org/doi/full/10.2308/acch-50818   (not free)

From my "Recently Read" list:

 


Recall that Bill Sharpe of CAPM fame and controversy is a Nobel Laureate ---
http://en.wikipedia.org/wiki/William_Forsyth_Sharpe

"Don’t Over-Rely on Historical Data to Forecast Future Returns," by Charles Rotblut and William Sharpe, AAII Journal, October 2014 ---
http://www.aaii.com/journal/article/dont-over-rely-on-historical-data-to-forecast-future-returns?adv=yes

Jensen Comment
The same applies to not over-relying on historical data in valuation. My favorite case study for teaching this point is the following:
"Questrom vs. Federated Department Stores, Inc.:  A Question of Equity Value," by University of Alabama faculty members Gary Taylor, William Sampson, and Benton Gup, May 2001 edition of Issues in Accounting Education ---
http://www.trinity.edu/rjensen/roi.htm

Jensen Comment
I want to especially thank David Stout, Editor of the May 2001 edition of Issues in Accounting Education.  There has been something special in all the editions edited by David, but the May edition is very special to me.  All the articles in that edition are helpful, but I want to call attention to three articles that I will use intently in my graduate Accounting Theory course.

Bob Jensen's threads on accounting theory ---
http://www.trinity.edu/rjensen/Theory01.htm


"Proof of a Result About the "Adjusted" Coefficient of Determination," by David Giles, Econometrics Blog, April 16, 2014 ---
http://davegiles.blogspot.com/2014/04/proof-of-result-about-adjusted.html

. . .

Let's take a look at the proof.

The model we're going to look at is the standard, k-regressor, linear multiple regression model:

 
                                   y = Xβ + ε    .                                                                                     (1)

 
We have n observations in our sample.

 
The result that follows is purely algebraic, and not statistical, so in actual fact I don't have to assume anything in particular about the errors in the model, and the regressors can be random. So that the definition of the coefficient of determination is unique, I will assume that the model includes an intercept term.

 
The adjusted coefficient of determination when model (1) is estimated by OLS is

 
                                 R_A^2 = 1 - [e'e / (n - k)] / [(y*'y*) / (n - 1)] ,                                         (2)

 
where e is the OLS residual vector, and y* is the y vector, but with each element expressed as a deviation from the sample mean of the y data.

 
Now consider J independent exact linear restrictions on the elements of β, namely Rβ = r, where R is a known non-random (J x k) matrix of rank J; and r is a known non-random (J x 1) vector. The F-statistic that we would use to test the validity of these restrictions can be written as:

 
                               F = [(e_R'e_R - e'e) / J] / [e'e / (n - k)] ,                                                   (3)

 
where e_R is the residual vector when the restrictions on β are imposed, and the model is estimated by RLS.

 
In the latter case, the adjusted coefficient of determination is

 
                              R_AR^2 = 1 - [e_R'e_R / (n - k + J)] / [(y*'y*) / (n - 1)] .                                 (4)

 
From equation (3),  F ≥ 1 if and only if 

 
                           (n - k) e_R'e_R ≥ (n - k + J) e'e .                                                                   (5)

 
From (2) and (4), R_A^2 ≥ R_AR^2 if and only if 

 
                         (n - k) e_R'e_R ≥ (n - k + J) e'e.

 
But this is just the condition in (5).

 
So, we have the following result:

 
Imposing a set of exact linear restrictions on the coefficients of a linear regression model will decrease (increase) the adjusted coefficient of determination if the F-statistic for testing the validity of those restrictions is greater (less) than one in value. If this statistic is exactly equal to one, the adjusted coefficient of determination will be unchanged.

 
Notice that the result quoted at the beginning of this post is a special case of this result, where the restrictions are all "zero" restrictions. Recalling that the square of a t statistic with v degrees of freedom is just an F statistic with 1 and v degrees of freedom, the other principal result given in the earlier post is also obviously a special case of this, with just one zero restriction:


 
Adding a regressor will increase (decrease) R_A^2 depending on whether the absolute value of the t-statistic associated with that regressor is greater (less) than one in value. R_A^2 is unchanged if that absolute t-statistic is exactly equal to one.

Jensen Comment
My question is how robust these results are to the order in which regressors are added to or deleted from the model. The result is not very useful if there are ordering effects. My experience years ago was that ordering effects can be a problem.
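Here is a minimal Python sketch (my own simulated check, not Giles' code or proof) of the special case quoted above: across repeated trials, adding a regressor raises the adjusted R-squared exactly when the absolute t-statistic on that regressor exceeds one.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 60

for trial in range(5):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)                   # candidate regressor to add
    y = 1.0 + 0.8 * x1 + rng.normal(size=n)   # x2 is irrelevant by construction

    small = sm.OLS(y, sm.add_constant(x1)).fit()
    large = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

    t_x2 = large.tvalues[2]
    increased = large.rsquared_adj > small.rsquared_adj
    print(f"trial {trial}: |t| on x2 = {abs(t_x2):.2f}, adjusted R2 "
          f"{'rose' if increased else 'fell'} ({small.rsquared_adj:.4f} -> {large.rsquared_adj:.4f})")
    # The algebraic result: adjusted R2 rises if and only if |t| > 1
    assert increased == (abs(t_x2) > 1.0)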

 


David Giles Econometrics Beat Blog ---
http://davegiles.blogspot.com/

Strategies to Avoid Data Collection Drudgery and Responsibilities for Errors in the Data

Obsession With R-Squared

Drawing Inferences From Very Large Data-Sets

The Insignificance of Testing the Null

Zero Testing for Beta Error

Scientific Irreproducibility

Can You Really Test for Multicollinearity?  

Models That aren't Robust

Simpson's Paradox and Cross-Validation

Reverse Regression

David Giles' Top Five Econometrics Blog Postings for 2013

David Giles Blog

A Cautionary Bedtime Story

Gasp! How could an accountics scientist question such things? This is sacrilege!

A Scrapbook on What's Wrong with the Past, Present and Future of Accountics Science

574 Shields Against Validity Challenges in Plato's Cave ---
http://www.trinity.edu/rjensen/TheoryTAR.htm

Real Science versus Pseudo Science ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm#Pseudo-Science

How Accountics Scientists Should Change: 
"Frankly, Scarlett, after I get a hit for my resume in The Accounting Review I just don't give a damn"
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
One more mission in what's left of my life will be to try to change this
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm

"How Non-Scientific Granulation Can Improve Scientific Accountics"
http://www.cs.trinity.edu/~rjensen/temp/AccounticsGranulationCurrentDraft.pdf

Gaming for Tenure as an Accounting Professor ---
http://www.trinity.edu/rjensen/TheoryTenure.htm
(with a reply about tenure publication point systems from Linda Kidwell)

 

Strategies to Avoid Data Collection Drudgery and Responsibilities for Errors in the Data

In 2013 I scanned all six issues of The Accounting Review (TAR) published in 2013 to detect which public databases were used (usually at relatively heavy fees for a system of databases) in the 72 articles published January-November 2013 in TAR. The outcomes were as follows:

Count   Percent   Database
 42      35.3%    Miscellaneous public databases used infrequently
 33      27.7%    Compustat --- http://en.wikipedia.org/wiki/Compustat
 21      17.6%    CRSP --- http://en.wikipedia.org/wiki/Center_for_Research_in_Security_Prices
 17      14.3%    Datastream --- http://en.wikipedia.org/wiki/Thomson_Financial
  6       5.0%    Audit Analytics --- http://www.auditanalytics.com/
119     100.0%    Total purchased public database uses
 10               Non-public databases (usually experiments) and mathematical analysis studies with no data

Note that there are subsets of databases within a database like Compustat, CRSP, and Datastream.

Many of these 72 articles used more than one public database, and when the Compustat and CRSP joint database was used I counted one for the Compustat Database and one for the CRSP Database. Most of the non-public databases are behavioral experiments using students as surrogates for real-world decision makers.

My opinion is that 2013 is a typical year in which over 92% of the articles published in TAR used purchased public databases.  The good news is that most of these public databases are enormous, thereby allowing for huge samples. The bad news is that for very large samples even minuscule differences are statistically significant, making conventional significance testing virtually superfluous.

My theory is that accountics science gained dominance in accounting research, especially in North American accounting Ph.D. programs, because it abdicated responsibility:

1.     Most accountics scientists buy data, thereby avoiding the greater cost and drudgery of collecting data.

 

2.     By relying so heavily on purchased data, accountics scientists abdicate responsibility for errors in the data.

 

3.     Since adding missing variable data to the public database is generally not at all practical in purchased databases, accountics scientists have an excuse for not collecting missing variable data.

 

4.   Software packages for modeling and testing data abound. Accountics researchers need only feed purchased data into the hopper of statistical and mathematical analysis programs. It still takes a lot of knowledge to formulate hypotheses and to invent and understand complex models. But the really hard work of collecting data and error checking is avoided by purchasing data.

David Johnstone posted the following message on the AECM Listserv on November 19, 2013:

An interesting aspect of all this is that there is a widespread a priori or learned belief in empirical research that all and only what you have to do to get meaningful results is to get data and run statistics packages, and that the more advanced the stats the better. It's then just a matter of turning the handle. Admittedly it takes a lot of effort to get very proficient at this kind of work, but the presumption that it will naturally lead to reliable knowledge is an act of faith, like a religious tenet. What needs to be taken into account is that the human systems (markets, accounting reporting, asset pricing etc.) are madly complicated and likely changing structurally continuously. So even with the best intents and best methods, there is no guarantee of reliable or lasting findings a priori, no matter what “rigor” has gone in.

Part and parcel of the presumption that empirical research methods are automatically “it” is the even stronger position that no other type of work is research. I come across this a lot. I just had a 4th year Hons student do his thesis, he was particularly involved in the superannuation/pension fund industry, and he did a lot of good practical stuff, thinking about risks that different fund allocations present, actuarial life expectancies etc. The two young guys (late 20s) grading this thesis, both excellent thinkers and not zealots about anything, both commented to me that the thesis was weird and was not really a thesis like they would have assumed necessary (electronic data bases with regressions etc.). They were still generous in their grading, and the student did well, and it was only their obvious astonishment that there is any kind of worthy work other than the formulaic-empirical that astonished me. This represents a real narrowing of mind in academe, almost like a tendency to dark age, and cannot be good for us long term. In Australia the new push is for research “impact”, which seems to include industry relevance, so that presents a hope for a cultural widening.  

I have been doing some work with a lawyer-PhD student on valuation in law cases/principles, and this has caused similar raised eyebrows and genuine intrigue with young colleagues – they just have never heard of such stuff, and only read the journals/specific papers that do what they do. I can sense their interest, and almost envy of such freedom, as they are all worrying about how to compete and make a long term career as an academic in the new academic world.

 

This could also happen in accountics science, but we'll probably never know! ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm

"Statistical Flaw Punctuates Brain Research in Elite Journals," by Gary Stix, Scientific American, March 27, 2014 ---
http://blogs.scientificamerican.com/talking-back/2014/03/27/statistical-flaw-punctuates-brain-research-in-elite-journals/

Neuroscientists need a statistics refresher.

That is the message of a new analysis in Nature Neuroscience that shows that more than half of 314 articles on neuroscience in elite journals   during an 18-month period failed to take adequate measures to ensure that statistically significant study results were not, in fact, erroneous. Consequently, at  least some of the results from papers in journals like Nature, Science, Nature Neuroscience and Cell were likely to be false positives, even after going through the arduous peer-review gauntlet.

The problem of false positives appears to be rooted in the growing sophistication of both the tools and observations made by neuroscientists.  The increasing complexity poses a challenge to one of the fundamental assumptions made in statistical testing, that each observation, perhaps of an electrical signal from a particular neuron, has nothing to do with a subsequent observation, such as another signal from that same neuron.

In fact, though, it is common in neuroscience experiments—and in studies in other areas of  biology—to produce readings that are not independent of one another. Signals from the same neuron are often more similar than signals from different neurons, and thus the data points are said by statisticians to be clustered, or “nested.” To accommodate the similarity among signals, the authors from VU University Medical Center and other Dutch institutions suggest that a technique called multilevel analysis is needed to take the clustering of data points into account.

No adequate correction was made in any of the 53 percent of the 314 papers that contained clustered data when surveyed in 2012 and the first half of 2013. “We didn’t see any of the studies use the correct multi-level analysis,” says Sophie van der Sluis, the lead researcher. Seven percent of the studies did take steps to account for clustering, but these methods were much less sensitive than multi-level analysis in detecting actual biological effects.  The researchers note that some of the studies surveyed probably report false-positive results, although they couldn’t extract enough information to quantify precisely how many.  Failure to statistically correct for the clustering  in the data can increase the probability of false-positive findings to as high as 80 percent—a risk of no more than 5 percent is normally deemed acceptable.

Jonathan D. Victor, a professor of neuroscience at Weill Cornell Medical College had praise for the study, saying it “raises consciousness about the pitfalls specific to a nested design and then counsels you as to how to create a good nested design given limited resources.”

Emery N. Brown, a professor of computational neuroscience in the department of brain and cognitive sciences at the MIT-Harvard Division of Health Sciences and Technology, points to a dire need to bolster the level of statistical sophistication brought to bear in neuroscience studies. “There’s a fundamental flaw in the system and the fundamental flaw is basically that neuroscientists don’t know enough statistics to do the right things and there’s not enough statisticians working in neuroscience to help that.”

The issue of reproducibility of research results has preoccupied the editors of many top journals in recent years. The Nature journals have instituted a checklist to help authors on reporting on the methods used in their research, a list that inquires about whether the statistical objectives for a particular study were met. (Scientific American is part of the Nature Publishing Group.) The one clear message from studies like that of van der Sluis and others is that the statistician will take on an increasingly pivotal role as the field moves ahead in deciphering ever more dense networks of neural signaling.

Jensen Comment
Accountics science differs from neuroscience in that reproducibility of research results does not preoccupy accounting research journal editors ---
http://www.trinity.edu/rjensen/TheoryTAR.htm
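Here is a hedged Monte Carlo sketch in Python of the statistical point in the article above (my own illustration, not the Nature Neuroscience analysis): when observations are nested within clusters (such as many signals from the same neuron) and the comparison of interest varies only at the cluster level, a naive test that treats every observation as independent rejects a true null far more often than the nominal 5%.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2024)
n_clusters, obs_per_cluster, reps = 10, 50, 2000
false_positives = 0

for _ in range(reps):
    cluster_effects = rng.normal(0, 1, n_clusters)          # shared within-cluster noise
    data = cluster_effects[:, None] + rng.normal(0, 1, (n_clusters, obs_per_cluster))
    group_a = data[: n_clusters // 2].ravel()                # "treatment" assigned by cluster
    group_b = data[n_clusters // 2 :].ravel()                # no true effect exists
    _, p = ttest_ind(group_a, group_b)                       # naive test, ignores clustering
    false_positives += p < 0.05

print(f"Nominal Type 1 error: 5%   Actual false-positive rate: {false_positives / reps:.1%}")

A multilevel (mixed-effects) analysis, or clustering the standard errors at the neuron level, is the kind of correction the authors are calling for.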

Obsession With R-Squared

"Good Old R-Squared," by David Giles, Econometrics Beat:  Dave Giles’ Blog, University of Victoria, June 24, 2013 ---
http://davegiles.blogspot.com/2013/05/good-old-r-squared.html 

My students are often horrified when I tell them, truthfully, that one of the last pieces of information that I look at when evaluating the results of an OLS regression, is the coefficient of determination (R2), or its "adjusted" counterpart. Fortunately, it doesn't take long to change their perspective!

After all, we all know that with time-series data, it's really easy to get a "high" R2 value, because of the trend components in the data. With cross-section data, really low R2 values are really common. For most of us, the signs, magnitudes, and significance of the estimated parameters are of primary interest. Then we worry about testing the assumptions underlying our analysis. R2 is at the bottom of the list of priorities.

Continued in article

Also see http://davegiles.blogspot.com/2013/07/the-adjusted-r-squared-again.html
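Here is a minimal Python sketch of the time-series point Giles makes (my own illustration, not his code): regressing one independent random walk on another routinely produces a "high" R-squared even though the two series are unrelated by construction.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
T, reps = 200, 1000
r2 = []

for _ in range(reps):
    y = np.cumsum(rng.normal(size=T))        # two independent random walks
    x = np.cumsum(rng.normal(size=T))
    r2.append(sm.OLS(y, sm.add_constant(x)).fit().rsquared)

r2 = np.array(r2)
print(f"median R2: {np.median(r2):.2f}   share of spurious regressions with R2 > 0.5: {(r2 > 0.5).mean():.1%}")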

"R2 and Idiosyncratic Risk Are Not Interchangeable." by Bin Li, The Accounting Review, November 2014 ---
http://aaajournals.org/doi/full/10.2308/accr-50826

 

A growing literature exists in both finance and accounting on the association between firm-specific variation in stock returns and several aspects of the firm's information or governance environment. Appendix A, Part 1 lists 21 published papers in top-tier finance and accounting journals and the Social Sciences Research Network (SSRN) reports at least 75 working papers. These studies rely on one of two proxies for firm-specific return variation as the dependent variable: 

 

Continued in article

 


 

Drawing Inferences From Very Large Data-Sets

David Johnstone wrote the following:

Indeed if you hold H0 the same and keep changing the model, you will eventually (generally soon) get a significant result, allowing “rejection of H0 at 5%”, not because H0 is necessarily false but because you have built upon a false model (of which there are zillions, obviously).

"Drawing Inferences From Very Large Data-Sets,"   by David Giles, Econometrics Beat:  Dave Giles’ Blog, University of Victoria, April 26, 2013 ---
http://davegiles.blogspot.ca/2011/04/drawing-inferences-from-very-large-data.html

. . .

Granger (1998; 2003) has reminded us that if the sample size is sufficiently large, then it's virtually impossible not to reject almost any hypothesis. So, if the sample is very large and the p-values associated with the estimated coefficients in a regression model are of the order of, say, 0.10 or even 0.05, then this is really bad news. Much, much smaller p-values are needed before we get all excited about 'statistically significant' results when the sample size is in the thousands, or even bigger. So, the p-values reported above are mostly pretty marginal, as far as significance is concerned. When you work out the p-values for the other 6 models I mentioned, they range from 0.005 to 0.460. I've been generous in the models I selected.

Here's another set of  results taken from a second, really nice, paper by
Ciecieriski et al. (2011) in the same issue of Health Economics:

Continued in article

Jensen Comment
My research suggests that over 90% of the recent papers published in TAR use purchased databases that provide enormous sample sizes. Their accountics science authors keep reporting conventional levels of statistical significance that are meaningless for samples that large.

What is even worse is when meaningless statistical significance tests are used to support decisions.
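Here is a minimal Python sketch of the problem (my own made-up numbers, not taken from any TAR study): with half a million observations, an effect that explains a trivial fraction of the variance is nonetheless "statistically significant" at conventional levels.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 500_000
x = rng.normal(size=n)
y = 0.005 * x + rng.normal(size=n)           # true effect explains roughly 0.0025% of the variance

res = sm.OLS(y, sm.add_constant(x)).fit()
print(f"R-squared:        {res.rsquared:.6f}")
print(f"t-statistic on x: {res.tvalues[1]:.2f}")
print(f"p-value on x:     {res.pvalues[1]:.4f}")   # typically well below 0.05 despite the trivial effect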

"Statistical Significance - Again " by David Giles, Econometrics Beat:  Dave Giles’ Blog, University of Victoria, December 28, 2013 ---
http://davegiles.blogspot.com/2013/12/statistical-significance-again.html

Statistical Significance - Again

 
With all of this emphasis on "Big Data", I was pleased to see this post on the Big Data Econometrics blog, today.

 
When you have a sample that runs to the thousands (billions?), the conventional significance levels of 10%, 5%, 1% are completely inappropriate. You need to be thinking in terms of tiny significance levels.

 
I discussed this in some detail back in April of 2011, in a post titled, "Drawing Inferences From Very Large Data-Sets". If you're one of those (many) applied researchers who use large cross-sections of data, and then sprinkle the results tables with asterisks to signal "significance" at the 5%, 10% levels, etc., then I urge you to read that earlier post.

 
It's sad to encounter so many papers and seminar presentations in which the results, in reality, are totally insignificant!

 

The Cult of Statistical Significance: How Standard Error Costs Us Jobs, Justice, and Lives, by Stephen T. Ziliak and Deirdre N. McCloskey (Ann Arbor:  University of Michigan Press, ISBN-13: 978-472-05007-9, 2007)
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm

Page 206
Like scientists today in medical and economic and other sizeless sciences, Pearson mistook a large sample size for the definite, substantive significance---evidence, as Hayek put it, of "wholes." But it was, as Hayek said, "just an illusion." Pearson's columns of sparkling asterisks, though quantitative in appearance and as appealing as the simple truth of the sky, signified nothing.

 

pp. 250-251
The textbooks are wrong. The teaching is wrong. The seminar you just attended is wrong. The most prestigious journal in your scientific field is wrong.

You are searching, we know, for ways to avoid being wrong. Science, as Jeffreys said, is mainly a series of approximations to discovering the sources of error. Science is a systematic way of reducing wrongs, or can be. Perhaps you feel frustrated by the random epistemology of the mainstream and don't know what to do. Perhaps you've been sedated by significance and lulled into silence. Perhaps you sense that the power of a Rothamsted test against a plausible Dublin alternative is statistically speaking low, but you feel oppressed by the instrumental variable one should dare not to wield. Perhaps you feel frazzled by what Morris Altman (2004) called the "social psychology rhetoric of fear," the deeply embedded path dependency that keeps the abuse of significance in circulation. You want to come out of it. But perhaps you are cowed by the prestige of Fisherian dogma. Or, worse thought, perhaps you are cynically willing to be corrupted if it will keep a nice job.

 

Bob Jensen's threads on the way analysts, particularly accountics scientists, cheer for the statistical significance of large-sample outcomes even when the results are economically insignificant, such as R2 values of .0001 ---
The Cult of Statistical Significance: How Standard Error Costs Us Jobs, Justice, and Lives ---
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm

 


 

The Insignificance of Testing the Null

"Statistics: reasoning on uncertainty, and the insignificance of testing null," by Esa Läärä
Ann. Zool. Fennici 46: 138–157
ISSN 0003-455X (print), ISSN 1797-2450 (online)
Helsinki 30 April 2009 © Finnish Zoological and Botanical Publishing Board 200
http://www.sekj.org/PDF/anz46-free/anz46-138.pdf

The practice of statistical analysis and inference in ecology is critically reviewed. The dominant doctrine of null hypothesis significance testing (NHST) continues to be applied ritualistically and mindlessly. This dogma is based on superficial understanding of elementary notions of frequentist statistics in the 1930s, and is widely disseminated by influential textbooks targeted at biologists. It is characterized by silly null hypotheses and mechanical dichotomous division of results being “significant” (P < 0.05) or not. Simple examples are given to demonstrate how distant the prevalent NHST malpractice is from the current mainstream practice of professional statisticians. Masses of trivial and meaningless “results” are being reported, which are not providing adequate quantitative information of scientific interest. The NHST dogma also retards progress in the understanding of ecological systems and the effects of management programmes, which may at worst contribute to damaging decisions in conservation biology. In the beginning of this millennium, critical discussion and debate on the problems and shortcomings of NHST has intensified in ecological journals. Alternative approaches, like basic point and interval estimation of effect sizes, likelihood-based and information theoretic methods, and the Bayesian inferential paradigm, have started to receive attention. Much is still to be done in efforts to improve statistical thinking and reasoning of ecologists and in training them to utilize appropriately the expanded statistical toolbox. Ecologists should finally abandon the false doctrines and textbooks of their previous statistical gurus. Instead they should more carefully learn what leading statisticians write and say, collaborate with statisticians in teaching, research, and editorial work in journals.

 

Jensen Comment
And to think Alpha (Type 1) error is the easy part. Does anybody ever test for the more important Beta (Type 2) error? I think some engineers test for Type 2 error with Operating Characteristic (OC) curves, but these are generally applied in tightly controlled experiments such as quality control testing.

Beta Error --- http://en.wikipedia.org/wiki/Beta_error#Type_II_error

I've never seen an accountics science study published anywhere that tested for Beta Error.
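
For readers who have never computed a Beta (Type 2) error, a minimal sketch is given below. It uses the normal approximation for a one-sided test of a mean; the effect size, sigma, sample size, and alpha level are all hypothetical numbers chosen only to show how power = 1 - Beta falls out of the calculation. An OC curve is nothing more than this calculation repeated over a grid of effect sizes or sample sizes.

```python
# Minimal sketch (hypothetical numbers): Beta (Type 2) error for a one-sided z-test
# of H0: mu = 0 versus H1: mu = delta, with known sigma and sample size n.
import numpy as np
from scipy.stats import norm

alpha, delta, sigma, n = 0.05, 0.25, 1.0, 50   # all values are illustrative assumptions

z_crit = norm.ppf(1 - alpha)                   # rejection cutoff under H0
noncentrality = delta * np.sqrt(n) / sigma     # shift of the test statistic under H1
beta = norm.cdf(z_crit - noncentrality)        # P(fail to reject H0 | H1 true) = Type 2 error
power = 1 - beta

print(f"Type 2 (Beta) error = {beta:.3f}, power = {power:.3f}")
# With these illustrative numbers Beta is roughly 0.45, i.e. the test misses a
# real effect of that size almost half the time; plotting beta against delta or n
# is exactly what an OC curve does.
```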

 


 

Scientific Irreproducibility (Frequentists Versus Bayesians)

"Weak statistical standards implicated in scientific irreproducibility: One-quarter of studies that meet commonly used statistical cutoff may be false." by Erika Check Hayden, Nature, November 11, 2013 ---
http://www.nature.com/news/weak-statistical-standards-implicated-in-scientific-irreproducibility-1.14131

 The plague of non-reproducibility in science may be mostly due to scientists’ use of weak statistical tests, as shown by an innovative method developed by statistician Valen Johnson, at Texas A&M University in College Station.

Johnson compared the strength of two types of tests: frequentist tests, which measure how unlikely a finding is to occur by chance, and Bayesian tests, which measure the likelihood that a particular hypothesis is correct given data collected in the study. The strength of the results given by these two types of tests had not been compared before, because they ask slightly different types of questions.

So Johnson developed a method that makes the results given by the tests — the P value in the frequentist paradigm, and the Bayes factor in the Bayesian paradigm — directly comparable. Unlike frequentist tests, which use objective calculations to reject a null hypothesis, Bayesian tests require the tester to define an alternative hypothesis to be tested — a subjective process. But Johnson developed a 'uniformly most powerful' Bayesian test that defines the alternative hypothesis in a standard way, so that it “maximizes the probability that the Bayes factor in favor of the alternate hypothesis exceeds a specified threshold,” he writes in his paper. This threshold can be chosen so that Bayesian tests and frequentist tests will both reject the null hypothesis for the same test results.

Johnson then used these uniformly most powerful tests to compare P values to Bayes factors. When he did so, he found that a P value of 0.05 or less — commonly considered evidence in support of a hypothesis in fields such as social science, in which non-reproducibility has become a serious issue — corresponds to Bayes factors of between 3 and 5, which are considered weak evidence to support a finding.

False positives

Indeed, as many as 17–25% of such findings are probably false, Johnson calculates. He advocates for scientists to use more stringent P values of 0.005 or less to support their findings, and thinks that the use of the 0.05 standard might account for most of the problem of non-reproducibility in science — even more than other issues, such as biases and scientific misconduct.

“Very few studies that fail to replicate are based on P values of 0.005 or smaller,” Johnson says.

Some other mathematicians said that though there have been many calls for researchers to use more stringent tests, the new paper makes an important contribution by laying bare exactly how lax the 0.05 standard is.

“It shows once more that standards of evidence that are in common use throughout the empirical sciences are dangerously lenient,” says mathematical psychologist Eric-Jan Wagenmakers of the University of Amsterdam. “Previous arguments centered on ‘P-hacking’, that is, abusing standard statistical procedures to obtain the desired results. The Johnson paper shows that there is something wrong with the P value itself.”

Other researchers, though, said it would be difficult to change the mindset of scientists who have become wedded to the 0.05 cutoff. One implication of the work, for instance, is that studies will have to include more subjects to reach these more stringent cutoffs, which will require more time and money.

“The family of Bayesian methods has been well developed over many decades now, but somehow we are stuck to using frequentist approaches,” says physician John Ioannidis of Stanford University in California, who studies the causes of non-reproducibility. “I hope this paper has better luck in changing the world.”
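
Johnson's uniformly most powerful Bayesian tests are too involved for a short illustration, but the better-known Sellke-Bayarri-Berger calibration makes the same point in a few lines. The sketch below is my own illustration, not Johnson's calculation: it computes an upper bound on the Bayes factor that any p-value can muster against a point null, and the numbers echo the "weak evidence" verdict quoted above.

```python
# Upper bound on the evidence a p-value can provide against a point null
# (Sellke-Bayarri-Berger calibration: BF10 <= 1 / (-e * p * ln p) for p < 1/e).
# Illustration of how lax p = 0.05 is; this is not Johnson's UMPBT construction.
import numpy as np

for p in (0.05, 0.01, 0.005):
    max_bf10 = 1.0 / (-np.e * p * np.log(p))
    print(f"p = {p:.3f}: Bayes factor in favor of H1 is at most {max_bf10:.1f}")
# p = 0.050 -> at most about 2.5 (weak evidence)
# p = 0.010 -> at most about 8
# p = 0.005 -> at most about 14
```

Johnson's own calibration is somewhat different, but the direction of the message is the same: p = 0.05 never corresponds to more than modest evidence.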

Accountics Scientists are More Interested in Their Tractors Than Their Harvests ---
http://www.trinity.edu/rjensen/TheoryTAR.htm

 


 

Can You Really Test for Multicollinearity?

Unlike real scientists, accountics scientists seldom replicate published accountics science research by the exacting standards of real science ---
http://www.trinity.edu/rjensen/TheoryTAR.htm#Replication

Multicollinearity --- http://en.wikipedia.org/wiki/Multicollinearity

"Can You Actually TEST for Multicollinearity?" by David Giles, Econometrics Beat:  Dave Giles’ Blog, University of Victoria, June 24, 2013 ---
http://davegiles.blogspot.com/2013/06/can-you-actually-test-for.html

. . .

Now, let's return to the "problem" of multicollinearity.

 
What do we mean by this term, anyway? This turns out to be the key question!

 
Multicollinearity is a phenomenon associated with our particular sample of data when we're trying to estimate a regression model. Essentially, it's a situation where there is insufficient information in the sample of data to enable us to draw "reliable" inferences about the individual parameters of the underlying (population) model.


I'll be elaborating more on the "informational content" aspect of this phenomenon in a follow-up post. Yes, there are various sample measures that we can compute and report, to help us gauge how severe this data "problem" may be. But they're not statistical tests, in any sense of the word.

 

Because multicollinearity is a characteristic of the sample, and not a characteristic of the population, you should immediately be suspicious when someone starts talking about "testing for multicollinearity". Right?


Apparently not everyone gets it!


There's an old paper by Farrar and Glauber (1967) which, on the face of it, might seem to take a different stance. In fact, if you were around when this paper was published (or if you've bothered to actually read it carefully), you'll know that this paper makes two contributions. First, it provides a very sensible discussion of what multicollinearity is all about. Second, the authors take some well known results from the statistics literature (notably, by Wishart, 1928; Wilks, 1932; and Bartlett, 1950) and use them to give "tests" of the hypothesis that the regressor matrix, X, is orthogonal.


How can this be? Well, there's a simple explanation if you read the Farrar and Glauber paper carefully, and note what assumptions are made when they "borrow" the old statistics results. Specifically, there's an explicit (and necessary) assumption that in the population the X matrix is random, and that it follows a multivariate normal distribution.


This assumption is, of course, totally at odds with what is usually assumed in the linear regression model! The "tests" that Farrar and Glauber gave us aren't really tests of multicollinearity in the sample. Unfortunately, this point wasn't fully appreciated by everyone.


There are some sound suggestions in this paper, including looking at the sample multiple correlations between each regressor, and all of the other regressors. These, and other sample measures such as variance inflation factors, are useful from a diagnostic viewpoint, but they don't constitute tests of "zero multicollinearity".


So, why am I even mentioning the Farrar and Glauber paper now?


Well, I was intrigued to come across some STATA code (Shehata, 2012) that allows one to implement the Farrar and Glauber "tests". I'm not sure that this is really very helpful. Indeed, this seems to me to be a great example of applying someone's results without understanding (bothering to read?) the assumptions on which they're based!


Be careful out there - and be highly suspicious of strangers bearing gifts!


 
References

 
Bartlett, M. S., 1950. Tests of significance in factor analysis. British Journal of Psychology, Statistical Section, 3, 77-85.

 
Farrar, D. E. and R. R. Glauber, 1967. Multicollinearity in regression analysis: The problem revisited.  Review of Economics and Statistics, 49, 92-107.

 
Shehata, E. A. E., 2012. FGTEST: Stata module to compute Farrar-Glauber Multicollinearity Chi2, F, t tests.

Wilks, S. S., 1932. Certain generalizations in the analysis of variance. Biometrika, 24, 477-494.

Wishart, J., 1928. The generalized product moment distribution in samples from a multivariate normal population. Biometrika, 20A, 32-52.
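
Giles' distinction between diagnostics and tests is easy to make concrete. The sketch below uses simulated data and is only an illustration of the standard sample measures he mentions: it computes variance inflation factors and the condition number of the regressor matrix. Both describe how much collinearity there is in this particular sample; neither is a statistical test of any population hypothesis.

```python
# Diagnostics (not tests!) for multicollinearity in a simulated sample.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # deliberately near-collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """Variance inflation factor: 1 / (1 - R^2) from regressing column j on the others."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

for j in range(X.shape[1]):
    print(f"VIF(x{j+1}) = {vif(X, j):.1f}")
print("condition number of [1 X] =", round(np.linalg.cond(np.column_stack([np.ones(n), X])), 1))
# Large VIFs and a large condition number describe THIS sample's information content;
# they are descriptive measures, not statistical tests of a population property.
```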

Multicollinearity --- http://en.wikipedia.org/wiki/Multicollinearity

Singular Matrix --- http://en.wikipedia.org/wiki/Invertible_matrix#singular

"Least Squares, Perfect Multicollinearity, & Estimable Function," by David Giles, Econometrics Blog, September 19, 2014 ---
http://davegiles.blogspot.com/2014/09/least-squares-perfect-multicollinearity.html

. . .

The best way to think about multicollinearity in a regression setting is that it reflects a shortage of information. Sometimes additional information can be obtained via additional data. Sometimes we can "inject" additional information into the problem by means of exact or stochastic restrictions on the parameters. (The latter is how the problem is avoided in a Bayesian setting.) Sometimes, we can't do either of these things.
 
Here, I'll focus on the most extreme case possible - one where we have "perfect multicollinearity". That's the case where X has less than full rank, so that (X'X) doesn't have a regular inverse. It's the situation outlined above.
 
For the least squares estimator, b, to be defined, we need to be able to solve the normal equation, (1). What we're interested in, of course, is a solution for every element of the b vector. This is simply not achievable in the case of perfect multicollinearity. There's not enough information in the sample for us to be able to uniquely identify and estimate every individual regression coefficient. However, we should be able to identify and estimate certain linear combinations of those coefficients. These combinations are usually referred to as "estimable functions" of the parameters.

Continued in article
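
The "estimable functions" point in Giles' second post can also be checked numerically. In the sketch below (simulated data, my own illustration) the third regressor is the exact sum of the first two, so X has less than full rank; no individual coefficient is identified, yet every least-squares solution gives the same fitted values and the same values for the estimable combinations of coefficients.

```python
# Perfect multicollinearity: x3 = x1 + x2, so X'X is singular and individual
# coefficients are not identified, but estimable functions of them are.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + x2                                   # exact linear dependence
X = np.column_stack([x1, x2, x3])
y = 2*x1 + 1*x2 + 0.5*x3 + rng.normal(scale=0.1, size=n)

print("rank of X =", np.linalg.matrix_rank(X), "out of", X.shape[1])

b_min_norm = np.linalg.pinv(X) @ y             # one least-squares solution (minimum norm)
null_dir = np.array([1.0, 1.0, -1.0])          # X @ null_dir = 0, so coefficients can shift along it
b_other = b_min_norm + 5.0 * null_dir          # a different solution with identical fit

print("max difference in fitted values:", np.max(np.abs(X @ b_min_norm - X @ b_other)))
print("b1 + b3 for each solution:", b_min_norm[0] + b_min_norm[2], b_other[0] + b_other[2])
print("b2 + b3 for each solution:", b_min_norm[1] + b_min_norm[2], b_other[1] + b_other[2])
# The individual b's differ across solutions, but the estimable functions b1+b3 and
# b2+b3, and the fitted values X @ b, do not.
```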

It's relatively uncommon for accountics scientists to criticize each other's published works. A notable exception is as follows:
"Selection Models in Accounting Research," by Clive S. Lennox, Jere R. Francis, and Zitian Wang,  The Accounting Review, March 2012, Vol. 87, No. 2, pp. 589-616.

This study explains the challenges associated with the Heckman (1979) procedure to control for selection bias, assesses the quality of its application in accounting research, and offers guidance for better implementation of selection models. A survey of 75 recent accounting articles in leading journals reveals that many researchers implement the technique in a mechanical way with relatively little appreciation of important econometric issues and problems surrounding its use. Using empirical examples motivated by prior research, we illustrate that selection models are fragile and can yield quite literally any possible outcome in response to fairly minor changes in model specification. We conclude with guidance on how researchers can better implement selection models that will provide more convincing evidence on potential selection bias, including the need to justify model specifications and careful sensitivity analyses with respect to robustness and multicollinearity.

. . .

CONCLUSIONS

Our review of the accounting literature indicates that some studies have implemented the selection model in a questionable manner. Accounting researchers often impose ad hoc exclusion restrictions or no exclusion restrictions whatsoever. Using empirical examples and a replication of a published study, we demonstrate that such practices can yield results that are too fragile to be considered reliable. In our empirical examples, a researcher could obtain quite literally any outcome by making relatively minor and apparently innocuous changes to the set of exclusionary variables, including choosing a null set. One set of exclusion restrictions would lead the researcher to conclude that selection bias is a significant problem, while an alternative set involving rather minor changes would give the opposite conclusion. Thus, claims about the existence and direction of selection bias can be sensitive to the researcher's set of exclusion restrictions.

Our examples also illustrate that the selection model is vulnerable to high levels of multicollinearity, which can exacerbate the bias that arises when a model is misspecified (Thursby 1988). Moreover, the potential for misspecification is high in the selection model because inferences about the existence and direction of selection bias depend entirely on the researcher's assumptions about the appropriate functional form and exclusion restrictions. In addition, high multicollinearity means that the statistical insignificance of the inverse Mills' ratio is not a reliable guide as to the absence of selection bias. Even when the inverse Mills' ratio is statistically insignificant, inferences from the selection model can be different from those obtained without the inverse Mills' ratio. In this situation, the selection model indicates that it is legitimate to omit the inverse Mills' ratio, and yet, omitting the inverse Mills' ratio gives different inferences for the treatment variable because multicollinearity is then much lower.

In short, researchers are faced with the following trade-off. On the one hand, selection models can be fragile and suffer from multicollinearity problems, which hinder their reliability. On the other hand, the selection model potentially provides more reliable inferences by controlling for endogeneity bias if the researcher can find good exclusion restrictions, and if the models are found to be robust to minor specification changes. The importance of these advantages and disadvantages depends on the specific empirical setting, so it would be inappropriate for us to make a general statement about when the selection model should be used. Instead, researchers need to critically appraise the quality of their exclusion restrictions and assess whether there are problems of fragility and multicollinearity in their specific empirical setting that might limit the effectiveness of selection models relative to OLS.

Another way to control for unobservable factors that are correlated with the endogenous regressor (D) is to use panel data. Though it may be true that many unobservable factors impact the choice of D, as long as those unobservable characteristics remain constant during the period of study, they can be controlled for using a fixed effects research design. In this case, panel data tests that control for unobserved differences between the treatment group (D = 1) and the control group (D = 0) will eliminate the potential bias caused by endogeneity as long as the unobserved source of the endogeneity is time-invariant (e.g., Baltagi 1995; Meyer 1995; Bertrand et al. 2004). The advantages of such a difference-in-differences research design are well recognized by accounting researchers (e.g., Altamuro et al. 2005; Desai et al. 2006; Hail and Leuz 2009; Hanlon et al. 2008). As a caveat, however, we note that the time-invariance of unobservables is a strong assumption that cannot be empirically validated. Moreover, the standard errors in such panel data tests need to be corrected for serial correlation because otherwise there is a danger of over-rejecting the null hypothesis that D has no effect on Y (Bertrand et al. 2004).10

Finally, we note that there is a recent trend in the accounting literature to use samples that are matched based on their propensity scores (e.g., Armstrong et al. 2010; Lawrence et al. 2011). An advantage of propensity score matching (PSM) is that there is no MILLS variable and so the researcher is not required to find valid Z variables (Heckman et al. 1997; Heckman and Navarro-Lozano 2004). However, such matching has two important limitations. First, selection is assumed to occur only on observable characteristics. That is, the error term in the first stage model is correlated with the independent variables in the second stage (i.e., u is correlated with X and/or Z), but there is no selection on unobservables (i.e., u and υ are uncorrelated). In contrast, the purpose of the selection model is to control for endogeneity that arises from unobservables (i.e., the correlation between u and υ). Therefore, propensity score matching should not be viewed as a replacement for the selection model (Tucker 2010).

A second limitation arises if the treatment variable affects the company's matching attributes. For example, suppose that a company's choice of auditor affects its subsequent ability to raise external capital. This would mean that companies with higher quality auditors would grow faster. Suppose also that the company's characteristics at the time the auditor is first chosen cannot be observed. Instead, we match at some stacked calendar time where some companies have been using the same auditor for 20 years and others for not very long. Then, if we matched on company size, we would be throwing out the companies that have become large because they have benefited from high-quality audits. Such companies do not look like suitable “matches,” insofar as they are much larger than the companies in the control group that have low-quality auditors. In this situation, propensity matching could bias toward a non-result because the treatment variable (auditor choice) affects the company's matching attributes (e.g., its size). It is beyond the scope of this study to provide a more thorough assessment of the advantages and disadvantages of propensity score matching in accounting applications, so we leave this important issue to future research.
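
For readers who have not worked through the mechanics that Lennox, Francis, and Wang criticize, a bare-bones two-step selection estimator on simulated data is sketched below. It is only an illustration of the machinery (first-stage probit, inverse Mills' ratio, second-stage OLS), not a reproduction of their study, and it assumes a valid exclusion restriction z that appears only in the selection equation; omitting z is exactly the practice that makes the Mills' ratio nearly collinear with the other regressors.

```python
# Minimal Heckman two-step sketch on simulated data (illustration only).
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(42)
n = 2000
x = rng.normal(size=n)                   # regressor in both equations
z = rng.normal(size=n)                   # exclusion restriction: selection equation only
u = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n)  # correlated errors

select = (0.5 + 1.0*z + 0.5*x + u[:, 0] > 0)        # selection (first-stage) equation
y = 1.0 + 2.0*x + u[:, 1]                           # outcome, observed only if selected

# Step 1: probit for selection, then the inverse Mills' ratio for the selected sample
W = sm.add_constant(np.column_stack([z, x]))
probit = sm.Probit(select.astype(int), W).fit(disp=0)
xb = W @ probit.params
mills = norm.pdf(xb) / norm.cdf(xb)

# Step 2: OLS on the selected sample, augmented with the inverse Mills' ratio
sel = select
X2 = sm.add_constant(np.column_stack([x[sel], mills[sel]]))
ols = sm.OLS(y[sel], X2).fit()
print(ols.params)     # [intercept, slope on x, coefficient on MILLS]

# The multicollinearity worry in Lennox et al.: without a good exclusion restriction
# the Mills' ratio is close to a linear function of x, so check its correlation here.
print("corr(x, mills) on selected sample:", np.corrcoef(x[sel], mills[sel])[0, 1])
```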

 


 

A second indicator is our journals. They have proliferated in number. But we struggle with an intertemporal sameness, with incremental as opposed to discontinuous attempts to move our thinking forward, and with referee intrusion and voyeurism. Value relevance is a currently fashionable approach to identifying statistical regularities in the financial market arena, just as a focus on readily observable components of compensation is a currently fashionable dependent variable in the compensation arena. Yet we know measurement error abounds, that other sources of information are both present and hardly unimportant, that compensation is broad-based and intertemporally managed, and that compensating wage differentials are part of the stew. Yet we continue on the comfortable path of sameness.
Joel Demski, AAA President's Message, Accounting Education News, Fall 2001
http://aaahq.org/pubs/AEN/2001/Fall2001.pdf 

Models That aren't Robust

Robust Statistics --- http://en.wikipedia.org/wiki/Robust_statistics

Robust statistics are statistics with good performance for data drawn from a wide range of probability distributions, especially for distributions that are not normally distributed. Robust statistical methods have been developed for many common problems, such as estimating location, scale and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from parametric distributions. For example, robust methods work well for mixtures of two normal distributions with different standard-deviations, for example, one and three; under this model, non-robust methods like a t-test work badly.

Continued in article
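
The "not unduly affected by outliers" motivation in the passage above is easy to see with one aberrant observation. The sketch below uses hypothetical data and is my own illustration: a single gross outlier drags the sample mean and the t-test around, while the median and a rank-based test barely move, which is the practical meaning of "robust" here.

```python
# How a single gross outlier moves the mean and t-statistic but barely moves
# robust alternatives (illustration of the passage quoted above).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(loc=0.5, scale=1.0, size=30)        # hypothetical well-behaved sample
x_bad = np.append(x, -15.0)                        # same sample plus one wild observation

for label, sample in [("clean", x), ("with one outlier", x_bad)]:
    t, p = stats.ttest_1samp(sample, popmean=0.0)
    w, wp = stats.wilcoxon(sample)                 # robust signed-rank alternative
    print(f"{label:>17}: mean={sample.mean():6.2f}  median={np.median(sample):5.2f}  "
          f"t-test p={p:.3f}  Wilcoxon p={wp:.3f}")
# A single aberrant point drags the mean toward it and inflates the standard error,
# which can wash out the t-test's significance, while the median and the rank-based
# test barely move.
```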


"R2 and Idiosyncratic Risk Are Not Interchangeable." by Bin Li, The Accounting Review, November 2014 ---
http://aaajournals.org/doi/full/10.2308/accr-50826

 

A growing literature exists in both finance and accounting on the association between firm-specific variation in stock returns and several aspects of the firm's information or governance environment. Appendix A, Part 1 lists 21 published papers in top-tier finance and accounting journals, and the Social Science Research Network (SSRN) reports at least 75 working papers. These studies rely on one of two proxies for firm-specific return variation as the dependent variable:

 

Continued in article

 


 

"Allegory of the Cave"

Those not familiar with Plato's Cave should take a look at
http://en.wikipedia.org/wiki/Plato%27s_Cave
 
The phrase is most often used to distinguish assumed (shadow) worlds that differ, often in important ways, from the real world, such as when economists assume steady-state conditions, equilibrium conditions, corporate utility functions, etc.
 
The Gaussian Copula function blamed for the collapse of the economy in 2007 is an example of a derivation in Plato's Cave that was made operational inappropriately by Wall Street Investment Banks:


"In Plato's Cave:  Mathematical models are a powerful way of predicting financial markets. But they are fallible" The Economist, January 24, 2009, pp. 10-14 ---
http://www.trinity.edu/rjensen/2008Bailout.htm#Bailout



 
Conceivably some Wall Street analysts make up a subset of the "alumni" of Plato's Cave. But they are joined by the many more quants in all disciplines who do analytics and empirical research in the realm of assumed worlds that differ from reality in possibly serious ways.

Game Theory Model Solutions Are Rarely Robust

Nash Equilibrium --- http://en.wikipedia.org/wiki/Nash_equilibrium

Question
Why do game theory model solutions like Nash Equilibrium fail so often in the real world?

"They Finally Tested The 'Prisoner's Dilemma' On Actual Prisoners — And The Results Were Not What You Would Expect," by Max Nissen, Business Insider, July 13, 2013 ---
http://www.businessinsider.com/prisoners-dilemma-in-real-life-2013-7

The "prisoner's dilemma" is a familiar concept to just about everyone who took Econ 101.

The basic version goes like this: Two criminals are arrested, but police can't convict either on the primary charge, so they plan to sentence them to a year in jail on a lesser charge. Each of the prisoners, who can't communicate with each other, are given the option of testifying against their partner. If they testify, and their partner remains silent, the partner gets three years and they go free. If they both testify, both get two. If both remain silent, they each get one.

In game theory, betraying your partner, or "defecting" is always the dominant strategy as it always has a slightly higher payoff in a simultaneous game. It's what's known as a "Nash Equilibrium," after Nobel Prize winning mathematician and "A Beautiful Mind" subject John Nash.

 In sequential games, where players know each other's previous behavior and have the opportunity to punish each other, defection is the dominant strategy as well. 

However, on an overall basis, the best outcome for both players is mutual cooperation.

Yet no one's ever actually run the experiment on real prisoners before, until two University of Hamburg economists tried it out in a recent study comparing the behavior of inmates and students. 

Surprisingly, for the classic version of the game, prisoners were far more cooperative  than expected.

Menusch Khadjavi and Andreas Lange put the famous game to the test for the first time ever, putting a group of prisoners in Lower Saxony's primary women's prison, as well as students, through both simultaneous and sequential versions of the game. 

The payoffs obviously weren't years off sentences, but euros for students, and the equivalent value in coffee or cigarettes for prisoners. 

They expected, building off of game theory and behavioral economic research that show humans are more cooperative than the purely rational model that economists traditionally use, that there would be a fair amount of first-mover cooperation, even in the simultaneous simulation where there's no way to react to the other player's decisions. 

And even in the sequential game, where you get a higher payoff for betraying a cooperative first mover, a fair amount will still reciprocate. 

As for the difference between student and prisoner behavior, you'd expect that a prison population might be more jaded and distrustful, and therefore more likely to defect. 

The results went exactly the other way: in the simultaneous game, only 37% of students cooperated, while inmates cooperated 56% of the time.

On a pair basis, only 13% of student pairs managed to get the best mutual outcome and cooperate, whereas 30% of prisoners do. 

In the sequential game, far more students (63%) cooperate, so the mutual cooperation rate skyrockets to 39%. For prisoners, it remains about the same.

What's interesting is that the simultaneous game requires far more blind trust from both parties, and you don't have a chance to retaliate or make up for being betrayed later. Yet prisoners are still significantly more cooperative in that scenario. 

Obviously the payoffs aren't as serious as a year or three of your life, but the paper still demonstrates that prisoners aren't necessarily as calculating, self-interested, and un-trusting as you might expect, and as behavioral economists have argued for years, as mathematically interesting as Nash equilibrium might be, they don't line up with real behavior all that well. 
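
The payoff structure described in the article is small enough to verify by brute force. The sketch below encodes the jail terms quoted above (lower is better) and confirms that testifying is each prisoner's dominant strategy, that mutual testimony is the unique Nash equilibrium, and that mutual silence nonetheless gives the better joint outcome, which is the whole dilemma.

```python
# Prisoner's dilemma payoffs from the article, in years of jail (lower is better).
# Actions: "silent" (cooperate) or "testify" (defect).
YEARS = {
    ("silent",  "silent"):  (1, 1),
    ("silent",  "testify"): (3, 0),
    ("testify", "silent"):  (0, 3),
    ("testify", "testify"): (2, 2),
}
ACTIONS = ("silent", "testify")

def best_response(opponent_action):
    """Player 1's best reply (fewest years) to a fixed action by player 2."""
    return min(ACTIONS, key=lambda a: YEARS[(a, opponent_action)][0])

for opp in ACTIONS:
    print(f"Best response to an opponent who stays {opp!r}: {best_response(opp)!r}")

# Nash equilibria: profiles where neither player can cut their own sentence unilaterally.
nash = [
    (a1, a2) for a1 in ACTIONS for a2 in ACTIONS
    if YEARS[(a1, a2)][0] <= min(YEARS[(alt, a2)][0] for alt in ACTIONS)
    and YEARS[(a1, a2)][1] <= min(YEARS[(a1, alt)][1] for alt in ACTIONS)
]
print("Nash equilibria:", nash)                      # [('testify', 'testify')]
print("Total years at equilibrium vs mutual silence:",
      sum(YEARS[("testify", "testify")]), "vs", sum(YEARS[("silent", "silent")]))
```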

 

"Nobody understands “Prisoner’s dilemma”" July 23, 2013
http://beranger.org/2013/07/23/nobody-understands-prisoners-dilemma/

. . .

Now, the theory says they’d be better off by betraying — i.e. by confessing. And they invoke the Nash equilibrium to “prove” that they’d be better off this way.

The problem in real life is that:

OK, now let me say what I’d do if I were a prisoner to have been offered such a deal: I’ll keep being silent — they’d call this “cooperation”, but by elementary logic, this is obviously the best thing to do, especially when thinking of real jail terms: 20 years is horrendous, 5 years is painful, but 1 year is rather cheap, so I’d assume the other prisoner would think the same. Just common sense. Zero years would be ideal, but there is a risk, and the risk reads “5 years”. This is not altruism, but compared to 5 years, 1 year would be quite acceptable for a felon, wouldn’t you think so? Nothing about any remorse of possibly putting the other guy behind bars for 20 years — just selfish considerations are enough to choose this strategy! (Note that properly choosing the prison terms makes the conclusion easier to reach: 2 years are not as much different from 1 year as the 5 years are.)

They’ve now for the first time tried this dilemma in practice. The idiots have used two groups: students (the stake being a material reward) and real inmates (for whom not freedom was at stake, but merely some cigarettes or coffee).

In such a flawed test environment, 37% of the students did “cooperate”, versus 56% of the inmates. The “iterated” (sequential) version of the dilemma showed an increased cooperation, but only amongst the students (which, in my opinion, proves that they were totally dumb).

Now, I should claim victory, as long as this experiment contradicts the theory saying the cooperation should have been negligible — especially amongst “immoral convicts”. And really, invoking a Pareto standpoint (making one individual better off without making any other individual worse off) is equally dumb, as nobody thinks in terms of ethics… for some bloody cigarettes! In real conditions though, where PERSONAL FREEDOM would be at stake FOR YEARS (1, 5, or 20) — not just peanuts –, an experiment would show even more “cooperation”, meaning that most people would remain silent!

They can’t even design an experiment properly. Not winning a couple of bucks, or a cuppa coffee is almost irrelevant to all the subjects involved (this is not a real stake!), whereas the stress of staying in jail for 5 or for 20 years is almost a life-or-death issue. Mathematicians and sociologists seem unbelievably dumb when basic empathy is needed in order to analyze a problem or conduct an experiment.

__

P.S.: A classical example that’s commonly mentioned is that during the Cold War, both parts have chosen to continuously arm, not to disarm — which means they didn’t “cooperate”. Heck, this is a continuously iterated prisoners’ dilemma, which is a totally different issue than a one-time shot prisoners’ dilemma! In such a continuum, the “official theory” applies with great success.

__

LATE EDIT: If it wasn’t clear enough, the practical experiment was flawed for two major reasons:

  1. The stake. When it’s not about losing personal FREEDOM for years, but merely about not earning a few euros or not being given some cigarettes or coffee, people are more prone to take chances and face the highest possible risk… because they don’t risk that much!
  2. The reversed logic. How can you replace penalties with rewards (on a reversed scale, obviously) and still have people apply the same judgement? Being put in jail for 20 years is replaced with what? With not earning anything? Piece of cake! What’s the equivalent of being set free? Being given a maximum of cash or of cigarettes? To make the equivalent of a real prisoner’s dilemma, the 20 years, 5 years or 1 year penalties shouldn’t have meant “gradually lower earnings”, but rather fines imposed to the subjects! Say, for the students:
    • FREE means you’re given 100 €
    • 1 year means you should pay 100 €
    • 5 years means you should pay 500 €
    • 20 years means you should pay 2000 €

    What do you think the outcome would have been in such an experiment? Totally different, I’m telling you!

Also see http://freakonomics.com/2012/04/25/uk-game-show-golden-balls-a-new-solution-to-the-prisoner%E2%80%99s-dilemma/

 


"ECONOMICS AS ROBUSTNESS ANALYSIS," by Jaakko Kuorikoski, Aki Lehtinen and Caterina Marchionn, he University of Pittsburgh, 2007 ---
http://philsci-archive.pitt.edu/3550/1/econrobu.pdf

ECONOMICS AS ROBUSTNESS ANALYSIS
Jaakko Kuorikoski, Aki Lehtinen and Caterina Marchionni
25.9. 2007
1. Introduction
2. Making sense of robustness
3. Robustness in economics
4. The epistemic import of robustness analysis
5. An illustration: geographical economics models
6. Independence of derivations
7. Economics as a Babylonian science
8. Conclusions
 

1.Introduction
Modern economic analysis consists largely in building abstract mathematical models and deriving familiar results from ever sparser modeling assumptions is considered as a theoretical contribution. Why do economists spend so much time and effort in deriving same old results from slightly different assumptions rather than trying to come up with new and exciting hypotheses? We claim that this is because the process of refining economic models is essentially a form of robustness analysis. The robustness of modeling results with respect to particular modeling assumptions, parameter values or initial conditions plays a crucial role for modeling in economics for two reasons. First, economic models are difficult to subject to straightforward empirical tests for various reasons. Second, the very nature of economic phenomena provides little hope of ever making the modeling assumptions completely realistic. Robustness analysis is therefore a natural methodological strategy for economists because economic models are based on various idealizations and abstractions which make at least some of their assumptions unrealistic (Wimsatt 1987; 1994a; 1994b; Mäki 2000; Weisberg 2006b). The importance of robustness considerations in economics ultimately forces us to reconsider many commonly held views on the function and logical structure of economic theory.

Given that much of economic research praxis can be characterized as robustness analysis, it is somewhat surprising that philosophers of economics have only recently become interested in robustness. William Wimsatt has extensively discussed robustness analysis, which he considers in general terms as triangulation via independent ways of determination . According to Wimsatt, fairly varied processes or activities count as ways of determination: measurement, observation, experimentation, mathematical derivation etc. all qualify. Many ostensibly different epistemic activities are thus classified as robustness analysis. In a recent paper, James Woodward (2006) distinguishes four notions of robustness. The first three are all species of robustness as similarity of the result under different forms of determination. Inferential robustness refers to the idea that there are different degrees to which inference from some given data may depend on various auxiliary assumptions, and derivational robustness to whether a given theoretical result depends on the different modelling assumptions. The difference between the two is that the former concerns derivation from data, and the latter derivation from a set of theoretical assumptions. Measurement robustness means triangulation of a quantity or a value by (causally) different means of measurement. Inferential, derivational and measurement robustness differ with respect to the method of determination and the goals of the corresponding robustness analysis. Causal robustness, on the other hand, is a categorically different notion because it concerns causal dependencies in the world, and it should not be confused with the epistemic notion of robustness under different ways of determination.

In Woodward’s typology, the kind of theoretical model-refinement that is so common in economics constitutes a form of derivational robustness analysis. However, if Woodward (2006) and Nancy Cartwright (1991) are right in claiming that derivational robustness does not provide any epistemic credence to the conclusions, much of theoretical model- building in economics should be regarded as epistemically worthless. We take issue with this position by developing Wimsatt’s (1981) account of robustness analysis as triangulation via independent ways of determination. Obviously, derivational robustness in economic models cannot be a matter of entirely independent ways of derivation, because the different models used to assess robustness usually share many assumptions. Independence of a result with respect to modelling assumptions nonetheless carries epistemic weight by supplying evidence that the result is not an artefact of particular idealizing modelling assumptions. We will argue that although robustness analysis, understood as systematic examination of derivational robustness, is not an empirical confirmation procedure in any straightforward sense, demonstrating that a modelling result is robust does carry epistemic weight by guarding against error and by helping to assess the relative importance of various parts of theoretical models (cf. Weisberg 2006b). While we agree with Woodward (2006) that arguments presented in favour of one kind of robustness do not automatically apply to other kinds of robustness, we think that the epistemic gain from robustness derives from similar considerations in many instances of different kinds of robustness.

In contrast to physics, economic theory itself does not tell which idealizations are truly fatal or crucial for the modeling result and which are not. Economists often proceed on a preliminary hypothesis or an intuitive hunch that there is some core causal mechanism that ought to be modeled realistically. Turning such intuitions into a tractable model requires making various unrealistic assumptions concerning other issues. Some of these assumptions are considered or hoped to be unimportant, again on intuitive grounds. Such assumptions have been examined in economic methodology using various closely related terms such as Musgrave’s (1981) heuristic assumptions, Mäki’s (2000) early step assumptions, Hindriks’ (2006) tractability assumptions and Alexandrova’s (2006) derivational facilitators. We will examine the relationship between such assumptions and robustness in economic model-building by way of discussing a case: geographical economics. We will show that an important way in which economists try to guard against errors in modeling is to see whether the model’s conclusions remain the same if some auxiliary assumptions, which are hoped not to affect those conclusions, are changed. The case also demonstrates that although the epistemological functions of guarding against error and securing claims concerning the relative importance of various assumptions are somewhat different, they are often closely intertwined in the process of analyzing the robustness of some modeling result.

. . .

8. Conclusions
The practice of economic theorizing largely consists of building models with slightly different assumptions yielding familiar results. We have argued that this practice makes sense when seen as derivational robustness analysis. Robustness analysis is a sensible epistemic strategy in situations where we know that our assumptions and inferences are fallible, but not in what situations and in what way. Derivational robustness analysis guards against errors in theorizing when the problematic parts of the ways of determination, i.e. models, are independent of each other. In economics in particular, proving robust theorems from different models with diverse unrealistic assumptions helps us to evaluate what results correspond to important economic phenomena and what are merely artefacts of particular auxiliary assumptions. We have addressed Orzack and Sober’s criticism against robustness as an epistemically relevant feature by showing that their formulation of the epistemic situation in which robustness analysis is useful is misleading. We have also shown that their argument actually shows how robustness considerations are necessary for evaluating what a given piece of data can support. We have also responded to Cartwright’s criticism by showing that it relies on an untenable hope of a completely true economic model.

Viewing economic model building as robustness analysis also helps to make sense of the role of the rationality axioms that apparently provide the basis of the whole enterprise. Instead of the traditional Euclidian view of the structure of economic theory, we propose that economics should be approached as a Babylonian science, where the epistemically secure parts are the robust theorems and the axioms only form what Boyd and Richerson call a generalized sample theory, whose the role is to help organize further modelling work and facilitate communication between specialists.

 

Jensen Comment
As I've mentioned before, I spent a goodly proportion of my time for two years in a think tank trying to invent adaptive regression and cluster analysis models. In every case the main reason for my failures was lack of robustness. In particular, any two models feeding in predictor variables w, x, y, and z could generate different outcomes that were not robust to the time ordering of the variables fed into the algorithms. This made the results dependent on dynamic programming, which has rarely been noted for computing practicality ---
http://en.wikipedia.org/wiki/Dynamic_programming


 

Simpson's Paradox and Cross-Validation

Simpson's Paradox --- http://en.wikipedia.org/wiki/Simpson%27s_paradox

"Simpson’s Paradox: A Cautionary Tale in Advanced Analytics," by Steve Berman, Leandro DalleMule, Michael Greene, and John Lucker, Significance:  Statistics Making Sense, October 2012 ---
http://www.significancemagazine.org/details/webexclusive/2671151/Simpsons-Paradox-A-Cautionary-Tale-in-Advanced-Analytics.html

Analytics projects often present us with situations in which common sense tells us one thing, while the numbers seem to tell us something much different. Such situations are often opportunities to learn something new by taking a deeper look at the data. Failure to perform a sufficiently nuanced analysis, however, can lead to misunderstandings and decision traps. To illustrate this danger, we present several instances of Simpson’s Paradox in business and non-business environments. As we demonstrate below, statistical tests and analysis can be confounded by a simple misunderstanding of the data. Often taught in elementary probability classes, Simpson’s Paradox refers to situations in which a trend or relationship that is observed within multiple groups reverses when the groups are combined. Our first example describes how Simpson’s Paradox accounts for a highly surprising observation in a healthcare study. Our second example involves an apparent violation of the law of supply and demand: we describe a situation in which price changes seem to bear no relationship with quantity purchased. This counterintuitive relationship, however, disappears once we break the data into finer time periods. Our final example illustrates how a naive analysis of marginal profit improvements resulting from a price optimization project can potentially mislead senior business management, leading to incorrect conclusions and inappropriate decisions. Mathematically, Simpson’s Paradox is a fairly simple—if counterintuitive—arithmetic phenomenon. Yet its significance for business analytics is quite far-reaching. Simpson’s Paradox vividly illustrates why business analytics must not be viewed as a purely technical subject appropriate for mechanization or automation. Tacit knowledge, domain expertise, common sense, and above all critical thinking, are necessary if analytics projects are to reliably lead to appropriate evidence-based decision making.

The past several years have seen decision making in many areas of business steadily evolve from judgment-driven domains into scientific domains in which the analysis of data and careful consideration of evidence are more prominent than ever before. Additionally, mainstream books, movies, alternative media and newspapers have covered many topics describing how fact and metric driven analysis and subsequent action can exceed results previously achieved through less rigorous methods. This trend has been driven in part by the explosive growth of data availability resulting from Enterprise Resource Planning (ERP) and Customer Relationship Management (CRM) applications and the Internet and eCommerce more generally. There are estimates that predict that more data will be created in the next four years than in the history of the planet. For example, Wal-Mart handles over one million customer transactions every hour, feeding databases estimated at more than 2.5 petabytes in size - the equivalent of 167 times the books in the United States Library of Congress.

Additionally, computing power has increased exponentially over the past 30 years and this trend is expected to continue. In 1969, astronauts landed on the moon with a 32-kilobyte memory computer. Today, the average personal computer has more computing power than the entire U.S. space program at that time. Decoding the human genome took 10 years when it was first done in 2003; now the same task can be performed in a week or less. Finally, a large consumer credit card issuer crunched two years of data (73 billion transactions) in 13 minutes, which not long ago took over one month.

This explosion of data availability and the advances in computing power and processing tools and software have paved the way for statistical modeling to be at the front and center of decision making not just in business, but everywhere. Statistics is the means to interpret data and transform vast amounts of raw data into meaningful information.

However, paradoxes and fallacies lurk behind even elementary statistical exercises, with the important implication that exercises in business analytics can produce deceptive results if not performed properly. This point can be neatly illustrated by pointing to instances of Simpson’s Paradox. The phenomenon is named after Edward Simpson, who described it in a technical paper in the 1950s, though the prominent statisticians Karl Pearson and Udney Yule noticed the phenomenon over a century ago. Simpson’s Paradox, which regularly crops up in statistical research, business analytics, and public policy, is a prime example of why statistical analysis is useful as a corrective for the many ways in which humans intuit false patterns in complex datasets.

Simpson’s Paradox is in a sense an arithmetic trick: weighted averages can lead to reversals of meaningful relationships—i.e., a trend or relationship that is observed within each of several groups reverses when the groups are combined. Simpson’s Paradox can arise in any number of marketing and pricing scenarios; we present here case studies describing three such examples. These case studies serve as cautionary tales: there is no comprehensive mechanical way to detect or guard against instances of Simpson’s Paradox leading us astray. To be effective, analytics projects should be informed by both a nuanced understanding of statistical methodology as well as a pragmatic understanding of the business being analyzed.

The first case study, from the medical field, presents a surface indication on the effects of smoking that is at odds with common sense. Only when the data are viewed at a more refined level of analysis does one see the true effects of smoking on mortality. In the second case study, decreasing prices appear to be associated with decreasing sales and increasing prices appear to be associated with increasing sales. On the surface, this makes no sense. A fundamental tenet of economics is that of the demand curve: as the price of a good or service increases, consumers demand less of it. Simpson’s Paradox is responsible for an apparent—though illusory—violation of this fundamental law of economics. Our final case study shows how marginal improvements in profitability in each of the sales channels of a given manufacturer may result in an apparent marginal reduction in the overall profitability the business. This seemingly contradictory conclusion can also lead to serious decision traps if not properly understood.

Case Study 1: Are those warning labels really necessary?

We start with a simple example from the healthcare world. This example both illustrates the phenomenon and serves as a reminder that it can appear in any domain.

The data are taken from a 1996 follow-up study from Appleton, French, and Vanderpump on the effects of smoking. The follow-up catalogued women from the original study, categorizing based on the age groups in the original study, as well as whether the women were smokers or not. The study measured the deaths of smokers and non-smokers during the 20 year period.

Continued in article
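
The arithmetic behind such reversals is easy to reproduce. The sketch below uses the classic kidney-stone treatment numbers (the standard classroom illustration, not the smoking data discussed in the article) to show a treatment that wins within every subgroup yet loses once the subgroups are pooled.

```python
# Simpson's Paradox with the classic kidney-stone numbers: (successes, cases).
# Treatment A wins within each stone-size group yet loses once the groups are pooled.
data = {
    "small stones": {"A": (81, 87),   "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, cases):
    return successes / cases

for group, arms in data.items():
    ra, rb = rate(*arms["A"]), rate(*arms["B"])
    winner = "A" if ra > rb else "B"
    print(f"{group}: A = {ra:.1%}, B = {rb:.1%}  ->  {winner} better within the group")

pooled = {arm: (sum(data[g][arm][0] for g in data), sum(data[g][arm][1] for g in data))
          for arm in ("A", "B")}
ra, rb = rate(*pooled["A"]), rate(*pooled["B"])
print(f"pooled: A = {ra:.1%}, B = {rb:.1%}  ->  {'A' if ra > rb else 'B'} better overall")
# Small stones: A 93.1% vs B 86.7%; large stones: A 73.0% vs B 68.8%;
# pooled: A 78.0% vs B 82.6%. The unequal group sizes (the weights) drive the reversal.
```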

"Is the Ohlson (1995) Model an Example of the Simpson's Paradox?" by Samithamby Senthilnathan, SSRN 1417746, June 11, 2009 ---
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1417746

The Equity Prices and Accounting Variables: The role of the most recent prior period's price in value relevance studies
Paperback

by Samithamby Senthilnathan (Author)
Publisher: LAP LAMBERT Academic Publishing (May 22, 2012)
ISBN-10: 3659103721     ISBN-13: 978-3659103728
http://www.amazon.com/dp/3659103721?tag=beschevac-20

"Does an End of Period's Accounting Variable Assessed have Relevance for the Particular Period? Samithamby Senthilnathan, SSRN SSRN 1415182,, June 6, 2009 ---
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1415182
Also see http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1406788

What happened to cross-validation in accountics science research?

Over time I've become increasingly critical of the lack of validation in accountics science, and I've focused mainly upon lack of replication by independent researchers and lack of commentaries published in accountics science journals ---
http://www.trinity.edu/rjensen/TheoryTAR.htm

Another type of validation that seems to be on the decline in accountics science is the so-called cross-validation. Accountics scientists seem to be content with their statistical inference tests on Z-scores, F-tests, and correlation significance testing. Cross-validation seems to be less common; at least I'm having trouble finding examples of it. Cross-validation entails comparing sample findings with findings in holdout samples.

Cross Validation --- http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29

When reading the following paper using logit regression to predict audit firm changes, it struck me that this would've been an ideal candidate for the authors to have performed cross-validation using holdout samples.
"Audit Quality and Auditor Reputation: Evidence from Japan," by Douglas J. Skinner and Suraj Srinivasan, The Accounting Review, September 2012, Vol. 87, No. 5, pp. 1737-1765.

We study events surrounding ChuoAoyama's failed audit of Kanebo, a large Japanese cosmetics company whose management engaged in a massive accounting fraud. ChuoAoyama was PwC's Japanese affiliate and one of Japan's largest audit firms. In May 2006, the Japanese Financial Services Agency (FSA) suspended ChuoAoyama for two months for its role in the Kanebo fraud. This unprecedented action followed a series of events that seriously damaged ChuoAoyama's reputation. We use these events to provide evidence on the importance of auditors' reputation for quality in a setting where litigation plays essentially no role. Around one quarter of ChuoAoyama's clients defected from the firm after its suspension, consistent with the importance of reputation. Larger firms and those with greater growth options were more likely to leave, also consistent with the reputation argument.

Rather than just use statistical inference tests on logit model Z-statistics, it struck me that in statistics journals the referees might've requested cross-validation tests on holdout samples of firms that changed auditors and firms that did not change auditors.
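
For readers who have not seen it done, holdout validation of a logit model takes only a few lines. The sketch below runs on simulated data (it has no connection to the Skinner and Srinivasan sample) and shows the basic recipe: fit on a training sample, score on a holdout sample, and report cross-validated accuracy or AUC rather than in-sample Z-statistics alone.

```python
# Holdout and k-fold cross-validation for a logit (logistic regression) model,
# on simulated data standing in for an auditor-change sample.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=0)          # hypothetical firm-level predictors

# Simple holdout: estimate on 70% of firms, validate on the 30% held out.
X_tr, X_ho, y_tr, y_ho = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc_holdout = roc_auc_score(y_ho, model.predict_proba(X_ho)[:, 1])
print(f"holdout AUC = {auc_holdout:.3f}")

# 10-fold cross-validation: every firm serves in a holdout sample exactly once.
cv_auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=10, scoring="roc_auc")
print(f"10-fold AUC = {cv_auc.mean():.3f} (+/- {cv_auc.std():.3f})")
# A model whose in-sample Z-statistics look impressive but whose holdout AUC is
# near 0.5 is fitting noise, which is exactly what cross-validation is meant to expose.
```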

I do find somewhat more frequent cross-validation studies in finance, particularly in the areas of discriminant analysis in bankruptcy prediction models.

Instances of cross-validation in accounting research journals seem to have died out in the past 20 years. There are earlier examples of cross-validation in accounting research journals. Several examples are cited below:

"A field study examination of budgetary participation and locus of control," by  Peter Brownell, The Accounting Review, October 1982 ---
http://www.jstor.org/discover/10.2307/247411?uid=3739712&uid=2&uid=4&uid=3739256&sid=21101146090203

"Information choice and utilization in an experiment on default prediction," Abdel-Khalik and KM El-Sheshai - Journal of Accounting Research, 1980 ---
http://www.jstor.org/discover/10.2307/2490581?uid=3739712&uid=2&uid=4&uid=3739256&sid=21101146090203

"Accounting ratios and the prediction of failure: Some behavioral evidence," by Robert Libby, Journal of Accounting Research, Spring 1975 ---
http://www.jstor.org/discover/10.2307/2490653?uid=3739712&uid=2&uid=4&uid=3739256&sid=21101146090203

There are other examples of cross-validation in the 1970s and 1980s, particularly in bankruptcy prediction.

I have trouble finding illustrations of cross-validation in the accounting research literature in more recent years. Has the interest in cross-validating waned along with interest in validating accountics research? Or am I just being careless in my search for illustrations?

 


Reverse Regression

"Solution to Regression Problem," by David Giles, Econometrics Beat:  Dave Giles’ Blog, University of Victoria, December 26, 2013 ---
http://davegiles.blogspot.com/2013/12/solution-to-regression-problem.html

O.K. - you've had long enough to think about that little regression problem I posed the other day. It's time to put you out of your misery!

 
Here's the problem again, with a solution.


Problem:
Suppose that we estimate the following regression model by OLS:

 
                     y_i = α + β x_i + ε_i .

 
The model has a single regressor, x, and the point estimate of β turns out to be 10.0.

 
Now consider the "reverse regression", based on exactly the same data:

 
                    x_i = a + b y_i + u_i .

 
What can we say about the value of the OLS point estimate of b?
 
Solution:

Continued in article
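
Giles' full solution is at the link, but the key ingredient is a standard identity: the slope from regressing y on x and the slope from the reverse regression of x on y multiply to the squared correlation, so with a point estimate of 10 the reverse slope must share its sign and cannot exceed 0.10 in absolute value. A quick numerical check on simulated data (my own illustration):

```python
# The two OLS slopes from y-on-x and x-on-y multiply to r^2, so if the first
# slope is 10 the reverse slope is at most 0.10 in absolute value.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=500)
y = 10.0 * x + rng.normal(scale=4.0, size=500)      # true slope 10, plenty of noise

beta_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)   # slope of y on x
b_hat = np.cov(x, y)[0, 1] / np.var(y, ddof=1)      # slope of x on y (reverse regression)
r2 = np.corrcoef(x, y)[0, 1] ** 2

print(f"beta_hat = {beta_hat:.3f}, b_hat = {b_hat:.4f}")
print(f"product  = {beta_hat * b_hat:.4f}  vs  r^2 = {r2:.4f}")
# The product equals r^2 (<= 1), so b_hat = r^2 / beta_hat <= 1 / beta_hat.
```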


David Giles' Top Five Econometrics Blog Postings for 2013


Econometrics Beat:  Dave Giles’ Blog, University of Victoria, December 31, 2013 ---
http://davegiles.blogspot.com/2013/12/my-top-5-for-2013.html

Everyone seems to be doing it at this time of the year. So, here are the five most popular new posts on this blog in 2013:
  1. Econometrics and "Big Data"
  2. Ten Things for Applied Econometricians to Keep in Mind
  3. ARDL Models - Part II - Bounds Tests
  4. The Bootstrap - A Non-Technical Introduction
  5. ARDL Models - Part I

Thanks for reading, and for your comments.

Happy New Year!

Jensen Comment
I really like the way David Giles thinks and writes about econometrics. He does not pull his punches about validity testing.

Econometrics Beat: Dave Giles' Blog --- http://davegiles.blogspot.com/

Reading for the New Year

 
Back to work, and back to reading:
  • Basturk, N., C. Cakmakli, S. P. Ceyhan, and H. K. van Dijk, 2013. Historical developments in Bayesian econometrics after Cowles Foundation monographs 10,14. Discussion Paper 13-191/III, Tinbergen Institute.
  • Bedrick, E. J., 2013. Two useful reformulations of the hazard ratio. American Statistician, in press.
  • Nawata, K. and M. McAleer, 2013. The maximum number of parameters for the Hausman test when the estimators are from different sets of equations.  Discussion Paper 13-197/III, Tinbergen Institute.
  • Shahbaz, M., S. Nasreen, C. H. Ling, and R. Sbia, 2013. Causality between trade openness and energy consumption: What causes what in high, middle and low income countries. MPRA Paper No. 50832.
  • Tibshirani, R., 2011. Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society, B, 73, 273-282.
  • Zamani, H. and N. Ismail, 2014. Functional form for the zero-inflated generalized Poisson regression model. Communications in Statistics - Theory and Methods, in press.

 


 

A Cautionary Bedtime Story
http://davegiles.blogspot.com/2013/12/a-cautionary-bedtime-story.html#more

 
Once upon a time, when all the world and you and I were young and beautiful, there lived in the ancient town of Metrika a young boy by the name of Joe.
 

Now, young Joe was a talented lad, and his home town was prosperous and filled with happy folk - Metricians, they were called. Joe was a member of the Econo family, and his ancestors had been among the founding-fathers of the town. Originating in the neighbouring city of Econoville, Joe Econometrician's forebears had arrived in Metrika not long after the original settlers of that town - the Biols (from nearby Biologica), and the unfortunately named Psychos (from the hamlet of Psychovia).

In more recent times, other families (or "specialists", as they were sometimes known) had also established themselves in the town, and by the time that Joe was born there was already a sprinkling of Clios (from the ancient city of Historia), and even a few Environs. Hailing from the suburbs of Environmentalia, the Environs were regarded with some disdain by many of the more established families of Metrika.

Metrika began as a small village - little more than a coach-stop and a mandatory tavern at a junction in the highway running from the ancient data mines in the South, to the great city of Enlightenment, far to the North. In Metrika, the transporters of data of all types would pause overnight on their long journey; seek refreshment at the tavern; and swap tales of their experiences on the road.

To be fair, the data transporters were more than just humble freight carriers. The raw material that they took from the data mines was largely unprocessed. The vast mountains of raw numbers usually contained valuable gems and nuggets of truth, but typically these were buried from sight. The data transporters used the insights that they gained from their raucous, beer-fired discussions and arguments (known locally as "seminars") with the Metrika locals at the tavern to help them to sift through the data and extract the valuable jewels. With their loads considerably lightened, these "data-miners" then continued on their journey to the City of Enlightenment in a much improved frame of mind, hangovers notwithstanding!

Over time, the town of Metrika prospered and grew as the talents of its citizens were increasingly recognized and valued by those in the surrounding districts, and by the data transporters.

Young Joe grew up happily, supported by his family of econometricians, and he soon developed the skills that were expected of his societal class. He honed his computing skills; developed a good nose for "dodgy" data; and studiously broadened and deepened his understanding of the various tools wielded by the artisans in the neighbouring town of Statsbourg.

In short, he was a model child!

But - he was torn! By the time that he reached the tender age of thirteen, he felt the need to make an important, life-determining, decision.

Should he align his talents with the burly crew who frequented the gym near his home - the macroeconometricians - or should he throw in his lot with the physically challenged bunch of empirical economists known locally as the microeconometricians?

What a tough decision! How to decide?

He discussed his dilemma with his parents, aunts, and uncles. Still, the choice was unclear to him.

Then, one fateful day, while sitting by the side of the highway and watching the data-miners pass by with their increasingly heavy loads, the answer came to him! There was a simple solution - he would form his own break-away movement that was free of the shackles of his Econo heritage.

Overwhelmed with excitement, Joe raced back to the tavern to announce to the locals that henceforth he was to be known as a Data Scientist.

As usual, the locals largely ignored what he was saying, and instead took turns at talking loudly about things that they thought would make them seem important to their peers. Finally, though, after many interruptions, and the consumption of copious quantities of ale, Joe was able to hold their attention.

"You see", he said, "the data that are now being mined, and transported to the City of Enlightenment, are available in such vast quantities that the truth must lie within them."

"All of this energy that we've been expending on building economic models, and then using the data to test their validity - it's a waste of time! The data are now so vast that the models are superfluous."

(To be perfectly truthful, he probably used words of one syllable, but I think you get the idea.)

"We don't need to use all of those silly simplifying assumptions that form the basis of the analysis being undertaken by the microeconometricians and macroeonometricians."

(Actually, he slurred these last three words due to a mixture of youthful enthusiasm and a mouthful of ale.)

"Their models are just a silly game, designed to create the impression that they're actually adding some knowledge to the information in the data. No, all that we need to do is to gather together lots and lots of our tools, and use them to drill deep into the data to reveal the true patterns that govern our lives."

"The answer was there all of the time. While we referred to those Southerners in disparaging terms, calling them "data miners" as if such activity were beneath the dignity of serious modellers such as ourselves, in reality data-mining is our future. How foolish we were!"

Now, it must be said that there were a few older econometricians who were somewhat unimpressed by Joe's revelation. Indeed, some of them had an uneasy feeling that they'd heard this sort of talk before. Amid much head-scratching, beard-stroking, and ale-quaffing, some who were present that day swear they heard mention of long-lost names such as Koopmans and Vining. Of course, we'll never know for sure.

However, young Joe was determined that he had found his destiny. A Data Scientist he would be, and he was convinced that others would follow his lead. Gathering together as many calculating tools as he could lay his hands on, Joe hitched a ride North, to the great City of Enlightenment. The protestations of his family and friends were to no avail. After all, as he kept insisting, we all know that "E" comes after "D".

And so, Joe was last seen sitting in a large wagon of data, trundling North while happily picking through some particularly interesting looking nuggets, and smiling the smile of one who knows the truth.

To this day, econometricians gather, after a hard day of modelling, in the taverns of Metrika. There, they swap tales of new theories, interesting computer algorithms, and even the characteristics of their data. Occasionally, Joe's departure from the town is recalled, but what became of him, or his followers, we really don't know. Perhaps he never actually found the City of Enlightenment after all. (Shock, horror!)

And that, dear children, is what can happen to you - yes, even you - if you don't eat all of your vegetables, or if you believe everything that you hear at the tavern.

 


"Some Thoughts About Accounting Scholarship," by Joel Demski, AAA President's Message, Accounting Education News, Fall 2001
http://aaahq.org/pubs/AEN/2001/Fall2001.pdf 

Some Thoughts on Accounting Scholarship, from the Annual Meeting Presidential Address, August 22, 2001
Tradition calls for me to reveal plans and aspirations for the coming year. But a slight deviation from tradition will, I hope, provide some perspective on my thinking.

We have, in the past half century, made considerable strides in our knowledge of accounting institutions. Statistical connections between accounting measures and market prices, optimal contracting, and professional judgment processes and biases are illustrative. In the process we have raised the stature, the relevance, and the sheer excitement of intellectual inquiry in accounting, be it in the classroom, in the cloak room, or in the journals.

Of late, however, a malaise appears to have settled in. Our progress has turned flat, our tribal tendencies have taken hold, and our joy has diminished.

Some Warning Signs
One indicator is our textbooks, our primary communication medium and our statement to the world about ourselves. I see several patterns here. One is the unrelenting march to make every text look like People magazine. Form now leads, if not swallows, substance. Another is the insatiable appetite to list every rule published by the FASB (despite the fact we have a tidal wave thanks to DIG, EITF, AcSEC, SABs, and what have you). Closely related is the interest in fads. Everything, including this paragraph of my remarks, is now subject to a value-added test. Benchmarking, strategic vision, and EVA® are everywhere. Foundations are nowhere. Building blocks are languishing in appendices and wastebaskets.

A second indicator is our journals. They have proliferated in number. But we struggle with an intertemporal sameness, with incremental as opposed to discontinuous attempts to move our thinking forward, and with referee intrusion and voyeurism. Value relevance is a currently fashionable approach to identifying statistical regularities in the financial market arena, just as a focus on readily observable components of compensation is a currently fashionable dependent variable in the compensation arena. Yet we know measurement error abounds, that other sources of information are both present and hardly unimportant, that compensation is broad-based and intertemporally managed, and that compensating wage differentials are part of the stew. Yet we continue on the comfortable path of sameness.

A third indicator is our work habits. We have embraced, indeed been swallowed by, the multiple adjective syndrome, or MAS: financial, audit, managerial, tax, analytic, archival, experimental, systems, cognitive, etc. This applies to our research, to our reading, to our courses, to our teaching assignments, to our teaching, and to the organization of our Annual Meeting. In so doing, we have exploited specialization, but in the process greatly reduced communication networks, and taken on a near tribal structure.

A useful analogy here is linearization. In accounting we linearize everything in sight: additive components on the balance sheet, linear cost functions, and the most glaring of all, the additive representation inherent in ABC, which by its mere structure denies the scope economy that causes the firm to jointly produce that set of products in the first place. Linearization denies interaction, denies synergy; and our recent propensity for multiple adjectives does precisely the same to us. We are doing to ourselves what we’ve done to our subject area. What, we might ask, happened to accounting? Indeed, I worry we will someday have a section specialized in depreciation or receivables or intangibles.
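As a minimal numerical aside (hypothetical cost figures, not Demski's), the point about additive linearization can be made concrete: with economies of scope, the cost of producing two products jointly is lower than the sum of their stand-alone costs, and an additive, per-driver ABC-style assignment cannot represent the interaction that creates the saving.

    # Hypothetical joint cost function with economies of scope:
    #   C(q1, q2) = F + c1*q1 + c2*q2 - g*q1*q2, with g > 0 creating a synergy
    # that any additive, per-driver cost assignment necessarily omits.
    F, c1, c2, g = 100.0, 5.0, 4.0, 0.01

    def joint_cost(q1, q2):
        return F + c1 * q1 + c2 * q2 - g * q1 * q2

    q1, q2 = 100, 100
    stand_alone = joint_cost(q1, 0) + joint_cost(0, q2)   # two separate producers
    together = joint_cost(q1, q2)                         # joint production
    abc_additive = F + c1 * q1 + c2 * q2                  # additive ABC-style total

    print("stand-alone total:", stand_alone)   # 1100.0
    print("joint production:", together)       # 900.0  (economies of scope)
    print("additive ABC view:", abc_additive)  # 1000.0 (misses the interaction)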

I hasten to add this particular tendency has festered for some time. Rick Antle, discussing the “Intellectual Boundaries in Accounting Research” at the ’88 meeting observed:

In carving out tractable pieces of institutionally defined problems, we inevitably impose intellectual boundaries. ... My concern arises when, instead of generating fluid, useful boundaries, our processes of simplification lead to rigid, dysfunctional ones. (6/89 Horizons, page 109).

I fear we have perfected and made a virtue out of Rick’s concern. Fluid boundaries are now held at bay by our work habits and natural defenses.

A final indicator is what appears to be coming down the road, our work in progress. Doctoral enrollment is down, a fact. It is also arguably factual that doctoral training has become tribal. I, personally, have witnessed this at recent Doctoral and New Faculty Consortia, and in our recruiting at UF. This reinforces the visible patterns in our textbooks, in our journals, and in our work habits.

Some Contributors
These patterns, of course, are not accidental. They are largely endogenous. And I think it is equally instructive to sketch some of the contributors.

One contributor is employers, their firms, and their professional organizations. Employers want and lobby for the student well equipped with the latest consulting fad, or the student well equipped to transition into a billable audit team member or tax consultant within two hours of the first day of employment. Immediacy is sought and championed, though with the caveat of critical-thinking skills somehow being added to the stew.

Continued in article

Jensen Comment
I agree with much of what Joel said, but I think he overlooks what I regard as a major problem in accounting scholarship: the takeover of accountancy doctoral programs in North America, where accounting dissertations are virtually unacceptable unless they have equations ---
http://www.trinity.edu/rjensen/Theory01.htm#DoctoralPrograms
Recommendation 2 of the American Accounting Association Pathways Commission (emphasis added)

Scrapbook 1083 --- http://www.trinity.edu/rjensen/TheoryTar.htm#Scrapbook1083

 

Promote accessibility of doctoral education by allowing for flexible content and structure in doctoral programs and developing multiple pathways for degrees. The current path to an accounting Ph.D. includes lengthy, full-time residential programs and research training that is for the most part confined to quantitative rather than qualitative methods. More flexible programs -- that might be part-time, focus on applied research and emphasize training in teaching methods and curriculum development -- would appeal to graduate students with professional experience and candidates with families, according to the report.
 
http://commons.aaahq.org/groups/2d690969a3/summary

It has been well over a year in which I've scanned the media for signs of change, but I've seen little progress and zero encouragement that accounting doctoral programs and our leading accounting research journals are going to change. Having equations remains a virtual necessary condition for acceptance of an accounting doctoral dissertation or an Accounting Review article.

Accounting scholarship in doctoral programs is still "confined to quantitative rather than qualitative methods." The main reason is simple. Quantitative research is easier.

My theory is that accountics science gained dominance in accounting research, especially in North American accounting Ph.D. programs, because it abdicated responsibility:

1.  Most accountics scientists buy data, thereby avoiding the greater cost and drudgery of collecting data.

2.  By relying so heavily on purchased data, accountics scientists abdicate responsibility for errors in the data.

3.  Since adding missing-variable data to a purchased database is generally not practical, accountics scientists have an excuse for not collecting missing-variable data.

4.  Software packages for modeling and testing data abound. Accountics researchers need only feed purchased data into the hopper of statistical and mathematical analysis programs. It still takes a lot of knowledge to formulate hypotheses and to understand the complex models, but the really hard work of collecting and error-checking data is avoided by purchasing data.
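A minimal sketch in Python (using pandas and a hypothetical purchased file named purchased_panel.csv with hypothetical column names) of the kind of basic error checking that gets skipped when purchased data are simply fed into the hopper:

    # Basic sanity checks on a purchased panel data set (hypothetical file and columns).
    import pandas as pd

    df = pd.read_csv("purchased_panel.csv")

    # 1. Duplicate firm-year observations quietly inflate sample sizes and t-statistics.
    print("duplicate firm-years:", df.duplicated(subset=["firm_id", "fiscal_year"]).sum())

    # 2. Missing values are rarely missing at random in commercial databases.
    print(df[["total_assets", "net_income"]].isna().mean())

    # 3. Impossible values (e.g., negative total assets) signal vendor coding errors.
    print("negative total assets:", (df["total_assets"] < 0).sum())

    # 4. Extreme outliers can drive an entire regression; inspect them before winsorizing.
    print(df["net_income"].describe(percentiles=[0.01, 0.99]))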

"Some Thoughts About Accounting Scholarship," by Joel Demski, AAA President's Message, Accounting Education News, Fall 2001
http://aaahq.org/pubs/AEN/2001/Fall2001.pdf 

. . .

A second indicator is our journals. They have proliferated in number. But we struggle with an intertemporal sameness, with incremental as opposed to discontinuous attempts to move our thinking forward, and with referee intrusion and voyeurism. Value relevance is a currently fashionable approach to identifying statistical regularities in the financial market arena, just as a focus on readily observable components of compensation is a currently fashionable dependent variable in the compensation arena. Yet we know measurement error abounds, that other sources of information are both present and hardly unimportant, that compensation is broad-based and intertemporally managed, and that compensating wage differentials are part of the stew. Yet we continue on the comfortable path of sameness.

It has been well over a year since the Pathways Report was issued. Nobody is listening on the AECM or anywhere else! Sadly, the accountics researchers who generate this stuff won't even discuss their research on the AECM or the AAA Commons:

"Frankly, Scarlett, after I get a hit for my resume in The Accounting Review I just don't give a damn"
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
One more mission in what's left of my life will be to try to change this
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm

 


 

Bob Jensen's threads on validity testing in accountics science ---
http://www.trinity.edu/rjensen/TheoryTAR.htm

How did academic accounting research become a pseudo science?
http://www.trinity.edu/rjensen/theory01.htm#WhatWentWrong



Gasp! How could an accountics scientist question such things? This is sacrilege!
Let me end my remarks with a question: Have Ball and Brown (1968)—and Beaver (1968) for that matter, if I can bring Bill Beaver into it—have we had too much influence on the research agenda to the point where other questions and methods are being overlooked?
Phil Brown of Ball and Brown Fame

"How Can We Do Better?" by Phillip R. Brown (of Ball and Brown Fame), Accounting Horizons (Forum on the State of Accounting Scholarship), December 2013 ---
http://aaajournals.org/doi/full/10.2308/acch-10365
Not Free

Philip R. Brown AM is an Honorary Professor at The University of New South Wales and Senior Honorary Research Fellow at The University of Western Australia.

I acknowledge the thoughtful comments of Sudipta Basu, who arranged and chaired this session at the 2012 American Accounting Association (AAA) Annual Meeting, Washington, DC.

The video presentation can be accessed by clicking the link in Appendix A.

Corresponding author: Philip R. Brown AM. Email:

When Sudipta Basu asked me whether I would join this panel, he was kind enough to share with me the proposal he put to the conference organizers. As background to his proposal, Sudipta had written:

Analytical and empirical researchers generate numerous results about accounting, as do logicians reasoning from conceptual frameworks. However, there are few definitive tests that permit us to negate propositions about good accounting.

This panel aims to identify a few “most wrong” beliefs held by accounting experts—academics, regulators, practitioners—where a “most wrong” belief is one that is widespread and fundamentally misguided about practices and users in any accounting domain.

While Sudipta's proposal resonated with me, I did wonder why he asked me to join the panel, and whether I am seen these days as just another “grumpy old man.” Yes, I am no doubt among the oldest here today, but grumpy? You can make up your own mind on that, after you have read what I have to say.

This essay begins with several gripes about editors, reviewers, and authors, along with suggestions for improving the publication process for all concerned. The next section contains observations on financial accounting standard setting. The essay concludes with a discussion of research myopia, namely, the unfortunate tendency of researchers to confine their work to familiar territory, much like the drunk who searches for his keys under the street light because “that is where the light is.”



 
ON EDITORS AND REVIEWERS, AND AUTHORS

I have never been a regular editor, although I have chaired a journal's board of management and been a guest editor, and I appointed Ray Ball to his first editorship (Ray was the inaugural editor of the Australian Journal of Management). I have, however, reviewed many submissions for a whole raft of journals, and written literally hundreds of papers, some of which have been published. As I reflect on my involvement in the publications process over more than 50 years, I do have a few suggestions on how we can do things better. In the spirit of this panel session, I have put my suggestions in the form of gripes about editors, reviewers, and authors.

One-eyed editors—and reviewers—who define the subject matter as outside their journal's interests are my first gripe; and of course I except journals with a mission that is stated clearly and in unequivocal terms for all to see. The best editors and the best reviewers are those who are open-minded, and who avoid prejudging submissions by reference to some particular set of questions or modes of thinking that have become popular over the last five years or so. Graeme Dean, former editor of Abacus, and Nick Dopuch, former editor of the Journal of Accounting Research, are fine examples, from years gone by, of what it means to be an excellent editor.

Editors who are reluctant to entertain new ways of looking at old questions are a second gripe. Many years ago I was asked to review a paper titled “The Last Word on …” (I will not fill in the dots because the author may still be alive.) But at the time I thought, what a strange title! Can any academic reasonably believe they are about to have the last say on any important accounting issue? We academics thrive on questioning previous works, and editors and their reviewers do well when they nurture this mindset.

My third gripe concerns editors who, perhaps unwittingly, send papers to reviewers with vested interests and the reviewers do not just politely return the paper to the editor and explain their conflict of interest. A fourth concerns editors and reviewers who discourage replications: their actions signal a disciplinary immaturity. I am referring to rejecting a paper that repeats an experiment, perhaps in another country, purely because it has been done before. There can be good reasons for replicating a study, for example if the external validity of the earlier study legitimately can be questioned (perhaps different outcomes are reasonably expected in another institutional setting), or if methodological advances indicate a likely design flaw. Last, there are editors and reviewers who do not entertain papers that fail to reject the null hypothesis. If the alternative is well-reasoned and the study is sound, and they can be big “ifs,” then failure to reject the null can be informative, for it may indicate where our knowledge is deficient and more work can be done.1
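A minimal illustration in Python (simulated, hypothetical data; not from any study Brown cites) of why a well-powered failure to reject the null can be informative: the confidence interval, not just the p-value, tells readers how large an effect the data can rule out.

    # Hypothetical simulation: a "null" result from a large sample still bounds
    # the plausible effect size through its confidence interval.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    treated = rng.normal(loc=0.02, scale=1.0, size=5000)   # tiny true effect
    control = rng.normal(loc=0.00, scale=1.0, size=5000)

    t_stat, p_value = stats.ttest_ind(treated, control)
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / treated.size + control.var(ddof=1) / control.size)
    ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se

    print(f"p-value = {p_value:.3f}")
    print(f"95% CI for the difference: [{ci_low:.3f}, {ci_high:.3f}]")
    # A confidence interval that hugs zero says any effect is economically trivial --
    # an informative finding even though the null is not rejected.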

It is not only editors and reviewers who test my emotional state. I do get a bit short when I review papers that fail to appreciate that the ideas they are dealing with have long yet uncited histories, sometimes in journals that are not based in North America. I am particularly unimpressed when there is an all-too-transparent and excessive citation of works by editors and potential reviewers, as if the judgments of these folks could possibly be influenced by that behavior. Other papers frustrate me when they are technically correct but demonstrate the trivial or the obvious, and fail to draw out the wider implications of their findings. Then there are authors who rely on unnecessarily coarse “control” variables which, if measured more finely, may well threaten their findings.2 Examples are dummy variables for common law/code law countries, for “high” this and “low” that, for the presence or absence of an audit/nomination/compensation committee, or the use of an industry or sector variable without saying which features of that industry or sector are likely to matter and why a binary representation is best. In a nutshell, I fear there may be altogether too many dummies in financial accounting research!
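Brown's worry about coarse “control” variables can be illustrated with a minimal simulation (hypothetical data, Python with statsmodels): controlling for a continuous confounder with a high/low median-split dummy leaves the coefficient of interest biased, while controlling for the variable itself does not.

    # Hypothetical simulation: a median-split dummy is a coarse control and
    # leaves residual confounding; the continuous control removes it.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 10_000
    size = rng.normal(size=n)                        # continuous confounder (say, firm size)
    x = 0.8 * size + rng.normal(size=n)              # regressor of interest, correlated with size
    y = 1.0 * size + rng.normal(size=n)              # true effect of x on y is zero

    high = (size > np.median(size)).astype(float)    # coarse "high/low" dummy

    coarse = sm.OLS(y, sm.add_constant(np.column_stack([x, high]))).fit()
    fine = sm.OLS(y, sm.add_constant(np.column_stack([x, size]))).fit()

    print("coef on x, median-split dummy:", round(coarse.params[1], 3))  # biased away from zero
    print("coef on x, continuous control:", round(fine.params[1], 3))    # close to zero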

Finally, there are the International Financial Reporting Standards (IFRS) papers that fit into the category of what I describe as “before and after studies.” They focus on changes following the adoption of IFRS promulgated by the London-based International Accounting Standards Board (IASB). A major concern, and I have been guilty too, is that these papers, by and large, do not deal adequately with the dynamics of what has been for many countries a period of profound change. In particular, there is a trade-off between (1) experimental noise from including too long a “before” and “after” history, and (2) not accommodating the process of change, because the “before” and “after” periods are way too short. Neither do they appear to control convincingly for other time-related changes, such as the introduction of new accounting and auditing standards, amendments to corporations laws and stock exchange listing rules, the adoption of corporate governance codes of conduct, more stringent compliance monitoring and enforcement mechanisms, or changes in, say, stock market liquidity as a result of the introduction of new trading platforms and protocols, amalgamations among market providers, the explosion in algorithmic trading, and the increasing popularity among financial institutions of trading in “dark pools.”
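To make the before/after concern concrete, here is a minimal simulation (hypothetical numbers, Python with statsmodels): a secular time trend that has nothing to do with adoption loads onto the adoption dummy unless the trend is modelled.

    # Hypothetical "before and after" adoption study with an unrelated secular trend.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    years = np.arange(2000, 2012)
    post = (years >= 2005).astype(float)          # hypothetical adoption year
    trend = 0.05 * (years - years[0])             # unrelated secular improvement
    outcome = trend + 0.0 * post + rng.normal(scale=0.02, size=years.size)

    naive = sm.OLS(outcome, sm.add_constant(post)).fit()
    detrended = sm.OLS(outcome, sm.add_constant(np.column_stack([post, years - years[0]]))).fit()

    print("naive before/after 'effect':", round(naive.params[1], 3))        # spuriously large
    print("'effect' after adding a trend:", round(detrended.params[1], 3))  # near zero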



 
ON FINANCIAL ACCOUNTING STANDARD SETTING

I count a number of highly experienced financial accounting standard setters among my friends and professional acquaintances, and I have great regard for the difficulties they face in what they do. Nonetheless, I do wonder


. . .

 
ON RESEARCH MYOPIA

A not uncommon belief among academics is that we have been or can be a help to accounting standard setters. We may believe we can help by saying something important about whether a new financial accounting standard, or set of standards, is an improvement. Perhaps we feel this way because we have chosen some predictive criterion and been able to demonstrate a statistically reliable association between accounting information contained in some database and outcomes that are consistent with that criterion. Ball and Brown (1968, 160) explained the choice of criterion this way: “An empirical evaluation of accounting income numbers requires agreement as to what real-world outcome constitutes an appropriate test of usefulness.” Note their reference to a requirement to agree on the test. They were referring to the choice of criterion being important to the persuasiveness of their tests, which were fundamental and related to the “usefulness” of U.S. GAAP income numbers to stock market investors 50 years ago. As time went by and the financial accounting literature grew accordingly, financial accounting researchers have looked in many directions for capital market outcomes in their quest for publishable results.

Research on IFRS can be used to illustrate my point. Those who have looked at the consequences of IFRS adoption have mostly studied outcomes they believed would interest participants in equity markets and, to a lesser extent, parties to debt contracts. Many beneficial outcomes have now been claimed,4 consistent with benefits asserted by advocates of IFRS. Examples are more comparable accounting numbers; earnings that are higher “quality” and less subject to managers' discretion; lower barriers to international capital flows; improved analysts' forecasts; deeper and more liquid equity markets; and a lower cost of capital. But the evidence is typically coarse in nature; and so often the results are inconsistent because of the different outcomes selected as tests of “usefulness,” or differences in the samples studied (time periods, countries, industries, firms, etc.) and in research methods (how models are specified and variables measured, which estimators are used, etc.). The upshot is that it can be difficult if not impossible to reconcile the many inconsistencies, and for standard setters to relate reported findings to the judgments they must make.
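A minimal “specification sensitivity” sketch (hypothetical simulated data, Python) of the point: the same basic question, re-run under a handful of defensible sample and specification choices, can yield a spread of estimates rather than one result.

    # Hypothetical specification-sensitivity check: one question, several
    # defensible design choices, a spread of estimates rather than one "result".
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    n = 2_000
    size = rng.normal(size=n)                                    # e.g., firm size
    adopt = (0.7 * size + rng.normal(size=n) > 0).astype(float)  # self-selected adoption
    quality = 0.10 * adopt + 0.5 * size + rng.normal(size=n)     # true adoption effect = 0.10

    estimates = {}
    for label, keep, use_size in [
        ("full sample, no controls", np.ones(n, dtype=bool), False),
        ("full sample, size control", np.ones(n, dtype=bool), True),
        ("large firms only, no controls", size > 0, False),
    ]:
        X = np.column_stack([adopt[keep], size[keep]]) if use_size else adopt[keep]
        fit = sm.OLS(quality[keep], sm.add_constant(X)).fit()
        estimates[label] = round(fit.params[1], 3)

    print(estimates)   # the estimated "adoption effect" moves with each choice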

Despite the many largely capital market outcomes that have been studied, some observers of our efforts must be disappointed that other potentially beneficial outcomes of adopting IFRS have largely been overlooked. Among them are the wider benefits to an economy that flow from EU membership (IFRS are required),5 or access to funds provided by international agencies such as the World Bank, or less time spent by CFOs of international companies when comparing the financial performance of divisions operating in different countries and on consolidating the financial statements of foreign subsidiaries, or labor market benefits from more flexibility in the supply of professionally qualified accountants, or “better” accounting standards from pooling the skills of standard setters in different jurisdictions, or less costly and more consistent professional advice when accounting firms do not have to deal with as much cross-country variation in standards and can concentrate their high-level technical skills, or more effective compliance monitoring and enforcement as regulators share their knowledge and experience, or the usage of IFRS by “millions (of small and medium enterprises) in more than 80 countries” (Pacter 2012), or in some cases better education of tomorrow's accounting professionals.6 I am sure you could easily add to this list if you wished.

In sum, we can help standard setters, yes, but only in quite limited ways.7 Standard setting is inherently political in nature and will remain that way as long as there are winners and losers when standards change. That is one issue. Another is that the results of capital markets studies are typically too coarse to be definitive when it comes to the detailed issues that standard setters must consider. A third is that accounting standards have ramifications extending far beyond public financial markets and a much more expansive view needs to be taken before we can even hope to understand the full range of benefits (and costs) of adopting IFRS.

Let me end my remarks with a question: Have Ball and Brown (1968)—and Beaver (1968) for that matter, if I can bring Bill Beaver into it—have we had too much influence on the research agenda to the point where other questions and methods are being overlooked?

February 27, 2014 Reply from Paul Williams

Bob,
If you read that last Horizons section provided by "thought leaders" you realize the old guys are not saying anything they could not have realized 30 years ago. That they didn't realize it then (or did, but it was not in their interest to say so), which led them to run journals whose singular purpose seemed to be to enable them and their cohorts to create politically correct academic reputations, is not something to ask forgiveness for at the end of your career.

Like the sinner on his deathbed asking for God's forgiveness, now is a hell of a time to suddenly get religion. If you heard these fellows speak when they were young, they certainly didn't speak with voices that adumbrated any doubt that what they were doing was rigorous research and anyone doing anything else was the intellectual hoi polloi.

Oops, sorry we created an academy that all of us now regret, but, hey, we got ours. It's our mess, but now we are telling you it's a mess you have to clean up. It isn't like no one was saying these things 30 years ago (you were, as well as others, including yours truly), and we have intimate knowledge of how we were treated by these geniuses.




Shielding Against Validity Challenges in Plato's Cave ---
http://www.trinity.edu/rjensen/TheoryTAR.htm

Common Accountics Science and Econometric Science Statistical Mistakes ---
http://www.cs.trinity.edu/~rjensen/temp/AccounticsScienceStatisticalMistakes.htm

The Cult of Statistical Significance: How Standard Error Costs Us Jobs, Justice, and Lives ---
http://www.cs.trinity.edu/~rjensen/temp/DeirdreMcCloskey/StatisticalSignificance01.htm

How Accountics Scientists Should Change: 
"Frankly, Scarlett, after I get a hit for my resume in The Accounting Review I just don't give a damn"
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm
One more mission in what's left of my life will be to try to change this
http://www.cs.trinity.edu/~rjensen/temp/AccounticsDamn.htm 

What went wrong in accounting/accountics research?  ---
http://www.trinity.edu/rjensen/theory01.htm#WhatWentWrong

The Sad State of Accountancy Doctoral Programs That Do Not Appeal to Most Accountants ---
http://www.trinity.edu/rjensen/theory01.htm#DoctoralPrograms