Nov. 15, 2013
I would like to address Director Cordray's statement that the CFPB has a "proven statistical methodology" to measure alleged patterns of discrimination in the financing of auto purchases.
It appears that the CFPB has borrowed its statistical method from the healthcare community, which also uses proxy measures to determine ethnic participation at healthcare facilities. The technology of choice is Bayesian analysis.
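To make concrete what a Bayesian proxy measure involves, here is a minimal sketch of Bayes' rule applied to two pieces of indirect evidence. It is not the CFPB's algorithm, which has not been disclosed. The group labels, the function name, and every number below are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Bayes' rule: combine a prior with a likelihood and renormalize.

    prior[g]      -- assumed P(group g | first piece of evidence, e.g. a surname)
    likelihood[g] -- assumed P(second piece of evidence, e.g. an address | group g)
    Returns P(group g | both pieces of evidence) for every group g.
    """
    joint = {g: prior[g] * likelihood[g] for g in prior}
    total = sum(joint.values())
    if total == 0:
        raise ValueError("the evidence has zero probability under every group")
    return {g: joint[g] / total for g in joint}


# Hypothetical inputs, invented for this example only.
surname_prior = {"group_a": 0.60, "group_b": 0.30, "group_c": 0.10}
area_likelihood = {"group_a": 0.02, "group_b": 0.01, "group_c": 0.05}

proxy = posterior(surname_prior, area_likelihood)
for group, probability in proxy.items():
    print(f"{group}: {probability:.2f}")
```

The output is a set of probabilities, not an observed fact about any individual borrower, and it moves with every input assumption. That is why the algorithm and the input data matter as much as the result.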
Bayesian analysis comes in many shades and many colors. The lament of model builders is that "all models are inaccurate; some are useful." One cannot assess the reliability of the output of the Bayesian analysis unless the CFPB discloses its underlying algorithm, the input data and the resultant outputs.
A discussion of the reliability of Bayesian analysis is timely because the current edition of Science states that Bayesian analysis could result in as many as one in four false positives, that is, concluding that a negative event is in fact a causal factor.
It demonstrates another lament of modelers: "If you torture data enough, it will speak to you."
To a degree, the CFPB has already spoken on these issues. More specifically, the CFPB has issued regulations that implement the Data Quality Act (DQA). The DQA required OMB to promulgate regulations setting forth the standards that data disseminated by all federal agencies must meet. These standards include transparency and reproducibility.
The DQA requires that each federal agency issue its own DQA (Information Quality) guidelines to conform with those issued by OMB.
With respect to "transparency" the CFPB regulations state:
"The Bureau will make both original and supporting data and the source of the data available to the public."
With respect to "reproducibility" the CFPB regulations state:
"Bureau will strive to ensure that statistical and financial data disseminated by the Bureau is capable of being substantially reproduced by an independent evaluator, subject to some degree of imprecision."