Serbian Journal of Management
2017, vol. 12, iss. 1, pp. 157-169
High-dimensional data in economics and their (robust) analysis
Jan Kalina
Institute of Computer Science, Czech Academy of Sciences & Institute of Information Theory and Automation, Czech Academy of Sciences, Czech Republic
e-mail: kalina@cs.cas.cz
Abstract
This work is devoted to statistical methods for the analysis of economic data with a large number of variables. The paper reviews references documenting that such data are increasingly common in various theoretical and applied economic problems and that their analysis can hardly be performed with standard econometric methods. The paper focuses on high-dimensional data with a small number of observations and gives an overview of recently proposed methods for their analysis in the context of econometrics, particularly in the areas of dimensionality reduction, linear regression and classification analysis. Further, the performance of various methods is illustrated on a publicly available benchmark data set on credit scoring. In contrast to other authors, the paper also uses robust methods, which are designed to be insensitive to the presence of outlying measurements. Their strength becomes apparent after the original data are artificially contaminated with noise. In addition, the performance of various methods for a prior dimensionality reduction of the data is compared.
This work is licensed under a Creative Commons Attribution 4.0 License.
Keywords
Econometrics, high-dimensional data, dimensionality reduction, linear regression, classification analysis, robustness.