BIG DATA STATISTICS FOR BUSINESS (MOD.I)
The course aims to introduce the main features of Big Data. It has an applied focus, and the theoretical level of the topics is aimed at understanding the practical aspects of the applications. The course focuses on the use of modern statistical learning techniques for forecasting and classification in business and economic contexts. In particular, the main descriptive statistics and the most popular graphical methods of exploratory data analysis are described. The introduction to the methodology is accompanied by a practical guide to the use of the statistical software R. Computer-based exercises complement the material discussed in class.
At the end of the course, the student will be able to:
- analyse a multivariate data set
- build models to analyse and test hypotheses of interest and make predictions
- understand and critically assess the implications of the results of an analysis and of the modelling component present in the economic and business literature.
The learning objectives of the course can be broken down as follows:
Knowledge and understanding:
The student will be able to understand the main features of a dataset and to implement specific analyses to detect possible anomalies. Moreover, he/she will know the methodologies for analysing Big Data according to the target of the analysis. He/she is also expected to know how to communicate the results of the analysis.
Ability to apply knowledge and understanding:
The student will be able to identify the most suitable method of statistical analysis for solving the proposed problem and achieving the required objectives. He/she will also be able to present the analysis carried out by producing summary reports including tables and graphs that can be transferred to third parties. At the end of the course, the student will also be able to identify actions that the analysis suggests should be implemented.
Autonomy of judgement:
Once the main models of data analysis have been learned, the student will be able to understand the limits and areas of application of each of them. In this way, the student will be able to make critical use of the tools learned.
Communication skills: the student must be able to communicate in a clear, coherent and exhaustive way the characteristics of the data, the methods of analysis and the results obtained. This ability cannot be separated from the correct use of vocabulary and from the conciseness needed to communicate the results achieved by the analyses conducted.
Learning skills: the student must demonstrate a good learning ability by being able to deepen his/her knowledge and interpretation skills on the basis of further relevant specialist texts. The student must demonstrate that he/she is able to apply statistical methods correctly and that he/she is familiar with the methods of analysis. He/she will also be able to extend his/her knowledge by following the evolution of the discipline.
The course requires skills acquired in a basic statistics course (descriptive statistics, inferential statistics, simple linear regression).
The main quantitative methods for understanding, measuring and analysing data useful for studying markets, the social and cultural environment and economic factors will be introduced.
After illustrating the basic elements of Statistics, we will proceed with the study and application of modelling for complex data. The supervised and unsupervised learning methods useful for analysing markets and context factors will be examined in depth. During the course, the analysis methodologies illustrated will be implemented with the help of statistical software.
The basics of Big Data: the five V’s. Structured and unstructured data. Exploratory data analysis for Big Data. Data visualization. Data handling. Supervised learning techniques: multiple, polynomial and spline regression. Classification and regression trees.
Unsupervised learning techniques in big data: dimensionality reduction and cluster analysis.
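As a rough illustration of the techniques listed above, the following R sketch fits multiple, polynomial and spline regressions and then applies principal component analysis and k-means clustering. It is only a minimal sketch under stated assumptions: the built-in `mtcars` and `USArrests` data sets stand in for the business data used in class, and the choice of variables, the spline degrees of freedom and the number of clusters are arbitrary choices made for illustration, not part of the course material. Classification and regression trees are left out because they usually rely on an add-on package such as `rpart`.

```r
# Illustrative sketch only: built-in data sets stand in for course data.
library(splines)  # distributed with R; provides bs() for B-spline bases

# --- Supervised learning: multiple, polynomial and spline regression ---
fit_multiple <- lm(mpg ~ hp + wt, data = mtcars)         # two predictors
fit_poly     <- lm(mpg ~ poly(hp, 2), data = mtcars)     # quadratic in hp
fit_spline   <- lm(mpg ~ bs(hp, df = 4), data = mtcars)  # cubic B-spline in hp

# Compare the in-sample fit of the three models via adjusted R-squared
adj_r2 <- sapply(
  list(multiple = fit_multiple, polynomial = fit_poly, spline = fit_spline),
  function(m) summary(m)$adj.r.squared
)
print(round(adj_r2, 3))

# --- Unsupervised learning: dimensionality reduction and clustering ---
# PCA on standardised variables; share of variance explained per component
pca <- prcomp(USArrests, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# k-means on the standardised data (3 clusters is an arbitrary choice)
set.seed(1)  # k-means uses random starting centres
km <- kmeans(scale(USArrests), centers = 3, nstart = 20)
print(table(km$cluster))
```

Adjusted R-squared is used here only as a quick in-sample comparison; for forecasting purposes a comparison on held-out data would normally be more informative.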
The course adopts a variety of teaching methods:
- classroom lectures and laboratory exercises applying the techniques studied to economic and business data sets, to acquire a basic knowledge of statistical techniques.
- presentation and discussion of real cases, aimed at developing the ability to interpret the methods and to implement the techniques in practice.
- preparation of an individual or group project: the general objective is to apply the analysis techniques discussed during the classroom lessons to a real data set of interest to the working group. Data for the project can be obtained from Internet sites or developed by the students.
Attending students. The project is prepared in a group (maximum 4 people). During the course, datasets will be assigned for the project and guidance will be given on the preparation of a final report. Groups can be formed freely. Those who have difficulty forming a group should communicate their name via email (
Non-attending students. It is necessary to prepare a project proposal of at most one page (indicatively at least one month before the exam) that contains:
- The names of the members of the group (only if the project is not individual)
- Description of the project (objectives, steps for its completion, etc.)
- Description of a data set (dimensions, variable names with their description) that will be used for the project.
Each group (or student) is invited to discuss with the teacher any difficulties that may arise in finding the data, preparing the proposal, identifying a suitable technique, carrying out the analysis or drafting the report.
One week before the exam the project must be delivered to
. It includes:
- the report, containing a summary of the results (maximum 4 pages).
- the R code, briefly commented.
The final grade is expressed out of 30, with a maximum grade of 30/30. Honors are assigned at the discretion of the teacher, as a distinctive mark of excellence in the work performed.
At the end of the course, the examination will consist of an oral interview based on the content of the syllabus and will cover:
- Methodological aspects
- Discussion of the results of the previously submitted project.
The discussion of the methodological aspects accounts for 70% of the overall assessment and aims to test understanding of the theory and the ability to interpret in practice the results of the methodologies discussed in the course.
The final group report accounts for 30% of the overall assessment.
The examination is passed with a minimum score of 18/30. In order to obtain the maximum score, the student will have to demonstrate, in addition to an excellent knowledge of the proposed methods and a thorough interpretation of the results of the project, a correct use of the specialist vocabulary.
Sedkaoui, Data Analytics and Big Data, Wiley, 2018.
James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2013 (a legally downloadable pdf copy of the full text is available at https://www.statlearning.com/).
Lecture notes, support material provided on the e-learning platform.