This paper proposes a template for modelling complex datasets that
integrates traditional statistical modelling approaches with more
recent advances in statistics and modelling through an exploratory
framework. Our approach builds on the long-standing idea of
`good practice in statistics' by establishing a comprehensive
modelling framework that focuses on exploration, prediction,
interpretation and reliability assessment, the last being a relatively
new idea that allows each prediction to be assessed individually.
¶
The integrated framework we present comprises two stages. The first
involves the use of exploratory methods to help visually understand
the data and identify a parsimonious set of explanatory variables.
The second encompasses a two-step modelling process, in which we
promote the use of non-parametric methods such as decision trees and
generalized additive models to identify important variables and the
form of their relationship with the response before a final
predictive model is considered. We focus on fitting the predictive
model using parametric, non-parametric and Bayesian approaches.
¶
This paper is motivated by a medical problem: developing a risk
stratification system for morbidity in 1,710 cardiac patients given
a suite of demographic, clinical and preoperative variables.
Although we apply the methods specifically to this case study, they
can be applied in any field, irrespective of the type of response.