John Libbey Eurotext

Environnement, Risques & Santé


Completing partial non-responses in a very hierarchical questionnaire: An original semiautomatic method

Volume 9, issue 3, May-June 2010



Union des caisses nationales de sécurité sociale, 18 avenue Léon Gaumont, 75980 Paris, France; Observatoire de la qualité de l’air intérieur, Centre scientifique et technique du bâtiment (CSTB), 84, av Jean Jaurès, Champs sur Marne, 77447 Marne la Vallée, France
  • Key words: bias, data bases, imputation, interpretation, statistical, missing values, questionnaires, surveys, statistics
  • DOI: 10.1684/ers.2010.0345
  • Page(s): 223-30
  • Published in: 2010

Partial non-response in surveys (when a respondent answers some but not all of the questions) is inevitable and leaves missing values in the resulting database. These missing data can introduce bias, distort statistical analyses, reduce the precision of estimators, and even rule out the most common methods of multidimensional statistical analysis. Dealing with partial non-response is therefore essential, although complex. The 2003-2005 homes survey of the Indoor Air Quality Observatory (OQAI) was no exception. Although the proportion of missing answers was low, less than 1% of the collected data, this still amounted to nearly 7000 values scattered throughout the database. Most variables had at least one missing value, and every home in the sample was affected. A procedure for filling in the missing values by statistical imputation was chosen, developed and implemented. It supports two modes: a quasi-deterministic imputation, used to test the approach and evaluate its robustness, and a random imputation. The procedure uses statistically established links between the answers to different variables, identified with a “tree model” method, and has the great advantage of respecting the hierarchical sequence of the questions.
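To make the idea in the abstract more concrete, here is a minimal Python sketch of imputing missing answers from the answers of comparable respondents, in questionnaire order. It is not the authors' procedure: a single split on one predictor question stands in for a fitted tree model, and the function names (`leaf_distributions`, `impute`, `impute_in_order`), the question names and the toy data are all assumptions made for illustration. The "deterministic" mode takes the most frequent answer in the matching group, loosely mirroring the quasi-deterministic imputation, while the "random" mode draws from the group's observed answers.

```python
import random
from collections import Counter, defaultdict

MISSING = None  # assumption: a missing answer is stored as None


def leaf_distributions(rows, target, predictor):
    """Count observed target answers within each predictor answer.

    A single split on one predictor is a stand-in for a real tree model.
    """
    leaves = defaultdict(Counter)
    for row in rows:
        if row[target] is not MISSING and row[predictor] is not MISSING:
            leaves[row[predictor]][row[target]] += 1
    return leaves


def impute(rows, target, predictor, mode="deterministic", rng=None):
    """Fill missing target answers in place from the matching leaf."""
    rng = rng or random.Random(0)
    leaves = leaf_distributions(rows, target, predictor)
    for row in rows:
        if row[target] is not MISSING:
            continue
        counts = leaves.get(row[predictor])
        if not counts:
            continue  # no comparable respondents: leave the gap
        if mode == "deterministic":
            # most frequent observed answer in the leaf
            row[target] = counts.most_common(1)[0][0]
        else:
            # random draw weighted by the leaf's observed frequencies
            answers = list(counts)
            weights = [counts[a] for a in answers]
            row[target] = rng.choices(answers, weights=weights, k=1)[0]
    return rows


def impute_in_order(rows, plan, mode="deterministic", rng=None):
    """Apply (target, predictor) steps in questionnaire order, so a filter
    question is filled before the questions that depend on it."""
    for target, predictor in plan:
        impute(rows, target, predictor, mode=mode, rng=rng)
    return rows


# Hypothetical toy data: "garage_attached" only applies when "has_garage" is "yes".
rows = [
    {"dwelling": "house", "has_garage": "yes", "garage_attached": "yes"},
    {"dwelling": "house", "has_garage": "yes", "garage_attached": "yes"},
    {"dwelling": "house", "has_garage": None, "garage_attached": None},
    {"dwelling": "flat", "has_garage": "no", "garage_attached": "n/a"},
    {"dwelling": "flat", "has_garage": None, "garage_attached": None},
]
plan = [("has_garage", "dwelling"), ("garage_attached", "has_garage")]
impute_in_order(rows, plan)
```

Processing the plan in questionnaire order is what keeps the imputed answers consistent with the hierarchy in this sketch: the filter question is completed first, and the dependent question is then imputed only from respondents who gave the same answer to that filter.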