ARTICLE
Author(s): Cédric Duboudin (1, 2)
(1) Agence française de sécurité sanitaire de l’environnement et du travail (Afsset), 253 avenue Général Leclerc, 94700 Maisons Alfort
(2) Union des caisses nationales de sécurité sociale, 18 avenue Léon Gaumont, 75980 Paris
Article received 7 May 2009, accepted 16 July 2009
From October 2003 through December 2005, the Indoor Air
Quality Observatory (OQAI, Observatoire de la qualité de l’air
intérieur) conducted a national survey to measure air quality in
567 French homes randomly selected to be representative of the
24 million primary residences in mainland France. It measured
30 physical, chemical and biological pollutants [1-4].
The results of the measurements, pollutant by pollutant, were
presented in an OQAI report [5] at the end of 2006 and
published in mid-2007 [6]. This first stage demonstrated the
concentration, variability and distribution of each pollutant
within homes. The final purpose of the study, however, was to
describe the air quality of the homes not pollutant by pollutant
but overall, considering them all simultaneously [7, 8].
Accordingly, in a report published in the last issue [9], we
analysed the pollutants overall, specifically the statistical
correlation scores between the concentrations measured within the
homes. This analysis identified four groups of volatile organic
compounds (VOCs), correlated or very closely correlated to each
other. The first group of closely correlated VOCs contained the
aromatic hydrocarbons and the second group two aliphatic
hydrocarbons, linked to the first group. The third group was made
up of aldehydes and the fourth group of two halogenated
hydrocarbons.
The second part of this study considers the homes rather than
simply the pollutants. Here we divided the sample into groups of
comparable homes in terms of multiple pollution levels to
characterise the type of the chemical mixtures to which their
occupants are exposed. These types depend on the simultaneous
presence of several pollutants, at various levels. The statistical
analysis was carried out in two stages: the first phase
concentrated on the study of VOCs alone and the second phase on
introducing the other physical and biological agents identified
during the survey.
The background information about the methodology of the OQAI
national homes survey [1-4], the selection of the study parameters
and the construction of the data tables for analysis are described
in part I [5]. All the data used come from the OQAI’s
2003-2005 national home survey database.1
Material and Method
The purpose of the analysis described here was to identify
homogenous sets of homes in terms of global pollution. It was
divided into two stages: the first concerned only VOCs; the
remaining pollutants were introduced during the second stage.
Study of the VOCs alone
Pre-processing of the data
The first phase of this analysis is based on the data table
comprising the 532 homes with no missing values for any of the
18 VOCs (table 1) [5, 6].
Here we studied the concentration values themselves, rather than the
ranks that were essential for the pollutant study [9]. Ranks are no
longer relevant here, because the transformation into ranks hides the
real differences in concentration levels.
To minimise the influence of extreme values (several orders of
magnitude above the median level), the concentration values were
limited to twice the 95th percentile2 of the sample for
each VOC (table 1): any value which
exceeded this limit was replaced by the limit in question. The data
were therefore doubly censored, for low values by the metrologic
limitations (the detection and quantification limits) and for high
values for the needs of the statistical analysis.
Capping the highest values in this way can group together homes
that are not in fact very close. Without the cap, however, the
standardisation would compress nearly all the data towards zero to
the benefit of a tiny minority of extreme values.
The actual concentration values were then
standardised3 to a value from 0 to 1 for each
VOC to ensure that each had the same weight in the analysis,
independently of the order of magnitude of the concentrations for
each. No hierarchy, relative to health or anything else, was
introduced.
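The capping and standardisation steps described above can be sketched as follows. This is only an illustration: the array layout (one row per home, one column per VOC), the function name `cap_and_standardise` and the min-max rescaling are assumptions, and the exact standardisation formula used in the study may differ.

```python
import numpy as np

def cap_and_standardise(conc, cap_factor=2.0, pct=95):
    """Cap each column at cap_factor x its pct-th percentile, then rescale to [0, 1].

    conc: array of shape (n_homes, n_vocs) of raw concentrations (hypothetical layout).
    Returns the standardised array and the per-VOC upper limits.
    """
    conc = np.asarray(conc, dtype=float)
    # Upper limit per VOC: twice the 95th percentile of the sample.
    limits = cap_factor * np.percentile(conc, pct, axis=0)
    # Any value above the limit is replaced by the limit itself.
    capped = np.minimum(conc, limits)
    # Min-max rescaling (an assumption) so that every VOC has the same weight.
    lo = capped.min(axis=0)
    span = capped.max(axis=0) - lo
    span[span == 0] = 1.0  # constant column: avoid dividing by zero
    return (capped - lo) / span, limits
```

With this choice, a single extreme value no longer determines the scale of its VOC: it is mapped to 1 together with every other value at or above the limit.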
Table 1. Upper limits of the volatile organic compound (VOC)
concentrations selected for the statistical analysis of the
homes.
| Substances | Codes | (DL) QL in μg/m³ | Minimum | Median | 95th p | Maximum | Limits |
|---|---|---|---|---|---|---|---|
| benzene | c41 | (0.4) 1.1 | 0 | 2.01 | 7 | 23 | 14 |
| toluene | c44 | (0.4) 1.3 | 1.51 | 11.92 | 79.5 | 414 | 159 |
| m+p-xylene | c48 | (0.5) 1.5 | 0.75 | 5.48 | 36.5 | 233 | 73 |
| o-xylene | c50 | (0.2) 0.6 | 0 | 2.25 | 13.7 | 112 | 27 |
| 1,2,4-trimethylbenzene | c52 | (0.03) 0.1 | 0 | 4.04 | 21 | 112 | 42 |
| ethylbenzene | c47 | (0.3) 0.9 | 0 | 2.24 | 12.7 | 85 | 25 |
| styrene | c49 | (0.1) 0.3 | 0 | 0.93 | 2.8 | 35 | 6 |
| n-decane | c54 | (0.07) 0.2 | 0 | 5.37 | 53 | 1774 | 106 |
| n-undecane | c56 | (0.5) 1.4 | 0 | 6.2 | 63.8 | 502 | 128 |
| trichloroethylene | c43 | (0.4) 1 | 0 | 1.01 | 8.5 | 4087 | 17 |
| tetrachloroethylene | c45 | (0.4) 1.2 | 0 | 1.41 | 7.6 | 684 | 15 |
| 1,4-dichlorobenzene | c53 | (0.07) 0.2 | 0 | 4.11 | 244.1 | 4810 | 488 |
| formaldehyde | a21 | (0.6) 1.1 | 1.29 | 19.68 | 45.7 | 86 | 91 |
| acetaldehyde | a22 | (0.3) 0.4 | 2.09 | 11.46 | 31.1 | 95 | 62 |
| acrolein | a23 | (0.1) 0.3 | 0 | 1.08 | 3.4 | 13 | 7 |
| hexaldehyde | a24 | (0.1) 0.2 | 1.62 | 13.55 | 49.9 | 369 | 100 |
| 1-methoxy-2-propanol | c42 | (0.5) 1.8 | 0 | 0.9 | 17.8 | 115 | 36 |
| 2-butoxy ethanol | c51 | (0.4) 1.5 | 0 | 1.54 | 10.1 | 61 | 20 |
Data analysis
The identification and characterisation of the groups of homes
relatively similar in terms of VOC pollution was conducted in four
successive stages, as described below.
The first stage divided the sample relatively finely into
homogeneous groups of homes in terms of VOC pollution, by applying
Kohonen’s self-organising maps method [10, 11], based on the
principle of unsupervised neural networks. Increasingly used as a
tool for the analysis and visualisation of multidimensional data,
this classification method partitions a set of observations into
groups whose elements are similar in respect to the study variables
considered simultaneously. The “mean”4 level of each
study variable is calculated for each group, which is thus
characterised by these values. These groups are (most often)
organised in a flat structure of discrete neighbourhoods that takes
their proximity into account. This structure, the dimensions of
which are defined in advance based on the total number of
observations, is called a topological map.
This approach allows the study variables themselves to be used
for comparing and interpreting the groups. There is no projection
into a new space, as in methods such as principal component
analysis. Furthermore, the spatial discretisation enables the
metric to adapt locally to the structure of the data. This method
is therefore suitable for cases where the distributions of values
are skewed and encompass several orders of magnitude.
We then trained the map twice in sequence: the first run started
from a randomly initialised map and the second run started from the
result of the first. This procedure produces more detailed and stable
results.
In this case, a flat square map of 7 × 7 = 49 cells was
selected beforehand, based on the number of homes in the analysis
(532). This process allowed us to partition the sample into
49 groups of homes that were homogeneous in terms of air
quality for the 18 VOCs, considered simultaneously. These
49 groups were spread over the topological map according to
their proximity (figure 1). To
distinguish them from the final groups resulting from the
clustering of these initial 49 groups, the latter are called
subgroups.
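For readers unfamiliar with self-organising maps, the two-pass mapping on a 7 × 7 grid can be illustrated with the minimal NumPy sketch below. It is not the software used in the study: the function names, the learning-rate and neighbourhood schedules and the number of iterations are all assumptions chosen for brevity.

```python
import numpy as np

def train_som(data, rows=7, cols=7, n_iter=2000, init=None, sigma0=None, seed=0):
    """Train a small rectangular self-organising map (online algorithm).

    Returns the codebook, one prototype vector per map cell, shape (rows*cols, n_features).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Grid coordinates of each cell, used to measure proximity on the map.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    if init is None:
        codebook = rng.random((rows * cols, data.shape[1]))  # random initial map
    else:
        codebook = np.array(init, dtype=float)  # start from a previous map (copy)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    lr0 = 0.5
    for t in range(n_iter):
        frac = t / n_iter
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
        lr = lr0 * (1.0 - frac)                  # decreasing learning rate
        x = data[rng.integers(len(data))]
        # Best-matching unit: the cell whose prototype is closest to x.
        bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
        # Gaussian neighbourhood on the grid, centred on the best-matching unit.
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2.0 * sigma ** 2))
        codebook += lr * h[:, None] * (x - codebook)
    return codebook

def assign_to_cells(data, codebook):
    """Index of the closest prototype (map cell) for each observation."""
    data = np.asarray(data, dtype=float)
    d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)
```

A two-pass run would call `train_som(X)` first and then `train_som(X, init=first_map, sigma0=1.5)`, where the smaller starting radius in the second pass (again an assumption) refines the first map rather than reorganising it. Each cell's prototype plays the role of the “mean” profile of the corresponding subgroup.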
The mean levels of the standardised concentrations (between
0 and 1) of each VOC of each subgroup of homes were
represented simultaneously on the same spider chart table: each
spider chart corresponds to a subgroup of homes and each branch of
the web to a VOC. The length of the branch thus corresponds to the
average concentration of this VOC in the group of homes being
considered: the longer the branch of a VOC on a spider web, the
higher its concentration in a subgroup of homes represented by the
web, when compared with the other homes. The neighbourhood of the
spider webs on the diagram corresponds to that on the topological
map, and consequently to the proximity between each of the
subgroups.
In the second stage, some subgroups were clustered to reduce the
number of groups and to increase the number of homes in each. This
consolidation is mainly based on hierarchical ascending
classification under a contiguity constraint (HACCC), which makes it
possible to classify subgroups of homes with respect to their mean
VOC concentration levels while respecting the discrete neighbourhood
structure of the topological map [11, 12]. The contiguity constraint
can be stated as follows: only subgroups, or clusters of subgroups,
that are neighbours on the topological map (contiguous in a
vertical, horizontal or diagonal direction) can be aggregated. When
two subgroups are clustered, the neighbours of one become
neighbours of the other and vice versa.
This HACCC classification is based on Euclidean distance and
Ward’s variance clustering criterion5 [9]. It takes into
account the initial number of units (here, homes) contained in
each subgroup from the first topological mapping and thus respects
the logic of Ward’s clustering criterion, the calculation of which
depends on the size of each entity. When two clusters are merged,
their sizes are summed. This method was specifically
programmed for this study.
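The program written for the study is not reproduced in this article. As a rough sketch of the principle only, the code below merges, at each step, the pair of contiguous clusters whose fusion gives the smallest increase in within-cluster variance (Ward's criterion, weighted by the number of homes). The function names and data layout are hypothetical, and every subgroup is assumed to contain at least one home.

```python
import numpy as np

def grid_neighbours(rows, cols):
    """8-connected neighbourhood (vertical, horizontal, diagonal) of each map cell."""
    nb = {}
    for r in range(rows):
        for c in range(cols):
            nb[r * cols + c] = {
                rr * cols + cc
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            }
    return nb

def constrained_ward(centroids, sizes, neighbours, n_clusters):
    """Merge contiguous clusters by smallest Ward increase until n_clusters remain.

    centroids: mean profile of each subgroup; sizes: number of homes in each
    (assumed > 0); neighbours: dict cell -> set of contiguous cells.
    Returns a list of clusters, each a list of original subgroup indices.
    """
    cent = {i: np.asarray(c, dtype=float) for i, c in enumerate(centroids)}
    size = {i: float(s) for i, s in enumerate(sizes)}
    nb = {i: set(v) for i, v in neighbours.items()}
    members = {i: [i] for i in cent}
    while len(cent) > n_clusters:
        best = None
        for a in cent:
            for b in nb[a]:
                if a < b:  # examine each contiguous pair once
                    gap = np.sum((cent[a] - cent[b]) ** 2)
                    cost = size[a] * size[b] / (size[a] + size[b]) * gap
                    if best is None or cost < best[0]:
                        best = (cost, a, b)
        if best is None:  # no contiguous pair left to merge
            break
        _, a, b = best
        # The merged centroid is the size-weighted mean; the sizes are summed.
        cent[a] = (size[a] * cent[a] + size[b] * cent[b]) / (size[a] + size[b])
        size[a] += size[b]
        members[a] += members.pop(b)
        # Neighbours of one cluster become neighbours of the other.
        nb[a] = (nb[a] | nb[b]) - {a, b}
        for k in nb.pop(b):
            if k != a:
                nb[k].discard(b)
                nb[k].add(a)
        del cent[b], size[b]
    return list(members.values())
```

An additional constraint of the kind described in the next paragraph could be expressed in this sketch simply by editing the `neighbours` dictionary of the subgroup concerned before clustering.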
The analysis of the representation by spider charts and the
comparison of the pollutant distributions associated with each
subgroup were also taken into account for this consolidation. They
led to the introduction of an additional limitation on one of the
subgroups, the neighbourhood of which was specifically defined to
favour its clustering with certain contiguous subgroups rather than
others.
The third stage consisted of characterising the groups of homes
resulting from the HACCC with respect to their VOC concentrations and
determining on which VOCs the groups differ from one another.
This was done with a statistical test based on the test-value [13],
applied to the median through a jack-knife resampling procedure
[14]. The test estimated the probability p that the median
concentration of a VOC for a given group of homes was equal to
the median concentration for the entire set of homes in the
analysis (532) (two-sided test). It was applied to
each VOC and to each group. The lower this probability p, the
stronger the statistical link between the VOC considered and the
group. The VOCs selected as descriptors for each group are
those for which the link with the group is significant at the
threshold of 0.001 and those for which the ratio between the median
value of the group and the median value of the sample was greater than
1.5. Raw concentration values were used for this step, i.e., the
values were neither capped at the upper limits nor
standardised, so that this step could reflect the actual
concentrations in the groups.
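The exact form of the test-value statistic is given in the cited references and is not reproduced here. Purely as an illustration of the general idea, the sketch below estimates the standard error of a group median by the jackknife and converts the gap between the group median and the sample median into a two-sided p value under a normal approximation. Both the statistic and the normal approximation are assumptions, and the jackknife is known to behave poorly for medians in small groups, so this should not be read as the procedure actually applied.

```python
import math
import numpy as np

def jackknife_median_test(group, sample_median):
    """Two-sided test of 'group median == sample_median' (illustrative only).

    Returns (z, p): a standardised difference and its normal-approximation p value.
    """
    group = np.asarray(group, dtype=float)
    n = group.size
    diff = float(np.median(group) - sample_median)
    # Leave-one-out medians: recompute the median with each home removed in turn.
    loo = np.array([np.median(np.delete(group, i)) for i in range(n)])
    # Jackknife estimate of the standard error of the median.
    se = math.sqrt((n - 1) / n * float(np.sum((loo - loo.mean()) ** 2)))
    if se == 0.0:
        # Degenerate case: all leave-one-out medians are identical.
        return (0.0, 1.0) if diff == 0 else (math.copysign(math.inf, diff), 0.0)
    z = diff / se
    # Two-sided p value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Under the selection rule stated above, a VOC would be retained as a descriptor of a group when `p < 0.001`, or when the ratio of the group median to the sample median exceeds 1.5.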
In the final stage, we estimated the percentage of the national
housing stock represented by each of the groups obtained.
Methodological problems prevented consideration of the weightings
of the homes resulting from the adjustment [5, 6] in the analyses
with the Kohonen maps. These weightings were however used
afterwards in estimating the percentage represented by each group
at the national level. The adjustment was carried out on
567 homes. Because the classification of the homes was carried
out on a working sub-sample of 532 homes, the weights were
rescaled so that their total over the
532 homes equals the total number of French primary
residences. A 95% confidence interval for each of these
percentages was calculated with the Blyth and Still formula
[15].
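The rescaling of the weights and the resulting national percentages amount to the short calculation sketched below. The function name and arguments are hypothetical, and the Blyth and Still confidence interval is deliberately not implemented here.

```python
import numpy as np

def national_shares(weights, labels, national_total):
    """Estimated share of the national housing stock represented by each group.

    weights: survey weight of each home in the working sub-sample;
    labels: group of each home; national_total: number of primary residences.
    """
    weights = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)
    # Rescale so that the weights of the sub-sample sum to the national total.
    rescaled = weights * (national_total / weights.sum())
    return {
        g: float(rescaled[labels == g].sum() / national_total)
        for g in sorted(set(labels.tolist()))
    }
```

Because the weights are rescaled by a single factor, the estimated shares depend only on the relative weights of the homes within the sub-sample, and they sum to 1 across groups.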
Consideration of the other pollutants
The second stage of the analysis (taking into account the entire
set of pollutants; see table 2 and [5,
6]) identified the possible links between the groups of homes
obtained from the analysis of the VOCs and pollutants other than
the VOCs, using the statistical test on the median previously
described. A pollutant was considered to be associated with a
group of homes when the p value was below 0.01.
A Kohonen map using all of the pollutants simultaneously turned
out not to be relevant because the correlations observed between
the VOCs and the other pollutants in part I of this study (the
pollutant analysis) were so low [9]. Furthermore, the number of
missing values for the parameters other than the VOCs would have
limited this analysis to 147 homes.
Table 2. List of the other pollutants, excluding
volatile organic compounds (VOCs), measured in the OQAI
(Observatoire de la qualité de l’air intérieur) housing survey
(2003-2005) [5, 6] and considered in this study.
| Non-chemical pollutants | Codes | Units | Selected place of measurement |
|---|---|---|---|
| Dust mite allergens in dust, Der f 1 (QL=0.01) | aca11 | μg/g | Bedroom |
| Dust mite allergens in dust, Der p 1 (QL=0.02) | aca12 | μg/g | Bedroom |
| Cat allergens (Fel d 1) (QL=0.18) | alr31 | ng/m³ | Living room |
| Dog allergens (Can f 1) (QL=1.02) | alr32 | ng/m³ | Living room |
| Concentration of particles with a diameter < 10 microns (PM10) | PM10 | μg/m³ | Living room |
| Concentration of particles with a diameter < 2.5 microns (PM2.5) | PM2.5 | μg/m³ | Living room |
| Radon concentration (2 months) | rad.br, rad.lr | Bq/m³ | Bedroom and living room |
| Intensity of gamma radiation | gamma | μSv/h | Living room |
| Carbon monoxide: maximum of the moving average over 15 mins, 30 mins, 1 hour and 8 hours | CO, comax15, comax30, comax1h, comax8h | ppm | Living room |
| Mean relative humidity over the observation week | hum | % | Bedroom |
Results
Mapping the VOC pollution
The mapping of the VOC pollution in spider charts (figure 2) showed a
wide variety of webs and thus of various types of pollution within
the homes. A diagonal topological arrangement can be seen: at
the top left of the map, the webs have several long branches but
their number and size gradually tend to diminish down to the bottom
right of the map, where the webs are almost small points. The homes
presenting the highest concentrations of several VOCs
simultaneously are therefore positioned at the top left of the map
whereas the homes with the lowest concentration levels for the
entire set of VOCs are found at the bottom right.
Figure 2
also shows the presence of subgroups of homes schematically
definable as polluted by a single item: the corresponding spider
charts have one very predominant branch. The concentration of one
VOC overwhelms the others. Eight subgroups of “single-pollutant”
homes were identified. They corresponded to the following eight
VOCs: 1,4-dichlorobenzene, n-undecane, 1-methoxy-2-propanol,
styrene, trichloroethylene, tetrachloroethylene, 2-butoxyethanol
and formaldehyde.
The ratio between the length of the longest branch and the
length of the second longest branch was calculated for each spider
chart. Spider charts for the “single-pollutant” subgroups were
distinguished by the fact that one branch was at least twice as
long as any of the others. For the other subgroups, this ratio was
less than 2.
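The criterion just described is simple enough to state as code. In the sketch below, `profile` is assumed to hold the mean standardised concentrations (the branch lengths) of one subgroup; the function names are hypothetical.

```python
import numpy as np

def dominant_branch_ratio(profile):
    """Ratio of the longest branch to the second-longest branch of a spider chart."""
    second, longest = np.sort(np.asarray(profile, dtype=float))[-2:]
    if second == 0:
        # No second branch: infinite dominance, unless the chart is entirely flat.
        return np.inf if longest > 0 else 1.0
    return longest / second

def is_single_pollutant(profile, threshold=2.0):
    """True when one branch is at least `threshold` times as long as any other."""
    return dominant_branch_ratio(profile) >= threshold
```

For instance, a profile of (0.8, 0.1, 0.3) has a ratio of about 2.7 and would be classed as “single-pollutant”, whereas (0.5, 0.4, 0.1) has a ratio of 1.25 and would not.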
The HACCC conducted on the subgroups that were not considered
“single-pollutant” homes partitioned the topological map into six
groups (figure 3).
The first of these groups (at the top left on figure 3), made up of
4 subgroups, combined the 32 homes of the sample most
polluted by several VOCs simultaneously, in particular by the
aromatic hydrocarbons (group a). The second group contained the
13 homes most highly polluted by aliphatic hydrocarbons (b).
These two groups therefore brought together the homes that were the
most highly polluted by several VOCs. The opposite extreme on the
spider chart map, at the bottom right, grouped together the
161 least polluted homes of the sample, regardless of the VOC
considered (e).
Three other groups, linked by the HACCC classification, were
identified between these two extremes and therefore between these
two groups of homes typed according to pollution levels. In order
of decreasing pollution levels, a group of 54 homes (d) with
moderate levels of pollution, in particular for the aldehydes,
appeared first, followed by a group of 78 homes (c) also
presenting moderate pollution levels, this time for aromatic
hydrocarbons, and a group of 72 homes with lower levels of
pollution (e2) unassociated with any particular type of VOC.
Groups c and d are conceptually close, characterised as they are
by moderate levels for several VOCs simultaneously. They were not
directly linked by the HACCC, however, because the types of
pollution differed: aromatic hydrocarbon pollution was greater in
group c and aldehyde pollution in group d.
Group e2 was closer than any of the other groups to the
sample of 532 homes in terms of median concentrations. It was
mid-way between group e, which had median concentrations
significantly lower than those of the overall sample for virtually
all the VOCs, and groups c and d for which the concentrations were
significantly higher than those of the sample as a whole for
several VOCs at the same time. The main characteristic of
e2 is that it is the median group of the sample.
Overall, therefore, 14 groups were identified, which
could be classified into four major types:
housing with high levels of multiple pollutants (groups a and b),
with high levels of different single pollutants (f to m),
moderately polluted by multiple agents (c and d), and “slightly
polluted” (e and e2). It should be borne in mind that this study
adopts a relative point of view, for it compares the homes to one
another in terms of their VOC concentration, and that pollutant
toxicity has not been considered at all. The pollutants have not
been weighted in terms of the danger they present, nor have any
threshold toxicity values been considered. Thus the terms “highly”,
“moderately” and “slightly” polluted cannot be directly interpreted
from a health point of view; they are simple relative
concentrations within the overall set of homes. Similarly,
describing pollution as primarily due to a single agent does not
mean there is only a single VOC present in the homes in question,
but rather that one of them is predominant relative to the others
in terms of concentrations, in comparison to the other homes.
Characterisation of the identified groups of homes
Housing highly polluted by multiple agents
The homes that were the most highly polluted by several VOCs
simultaneously represented 8.5% of the sample analysed (i.e.,
45 of 532 homes) and potentially 9.6% [7%; 13%] of homes
nationally.6 They were characterised by a median
concentration for around 7 VOCs that ranged from 2 to
20 times higher than that of the overall sample. Two groups
could be distinguished: in one, the pollution was mainly from
aromatic hydrocarbons (a) and in the other, mainly from aliphatic
hydrocarbons (b).
Group a. This group comprised 32 homes accounting for 6.0%
of the sample and potentially 6.6% [5%; 9%] of homes nationally.
These homes had concentrations much higher than the median
values of the sample as a whole for aromatic hydrocarbons:
m+p-xylene (the median value of the concentrations7
measured in this group for this VOC was 10 times greater than
that of the sample), toluene (9 times), o-xylene
(8 times), ethylbenzene (7 times), benzene
(4 times), 1,2,4-trimethylbenzene (4 times) and styrene
(1.7 times) and, to a lesser extent, for the aliphatic
hydrocarbon n-undecane (twice). The homes of this group contained
an average of 7 VOCs (and a minimum of 5) for which the
concentrations were among the highest 10% of concentrations
observed in the sample.
Group b. This group comprised 13 homes representing 2.4% of
the sample and potentially 3% [2%; 5%] of homes nationally. They
presented concentrations much higher than the median values of the
sample, first for the two aliphatic hydrocarbons, n-undecane (the
median value of the concentrations measured for this VOC in this
group was 26 times higher than that of the sample) and n-decane
(20 times), and, to a lesser extent, for several aromatic
hydrocarbons and one aldehyde: 1,2,4-trimethylbenzene (8 times), o-xylene
(5 times), m+p-xylene (4 times), hexaldehyde
(3 times), ethylbenzene (3 times) and styrene
(1.8 times). The benzene concentrations in the homes of this
group were, however, significantly lower than in the total sample.
As in group a, the homes of this group contained an average of
7 VOCs (and a minimum of 3) for which the concentrations were
among the highest 10% in the overall sample.
Housing moderately polluted by multiple agents
The homes moderately polluted by several VOCs simultaneously
represented 25% of the analysed sample (i.e., 132 of
532 homes) and potentially 26.7% [23%; 31%] nationally. They
were characterised by a median concentration 1.5 to
2.5 times greater than that of the overall sample for
4 to 7 VOCs simultaneously. Here again, two groups can be
distinguished: one mainly concerned by aromatic hydrocarbons (c)
and the other by aldehydes (d).
Group c. This group was made up of 78 homes representing
14.7% of the sample and potentially 15.7% [13%; 19%] nationally.
These homes had concentrations significantly higher than the median
values of the sample, mainly for aromatic hydrocarbons: benzene,
toluene, ethylbenzene, m+p-xylene and o-xylene, as well as for
n-decane. The median concentrations of these VOCs
in this group were about 1.5 to 2 times greater
than those of the sample. For the other VOCs, the median
concentration levels for this group were close to those of the
sample. The homes of this group contained on average only
1.4 VOCs for which the concentrations were among the highest
10% of values observed in all the monitored homes, but on average
4.5 VOCs for which the values were among the highest 20%.
Group d. This group was made up of 54 homes representing
10% of the sample and potentially 11% [9%; 14%] of homes
nationally. The concentrations in these homes were higher than the
median values of the sample for some aldehydes (acetaldehyde,
acrolein, hexaldehyde and formaldehyde) and for some aromatic
hydrocarbons (benzene, toluene, styrene and o-xylene), as well as
for 2-butoxy ethanol and n-decane. The median values of the
concentrations measured for these VOCs were about 1.5 to
2.5 times greater than those of the sample. The homes of this
group contained an average of
3.4 VOCs for which the concentrations were among the highest
10% observed, and 7 VOCs for which the values were among the
highest 20%.
Slightly polluted housing
The slightly polluted homes represented 44% of the analysed sample
(i.e., 233 of 532 homes) and potentially 40% [36%; 45%]
of homes nationally. They were characterised by median
concentration levels equal to or lower than those of the sample as
a whole and were subdivided into two groups, e and e2.
Group e2. This group was made up of 72 homes representing
13.5% of the sample and potentially 11.4% [9%; 15%] of homes
nationwide. Its median concentrations were higher than those of
the overall sample for the two glycol ethers,
1-methoxy-2-propanol (5 times) and 2-butoxy
ethanol (1.5 times), and equivalent to those of the overall
sample for the other VOCs.
Group e. This group of 161 homes represented 30% of the
sample and potentially 28.3% [25%; 33%] nationally. The median
concentration values for this group were lower than those of the
overall sample for all 18 VOCs and significantly lower
(p<0.001) for 14 of them.
Housing with high levels of a single pollutant
Homes with high levels of single pollutants accounted for 23% of
the sample (i.e. 122 of 532 homes) and potentially 24%
[21%; 28%] nationally, with median concentration levels between
5 and 400 times higher than those of the complete sample
for each single VOC. Eight subgroups could be distinguished,
corresponding to 8 VOCs: 1,4-dichlorobenzene, n-undecane,
1-methoxy-2-propanol, styrene, trichloroethylene,
tetrachloroethylene, 2-butoxyethanol and formaldehyde. The homes of
these groups thus contained, on average, 2 VOCs (including the
one characterising the group) for which the concentrations were
simultaneously among the highest 10% for the overall sample. As far
as the other VOCs are concerned, the median concentration levels in
these groups were very close to those of the whole sample (unless
otherwise indicated and detailed in each group below).
Group f. This group was made up of 20 homes representing
3.8% of the sample and potentially 2.6% [1%; 5%] of homes
nationwide. It had a median concentration much higher than that of
the overall sample for 1,4-dichlorobenzene (400 times
higher).
Group g. This group of 8 homes represented 1.5% of the
sample and potentially 1.7% [0%; 3%] of homes nationally, with a
median concentration much higher than that of the sample as a whole
for n-undecane (13 times higher) and, to a lesser extent, for
n-decane (5 times). However, concentrations were significantly
lower here than in the overall sample for formaldehyde, o-xylene
and toluene.
Group h. The 10 homes in this group represented 1.9% of the
sample and potentially 2.9% [2%; 5%] of homes nationally. They had
a median concentration higher than that of the overall sample for
styrene (5 times) and 1-methoxy-2-propanol (6 times),
although statistical significance was lower for the
latter8 (p = 0.01).
Group i. This group of 15 homes represented 2.8% of the
sample and potentially 2.7% [1%; 5%] of homes nationally. Its
median concentration was much higher than that of the overall
sample for trichloroethylene (25 times) and, to a lesser
extent, for tetrachloroethylene (twice).
Group j. This group was made up of 14 homes representing
2.6% of the sample and potentially 2.3% [1%; 4%] of the homes on a
national basis. Its median concentration was much higher than that
of the sample as a whole for 1-methoxy-2-propanol (25 times)
and, to a lesser extent, for hexaldehyde (twice).
Group k. The 26 homes in this group represented 4.9% of the
sample and potentially 5.7% [4%; 8%] of the homes nationwide. Their
median concentration was much higher than that of the sample for
tetrachloroethylene (11 times) and, to a lesser extent, for
2-butoxy ethanol (twice).
Group l. This group of 10 homes represented 1.9% of the
sample and potentially 1.6% [0%; 3%] of homes nationally. This
group presented a median concentration value that was much higher
than that of the sample for 2-butoxy ethanol (15 times).
Group m. This group was made up of 19 homes representing
3.6% of the sample and potentially 4.5% [3%; 7%] of homes
nationally, and its median concentration was higher than that of
the sample for formaldehyde (twice as high) and, less
significantly, for 2-butoxy ethanol (twice) (p = 0.02). For the
other VOCs, the median concentrations in this group tended to be
lower than those of the overall sample.
Relation between the VOC pollution pattern and the other pollutants
The relationships between the VOC pollution patterns described
above and the other pollutants, including allergens, particles,
radon, gamma radiation and carbon monoxide, are presented here.
Relative humidity is also considered.
The statistical test procedure based on the median allowed us to
identify the agents other than VOCs significantly associated with
one or another of the 14 groups obtained by mapping the VOC
pollution. We note that the level of significance required here is
not as high as in the analysis of the VOCs alone. The following
points describe the results of this analysis.
Significant positive associations were observed between some
agents other than the VOCs and groups b and d:
Group b (homes heavily polluted by multiple VOCs, in particular
by aliphatic hydrocarbons) was significantly (p<0.01) associated
with higher concentrations of cat allergens: the median
concentration of these allergens in this group was 6 times higher
than that of the overall sample analysed;
Group d (homes moderately polluted by multiple VOCs, aldehydes
in particular) was significantly (p<0.01) associated with the
following agents: CO (maximum mean values over 15 mins,
30 mins, 1 h and 8 h), PM10 and
PM2.5, and dust mite allergens (Der f 1). Their median
concentrations in group d were 1.5 to 5 times higher than
those of the sample.
On the other hand, significant negative associations were
observed between some pollutants other than the VOCs and groups e,
k and l:
Group e (less polluted homes) was significantly (p<0.001)
associated with PM10 and PM2.5 concentrations
slightly lower than those of the sample. Groups e and e2 (the
group of median homes) were both significantly associated with
lower concentration levels of CO as well;
Group k (homes polluted predominantly by tetrachloroethylene)
was significantly (p<0.005) associated with a lower
concentration of dust mite allergens (Der p 1), gamma radiation and
radon;
Finally, group l (homes polluted predominantly by 2-butoxy
ethanol) was significantly (p<0.001) associated with lower radon
levels.
The groups formed by the VOC analysis thus had few links to the
other pollutants. These results appear to be consistent with the
correlation analysis between pollutants.
Discussion
The methodological choices made in relation to data collection were
discussed in detail in part I of this study [9], especially the
influence of the room used and the temporal distribution of the
measurements. Accordingly, they are mentioned only briefly here.
Instead, we discuss principally the statistical methodology applied
in this second part of this study.
The statistical approach
The main point of the statistical approach that requires
discussion is our use of a hierarchical classification method that
respects the topological constraints resulting from the Kohonen
map.
We considered Kohonen’s self-organising maps method to be the
most relevant methodology in view of the objectives and constraints
of our analysis of homes. The results of this approach, and more
particularly the topological order obtained (the proximity between
the 49 subgroups of homes), should not therefore be called
into question in the second stage of the analysis, aimed at
obtaining a smaller number of groups and based on the hierarchical
classification method.
Hierarchical classification methods were developed by Yacoub
et al. specifically to exploit the results of Kohonen’s
topological maps [16, 17]. They allow the same metric to be used
and consequently preserve the topological order. Their complexity,
however, led us to prefer a simpler, albeit imperfect, approach. We
therefore set up and implemented a method based on respecting the
contiguity constraints in the aggregations of the hierarchical
classification. Because this approach, which combines distance and
constraints, can sometimes lead to difficulties in the
interpretation of the results [11], we analysed the impact of these
constraints for the patterns identified, carrying out hierarchical
ascending classification without constraints from the subgroups
taken from the topological map.
It should be noted first that hierarchical classification
without constraints did not cluster together subgroups lying far
from each other on the topological map. Had that occurred, it would
have raised questions about the overall hierarchical classification
approach to these subgroups.
Lifting the neighbourhood constraints reduced the number of
homes in the least polluted group (e) and increased the size of the
group moderately polluted by aromatic hydrocarbons (c). The other
12 groups were identical to those obtained from the
classification with constraints, but the proximities between these
groups, that is, the order in which they aggregate, differed markedly.
One additional constraint was added to one of the subgroups on
the topological map to force it to cluster with certain
neighbouring subgroups rather than others. Lifting this
constraint increased the number of homes included in the group
moderately polluted by multiple aldehydes (d) and reduced the
number of those in the groups heavily (a) and moderately (c)
polluted by multiple aromatic hydrocarbons; the other
11 groups remained unchanged.
Accordingly, lifting these two types of constraints did not
fundamentally change the patterns we identified. Furthermore, these
constraints tended to increase the number of homes included in the
groups that were the most different, that is, the most and least
heavily polluted groups, to the detriment of the moderately
polluted or intermediate groups. This result in itself is quite
satisfactory.
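A comparison of this kind, between the partition obtained with constraints and the one obtained without, can be sketched as below. The sketch assumes that each partition is stored as a mapping from a home identifier to a group label and that the same labels are used in both partitions. The function name compare_partitions and the four-home example are hypothetical.

```python
def compare_partitions(with_constraints, without_constraints):
    """Return the labels of unchanged groups and the change in size of every group."""

    def as_sets(partition):
        groups = {}
        for home, label in partition.items():
            groups.setdefault(label, set()).add(home)
        return groups

    a = as_sets(with_constraints)
    b = as_sets(without_constraints)
    # A group is unchanged if exactly the same set of homes exists in both partitions.
    unchanged = [label for label, homes in a.items() if homes in b.values()]
    # Positive values mean the group grows when the constraints are lifted.
    size_change = {
        label: len(b.get(label, ())) - len(homes) for label, homes in a.items()
    }
    return unchanged, size_change


# Invented example: home h2 moves from group e to group c.
constrained = {"h1": "e", "h2": "e", "h3": "c", "h4": "a"}
unconstrained = {"h1": "e", "h2": "c", "h3": "c", "h4": "a"}
unchanged, size_change = compare_partitions(constrained, unconstrained)
```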
Choices about data collection
The discussion in the first part of this study [9] pointed out that
VOC pollution inside homes was relatively homogeneous and that the
correlations between pollutants depended only slightly on the rooms
in which the measurements were taken. Similarly, the season of data
collection had a moderate influence on the concentrations measured
and on the inter-pollutant correlation scores. Two additional
questions may be asked: i) Is the pattern of pollution
identified that of the home or only of the room where the
measurements were taken? ii) Is this pattern influenced by the
season during which the data were collected?
To answer the first question, we analysed the relation between
the pollution breakdown and the two types of homes previously
defined. This analysis indicates that the group of homes described
as moderately polluted by aldehydes (d) is significantly
(p<0.01) associated with the category of studio flats. This
association, however, is the only one found. None of the
14 groups of homes identified was associated with closing the
bedroom door night and day. Taking readings from a single room or
using a room partly cut off (door closed) does not appear to have a
great influence on the correlation scores between the pollutants or
on the pattern of multiple pollution within the homes.
To answer the second question, about the effect of seasons, we
analysed the relationship between the pollution pattern and the
season. Statistically significant links were observed between some
of the 14 groups of homes and the season of measurement (see
table 3).
In particular, the association between the group of slightly
polluted homes (e alone as well as e and e2 together) and the
months of June, July and August together is significant, as is, at
the other end of the scale, the association between the groups
either moderately polluted by many agents or heavily polluted by a
single agent and the heating period.
The season during which the data were collected therefore has an
influence on the pollution pattern that cannot be considered to be
determined only by the characteristics of the housing and the
households. A home belonging to one class during the heating
period may belong to another class outside the heating period. The
pattern identified can be considered to be an overall snapshot over
a year.
Table 3. Significant links (p<0.01) between groups of
homes (pattern of multiple pollution) and season of
measurements.
| Group of homes | Season | RR* |
| Moderately polluted by aromatic hydrocarbons (c) | Heating period | 1.3 |
| Moderately polluted by aldehydes (d) | March-April-May | 1.6 |
| Least polluted (e) | June-July-August | 1.8 |
| Slightly polluted (e and e2) | June-July-August | 1.6 |
| Polluted principally by tetrachloroethylene (k) | Not the heating period | 1.6 |
| Polluted principally by 2-butoxy ethanol (l) | September-October-November | 2.4 |
| Polluted principally by formaldehyde (m) | Not the heating period | 2.2 |
| Polluted principally by n-undecane (g) | Heating period | 1.5 |
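As an illustration of how an association between a group of homes and a season can be quantified, the sketch below computes a ratio of proportions. The definition of RR* is not reproduced in this section, so the definition used here is an assumption and may differ from the one behind Table 3: the share of the group's homes measured in the season is divided by the share of all homes measured in that season. The function name proportion_ratio and the data are invented.

```python
def proportion_ratio(groups, seasons, group, season):
    """Share of `season` within `group`, divided by its share in the whole sample.

    Assumed definition for illustration only; it is not taken from the study.
    """
    in_group = [s for g, s in zip(groups, seasons) if g == group]
    share_in_group = sum(s == season for s in in_group) / len(in_group)
    share_overall = sum(s == season for s in seasons) / len(seasons)
    return share_in_group / share_overall


# Invented sample of ten homes: group label and season of measurement.
groups = ["e", "e", "e", "e", "c", "c", "c", "c", "c", "c"]
seasons = ["summer", "summer", "winter", "winter", "summer",
           "winter", "winter", "winter", "winter", "winter"]
ratio = proportion_ratio(groups, seasons, "e", "summer")
```

A ratio above 1 indicates that the group is over-represented among homes measured in that season. The significance test used to select the links in the table is not sketched here.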
Consideration of the sample adjustment
The analyses of the pollutants (Part I) [9] and of the homes were
carried out without adjustment weighting [5, 6] for methodological
reasons: i) the method used here does not allow these weightings to
be taken into account; ii) the statistical analyses were carried
out on a working subsample of 532 homes and not on the sample
of 567 homes. However, the weighting of the homes concerned
was standardised and used retroactively to evaluate the impact of
adjustment on the analysis of the inter-pollutant correlations [9]
and to estimate the percentage of homes that each group might
represent at a national level.
The percentages of homes that each group represents within the
sample are quite close to their estimate at the national level
(taking the adjustment into account). The differences are lower
than one percent for most of the groups.
Against expectations, the group of slightly polluted homes (e
and e2) may represent (taking the adjustment into account) a
smaller proportion of the national housing stock than of the sample
(41% against 44%). Because this group was significantly
associated with the summer (non-heating) period, which is
under-represented within the sample, its share might have been
expected to increase once the adjustment to the national housing
stock was applied. On checking, we found that although the
adjustment does correct the distribution between the heating
period and the period without heating, the summer period (June,
July, August) remains under-represented (21.6% instead of the 25%
expected). The adjustment therefore does not completely correct the
under-representation of the summer period, which most probably
leads to an under-representation of the category of “slightly
polluted” homes.
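The comparison between the share of each group in the sample and its estimated share after adjustment can be sketched as follows. This is a minimal illustration that assumes one adjustment weight per home. The function name group_shares, the weights and the group labels are invented and are not taken from the survey.

```python
def group_shares(groups, weights):
    """Return (raw, weighted) shares of each group; each set of shares sums to 1."""
    total_weight = sum(weights)
    raw, weighted = {}, {}
    for g, w in zip(groups, weights):
        raw[g] = raw.get(g, 0) + 1 / len(groups)
        weighted[g] = weighted.get(g, 0) + w / total_weight
    return raw, weighted


# Invented example: four homes, the first one carrying a low weight.
groups = ["e", "e", "c", "a"]
weights = [0.5, 1.0, 1.5, 1.0]
raw, weighted = group_shares(groups, weights)
```

In this invented case, group e accounts for 50% of the sample but 37.5% after weighting, the same direction of change as that reported above for the slightly polluted homes.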
Conclusion
Two multidimensional descriptive statistical approaches were
applied to the data on the concentrations of VOCs and other
pollutants or harmful agents (allergens, particles, radon, gamma
radiation, carbon monoxide and relative humidity), measured in the
homes included in OQAI’s national survey. The first approach looked
at the pollutants to determine whether the pollutants were present
together within the homes (part I) [9]. The second looked at the
homes to classify them into homogeneous groups in relation to the
pollution (part II). In each case, the VOCs alone were studied
first and then the other agents.
These two series of analyses were complementary and the results
obtained convergent. The complex methodological choices made here
were influenced by the aims of the study on the one hand, and by
the nature of the data on the other hand. These proved to be
appropriate in relation to the results obtained, and the discussion
of these results has generally shown their robustness.
The homes analysis identified four types of housing in relation
to their pollution by VOCs: highly polluted by multiple VOCs,
highly polluted by one principal VOC, moderately polluted by
multiple VOCs, and slightly polluted. These four types are subdivided
into 14 groups. A few associations between these groups
and agents other than the VOCs were then identified.
It should be stressed that this study adopts a relative point of
view: it ranks the homes in relation to each other in terms of
pollutants, without any consideration of their potential toxicity.
Thus the terms “highly”, “moderately” and “slightly” polluted
cannot be directly interpreted from a health point of view; they
consider relative concentrations within the homes compared with
each other.
The analysis of the data of the OQAI pilot survey indicates that
pollution is homogeneous within a home, at any rate insofar as its
main rooms are concerned. Thus the pattern of pollution described
below may be considered representative of the entire home and not
solely of the room surveyed. The study did, however, demonstrate
the influence of season on indoor pollution measurements. Because
data collection took place during all seasons, this pattern can be
considered to be a snapshot covering a complete year.
Homes described as highly polluted by multiple VOCs were
characterised by high concentrations, from 2 to 20 times
greater than the median value of the sample of surveyed homes, for
7 VOCs on average. This type of home represents approximately 9.6%
[7%-13%] of homes nationwide. Two groups can be distinguished: the
first polluted mainly by aromatic hydrocarbons, the other mainly by
aliphatic hydrocarbons. This second group is also associated with
higher concentrations of cat allergens.
Homes highly polluted principally by a single VOC are
characterised by high concentrations, from 5 to 400 times
greater than the sample median, for mainly one VOC and levels for
the others similar to those of the sample. They represent nearly
24% [21%-28%] of the housing stock. Eight groups can be
distinguished, each corresponding to a different VOC:
1,4-dichlorobenzene, n-undecane, 1-methoxy-2-propanol, styrene,
trichloroethylene, tetrachloroethylene, 2-butoxyethanol and
formaldehyde. Some of these groups are also associated with higher
or lower levels of radon, gamma radiation and dust mite
allergens.
Homes moderately polluted by multiple VOCs were characterised by
concentrations from 1.5 to 2.5 times greater than the
sample median, for 4 to 7 VOCs simultaneously. This type
of home represents 26.7% [23%-31%] of the housing stock and is
subdivided into two groups, the first polluted mainly by aromatic
hydrocarbons and the second mainly by aliphatic hydrocarbons and
by, to a lesser degree, some aromatic hydrocarbons. This second
group is also associated with higher concentrations of CO,
PM2.5/PM10 and dust mite allergens.
Homes of the slightly polluted type represent 40% [23-45%] of
the housing stock and are subdivided into two groups: one for which
the median levels of the concentrations are equal to those of
the sample for virtually all the VOCs and the other for which they
are lower. These groups are also associated with lower
concentrations of PM and CO.
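The multiples of the sample median quoted for each type of home can be illustrated with the short sketch below. It assumes raw, strictly positive concentrations and one group label per home. The function name median_ratios and the concentration values are invented for the example.

```python
from statistics import median


def median_ratios(concentrations, groups):
    """For each VOC, divide the median within each group by the sample median.

    concentrations: {voc name: [one value per home]}
    groups: [one group label per home], in the same order as the values.
    """
    ratios = {}
    for voc, values in concentrations.items():
        sample_median = median(values)  # assumed to be strictly positive
        by_group = {}
        for g, v in zip(groups, values):
            by_group.setdefault(g, []).append(v)
        ratios[voc] = {g: median(vs) / sample_median for g, vs in by_group.items()}
    return ratios


# Invented concentrations for six homes split into two groups.
concentrations = {"toluene": [10, 12, 14, 40, 60, 80]}
groups = ["e", "e", "e", "a", "a", "a"]
ratios = median_ratios(concentrations, groups)
```

In this example the median of group a is about 2.2 times the sample median, while that of group e is below it.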
Pollution therefore appears heterogeneous within the sample of
homes, in terms of both concentrations and associations. The homes
analysis reproduced the groupings of VOCs by chemical family found
in the correlation analysis. The physical or biological agents are
not especially correlated with the VOCs, nor are they linked to one
another: they are essentially independent parameters.
Their links with the groups resulting from the VOC pollution
mapping are, logically, equally weak.
The multidimensional approach with which we experimented in this
study has not been used before in the field of indoor air
pollution. Hence, this study is a first and presents a new
perspective on the exposure of inhabitants to chemical mixtures in
homes. The results of this study can usefully be applied for risk
assessments that cannot be carried out on a substance-by-substance
basis, especially for homes that are highly or moderately polluted
by multiple substances. In the next step, an explanatory
statistical analysis will attempt to identify the determinants of
this multiple pollution.
Acknowledgements
The work presented here was carried out by the French Agency for
Environmental and Occupational Health Safety (Afsset). The data
come from the survey of homes conducted by the OQAI (Indoor Air
Quality Observatory), financed by the Ministries for housing,
ecology and health, the French Institute for Public Health
Surveillance (InVS), the Scientific and Technical Center for
Building (CSTB), the French Environment and Energy Management
Agency (ADEME) and the French National Agency for Housing (ANAH),
all of which we would like to thank. Follow-up work has been
conducted by the OQAI “Data use” working group composed of the CSTB
(coordinator), Afsset, the National Institute of Health and Medical
Research (INSERM), InVS, the LOCEAN laboratory, and the Hygiene
Laboratory of Paris City (LHVP).
We would also like to thank Mustapha Lebbah of the Medical
Computing and Bio-Computing laboratory of Paris Nord University,
for the creation of the Kohonen self-organising maps, Sylvie Thiria
of the LOCEAN Laboratory of the Pierre et Marie Curie University
for her advice on Kohonen self-organising maps, Sandrine Philippe
and Elisabeth Robert-Gnansia of Afsset for their attentive
proofreading and extensive drafting advice.
Financial support: none; conflict of interest: none.
References
1 Golliot F, Annesi-Maesano I, Delmas MC, et al. The French
National Survey on Indoor Air Quality: sample survey design. Proc
Healthy building 7th International Conference 2003; 3: 712-7.
2 Mosqueron L, Nedellec V, Kirchner K, et al. Ranking indoor
pollutants according to their potential health effect, for action
priorities and costs optimization in the French permanent survey on
indoor air quality. Proc Healthy building 7th International
Conference 2003; 3: 138-43.
3 Ramalho O, Derbez M, Grégoire A, et al. French permanent
survey on Indoor Air Quality - Part 1: Measurement protocols and
quality control. Proceedings of the Healthy Buildings conference.
Lisbon, 2006.
4 Derbez M, Grégoire A, Garrigue J, Kirchner S. French permanent
survey on Indoor Air Quality - Part 2: Questionnaires and
validation procedure of collected data. Proceedings of the Healthy
Buildings conference. Lisbon, 2006.
5 Observatoire de la qualité de l’air intérieur (OQAI). Campagne
nationale Logements. État de la qualité de l’air dans les logements
français. Rapport final de l’observatoire de la qualité de l’air
intérieur. Paris : OQAI, 2006.
www.air-interieur.org/userdata/documentation/document_133.pdf.
6 Kirchner S, Arenes JF, Cochet C, et al. État de la
qualité de l’air dans les logements français. Environnement,
Risques et Santé 2007; 6: 259-69. doi: 10.1684/ers.2007.0096
7 Duboudin C. Répartition de la pollution chimique dans le parc
de logement en France : Analyse descriptive multipolluants des
données de l’OQAI. Journées RSEIN; 7-8 June 2007; La Rochelle,
France.
http://rsein.ineris.fr/actualite/actu_pdf/colloque2007/5-RSEIN_OQAI_Duboudin.pdf.
8 Duboudin C. Analyse descriptive multipolluants des données de
la campagne logement de l’OQAI (Observatoire de la qualité de l’air
intérieur). Rencontres scientifiques de l’Afsset (Afsset
scientific conferences); 14 February 2008; Paris, France.
9 Duboudin C. Pollution inside the home: descriptive analysis.
Part I: Analysis of the statistical correlations between pollutants
inside homes. Environnement, Risques et Santé 2009; 8: 485-96.
doi: 10.1684/ers.2009.0304
10 Kohonen T. Self-Organizing Maps. Berlin: Springer-Verlag,
1995.
11 Dreyfus G, Martinez JM, Samuelides M,
et al. Réseaux de neurones : Méthodologies et applications.
Paris: Eyrolles, 2004.
12 Murtagh F. A survey of algorithms for
contiguity-constrained clustering and related problems. Comput J
1985; 28: 82-8.
13 Lebart L, Morineau A, Piron M. Statistique
exploratoire multidimensionnelle. Paris: Dunod, 1997.
14 Efron B, Tibshirani RJ. An Introduction to the
Bootstrap. New York: Chapman and Hall, 1993.
15 Blyth CR, Still HA. Binomial Confidence Intervals.
J Amer Stat Association 1983; 78: 108-16.
16 Yacoub M, Badran F, Thiria S. Topological
hierarchical Clustering: Application to Ocean Color Classification.
ICANN’2001 Proceedings. Berlin: Springer, 2001.
17 Yacoub M, Frayssinet F, Badran F, et al.
Clustering and Classification Based on Expert Knowledge Propagation
Using a Probabilistic Self-Organizing Map: Application to
Geophysics. In: Gaul W, Opitz O, eds. Data Analysis:
scientific modeling and practical application. Berlin:
Springer-Verlag, 2000.
1 Bd-OQAI-logements2005.
2 The Xth percentile of a sample
is the value which separates the lower X% of this sample from the
higher (100-X)%.
3 The standardisation formula is the
following: if x represents the set of all the concentration values
measured for a VOC, each xi value of this set is
replaced by the value.
4 It is in fact a locally weighted
mean.
5 Clustering by minimising the intra-class
variance. This clustering criterion favours the consolidation of
clusters containing low numbers as a priority.
6 Taking into account the weight
adjustments calculated for 532 homes.
7 These are for raw values, neither limited
nor standardised.
8 The threshold of 0.001 was chosen for the
identification of the VOCs significantly associated with each
group, see methodology.