ARTICLE
Author(s): Nanny VAN GEEL, Yves VANDER HAEGHEN, Katia
ONGENAE, Jean-Marie NAEYAERT
Department of Dermatology, Ghent University Hospital, De
Pintelaan 185, 9000 Gent, Belgium
Article accepted on 24/3/2004
Vitiligo is a skin disease characterized by sharply demarcated
depigmented lesions localized on any part of the body. So far, no
single curative therapy has been developed and response to the
different treatment modalities appears to be moderate, despite
intensive ongoing research [1]. Furthermore, as our group recently
showed, there is absolutely no uniformity in the evaluation methods
used in the assessment of treatment outcome for vitiligo [2]. For
instance, most authors report ‘repigmentation capacity’ as being the
most important parameter in the assessment of treatment outcome.
However, almost every article defines ‘repigmentation capacity’
differently, mostly using a subjective scoring system. Besides
typically exhibiting low reproducibility, the use of such
parameters excludes meaningful comparison of different
treatments.
In our opinion, the best way to objectively and consistently
judge repigmentation capacity might be the use of a digital image
analysis system (DIAS). So far, only a few groups have used a DIAS
in evaluating vitiligo treatments [3-6]. However, these systems have
shortcomings concerning cost, speed, reproducibility, accuracy or
user-friendliness.
A new DIAS for the surface measurement of vitiligo lesions has
been developed in our department, and in this study we thoroughly
investigate the reproducibility, accuracy, user-friendliness and
time effectiveness of this new measuring tool.
Material and methods
Material
Ten vitiligo patients (4 male and 6 female, mean age
29.4, range 15-51 years), were included in our study. All
patients signed an informed consent for the use of their images.
Seven patients had skin type II-III, 3 patients had skin type
V-VI. In each patient 1 vitiligo lesion was selected. The
anatomical localisation of the lesion was selectively chosen. Some
curvature of the lesion was tolerated. In 8 of 10 patients (Figs.
1 a,b,c,f,g,h,i,j) the selected lesion was clinically well
described. In two other patients (Figs. 1 d,e) a clinically less
well described lesion was selected for measurement analysis. The
clinical data (localisation of the lesion, skin type, sex and age of
the patient) are shown in Table I. All selected lesions were
photographed 3 times using a digital camera (Figs. 2 a,b,c). To
simulate a real-life situation, where photography would have taken
place on 3 different visits to the dermatologist, the positions of
both the patient and the photographer were changed after each
photograph (both turning 360° around their own axes). Subsequently,
1 extra picture was taken under the same
circumstances, after outlining the contours of the lesion on the
skin accurately with a black pencil (Fig. 2 d). Finally,
the lesion contour was also directly copied to a transparent sheet
by overlaying it physically onto the lesion (Fig. 2 f). This method
takes the local curvature of the lesion into account and should
thus yield the most accurate surface measurement (3D
measurement).
Table I. Clinical data of patient population
| Patient/Lesion | Age/sex | Localisation of lesion | Photo type* |
|---|---|---|---|
| 1 | 31/F | Neck | IV |
| 2 | 39/M | Flank | III |
| 3 | 31/F | Back | III |
| 4 | 38/F | Abdomen | III |
| 5 | 15/F | Mamma | V |
| 6 | 26/M | Back | III |
| 7 | 17/M | Back | II |
| 8 | 26/F | Elbow | IV |
| 9 | 20/M | Nipple | III |
| 10 | 51/F | Thigh | III |
* Photo type according to Fitzpatrick
Digital photography was performed with an Olympus
CL2500 digital reflex camera, using the built-in flash. All
photographs were taken at a fixed working distance, using a ‘spacer’
attached to the camera. The field of view could then still be
adjusted using the optical zoom of the camera. The spacer also
contained a card with colour patches [7] which was used for the
colour and geometrical calibration of the picture. The colour
calibration procedure eliminates most variations in the images due
to camera settings and other outside influences like extraneous
lighting, and ensures that images can be compared qualitatively,
e.g. visually, and quantitatively, e.g. by colour measurement. It
also prepares the images for realistic viewing on a computer
monitor and proper and consistent image segmentation. The
geometrical calibration procedure is based on the known size of the
colour patches, and allows us to compute a scale factor for the
surface measurements.
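The geometrical calibration can be illustrated with a minimal sketch. It assumes a square reference patch of known side length whose side has been measured in pixels in the image; the function names and all numbers in the usage lines are hypothetical and are not taken from the system described here.

```python
def scale_factor_cm2_per_pixel(patch_side_cm: float, patch_side_px: float) -> float:
    """Area in cm^2 represented by one pixel, derived from a patch of known size."""
    if patch_side_cm <= 0 or patch_side_px <= 0:
        raise ValueError("patch dimensions must be positive")
    cm_per_px = patch_side_cm / patch_side_px
    return cm_per_px ** 2


def lesion_area_cm2(lesion_pixel_count: int, cm2_per_px: float) -> float:
    """Convert a segmented pixel count into a surface in cm^2."""
    return lesion_pixel_count * cm2_per_px


# Hypothetical values: a 1.5 cm patch spanning 150 pixels, a lesion of 25,000 pixels.
factor = scale_factor_cm2_per_pixel(1.5, 150.0)
area = lesion_area_cm2(25_000, factor)
print(f"{area:.2f} cm^2")
```

Because the scale factor is squared, a small error in the measured patch side translates into roughly twice that relative error in the computed area.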
The surface measurement of the transparent sheets with a simple
image processing program in Matlab (The MathWorks, Inc., 3 Apple
Hill Drive, Natick, MA 01760-2098, USA) is very accurate and
exhibits no intra- or inter-observer variability, so only one
measurement was performed per lesion. This results in 10 3D lesion
surface measurements.
Methods
Image measurement procedure for an observer
To test the DIAS, 3 different observers each performed a
surface measurement on 30 images, i.e. pictures 1, 2 and
3 (Figs. 2 a,b,c) of all 10 selected lesions (Figs. 1 a-j). To
minimize memory recall bias, all 30 pictures were presented to
the observers on a computer monitor in random order and
orientation.
This procedure consists of 2 steps:
a) Visual estimation of the lesion surface
b) Surface measurement using the new digital image analysis
system
Visual estimation:
Three independent observers (1 dermatologist,
1 I.T. expert, 1 dermatological laboratory co-worker)
individually estimated the surface of every lesion appearing on the
computer monitor, based on a reference patch of 1.5 cm by
1.5 cm. In total 90 estimations were performed (i.e.
30 per observer).
Surface measurement using the new digital image analysis
system:
After the visual estimation each observer performed a
semi-automatic segmentation of the vitiligo lesion using the
digital image analysis system (Fig. 2 e). This
segmentation is based on colour differences as they would be
perceived by a human observer (colorimetry). Because of this
relation with human vision it is believed that this method will
deliver results that are closer to the lesion borders as
intuitively perceived by the dermatologist during a clinical
inspection.
The process consists of the following steps:
1) The user indicates a spot which according to him/her
belongs to the vitiligo lesion
2) The image analysis algorithm tries to expand the region
(region growing) by including neighbouring pixels if they are
similar to the colour of the indicated spot. This colour similarity
is computed in a perceptually uniform colour space called CIE
L*a*b* [8] and is correlated to the human perception of colour
differences.
3) Stop if the whole lesion has been segmented, according to
the user. Otherwise go back to step 1.
All measurement data, as well as the time needed for the procedure
itself, were accurately noted.
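The region-growing steps above can be sketched as follows. This is an illustration only, not the code of the system described here: it assumes the image is already available as CIE L*a*b* triplets, uses the plain Euclidean distance in L*a*b* as the colour difference, compares every candidate pixel with the colour of the indicated spot, and grows over 4-connected neighbours. The threshold `max_delta_e` and the sample colours are hypothetical.

```python
from collections import deque
from math import dist


def grow_region(lab_image, seed, max_delta_e):
    """Return the set of (row, col) pixels connected to `seed` whose colour lies
    within `max_delta_e` (Euclidean distance in L*a*b*) of the seed colour."""
    rows, cols = len(lab_image), len(lab_image[0])
    seed_colour = lab_image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if dist(lab_image[nr][nc], seed_colour) <= max_delta_e:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


PALE = (85.0, 5.0, 10.0)   # hypothetical L*a*b* of depigmented skin
SKIN = (60.0, 15.0, 25.0)  # hypothetical L*a*b* of surrounding skin
image = [
    [SKIN, SKIN, SKIN, SKIN],
    [SKIN, PALE, PALE, SKIN],
    [SKIN, PALE, PALE, SKIN],
    [SKIN, SKIN, SKIN, SKIN],
]
lesion = grow_region(image, seed=(1, 1), max_delta_e=10.0)
print(len(lesion), "pixels segmented")
```

Going back to step 1 with a new spot would amount to taking the union of the regions returned for each indicated spot.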
Surface measurement using traced lesions (2D measurement of the
lesion)
After all observers had performed the measurement, an expert
performed an analysis on all pictures with the traced lesions
(10 in total; Fig. 2 d). This removes any ambiguity as to the exact
borders of the lesion, and so these measurements can be used as the
gold standard for the 2D measurement of the lesion.
Surface measurement using a transparent sheet (3D measurement
of the lesion)
In order to obtain an accurate and realistic estimation of the
vitiligo lesion size (3D measurement), all lesions were copied onto
transparent sheets by putting the sheet over the lesion and tracing
the lesion contours (Fig. 2 f). This has
the advantage of taking the local curvature into account, thereby
avoiding possible underestimation of the lesion surface due to the
move from 3 to 2 dimensions. These sheets were then
scanned at a predefined resolution, and analyzed using some simple
image processing techniques implemented in Matlab (The MathWorks,
Inc.). In case of a gap in the traced contours, the computer gave an
error and the procedure had to be repeated. The estimated
reproducibility of this method (scanning, segmentation and
measurement) for a plane surface is in the order of 98% or more
(i.e. a coefficient of variation, C.V., of about 2%). The accuracy
(how close the measurement is to the real surface) is of the same
order. The time necessary to complete the procedure was also
recorded.
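One way such a scanned tracing could be measured is sketched below. It is not the Matlab program used in this study. It assumes the scan has been reduced to a binary grid (1 = ink of the traced contour), that the area of interest is the region enclosed by the contour (excluding the ink line itself), and that the scan resolution in dots per inch is known; the function name and the resolution in the usage lines are hypothetical. The background is flood-filled from the image border, so a gap in the contour lets the fill leak inside and is reported as an error.

```python
from collections import deque

CM_PER_INCH = 2.54


def enclosed_area_cm2(ink, dpi):
    """Area (cm^2) of the non-ink pixels that cannot be reached from the border."""
    rows, cols = len(ink), len(ink[0])
    outside = set()
    queue = deque()
    # Start the background fill from every non-ink pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and not ink[r][c]:
                outside.add((r, c))
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not ink[nr][nc] and (nr, nc) not in outside):
                outside.add((nr, nc))
                queue.append((nr, nc))
    interior = sum(
        1 for r in range(rows) for c in range(cols)
        if not ink[r][c] and (r, c) not in outside
    )
    if interior == 0:
        raise ValueError("contour is not closed; the lesion must be retraced")
    return interior * (CM_PER_INCH / dpi) ** 2


closed = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(enclosed_area_cm2(closed, dpi=254))  # 4 enclosed pixels of 0.01 cm x 0.01 cm
```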
Estimation of reproducibility and accuracy
Using all measurement data gathered by the different procedures
the reproducibility and accuracy were determined for both the
visual estimation and for the new DIAS.
1) Reproducibility is the extent to which a measuring
procedure yields the same results on repeated measurements of the
same object. As we are dealing with a semi-automatic method, i.e.
one in which a person intervenes, there are two aspects to
reproducibility: intraobserver reproducibility, which involves
repeated measurements by one person only, and interobserver
reproducibility, which involves single measurements by several
persons.
2) Accuracy refers to the ability of an instrument to measure
what it is designed to measure. In other words, is there a
correlation between a ‘gold standard’ measurement and the
instrument's rating?
Evaluating reproducibility
To evaluate the reproducibility of the visual estimation, the
coefficient of variation (C.V.) is used [9]. This is defined as the
sample standard deviation divided by the sample average, and allows
comparison of the ‘spread’ of measurements of variables with a
different mean, e.g. the surface of the different lesions. This
value can be computed over the measurements of the different
observers separately, or by lumping all the measurements together.
In the latter case one obtains a “total” measure of variation, i.e.
irrespective of the observers. It is also interesting to compute
the ratio of the variance between observers to the variance between
the repeated measurements of one observer. This ratio is an
F-statistic, and the equality of the two variances can be tested at
a chosen confidence level [9]. These results are to be treated with
caution due to the possible non-normality of underlying
distributions.
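The two quantities described above can be sketched as follows, assuming the F-statistic is the usual one-way analysis-of-variance ratio of the between-observer mean square to the within-observer mean square (with 3 observers and 3 repeats each this gives 2 and 6 degrees of freedom). The function names and the measurements in the usage lines are hypothetical; the individual measurements of this study are not reproduced here.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample average."""
    return stdev(values) / mean(values)


def inter_vs_intra_f(groups):
    """Ratio of between-observer to within-observer mean squares.

    `groups` holds one list of repeated measurements per observer.
    """
    k = len(groups)
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (len(all_values) - k)
    if ms_within == 0:
        return float("inf")  # every observer repeated the same value each time
    return ms_between / ms_within


# Hypothetical areas (cm^2) for one lesion: 3 observers, 3 repeats each.
obs1, obs2, obs3 = [2.0, 2.5, 3.0], [3.0, 3.5, 4.0], [2.5, 3.0, 3.5]
per_observer_cv = [coefficient_of_variation(o) for o in (obs1, obs2, obs3)]
total_cv = coefficient_of_variation(obs1 + obs2 + obs3)
f_ratio = inter_vs_intra_f([obs1, obs2, obs3])
print(per_observer_cv, round(total_cv, 3), f_ratio)
```

The per-observer C.V. corresponds to computing the value for each observer separately, and the total C.V. to lumping all measurements together.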
In order to ascertain whether the DIAS improved reproducibility
significantly, we used the Wilcoxon signed-rank test, the
nonparametric counterpart of the t-test for paired observations [9]. This test
does not require normality of the underlying probability
distributions, at the price of reduced power.
Evaluating accuracy
The accuracy was evaluated by comparing the average area from
the visual estimation and the image analysis system with the 2D
lesion measurements. We used a non-parametric sign-test to
determine if there was a significant bias between those results. If
so, we tried to quantify this bias as a percentage change. Note
that this is a relatively weak test, but a more powerful
non-parametric test cannot be used here because the areas and
differences between areas cannot be compared between different
lesions. This means that if this test does not find a bias, it does
not mean that it is not present but only that we cannot prove it
with the current limited set of measurements. This is true in
general, but it is more apparent with weaker statistical tests.
A further comparison between the 2D (Fig. 2 d) and 3D
lesion measurements (Fig. 2 f) gives us an
idea of the error (bias) associated with the fact that we are
measuring 3D surfaces in 2D.
Evaluation of user-friendliness and the time effectiveness
No observer was familiar with the new system before starting the
measurements. Therefore a short introduction (approximately
2 minutes) to the procedure was given by one of the experts.
The (time)-efficiency was subsequently estimated by comparing the
mean time necessary for digital segmentation on the one hand and the
time needed to measure the traced lesions on transparent sheets
(using the image processing program in Matlab) on the other hand.
Results
Reproducibility
Table II shows the results of the visual estimations of the lesion
area. The total C.V. is 21% averaged over all the lesions, but the
individual (per-observer) C.V. ranges from 0% to 51% and the total
C.V. from 7% to 42%, which indicates a huge spread in visual
assessment in general. The fact that for some lesions the
per-observer C.V. is zero also indicates that a certain memory
recall effect is present: the observer remembered
the visual estimation he/she entered on the previous showing of the
lesion, in spite of the random order and orientation under which
these lesions were shown.
Table II. The results for visual estimations of 10 lesion areas by
3 observers, and the corresponding coefficients of variation
(C.V.). The results of the F-test for the proportion of inter- to
intra-observer variability are only indicative due to non-normality
of the underlying distributions. Areas in cm²; n = 3 per observer
| Lesion | Obs. 1 avg. area | Obs. 1 C.V.% | Obs. 2 avg. area | Obs. 2 C.V.% | Obs. 3 avg. area | Obs. 3 C.V.% | Avg. area | Total C.V.% | Inter vs. intravariability (F2,6, p = 0.9) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 8.3 | 28 | 11.6 | 3 | 11.2 | 18 | 10.4 | 21 | = |
| 2 | 2.1 | 7 | 2.3 | 2 | 2.1 | 5 | 2.2 | 7 | > |
| 3 | 1.8 | 25 | 2.2 | 1 | 2.4 | 22 | 2.1 | 21 | = |
| 4 | 2.6 | 20 | 3.6 | 21 | 2.9 | 18 | 3.0 | 23 | = |
| 5 | 2.4 | 13 | 3.2 | 2 | 2.6 | 21 | 2.7 | 17 | = |
| 6 | 3.2 | 25 | 3.6 | 0 | 3.0 | 19 | 3.2 | 17 | = |
| 7 | 14.3 | 16 | 15.3 | 2 | 16.6 | 29 | 15.4 | 18 | = |
| 8 | 9.3 | 27 | 11.3 | 20 | 9.9 | 51 | 10.2 | 31 | > |
| 9 | 1.8 | 25 | 4.8 | 0 | 3.5 | 30 | 3.3 | 42 | = |
| 10 | 1.3 | 0 | 2.0 | 0 | 1.8 | 0 | 1.7 | 20 | >> |
Table III shows the results of the measurements of the lesion area
with the DIAS for all observers. The average total C.V. is 9%, with
the individual C.V. ranging from 1% to 25% and the total C.V. from
4% to 23%. The Wilcoxon signed-rank test shows a significant
improvement in reproducibility for the DIAS (compared with the
visual estimation) at the p = 0.01 level. It has to be noted that
the lesions with the worst reproducibility are precisely the lesions
with the most unclear borders (see Figs. 1 d,e).
Table III. The results of the surface estimation with the digital
image analysis system of 10 lesion areas by 3 observers, and the
corresponding coefficients of variation (C.V.). Areas in cm²;
n = 3 per observer
| Lesion | Obs. 1 avg. area | Obs. 1 C.V.% | Obs. 2 avg. area | Obs. 2 C.V.% | Obs. 3 avg. area | Obs. 3 C.V.% | Avg. area | Total C.V.% | Inter vs. intravariability (F2,6, p = 0.9) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 9.4 | 3 | 8.9 | 5 | 9.5 | 3 | 9.3 | 4 | = |
| 2 | 2.1 | 3 | 2.0 | 8 | 2.2 | 6 | 2.1 | 7 | = |
| 3 | 1.9 | 6 | 1.9 | 1 | 2.1 | 5 | 2.0 | 7 | > |
| 4 | 3.6 | 4 | 2.1 | 2 | 2.7 | 15 | 2.8 | 23 | >> |
| 5 | 3.0 | 16 | 2.7 | 18 | 2.6 | 25 | 2.7 | 18 | = |
| 6 | 2.8 | 3 | 2.8 | 5 | 3.0 | 1 | 2.8 | 4 | = |
| 7 | 18.8 | 8 | 18.7 | 4 | 19.1 | 4 | 18.9 | 5 | = |
| 8 | 12.3 | 6 | 12.4 | 9 | 12.3 | 9 | 12.4 | 7 | < |
| 9 | 4.0 | 12 | 3.6 | 8 | 4.3 | 12 | 4.0 | 13 | = |
| 10 | 1.3 | 6 | 1.3 | 2 | 1.5 | 3 | 1.4 | 6 | > |
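The paired comparison of reproducibility can be illustrated with the total C.V. columns of Tables II and III. The sketch below is an assumption about how such a test could be run, not a record of the original analysis: it takes the rounded per-lesion total C.V. values as the paired observations, drops zero differences, assigns average ranks to tied absolute differences and obtains an exact two-sided p-value by enumerating all sign patterns. Because it works from rounded table values, its p-value need not coincide with the reported p = 0.01.

```python
from itertools import product

visual_cv = [21, 7, 21, 23, 17, 17, 18, 31, 42, 20]  # Table II, total C.V.%
dias_cv = [4, 7, 7, 23, 18, 4, 5, 7, 13, 6]          # Table III, total C.V.%


def signed_rank(x, y):
    """Return (ranks, W+, W-) for paired samples, ignoring zero differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(abs(d) for d in diffs)

    def average_rank(value):
        first = ordered.index(value) + 1
        last = first + ordered.count(value) - 1
        return (first + last) / 2

    ranks = [average_rank(abs(d)) for d in diffs]
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return ranks, w_plus, w_minus


def exact_two_sided_p(ranks, w_plus, w_minus):
    """Share of sign patterns at least as extreme as the observed one."""
    observed = min(w_plus, w_minus)
    total = sum(ranks)
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= observed:
            hits += 1
    return hits / 2 ** len(ranks)


ranks, w_plus, w_minus = signed_rank(visual_cv, dias_cv)
p_value = exact_two_sided_p(ranks, w_plus, w_minus)
print(w_plus, w_minus, p_value)
```

With only ten lesions the exhaustive enumeration is cheap; for larger samples a normal approximation or a statistics library would be the more practical choice.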
Accuracy
Table IV compares the lesion areas resulting from visual
estimation, DIAS, 2D and 3D measurements. Having already
established the poor reproducibility of the visual estimation,
there is not much point in estimating its accuracy. Based on the
sign-test, the DIAS and 2D measurements are not statistically
significantly different. However, the 2D and 3D measurements are
very significantly different (p = 0.004), with a bias of typically
-20% and maximally -34%. Clearly, this bias must be kept in mind
when performing 2D area analysis of lesions.
Table IV. Comparison between the average visual estimations, the
average DIAS measurements, the 2D and the 3D measurements of the
10 lesions. Also shown is the bias of 2D versus 3D measurements
| Lesion | Avg. visually estimated area | Avg. DIAS measured area | 2D area | 3D area | 2D vs. 3D bias (%) |
|---|---|---|---|---|---|
| 1 | 10.4 | 9.3 | 8.8 | 10.2 | -13 |
| 2 | 2.2 | 2.1 | 2.1 | 2.1 | 0 |
| 3 | 2.1 | 2.0 | 1.8 | 2.5 | -29 |
| 4 | 3.0 | 2.8 | 2.7 | 3.8 | -29 |
| 5 | 2.7 | 2.7 | 3.5 | 4.1 | -16 |
| 6 | 3.2 | 2.8 | 3.1 | 4.2 | -26 |
| 7 | 15.4 | 18.9 | 18.6 | 21.6 | -14 |
| 8 | 10.2 | 12.4 | 14.2 | 21.6 | -34 |
| 9 | 3.3 | 4.0 | 4.2 | 5.6 | -25 |
| 10 | 1.7 | 1.4 | 1.5 | 1.8 | -16 |
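The sign test and the percentage bias can be checked against the rounded areas of Table IV with a short sketch. It assumes that ties are dropped before the test and that the bias is expressed relative to the 3D area; the function names are illustrative. Since the table values are rounded, individual biases computed this way may differ by a point or two from the tabulated ones.

```python
from math import comb

dias = [9.3, 2.1, 2.0, 2.8, 2.7, 2.8, 18.9, 12.4, 4.0, 1.4]
area_2d = [8.8, 2.1, 1.8, 2.7, 3.5, 3.1, 18.6, 14.2, 4.2, 1.5]
area_3d = [10.2, 2.1, 2.5, 3.8, 4.1, 4.2, 21.6, 21.6, 5.6, 1.8]


def percent_bias(measured, reference):
    """Relative deviation of `measured` from `reference`, in percent."""
    return 100.0 * (measured - reference) / reference


def sign_test_p(x, y):
    """Exact two-sided sign test on paired samples; ties are discarded."""
    n_pos = sum(1 for a, b in zip(x, y) if a > b)
    n_neg = sum(1 for a, b in zip(x, y) if a < b)
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


biases = [percent_bias(a, b) for a, b in zip(area_2d, area_3d)]
p_2d_vs_3d = sign_test_p(area_2d, area_3d)
p_dias_vs_2d = sign_test_p(dias, area_2d)
print(round(sum(biases) / len(biases)), round(min(biases)))
print(round(p_2d_vs_3d, 3), p_dias_vs_2d)
```

On these rounded values the 2D area is smaller than the 3D area for nine lesions and equal for one, which is what drives the small p-value, whereas the DIAS and 2D areas differ in both directions about equally often.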
User-friendliness and time effectiveness
Despite the relatively short introduction by the expert
concerning the use of this new system, no observer mentioned a
problem in handling the system for the first time. The mean time
necessary for digital segmentation was 22.7 minutes for
30 pictures (about 45 seconds per image). Analysing a transparent
sheet took 2.5 minutes per lesion, which would mean 75 minutes
for 30 analyses.
Discussion
In order to overcome a major issue in evaluating and comparing
clinical vitiligo studies, i.e. the lack of uniformity in
assessment of treatment outcome, we investigated a new digital
image analysis system that might be useful in consistently
measuring surfaces of vitiligo lesions both before and after
different therapeutic modalities. The system is based on a
semi-automatic colour segmentation technique. The most important
difference with currently used techniques is that this system is
capable of measuring a surface from a digital image without the
need of a manual tracing procedure. This makes the procedure much
easier and less time-consuming.
We studied the reproducibility, accuracy, user-friendliness and
the time effectiveness of this system. To do so several measurement
procedures were compared. The digital segmentation procedure of
10 vitiligo lesions was performed by 3 independent
observers, while the gold standard 2D measurements based on the
traced lesions and 3D measurements using transparent sheets were
performed by an expert.
A high inter- and intra-observer variability was observed for the
visual estimation, even though 8 of 10 lesions had well
described borders (Figs. 1 a,b,c,f,g,h,i,j). A statistically
significant
improvement of the reproducibility was achieved by the digital
image analysis system (p = 0.01). The surface
calculations by the DIAS seem to be very accurate, as the DIAS and
the gold standard 2D measurements were not statistically
different.
Ease of use and time efficiency were clearly improved
using this new DIAS. The observer only needed 45 seconds per
lesion, whereas the “old” procedure using transparent sheets took
2.5 minutes per lesion (even without taking into account the time
needed for tracing the lesions on a transparent sheet).
Comparing the 2D with the 3D measurements demonstrated a systematic
underestimation. This bias was a lot larger than expected
(typically -20%), even for the
relatively ‘flat’ and small lesions used in our experiments. This
is an important restriction of the system. However, note that this
bias is less important when using the same system in order to
compare a certain lesion over time. Therefore we feel that this
method is not good enough for absolute surface measurements of
large areas, but rather for the estimation of surface changes over
time of some selected target lesions. Moreover, we believe that if
during follow-up the observer has access to previously segmented
lesions the reproducibility will increase because some of the
ambiguity of the location of lesion borders can be removed.
The digital measuring procedure was tested on lesions that showed
clinically sufficient contrast in colours compared to the
surrounding skin. Indeed, in the case of I-III photo types exact
lesion contours are sometimes visually very difficult to locate.
Because the DIAS is based on human vision, it is clear that it will
exhibit similar shortcomings.
In clinical practice this is solved by the use of a UV lamp (Wood's
lamp). In a small preliminary study we performed measurements with
the DIAS of images that were taken with such UV-illumination, with
some encouraging results. In the near future we will try to expand
the applicability of this system to larger and more curved areas
with or without the use of UV-illumination.
Although this new DIAS seems to be reproducible, accurate, time
effective and easy to use, one should not lose track of the fact
that in the evaluation of a therapeutic modality for vitiligo the
degree of repigmentation is not always a good and satisfying
parameter for the patient. Apart from the repigmentation capacity
one should also pay attention to the personal evaluation of the
patient. A global assessment scale or a quality of life
questionnaire could be of use and should be filled in by the
patient [10, 11]. A combination of an objective measurement tool on
one hand and a psychosocial or personal evaluation on the other
hand may give a good total assessment of the efficacy of the
treatment studied [2].
Conclusion
For many skin diseases the importance of quantifying clinical
symptoms has been recognized. In our opinion this is also essential for
vitiligo. To objectively assess the repigmentation capacity of
treatment the use of an easy digital image analysis system will be
indispensable. We feel that the proposed image analysis system is a
step in the right direction, although it can only be used for
fairly small and flat surfaces. This means it is currently not very
useful for daily clinical practice and restricts its use mainly
to studies. Extending the system to include 3D information would
probably allow much larger areas to be measured reliably, but looks
a complicated and expensive proposition for the moment.
Acknowledgements. This research project was
supported by a grant from the “Bijzonder Onderzoeksfonds” number
01108101 (Ghent University, Belgium) for NvG and a grant from the
Fonds Wetenschappelijk Onderzoek (Ghent University, Belgium) for
KO. The calibration software was partially supported by Pierre
Fabre, Dermato Cosmétique, Toulouse.
References
1. Taneja A. Treatment of vitiligo. J Dermatolog
Treat 2002 Mar; 13 (1): 19-25.
2. van Geel N, Ongenae K, Vander Haeghen Y, Naeyaert
JM. Autologous transplantation techniques for vitiligo: how to
evaluate treatment outcome? Eur J Dermatol 2004; 14:
46-51.
3. Guerra L, Capurro S, Melchi F, et al.
Treatment of “stable” vitiligo by Timedsurgery and transplantation
of cultured epidermal autografts. Arch Dermatol
2000 Nov; 136 (11): 1380-9.
4. Andreassi L, Pianigiani E, Andreassi A, Taddeucci
P, Biagioli M. A new model of epidermal culture for the surgical
treatment of vitiligo. Int J Dermatol 1998 Aug; 37 (8):
595-8.
5. Lepe V, Moncada B, Castanedo-Cazares JP, et
al. A double-blind randomized trial of 0.1% Tacrolimus vs
0.05% Clobetasol for the treatment of childhood vitiligo. Arch
Dermatol 2003; 139: 581-5.
6. Boersma BR, Westerhof W, Bos J. Repigmentation in
vitiligo vulgaris by autologous minigrafting: results in nineteen
patients. J Am Acad Dermatol 1995; 33: 990-5.
7. Gretag-MacBeth colour checker chart:
http://www.gretagmacbeth.com
8. Vander Haeghen Y, Naeyaert JM, Lemahieu I,
Philips W. An imaging system with calibrated color image
acquisition for use in dermatology. IEEE Transactions on Medical
Imaging 2000; 19: 722-30.
9. Rosner B. Descriptive statistics. In:
Fundamentals of biostatistics. Harvard University, 5th
edition, Duxbury (Thomson Learning) USA. 2000: Coefficient of
variation (C.V.) p. 24-5; F-statistic p. 289; Wilcoxon
signed-rank test p. 338-43.
10. Finlay AY, Khan GK. Dermatology Life Quality
Index (DLQI) – a simple practical measure for routine
clinical use. Clin Exp Dermatol 1994 May; 19 (3): 210-6.
11. Kent G, al-Abadie M. Factors affecting responses
on Dermatology Life Quality Index items among vitiligo sufferers.
Clin Exp Dermatol 1996 Sep; 21 (5): 330-3.