QSAR study of skin sensitization using Local Lymph Node Assay data.

Adam Fedorowicz1, Lingyi Zheng2, Harshinder Singh1,2, Eugene Demchuk1,3

 

1 National Institute for Occupational Safety and Health, Morgantown, WV
2 Department of Statistics, West Virginia University, Morgantown, WV
3 School of Pharmacy, West Virginia University, Morgantown, WV

 

KEYWORDS:

ACD, LLNA, QSAR, logistic regression, skin sensitization

 

ABSTRACT

Allergic Contact Dermatitis (ACD) is a common work-related skin disease that often develops as a result of repetitive skin exposures to a sensitizing chemical agent. A variety of experimental tests have been suggested to assess skin sensitization potential. We applied Quantitative Structure-Activity Relationship (QSAR) modeling to relate measured and calculated physical-chemical properties of chemical compounds to their sensitization potential. Using statistical methods, each of these properties, called molecular descriptors, was tested for its ability to predict the sensitization potential. A few of the most informative descriptors were subsequently selected to build a model of skin sensitization. In this work the murine Local Lymph Node Assay (LLNA) data were used. In principle, LLNA provides a standardized continuous scale suitable for quantitative assessment of skin sensitization. However, at present many LLNA results are still reported on a dichotomous scale, consistent with the scale of the guinea pig tests that were widely used in past years. Therefore, in this study only a dichotomous version of the LLNA data was used. For the statistical analysis we relied on logistic regression, which provides a tool for investigating and predicting skin sensitization that is expressed only in categorical terms of activity and non-activity. Based on the compounds used in this study, our results suggest a QSAR model of ACD built on the following descriptors: nDB (number of double bonds), C-003 (number of CHR3 molecular subfragments), GATS6m (autocorrelation coefficient) and HATS6m (GETAWAY descriptor), although the relevance of the identified descriptors to the continuous ACD QSAR has yet to be shown.

 

INTRODUCTION

The Bureau of Labor Statistics estimates that occupational skin diseases constitute the second largest group of occupational injuries in the U.S. [1]. Among them, Occupational Contact Dermatitis (OCD) is the most common cause of work-related skin illness, comprising up to 95% of registered cases. Allergic Contact Dermatitis (ACD) may lead to severe recurrent forms of OCD because of the long-lasting memory of the immune system. ACD usually develops as a result of repetitive skin exposures to a sensitizing chemical agent. Usually at least a single excessive exposure is essential in the development of the immune response. A variety of experimental tests have been suggested to assess the skin sensitization potential of a chemical [2]. Information that supports the development of recommended skin exposure limits, which would protect workers from sensitizing overexposures, is therefore important. Unfortunately, many experimental protocols result in a dichotomous conclusion, which is more appropriate for denial/acceptance decision-making in the design and manufacturing of new chemicals than for preventive protection of workers occupationally involved with sensitizing chemical agents. The murine Local Lymph Node Assay (LLNA) has the capacity to provide a standardized continuous scale for the quantitative assessment of skin sensitization.

A combination of methods in statistics and computational chemistry, commonly referred to as Quantitative Structure-Activity Relationship (QSAR) modeling, complements the experimental approach. QSAR is based on examining measured and calculated physical-chemical properties, called molecular descriptors, of chemical compounds with known biological activity (in this work, the sensitization potential) and then relating a few of the most informative descriptors to the target bioactivity. The structure-activity relationships constructed this way provide a means of investigating and predicting the sensitization potential of chemicals.

We rely on LLNA data to quantify the skin sensitization potential [3]. At present the LLNA data are (1) outnumbered by the long history of guinea pig assays, and (2) often reported on a dichotomous scale consistent with the guinea pig data. Therefore, the work was started using a dichotomous version of the LLNA data to identify molecular descriptors that may be effective in the continuous-scale LLNA QSAR. The work began with building a database of chemical names, structures, properties and bioactivities, along with the design of appropriate software. Our immediate goal is to identify a pool of potentially informative molecular descriptors and chemical classes that are most appropriate for QSAR modeling to predict LLNA results. In the present work a QSAR based on a generalized linear model, logistic regression, is proposed. Logistic regression permits construction of standard QSAR equations in which the activity data are represented only in terms of activity (1) or non-activity (0) values. In order to evaluate molecular properties that can be associated with LLNA data on skin sensitization, 1204 molecular descriptors were calculated and tested for their significance in predicting the skin sensitization potential. Only a limited number of molecular descriptors were found to be statistically associated with skin sensitization.

These results suggest that a validated QSAR model of ACD may be built by using only a few appropriate parameters, although the relevance of the identified descriptors to the continuous-scale ACD QSAR has yet to be shown. Further work will focus on populating the QSAR database with continuous-scale ACD data and on extending the database to contain more LLNA-tested compounds.

 

Materials and Methods

In our QSAR studies we used a pool of 54 LLNA-tested compounds, of which 25 were active sensitizers and 29 were negative controls [4,5]. The molecular structures of these compounds were first encoded using the SMILES notation and were subsequently transformed into three-dimensional coordinates using Cerius2 from Accelrys, Inc. The Dragon 2.1 software developed by the Milano Chemometrics and QSAR Research Group was used to calculate a total of 1204 molecular descriptors for each of the studied compounds. The statistical analysis was carried out using the SAS 8.2 statistical package.

The linear probability model is inadequate for modeling the probability of a positive LLNA sensitization response, since it is heteroscedastic and can yield nonsensical predicted probabilities (below 0 or above 1). Depending on the choice of the cumulative distribution function F, the probability of a positive response in the LLNA sensitization test, P{S=1|X1, X2, …, XN} = F(X'b), can be represented either by the probit or by the logistic regression model [6]. In the present study we used logistic regression, in which p(X) = P{S=1|X1, X2, …, XN}, which depends on the molecular descriptors X1, X2, …, XN, is modeled in the form

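$$p(X) = \frac{\exp(b_0 + b_1 X_1 + \dots + b_N X_N)}{1 + \exp(b_0 + b_1 X_1 + \dots + b_N X_N)} \qquad \text{(EQ 1)}$$

or

$$\ln\frac{p(X)}{1 - p(X)} = b_0 + b_1 X_1 + \dots + b_N X_N$$

where b0, b1, …, bN are regression coefficients.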
The logistic regression is a more appropriate statistical tool than linear probability models, when the response variable is binary (dichotomous). The properties of the logistic function (EQ. 1) ensure that whatever estimate of the response one obtains, it is always a number between 0 and 1 that can be easily translated into binary responses using an appropriate threshold value (usually 0.5). The S-shape of the logistic function is another important feature, which is particularly appealing in epidemiology studies when the single variable X is viewed as representing an index that combines contributions of several risk factors and p(X) represents the risk for a given value of X in single variable logistic regression models.
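
As a minimal sketch of how EQ. 1 turns descriptor values into a binary call, the Python fragment below evaluates the logistic function for one compound and applies the 0.5 threshold. The coefficient and descriptor values are invented for illustration and are not fitted values from this study; the function names are likewise hypothetical.

```python
import math


def logistic_probability(descriptors, coefficients, intercept):
    """Evaluate p(X) of EQ. 1 for one compound.

    descriptors  -- values X1..XN of the molecular descriptors
    coefficients -- regression coefficients b1..bN
    intercept    -- regression coefficient b0
    """
    z = intercept + sum(b * x for b, x in zip(coefficients, descriptors))
    # Two algebraically equivalent forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def classify(probability, threshold=0.5):
    """Translate a probability into a binary response (1 = sensitizer)."""
    return 1 if probability >= threshold else 0


# Hypothetical two-descriptor model; all numbers are illustrative only.
intercept = -1.0
coefficients = [0.8, -0.5]
compound = [2.0, 1.0]

p = logistic_probability(compound, coefficients, intercept)
label = classify(p)
print(f"p(X) = {p:.3f}, predicted class = {label}")
```

Whatever the coefficient values, the returned probability stays strictly between 0 and 1, which is the property the paragraph above relies on.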

The validity of the logistic regression models was checked using leave-one-out cross validation, which treats n-1 out of the n observations as a training set. It re-estimates the parameters of the model on these n-1 observations and then classifies the left-out observation based on the new parameter estimates. This is done for each of the n observations. The misclassification rate for each group is the proportion of sample observations in that group that are misclassified. This method achieves an almost unbiased estimate, but with a relatively large variance.
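
The cross-validation procedure described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SAS procedure used in the study: the logistic model is fitted here by plain gradient ascent, the data are made up, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, steps=2000, rate=0.1):
    """Fit [b0, b1, ..., bN] by gradient ascent on the mean log-likelihood."""
    coef = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(coef)
        for x, y in zip(rows, labels):
            err = y - sigmoid(coef[0] + sum(b * v for b, v in zip(coef[1:], x)))
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        coef = [b + rate * g / len(rows) for b, g in zip(coef, grad)]
    return coef


def predict(coef, x, threshold=0.5):
    """Classify one observation with the fitted coefficients."""
    p = sigmoid(coef[0] + sum(b * v for b, v in zip(coef[1:], x)))
    return 1 if p >= threshold else 0


def leave_one_out(rows, labels):
    """Refit on n-1 observations and classify the one left out, n times."""
    predictions = []
    for i in range(len(rows)):
        coef = fit_logistic(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:])
        predictions.append(predict(coef, rows[i]))
    return predictions


def misclassification_rate(labels, predictions, group):
    """Proportion of observations in `group` that were misclassified."""
    members = [(y, p) for y, p in zip(labels, predictions) if y == group]
    return sum(1 for y, p in members if y != p) / len(members)


# Made-up single-descriptor data: 5 inactive (0) and 5 active (1) compounds.
rows = [[0.2], [0.5], [0.9], [1.4], [1.8], [1.1], [1.6], [2.2], [2.7], [3.1]]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

loo_predictions = leave_one_out(rows, labels)
rate_inactive = misclassification_rate(labels, loo_predictions, 0)
rate_active = misclassification_rate(labels, loo_predictions, 1)
print("inactive misclassified:", rate_inactive, "active misclassified:", rate_active)
```

Each observation is classified by a model that never saw it, which is why the resulting error estimate is nearly unbiased; the price is n separate fits and a comparatively high variance.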

The most predictive molecular descriptors were identified in several stages. First, the statistical quality of a single-descriptor logistic model, measured by its P-value, was assessed for each of the descriptors. Descriptors with a P-value above 0.05 were then omitted from further analysis. The remaining potentially predictive descriptors were subsequently used in an exhaustive search through all possible combinations of 1-, 2-, 3- and 4-descriptor models, along with a stepwise regression algorithm, which does not restrict the number of descriptors in the model. QSAR models that identified positive sensitizers with a probability above 75% were analyzed in detail. The validity of these results was additionally verified using cross validation.
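
To give a sense of the size of such an exhaustive search, the sketch below enumerates every 1- to 4-descriptor subset of a small pool and counts the subsets for a larger one. The five-name pool is an arbitrary illustration (the symbols are taken from this paper, but their grouping here is hypothetical), and the function names are assumptions.

```python
from itertools import combinations
from math import comb


def candidate_models(descriptor_names, max_size=4):
    """Yield every descriptor subset of size 1..max_size."""
    for size in range(1, max_size + 1):
        yield from combinations(descriptor_names, size)


def count_models(n_descriptors, max_size=4):
    """Number of subsets of size 1..max_size, without enumerating them."""
    return sum(comb(n_descriptors, k) for k in range(1, max_size + 1))


# Illustrative pool of five screened descriptors.
pool = ["nDB", "C-003", "GATS6m", "HATS6m", "X1v"]
models = list(candidate_models(pool))
print(len(models), "candidate models for a pool of", len(pool))

# Size of the search if 420 descriptors pass the single-descriptor screen.
print(count_models(420), "candidate models for a pool of 420")
```

If all 420 descriptors reported as significant in Results entered the search, the count is roughly 1.29 billion candidate models, dominated by the 4-descriptor subsets. Each candidate would then be fitted and scored, for example by its classification rate.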

 

Results

Overall, 420 descriptors (out of 1204) were found to be statistically significant at the P-level of 0.05. Table 1 shows the top part of the list, namely the descriptors with P-values below the 0.01 threshold.

Table 1. Descriptors

| No. | Symbol | Definition | Class of Descriptors | P-Value |
|---|---|---|---|---|
| 1 | GATS6m | Geary autocorrelation - lag 6 / weighted by atomic masses | 2D autocorrelations | 0.0042 |
| 2 | RTe+ | R maximal index / weighted by Sanderson electronegativities | GETAWAY | 0.0049 |
| 3 | RDF040p | Radial Distribution Function - 4.0 / weighted by atomic polarizabilities | RDF | 0.0024 |
| 4 | RTu+ | R maximal index / unweighted | GETAWAY | 0.0045 |
| 5 | RDF040v | Radial Distribution Function - 4.0 / weighted by atomic van der Waals volumes | RDF | 0.0039 |
| 6 | X1v | Valence connectivity index chi-1 | Topological | 0.0074 |
| 7 | RDF050u | Radial Distribution Function - 5.0 / unweighted | RDF | 0.0095 |
| 8 | RDF050e | Radial Distribution Function - 5.0 / weighted by atomic Sanderson electronegativities | RDF | 0.0061 |
| 9 | RDF075v | Radial Distribution Function - 7.5 / weighted by atomic van der Waals volumes | RDF | 0.0089 |
| 10 | RDF075p | Radial Distribution Function - 7.5 / weighted by atomic polarizabilities | RDF | 0.0082 |
| 11 | X0v | Valence connectivity index chi-0 | Topological | 0.0085 |
| 12 | X3v | Valence connectivity index chi-3 | Topological | 0.0061 |
| 13 | RDF065p | Radial Distribution Function - 6.5 / weighted by atomic polarizabilities | RDF | 0.0072 |
| 14 | RDF065u | Radial Distribution Function - 6.5 / unweighted | RDF | 0.0092 |
| 15 | S2K | 2-path Kier alpha-modified shape index | Topological | 0.0070 |
| 16 | nDB | Number of double bonds | Constitutional | 0.0029 |
| 17 | C-003 | CHR3 | Atom-centered fragments | 0.0005 |
| 18 | E2m | 2nd component accessibility directional WHIM index / weighted by atomic masses | WHIM | 0.0078 |
| 19 | TI2 | Second Mohar index TI2 | Topological | 0.0040 |
| 20 | Htp | H total index / weighted by atomic polarizabilities | GETAWAY | 0.0082 |
| 21 | BEHp2 | Highest eigenvalue n. 2 of Burden matrix / weighted by atomic polarizabilities | BCUT | 0.0051 |
| 22 | BEHe2 | Highest eigenvalue n. 2 of Burden matrix / weighted by Sanderson electronegativities | BCUT | 0.0097 |

Most of the descriptors with P-value below 0.01 can be partitioned into four broad classes:

The selection of these classes of molecular descriptors seems to have a natural association with immunological activity measured by Local Lymph Node Assay, where the three dimensional structure recognition of a given antigen is responsible for the immunological response. However, the sophisticated representation of these descriptor classes impedes a simple interpretation of the mechanism of immunological response. Thus we can only rely on these QSAR models as an instrument of predicting the immunological activity.

Several of the tested QSAR models gave interesting results. The best classification results were achieved with 3- or 4-descriptor models, although we also identified several above-average models that include only 2 or even 1 descriptor. The best model that we have identified so far consists of 4 descriptors:

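EQ 2. [QSAR model equation: image not preserved]

where the four descriptors, as named in the abstract, are nDB (number of double bonds), C-003 (number of CHR3 molecular subfragments), GATS6m (Geary autocorrelation, lag 6, weighted by atomic masses) and HATS6m (GETAWAY descriptor).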

The proposed QSAR model correctly predicts 83% of the responses on the training set of compounds, and in cross validation it correctly identifies 79% of the responses. The results of the proposed QSAR model are summarized in Table 2.


Table 2. Model Summary.

| | Percentage of correctly predicted responses | Percentage of correctly identified active compounds | Percentage of correctly identified inactive compounds |
|---|---|---|---|
| Model | 83% | 72% | 93% |
| Cross validation | 79% | 68% | 90% |

Table 3 presents the list of compounds tested in this study, together with their LLNA activity data and the activity estimated by the proposed QSAR model.

Table 3. LLNA-tested compounds.

| No. | Compound | CAS | LLNA | Predicted skin sensitization |
|---|---|---|---|---|
| 1 | chlorobenzene | 108-90-7 | 0 | 0 |
| 2 | geraniol | 106-24-1 | 0 | 1 |
| 3 | phenol | 108-95-2 | 0 | 0 |
| 4 | 2-chloroethanol | 107-07-3 | 0 | 0 |
| 5 | benzaldehyde | 100-52-7 | 0 | 1 |
| 6 | 1-bromobutane | 109-65-9 | 0 | 0 |
| 7 | 1-butanol | 71-36-3 | 0 | 0 |
| 8 | 2-4-dichloronitrobenzene | 611-06-3 | 0 | 0 |
| 9 | isopropanol | 67-63-0 | 0 | 0 |
| 10 | glycerol | 56-81-5 | 0 | 0 |
| 11 | hexane | 110-54-3 | 0 | 0 |
| 12 | streptozotocin | 18883-66-4 | 0 | 0 |
| 13 | 4-aminobenzoic acid | 150-13-0 | 0 | 0 |
| 14 | 2-acetamidofluorene | 53-96-3 | 0 | 0 |
| 15 | benzalkonium chloride | 8001-54-5 | 0 | 0 |
| 16 | dimethyl-isophthalate | 1459-93-4 | 0 | 0 |
| 17 | ethyl-methanesulfonate | 62-50-0 | 0 | 0 |
| 18 | 4-hydroxybenzoic acid | 99-96-7 | 0 | 0 |
| 19 | lactic acid | 598-82-3 | 0 | 0 |
| 20 | 4-methoxyacetophenone | 100-06-1 | 0 | 0 |
| 21 | 6-Methylcoumarin | 92-48-8 | 0 | 0 |
| 22 | methyl-4-hydroxybenzoate | 99-76-3 | 0 | 0 |
| 23 | methyl salicylate | 119-36-8 | 0 | 0 |
| 24 | 2-nitrofluorene | 607-57-8 | 0 | 0 |
| 25 | propylene glycol | 57-55-6 | 0 | 0 |
| 26 | propyl paraben | 94-13-3 | 0 | 0 |
| 27 | resorcinol | 108-46-3 | 0 | 0 |
| 28 | salicylic acid | 69-72-7 | 0 | 0 |
| 29 | di-2-furanylethanedione | 492-94-4 | 0 | 0 |
| 30 | 12-bromo-1-dodecanol | 3344-77-2 | 1 | 1 |
| 31 | 3-amino-5-mercapto-1-2-4-triazole | 16691-43-3 | 1 | 0 |
| 32 | chloramine-T | 127-65-1 | 1 | 1 |
| 33 | benzocaine | 94-09-7 | 1 | 0 |
| 34 | urushiol V | 53237-59-5 | 1 | 1 |
| 35 | 2-aminophenol | 95-55-6 | 1 | 0 |
| 36 | phthalic anhydride | 85-44-9 | 1 | 1 |
| 37 | cinnamic aldehyde | 104-55-2 | 1 | 1 |
| 38 | camphorquinone | 10373-78-1 | 1 | 1 |
| 39 | 2-hydroxyethyl-acrylate | 818-61-1 | 1 | 1 |
| 40 | N-nitroso-N-methylurea | 684-93-5 | 1 | 1 |
| 41 | diethyl-sulfate | 64-67-5 | 1 | 1 |
| 42 | 1-2-Benzisothiazol-3[2H]-one | 2634-33-5 | 1 | 1 |
| 43 | butyl-glycidyl ether | 2426-08-6 | 1 | 0 |
| 44 | methyl-2-nonynoate | 111-80-8 | 1 | 1 |
| 45 | 2-vinylpyridine | 100-69-6 | 1 | 1 |
| 46 | propyl gallate | 121-79-9 | 1 | 0 |
| 47 | ethylene-glycol-dimethacrylate | 97-90-5 | 1 | 0 |
| 48 | imidazolidinyl urea | 39236-46-9 | 1 | 1 |
| 49 | tetrachlorosalicylanilide | 1154-59-2 | 1 | 0 |
| 50 | oxazolone | 1564-29-0 | 1 | 1 |
| 51 | acetyl-isovaleryl | 13706-86-0 | 1 | 1 |
| 52 | hydroxycitronellal | 107-75-5 | 1 | 1 |
| 53 | methylene diphenyl diisocyanate | 101-68-8 | 1 | 1 |
| 54 | dodecyl methanesulphonate | 51323-71-8 | 1 | 1 |
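
The "Model" row of Table 2 can be checked against the LLNA and predicted columns of Table 3. The short Python sketch below encodes those two columns by compound number (compounds 1 to 29 are LLNA-negative, 30 to 54 LLNA-positive) and recomputes the three percentages; the variable names are illustrative.

```python
# LLNA outcome by compound number in Table 3: 29 inactive, then 25 active.
llna = [0] * 29 + [1] * 25

# Start from perfect agreement, then flip the compounds whose predicted
# class in Table 3 differs from the LLNA result.
predicted = list(llna)
for number in (2, 5):                         # inactive, predicted active
    predicted[number - 1] = 1
for number in (31, 33, 35, 43, 46, 47, 49):   # active, predicted inactive
    predicted[number - 1] = 0

pairs = list(zip(llna, predicted))
correct_active = sum(1 for y, p in pairs if y == 1 and p == 1)
correct_inactive = sum(1 for y, p in pairs if y == 0 and p == 0)
n_active = sum(llna)
n_inactive = len(llna) - n_active

pct_active = 100.0 * correct_active / n_active
pct_inactive = 100.0 * correct_inactive / n_inactive
pct_overall = 100.0 * (correct_active + correct_inactive) / len(llna)

print(f"active:   {correct_active}/{n_active} = {pct_active:.0f}%")
print(f"inactive: {correct_inactive}/{n_inactive} = {pct_inactive:.0f}%")
print(f"overall:  {correct_active + correct_inactive}/{len(llna)} = {pct_overall:.0f}%")
```

The counts are 18 of 25 active and 27 of 29 inactive compounds classified correctly, 45 of 54 overall, which round to the 72%, 93% and 83% reported in Table 2.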

 

Conclusions

The main goal of the present study was to evaluate classes of molecular descriptors that can later be used in a comprehensive QSAR model of LLNA based on a large set of compounds. Our preliminary results demonstrate that the most promising molecular descriptors are derived from either three- or two-dimensional molecular structure indices, which are based on radial distribution functions, topological indices, or autocorrelation functions. These classes of descriptors seem to be naturally related to LLNA activity, as they associate the immunological response with the three-dimensional structure and shape of the sensitizing agents. These results suggest that a comprehensive QSAR model of ACD may be built by using only a few appropriate parameters, although the relevance of the identified descriptors to the continuous-scale ACD QSAR has yet to be shown. Further work will focus on populating the QSAR database with continuous-scale ACD data and on expanding the database. New predictive QSARs are expected to be useful in screening larger sets of compounds for their potential impact on the skin, and thus may suggest a useful order of priorities in experimental testing.

 

Acknowledgment

This research was supported by the National Occupational Research Agenda Dermal Exposure Research Program.

 

References

  1. Worker Health Chartbook, 2000. Nonfatal Illness. DHHS (NIOSH) Publication No. 2002-120, April 2002.

  2. Hewitt, P. & Maibach, H.I. Dermatotoxicology. In: Handbook of Occupational Dermatology (Kanerva, L., Elsner, P., Wahlberg, J.E., Maibach, H.I. eds), Springer, Berlin, 2000.

  3. The Murine Local Lymph Node Assay: A Test Method for Assessing the Allergic Contact Dermatitis Potential of Chemicals/Compounds, NIH Publication No. 99-4494, February 1999.

  4. J. Ashby, D.A. Basketter, D. Paton, I. Kimber, Structure-activity relationships in skin sensitization using murine local lymph node assay. Toxicology, 102 (1995) 177-194

  5. K.E. Haneke, R.R. Tice, B.L. Carson, B.H. Margolin, W.S. Stokes, ICCVAM evaluation of the murine local lymph node assay. III. Data analyses completed by the national toxicology program interagency center for the evaluation of alternative toxicological methods. Regulatory Toxicology and Pharmacology, 34 (2001) 274-286

  6. Agresti, A. Categorical Data Analysis, John Wiley & Sons, New York, 1990

  7. Hemmer, M.C., Steinhauer, V. & Gasteiger J. Vibrat. Spectr. 19 151-164, 1999

  8. Todeschini, R. & Consonni, V. Handbook of molecular descriptors. Wiley-VCH, Weinheim, Germany, 2000.

  9. V. Consonni, R. Todeschini, M. Pavan Structure/response correlations and similarity/diversity analysis by GETAWAY descriptors. 1. Theory of the novel 3D molecular descriptors Journal of Chemical Information and Computer Sciences 42 (2002) 682-692

  10. V. Consonni, R. Todeschini, M. Pavan, P. Gramatica Structure/response correlations and similarity/diversity analysis by GETAWAY descriptors. 2. Application of the novel 3D molecular descriptors to QSAR/QSPR studies Journal of Chemical Information and Computer Sciences 42 (2002) 693-705