A Procedure for Matching Truck Crash Records with Hazardous Material Release Incidents and a Comparative Analysis of the Determinants of Truck Crashes with Hazardous Material Releases

This study quantifies the number and location of truck crashes involving hazardous material releases, identifies the events leading to these crashes, and identifies the types of material released. For the first time, it combines two federal databases: the United States Department of Transportation's Pipeline and Hazardous Materials Safety Administration (PHMSA) database and the Motor Carrier Management Information System (MCMIS) crash database. PHMSA and MCMIS data for 1999 through 2009 were obtained and matched using their common attributes: time, day, month, year, county, state, and phase of transportation. All possible pairwise combinations of records between the two datasets were identified, and likelihood estimates of a match based on these common attributes were calculated. A sample of the record pairs was then drawn, manually checked for matches and mismatches, and used to calibrate the logistic regression and neural network models. Naive Bayesian, logistic regression, and neural network classification methods were developed and compared, and all three performed well. The matching algorithms were run on all possible pairwise combinations to identify exact matches as well as the probability of a match. Pairs with a match probability greater than 0.50 were extracted and used in the statistical analysis of truck crash characteristics. Each extracted pair was weighted by its match probability, and the weighted total was set to equal the number of MCMIS-reported crashes involving hazardous material releases. One expected outcome of this study is a probabilistic model that can help advance safety regulations for the U.S. trucking industry and fleet.
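The matching steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the study's implementation. It shows only the naive Bayesian classifier and omits the logistic and neural network models. It treats each common attribute as a binary agree/disagree flag and estimates the class-conditional agreement probabilities from a manually labeled sample with Laplace smoothing. It also assumes that the weighting is a proportional rescaling of the match probabilities. The record field names (`hour`, `day`, `month`, `year`, `county`, `state`, `phase`) and all function names are hypothetical.

```python
from itertools import product

# Hypothetical field names standing in for the study's common attributes
# (time, day, month, year, county, state, phase of transportation).
ATTRS = ("hour", "day", "month", "year", "county", "state", "phase")


def agreement(phmsa_rec, mcmis_rec):
    """Return a tuple of 0/1 flags: 1 where the two records agree on an attribute."""
    return tuple(int(phmsa_rec[a] == mcmis_rec[a]) for a in ATTRS)


def fit_naive_bayes(labeled):
    """Estimate P(match) and P(agree on attribute k | class) from checked pairs.

    labeled: list of (agreement_tuple, is_match) with is_match in {0, 1}.
    Laplace smoothing (+1 / +2) keeps every probability strictly in (0, 1).
    """
    n = len(labeled)
    n_match = sum(y for _, y in labeled)
    prior = (n_match + 1) / (n + 2)
    cond = {}
    for cls in (0, 1):
        rows = [x for x, y in labeled if y == cls]
        cond[cls] = [(sum(x[k] for x in rows) + 1) / (len(rows) + 2)
                     for k in range(len(ATTRS))]
    return prior, cond


def match_probability(x, prior, cond):
    """Posterior P(match | agreement vector x), assuming attributes are
    conditionally independent given the class (the naive Bayes assumption)."""
    like = {}
    for cls, p_cls in ((1, prior), (0, 1 - prior)):
        p = p_cls
        for k, agree in enumerate(x):
            q = cond[cls][k]
            p *= q if agree else 1 - q
        like[cls] = p
    return like[1] / (like[1] + like[0])


def weighted_matches(phmsa, mcmis, prior, cond, n_mcmis_hazmat, threshold=0.5):
    """Score every PHMSA x MCMIS pair, keep pairs above the threshold, and
    rescale their probabilities so the weights sum to the MCMIS
    hazardous-release crash count (assumed proportional weighting)."""
    kept = []
    for i, j in product(range(len(phmsa)), range(len(mcmis))):
        p = match_probability(agreement(phmsa[i], mcmis[j]), prior, cond)
        if p > threshold:
            kept.append((i, j, p))
    total = sum(p for _, _, p in kept)
    scale = n_mcmis_hazmat / total if total else 0.0
    return [(i, j, p * scale) for i, j, p in kept]
```

In use, `fit_naive_bayes` is called once on the manually checked sample, and its output is passed to `weighted_matches` along with the two record lists and the MCMIS hazardous-release crash count. Binary agreement flags keep the model small. A fuller treatment might score partial agreement on time (for example, within one hour), which this sketch does not attempt.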