Abstract
A computerized system for identifying and eliminating duplicate bibliographic citations collected from multiple data sources is presented. The system has been used successfully in building and maintaining databases at the Department of Energy's Office of Scientific and Technical Information since 1984. Compressed keys for author and title fields are used to locate potential duplicates, which are further checked by comparing compressed keys for report number, patent number, CODEN, date, page, volume-issue, and publisher. To aid in comparison, each field is weighted according to its value in identifying and discriminating among duplicates.
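The two-stage check described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the compression rule (uppercase, keep only letters and digits, truncate), the key length, the field weights, the scoring rule, and the threshold are all assumptions made for the example, and the names `compress_key`, `blocking_key`, `match_score`, and `find_duplicates` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical weights: the abstract says fields are weighted by their
# value in identifying duplicates, but does not give the actual numbers.
FIELD_WEIGHTS = {
    "report_number": 5,
    "patent_number": 5,
    "coden": 3,
    "date": 2,
    "page": 2,
    "volume_issue": 2,
    "publisher": 1,
}


def compress_key(value, length=20):
    """Assumed compression: uppercase, keep letters and digits, truncate."""
    return re.sub(r"[^A-Z0-9]", "", value.upper())[:length]


def blocking_key(record):
    """Combine the author and title keys used to locate potential duplicates."""
    author = compress_key(record.get("author", ""))
    title = compress_key(record.get("title", ""))
    return author + "|" + title


def match_score(a, b):
    """Add a field's weight when both keys agree, subtract it when they differ.

    Fields missing from either record are skipped (an assumption).
    """
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        key_a = compress_key(a.get(field, ""))
        key_b = compress_key(b.get(field, ""))
        if not key_a or not key_b:
            continue
        score += weight if key_a == key_b else -weight
    return score


def find_duplicates(records, threshold=3):
    """Return index pairs that share a blocking key and reach the threshold."""
    buckets = defaultdict(list)
    for index, record in enumerate(records):
        buckets[blocking_key(record)].append(index)

    pairs = []
    for indices in buckets.values():
        for position, first in enumerate(indices):
            for second in indices[position + 1:]:
                if match_score(records[first], records[second]) >= threshold:
                    pairs.append((first, second))
    return pairs
```

Grouping records by the author and title key first means the weighted comparison runs only within each small group of candidates rather than across every pair of citations in the database.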
