Data Preparation > Database > ISI > Merge Document Sources
To ensure data integrity, the ISI database loader by default identifies journal entities (in the sense of recognizing many as one) only by the most stringent form of exact string matching. As a result, two documents, two references, or a document and a reference can refer to the same journal using different strings and still have those two journal entities treated as if they were distinct.
This algorithm makes a best-effort attempt to merge (that is, identify) journal entities.
The Basic Problem
A paper from the Proceedings of the National Academy of Sciences of the United States of America might indicate that it is from PROC NAT ACAD SCI USA, but a reference to that journal might identify it by P NATL ACAD SCI USA. On the other hand, did you know that MOL CELL and MOL CELLS are different journals (Molecular Cell and Molecules and Cells, respectively)? Linking up journals is a subtle problem.
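To make the difficulty concrete, the Python sketch below compares strict exact-string grouping with a naive fuzzy heuristic that merges any two names within an edit distance of 1. The heuristic and the function names are purely illustrative assumptions, not something the tool does. Exact matching keeps all four strings apart, so the two PNAS spellings stay split. The fuzzy rule wrongly merges MOL CELL with MOL CELLS and still misses the PNAS pair.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def exact_match_groups(names):
    """Strict identification: only character-for-character identical strings are one journal."""
    groups = {}
    for name in names:
        groups.setdefault(name, []).append(name)
    return list(groups.values())


def naive_fuzzy_pairs(names, max_distance=1):
    """Hypothetical heuristic: treat any two names within max_distance edits as one journal."""
    return [(a, b) for a, b in combinations(names, 2)
            if levenshtein(a, b) <= max_distance]


names = ["PROC NAT ACAD SCI USA", "P NATL ACAD SCI USA", "MOL CELL", "MOL CELLS"]

print(len(exact_match_groups(names)))   # 4: the two PNAS spellings stay separate
print(naive_fuzzy_pairs(names))         # [('MOL CELL', 'MOL CELLS')]: a false positive
print(levenshtein(names[0], names[1]))  # 4: the true match is too far apart to be caught
```

String similarity alone cannot tell an abbreviation variant from a genuinely different journal, which is why an authoritative name list is needed.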
A Conservative Solution
ISI publishes an official association between journal names and canonical journal identifiers. Combined with the fact that ISI records typically give the "full title" of each document's source journal, this association generally allows the journals that appear as document sources in your database to be identified with one another. In some cases, we can even unify journals that occur only as the source of a reference, and this ability will improve over time. Together with identification based on exact string matching, these methods provide nearly the strongest automatic ISI journal merging possible while keeping false positives to a minimum.
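The Python sketch below shows one way the combination described above could work. It is an assumption-laden illustration, not the tool's actual implementation. `NAME_TO_CANONICAL` is a tiny hypothetical stand-in for ISI's official list (the identifiers such as `J-PNAS` are invented), and each journal entity is simplified to a name plus an optional full title. A union-find structure merges two entities when they share an exact name string or resolve to the same canonical identifier. Names the mapping does not cover fall back to exact matching alone, which is what keeps the approach conservative.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JournalEntity:
    entity_id: int
    name: str                         # journal string as written in the record
    full_title: Optional[str] = None  # ISI "full title", usually only for document sources


class DisjointSet:
    """Union-find over entity ids, accumulating 'same journal' evidence."""

    def __init__(self) -> None:
        self.parent: dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        self.parent[self.find(a)] = self.find(b)


def canonical_id(entity: JournalEntity, name_to_canonical: dict[str, str]) -> Optional[str]:
    """Prefer the full title; otherwise try the abbreviated name; None if unknown."""
    if entity.full_title is not None and entity.full_title in name_to_canonical:
        return name_to_canonical[entity.full_title]
    return name_to_canonical.get(entity.name)


def merge_journals(entities, name_to_canonical):
    """Group entity ids by exact name OR shared canonical id (hypothetical sketch)."""
    ds = DisjointSet()
    first_seen: dict[tuple, int] = {}
    for e in entities:
        ds.find(e.entity_id)  # register every entity, even unmatched ones
        keys = [("exact", e.name)]
        canon = canonical_id(e, name_to_canonical)
        if canon is not None:
            keys.append(("canonical", canon))
        for key in keys:
            if key in first_seen:
                ds.union(e.entity_id, first_seen[key])
            else:
                first_seen[key] = e.entity_id
    groups = defaultdict(list)
    for e in entities:
        groups[ds.find(e.entity_id)].append(e.entity_id)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical stand-in for ISI's official name-to-identifier list.
NAME_TO_CANONICAL = {
    "PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA": "J-PNAS",
    "P NATL ACAD SCI USA": "J-PNAS",
    "MOLECULAR CELL": "J-MOLCELL",
    "MOL CELL": "J-MOLCELL",
    "MOLECULES AND CELLS": "J-MOLCELLS",
    "MOL CELLS": "J-MOLCELLS",
}

ENTITIES = [
    JournalEntity(1, "PROC NAT ACAD SCI USA",
                  "PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA"),
    JournalEntity(2, "P NATL ACAD SCI USA"),     # reference-only spelling
    JournalEntity(3, "MOL CELL", "MOLECULAR CELL"),
    JournalEntity(4, "MOL CELLS"),               # a different journal
    JournalEntity(5, "PROC NAT ACAD SCI USA"),   # joins entity 1 by exact match
]

print(merge_journals(ENTITIES, NAME_TO_CANONICAL))  # [[1, 2, 5], [3], [4]]
```

Because MOL CELL and MOL CELLS resolve to different canonical identifiers, they stay separate. The reference-only PNAS spelling (entity 2) joins the document-source entity through the mapping rather than through string similarity.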
To use this algorithm, load an ISI file into the tool, create a database from it using the ISI database loader, and then run Merge Document Sources on that database.
This algorithm should be suitable for many journal merging needs. Nevertheless, it is strongly recommended that you check the table of journal entities in the database both before and after merging to confirm desired behavior.
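A minimal sketch of that before/after check follows. It assumes you can export the journal entity table as `(id, name)` rows. The export step, the row shape, and the sample data are assumptions, not a documented feature of the tool. The hypothetical `absorbed_journals` function lists the entities that disappeared during merging, so you can scan them for false positives such as MOL CELL being folded into MOL CELLS.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, str]  # hypothetical export shape: (journal entity id, journal name)


def absorbed_journals(before: Iterable[Row], after: Iterable[Row]) -> List[Row]:
    """Return the journal entities present before merging but gone afterwards."""
    surviving_ids = {entity_id for entity_id, _ in after}
    return sorted(row for row in before if row[0] not in surviving_ids)


# Illustrative snapshots, not real tool output.
before = [(1, "PROC NAT ACAD SCI USA"), (2, "P NATL ACAD SCI USA"),
          (3, "MOL CELL"), (4, "MOL CELLS")]
after = [(1, "PROC NAT ACAD SCI USA"), (3, "MOL CELL"), (4, "MOL CELLS")]

print(f"{len(before)} journal entities before, {len(after)} after")
for entity_id, name in absorbed_journals(before, after):
    print(f"merged away: {entity_id} {name}")  # review each one for a wrong merge
```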