ISI Web of Science (WoS) is a leading citation database cataloging over 10,000 journals and over 120,000 conferences. Access it via the "Web of Science" tab at http://www.isiknowledge.com (note: access to this database requires a paid subscription). Along with Scopus, ISI WoS provides some of the most useful datasets for scientometric analysis.
Publications may be obtained by querying the Web of Science database by a variety of fields, such as topic, author and journal names, date ranges, etc. There are two methods for exporting data from Web of Science: exporting directly from the search interface, and adding records to a marked list. Direct exporting is the fastest way to gather the most commonly used data fields in a search, and leaves less room for user selection bias within a group of records. Web of Science Marked Lists allow users to select only the records that interest them and to choose the specific data to be exported from the various databases that a user's institution subscribes to.
ISI Web of Science - Direct Exporting
In this tutorial, we will download publication data using a query with an author name. Author names should be searched using the last name and the first initial followed by an asterisk wildcard in the author field. To find papers by Eugene Garfield, enter Garfield E* in the author field. The search yielded 725 results on April 23, 2016, 500 of which can be downloaded at a time, see Figure 4.2a.
Figure 4.2a: ISI Web of Science search interface and ISI Web of Science search results
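The author-field convention described above (last name, first initial, asterisk wildcard) can be sketched in a few lines. This is only an illustration: the function name `wos_author_query` is hypothetical and is not part of Web of Science or the Sci2 Tool.

```python
def wos_author_query(last_name: str, first_name: str) -> str:
    """Build an author-field search string: last name, first initial, '*'.

    Hypothetical helper for illustration only.
    """
    last = last_name.strip()
    first = first_name.strip()
    if not last or not first:
        raise ValueError("both a last name and a first name are required")
    # Keep only the first initial and append the wildcard.
    return f"{last} {first[0].upper()}*"


print(wos_author_query("Garfield", "Eugene"))  # Garfield E*
```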
To download the first 500 records with citations, go to the bottom of the page and, from the drop down menu next to the printing and email icons, select "Save to Other File Formats". A box will appear on the screen where you can manually select records '1' to '500', select 'Full Record and Cited References', select 'Plain Text' in the drop down menu, and then click send, see figure 4.2b. This process can be repeated to obtain the remaining records and citations by adjusting the manually selected records in units of 500, e.g. select records "501" to "725" for the Garfield example.
Figure 4.2b: Saving records from Web of Science
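Because only 500 records can be saved at a time, it helps to work out the record ranges before starting. The following minimal sketch computes the ranges to enter in the export box; the function name `export_batches` is hypothetical, and the batch size of 500 is taken from the limit described above.

```python
def export_batches(total_records: int, batch_size: int = 500):
    """Return (first, last) record ranges, 1-indexed and inclusive."""
    if total_records < 0 or batch_size < 1:
        raise ValueError("total_records must be >= 0 and batch_size >= 1")
    return [
        (start, min(start + batch_size - 1, total_records))
        for start in range(1, total_records + 1, batch_size)
    ]


# The Garfield example: 725 results, exported in two passes.
print(export_batches(725))  # [(1, 500), (501, 725)]
```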
Wait for the processing to complete, and then save the file as GarfieldE.isi. The resulting file can be seen in Figure 4.2c.
Figure 4.2c: Saving Records from Web of Science
Figure 4.2d: View of saved GarfieldE.isi
ISI Web of Science - Marked List Data Exporting
Downloading WoS data from a marked list is very similar to downloading data directly from the search page. This tutorial skips the instructions for running a basic WoS search, which are described in the section above.
After you have refined your search query, you will want to add records to your Marked List. To do this, select individual articles using the check boxes next to each publication record and then click the "Add to Marked List" button, see figure 4.2e.
Figure 4.2e: Locating the Marked List button on Web of Science
You may also add records to the Marked List using a range of publication records in a search result. To do this, select the "Add to Marked List" button in the search result page without marking individual publication records. A box will pop up on the screen that allows you to enter a range for the publication records that you are interested in; then select the "Add" button, see figure 4.2f for a picture of this screen.
Figure 4.2f: Specifying the range of records in a Marked List on Web of Science
You will notice that the publications you have selected in the search page will now have orange checks next to them, and the Marked List tab on the Navigation bar for the site will be updated to include the total number of records in your list. After you've completed adding publication records to your marked list you will want to visit the Marked List page. Click on the link in the navigation bar named "Marked List" that has an orange count box next to it, see figure 4.2g.
Figure 4.2g: Click the Marked List link in the navigation bar to access the Marked List page on Web of Science
The Marked List page provides you with a list of the selected publications and various means of exporting publication data, including limiting exports to specific databases that are part of the WoS database. To export the most data from WoS and to ensure the most replicable results, it is best to choose the Web of Science Core Collection export tool. The export tool outlines the steps you will need to take: selecting the records to export, selecting the content to export, and selecting the format of the exported data, see figure 4.2h for a view of this export tool.
The Web of Science Core Collection export tool allows you to select a variety of fields, including correspondence addresses of researchers, funding information, cited reference lists, keywords and WoS categories and research areas, etc.
Like direct data exporting, make sure to select the format option "Save to Other File Formats" and Plain Text. You will also be limited to exporting 500 records at a time, which means that you will have to combine data sets manually after downloading all publications (see for instructions here).
Figure 4.2h: Selecting data fields and exporting data from a Marked List
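Combining the 500-record exports by hand amounts to concatenating the files while keeping a single file header and a single end-of-file marker. The following is a minimal sketch of that idea, not the procedure referred to above. It assumes a file layout that is not described in this section: each plain-text export starts with 'FN' and 'VR' header lines, each record ends with an 'ER' line, and the file closes with a single 'EF' line. The function names are hypothetical.

```python
from pathlib import Path


def merge_isi_exports(texts):
    """Merge the contents of several plain-text exports into one string.

    Assumed layout (an assumption, see the lead-in): 'FN' and 'VR' header
    lines at the top of each export and one 'EF' line at the end.
    """
    header = []
    body = []
    for index, text in enumerate(texts):
        for line in text.splitlines():
            if line.startswith(("FN ", "VR ")):
                # Keep the header lines of the first export only.
                if index == 0:
                    header.append(line)
                continue
            if line.strip() == "EF":
                # Drop every end-of-file marker; one is re-added below.
                continue
            body.append(line)
    return "\n".join(header + body + ["EF"]) + "\n"


def merge_isi_files(paths, out_path):
    """Read the exports at `paths` and write the merged result to `out_path`."""
    # 'utf-8-sig' also strips a byte-order mark if the export has one.
    texts = [Path(p).read_text(encoding="utf-8-sig") for p in paths]
    Path(out_path).write_text(merge_isi_exports(texts), encoding="utf-8")
```

For example, `merge_isi_files(["part1.isi", "part2.isi"], "GarfieldE.isi")` would combine two exports into one file; the input file names here are placeholders. It is worth checking the merged file against the record count shown in the WoS search results.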
ISI files are loosely based on the RIS file format, and data in this format can be used for the following types of analyses:
Elsevier's Scopus, like ISI Web of Science, has an extensive catalog of citations and abstracts from journals and conferences. Subscribers to Scopus can access the service via http://www.scopus.com. Scopus provides multiple methods to search and analyze citations and abstracts: by document, authors, institutional affiliations, and advanced Boolean search.
To find all articles whose abstract, title, or keywords include the terms 'Watts Strogatz Clustering Coefficient', simply enter those terms in the Article/Abstract/Keywords field. Forty-five results were found as of April 29, 2016. You can download the references (up to 2,000 full records, and 20,000 "Citations Only" in a CSV file) by checking the 'Select All' box, clicking on the down arrow next to 'RIS Export', and choosing the file type you wish to export the references as. You can export in RIS format (for EndNote or other reference managers), CSV as mentioned earlier, BibTeX, or as a .txt file. Once you have chosen the desired format, click on the 'Export' button on the bottom right of the drop down menu.
Figure 4.2i: Scopus search interface and Scopus search results
At the export screen, select 'Comma separated file, .csv' (e.g. Excel) and select the types of information that you will need. For our purposes, select 'All available information' from the drop-down menus and choose 'Export'.
Save the file as WattsStrogatz.scopus. The resulting file can be seen in Figure 4.2j.
Figure 4.2j: Saving records in Scopus and viewing WattsStrogatz.scopus
Google Scholar data can be acquired using Publish or Perish (Harzing, 2008), which can be freely downloaded from http://www.harzing.com/pop.htm. A query for papers by Albert-László Barabási run on April 29, 2016 results in 280 papers that have been cited 92,213 times, see Figure 4.2k.
Figure 4.2k: Publish or Perish search results for Albert-László Barabási and viewing barabasiPoP.csv
To save records, select 'File > Save' from the menu and then choose the appropriate file format (*.csv, *.enl, or *.bib) in the 'Choose File' pop-up window. All three file formats can be read by the Sci2 Tool. The result in all three formats, named 'barabasi.*', is also available in the respective subdirectories in 'yoursci2directory/sampledata/scientometrics/' and will be used later in this tutorial.
Data from Google Scholar can be used for the following types of analyses:
Please note that before any of the following algorithms can be run, the Google Citation User ID Search algorithm must be run to retrieve the Google Citation User IDs.
Attach Citation Table from Google Scholar
- Statistical Attributes
  - Cites
  - h-index
  - i10-index
- Network Analysis
  - Citation User ID
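For readers unfamiliar with the statistical attributes listed above, the sketch below computes them from a list of per-paper citation counts using the standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the i10-index is the number of papers with at least 10 citations. This illustrates the definitions only and is not the Sci2 Tool's implementation; the sample counts are made up.

```python
def h_index(cites):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(cites, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(cites):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in cites if count >= 10)


# Made-up citation counts for six papers, for illustration only.
sample = [25, 12, 10, 4, 3, 0]
print(sum(sample))        # total cites: 54
print(h_index(sample))    # 4
print(i10_index(sample))  # 3
```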
An individual with a Facebook account can download their Facebook Friends data and Mutual Friends data using the Sci2 Tool.
To download your Facebook data, open the Sci2 Tool and select the "Facebook" option in the file menu. In the next submenu, select the "Access Token" option. You may be asked to log into your account; afterwards, you will be redirected to a webpage that provides you with an access token. In your browser, right click on the grey text, choose "Select All", and then copy this text to your clipboard. Figure 4.2l demonstrates this process.
Figure 4.2l: Using Sci2 to retrieve a Facebook access token
After you have copied your access token, return to the Sci2 Tool and the file menu. In the "Facebook" sub-menu, select either "Facebook Friends Data" or "Mutual Friends". A window will pop up, like in figure 4.2m. Paste your access token into the text field and hit the "OK" button.
Figure 4.2m: Sci2 Tool's Facebook Friends Data and Mutual Friends data load windows with access tokens
Your data will load into the data manager as a comma separated values (.csv) file for you to use. You can save the data to your computer by right clicking the file in the data manager and selecting "Save" from the menu. A new window will pop up, allowing you to choose a directory and a new name for the data being saved.
Data that you downloaded from Facebook can be used for the following types of analyses:
- Statistical Attributes
  - Gender, Interests, Political Views, Relationships
- Geospatial Analysis
  - Current Location and Hometown
- Topical Analysis
  - Status, Interests
- Network Analysis
  - Mutual Friends
3 Datasets: Funding
3.1 NSF Award Search
Funding data provided by the National Science Foundation (NSF) can be retrieved via the Award Search site (http://www.nsf.gov/awardsearch). Search by PI name, institution, and many other fields, see Figure 4.2n.
Figure 4.2n: NSF 'Award Search' interface and search results page
To retrieve all projects funded under the Science of Science and Innovation Policy (SciSIP) program, simply select the 'Advanced Search' tab, enter '7626' into the 'Element Code' field, which is under the 'Program Information' section, and click the 'Search' button. On April 23, 2016, exactly 124 awards were found. Award records can be downloaded in csv, Excel, XML, or .txt format. Save the file in csv format, and change the file extension from .csv to .nsf. A sample .nsf file is available in 'yoursci2directory/sampledata/scientometrics/nsf/BethPlale.nsf'. In the Sci2 Tool, load the file using 'File > Load File'. Select "NSF csv format" in the "Load" pop-up window. A table with all records will appear in the Data Manager. View the file in Excel.
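The renaming step can also be scripted. The sketch below copies a downloaded csv file to a file with the .nsf extension and counts its data rows as a quick check against the number of awards shown on the search results page. The function names are hypothetical, and the code assumes the first row of the export holds column headers and that the file is readable as UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
import csv
import shutil
from pathlib import Path


def copy_as_nsf(csv_path):
    """Copy a downloaded .csv export to the same name with a .nsf extension."""
    src = Path(csv_path)
    dst = src.with_suffix(".nsf")
    shutil.copyfile(src, dst)  # the original .csv is left untouched
    return dst


def count_awards(nsf_path):
    """Count data rows, assuming the first row holds the column headers."""
    with open(nsf_path, newline="", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in csv.DictReader(handle))
```

For example, `count_awards(copy_as_nsf("Awards.csv"))` would create Awards.nsf next to the download and report how many award records it contains; the file name 'Awards.csv' is a placeholder for whatever name your browser gave the download.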
Data in NSF files can be used for the following types of analyses:
- Network Analysis
  - Principal Investigator
  - Co-PI Name(s)
  - Organization
- Temporal Analysis
  - Expiration Date
  - Start Date
- Geospatial Analysis
  - Organization City
  - Organization State
  - Organization Street Address
  - Organization Zip
- Topical Analysis
  - Abstract
  - NSF Organization
  - Title
3.2 NIH RePORTER
Funding data provided by the National Institutes of Health (NIH), and associated publications and patents, can be retrieved via the NIH RePORTER site (http://projectreporter.nih.gov/reporter.cfm). The database draws from eRA, Medline, PubMed Central, NIH Intramural, and iEdison. Search by location, PI name, category, etc., see Figure 4.2o.
Figure 4.2o: NIH RePORTER search interface and search results page
A sample search of "Epidemic" in the 'Public Health Relevance' field displays 205 results as of November 11th, 2009. Up to 500 results can be exported into csv or Excel format using the "Export" button at the top of the page. Save the file as a .csv and load it into the Sci2 Tool using 'File > Load File' to perform temporal or topical analyses.
Data in NIH files can be used for the following types of analyses:
- Statistical Attributes
  - Type
- Temporal Analysis
  - Year of award
- Topical Analysis
  - Abstract
  - Project Title
- Network Analysis
  - Principal Investigator
  - Organization
  - Project Number
4 Datasets: Scholarly Database
Figure 4.2p: Graph of the numbers of records published each year by various organizations
Medline, U.S. patent, as well as funding data provided by the National Science Foundation and the National Institutes of Health can be downloaded from the Scholarly Database (SDB) at Indiana University. SDB supports keyword based cross-search of the different data types and data can be downloaded in bulk, see Figures 4.2q and 4.2r for interface snapshots.
Register to get a free account or use 'Email: firstname.lastname@example.org' and 'Password: nwb' to try out functionality.
Search the four databases separately or in combination for 'Creators' (authors, inventors, investigators) or terms occurring in 'Title,' 'Abstract,' or 'All Text' for all or specific years. If multiple terms are entered in a field, they are automatically combined using the Boolean operator 'OR.' Entering 'breast cancer' will match any record with 'breast' or 'cancer' in that field. Using the Boolean operator AND (for example, 'breast AND cancer') would only match records that contain both terms. Double quotations can be used to match compound terms, e.g., "breast cancer" retrieves records with the phrase "breast cancer," but not records where 'breast' and 'cancer' are present in isolation. The importance of a particular term in a query can be increased by putting a ^ and a number after the term. For instance, 'breast cancer^10' would increase the importance of matching the term 'cancer' by ten compared to matching the term 'breast.'
Figure 4.2q: Scholarly Database 'Home' page and 'Search' interface
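The query conventions described above (implicit OR between terms, explicit AND, double quotes for phrases, and ^ for boosting a term) can be made concrete with a few string helpers. This is a minimal sketch for composing search strings to paste into the search fields; the helper names are hypothetical and are not part of SDB.

```python
def any_of(*terms: str) -> str:
    """Terms separated by spaces are combined with OR by default."""
    return " ".join(terms)


def all_of(*terms: str) -> str:
    """Require every term by joining them with the AND operator."""
    return " AND ".join(terms)


def phrase(text: str) -> str:
    """Wrap a compound term in double quotes so it matches as a phrase."""
    return f'"{text}"'


def boost(term: str, weight: int) -> str:
    """Raise the importance of a term with ^ followed by a number."""
    return f"{term}^{weight}"


print(any_of("breast", "cancer"))             # breast cancer
print(all_of("breast", "cancer"))             # breast AND cancer
print(phrase("breast cancer"))                # "breast cancer"
print(any_of("breast", boost("cancer", 10)))  # breast cancer^10
```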
Results are displayed in sets of 20 records, ordered by a Solr internal matching score. The first column shows the record source, the second the creators, the third the year, the fourth the title, and the last the matching score. Datasets can be downloaded in different subsets and formats for future analysis.
Figure 4.2r: Scholarly Database search results and download interfaces