This month I’m going to take a break from statistical methods and focus, instead, on a particular statistic: citation counts. Recently I discovered Google Scholar Citations, and I was blown away. Introduced in November 2011, this web-based service has the potential to revolutionize the counting of citations to individual persons and their published works. And I don’t think the academic/scientific world will ever be quite the same.
In this post, I’m going to (a) tell you what I know about Google Scholar Citations, (b) compare it with some of its competitors, and (c) make some predictions about how it will be received. In a later post, I’ll share some thoughts about the pros and cons of these developments.
I’m a citation counter from way back. I began my career as a sociologist of science in the mid-1970s at Cornell, where I worked with Scott Long and Bob McGinnis. As part of a project to study the careers of biochemists, we painstakingly counted their citations from the many-volumed set of the Science Citation Index. Actually, most of the counting was done by a small army of undergraduates who spent endless hours in the science library. I’ve also published a few articles on the measurement properties of citation counts. (Click here to see, e.g., articles by Long et al. (1979, 1980), Allison (1980), and Hargens et al. (1976).)
Citation counting got a lot easier when the Science Citation Index moved to the internet, where it became known as the Web of Science. But it’s still tedious work if you’re trying to count citations to persons. That’s because many people have names in common (or at least last names and initials), and it’s often hard to distinguish one person from another. To be accurate, you really need to work from a CV so you know which publications to include and which to ignore.
Google solves this problem by letting each author be the curator of his or her own collected works. Actually, Google was not the first to do this. In 2008 Thomson Reuters, the folks responsible for the Web of Science, introduced ResearcherID. This web-based system allows any researcher to get a unique ID number and to develop a list of publications attached to that number. (My ResearcherID is A-1345-2007).
Just last year, an ambitious non-profit group launched Open Researcher and Contributor ID (ORCID) using software based on the Thomson Reuters system. (Who knew that telling researchers apart was such a big deal?) ORCID has the potential to be Google’s biggest competitor in this game, but I’ll say more about that later. (My ORCID is 0000-0002-0646-5242).
The problem with both ResearcherID and ORCID is that their software is cumbersome, and it’s hard to generate a truly comprehensive list of all your articles and books. ORCID does not even count citations and is not yet searchable for researchers (other than yourself).
Google’s system is easier, more comprehensive, more interesting and much more fun. In just a few minutes, you can generate a list of all your publications. As some have reported to me, you may even find a few articles that you had neglected to include on your CV. On your citation page, Google displays the list of your publications along with the number of citations each one has ever received. You can find my citation page here.
The default is to sort the list by number of citations, although you can also sort by year. If you click on the title of a publication, you get a wealth of information about that item, often including links to a pdf file of the publication itself. If you click on the citation count for a publication, you get a list of all the publications that have cited that publication. That list can be subsetted by any specified date range.
Located above the list of publications are your total citation count and a count of citations made in the last five years. It’s not clear how far back Google goes for the total citation count. Also reported are the h-index, the i10-index, and a graph of your citation counts by year.
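For readers who haven’t met these two indexes before, here is a minimal sketch in Python of how they can be computed from a list of per-publication citation counts. It assumes the usual definitions: the h-index is the largest number h such that h of your publications have at least h citations each, and the i10-index is the number of publications with at least 10 citations. The function names and the example counts are made up for illustration; this is not Google’s code, and I don’t know the details of how Google computes these numbers internally.

```python
def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited publication first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank   # the top `rank` publications all have >= rank citations
        else:
            break      # counts only get smaller from here on
    return h


def i10_index(citations):
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts for eight publications (not real data).
counts = [48, 33, 12, 10, 9, 4, 1, 0]
print("h-index:", h_index(counts))      # h-index: 5
print("i10-index:", i10_index(counts))  # i10-index: 4
```

With these made-up counts, five publications have at least five citations each but there are not six with at least six, so the h-index is 5; four publications reach the 10-citation mark, so the i10-index is 4.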
You can choose to make your citation page public or private. Citation counts are automatically updated, and you can request email alerts whenever any of your publications is cited. You can also request that your publications be automatically updated.
When you create your profile, you can freely specify your fields of interest. For example, I listed quantitative methods, statistics, and sociology of science. Here’s where things get really interesting. When you click on the field name, you get a list of everyone else who has chosen that same field and made their page public, ranked by total citation count. Clicking on each name takes you to that person’s citation page, where you can also explore other fields that they have listed.
Will Google Scholar Citations take off? Personally, I think it’s irresistible. Of course, Google makes it easy to invite others to join, so I’m expecting this service to go viral within the next year.
The biggest potential problem with Google Scholar Citations (a problem shared with ResearcherID and ORCID) is that researchers can lay claim to publications that are not their own, often unintentionally. Google presents you with a list of possible articles or sets of articles and then asks you to verify them. If you’re not careful, you can easily add articles by other people with the same or similar name.
For example, I recently looked at the citation page of a young researcher with a common Chinese surname. I was startled to see that she had over 8,000 citations even though she had been publishing for only a few years. A quick look at the publication list made it clear that most of the articles were not hers. So it would certainly be unwise to use Google Scholar Citations as a basis for hiring or promotions without careful scrutiny of the publication list.
I expect that eventually this problem will solve itself. People will begin to take their citation page as seriously as they take their CV, and peer pressure will induce the vast majority of researchers to “keep it clean.” Google could help by providing a mechanism whereby other researchers could flag potential errors they find in someone else’s publication list.
Another “problem” with Google Scholar is that it counts citations from just about anything it can find on the web, including PowerPoint slides, course syllabi, unpublished papers, and blogs. Consequently, citation counts from Google Scholar are typically much larger than those from the Web of Science, which only counts citations from an approved list of journals. For example, my total citation count from Google is just about double what I get from the Web of Science, and others have reported similar ratios. Some people may view this as a feature rather than a bug.
A third bug/feature is the option to keep your page private. That’s great for your privacy but not great for other people who want to find out about you and your work. Again, my expectation is that peer pressure will eventually induce most researchers to make their page public, especially among those with significant numbers of publications and citations.
Long-term, the most serious competitor to Google is ORCID, which is non-profit, “community based”, and has the sponsorship of approximately 20 major universities, libraries, research institutions, and publishers (including Thomson Reuters). ORCID, Inc. is attempting to position itself as the official arbiter of who is who in the research community. It also charges hefty fees to its institutional members in return for access to its software interface. In the near future, I fully expect that these institutions (and others) will require that all their members get an ORCID. (Currently, ORCID claims to have signed up about 43,000 researchers). Journals and granting agencies may also require that all submitters have an ORCID.
At present, however, ORCID’s user interface is way too limited to be of much use to practicing researchers. And it’s not clear that they have any method for preventing researchers from claiming publications that are not theirs. Finally, as far as I know, there’s nothing to prevent Google from becoming an institutional member of ORCID and piggy-backing on its ID system.
So, my prediction is that virtually every researcher is going to have a number. It may be the static, permanent number of ORCID. Or it may simply be the total citation count, which can get larger every day.
In this post, I’ve told you only what I think is going to happen. In a later post, I’ll share some thoughts about whether this is a good or bad thing for the scientific/academic community. Before I do that, I’d love to hear what you think.