The different lists of university rankings have attracted increasing attention because of their potential as a weapon in the increasingly fierce global competition between universities. A university that is confronted with a lower position in the rankings has to provide a plausible explanation. And universities that are placed higher in the list naturally celebrate this. Let us take a look at the Netherlands. A few weeks ago, the Leiden Ranking produced by CWTS was good news for the Erasmus University (EUR) in Rotterdam. It was placed as the 6th university in Europe. The university immediately published an advertisement in the national newspapers to congratulate its researchers on this leading position in the Netherlands. The advertisement had the facts right, but it emphasized the criterion that puts the EUR highest (number 6 in the list of 100 largest European universities): the number of citations per publication. This indicator is favorable for universities with large medical faculties and hospitals, because these are large research fields with on average many more references and citations than, for example, the technical sciences or philosophy. And it matters which universities are used as the relevant group to rank. Using the same indicator of citations per paper puts the EUR at number 9 among the 250 largest European universities, because 3 smaller universities appear at the top, before even Oxford and Cambridge. Still a very good score and still number 1 in the Netherlands in this ranking. But how does it look when we use other indicators? CWTS now uses two different indicators to take field differences into account. How does the EUR score in these lists? The traditional CWTS "crown indicator" puts the EUR at number 8 among the 100 largest and number 14 among the 250 largest European universities. The improved CWTS indicator puts the EUR at number 11 among the 100 largest and number 15 among the 250 largest universities in Europe.
In all these cases, the EUR is highest among the Dutch universities. If size is taken into account in combination with quality, however, the University of Utrecht has the highest score in the Netherlands (nr. 8) and the EUR ends on position 20, after Utrecht and the University of Amsterdam.
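To make the point about indicator choice concrete, here is a minimal sketch in Python. All names and numbers are hypothetical and are not taken from the Leiden Ranking. It assumes that each paper comes with a known world average number of citations for its field, and it compares raw citations per paper with two common styles of field normalization: a ratio of sums and a mean of ratios. Whether these correspond exactly to the two CWTS indicators mentioned above is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    citations: int       # citations received by this paper
    field_mean: float    # hypothetical world average citations per paper in its field


def citations_per_paper(papers):
    """Raw average: favors fields where everyone is cited a lot."""
    return sum(p.citations for p in papers) / len(papers)


def ratio_of_sums(papers):
    """Total citations divided by total expected citations."""
    return sum(p.citations for p in papers) / sum(p.field_mean for p in papers)


def mean_of_ratios(papers):
    """Average of each paper's citations relative to its own field average."""
    return sum(p.citations / p.field_mean for p in papers) / len(papers)


# Two invented universities: one mostly medical, one mostly technical.
universities = {
    "Medical U": [Paper(30, 20.0), Paper(24, 20.0), Paper(18, 20.0), Paper(4, 4.0)],
    "Technical U": [Paper(8, 4.0), Paper(6, 4.0), Paper(7, 4.0), Paper(20, 20.0)],
}


def rank(indicator):
    """Return university names ordered from highest to lowest indicator value."""
    return sorted(universities, key=lambda name: indicator(universities[name]), reverse=True)


for indicator in (citations_per_paper, ratio_of_sums, mean_of_ratios):
    print(indicator.__name__, rank(indicator))
```

In this toy data the medical university leads on raw citations per paper, while the technical university leads on both normalized indicators. The same publication record can therefore support quite different advertisements.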
So what is the lesson here? First, ranking is a pretty complicated affair because there are many ways to rank universities. Rankings simplify these comparisons of many different dimensions. The universities are forced to build on this and reduce this complexity even further. This is facilitated by the fact that the different rankings produce different results. It enables universities to choose the most favorable ranking. It also enables universities to debunk a ranking by pointing to other results in the other rankings or even debunk the ranking as such by showing contradictions among ranking results. However, this does not disempower these rankings. As Richard Griffiths (professor of social and economic history in Leiden) stated two weeks ago in the university weekly Mare: "Such a list can be a pile of junk, but it is best not to be in the bottom of the pile." Universities are therefore also discussing to what extent mergers can help to improve their ranking scores. For example, it might be profitable for a technical university to be coupled to a large academic hospital.
Individual universities are not the only ones actively engaged in the debate about rankings; the same holds for associations of universities. The Dutch university association VSNU concluded from the Times Higher Education Supplement (THES) ranking that the Netherlands is the fifth best academic country in the world. As science journalist Martijn van Calmthout wrote in De Volkskrant: this requires some creativity because the Netherlands as a whole no longer belongs to the world top (which does not mean that there are no fields where Dutch researchers belong to the best performers in the world). No Dutch university belongs to the 100 best universities in this ranking (which uses a very different set of indicators from the Leiden Ranking, see the next blog post). In fact, the Dutch universities are grouped pretty closely together. Their relative position depends on the indicator used. Leiden scores highest when external funding is the main criterion in the THES ranking. And Shanghai puts Utrecht highest (number 50 in the world list) followed by Leiden (at 70). How significant are the differences among the Dutch universities actually?
The differences between the rankings create a drive to keep producing new indicators to capture aspects and dimensions of quality that are not measured satisfactorily in the existing ones. This cannot go on endlessly. It may be time to take the perverse effects of this one-dimensional ranking more seriously. One way is to further develop truly multi-dimensional indicators, another is to investigate the underlying properties of indicators more thoroughly, and a third is to take the limits of indicators more seriously, especially in science policy. Will it be possible to combine these three strategies?
An observation at the CWTS Graduate Course Measuring Science: in most lectures, the presenters emphasize not only how indicators can be constructed, measured, and used, but also under what circumstances they should not be applied. Thed van Leeuwen, for example, showed on the basis of the coverage data of the Web of Science that citation analysis should not be applied in many fields in the humanities and social sciences, and certainly not for evaluation purposes. If the references in scientific articles in the Web of Science are analyzed, there are strong field differences in the extent to which they cite articles that are themselves covered by the Web of Science. In biochemistry this is very high (92%), whereas in the humanities this drops to below 17%. Since citation analysis is almost always based on Web of Science data, most relevant data on communication in the humanities is missed by citation analysis. Of course, this is well-known and it is the usual argument in the humanities and social sciences against the application of citation analysis. However, this has also meant that most scholars see CWTS principally as associated with any use of citation analysis. CWTS currently does not have a strong reputation as a source of critique of citation analysis, although it has systematically, at least since 1995, criticized the Impact Factor and has also been very critical of the very popular and equally problematic h-index. An interesting mismatch between practice and reputation?
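The coverage argument boils down to a simple share: of all references in a field's articles, what fraction points to items that are themselves in the database? The sketch below shows that calculation in Python. The function name and the toy reference lists are hypothetical and only illustrate the arithmetic; they are not the Web of Science data discussed in the lecture.

```python
def internal_coverage(references):
    """Share of references that point to items covered by the citation index.

    `references` is a list of booleans, one per cited reference:
    True if the cited item is itself indexed, False otherwise.
    """
    if not references:
        raise ValueError("need at least one reference to compute coverage")
    return sum(references) / len(references)


# Invented reference lists for two fields (True = cited item is indexed).
field_references = {
    "field with journal-centered communication": [True] * 9 + [False] * 1,
    "field with book-centered communication": [True] * 1 + [False] * 9,
}

for field, refs in field_references.items():
    print(f"{field}: {internal_coverage(refs):.0%} of references covered")
```

The lower this share, the more of a field's scholarly communication stays invisible to a citation analysis based on that index.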
Ron de Kloet, professor in medical pharmacology in Leiden and famous for his research on stress, about the journal impact factor in the university weekly Mare (my translation): "In the past, we did not have this complete idiocy around impact numbers". He thinks that those who have to judge scientists on their performance rely too easily on the journal impact factor. "In this way, the journal rather than the researcher is being assessed. And young researchers know that not their individual creativity counts but the visibility of the journal. This can make people obsessed and take away the pleasure in science." Wise words!
At the STI conference 2010 my colleagues Andrea Scharnhorst, Krzysztof Suchecki from the Virtual Knowledge Studio and I presented our work in progress on modeling the peer review system. The basic idea is simple: is it possible to model the peer review system as if it were a computer game such as SimCity? We followed a strategy where we try to make the model as simple and stupid as possible. So, initially we are not trying to mimic reality, but to set up an extremely simplified model of how peer review works in science and academia. Our model consists of two populations: researchers and journals. The researchers have two different roles: they are authors of scientific papers and they are reviewers who judge the quality of scientific papers written by other researchers. Each researcher has her own specific behaviour and the same holds for the journals. The trick of the model is that we incorporated a simulation of quality control, using multi-dimensional vectors. This is derived from what we know about how peer review works. Basically, reviewers compare what they perceive of the work in different dimensions (such as the quality of writing, the images, the statistical reliability, how interesting the questions are, etc.) with what they perceive as the required quality. We assume that this expected quality relates to the quality of the work that the researcher produces herself. The project is in an early stage, and we are now in the process of writing it up for a proper first publication, mainly on the methodology. At the conference we presented a poster that contains more details (I posted it on my Facebook account since this blog software system is apparently not able to process images unless they are very small).
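To give a feel for what such a "simple and stupid" model could look like, here is a rough sketch in Python. It is not our actual model. The number of dimensions, the noise levels, the acceptance rule (every reviewer must see every dimension meet her own standard) and all class and function names are assumptions made purely for illustration.

```python
import random
from dataclasses import dataclass

DIMENSIONS = 4  # e.g. writing, images, statistics, interest of the questions


@dataclass
class Researcher:
    quality: list[float]  # quality of her own work, one value per dimension

    def write_paper(self, rng, noise=0.1):
        """As author: produce a paper whose quality scatters around her own level."""
        return [q + rng.gauss(0, noise) for q in self.quality]

    def review(self, paper, rng, noise=0.1):
        """As reviewer: compare perceived quality with her own standard (assumed rule)."""
        perceived = [x + rng.gauss(0, noise) for x in paper]
        return all(p >= required for p, required in zip(perceived, self.quality))


@dataclass
class Journal:
    reviewers_per_paper: int = 2

    def decide(self, paper, reviewers, rng, noise=0.1):
        """Accept only if all reviewers recommend acceptance (assumed rule)."""
        return all(r.review(paper, rng, noise) for r in reviewers)


def simulate(n_researchers=50, n_rounds=200, seed=0):
    """Run a number of submissions and return the share of accepted papers."""
    rng = random.Random(seed)
    researchers = [
        Researcher([rng.random() for _ in range(DIMENSIONS)])
        for _ in range(n_researchers)
    ]
    journal = Journal()
    accepted = 0
    for _ in range(n_rounds):
        author = rng.choice(researchers)
        others = [r for r in researchers if r is not author]
        reviewers = rng.sample(others, journal.reviewers_per_paper)
        if journal.decide(author.write_paper(rng), reviewers, rng):
            accepted += 1
    return accepted / n_rounds


print(f"acceptance rate: {simulate():.2f}")
```

Even in such a toy version one can start to play: give journals different numbers of reviewers, or let reviewers be more lenient in some dimensions than in others, and see how the acceptance rate shifts.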
The Erasmus University opened the new academic year last week by embracing Open Access for all its research publications. From 1 January 2011, it will be obligatory for researchers at the university to deposit their publications, after peer review and corrections, in the institutional repository RePub. The repository staff will take care of web based storage and accessibility in accordance with the specific requirements of the publisher of the research article. According to the Rector Magnificus of Rotterdam, prof. Henk Schmidt, the university aims to make a big leap forward in open access. "Research has made clear that Open Access publications lead to an increase in the number of citations of scientific work". He emphasized that open access is desirable from both a societal and a scientific point of view. The step by the Erasmus University clearly also has the potential to make academic work that has a different form from an article in the traditional research journals more visible and citable.
Recently, I read Daniel Kehlmann’s fictional history about Alexander von Humboldt and Carl Friedrich Gauss, Die Vermessung der Welt. It is an intriguing way to write history of science, because it enables the author to insert internal dialogues which are actually quite plausible, yet by definition unprovable. The two characters are quite different and perhaps symbolize the two basic modalities in quantitative research, recognizable also within the field of scientometrics. Alexander von Humboldt is the outgoing guy, travelling the whole world. He is interested in the particulars of objects, collects huge amounts of birds, stones, insects, plants and describes their characteristics meticulously. Gauss, on the other hand, wants to stay home and thinks about the mathematical properties of the universe. He is interested in the fundamentals of mathematical operations and suspects that they can shed light on the structure of reality. In scientometrics, these two different attitudes come together but never without a fight. Building indicators means thinking through the mathematical properties of indicators, because these directly affect the question of what the indicator is actually supposed to measure: in technical terms, the validity of the indicator. One also needs other types of insight to understand the validity, such as about what researchers are actually doing in their day to day routines, but a firm grip on the mathematical structure of indicators is indispensable. At the same time, the other attitude is also required. Von Humboldt’s interest in statistical description gives insight into the range of phenomena that one can describe with a particular indicator. A good scientometric group, in other words, needs both people like Gauss and people like Von Humboldt. And indeed, both types are present at CWTS. Let us see how the interactions between them will stimulate new fundamental research in scientometrics and indicator building.
The book also has some interesting observations about the obsession of the key actors with measuring the world and the universe. When Alexander von Humboldt travels through South America, he meets a priest, Father Zea, who is sceptical about his expedition. He suspects that space is actually created by the people trying to measure space. He mocks Von Humboldt and reminds him of the time "when the things were not yet used to being measured". In that past, three stones were not yet equal to three leaves and fifteen grams of earth were not yet the same weight as fifteen grams of peas. It is an interesting idea that things need to get used to being measured, especially now that we are increasingly tagging our natural and social environments with RFID tags, social networking sites and smart phone applications such as Layar, which adds a virtual reality layer of information to your current location. Later in the book, Gauss adds to this by pondering that his work in surveying (which he did for the money) did not only measure the land, but created a new reality by this act of measuring. Before, there had been only trees, moss, stones, and grass. After his work, a network of lines, angles, and numbers had been added to this. Gauss wondered whether Von Humboldt would be able to understand this.
Next week, we will host the 11th International Conference on Science and Technology Indicators here at CWTS in Leiden. The house will be packed.
For me, it will be a great opportunity to get updated about the latest developments in the field of STI indicator research. I am especially interested in five different areas: the role of web based data and indicators; changes in the process of evaluation; indicators for the humanities and social sciences; indicators for emerging types of scientific and scholarly output; and last but not least the constructive roles of science and technology indicators.
It is clear that more researchers are engaged in web based ways of working. This may mean that web based indicators are also becoming more relevant. However, this raises new problems with respect to reliability and validity. Another question is whether the web will stimulate "lay scientometrics" where in principle anyone can do pretty sophisticated statistics with the help of software agents and robots. Will this create new challenges for professional scientometricians?
The web also promises to help in another area: the creation of indicators that do justice to the way researchers and scholars in the humanities work. It is a well-known problem that international peer reviewed journals are not always the predominant outlet for research in these areas. Writing books in languages other than English is often more relevant. New media moreover enable forms like films, performances, blogs and wikis. These alternative forms are currently not well covered in STI indicators. This raises the question of how the web can help to develop indicators that do more justice to the actual research work that humanists and social scientists are doing. It also raises a dilemma: should we try to capture all relevant work in indicators? What are the downsides of "too much information"?
This points to the way we are building an increasingly complex society, where knowledge and social interaction are made measurable in new ways (think about how retailers are monitoring their clients through their client cards), and where these measurements are fed back into the cycle of knowledge creation. I am curious how this will play out at the STI conference.