Book Review for SIGIR Forum, July 1996
Information Retrieval: A Health Care Perspective by William Hersh, Springer-Verlag 1996, 320 pages, Hardcover, Acid-free paper, ISBN 0-387-94454-0
I must confess to being somewhat hesitant when asked to review this book. After all, my knowledge of health care is limited to thankfully infrequent visits to my doctor. Moreover, I have a pet hate for articles which expect laymen like me to glean understanding from examples full of abstruse medical terminology. I mention this because I suspect many in the IR community will be similarly reluctant to take a look at this book. Well, let me put your minds at ease. The book does indeed contain numerous medical examples but, where they form an intrinsic part of the text, they are (almost) all explained in plain everyday language. As such they are relevant and illuminating. The book is aimed squarely at the medical informatics community, but it is without doubt a valuable, readable and motivating introduction to IR in general, and hence deserves to be widely read.
Modern medicine is a highly complex undertaking. Its practitioners (doctors, nurses, researchers and administrators) rely heavily on the availability of accurate, up-to-date information. The decisions they make have very real, often life-or-death consequences. Medicine is thus, of necessity, a very practical discipline. Not surprisingly, then, it turns wherever possible to computerised systems to help manage an increasingly difficult task. Early chapters of the book set out to explain the problem faced by the medical profession and to introduce basic concepts, including terminology, the origin and form of medical knowledge, and the needs of its users. Hersh distinguishes two major forms of health care information: patient-specific (an individual patient's medical records, including lab results, vital signs, reports, etc.) and knowledge-based (as found in medical research journals, and indexes and summaries thereof). The book is primarily concerned with retrieval of this latter type of information, although one chapter late in the book is devoted solely to the former type. This chapter deals with the processing of the clinical narrative which, while utilising many of the conventional IR methods described in preceding chapters, is particularly interesting because of its different goals and the special techniques developed to handle the very terse language for which busy doctors are infamous!
The introductory chapters are followed by a detailed look at the evaluation of IR systems and an examination of some specific medical databases, in particular MEDLINE and other NLM (National Library of Medicine) information systems, which are referred to throughout the remainder of the text. Health care systems must be seen to be useful and usable. Indeed, Hersh more than once suggests that conventional batch-mode testing of IR systems may no longer be a realistic performance indicator, given the level of interaction which today's user enjoys. These chapters, then, coming as they do before any real discussion of the mechanics of IR, reflect the book's emphasis on the user. Subsequent chapters cover the details of indexing and retrieval in Boolean, vector-space, word-statistical, probabilistic and linguistic systems. These are interspersed with another look at evaluation and at ways to assist the user via expert help, improved access and organisation, and better indexing terms. Additional chapters deal with hypertext-based information retrieval and with the Internet.
This, then, is a book about IR for the real world. It deals only with models, without going into implementation details. Inverted file structures are referred to only in passing and there is absolutely no mention of signature files. Readers who need to know about such data structures and algorithms can easily find them in specialist texts such as [1]. Personally, I find this clear demarcation between model and implementation a major strength of this text, making it uncluttered and more comprehensible than many previous works (such as [2, 3]). Indeed, I found Hersh's style to be exceptionally clear. Although I would have liked a little more depth in some areas (for example, clustering and its application to searching and browsing), overall the content and length are just about right. There was no mention of compression (perhaps a little implementation oriented, but surely important enough to warrant some mention these days, even if only via reference to other texts such as [4]), but this was the only omission I noticed. There are numerous cross-references between sections (both backwards and forwards) and some repetition, although this is not excessive and should mean that chapters can be read independently of one another. There is, of course, an extensive list of external reference works (around 400) for the truly inquisitive.
Hersh is particularly well qualified to author such a book. He is both a medical doctor and someone who is actively researching and teaching in this area [5]. Overall, "Information Retrieval: A Health Care Perspective" gave me a renewed appreciation for the complexities involved in medical informatics. I recommend this book, not just to the medical community, for whom it is undoubtedly a valuable resource, but also to those in IR generally. It is both a great introduction to the field and a poignant reminder of the difficulties involved. I hope and believe it will encourage more people to take up this worthy challenge.
Dr. David Davenport
Computer Engineering &
Information Science Dept.,
Bilkent University, 06533 Ankara - TURKEY
{ Email: david@bilkent.edu.tr }
References
[1] Frakes W.B. & Baeza-Yates R. (Eds.), Information Retrieval: Data Structures and Algorithms, Englewood Cliffs, Prentice-Hall, 1992.
[2] Salton G., Automatic Text Processing, Addison-Wesley, 1989.
[3] Meadow C.T., Text Information Retrieval Systems, Academic Press, 1992.
[4] Witten I.H., Moffat A. & Bell T.C., Managing Gigabytes, Van Nostrand Reinhold, 1994.
[5] See Hersh's homepage at http://www.ohsu.edu/bicc-informatics/hersh/