Special Report: Voice recognition


Speaking up for voice recognition

The NHS has been slow to adopt voice recognition outside radiology and pathology, and it has not always seen the promised financial benefits when it has done so. New ways of introducing the technology, the new discharge letter target, the pressing need to save money, and integration with electronic patient record systems could yet change that. Kim Thomas reports.

One thing almost everybody agrees about voice recognition is that accuracy rates have shot up in the past three or four years.

This makes it an efficient way to create a document – faster than typing it yourself, and cheaper than paying a secretary to type it. Yet although radiologists and pathologists now use voice recognition as a matter of course, adoption in other areas is patchy at best. Why?

Back end voice recognition will only take you so far

The answer lies in the traditional, embedded processes and management structures of hospital trusts. In most hospitals, a clinician dictates a letter or a report into a recorder and then passes the recording to a medical secretary (or outsourcing agency) for transcription.

The secretary or agency then returns the typed document to the clinician for checking and signing. It is possible to impose “back end” voice recognition on this process.

The clinician dictates the document into the computer via a microphone, creating an audio file. The software then transcribes the audio file and the secretary proofreads the transcription before returning it to the clinician to check and sign.

This process, however, is still inefficient. Malcolm Grant, managing director of GHG Software Development, which produces TalkingPoint voice recognition software, points out that the secretary still has to listen to the recording to make sure the transcription is accurate.

Once the time spent checking and correcting the file is factored in, the saving is only about 10-15%.

The checking pool

It is possible to save more time by moving from a departmental model for medical secretaries, who may be doing multiple tasks, to a central pool of staff who check the voice-transcribed documents as they come in.

This is the model favoured by Dictate IT customers such as the Royal Free London NHS Foundation Hospital Trust.

A team of eight Dictate IT staff is employed on site to check and finalise voice transcripts produced by clinicians – work previously carried out by 147 people. Mark Miller, Dictate IT’s managing director, says that all the reports are ready by 3pm the following day.

Centralisation is the only way that back end voice recognition can be made to work efficiently, says Miller: “If you’re not prepared to do the underlying reorganisation, then don’t bother wasting your money on speech recognition engines.”

Front end comes to the fore

The alternative approach – that favoured in radiology and pathology departments – is to use front end (or “interactive”) recognition, in which the clinician speaks into the voice recognition software, sees the words appear and checks the document.

This bypasses the need for a secretary (except, perhaps, for adding patient details). Historically, trusts have been reluctant to adopt this because clinicians, for the most part, have not been enthusiastic about taking on the administrative burden.

But vendors are reporting an increase in interest in front end recognition from trusts. The change is being driven by two factors.

One is increased adoption of electronic patient records, because, as radiologists have found, voice recognition is usually a quicker way of entering information into the record than typing.

The latest version of Nuance’s Dragon Medical software works with all the main EPR systems, and, says Alan Fowles, senior vice president, healthcare international sales and operations at Nuance: “We see speech recognition and the EPR or clinical information system going hand-in-hand.”

The second factor is an apparent decline in the number of medical secretaries, which has resulted in some clinicians carrying out their own administrative tasks. Jim Stapleton, head of operations at G2Speech, says: “The onus is on the consultant to be able to type themselves.”

Stapleton says the company has seen a move in its existing customer base towards the use of front end speech recognition in clinics. In some cases, the consultant dictates the discharge letter during the clinic.

“We have a few consultants in trusts who will do interactive voice recognition, print the letter out and hand it to the patient directly,” he says.

Talking to the EPR (literally)

Both Medisec and GHG have seen a similar increase in uptake in clinics. The process has been made smoother by the integration of voice recognition software with the patient administration system so that patient details are included automatically.

This is crucial to improving efficiency, says Ceri Rothwell, client relationship director at Medisec. “A speech-recognised piece of text is no good unless it’s integrated within a clinical system of some sort.”

Medisec has found that front end adoption can be increased by allowing clinicians to begin with back end usage. The clinician dictates the document as usual, and it is transcribed in a back end process.

The secretary then checks and corrects the content. But the voice recognition software learns from the corrections, which means that accuracy improves.

When accuracy reaches 99% for a particular clinician, that clinician can move to front end recognition, spending a few seconds on correction rather than several minutes.
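As a rough illustration of that switch-over rule, the sketch below compares the software's draft with the secretary's corrected text and checks the result against the 99% threshold. The article does not say how Medisec measures accuracy; the word-by-word comparison, the averaging over recent documents and the function names here are all assumptions made for the example.

```python
from difflib import SequenceMatcher

FRONT_END_THRESHOLD = 0.99  # the 99% figure quoted in the article


def word_accuracy(recognised: str, corrected: str) -> float:
    """Share of words in the corrected text that the draft already had right.

    Assumption: accuracy is measured word by word against the secretary's
    corrected version; the article does not specify the real metric.
    """
    draft_words = recognised.split()
    final_words = corrected.split()
    if not final_words:
        return 1.0 if not draft_words else 0.0
    matcher = SequenceMatcher(a=draft_words, b=final_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(final_words)


def ready_for_front_end(recent_accuracies: list[float],
                        threshold: float = FRONT_END_THRESHOLD) -> bool:
    """True when average accuracy over recent documents meets the threshold."""
    if not recent_accuracies:
        return False
    return sum(recent_accuracies) / len(recent_accuracies) >= threshold
```

A clinician whose recent letters average below the threshold would stay on the back end process, with the secretary's corrections continuing to train the software.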

This staged approach makes it easier to tackle the reluctance of consultants to use the technology, says Rothwell. “You’re taking all the heat out of that initial confrontation with very experienced senior clinicians on site.”

Targets, money

Trusts are now required to meet Commissioning for Quality and Innovation (CQUIN) targets, under which discharge documentation should be sent electronically and should reach the GP surgery within 24 hours.

Front end recognition will make it easier to meet those targets, says Stapleton: “With instant turnaround you’re going to reduce your penalties.”

Grant argues that, as well as providing an efficiency gain, the move to clinicians dictating their own letters provides patient benefits.

“If you talk to a lot of GPs or clinicians, they will say their letters have improved since they used this system.” The ability to see the document in front of them immediately after dictating it enables them to edit it for accuracy and detail while it is fresh in their minds, he says.

Front end voice recognition could be particularly useful in community care, where health professionals have traditionally handwritten their notes and then had to type them, or hand them to others for typing, on return to the clinic.

Dictating notes directly into a laptop or tablet could save time and money. Both Nuance and G2Speech report interest from mental health trusts – a move to voice recognition, suggests Stapleton, saves not just the cost of administration but also the cost of shipping tapes to secretaries.

No one thinks that NHS trusts are in a hurry to adopt voice recognition: barriers of cost, entrenched working practices and consultant opposition are not easy to overcome.

But G2Speech, like other vendors, is seeing “gradual and consistent” adoption. “We tend to find that one hospital tries it in one department that for a particular reason can’t cope, find that it delivers and then they’ll put it in other departments,” says Grant.

He is confident that in the long term the benefits will speak for themselves: “There are not many things that improve clinical care and reduce costs, and this is one of them.”