(Above: A snippet of the data ProPublica was given on prescribing in Medicare’s massive drug program.)
The federal government’s announcement last week that it would begin releasing data on physician payments in the Medicare program seems to have ticked off both supporters and opponents of broader transparency in medicine.
For their part, doctor groups are worried that the information to be released by the Centers for Medicare and Medicaid Services will lack context the public needs to understand it.
"The unfettered release of raw data will result in inaccurate and misleading information," AMA President Ardis Dee Hoven, MD, said in a statement to MedPage Today. "Because of this, the AMA strongly urges HHS to ensure that physician payment information is released only for efforts aimed at improving the quality of healthcare services and with appropriate safeguards."
On the other hand, healthcare hacker Fred Trotter has raised concerns about CMS’ plan to evaluate requests for the data on a case-by-case basis. That isn’t much of a policy at all, he wrote, because it gives federal officials too much discretion over what to release and what to withhold.
So, how is this all going to shake out?
Three recent examples offer some clues.
The first involves the Wall Street Journal and the Center for Public Integrity. The news organizations sued the government in 2009 to obtain records on physician claims in Medicare. They received the information they were seeking in a legal settlement, but had to agree not to publish physicians’ names in most cases. The data they received was so vast that it took data experts months just to load it into a computer, organize it and analyze it.
The news organizations never did receive a complete set of Medicare payment data. Instead, they received a 5 percent sample of the Carrier Standard Analytic File, which includes records of Medicare Part B (outpatient) billings and payments.
And that in itself was huge: In 2008 alone, it had about 42 million rows, each with 612 variables. It was about 38 gigabytes even before being imported into a database, data journalist Maurice Tamman wrote in a legal declaration. At the time, Tamman was a WSJ news editor. Tamman’s declaration was included in a successful lawsuit filed by Dow Jones (the Journal’s parent company) to lift a legal moratorium that had prevented Medicare from publicly releasing data on payments to individual physicians.
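To give a sense of that scale, the figures in Tamman's declaration can be run through some simple arithmetic. The sketch below is not from the declaration. It assumes that "gigabytes" means 10^9 bytes and that a full year's file would be roughly 20 times the 5 percent sample, which is only a rough, hypothetical extrapolation.

```python
# Rough arithmetic on the figures in Tamman's declaration (2008, 5 percent sample).
# Assumptions (not from the declaration): 1 GB = 10**9 bytes, and the full file
# is simply the sample scaled up by 100 / 5 = 20.
SAMPLE_ROWS = 42_000_000   # rows in the 5 percent sample
VARIABLES = 612            # variables (columns) per row
SAMPLE_SIZE_GB = 38        # approximate size before database import
SAMPLE_PERCENT = 5

cells = SAMPLE_ROWS * VARIABLES                          # individual data values
bytes_per_row = SAMPLE_SIZE_GB * 10**9 / SAMPLE_ROWS     # average raw row width
full_rows = SAMPLE_ROWS * 100 // SAMPLE_PERCENT          # extrapolated row count
full_size_gb = SAMPLE_SIZE_GB * 100 // SAMPLE_PERCENT    # extrapolated size in GB

print(f"data values in sample:   {cells:,}")
print(f"bytes per row (approx.): {bytes_per_row:,.0f}")
print(f"rows in a full file:     {full_rows:,}")
print(f"full file size (GB):     {full_size_gb:,}")
```

Under those assumptions, the sample alone holds about 25.7 billion values at roughly 900 bytes per row, and a complete file would run on the order of 840 million rows and 760 GB.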
The second example is the project that my colleagues at ProPublica and I have been working on to examine how doctors and other health professionals prescribe medications in Medicare’s drug program. Instead of seeking individual medication claims, we sought aggregate records for each prescriber, grouped by drug. We gave up some information we wanted, such as patient characteristics, but we were not subject to any limits on naming doctors.
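Aggregating "for each prescriber, grouped by drug" amounts to counting claims per (prescriber, drug) pair and discarding patient-level detail. Here is a minimal sketch of that idea. The `Claim` record and its field names are hypothetical, not the layout of Medicare's actual files, and this is not a description of how CMS produced ProPublica's extract.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Claim(NamedTuple):
    """One hypothetical drug claim; field names are illustrative only."""
    prescriber_id: str
    drug_name: str
    patient_id: str


def aggregate_by_prescriber_and_drug(
    claims: Iterable[Claim],
) -> dict[tuple[str, str], int]:
    """Collapse individual claims into one count per (prescriber, drug) pair.

    The patient identifier is never carried into the output, which mirrors
    the trade-off above: no patient characteristics in the aggregate.
    """
    counts: Counter[tuple[str, str]] = Counter()
    for claim in claims:
        counts[(claim.prescriber_id, claim.drug_name)] += 1
    return dict(counts)
```

For example, two claims by prescriber `P1` for `drug_a` (for different patients) collapse into a single row with a count of 2.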
The result is our Prescriber Checkup news application, which lets consumers look up their doctor and see how he or she compares with others in the same specialty and state. Our stories identified examples of risky prescribing, high rates of name-brand prescribing and patterns that suggested fraud.
Even though we did not have details on every individual drug claim filled (more than 1 billion a year), our files were still vast: more than 70 million rows of data on the drugs prescribed by 1.6 million providers in 2011 alone. In cases in which a provider wrote fewer than 11 claims for a particular drug, the data were redacted.
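The redaction rule is a simple threshold on each aggregated count. The sketch below assumes that "redacted" means the (prescriber, drug) row is dropped entirely; Medicare could just as well blank out the value instead, so treat this as one plausible reading rather than the agency's actual procedure. The threshold of 11 comes from the text; the function name is hypothetical.

```python
REDACTION_THRESHOLD = 11  # counts below this are withheld, per the text


def redact_small_counts(
    counts: dict[tuple[str, str], int],
    threshold: int = REDACTION_THRESHOLD,
) -> dict[tuple[str, str], int]:
    """Drop (prescriber, drug) rows whose claim count is below the threshold.

    Assumption: redacted rows are removed, not replaced with a placeholder.
    """
    return {key: n for key, n in counts.items() if n >= threshold}
```

With a threshold of 11, a row with 10 claims disappears while a row with exactly 11 survives.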
Processing the data took us months, as well.
Finally, Trotter obtained Medicare data on referrals between providers. He sought and received statistics on the number of patients who saw one doctor (Doctor A) within 30 days of seeing another doctor (Doctor B). He created DocGraph to show these referrals visually.
According to his website, Trotter received nearly 50 million pairs of referring parties involving about 1 million providers in 2011. As with the data ProPublica received, Trotter did not receive information on referrals involving fewer than 11 patients.
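The referral data boil down to counting shared patients for each ordered pair of providers. The sketch below assumes a hypothetical visit-level input (the `Visit` record and function name are illustrative) and reads "within 30 days" as the second provider's visit falling 0 to 30 days after the first. It illustrates the idea only; it is not the method CMS or DocGraph actually used. The 11-patient floor mirrors the suppression described in the text.

```python
from collections import defaultdict
from datetime import date
from typing import Iterable, NamedTuple


class Visit(NamedTuple):
    """A hypothetical patient visit; field names are illustrative only."""
    patient_id: str
    provider_id: str
    visit_date: date


def shared_patient_pairs(
    visits: Iterable[Visit], window_days: int = 30, min_patients: int = 11
) -> dict[tuple[str, str], int]:
    """Count distinct patients who saw provider A and then provider B within
    `window_days`, keeping only pairs with at least `min_patients`."""
    by_patient: dict[str, list[Visit]] = defaultdict(list)
    for v in visits:
        by_patient[v.patient_id].append(v)

    patients_per_pair: dict[tuple[str, str], set[str]] = defaultdict(set)
    for patient_id, pv in by_patient.items():
        pv.sort(key=lambda v: v.visit_date)  # chronological order per patient
        for i, first in enumerate(pv):
            for second in pv[i + 1:]:
                gap = (second.visit_date - first.visit_date).days
                if gap > window_days:
                    break  # later visits are even further away
                if second.provider_id != first.provider_id:
                    patients_per_pair[(first.provider_id, second.provider_id)].add(patient_id)

    # Suppress small cells, as in the data Trotter received.
    return {pair: len(p) for pair, p in patients_per_pair.items() if len(p) >= min_patients}
```

Counting distinct patients (a set per pair) rather than visits keeps a single patient with many follow-ups from inflating a pair's total.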
Here are my takeaways:
1) Medicare is far more likely to release aggregate information than data on individual claims. This is mostly to protect patient privacy, but it is also because officials have grown increasingly comfortable writing programs to aggregate the data (as was the case with ProPublica and Trotter).
I would not be surprised, for instance, if Medicare released information on the number of times each provider billed for different procedures and services last year, as well as the number of patients each doctor treated, but few details about the patients themselves.
2) Expect redactions. It’s safe to assume that Medicare will redact data in which fewer than 11 patients are involved.
3) Medicare likely will not create a glamorous news application in which consumers can view the data. When the government released information on hospital charges last year, it released a big spreadsheet and left it to news organizations and others to come up with clever ways of displaying it. I see no reason for this to change.
4) Medicare, likewise, is unlikely to put together tip sheets and other context for interpreting the data. While the program should—and probably will—release basic information about what is being released, I don’t think officials will tell consumers how much weight they should give it. That’s up to the media and physician groups. If these groups, including the AMA, are dissatisfied with the media’s presentation of the data, they are welcome to create their own site with the data and the context they believe is important.
5) There will be far more requests for Medicare physician data than there will be Medicare staff assigned or available to fulfill them. This is a process that will take time and everyone should be patient.
6) Those wanting every morsel of Medicare data to be released will likely be disappointed. This is a massive, immensely complicated program with many interrelated parts. I would expect more information to be released each year, but it won’t happen overnight.
7) Finally, few news organizations or research groups are equipped to deal with such large data sets and produce meaningful content quickly. As noted above, all of this will take time.
All that said, let the data releases begin.