Archive for Biostatistics

Evaluating cancer drugs at FDA

In the June 2nd paper issue of BusinessWeek (published online 5/21), the article “Cancer’s Cruel Economics” by Catherine Arnst provides a high-level look at the difficulty some small companies are facing getting their cancer therapies approved in the US.

The focus is on cancer immunotherapies, particularly Antigenics’ Oncophage.  I last discussed Oncophage in 2006, after the first report of its Phase 3 results.  I have also devoted space in this forum to other cancer therapies mentioned in the BusinessWeek article including Dendreon’s Provenge and Genitope’s MyVax.

I was intrigued by a quote attributed to Richard Pazdur, head of CDER’s Oncology review division:

“[Post hoc subgroup analysis for differential treatment effect] is like shooting an arrow and then painting the bull’s-eye around it,” says Pazdur. “You cannot use subset analysis to salvage a failed trial.”

Pazdur’s concern regarding treatment effect inferences derived from post hoc subgroup (subset) analyses rests on firm grounds, but the quote suggests a black and white attitude towards their utility, without any room for compromise.  That’s too bad, because the rule of thumb Pazdur is apparently using to reject subgroup evidence of efficacy is imperfect, undoubtedly resulting in the rejection of some effective therapies.

I’m not going to write a manuscript-length post describing the many risks inherent in inferential subgroup analyses.  There are many published reviews that do that.  Suffice it to say that the risks of both false-negative and false-positive inferences are inflated with subgroup analyses relative to the main analysis (primary hypothesis test), whether the analyses are pre hoc (defined before the trial results accrue) or post hoc (sometimes called retrospective).  Pre hoc analyses are less susceptible than post hoc analyses to Pazdur’s target drawing, especially when the specifics of the subgroup are rigorously pre-defined, and so they are preferred by regulators.

What I’ve found to be less well represented in the literature is a situation in which the weight of evidence presented in the subgroup analysis is sufficient to, in Pazdur’s words, “salvage a failed trial.”  I’ll not focus specifically on Provenge or Oncophage, but the example from the literature I’ll cite is relevant to both.

In order to determine whether any subgroup analyses provide evidence sufficient to warrant drug approval, it is necessary to first know the expected false-positive rate of a subgroup analysis, given a false-positive rate of 5% in the overall (main) analysis.  A 5% rate is chosen because that rate is usually considered an acceptable one by clinical practitioners and drug regulators.  Thus, when there is no true effect, the null hypothesis is falsely rejected in 1 in 20 trials.  FDA usually requires two independent experiments (trials) for evidence of efficacy, resulting in an overall false-positive rate of 0.25% (0.05×0.05=0.0025), though one statistically significant experiment with corroborating evidence from others is sometimes sufficient, particularly for accelerated approvals.

In an important study published in 2001 by the UK’s NHS, Brookes et al used simulations of 100,000 clinical trials each to determine the false-positive (and false-negative) rates of subgroup analyses for different types of study designs.  They simulated two subgroup analyses (ignoring the effects of multiple analyses, which inflate Type 1 error) and tested a variety of relative treatment-effect sizes and subgroup sizes.

The simulations showed that when there was in reality no main or subgroup effect of treatment, and the overall (main) analysis of treatment was falsely positive (i.e. the null hypothesis was rejected at the nominal p<0.05), then the chance of falsely declaring one subgroup as demonstrating a treatment effect was high.  For a survival study, this chance was 61%.  In other words, with no real main treatment or subgroup effect, when there was a false-positive main effect, one of two subgroups analyzed will appear to have a treatment effect well over half the time.

However, under the same set of circumstances, when the null hypothesis for the main effect is not rejected (i.e. a true-negative inference is made), then one subgroup will show evidence of a treatment effect much less often, only 6.5% of the time, approaching the 5% false-positive rate of the overall analysis.  In other words, the probability of falsely rejecting a subgroup-specific null hypothesis in the absence of overall and subgroup effects is reasonably low if the overall effect is correctly negative.
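
The conditioning described above can be illustrated with a small Monte Carlo sketch.  This is not the Brookes et al simulation: it assumes normally distributed test statistics rather than survival data, exactly two equal-sized subgroups, and no true effect anywhere, and the function name and parameters are my own.  The rates it prints will therefore differ somewhat from the 61% and 6.5% quoted above, but the pattern (high when the overall result is falsely positive, low when it is correctly negative) should be visible.

```python
import random
from statistics import NormalDist


def simulate_subgroup_false_positives(n_trials=200_000, alpha=0.05, seed=1):
    """Simulate null trials (no true effect) with two equal-sized subgroups.

    Each subgroup's test statistic is an independent standard normal draw.
    The overall statistic pools the two subgroups with equal weight.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value

    # overall_significant -> [number of trials, trials with >= 1 significant subgroup]
    counts = {True: [0, 0], False: [0, 0]}

    for _ in range(n_trials):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        z_overall = (z1 + z2) / 2 ** 0.5  # still standard normal under the null
        overall_sig = abs(z_overall) > crit
        any_subgroup_sig = abs(z1) > crit or abs(z2) > crit
        counts[overall_sig][0] += 1
        counts[overall_sig][1] += any_subgroup_sig

    return {
        "overall_false_positive_rate": counts[True][0] / n_trials,
        "subgroup_fp_given_overall_fp": counts[True][1] / counts[True][0],
        "subgroup_fp_given_overall_negative": counts[False][1] / counts[False][0],
    }


if __name__ == "__main__":
    for name, value in simulate_subgroup_false_positives().items():
        print(f"{name}: {value:.3f}")
```

The design choice worth noting is that both rates are computed from the same set of simulated null trials, split only by whether the overall test happened to reject.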

Of course, the above simulation findings aren’t by themselves capable of determining whether a subgroup-specific effect is real or not.  They simply suggest that the regulator need not reject out-of-hand statistical evidence of a subgroup-differential treatment effect when evidence for an overall effect is absent, as Dr. Pazdur’s quote suggests he is willing to do in some cases. 

Evidence that the apparent subgroup effect in a survival study is real will be strengthened by the following factors:

  • The main effect does not contradict the purported subgroup effect 
  • The subgroup-specific analysis was defined a priori
  • A significant test of interaction between treatment and subgroup is in evidence prior to any subgroup-specific test
  • The total number of subgroups analyzed is small, and, if not, an inference of treatment effect made on any one subgroup uses an appropriately conservative adjustment of the significance level
  • There is strong biological plausibility for the differential subgroup effect
  • The size of the subgroup is large relative to the total sample size (i.e. relatively representative of the total population)
  • The conduct of the study, particularly the handling of dropouts and non-compliant subjects, creates confidence in the quality of the subgroup data
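
Two of the factors above, the interaction test and the conservative adjustment of the significance level, lend themselves to a short sketch.  The version below assumes two independent subgroup estimates on the log hazard ratio scale with known standard errors, uses a simple normal-approximation z-test for their difference, and uses a Bonferroni correction as one example of a conservative adjustment.  The function names and the numbers in the usage example are hypothetical and are not taken from any trial discussed here.

```python
from statistics import NormalDist


def interaction_z_test(effect_a, se_a, effect_b, se_b):
    """Two-sided z-test for a difference between two independent subgroup effects.

    Effects are assumed to be on an additive scale (e.g. log hazard ratios).
    Returns the z statistic and the two-sided p-value.
    """
    diff = effect_a - effect_b
    se_diff = (se_a ** 2 + se_b ** 2) ** 0.5  # SE of a difference of independent estimates
    z = diff / se_diff
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def bonferroni_alpha(alpha, n_subgroups):
    """Per-subgroup significance level under a Bonferroni correction."""
    return alpha / n_subgroups


if __name__ == "__main__":
    # Hypothetical log hazard ratios and standard errors for two subgroups
    z, p = interaction_z_test(effect_a=-0.50, se_a=0.15, effect_b=-0.05, se_b=0.15)
    print(f"interaction z = {z:.2f}, p = {p:.3f}")
    print(f"per-subgroup alpha for 5 subgroups: {bonferroni_alpha(0.05, 5):.3f}")
```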

Finally, as I’ve argued before in the case of Provenge, when the evidence of efficacy is marginal, regulators have a duty to the public they serve to weigh with utmost care and without bias the risk of introducing an ineffective medicine versus the risk of withholding ready availability of an effective medicine from a gravely ill population without other treatment options. 

Hearing on FDA’s Role in Evaluating Safety of Avandia

Yesterday’s Hearing on FDA’s Role in Evaluating Safety of Avandia held by the U.S. House Committee on Oversight and Government Reform made for entertaining TV.  Most of the elements of a good courtroom drama were on display:  a protagonist whose interests in ethics and science keep him too busy to pay attention to the effects his pronouncements have on financial markets (played by Dr. Steve Nissen); a fact-challenged villain (the TZEs?) out to destroy the reputation of said protagonist by exposing his lust for glory (played by Rep. Darrell Issa); and a captive forum filled with minor characters who fill the backstory of the hero, challenge the villain, and stand at the ready to interject the bits of comic relief and moralizing needed for pacing.  On the downside, the third act was very weak, and I missed the caustic irony and Adamsesque speechifying of James Spader’s character on Boston Legal (hey…why not invite Mr. Spader and a couple of BL writers to sit in on some of these hearings; it would surely boost ratings).

For those of you who didn’t catch the action live, you can watch it using the above link.  Other than the entertainment value, though, there was much of substance to take away from the event.  One of the takeaways of some interest was that FDA Commish von Eschenbach stated, after several minutes of hemming and hawing, that FDA has all the power it needs to enforce existing rules governing DTC advertising.  It simply lacks the resources to do the job properly.  Other than that, much of what was said by FDA staff and the Congressmen present was a rehash of what we’ve heard surrounding the PDUFA renewal debates.  There were some hints that an appetite exists within FDA and in Congress to rethink the extent of premarketing risk assessments for chronic-use drugs intended for large populations, but I wasn’t convinced that this event was enough impetus to keep the ball rolling.  Regular readers know that I support such rethinking.

I also wanted to at least try to clarify the panelists’ (esp. Bruce Psaty’s) response to the question that Rep. Issa kept asking everyone he could corner regarding the exclusion of zero-event studies from the Nissen meta-analysis.  Issa kept asserting that exclusion of the zero-event studies (i.e. those without any events of myocardial infarction, MI, or death) from the meta-analysis was inappropriate, because it reduced the apparent incidence of MI.  It is correct that eliminating zero-event studies would reduce the apparent MI incidence rate (i.e. the rate of appearance of new MI events during the observation intervals).  However, the meta-analysis used by Nissen did not use the incidence rate of MI in the rosiglitazone group and compare it with the incidence rates in the other treatment groups.  If it had, the authors would have reported an incidence rate ratio (IRR), or a relative risk, which displays the ratio of probabilities of observing the outcome for each treatment comparison.  In order to use the relative risk, the meta-analysis should have access to every study and all observations, and, to get the best estimate, the observation interval should be the same among the pooled studies.  Given these restrictions, use of the relative risk can be problematic in the real world.  Nissen couldn’t be sure these conditions would be met by his analysis, so he wisely chose to calculate an odds ratio for each of the treatment comparisons, which isn’t subject to as many restrictions for proper interpretation.

The odds ratio as used in the paper is simply the ratio of the odds of observing an event in one treatment group to the odds in the comparison group, across the pooled studies.  It is calculated by dividing the number of events by the number of non-events in each group, then calculating the ratio between groups.  In other words, if there were 10 total observations in one group and 6 were MI or death events, the odds of an MI or death event in that group would be 6/4 or 1.5.  If the same total observations (10) occurred in a second group, but only 4 MI or death events were observed, the odds ratio would be:  (6/4)/(4/6)=2.25.  If the observation interval were the same in each study and the relative risk were calculated instead, here is what it would be for the same data:  (6/10)/(4/10)=1.5.  So, in this example, the odds ratio would be higher than the relative risk, making the problem appear worse than it might otherwise be perceived (as Issa accused Nissen of doing).  But this example uses a common event that occurs in around half of the patients.  Instead, if we consider a relatively uncommon event that occurs in less than 2% of patients (like in the Nissen paper), we’ll see a different result.  Let’s say that the total observations in each group is 100.  In group A, the event occurs in 2 patients and does not occur in 98.  In group B, the event occurs in 1 patient and does not occur in 99.  The odds ratio for A:B is:  (2/98)/(1/99)=2.02.  The relative risk for A:B is (2/100)/(1/100)=2.00.  Now, you can see that the odds ratio closely approximates the relative risk.  This is simply a reflection of the nature of numbers, fractions in particular, where a risk reduction of 20% (i.e. a rel. risk of 0.8) is not analogous to a risk increase of 20% (i.e. a rel. risk of 1.2) but is instead analogous to a risk increase of 25% (i.e. a rel. risk of 1.25, the inverse of 0.8).  I’ve found that this point is easily understood but frequently not considered by clinicians.  Perhaps it is not as easily understood by Republican Congressmen from California ;-) ?
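
For readers who want to check the arithmetic, here is a minimal sketch of the two calculations using the made-up counts from the paragraph above.  The function names are mine, and this is not the pooling method used in the Nissen meta-analysis, only the single-comparison arithmetic.

```python
def odds(events, total):
    """Odds of an event: events divided by non-events."""
    return events / (total - events)


def odds_ratio(events_a, total_a, events_b, total_b):
    """Ratio of the odds in group A to the odds in group B."""
    return odds(events_a, total_a) / odds(events_b, total_b)


def relative_risk(events_a, total_a, events_b, total_b):
    """Ratio of the event probability in group A to that in group B."""
    return (events_a / total_a) / (events_b / total_b)


if __name__ == "__main__":
    # Common event: 6 of 10 versus 4 of 10
    print(odds_ratio(6, 10, 4, 10), relative_risk(6, 10, 4, 10))
    # Uncommon event: 2 of 100 versus 1 of 100
    print(odds_ratio(2, 100, 1, 100), relative_risk(2, 100, 1, 100))
```

Note that `odds_ratio` raises a `ZeroDivisionError` when the comparison group has no events, which hints at why zero-event studies are awkward to include in an odds-ratio analysis without some additional correction.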

ADAPT: The Wrong Way to Stop a Clinical Trial

In his Nov. 17th editorial “ADAPT: The Wrong Way to Stop a Clinical Trial,” Steve Nissen of the Cleveland Clinic blasts the NIH and PI of the ADAPT trial (a comparison of celecoxib, naproxen and placebo in Alzheimer’s disease) for terminating the trial unnecessarily.  Kudos to Dr. Nissen for his bold public statements.

The actions to stop the ADAPT trial are clearly antithetical to widely accepted principles for conducting interim safety analyses, as I have described in a previous blog post and recently published white paper.  It’s shocking that the U.S.’s leading health-research authority would be responsible for such blatant scientific misconduct.  It’s the type of action that should result in a GAO investigation and in the loss of job for those found to be chiefly responsible.

Bipartisan call for GAO to investigate FDA’s use of inappropriate studies for drug approvals

See Congressman Markey’s web page for details, as they’re too long to reproduce here:  Congressman Edward Markey - September 6, 2006 - BIPARTISAN CALL FOR GAO TO INVESTIGATE FDA’S USE OF INAPPROPRIATE STUDIES FOR DRUG APPROVALS

In a nutshell, U.S. Reps. Edward Markey (D-MA), John Dingell (D-MI), Henry Waxman (D-CA), Bart Stupak (D-MI) of the House Energy and Commerce Committee, and Senate Finance Committee Chairman Charles Grassley (R-IA) have requested that the Government Accountability Office (GAO) investigate FDA’s reliance on non-inferiority studies, with particular reference to antimicrobial drugs.

The Congressmen are proposing the GAO crawl way up FDA’s butt for this investigation, which apparently was spurred by Congressional committee hearings into the FDA approval of Sanofi-Aventis’ antibiotic Ketek (see, for instance, this article on lawyersandsettlements.com).  The list of requested inquiries includes:

1. In the past 10 years, a list of all of those products approved by FDA Office of Antimicrobial Products that established effectiveness on the basis of non-inferiority studies. For each product:
    a. The indication for which the product was approved; whether this indication is or is not serious and life threatening (pursuant to the FDA’s definition contained in 21 CFR 312.81);
    b. The sponsor of the NDA;
    c. The date on which the product was approved;
    d. The name(s) of the active control drug(s) or comparator(s);
    e. Whether the comparator was approved in the U.S.;
    f. The margin used in the trial;
    g. The treatment difference between the active control and the test drug and the associated confidence intervals;
    h. Any groups or subgroup analyses included in labeling;
    i. Whether the active control drug used to establish non-inferiority for each medication was itself approved on the basis of a placebo study or other superiority trial, or if it too was approved on the basis of non-inferiority;
    j. A copy of the explanation contained in the applicant’s analysis of the study for why the results of the non-inferiority trials could be believed to assure the effectiveness of the drug (as required by 21 CFR 314.126(b)(2)(iv)) and, if the required explanation was not included in the analysis of the study, an explanation as to why it was not included; and
    k. A copy of any analysis(es) by FDA staff relevant to whether the non-inferiority trial or trials were adequate to establish the effectiveness of the new drug.

The Congressmen also asked for information relating to FDA identification of and guards against “biocreep.” [Biocreep is described by FDA as the selection of successively less effective comparator agents, which individually fit a statistical confidence interval relative to the product to which it was compared.  This process, over time, may result in the presumed ‘equivalence’ of statistically and clinically inequivalent products.]
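
To make the biocreep mechanism concrete, here is a worst-case sketch with entirely hypothetical numbers.  It assumes each new drug is approved as non-inferior to its predecessor while actually being worse by the full non-inferiority margin, and that effects are measured on a simple additive scale (e.g. percentage points of cure rate over placebo).  The function name and values are illustrative only.

```python
def worst_case_effects(initial_effect, margin, generations):
    """Worst-case effect over placebo for each successive comparator.

    initial_effect: effect of the original, superiority-proven drug
    margin:         non-inferiority margin allowed in each trial
    generations:    number of successive non-inferiority approvals
    """
    effects = [initial_effect]
    for _ in range(generations):
        # Each new drug may lose up to one full margin relative to its comparator
        effects.append(effects[-1] - margin)
    return effects


if __name__ == "__main__":
    # Hypothetical: 30-point benefit over placebo, 10-point margin, 3 generations
    print(worst_case_effects(30, 10, 3))
```

Under these assumptions, the third-generation drug could in the worst case be no better than placebo even though every trial in the chain met its margin.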

They also wanted to know the results of an internal FDA conference on non-inferiority held last year and whether FDA’s acceptance of non-inferiority trials to establish drug effectiveness adheres to the principles for such trials that the agency has set forth (e.g. in ICH E9).

Wow.  I can’t recall a Congressional request for a GAO investigation that is intended to second-guess and micromanage FDA’s scientific processes as much as this one.  It raises the question of motivating forces at work: Is this just political grandstanding or is it more indicative of genuine overseer skepticism of FDA’s sincerity and effectiveness?  Given the political season, I suspect the former is at least as operative as the latter.

Which is not to say that Markey et al’s concerns are not worthy of public discussion.  They are.  Non-inferiority studies are subject to big-time biases and inaccurate conclusions if not planned and scrutinized carefully.  The documents linked above attest to FDA’s awareness of these challenges, so what is the scientific basis for this investigation request?  Is this a fishing expedition, or does this group of Congressmen already have evidence (perhaps from an insider) that FDA has shirked its scientific responsibilities? 

Personally, I’d like to see the Congressmen back off of their request and instead request that FDA hold a special advisory committee hearing into this issue, where experts (including public guests) can give testimony and hold open debate.  Such an advisory committee could be a joint session of three standing committees: the Drug Safety and Risk Management Committee, the Anti-Infective Drugs Committee and the Pharmaceutical Science Committee.  Ad hoc members could be added to ensure complete and balanced expertise in non-inferiority study design and interpretation.

Evaluation of scientific evidence is best left in the hands of scientists.  Let scientists first determine whether FDA is living up to its responsibilities.  If they are not, then get the politicos involved.  It scares me to see politicians playing scientist in the name of public safety and accountability.  It conjures up mental images of Galileo’s trial before the Roman Inquisition.  Congressman Ed Markey–Scientific Inquisitor?

Interim Analysis Part 1: Role of the Data Monitoring Committee

For this month’s PCE column, I wanted to say something about interim analyses. They’re in the news nearly every day, attesting to their importance, and yet they are not well understood by most people in the industry, the academic community or lay public. Problem is, there’s a lot to say about interim analyses, and I don’t want to short-change the topic. My compromise is to discuss the Data Monitoring Committee (or DMC) this month and save other interim-analysis topics for a future article(s). The primary references are FDA’s March 2006 final guidance on DMCs and EMEA’s (i.e. CHMP) 2005 guideline on DMCs. If you are going to pick only one of these documents to read, I can recommend the EMEA doc for its brevity and the FDA doc for its thoroughness. I would imagine that ICH has some interest in providing a harmonized guideline, but they apparently have higher priority issues to tackle.

DMCs were first used in the 1960s in NIH-sponsored clinical trials. The rationale for the DMC then (when they were primarily referred to as Data and Safety Monitoring Boards, or DSMBs) was the same as it is now: minimize trial operational biases by using sponsor-independent personnel to monitor study integrity and subject safety. But you might have heard the term Independent DMC. Sounds redundant according to the aforementioned rationale, doesn’t it? It’s not. At some point during evolution of the DMC, probably when industry started to sponsor large clinical trials and senior executives began getting cryptic recommendations on their most important investigational drugs from scientific groups without ties to industry, sponsors decided that it probably wasn’t in their interests to let independent DMCs run their pivotal-trial interim analyses. At that time, industry sponsors relaxed the composition of DMCs to include their own employees. Sometimes these employees were behind a “firewall” to minimize study bias. However, it was not unusual for sponsor employees directly involved in trial design, conduct and final analysis to supervise DMC activities, particularly for non-pivotal studies.

Fairly recently, though, the pendulum has swung back towards industry-sponsor independence for DMC members. The reason is simply that investigators in larger trials with “hard” outcomes (e.g. mortality) didn’t look kindly upon industry sponsors as the primary overseers of trial subject safety and study integrity, and neither did regulators. Quite frankly, had industry enjoyed a sterling reputation during the 1990’s, I have little doubt that the pendulum would have stayed where it was. The call for independent DMCs (and independent steering and endpoint committees) is simply another manifestation of the public’s general distrust of anything the industry sponsors. In any case, thus was born again the truly independent DMC, a DMC composed entirely or nearly entirely of non-sponsor personnel. When the sponsor has a member on an independent DMC, he or she usually plays an administrative or oversight role only, without a vote and with limited access to data.

FDA describes factors leading to the increasing use of DMCs today:

• The growing number of industry-sponsored trials with mortality or major morbidity endpoints;
• The increasing collaboration between industry and government in sponsoring major clinical trials, resulting in industry trials performed under the policies of government funding agencies, which often require DMCs;
• Heightened awareness within the scientific community of problems in clinical trial conduct and analysis that might lead to inaccurate and/or biased results, especially when early termination for efficacy is a possibility, and need for approaches to protect against such problems;
• Concerns of IRBs regarding ongoing trial monitoring and patient safety in multicenter trials.

Implicit in the above factors is increasing pressure on industry sponsors to relinquish more of their direct control over clinical-trial design, conduct and analysis. Pressure to use DMCs is just one reflection of this bigger trend. Independent steering committees, publication committees, endpoint committees, etc. are others.

There is no U.S. law mandating use of DMCs, except in the emergency-use setting, when informed consent of the participant is excepted [21 CFR 50.24(a)(7)(iv)]. FDA recommends use of a DMC for: “any controlled trial of any size that will compare rates of mortality or major morbidity…DMCs are generally not needed, [however], for trials at early stages of product development.” Let’s just spend a moment on this point. Is FDA not recommending a DMC for most studies because a DMC would not be useful, or because a DMC is not worth the additional effort necessary to implement given its value in early-phase studies? I believe that FDA’s recommendation is based on the value proposition. In other words, this is FDA’s judgment, based on their sense of the value of a DMC to the study sponsor. IRBs, investigators and other regulators might well have differing judgments from FDA. It is crucial for a sponsor to consider these other opinions prior to determining finally whether to implement a DMC, regardless of phase of development. Indeed, phase of development has different meanings for different therapy areas and study designs. [Example: A Phase 2 oncology trial might have a mortality endpoint, whereas a Phase 2 hypertension trial likely will have a blood pressure endpoint.]

FDA goes on to refine the decision drivers for implementing a DMC:

• The study endpoint is such that a highly favorable or unfavorable result, or even a finding of futility, at an interim analysis might ethically require termination of the study before its planned completion;
• There are a priori reasons for a particular safety concern, as, for example, if the procedure for administering the treatment is particularly invasive;
• There is prior information suggesting the possibility of serious toxicity with the study treatment;
• The study is being performed in a potentially fragile population such as children, pregnant women or the very elderly, or other vulnerable populations, such as those who are terminally ill or of diminished mental capacity;
• The study is being performed in a population at elevated risk of death or other serious outcomes, even when the study objective addresses a lesser endpoint;
• The study is large, of long duration, and multi-center.

FDA’s decision framework makes a lot of sense to me. The bottom-line questions to ask yourself as a sponsor: Would a DMC help mitigate risk to subjects when the potential benefit only marginally outweighs the potential risk? Would a DMC help maintain study integrity? If the answer to either question is “Yes,” strongly consider using a DMC.

So, who should be on the DMC, and what is the correct number of members? To the latter, keep the number as small as feasible while making sure to have the appropriate expertise on board. FDA says three is a minimum number. I would say that three is generally too few; four provides a margin of safety to cover unforeseeable absences. These numbers don’t include personnel necessary to administer all DMC activities (i.e. logistics). The DMC constituency should include: a Chair (who can appoint the other members and is responsible for communications beyond the DMC), one or more statisticians (to perform and/or assist in interpretation of data analyses), clinicians (therapeutic area and/or clinical research experts), and others. Among the “others” to consider, FDA suggests epidemiologists and non-scientists with interest in the study outcome. I consider these reasonable suggestions. Finally, it’s important to assess potential conflicts of interest when deciding whom to appoint to a DMC. Here’s FDA’s suggestion for doing this:

• Ensure that those with serious conflicts of interest are not included on the DMC;
• Provide disclosure to all DMC members of any potential conflicts that are not thought to impede objectivity and thus would not preclude service on the DMC;
• Identify and disclose any concurrent service of any DMC member on other DMCs of the same, related or competing products.

Once a sponsor determines to form a DMC (and who will administer it–for a large pivotal study strongly consider outsourcing DMC administration), the first step is to create a DMC Charter. The Charter is essentially a contract between the DMC and the sponsor. It describes the DMC mission, membership, and all operating procedures. One of the key parts of this Charter is how data will be shared between the sponsor and the DMC.

FDA describes how this should operate ideally. Basically, the sponsor and its personnel should remain blinded to all comparative data in a blinded trial: “We recommend that any part of the interim report to the DMC that includes comparative effectiveness and safety data presented by study group, whether coded or completely unblinded, be available only to DMC members during the course of the trial, including any follow-up period—that is, until the trial is completed and the blind is broken for the sponsor and investigators.” The easiest way to accomplish this is to create a sponsor-independent data analysis group (another reason why outsourcing is a good idea). Short of this, create a data-analysis group within the sponsor that isn’t otherwise involved with the study design or conduct.
