About

Why “Replicate!”?

In research, “replication” is gathering data a second time, analyzing it, and getting conclusions equivalent to the first time. We are in the midst of a replication crisis, particularly in the health sciences, and statistical practice plays no small part in this.

Statistical inference theory imagines that the analyst fits one predetermined model and reports the result. In reality, things aren’t so simple: we usually fit several models, or consider various possible assumptions; we learn the model from the data, as of course we should. But how far do we then invalidate our statistical assumptions? Is there a way to carry out rich exploration and still get reliable results?

Yes, it is possible, if the analyst is careful, disciplined, and creative. To my mind, the predominant value of statistics is to predict the future; a statistical result that can’t replicate is worthless, or worse than useless, since it can give false confidence to decision-makers.

I named my consultancy “Replicate!” in order to keep replication uppermost in my mind at all times.

Why might I be the best person for your needs?

Value

I’m an evangelist for the value of probabilistic thinking and learning from data in decision-making. I started a consultancy because I want to spread that value more widely. If I can’t add value, I’ll go find another corporate job. If I see opportunities to charge you less money, I’ll bring them to your attention.

Context

I’ll always ask who the stakeholders are and what the consequences of various outcomes are. I’ll interpret results in light of those.

Maximizing clarity

I find that meeting Nature where she is usually brings a team greater clarity. For instance, I’m reluctant to bin continuous data, preferring instead to find ways to work with it effectively. Greater clarity accrues from two sources:

  • Greater statistical efficiency, as information is not eliminated
  • Greater focus on the real aspects of the process at hand, rather than on an intervening “processed” layer.

In my experience, few other statisticians actively resist binning data. Why not? Working with continuous data raises daunting complexities. What of outliers? Nonlinear relationships? Non-Gaussian distributions? Over the years I’ve curated tools and strategies to deal with these issues.
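
To make the idea concrete, here is a minimal R sketch of the kind of comparison I have in mind. It uses simulated data and the splines package that ships with R; it is illustrative only, not a recipe from any particular project. It compares a quartile-binned analysis of a continuous predictor with a natural (restricted) cubic spline fit.

```r
## Illustrative only: compare binning a continuous predictor against a
## natural (restricted) cubic spline fit, using simulated data.
library(splines)   # ships with base R

set.seed(1)
x <- runif(200, 0, 10)                  # continuous predictor
y <- sin(x / 2) + rnorm(200, sd = 0.4)  # nonlinear signal plus noise

## Option 1: bin x into quartiles and model the bin means
x_bin    <- cut(x, breaks = quantile(x, probs = seq(0, 1, 0.25)),
                include.lowest = TRUE)
fit_bins <- lm(y ~ x_bin)

## Option 2: keep x continuous and fit a natural cubic spline
fit_spline <- lm(y ~ ns(x, df = 4))

## Compare fits with comparable model degrees of freedom
summary(fit_bins)$adj.r.squared
summary(fit_spline)$adj.r.squared
```

With comparable degrees of freedom, the spline typically tracks the underlying curve more closely, because no information has been thrown away in a binning step.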

Orientation

One doesn’t hire a divorce lawyer to write up a patent. Statisticians, too, have different orientations and priorities attuned to their areas of work. A large proportion of statisticians work on clinical trials, so if you go looking for a statistician, that’s likely the kind you’ll find.

However, the statistical priorities of clinical trials–which involve presenting conclusions to a skeptical audience–are different from those of, say, optimizing an industrial process or estimating the characteristics of an assay.

I’ve run across many potential clients working in areas other than clinical trials who have had frustrating experiences with statisticians who were grounded in clinical trial practice. We’re not all the same! Find the right person for the right job.

Skills

  • Technical
    • Bayesian methods
    • Semiparametric modeling (smoothing, splines, finite mixture modeling)
    • Machine learning (single-hidden layer neural nets, SVM, gradient boosting, clustering, …)
    • Reliable exploratory multivariable modeling, especially per Frank Harrell Jr.’s Regression Modeling Strategies
    • DoE (design of experiments): fractional factorial designs, response-surface designs, …
    • Variance components / mixed-effects models
    • Quality control lot acceptance procedures, per “Acceptance Sampling in Quality Control”, Second Edition, by Schilling and Neubauer
    • Statistical process control (Shewhart charts, cumulative sum (CUSUM) charts, exponentially weighted moving average (EWMA) charts), such as in “Cumulative Sum Charts and Charting for Quality Improvement” by Hawkins and Olwell; a small R sketch of a basic control chart follows this list
  • Interpersonal / conceptual
    • Commitment to listen carefully to client
    • Commitment to comprehend the problem context
    • Aptitude for clarifying and making operational a client’s sometimes-nebulous goals
    • Aptitude for thinking through a process to address contingencies, such as in a validation protocol
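
As promised above, here is a minimal base-R sketch of a Shewhart individuals chart, using simulated measurements standing in for real process data. It is illustrative only; in practice a dedicated package (such as qcc), or the CUSUM and EWMA methods listed above, may be more appropriate for a given process.

```r
## Illustrative only: Shewhart individuals (I) chart in base R,
## with simulated measurements in place of real process data.
set.seed(2)
x <- c(rnorm(25, mean = 10,   sd = 0.5),  # baseline, in-control observations
       rnorm(5,  mean = 11.5, sd = 0.5))  # simulated upward shift

## Estimate the center line and sigma from the baseline period only
baseline  <- x[1:25]
center    <- mean(baseline)
mr_bar    <- mean(abs(diff(baseline)))  # average moving range (subgroups of 2)
sigma_hat <- mr_bar / 1.128             # d2 constant for subgroup size 2
ucl <- center + 3 * sigma_hat
lcl <- center - 3 * sigma_hat

plot(x, type = "b", ylim = range(c(x, ucl, lcl)),
     xlab = "Observation", ylab = "Measurement",
     main = "Individuals chart (illustrative)")
abline(h = c(lcl, center, ucl), lty = c(2, 1, 2))

which(x > ucl | x < lcl)  # observations signaling an out-of-control condition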

Biography

My statistical origin story says a lot about who I am and how I work.

I’ve always been quantitatively oriented, and through high school I imagined I would be a physicist in academia. I attended the University of Chicago in part for its strong physics program (in my mind, at least: it was the site of the first sustained nuclear chain reaction). However, during my freshman year, I realized that physics didn’t actually hold my attention.

I bounced among majors (economics, sociology, public policy studies), and during my senior year I still didn’t have a clear idea what I would do. Quantitative methods were a common thread, however, and I approached my favorite statistics professor, Stephen Stigler, for advice. He suggested that I pursue graduate work in statistics. For some reason that had not been thinkable before, but it was like a light switching on: I was a statistician; in fact, I had always been a statistician. I was fascinated by the epistemological power that statistics offered, particularly probabilistic modeling.

I attended the University of Minnesota as the only member of my cohort with neither a mathematics nor a statistics degree. I had studied math, but I was always the last to finish a proof. Nevertheless, I listened in class, and I continually asked myself, “Where does the professor’s point fit into the larger landscape? What does this do for us?” Somehow, I developed a keen eye for finding the right analytical approach for a given problem. In a consulting class taught by Professor Douglas Hawkins, I found myself the first to put my hand in the air.

I decided to pursue a Ph.D. and then found an industry job at the infectious disease diagnostics division of Becton Dickinson (now BD). The division’s product portfolio was incredibly wide, from plated media to rapid immuno-based assays to refrigerator-sized incubation instruments to DNA-detection systems (at that time they used a proprietary system other than PCR). We were a small team with the remit to get involved just about anywhere we could add value. Initially, we were each assigned to an area and followed products throughout their development: early optimization, quality control methods, validation studies, clinical studies, and customer complaint investigations. Later we refined the organization to separate development support from clinical operations, and I opted to stick with development. For a curious statistician, it was a veritable garden.

Being a small team, we rarely had experts we could turn to. If I didn’t have a ready answer, I had to figure it out–there was no one else at the table. It was a fabulous place to gain experience and learn about many different areas.

I was particularly fascinated that the instrumented systems returned diagnostic answers without any human involvement after loading the sample. I read up on every machine learning method I could, and attended conferences.

Generally, I was a sucker for any statistical method that made a bold claim. I did a lot of reading, and I finally realized that many claims didn’t hold up. Still, I think I averaged roughly one valuable new tool in my toolbox per year, and after a number of years, that’s substantial!

While I loved working at BD, management changed over time and the available turf for statisticians gradually shrank. I moved to Novartis to support the development of prototype companion diagnostics and to conduct validation studies for regulatory permission to include them in trials. After a few years, Novartis decided to farm this development out to diagnostic partners, and I moved into retrospective biomarker analysis of clinical data (Phases II and III). Here I refined and practiced strategies for exploratory modeling efforts whose results would Replicate!

Finally, I left Novartis to start a freelance business, and here I am, still an evangelist for the value of statistics, hoping to provide value wherever I can manage to do so.

From my first days in graduate school until today, I’ve had an aptitude for understanding what different analytic tools do for us, and colleagues have often asked my opinion as a sanity check, or for the best analysis strategy in a challenging situation.

Software freedom

Information is important in our lives, correct? Then the digital devices that we use to create, store, and communicate information are also important. In fact, these devices should support your agency and allow you to avoid vendor lock-in or forced obsolescence. I’ve come to believe that using software that supports user freedom, as defined by the Free Software Foundation, is the strongest bulwark for user agency. The US should adopt legal protections for online privacy, but until we do, please don’t imagine that you are powerless against Big Tech and privacy invasion.

Nowadays GNU R is a popular choice of software for statistical analysis. Many fail to realize that R is an official member of the GNU Project, the project to develop a Free (as in Freedom) operating system and software stack.

If you work with me, you might wonder why I use Nextcloud for cloud storage and file sharing, and BigBlueButton via meet.coop for video conferencing. Or why I won’t install the LinkedIn Android app, or don’t have a working copy of Microsoft Office. Or why I have a de-Googled Android phone. Well, this is why.

Until I find it impossible to do otherwise, James Garrett, LLC will use Free (as in Freedom) Software.