Breaking into data science
Doctoral programs in the language sciences are designed to groom students to become researchers, to work in a lab, to publish theoretically interesting research. Survivorship bias in academia is exactly why it’s hard to figure out how breaking into data science works: the people around you are mostly the ones who stayed. You might feel like an outsider, like the process is impossibly opaque, and that’s because it is opaque. But it doesn’t have to be.
A bit more background on me: In 2016, I started applying for postdocs. I found the whole process dreadfully stressful — so many PIs (in my subfield at least) are really out to make you feel inadequate. I had one person tell me about all the other people he respected and all the other people he hoped would do a postdoc with him. Because my graduate program was in the midst of a major restructuring (two retirements and two departures in psycholinguistics), and because I was in the middle of an existential crisis, I apparently shouted into the void that is Twitter and somehow landed a job.
This is the first in a series of blog posts about dos and don’ts in the data science interview process. I’ll try to refer to other blog posts as much as possible, but I’m tailoring this toward psycholinguistics folks to reassure you about the process and give you some tips I’ve picked up through lots of interviews.
Interviewing at Stitch Fix
Like I said, I shouted into the void. I’m really not kidding. I literally tweeted, asking if anyone was hiring in the natural language processing space. And I got a bite, in what felt like the weirdest turn of events (though it turns out that this can be an effective strategy to make contacts and get internal referrals). And that turned into a month-long process that I’m still surprised I made it through, precisely because I had no idea what I was doing. I had no idea about the buzzwords that interviewers use (underlined), I didn’t know how many interviews there would be, nada.
One easy way to think about the data science interview process is to compare it to giving a talk within your department. By and large, your colleagues are out there to ask somewhat difficult, but not impossible, questions. You’re talking about things you understand and are working with (and if it’s collaborative work, things you may not know by heart but have been writing and talking about for a while).
Sending out your resume or CV. After I shouted into the void, I improbably emailed my CV to the coworker who responded to my improbable tweet. My CV! Goodness. For future applications, many months later, I would eventually craft a resume. More information on that later. Once my resume was forwarded, the person who would eventually become my boss scheduled an “introductory call” to basically probe who I was and what I wanted out of a position.
The introductory call. There’s a lot of boilerplate that surrounds these. First, typically a recruiter (or sometimes the hiring manager, depending on how big the team is) will ask you about your background and what you’re looking for in a role. If you’re looking at data science positions, there’s a range of right answers, but a good strategy is to talk about things you like about the company you’re talking to. If you think they might have a particularly interesting problem, you should tell them about that. If you’ve used their service or product, you should mention that. It could be as simple as, “I read a blog post about XYZ and I was really inspired.” You can even talk about why you want to leave your academic position (to do concrete work that matters, for example). In the US, Canada, the UK, etc., they’ll typically ask you whether you need visa sponsorship to work there. At the end of every call, you’ll be given more boilerplate about next steps, which typically involve a technical interview (or, in my case, two).
The technical interview. In my case I had a fairly straightforward experience. I was interviewing for the recommendations team, so I was asked questions about how to make recommendations, how I could quantify different things, and how I would approach different problems. The ideal interview starts out with a simple question (e.g. how do we generalize from population X to population Y?) and escalates toward more complex and nuanced questions (what methods could you use to do this?), or toward suggestions about alternative methods. Whether this interview has a strong coding component varies by company, I think; I’ll talk about other ways that this interview can go in later posts. At Stitch Fix I actually had two technical interviews, I think because I was pretty junior (still in graduate school) and, nominally, because I wasn’t local and they would need to fly me out. The second interview was fairly similar to the first, but the questions were slightly different and more methodological. Again, every call ends with the phrase next steps.
The onsite. Sometimes you get lunch! Sometimes they schedule you for first thing in the morning. At Stitch Fix I had most of the afternoon scheduled, something like 1-5pm. I talked with the hiring manager, several people closer to the team I would be working with, and someone who was mostly there to make sure I was a good person and fit the general philosophy about team members (“bright, kind, and goal-oriented”). I don’t remember a lot of what got talked about, but some of the interviews were about my research to date and tools I’d used, some were stats questions (e.g. how many degrees of freedom are there in an interaction? law of large numbers, etc.) and many others were building up to an approach (how do you estimate how good someone is at their job?). In all, I had no idea what to expect, but during my last meeting my future manager told me I’d have to talk with his boss, which was simultaneously a “fit” interview, and an informational one. I literally had this call in my hotel room at CUNY in 2016, right after a totally deflating postdoc “audition.”
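For a flavor of the stats questions mentioned above, here is a minimal Python sketch of two of them. The function names and example numbers are made up for illustration and are not the actual interview questions. The first assumes fully crossed categorical factors, and the second uses simulated coin flips as a stand-in for real data.

```python
import random


def interaction_df(*levels):
    """Degrees of freedom for the interaction of fully crossed categorical
    factors: the product of (number of levels - 1) over the factors."""
    df = 1
    for k in levels:
        df *= k - 1
    return df


def sample_mean_error(n, p=0.5, seed=0):
    """Law of large numbers, illustrated: draw n Bernoulli(p) samples and
    return how far the sample mean lands from the true mean p."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = sum(rng.random() < p for _ in range(n))
    return abs(hits / n - p)


# A 2 x 3 design has (2 - 1) * (3 - 1) interaction degrees of freedom.
print(interaction_df(2, 3))

# The gap between sample mean and true mean tends to shrink as n grows.
for n in (10, 1_000, 100_000):
    print(n, sample_mean_error(n))
```

The point in an interview is usually less the formula than being able to say why it holds: each factor with k levels contributes k - 1 free contrasts, and the interaction multiplies them.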
Was that it?
I mean, sort of. I got a call the next day and there was a subsequent call to talk about the details of an offer letter. I really had no idea what to expect out of any of this — it was a whirlwind of numbers and figures specific to joining a pre-IPO (not yet public) startup and I didn’t know how to interpret any of them.
But the thing I want to emphasize here is that literally everyone who’s gone through a PhD in psycholinguistics can get through data science interviews. In all likelihood, you have a good handle on best practices for working with your data. You know when to log transform a variable, you know why some aspects of your data look the way they do. You’ve got tons of experience working with imperfect data (behavior!) and can think about concrete explanations for why the data look like they do. Your whole career has probably been training you for A/B testing and you didn’t even know it.
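To make that last point concrete, here is a minimal sketch of two things mentioned above: log-transforming a skewed variable (reaction times, say) and analyzing a simple A/B test. The function names and all the numbers are hypothetical. The A/B analysis uses a pooled two-proportion z-test, which is one common choice among several and not necessarily what any particular company uses.

```python
import math


def log_transform(values):
    """Natural-log transform for positive, right-skewed data such as
    reaction times in milliseconds."""
    return [math.log(v) for v in values]


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test for an A/B comparison.

    conv_* are conversion counts and n_* are group sizes.
    Returns (z, two-sided p-value) under the normal approximation."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up reaction times with a long right tail.
rts = [420.0, 455.0, 510.0, 630.0, 1800.0]
print(log_transform(rts))

# Made-up experiment: 100/1000 conversions in A versus 150/1000 in B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(z, p)
```

If you have ever compared two conditions in a behavioral experiment, this is the same logic with conversions in place of accuracy or response rates.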