Tim Hopper came to my attention through John D. Cook, our first interviewee for Profile in Computational Imagination. I asked John to recommend someone he knows and respects as having computational imagination; he nominated Tim. Tim's professional interests range from the History of Mathematics to Operations Research, Data Science and developing software for start-ups. Read on to find out more about Tim, his professional pathway and his strong views on distributed software development teams.
M: Imagine a classroom full of bright and inquisitive college students about to decide on their college major. As succinctly as possible describe to them the domain that you work in.
T: I currently build tools for data analysis. I write software that will do complex calculations to help analysts and engineers find hidden structure in data. For example, feed a bunch of documents into my software, and it will summarize the topics found across those documents.
M: What still interests, even excites, you about your chosen domain?
T: One thing I love about my field is that there is an endless supply of things to learn. Gödel proved that there is essentially no end to mathematics! We can always come up with new systems and find new proofs. Between undergrad and graduate school, I had five semesters of linear algebra alone, but there is so much more linear algebra I haven't seen yet. For someone with a naturally curious personality, that's a fun place to be. On the software side, there are constantly new developments: new languages, new libraries and new tools for helping development. But even if I only programmed in a single language for the rest of my life, I think I'd be learning to program until the day I die.
M: What kinds of problems do you solve?
T: At one level, I solve the problem of translating statistics research papers into algorithms and algorithms into code. This starts with carefully understanding the notation and equations in the papers, which sometimes requires filling in tedious mathematical details. At another level, I solve lots of routine software engineering problems: tracking down bugs, structuring code, designing APIs, serializing data, setting compiler flags, writing helpful unit tests, etc.
M: What are the most gratifying outcomes from work that you have done?
T: I recently started testing code that I have been developing for three months. After many math and engineering challenges, I was able to run the algorithm on a dataset and detect meaningful latent structure! Especially after spending so much time in the weeds (e.g. finding off-by-one errors and trying to optimize loops), it's exciting to pass in some data and see sensible results pop out.
M: Did you realize early on that you wanted to focus on this domain? What attracted you? If not, how did you get turned in this direction?
T: No. My path to where I am today has been roundabout. I started college thinking I wanted to be a physics professor; I didn't really know what that meant, but I knew I liked physics in college. I left college knowing I didn't want to be a computer programmer and was planning to be a math historian! I was admitted into the History of Mathematics PhD program at the University of Virginia.
My first semester at UVA was mostly pure math classes (abstract algebra, point-set topology, and measure theory). Although I intended to focus my research on the development of applied mathematics in the 20th century, my highly theoretical coursework made me second-guess my intention of becoming an academic; I was itching to do something that wasn't entirely cerebral. After a notable quarter-life crisis, I decided to leave UVA and entered a PhD program in Operations Research at North Carolina State University.
I had discovered operations research as a discipline while browsing PhD programs as an undergrad. Since I've always been interested in efficiency and mathematics, I was excited to find out that there was a field that combined the two so intimately! Operations research seemed like a great way to combine my love for math with my urge to do something more than solve textbook abstract algebra problems.
I started my OR PhD program in the fall of 2010. At the same time, "data science" was exploding as a discipline. At the end of my first semester, I read Drew Conway's AMA on data science and was fascinated by the new applications of math and computation to real-life problems and the new ways of using algorithms to make better decisions.
Because my graduate program was multidisciplinary, I was able to focus my coursework on topics relevant to this field of data science: machine learning, graphical models, algorithms, stochastic processes, linear algebra, etc. At the same time, I found a renewed interest in programming and was teaching myself R and Python.
In the summer of 2011, I was an R&D intern on the algorithms team at Kiva Systems in Boston. Working there firmed up my growing desire to use math and computation to solve real-world problems. The next summer (after two years at NCSU), my PhD adviser decided to move to Michigan. I decided to cut my losses, get a master's degree, and find a job in industry.
M: What were some key milestones along the journey to where you are today professionally?
T: Though I had taken some computer science classes in college, I wasn't particularly interested in computer programming other than as a tool to help me do calculations in math and physics classes. The summer after my junior year of college, I was accepted to a math REU at Rochester Institute of Technology. When I met my summer research adviser, the first question he asked me was how good I was at programming!
M: The cost per observation is trending toward zero, creating "Big data." However, this trend is unevenly distributed. What data is still missing or too expensive in your field? Put another way, what would you like to be able to measure that you currently can't?
T: Though not in my own field, I wish it were cheaper to collect data about the more mundane things in everyday life. I dream about the kinds of tools that could be built around data like "exactly what food is in my pantry/fridge".
M: What does your tool chain look like?
T: Currently, I spend most of my time with my 15" Retina MBP. When I'm at home, I connect it to dual 27" displays. Outside of my home, I often use my iPad as a second display with Duet. When I'm doing development, I spend most of my time in Sublime Text 3 (with _lots_ of customizations and packages), iTerm2, IPython Notebooks, and GitHub.
I'm thankful that I don't work directly with cloud servers at the moment, although my work relies on a number of cloud services. We do continuous integration with Travis CI, store all our code on GitHub, release code on Anaconda.org, and communicate over Slack and Google Hangouts.
M: Do you build your own custom software? [none, some, all] - If some or all, what drives your need to do custom software development?
T: In some sense, all I do is build custom software. At the same time, all the software I build relies on tons of libraries written by others.
I have a bad habit of writing hacky scripts to take care of mundane tasks. For example, I have an always-on computer at home that watches for certain files to change on Dropbox and takes action accordingly. Tools like this are sometimes very useful, but can often be a time sink as I maintain and debug them.
M: What recent advancements in tools, if any, do you have high hopes for?
T: I'm excited by all the tools that are making it easier and easier for software teams to work in a distributed fashion. I think we can already take GitHub, Slack, Google Hangouts, and Trello for granted, but I'm excited about how they're making physical location less of an issue for collaboration. I'm confident we're in the early years of tools like this; I suspect they'll keep getting better and better.
M: The Silicon Valley start-up scene seems strongly focused on in-person teams all sitting in the same open office room. So even though it is the source of the tools you mention that enable distributed teams, it largely doesn't do distributed development. Tell me more about your experiences with distributed software development teams. Why should we care about distributed teams and the technology that supports them?
T: Three years ago, I thought remote work wouldn’t interest me any time soon. I started to be open to the possibility while working at a large company that was partially distributed. I ended up functioning like a remote employee: 95% of my interactions with colleagues were virtual or on the phone. I decided that if I could do that from an office, I could do it just as well at home.
After that, I joined Parse.ly, where the entire product team is distributed. Andrew Montalenti, the CTO at Parse.ly, spends a lot of time thinking about how to make distributed teams work well; he recently wrote that at Parse.ly, “a distributed team is an asset, not a problem to be managed.” At Parse.ly, our code was kept and reviewed on GitHub. We used Flowdock for realtime team communication, Yammer for daily updates from each team member, Floobits for pair programming, Google Hangouts for 1–1 and group meetings, and Trello for managing projects.
As you hear about the tools we used for distributed work, you realize these are exactly the same tools being used by co-located teams in 2015! Of course, co-located teams rely on in-person meetings, hallway interactions, and looking over one another's shoulders. However, I would guess that most teams in the software space are doing a huge amount of communication through their keyboards and in the cloud (even when those messages end up at the desk next to them). With the right commitments and strategies from management, many teams could operate in a remote model and be no less effective (or no less ineffective, as the case may be).
Working at Parse.ly confirmed my interest in remote working; after Parse.ly, I moved to another (though much smaller) distributed team. I love it. I love the flexibility it grants me. I love the time and stress saved by not commuting. I love having great coffee as I start work in the morning. I love the quiet work environment. I love my work not dictating where I live. I love not living in Silicon Valley.
At this point, I have a hard time imagining sitting in an office five (or more) days a week. Moreover, I have a hard time understanding how others deal with the constant barrage of distractions. Several years after leaving my last “office” job, I vividly remember day after day of frustration with Constant Throat Clearer and Annoying Laugher and Incessant Talker. I’ve sat near similar folks in every office I’ve been in, and I haven’t even been in an open floor plan office! I have considered the possibility that I am just Excessively Irritable Guy, but I sincerely don’t know how people do mentally taxing work (like software development and data science) in a room with constant audible distractions. Despite Joel Spolsky’s warnings, most physical offices seem to have these distractions.
(I won’t even go into memories of terrible food in the cafeteria, horrific coffee in the break room and the HVAC system that I had zero control over, etc.)
I routinely hear two responses when I tell people I work remotely. First, people think it sounds lonely or feel like they’d go stir crazy working from home. This is possible! Personally, I overcome it by being slightly more disciplined about interacting with friends during the week; for example, I try to meet up with friends for lunch or breakfast regularly. At the same time, it is important to note that remote work doesn’t have to mean working from home. The only physical tool I need to work is my laptop. I routinely work half days from local coffee shops and cafes, sometimes with remote-working friends. I also take advantage of my flexibility and travel to visit friends for a week and work while I am there.
The second response is that people don’t understand how a remote team can effectively communicate. I have already mentioned that many teams in the software field are using the exact same communications technologies that distributed teams use. Much of their communication is already going through the cloud. People also forget how dysfunctional communication is among many co-located teams. Petty disagreements keep key players from talking. Office politics keep team A from sharing with team B. Lazy managers avoid stopping by to talk with their reports and vice versa. Sure, distributed teams have communications challenges, but every team has communications challenges. The plethora of tools for synchronous communication in 2015 has greatly reduced the communications challenges distinct to distributed teams.
M: If you are daydreaming about the next wave of tools for distributed teams what is on your wish list?
T: I really do think the tools available to distributed teams today are quite good. Perhaps I’m not imaginative enough, but I rarely feel like the tools available to me are inadequate for distributed teams in particular.
Unfortunately, one of the weakest tools for distributed teams is still video chat. Google Hangouts are an extraordinary improvement over what we had just 5 years ago, but they remain the cause of much frustration for distributed teams (with login issues, call dropouts, wonky controls, etc.). Hopefully we will get something with the simplicity and reliability of FaceTime that works for groups. Having direct integration with Slack and other chat tools would be even better! Google may be getting there incrementally.
I would also love to see tools that make it easier to share sketches and handwriting among groups. For example, I would love to be able to use my iPad as a whiteboard in group chat. Similarly, I’m looking forward to easier tools for remote pair programming (and otherwise sharing code). There are a number of pair programming solutions, but they’re often difficult to get running. I am eager to see what the future brings for collaboration in Jupyter notebooks. Of course, Google Docs-style collaborative editing of notebooks would be awesome, but I’d love to see things like a presenter mode, where a host could walk a (remote) group through a notebook without just screen-sharing.
I’m encouraged by the success and explosion of Slack as a communications tool. We will see a lot of improvements in that space, I’m sure. There are opportunities to build intelligent tools on top of these mediums. For example, Slack could auto-summarize the discussion that has taken place since you last logged on. Similarly, it could run some sort of discrimination algorithm to automatically tag, highlight, or break out the various conversations happening in a channel.
More than technological transformations, I am eager to see managerial transformations that will make distributed teams more viable going forward.
M: Looking around your domain what has surprised you during the past couple of years?
T: The rapid rise of deep learning and the return of neural networks. When I was studying reinforcement learning in grad school only a few years ago (2010-2012), neural networks were largely regarded as a novelty of the past. When I started hearing about deep learning toward the end of that time, I tried to find papers about deep learning for reinforcement learning; I only came across a class project done by Stanford students. Only three years later, I regularly see papers coming out on the topic.
M: What book or books do you find yourself turning to repeatedly?
T: A huge proportion of the questions that I have day-to-day are answered by Stack Overflow. I have recently gotten rid of most of my printed technical books, since I typically find myself Googling questions instead of thumbing through books to find answers.
M: What blogs, if any, do you follow?
T: I've actively been decreasing the number of blogs I subscribe to. John Cook's blog is the only one in the realm of data and software that I read faithfully. These days, I'm most likely to read a blog post if I see it tweeted by a number of people I respect.
M: What websites do you frequent to stay current professionally?
T: Twitter! It would be hard to overestimate the value of following a bunch of smart, technical people over the last 5 years. Twitter keeps me apprised of what tools people are using, what things people are reading, and what conferences people are attending. It gives me the opportunity to quickly bounce ideas off some of the brightest minds I know. I often end up at other websites related to my profession, but I almost always find them from Twitter.
M: Many thanks to Tim for sharing his professional path and perspectives.