Sanyam Bhutani: Hello Sebastian, Thank you for taking the time to do this.
Sebastian Ruder: Thanks for having me.
Could you tell the readers about how you got started? What got you interested in NLP and Deep Learning?
Sebastian Ruder: I was really into maths and languages when I was in high school and took part in competitions. For my studies, I wanted to combine the logic of maths with the creativity of language somehow but didn’t know if such a field existed. That’s when I came across Computational Linguistics, which seemed to be a perfect fit at the intersection of computer science and linguistics. I then did my Bachelor’s in Computational Linguistics at the University of Heidelberg in Germany, one of my favourite places in Europe. During my Bachelor’s, I got most excited by machine learning, so I tried to get as much as exposure to ML as possible via internships and online courses. I only heard about word2vec as I was finishing my undergrad in 2015; as I learned more about Deep Learning at the start of my Ph.D. later that year, it seemed to be most exciting direction, so I decided to focus on it.
Sanyam Bhutani: You started your research right after graduation.
What made you pick research as a career path instead of the industry?
Sebastian Ruder: After graduating, I was planning to get some industry experience first by working in a startup. A PhD was always something I had dreamed of, but I hadn’t seriously considered it at that point. When I discussed working with the Dublin-based NLP startup Aylien, they told me about the Employment-based Postgraduate Programme, a PhD programme that is hosted jointly by a university and a company, which seemed like the perfect fit for me. Combining research and industry work can be challenging at times, but has been rewarding for me overall. Most importantly, there should be a fit with the company.
Sanyam Bhutani: You’ve been working as a researcher for 3 years now. What has been your favorite project during these years?
Sebastian Ruder: In terms of learning, delving into a new area where I don’t know much, reading papers, and getting to collaborate with great people. In this vein, my project working on multi-task learning at the University of Copenhagen was a great and very stimulating experience. In terms of impact, being able to work with Jeremy, interacting with the fastai community, and seeing that people find our work on language models useful.
Sanyam Bhutani: Natural Language Processing has arguably lagged behind Computer Vision.
What are your thoughts about the current scenario? Is it a good time to get started as an NLP Practitioner?
Sebastian Ruder: I think now is a great time to get started with NLP. Compared to a couple of years ago, we’re at a point of maturity where you’re not limited to just using word embeddings or off-the-shelf models, but you can compose your model from a wide array of components, such as different layers, pretrained representations, auxiliary losses, etc. There also seems to be a growing feeling in the community that many of the canonical problems (POS tagging and dependency parsing on the Penn Treebank, sentiment analysis on movie reviews, etc.) are close to being solved, so we really want to make progress on more challenging problems, such as “real” natural language understanding and creating models that truly generalize. For these problems, I think we can really benefit from people with new perspectives and ideas. In addition, as we can now train models for many useful tasks such as classification or sequence labelling with good accuracy, there are a lot of opportunities for applying and adapting these models to other languages. If you’re a speaker of another language, you can make a big difference by creating datasets others can use for evaluation and training models for that language.
Sanyam Bhutani: For the readers and the beginners who are interested in working on Natural Language Processing, what would be your best advice?
Sebastian Ruder: Find a task you’re interested in for instance by browsing the tasks on NLP-progress. If you’re interested in doing research, try to choose a particular subproblem not everyone is working on. For instance, for sentiment analysis, don’t work on movie reviews but conversations. For summarization, summarize biomedical papers rather than news articles. Read papers related to the task and try to understand what the state-of-the-art does. Prefer tasks that have open-source implementations available that you can run. Once you have a good handle of how something works, for research, reflect if you were surprised by any choices in the paper. Try to understand what kind of errors the model makes and if you can think of any information that could be used to mitigate them. Doing error and ablation analyses or using synthetic tasks that gauge if a model captures a certain kind of information are great ways to do this.
If you have an idea how to make the task more challenging or realistic, try to create a dataset and apply the existing model to that task. Try to recreate the dataset in your language and see if the model performs equally well.
Sanyam Bhutani: Many job boards (For DL/ML) require the applicants to be post-grads or have research experience.
For the readers who want to take up Machine Learning as a Career path, do you feel having research experience is a necessity?
Sebastian Ruder: I think research experience can be a good indicator that you’re proficient with certain models and creative, innovative to come up with new solutions. You don’t need to do a Ph.D. or a research fellowship to learn these skills, though. Being proactive, learning about and working on a problem that you’re excited about, trying to improve the model, and writing about your experience is a good way to get started and demonstrate similar skills. In most applied ML settings, you won’t be required to come up with totally new ways to solve a task. Doing ML and data science competitions can thus similarly help you demonstrate that you know how to apply ML models in practice.
Sanyam Bhutani: Given the explosive growth rates in research, How do you stay up to date with the cutting edge?
Sebastian Ruder: I’ve been going through the arXiv daily update, adding relevant papers to my reading list, and reading them in batches. Jeff Dean recently said during a talk at the Deep Learning Indaba that he thinks it’s better to read ten abstracts than one paper in-depth as you can always go back and read one of the papers in-depth. I agree with him. I think you want to read widely about as many ideas as possible, which you can catalogue and use for inspiration later. Having a good paper management system is key. I’ve been using Mendeley. Lately, I’ve been relying more on Arxiv Sanity Preserver to surface relevant papers.
Sanyam Bhutani: You also maintain a great blog, which I’m a great fan of.
Could you share some tips on effectively writing technical articles?
Sebastian Ruder: I’ve had the best experience writing a blog when I started out writing it for myself to understand a particular topic better. If you ever find yourself having to put in a lot of work to build intuition or do a lot of research to grasp a subject, consider writing a post about it so you can accelerate everyone else’s learning in the future. In research papers, there’s usually not enough space to properly contextualize a work, highlight motivations, and intuitions, etc. Blog posts are a great way to make technical content more accessible and approachable.
The great thing about a blog is that it doesn’t need to be perfect. You can use it to improve your communication skills as well as obtain feedback on your ideas and things you might have missed. In terms of writing, I think the most important thing I have learned is to be biased towards clarity. Try to be as unambiguous as possible. Remove sentences that don’t add much value. Remove vague adjectives. Write only about what the data shows and if you speculate, clearly say so.
Get feedback on your draft from your friends and colleagues. Don’t try to make something 100% perfect, but get it to a point where you’re happy with it. Feeling anxiety when clicking that ‘Publish’ button is totally normal and doesn’t go away. Publishing something will always be worth it in the long-term.
Sanyam Bhutani: Do you feel Machine Learning has been overhyped?
Sebastian Ruder: No.
Sanyam Bhutani: Before we conclude, any tips for the beginners who are afraid to get started because of the idea that Deep Learning is an advanced field?
Sebastian Ruder: Don’t let anyone tell you that you can’t do this. Do online courses to build your understanding. Once you’re comfortable with the fundamentals, read papers for inspiration when you have time. Choose something you’re excited about, choose a library, and work on it. Don’t think you need massive compute to work on meaningful problems. Particularly in NLP, there are lot of problems with a small number of labelled examples. Write about what you’re doing and learning. Reach out to people with similar interests and focus areas. Engage with the community, e.g. the fastai community is awesome. Get on Twitter. Twitter has a great ML community and you can often get replies from top experts in the field way faster than via email. Find a mentor. If you write to someone for advice, be mindful of their time. Be respectful and try to help others. Be generous with praise and cautious with criticism.
Sanyam Bhutani: Thank you so much for doing this interview.