Building an Academic Paper Discovery System

As a researcher, one of the most important and under-emphasized skills you can develop is staying on top of the latest research. New papers are published constantly: in 2018, two new papers were published every minute on PubMed in the biomedical field alone. That is just one subfield of academia, indexed by just one of several digital archives, so the actual rate of publication is likely 10x that.

With so many papers being published, it can be difficult to sift through them all and find the ones that matter to you. There are thousands of journals, and as research becomes more and more specialized, there is no single journal you can follow that will contain everything you care about in your field and nothing you don't.

So how do you manage it?

Or, put another way, how do you eat an elephant? (Or, if you are a fan of Anne Lamott's book Bird by Bird, how do you write a bird report in a day?)

One piece at a time.

Many large, seemingly insurmountable tasks can be accomplished one step at a time through a persistent habit. When it comes to managing papers, this is the method that has worked for me.

The rest of this post outlines the daily and weekly habits that have allowed me to stay on top of the latest research. This is not an approach for diving into a new topic; rather, it covers the tools and tricks I use to stay up to date on research that may inform and improve my own scientific work. My process can be broken into three parts, Discovery, Management, and Consumption, which we will tackle one at a time.

Discovery

The first step is staying informed about new research. For discovery, I use three approaches, which we will go through here.

STORK - your one-stop shop for paper alerts

I have been using the STORK service for the last couple of years and have greatly appreciated its approach. Like any paper alert system, you enter the keywords or researchers you want to keep up with, and STORK aggregates all new papers that match them and emails you a digest. Because it is run by a third-party company, it is database agnostic, which helps when your field is spread across different journals and databases. You can choose how often you receive these emails, but I recommend a daily digest. It serves both as a daily reminder to look for papers and as a prompt to read your paper of the day (we will cover this in Consumption below). Additionally, do not think of your keyword list as a static, one-time setup. Our research interests change, and new researchers emerge in the field while others drift toward other projects. All of this is natural, but it means you will need to update the keywords you follow. Finally, I use the free version of STORK, though a paid version is also available.
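To make the keyword idea concrete, here is a minimal, hypothetical sketch of what a keyword and author alert does conceptually. It is not STORK's implementation: the Paper fields, the build_digest function, and the simple case-insensitive substring matching are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    # Hypothetical record for a newly published paper in some feed.
    title: str
    abstract: str
    authors: list = field(default_factory=list)


def build_digest(papers, keywords, watched_authors):
    """Keep papers that mention any keyword in the title or abstract,
    or that list any watched researcher as an author.

    Matching is a plain case-insensitive substring check (an assumption;
    a real alert service may do something smarter).
    """
    keywords = [k.lower() for k in keywords]
    watched = {name.lower() for name in watched_authors}
    digest = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}".lower()
        keyword_hit = any(k in text for k in keywords)
        author_hit = any(name.lower() in watched for name in paper.authors)
        if keyword_hit or author_hit:
            digest.append(paper)
    return digest


if __name__ == "__main__":
    feed = [
        Paper("Gait variability in older adults", "We measured stride timing.", ["A. Smith"]),
        Paper("Protein folding at scale", "A new structural method.", ["B. Jones"]),
        Paper("Visual control of walking", "Participants walked a track.", ["C. Lee"]),
    ]
    digest = build_digest(feed, keywords=["gait", "movement smoothness"], watched_authors=["C. Lee"])
    print([p.title for p in digest])
    # ['Gait variability in older adults', 'Visual control of walking']
```

In this sketch, keeping your alerts current as your interests shift is just a matter of editing the keywords and watched_authors lists.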

Twitter

Ah, Twitter. In my opinion, there isn't a more misunderstood social media service out there. For those who don't know, Academic Twitter is one of the most useful resources available when used correctly, and the first step to using Twitter correctly is to ruthlessly curate who you follow. If need be, create a separate account, use Twitter lists, or use the mute button if you feel obligated to follow someone due to social pressure. The simple truth is that you want to be able to scan through EVERY tweet that appears on your timeline, and the only way to accomplish that is to limit who can show up on your timeline to 50-100 accounts total. (Side note: I recommend this approach for every interest you follow on Twitter, not just the academic ones.) These accounts are often researchers who use Twitter to disseminate their own research and that of the field. This is useful both for the papers themselves and for the discussion surrounding them. I have had valuable exchanges on Twitter, both publicly in the replies to a paper's tweet and privately in the author's DMs. As with the STORK daily digest, you should try to check Twitter at least daily. And to reiterate, the only way to get the most value out of Twitter is through this aggressive curation.

Note too that at no point do you need to tweet out content yourself. Tweeting can be a valuable way to get feedback on ideas and to share your research, but I am no expert on that aspect of Twitter; I just use it as a way to discover new content.

Curators

If you are lucky, you can find groups or individuals who will curate research for you. These are invaluable if you can find ones that are related to your research. A few examples that I have followed at various points are:

For these sources, you will be limited to their publishing schedule, but they are already taking a first step of wading into the paper deluge, so take advantage of it!

Management

Now that you have links to all these cool papers, you need a way to manage them. I recommend using a reference manager in general (it will also help when it comes time to write your paper), and I will make the case for my personal favorite: Paperpile. But regardless of which reference manager you use, you first need to find the full paper.

Collection

I highly recommend that you find and save the pdf of each newly discovered paper WHEN YOU INITIALLY CLICK ON IT. If you are unsure whether the paper is worth your time, read the abstract first, but as soon as you know, download the pdf. Even though some papers offer a better reading experience on the web (and one thing I like about Paperpile is that it automatically provides a link to the journal's webpage if you prefer to read there), you still want the pdf file. The biggest reason is that it ensures the paper won't end up behind a paywall later, whether due to a change at the journal or a change in what you personally have access to. When I worked in industry, I did not have access to all the institutional journal licenses I have now, so saving the pdf was the most reliable way to ensure I could read the full paper, and it is future-proof should my institutional access ever change.

Given that, I want to plug a few different tools to help find the full pdf for those without institutional access.

The Google Scholar Button - a Chrome extension that searches for papers based on the title or other information. It is super useful for quickly finding referenced papers as well as non-paywalled versions.
Unpaywall - this Chrome extension automatically recognizes when you are looking at an academic journal article and, when a non-paywalled version exists, both lets you know and provides the link.
The Open Access Button - another Chrome extension for finding open access versions of a paper. I don't know if or how it differs from Unpaywall, but both are free, so I use both.
Emailing the author - when technology fails, I and many others have had good success emailing the corresponding author and asking for a copy of the manuscript. No one has ever told me "no"; normally they either happily respond with the paper or you never hear from them. As a plus, this can be the start of a conversation with a researcher in a related field.
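If you prefer scripting to browser extensions, Unpaywall also offers a free REST API. The sketch below assumes its v2 endpoint (https://api.unpaywall.org/v2/ followed by the DOI, with your email address as a query parameter) and the is_oa and best_oa_location fields of its JSON response; check Unpaywall's own documentation before relying on it. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def fetch_unpaywall_record(doi, email):
    """Look up one DOI in the Unpaywall v2 API and return the parsed JSON."""
    # quote() leaves "/" unescaped by default, which DOIs need in the URL path.
    query = urllib.parse.urlencode({"email": email})
    url = UNPAYWALL_API + urllib.parse.quote(doi) + "?" + query
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def best_open_access_url(record):
    """Return a link to a free copy, preferring a direct pdf link, or None."""
    if not record.get("is_oa"):
        return None
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf") or location.get("url")
```

Calling best_open_access_url(fetch_unpaywall_record(doi, email)) with a real DOI and your own email address returns a link you can open and save, or None if no free copy is known, in which case emailing the author is the fallback.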

Storage - Paperpile

Once you have the pdf, import it into your chosen reference manager, which for me is Paperpile. I do this in weekly batches: I save each pdf to my desktop or downloads folder as I find it, then once a week or so do a large import into Paperpile. I don't bother with the name of the pdf file, nor do I worry about where I save it (hence the generic desktop or downloads folders). Paperpile, like most modern reference managers, extracts the pertinent info (title, authors, year, journal, etc.) and displays it in its interface, making the paper easily searchable. It also renames the pdf with a reasonable name ("first_author year title"), saves it to your Google Drive account, and does some basic folder management. In practice, though, you should use the Paperpile interface to find a paper; the Google Drive folder is a nice backup.
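If you keep the same weekly habit, a small script can gather that week's downloads into one list before the batch import. This is a hypothetical helper, not part of Paperpile: it only finds recently modified pdfs in the folders you give it (the ~/Downloads and ~/Desktop paths in the usage note are assumptions), and you still import the files through your reference manager yourself.

```python
import time
from pathlib import Path

ONE_WEEK_SECONDS = 7 * 24 * 60 * 60


def pdfs_from_past_week(folders, now=None):
    """List pdf files in `folders` modified within the last seven days, oldest first."""
    now = time.time() if now is None else now
    cutoff = now - ONE_WEEK_SECONDS
    found = []
    for folder in folders:
        base = Path(folder).expanduser()
        if not base.is_dir():
            continue  # skip folders that don't exist on this machine
        # Only the top level of each folder is scanned; subfolders are ignored.
        for path in base.iterdir():
            if path.is_file() and path.suffix.lower() == ".pdf" and path.stat().st_mtime >= cutoff:
                found.append(path)
    return sorted(found, key=lambda p: p.stat().st_mtime)
```

For example, looping over pdfs_from_past_week(["~/Downloads", "~/Desktop"]) and printing each path gives you the list of files to drag into Paperpile in one go.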

It is not perfect, but I have found Paperpile to be 99% accurate. It only occasionally fails to extract the relevant info, normally when the only pdf I could find is in an unusual preprint format, though most standard preprint formats import fine. If there is an issue, you can easily add the DOI or preprint server link to the paper's metadata manually, and Paperpile will then extract the relevant details from there. Finally, as with all modern reference managers, you can integrate it with a word processor (in this case Google Docs) to auto-generate and format citations and bibliographies.

Cataloging

This is the most important step in managing your paper collection. About once a week (preferably when you upload a batch of papers), you should go through all uncategorized papers and catalog them with tags.

Yes, Tags

Tags are by far the best way to catalog your papers, because they let a single paper belong to multiple categories. You can use folders if you want, and Paperpile easily supports a folder structure (which I use to manage projects), but tags let you both categorize by multiple concepts (for instance, gait and vision) and search by multiple categories. That way you can find that one paper on methods and movement smoothness without having to remember which folder you saved it to (was it methods -> motor -> smoothness, or motor -> smoothness -> methods?). Tags are a more natural way of surfacing the paper you want. Plus, when you embark on a new project, you can combine the tags you think may be useful and immediately be reminded of what you already know.
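The advantage of tags over folders shows up clearly in a toy data structure. In this hypothetical sketch (not Paperpile's internals), each tag maps to the set of paper IDs carrying it, so a paper can sit under any number of tags and a multi-tag search is just a set intersection. Unlike a folder path, the order in which you list the tags does not matter. The paper IDs are made up.

```python
from collections import defaultdict


class TagIndex:
    """Map each tag to the set of paper IDs that carry it."""

    def __init__(self):
        self._papers_by_tag = defaultdict(set)

    def tag(self, paper_id, *tags):
        """Attach any number of tags to a paper (tags are case-insensitive)."""
        for t in tags:
            self._papers_by_tag[t.lower()].add(paper_id)

    def search(self, *tags):
        """Return the IDs of papers carrying every requested tag."""
        if not tags:
            return set()
        # .get avoids creating empty entries for unknown tags.
        matches = [self._papers_by_tag.get(t.lower(), set()) for t in tags]
        return set.intersection(*matches)


if __name__ == "__main__":
    index = TagIndex()
    index.tag("smith2018", "gait", "vision")
    index.tag("lee2020", "methods", "motor", "smoothness")
    index.tag("chen2019", "motor", "smoothness")
    print(index.search("methods", "smoothness"))  # {'lee2020'}
    print(index.search("smoothness", "motor"))    # both lee2020 and chen2019, whatever the tag order
```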

I recommend outlining the basics of your tagging system when you start, but then allowing the tags to evolve as new papers and interests arise. One of the great features of Paperpile is the ability to tag multiple papers at once, which is really useful when migrating systems (I started with 1000 papers in an unmanageable Google Drive folder structure).

Consumption

Finally, the most important part of any paper discovery system is actually reading the papers. Everything above can be done without ever reading more than a paper's abstract. I prefer to have a network of interesting papers at my fingertips so that I can start a new project from there rather than from a Google Scholar or PubMed search.

To consume properly, you need a habit. Personally, I strive to read a minimum of one paper a day. It is the first thing I do each morning (whether I am working from home or in the lab). The paper should be tangentially related to something I care about, but reading it is not the same as searching for the answer to a specific problem. As I said at the outset, this is about staying on top of the research, not starting from scratch. When I go through the tagging process, I star the papers I want to read right away. Not every paper I save is one I want to read now: sometimes I grab papers because they will be good references later, sometimes they relate to an idea for a project I am not ready to start, and sometimes they follow up on a meta-analysis I previously did that I'll expand on in the future. Starring the papers gives me one place to look when choosing my daily paper to read.

The second step is taking notes whenever you read a paper. You want to distill the paper into a single page of bullet points and, if possible, images, and then place those notes in relation to your other notes. Paperpile has a built-in annotation tool that is pretty good. I have previously taken notes in a Google Doc. Recently I switched to Roam Research and am enjoying being able to easily link back and forth between my notes. How I use Roam is a topic for another post, but if you are curious, I do recommend checking it out.

Conclusion

Having a good system for discovering, managing, and consuming academic papers is integral to my success as a researcher. The habit of reading a paper a day and the system to be alerted to new papers of interest has allowed me to be informed and inspired with each new bit of research I see. And while I may not see everything, I am able to read quite a bit of the literature, one bite at a time.