The Best Articles in Data Science
The most useful articles in Data Science from around the web—beginners to advanced—curated by thought leaders and our community. We focus on timeless pieces and update the list whenever we discover new, must-read articles or videos—make sure to bookmark and revisit this page.
Top 5 Data Science Articles
At a glance: these are the articles that have been most read, shared, and saved in Data Science by Refind users in 2023.
What is ...?
New to Data Science? These articles make an excellent introduction.
What is Reverse ETL? The Definitive Guide
Learn everything there is to know about Reverse ETL, how it fits into the modern data stack, and why it's different from ETL.
Learn Julia For Beginners – The Future Programming Language of Data Science and Machine Learning Explained
Julia is a high-level, dynamic programming language, designed to give users the speed of C/C++ while remaining as easy to use as Python. This means that developers can solve problems faster and more…
«So now the function is defined to take in only a string. Let us test this out to make sure we can only call the function with a string value»
An introduction to data science and machine learning with Microsoft Excel
Microsoft Excel is a powerful tool for learning the basics of data science and machine learning.
What is Hierarchical Clustering?
By Nagesh Singh Chauhan, Big Data Developer at CirrusLabs. What is clustering? Clustering is a technique that groups similar objects such that the objects in the same group are more similar to each…
Trending
These links are currently making the rounds in Data Science on Refind.
Researchers Warn We Could Run Out of Data to Train AI by 2026. What Then?
AI trained on ever-more data has yielded ChatGPT and DALL-E 3. But research shows online data stocks are growing more slowly than datasets used to train AI.
«Another option is to use AI to create synthetic data to train systems. In other words, developers can simply generate the data they need, curated to suit their particular AI model.»
How real is the threat of data poisoning to generative AI?
A new tool has been created to poison image-output models like Midjourney and DALL-E 2. Are text-output models like ChatGPT and Copilot next?
«Nightshade might trip up AI models now, he says, but future filtering techniques and generative model architectures will probably be able to swallow the poison with no ill effects. The same would presumably apply to facial recognition and deepfake algorithms, necessitating a new subset of arms race between the hacker and the hacked.»
The Growing Impact of AI on Data Science in 2023
Emerging AI trends such as natural language processing and reinforcement learning are set to bring in the next frontier of Data Science.
«This year, in particular, is expected to see the rapidly accelerated adoption of RL as businesses realize and harness its untapped potential.»
Short Articles
Short on time? Check out these useful short articles in Data Science—all under 10 minutes.
The Myth of Objective Data
When we view objectivity and subjectivity as opposites rather than complements, we distort the empirical realities of data collection.
«This despair helps my students recognize an apparently banal assignment as a real design situation. It teaches them that data is created, not found; and that creating it well demands humanity, rather than objectivity.»
The limits of our personal experience and the value of statistics
The world is huge; to get a clear idea of what our world is like, we have to rely on carefully collected, well documented statistics.
Same data, different stories: How to manipulate the graphs to support your narrative
How to shape the narrative with graph manipulation without sacrificing your credibility. The art of tweaking graphs to better support your story.
How Do They Know This?
An informative and apolitical new book reminds us that statistics are not always what they seem.
«But for the most part, official statistics are imperfect but good enough.»
Explaining base rate neglect
In a seminar for a team from an investment manager I described how base rates are often neglected when people are grappling with conditional probabilities.
Long Articles
These are some of the most-read long-form articles in Data Science.
Dashboards Are Dead: 3 Years Later
What’s the purpose of dashboards in 2023?
Spinning Data into Thought
How Computers Think: Introduction
Are You Ready to Hire a Data Scientist? Advice for Founders
Are you ready to hire a data scientist? Mengying Li, Growth Data Science Lead at Notion, shares her framework for testing whether you should invest in this key hire and how to find the right data…
Meet Julia: The Future of Data Science
The next big thing, or just massively overhyped?
Megastudy scepticism
In December last year Katherine Milkman and friends published a “megastudy” testing 54 interventions to increase the gym visits of 61,000 experimental participants.
Thought Leaders
We monitor hundreds of thought leaders, influencers, and newsletters in Data Science, including:
Andrew Ng
Co-Founder of Coursera; Stanford CS adjunct faculty. Former head of Baidu AI Group/Google Brain. #ai #machinelearning, #deeplearning #MOOCs

Hilary Mason
Co-Founder of @HiddenDoorCo. Formerly Founder of @FastForwardLabs (acquired by @Cloudera). I ♥ data and cheeseburgers. She/her.
Werner Vogels
CTO @ Amazon
Nathan Yau
making charts

Data Science Renee
Sr Director of DS at @HelioCampus || Author of SQL for Data Scientists (Wiley) || @DataSciGuide @NewDataSciJobs || @paix120 || Not views of employer || she/her
What is Refind?
Every day Refind picks the most relevant links from around the web for you. Picking only a handful of links means focusing on what’s relevant and useful. We favor timeless pieces—links with long shelf-lives, articles that are still relevant one month, one year, or even ten years from now. These lists of the best resources on any topic are the result of years of careful curation.
How does Refind curate?
It’s a mix of human and algorithmic curation, following a number of steps:
- We monitor 10k+ sources and 1k+ thought leaders on hundreds of topics—publications, blogs, news sites, newsletters, Substack, Medium, Twitter, etc.
- In addition, our users save links from around the web using our Save buttons and our extensions.
- Our algorithm processes 100k+ new links every day and uses external signals to find the most relevant ones, focusing on timeless pieces.
- Our community of active users gets the most relevant links every day, tailored to their interests. They provide feedback via implicit and explicit signals: open, read, listen, share, mark as read, read later, «More/less like this», etc.
- Our algorithm uses these internal signals to refine the selection.
- In addition, we have expert curators who manually curate niche topics.
The result: lists of the best and most useful articles on hundreds of topics.
How does Refind detect «timeless» pieces?
We focus on pieces with long shelf-lives—not news. We determine «timelessness» via a number of metrics, for example, the consumption pattern of links over time.
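As a rough illustration of what a consumption-pattern metric could look like, the sketch below scores a link by the share of its reads that happen well after publication. This is a hypothetical example, not Refind's actual method: the function name `timelessness_score`, the 30-day cutoff, and the sample dates are all assumptions made for illustration.

```python
from datetime import date


def timelessness_score(read_dates, published, late_after_days=30):
    """Hypothetical metric: fraction of reads that occur more than
    `late_after_days` days after publication (0.0 = news-like, 1.0 = evergreen)."""
    if not read_dates:
        return 0.0
    late_reads = sum(
        1 for d in read_dates if (d - published).days > late_after_days
    )
    return late_reads / len(read_dates)


# Made-up sample data: both links were published on the same day.
published = date(2023, 1, 1)
news_like = [date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3), date(2023, 1, 5)]
evergreen = [date(2023, 1, 2), date(2023, 3, 10), date(2023, 6, 1), date(2023, 11, 20)]

print(timelessness_score(news_like, published))  # all reads within the first week
print(timelessness_score(evergreen, published))  # most reads arrive months later
```

A link whose reads cluster in the first few days scores near 0, while one that keeps being read months later scores closer to 1. A real system would presumably combine several such signals rather than rely on a single ratio.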
How many sources does Refind monitor?
We monitor 10k+ content sources on hundreds of topics—publications, blogs, news sites, newsletters, Substack, Medium, Twitter, etc.
Who are the thought leaders in Data Science?
We follow dozens of thought leaders in Data Science, including Andrew Ng, Hilary Mason, Werner Vogels, Nathan Yau, and Data Science Renee.
Missing a thought leader? Submit them here
Can I submit a link?
Indirectly, by using Refind and saving links from outside (e.g., via our extensions).
How can I report a problem?
When you’re logged in, you can flag any link via the «More» (...) menu. You can also report problems via email to hello@refind.com.
Who uses Refind?
400k+ smart people start their day with Refind. To learn something new. To get inspired. To move forward. Our apps have a 4.9/5 rating.
Is Refind free?
Yes, it’s free!
How can I sign up?
Head over to our homepage and sign up by email or with your Twitter or Google account.