9 Best Articles in 2022
Gengo 🦁
The 50 Best Free Datasets for Machine Learning
5 min read · From 2018 · What are some open datasets for machine learning? We at Gengo decided to create the ultimate cheat sheet for high quality datasets.
Shared by 343, including dunschtig, Balda, melbic, Pagely®, Arne Keuning, Oleg Baskov, Philipp Laurim, Joona Tuunanen 🇫🇮, Vojtěch Hýža, Atul Pradhananga, Marc Hofer, Matthew Turland, Mark Pitman, Oliver Raduner, von Moerenburgh, Joanna Strom, Herbert Bay, Gideonro, Kenneth Kalmer
blog.google
Discovering millions of datasets on the web
2 min read · From 2020 · Across the web, there are millions of datasets about nearly any subject that interests you. If you’re looking to buy a puppy, you could find datasets compiling complaints of puppy buyers or studies on…
Shared by 277, including Nico Müller 🇺🇦, Antonio Ramos, Mark Kaigwa, Kaggle, Brian D. Earp, Ph.D., Vladimer Botsvadze, Desmond Williams, Tibor Martini 🇺🇦, Abel Caballero Díaz, Artnome - Bird and Worm Society, Elena Neira, Esther Schindler, Matthias Lampe, Thomas Euler
github.com
caesar0301/awesome-public-datasets
From 2015 · awesome-public-datasets - An awesome list of high-quality open datasets in public domains (on-going).
Shared by 161, including Dominik Grolimund, blinch, Christopher Möller, Moritz Klack, bouiboui, Ian Lurie 🇺🇦, Marck Vaisman, hallo met kasper, Data Science Renee, Kaspar Manz, René Clausen Nielsen, Nils Hitze, Sacha Roger, Nico Müller 🇺🇦, William El Kaim, Arin Basu, Philipp Laurim, Jesús Torres 👩💻, Jacob Jarnvall, Agustin
analyticsvidhya.com
25 Open Datasets for Deep Learning Every Data Scientist Must Work With
~19 min read · From 2018 · Are you searching for quality deep learning datasets? We have listed 25 quality deep learning datasets you should work with to improve your DL skills!
Shared by 149, including Oleg Baskov, Dr. Ganapathi Pulipaka 🇺🇸, rohit, Nico Müller 🇺🇦, Ferit (at 🏠) 🌙, Marco Unternaehrer, Philipp Laurim, Jacob du Toit, Mark Kaigwa, Thomas Power, Yves Mulkers
opendatascience.com
12 Excellent Datasets for Data Visualization in 2022
4 min read · Jul 5th · Data visualization requires quality data just as much as any other project. Finding data visualization datasets can be frustrating, but these datasets offer excellent resources to support…
Shared by 334, including Kirk Borne, Katja Evertz
autodeskresearch.com
The Datasaurus Dozen
8 min read · From 2017 · Datasets which are identical over a number of statistical properties, yet produce dissimilar graphs, are frequently used to illustrate the importance of graphical representations when exploring data. This paper presents a novel method for generating such datasets, along with several examples. Our technique varies from previous approaches in that new datasets are iteratively
Shared by 127, including Martin Stabe, Alejandro Vidal, Tom Connor, Bryan Onel, Charlie O'Keefe, dunschtig, Martina Pugliese, Stephanie A Kowalski, Alain, Alberto Cairo, Charles Kubicek, Barnaby Skinner, Luca Hammer, Oscar MacDonald, Jeff Atwood, René Clausen Nielsen, Chris {he, they}
blog.google
Making it easier to discover datasets
3 min read · From 2018 · Similar to how Google Scholar works, Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page. To create…
Shared by 93, including Bernhard Huessy, Google Webmasters, Jose Luis Calvo, Vojtěch Hýža, Hannes Gassert, Elena Neira, jean marc manach, seungho kim, Francesco Corea, MIT CSAIL, Pete Skomoroch, AlgoCompSynth by znmeb, Kaggle, Dominik Grolimund
github.com
awesomedata/awesome-public-datasets: A topic-centric list of high-quality open datasets in public domains. By everyone, for everyone!
From 2018 · awesome-public-datasets - A topic-centric list of high-quality open datasets in public domains. By everyone, for everyone!
Shared by 52, including Nico Müller 🇺🇦, Ben Collins, Jacob du Toit, Ahmad Ragab, nikeemah 🌶🥑, Vikram Dutt, Grégoire Japiot 🌻, Sebastian Raschka, Claudio Perez Gamayo, Kirk Borne
VentureBeat
3 big problems with datasets in AI and machine learning
~12 min read · 2021-12-17 · Datasets in AI and machine learning contain many flaws. Some might be fixable, according to experts -- given enough time and resources.
Shared by 105, including Theodora (Theo) Lau - 劉䂀曼 🌻, Bob E. Hayes, Florian Graillot, Ronald van Loon, Víctor González Pacheco, Kelly Hungerford 🌻, Harold Sinnott 📲 #DigitalTransformation, Mark Tabladillo PhD
Trending
opendatascience.com
12 Excellent Datasets for Data Visualization in 2022
4 min read · Jul 5th · Data visualization requires quality data just as much as any other project. Finding data visualization datasets can be frustrating, but these datasets offer excellent resources to support…
Shared by 334, including Kirk Borne, Katja Evertz
VentureBeat
Tapping into the pulse of marketing with data visualization
4 min read · Jul 23rd · Once datasets are cleaned, data visualization remodels them into intelligible graphics that put actionable insights on full display.
Shared by 73, including Nicolas Babin
honeycomb.io
Datasets, Traces, and Spans—Oh My!
3 min read · Jul 22nd · By giving an overview into datasets, traces, and spans, you’ll get a peek behind the curtain into how Honeycomb facilitates observability.
Shared by 6, including Software Daily
Towards Data Science
50 Public Sources for Machine Learning Datasets
~15 min read · Jul 11th · Best places to find free datasets to kickstart your machine learning and data science projects.
Shared by 6, including Vikram Dutt
More like this
The Verge
Google launches new search engine to help scientists find the datasets they need
8 min read · From 2018 · Dataset Search could be a scientist’s best friend
Shared by 333, including Aleyda Solis 🇺🇦, Jean-Patrick, Gabriele, Victor Lee, luis antónio santos, Aníbal Monasterio Astobiza, Thomas Power, Warren Whitlock, Chris Messina, Kohei Asai, Florian Hanke 🍎, Leni Krsová 🎮, Grégoire Japiot 🌻, Florian Graillot, Connecticut SEO, Julian de Keijzer, Francesco Corea, Jose Luis Calvo, Javi Cantón, Dominik Grolimund
Towards Data Science
Google just published 25 million free datasets
2 min read · From 2020 · Here’s what you need to know about the largest data repository in the world
Shared by 309, including Graeme Anderson, Mark Kaigwa, Kirk Borne, KOstas, Dr. Ganapathi Pulipaka 🇺🇸, Niklaus Gerber, Javi Cantón, Fabrizio Bianchi, Thomas Spreng, Jonathan Kogan, MIT CSAIL, Tom D'Amico, Katja Evertz, Desmond Williams
medium.com
Fueling the Gold Rush: The Greatest Public Datasets for AI
8 min read · From 2017 · It has never been easier to build AI or machine learning-based systems than it is today. The ubiquity of cutting edge open-source tools…
Shared by 184, including Nico Müller 🇺🇦, Kenneth Kalmer, Leslie D, Ha Duong, Frank Cieslik, Cameron Yick, Flavio Rump, Francesco Corea, Thomas Huhn, Bryan Onel, Ward Plunet, Barnaby Skinner, Aleyda Solis 🇺🇦, Thompson Marzagão, Jakub Albert Ferenc, Michael Musgrove, #bottish, Thomas Power, Dominik Grolimund, Bertrand Maltaverne
MIT Technology Review
AI datasets are filled with errors. It's warping what we know about AI
2 min read · 2021-04-01 · Our understanding of progress in machine learning has been colored by flawed testing data.
Shared by 172, including Bertrand Maltaverne, Carla Gentry, 🟣 Antonio Vieira Santos #FutureOfWork, Harold Sinnott 📲 #DigitalTransformation, ipfconline, Andreas Staub, Benjamin, Dr Catherine Breslin, David Garner, Bob E. Hayes, Theodora (Theo) Lau - 劉䂀曼 🌻, R.NFT R “Ray” Wang 王瑞光 1A #Metaverse #RuleTheWorld, Sebastien Meunier, Martin Ford
lionbridge.ai
11 Best Climate Change Datasets for Machine Learning
3 min read · 2020-10-20 · The open climate change datasets in this list come from government repositories and educational institutions and include temperature data, sea ice data, and more.
Shared by 39, including ipfconline, AI, Gengo 🦁, Dr. Sally Eaves #TechForGood, Brian Ahier, Terence Mills, Nicolas Babin
academictorrents.com
Academic Torrents
From 2014 · We've designed a distributed system for sharing enormous datasets - for researchers, by researchers. The result is a scalable, secure, and fault-tolerant repository for data, with blazing fast…
Shared by 84, including Paolo Sinelli, Jesús Torres 👩💻, Bryan Onel, APKYP, Gerardo Segura, Leni Krsová 🎮, rohit, René Clausen Nielsen, Chini Ni, William El Kaim
github.com
huggingface/datasets
2020-09-13 · 🤗 Fast, efficient, open-access datasets and evaluation metrics for Natural Language Processing and more in PyTorch, TensorFlow, NumPy and Pandas - huggingface/datasets
Shared by 24, including Vikram Dutt, Nico Müller 🇺🇦, Pete Skomoroch, Dayyan Smith, AIKing.Eth - Vincent Boucher
medium.com
The Best Public Datasets for Machine Learning and Data Science
~13 min read · From 2019 · What are the best datasets for machine learning and data science? After reviewing datasets for hours on end, we have created a great…
Shared by 31, including MIT CSAIL, Kirk Borne, Ronald van Loon, Elena Neira, Nicolas Babin, Jose Luis Calvo, Dr. Ganapathi Pulipaka 🇺🇸
Massachusetts Institute of Technology (MIT)
When it comes to AI, can we ditch the datasets?
5 min read · Mar 15th · MIT researchers have developed a technique to train a machine-learning model for image classification, which does not require the use of a dataset. Instead, they use a “generative model” to produce…
Shared by 148, including Dean Anthony Gratton, Andreas Staub, Nige Willson, Carla Gentry, Dr. Ganapathi Pulipaka 🇺🇸, Nicolas Babin, Bob E. Hayes, Iain Brown, PhD