Datasets for STAT courses
Dataset resources for STAT courses
Statistics courses at Kenyon may require you to locate and download real-world datasets to practice running statistical analyses or tests. Below are links to a number of sources for datasets that can serve this purpose. I've tried to group them by general topic so that you can use data in an area that interests you.
To be useful, many of them will require some filtering or whittling down. That means you'll have to select which variables to analyze, and you may want to limit the population under consideration to some smaller subgroups, but that's fairly easily done in spreadsheet or text editing apps.
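If you'd rather script the whittling down than do it by hand, here is a minimal sketch in Python using only the standard library. It illustrates the two steps described above: picking which variables (columns) to keep, and limiting the rows to one subgroup. The sample data, the column names (`id`, `region`, `age`, `income`), and the `filter_rows` helper are all made up for illustration; a real download will have its own column names, and you would read it from a file rather than from a string.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded CSV file.
# The column names and values are invented for illustration only.
RAW_CSV = """id,region,age,income
1,North,34,41000
2,South,51,38000
3,North,29,45000
4,East,42,52000
"""


def filter_rows(csv_text, keep_columns, column, value):
    """Keep rows where `column` equals `value`, and only the `keep_columns`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: row[name] for name in keep_columns}
        for row in reader
        if row[column] == value
    ]


# Select two variables and limit the population to the "North" subgroup.
subset = filter_rows(RAW_CSV, ["id", "income"], "region", "North")
for row in subset:
    print(row)
```

Note that `csv.DictReader` reads every value as text, so numeric columns would need converting (for example with `int()` or `float()`) before any calculation.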
Demographic datasets
-
Social Explorer
Kenyon subscription database that includes demographic data drawn from a number of sources, including the U.S. Census Bureau, available via maps or data tables, and stretching back to the first decennial census in 1790.
-
US Census data portal: data.census.gov
The data.census.gov site provides easy access to aggregate data on population, housing, economic and geographic information from several censuses and surveys.
Economic datasets
-
Federal Reserve Bank of New York: Data and Statistics
U.S. economic data related to a variety of economic indicators.
-
Federal Reserve Bank of St. Louis: Economic Data
U.S. and international economic data related to a variety of economic indicators.
-
World Bank Data Catalog
A “One-Stop Shop” for development data produced, acquired or used by the World Bank, including datasets from the Microdata Library, EnergyData.Info, Finances, and World Bank Open Data.
-
International Monetary Fund (IMF) Data
Access to macroeconomic and financial data, browsable by country and by indicator.
Health-related datasets
-
Teaching of Statistics in the Health Sciences datasets
A great collection of health-related datasets curated by medical researchers, and available for download.
-
CDC Data Portal: data.cdc.gov
The CDC's primary data portal, which contains links to a wide range of health-related datasets.
Politics datasets
-
Social Explorer
Kenyon subscription database that provides election/voting data in addition to other demographic information.
-
Roper Center Public Opinion Archives (with iPOLL)
Kenyon subscription database that provides thousands of U.S. and international survey datasets, many of which are related to politics and/or voting.
-
Varieties of Democracy datasets
Varieties of Democracy (V-Dem) tracks high-level conceptualizations of democracy around the world, and provides data downloads as well as reference materials to help understand the data.
-
FiveThirtyEight datasets
FiveThirtyEight makes all of its data freely available for use, and includes lots of elections and polling datasets.
-
Gapminder data download
Gapminder uses polling to expose common misconceptions about important global issues, with data arranged by indicator.
Sports datasets
-
Sports Reference
Data and statistics related to the MLB, NBA, NFL, NHL, pro soccer, and some college sports. Includes WNBA data, but little to no other data on women's sports as far as I can tell. Data downloads and more complex queries may require a paid membership.
-
TennisAbstract.com
Data and statistics on WTA and ATP players. Displayed data may only be available for top 50 players, but more comprehensive data is available via the creator's GitHub.
-
FiveThirtyEight datasets
Though more widely known as a politics resource, FiveThirtyEight also provides plenty of sports-related datasets.
Vehicle/automotive datasets
-
FuelEconomy.gov
Official U.S. government source for fuel economy data, providing data files back to 1978.
-
GoodCarBadCar
U.S. automotive sales figures broken down by make and model, with detailed tables available from 2019 to the present (under "By Year"). Data downloads and customized queries likely require a paid membership.
Miscellaneous other data sources
-
Gapminder data download
Gapminder uses polling to expose common misconceptions about important global issues, with data arranged by indicator.
-
UC Irvine Machine Learning datasets
The UC Irvine Machine Learning Repository includes datasets for use in machine learning applications; however, these would also be useful to STAT students.
-
StatLib datasets archive
Though it hasn't been updated since 2005, this archive of plaintext datasets contains a variety of examples that may be useful to STAT students.
-
Sage Data
Statistical data in a wide range of subject areas. Find, visualize, and share U.S. and international statistics from trusted sources that cover business, crime, health, labor and more.