Links
- What do Hiring Managers Look For in a Data Scientist’s CV?
 - Interesting books:
 - Python tutorials (for non-Python programmers):
- Python for Programmers, collection of links from the Python wiki.
 - Learn Python Programming: The Definitive Guide.
 - Python for Programmers, in “Python 3 Patterns, Recipes and Idioms”.
 
 - Python Data Science Handbook at GitHub
 
References
- NumPy reference
 - Pandas documentation
 - SciPy Reference
 - StatsModels documentation
 - Scikit-learn user guide
 - Spark DataFrames guide
 
Data Sets
Some data sources that may be interesting:
Available on the Cluster
- Reddit Comments and Submissions
 - GHCN global weather data
 - OpenStreetMap planet data dump
 - Wikidata dump
 - Wikipedia database downloads
 
Other Data Sets and Sources
- Stack Exchange Data Dump [25GB]
 - Yelp Academic Data Set
 - Statistics Canada Developer resources
 - Public data sets on AWS S3
 - Archive.org data set collection
 - Academic Torrents: “making [many] TB of research data available”
 - Open Science Data Cloud: “Repository for public data sets of scientific interest”
 - Great GitHub list of public data sets
 - Canada Open Government
 - Opendatasoft
 - Machine learning data sources:
 
Extras
- Data Science Workflow: Overview and Challenges
 - For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights
 - 2017 The State of Data Science & Machine Learning
 - One Data Science Job Doesn’t Fit All (at LinkedIn)
 - Andrew Ng’s Coursera course “Machine Learning”
 - Berkeley DS 100, Principles and Techniques of Data Science
 
Updated Thu April 17 2025, 13:01 by ggbaker.