Links
- What do Hiring Managers Look For in a Data Scientist’s CV?
- Interesting books:
- Python tutorials (for non-Python programmers):
- Python for Programmers, collection of links from the Python wiki.
- Learn Python Programming: The Definitive Guide.
- Python for Programmers, in “Python 3 Patterns, Recipes and Idioms”.
- Python Data Science Handbook at GitHub
References
- NumPy reference
- Pandas documentation
- SciPy Reference
- StatsModels documentation
- Scikit-learn user guide
- Spark DataFrames guide
Data Sets
Some data sources that may be interesting:
Available on the Cluster
- Reddit Comments and Submissions
- GHCN global weather data
- OpenStreetMap planet data dump
- Wikidata dump
- Wikipedia database downloads
Other Data Sets and Sources
- Stack Exchange Data Dump [25GB]
- Yelp Academic Data Set
- Statistics Canada Developer resources
- Public data sets on AWS S3
- Archive.org data set collection
- Academic Torrents: “making [many] TB of research data available”
- Open Science Data Cloud: “Repository for public data sets of scientific interest”
- Great GitHub list of public data sets
- Canada Open Government
- Opendatasoft
- Machine learning data sources:
Extras
- Data Science Workflow: Overview and Challenges
- For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights
- The State of Data Science & Machine Learning (2017)
- One Data Science Job Doesn’t Fit All (at LinkedIn)
- Andrew Ng’s Coursera course “Machine Learning”
- Berkeley DS 100, Principles and Techniques of Data Science
Updated Thu April 17 2025, 13:01 by ggbaker.