Links
- What do Hiring Managers Look For in a Data Scientist’s CV?
- Interesting books:
- Python tutorials (for non-Python programmers):
  - Python for Programmers, collection of links from the Python wiki.
  - Learn Python Programming: The Definitive Guide.
  - Python for Programmers, in “Python 3 Patterns, Recipes and Idioms”.
- Python Data Science Handbook at GitHub
References
- NumPy reference
- Pandas documentation
- SciPy Reference
- StatsModels documentation
- Scikit-learn user guide
- Spark DataFrames guide
Data Sets
Some data sources that may be interesting:
Available on the Cluster
- Reddit Comments and Submissions
- GHCN global weather data
- OpenStreetMap planet data dump *
- Wikidata dump *
- Wikipedia database downloads *
* docs pending: ask Greg if interested
Other Data Sets and Sources
- Stack Exchange Data Dump [25GB]
- Yelp Academic Data Set
- Statistics Canada Developer resources
- Public data sets on AWS S3
- Archive.org data set collection
- Academic Torrents: “making [many] TB of research data available”
- Open Science Data Cloud: “Repository for public data sets of scientific interest”
- Great GitHub list of public data sets
- Canada Open Government
- Opendatasoft
- Machine learning data sources:
Extras
- Data Science Workflow: Overview and Challenges
- For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights
- 2017 The State of Data Science & Machine Learning
- One Data Science Job Doesn’t Fit All (at LinkedIn)
- Andrew Ng's Coursera course "Machine Learning"
- Berkeley DS 100, Principles and Techniques of Data Science
Updated Tue Nov. 19 2024, 10:22 by ggbaker.