Advanced computational methods with UCSF clinical data on Information Commons
Getting ready to apply machine learning and other advanced computational methods to your research? You can do it with UCSF Information Commons, a high-performance compute environment powered by an Apache Spark cluster on AWS. In this hands-on workshop, we will work through a real case study that explores de-identified UCSF Electronic Health Records using UCSF Information Commons. You will learn how to query UCSF clinical data and gain some of the skills needed to build your own computational models in this environment.
In this workshop, you will learn how to do the following on Information Commons:
- Run SQL queries to extract de-identified clinical data of interest
- Manage your files on the cluster
- Launch JupyterHub and run Jupyter notebooks with Python, R or SparkSQL code
- Train a machine learning model using Spark-based tools
To benefit from this workshop, you must have an Information Commons account (see Accessing Information Commons) and permission to access UCSF de-identified clinical data (see Research Data and Tools Access Request). Please make sure that you request both by January 20, as this process can take up to 2 weeks.
We also strongly recommend that you be comfortable with Unix shell scripting, SQL, and Jupyter notebooks. Familiarity with AWS S3 commands, Python, and machine learning concepts will also be helpful. Tutorials are available on the Information Commons Wiki.
Be sure to bring your laptop to the workshop!
Geoff Boushey is an Application Developer for the Data Science Initiative and the Center for Knowledge Management in the UCSF Library.
Angelo Pelonero is an Instructional Designer for the Data Science Initiative in the UCSF Library and for the Bakar Computational Health Sciences Institute at UCSF.
And others from the Bakar Computational Health Sciences Institute and Library
- Thursday, February 6, 2020
- 3:00pm - 5:00pm
- Mission Bay