UCGHI Summary Report 2019-2022
Introduction
Hello Everyone!
Today we’ll be going over the UCGHI Student Ambassador Summary Report for the 2019-2022 cohorts.
Import Dataframes
We begin by importing the necessary csv files.
# import pandas for dataframe
import pandas as pd
Get dataframe with Ambassador Demographics
# read in and check ambassador demographics csv
df = pd.read_csv("2019-2022 Ambassador Demographics - Sheet1.csv")
df.head()
Year | Major(s) and/or Minor(s) | Campus | Degree | COE | Name | |
---|---|---|---|---|---|---|
0 | 2021-2022 | Sociology | UCSB | Graduate | PH | Alex Maldonado |
1 | 2021-2022 | Human Biology and Society / Global Health | UCLA | Undergraduate | PH | Alma Rincongallardo |
2 | 2021-2022 | Global Studies | UCSB | Undergraduate | PH | Alyssa Mandujano |
3 | 2021-2022 | Urban and Regional Planning | UCLA | Graduate | PH | Amanda Caswell |
4 | 2021-2022 | Biology / Environmental Science | UCR | Undergraduate | PH | Andrew Tseng |
... | ... | ... | ... | ... | ... | ... |
153 | 2019-2020 | Public Health - Maternal, Child & Adolescent H... | UCB | Graduate | CGHJ | Victoria Nguyen |
154 | 2019-2020 | Public Health - Maternal, Child, and Adolescen... | UCB | Graduate | CGHJ | Rebecca Astatke |
155 | 2019-2020 | Human Biology / Anthropology | UCI | Undergraduate | CGHJ | Catthi Ly |
156 | 2019-2020 | Public Health | UCM | Undergraduate | CGHJ | Ifunanya Okezie |
157 | 2019-2020 | Biochemistry and Cellular Biology / Art Histor... | UCSD | Undergraduate | CGHJ | Ikran Ibrahim |
158 rows × 6 columns
Get dataframe with campus coordinates
# import campus coordinates and check
df2 = pd.read_csv("Campus coordinates - Sheet1.csv")
df2
Campus | LATITUDE | LONGITUDE | COUNT | |
---|---|---|---|---|
0 | UCR | 33.9737 | 117.3281 | 6 |
1 | UCSD | 32.8801 | 117.2340 | 20 |
2 | UCSB | 34.4140 | 119.8489 | 13 |
3 | UCB | 37.8719 | 122.2585 | 20 |
4 | UCLA | 34.0689 | 118.4452 | 26 |
5 | UCD | 38.5382 | 121.7617 | 18 |
6 | UCM | 37.3647 | 120.4241 | 11 |
7 | UCSC | 36.9821 | 122.0593 | 13 |
8 | UCSF | 37.7632 | 122.4582 | 7 |
9 | UC Hastings | 37.7812 | 122.4158 | 1 |
10 | UCI | 33.6405 | 117.8443 | 17 |
11 | Charles Drew | 33.9256 | 118.2425 | 5 |
We then merge the two dataframes into one in order to obtain the corresponding coordinates for each campus for each student.
This will make it simpler for us later on. We’ll be using this merged dataframe for most of the code.
# merge dataframes to obtain coordinates
df3 = df.merge(df2, on="Campus")
df3
Year | Major(s) and/or Minor(s) | Campus | Degree | COE | Name | LATITUDE | LONGITUDE | COUNT | |
---|---|---|---|---|---|---|---|---|---|
0 | 2021-2022 | Sociology | UCSB | Graduate | PH | Alex Maldonado | 34.4140 | 119.8489 | 13 |
1 | 2021-2022 | Global Studies | UCSB | Undergraduate | PH | Alyssa Mandujano | 34.4140 | 119.8489 | 13 |
2 | 2021-2022 | Biological Anthropology / Sociology | UCSB | Undergraduate | PH | Ashley Willis | 34.4140 | 119.8489 | 13 |
3 | 2021-2022 | Chemistry | UCSB | Undergraduate | PH | Isabella Perez | 34.4140 | 119.8489 | 13 |
4 | 2021-2022 | Psychology | UCSB | Undergraduate | CGHJ | Arianna Macias | 34.4140 | 119.8489 | 13 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
152 | 2019-2020 | Public Health | UCM | Undergraduate | CGHJ | Irene Guzman | 37.3647 | 120.4241 | 11 |
153 | 2019-2020 | Psychology / Public Health | UCM | Undergraduate | CGHJ | Jacqueline Partida | 37.3647 | 120.4241 | 11 |
154 | 2019-2020 | Public Health | UCM | Undergraduate | CGHJ | Sydney Adams | 37.3647 | 120.4241 | 11 |
155 | 2019-2020 | Public Health | UCM | Undergraduate | CGHJ | Ifunanya Okezie | 37.3647 | 120.4241 | 11 |
156 | 2020-2021 | Law | UC Hastings | JD | CGHJ | Salina Isaq | 37.7812 | 122.4158 | 1 |
157 rows × 9 columns
# import enrollment info for each campus
enroll = pd.read_csv('UC Enrollment.csv')
enroll
Year | Campus | Enrollment | |
---|---|---|---|
0 | 2019 | UCB | 43185 |
1 | 2019 | UCLA | 44371 |
2 | 2019 | UCM | 8847 |
3 | 2019 | UCD | 38364 |
4 | 2019 | UCSD | 38736 |
5 | 2019 | UCSB | 26314 |
6 | 2019 | UCSC | 19494 |
7 | 2019 | UCI | 36908 |
8 | 2019 | UCSF | 3180 |
9 | 2019 | UCR | 25547 |
10 | 2020 | UCB | 42327 |
11 | 2020 | UCLA | 44589 |
12 | 2020 | UCM | 9018 |
13 | 2020 | UCD | 39074 |
14 | 2020 | UCSD | 39576 |
15 | 2020 | UCSB | 26179 |
16 | 2020 | UCSC | 19161 |
17 | 2020 | UCI | 36303 |
18 | 2020 | UCSF | 3201 |
19 | 2020 | UCR | 26434 |
20 | 2021 | UCB | 45036 |
21 | 2021 | UCLA | 46116 |
22 | 2021 | UCM | 9093 |
23 | 2021 | UCD | 40050 |
24 | 2021 | UCSD | 41885 |
25 | 2021 | UCSB | 26124 |
26 | 2021 | UCSC | 19841 |
27 | 2021 | UCI | 36505 |
28 | 2021 | UCSF | 3165 |
29 | 2021 | UCR | 26847 |
Check Demographics Info
Next, we’ll be looking through the demographics information to better understand and study each cohort / the ambassadors as a whole.
The questions we are currently interested in include:
Areas of Study - how many unique majors are there, and how many students fall into each category?
Campus - which campuses have the most ambassadors per year? which have the least?
Returner status - how many ambassadors return each year?
Degree type - how many students are undergraduate students vs. graduate students vs. doctoral students vs. other?
COE - how many students are in each COE each year?
Areas of Study
For areas of study, we will consider both majors and minors / specializations of the student ambassadors.
We begin by creating a list of all the majors and minors from the demographics dataframe.
studies = list(df3["Major(s) and/or Minor(s)"])
len(studies)
157
In order to create a data visualization that is readable, we’ll group the subjects into 4 categories. The following are the categories, as well as some examples of majors that fall into them:
Public Health / Global Health: Community Health Sciences, Epidemiology, Global Health, Global Studies, etc.
Computing / Mathematics / Engineering: Engineering, Statistics, Bioinformatics, Computer Science
Life / Physical Sciences: Biology, Psychology, Neurobiology, Brain Sciences, Medicine, Biomedical Sciences, Nursing, Pharmacy, Geography, Chemistry, Urban and Regional Planning
Social Sciences: Anthropology, Sociology, Gender Studies, Language, Political Science, Policy, Law, International Development / Relations, Labor Studies, Social Welfare, Legal Studies, Economics
In order to create our data visualization / see how many ambassadors fall into each category, we’ll create a dictionary with the categories as the keys.
For each category, if certain key words exist in ane element in list of majors and minors, we’ll add 1 to that category. For example, if the words “Public Health” or “Global” is in the element, we’ll add 1 to “Public Health / Global Health.”
Note: if an ambassador has multiple majors and/or minors, they will be counted more than once. For example, if am ambassador is majoring in public health and minoring in bioinformatics, 1 will be added to both the Public Health/Global Health category and the Computing/Mathematics/Engineering category.
studies_dict = {"Public Health / Global Health": 0,
"Computing / Mathematics / Engineering" : 0,
"Life / Physical Sciences" : 0,
"Social Sciences": 0}
for i in studies:
if any(word in i for word in ["Public Health", "Global"]):
studies_dict["Public Health / Global Health"] += 1
if any(word in i for word in ["Engineering", "Computer", "Statistics", "Bioinformatics"]):
studies_dict["Computing / Mathematics / Engineering"] += 1
if any(word in i for word in ["Bio", "Psychology", "Brain Sciences", "Medicine", "Nursing", "Pharmacy", "Geo", "Chemistry", "Urban and Regional Planning", "Environment"]):
studies_dict["Life / Physical Sciences"] += 1
if any(word in i for word in ["Poli", "Law", "International", "Labor", "Social Welfare", "Legal", "Economics", "Anthropology", "Gender", "Sociology", "Language"]):
studies_dict["Social Sciences"] += 1
Now that we have our categories with the corresponding values/count, let’s create a bar chart.
import plotly.express as px
fig = px.histogram(x=studies_dict.keys(),
y=studies_dict.values(),
title="Bar Chart of Main Areas of Study",
color_discrete_sequence=['navy'])
fig.update_layout(xaxis_title="Area of Study")
fig.show()
As we can see from the chart, our largest categories are Public/Global Health and Life/Physical Sciences. The smallest category is Computing/Mathematics/Engineering.
This seems logical, as the UCGHI Student Ambassador program focuses on Global Health issues, which tends to attract those interested in public/global health and the life sciences.
We should, however, keep in mind that the different campuses have different majors; some campuses may have more students in life sciences because there are more options or they’re more accessible.
Based on this chart, we can see that it may be beneficial to reach out to more departments in computing/math/engineering if we want a more interdisciplinary cohort of students.
Campus / Center of Expertise
Next, we will be looking at the different campuses our student ambassadors come from, as well as the centers of expertise these students belong to.
We are interested in the amount of students that come from each campus. We’ll be looking at which campuses have produced the most ambassadors and which have produced the least, as well as how many ambassadors are planetary health track vs. the center of gender health and justice track.
Let’s begin by visualizing this geographically. The size of each dot corresponds the amount of ambassadors. Feel free to zoom in to look more closely at the map.
import plotly.express as px
fig = px.scatter_geo(df3, lat='LATITUDE',
lon=df3['LONGITUDE']*-1,
size="COUNT",
hover_name="Campus",
color="Campus",
scope="usa",
center=dict(lat=35.3733, lon=-119.0187))
fig.update_layout(
title_text = '2019-2022 Student Ambassadors per Campus',
)
fig.show()
We’ll now be looking at the raw data / count of ambassadors, then we’ll account for the student population on each campus.
Note: we should keep in mind that these are students that got accepted into the ambassador program; there may have been more applicants / interested students from campuses that had less students accepted.
# Import necessary packages
import plotly.graph_objects as go
import numpy as np
# Initialize figure
fig = go.Figure()
# Add Traces
fig.add_trace(
go.Histogram(x=np.array(df3['Campus'][df3["COE"] == "PH"]), name="PH", marker_color = 'skyblue'))
fig.add_trace(
go.Histogram(x=np.array(df3['Campus'][df3["COE"] == "CGHJ"]), name="CGHJ", marker_color = 'navy'))
for i in df3['Year'].unique():
for j in df3['COE'].unique():
if j == "PH":
fig.add_trace(go.Histogram(x=np.array(df3['Campus'][(df3["Year"] == i) & (df3["COE"] == j)]), name=j, marker_color = 'skyblue'))
if j == "CGHJ":
fig.add_trace(go.Histogram(x=np.array(df3['Campus'][(df3["Year"] == i) & (df3["COE"] == j)]), name=j, marker_color = 'navy'))
# Add dropdown
fig.update_layout(
updatemenus=[
dict(active=0,
buttons=list([
dict(
label="All",
method="update",
args=[{"visible": [True, True, False, False, False, False, False, False]},
{"title": "All Student Ambassadors"}]),
dict(
label="2021-2022",
method="update",
args=[{"visible": [False, False, True, True, False, False, False, False]},
{"title": "2021-2022 Student Ambassadors"}]),
dict(
label="2020-2021",
method="update",
args=[{"visible": [False, False, False, False, True, True, False, False]},
{"title": "2020-2021 Student Ambassadors"}]
),
dict(
label="2019-2020",
method="update",
args=[{"visible": [False, False, False, False, False, False, True, True]},
{"title": "2019-2020 Student Ambassadors"}]
)
])
)
])
# Set title and barmode
fig.update_layout(title_text="Student Ambassadors per Campus per Year", barmode='stack')
fig.show()
Click on the different options on the dropdown menu to see how many ambassadors come from each campus, as well as how many are planetary health vs. center for gender health and justice. To see the count, hover over each bar / bar stack in the figure.
The figure includes data from each of the 3 cohorts, as well as a combination of all of them.
Keep in mind that this figure does not account for the ratio of ambassadors to student population on each campus (we’ll be looking at that soon).
Before we move onto the ratio of ambassadors to student population on each campus, let’s first take a look at the overall percentage of ambassadors for each center of expertise.
fig = px.pie(df3,
names='COE',
title='Student Ambassador COE 2019-2022',
color="COE",
color_discrete_map = {"PH":'skyblue', "CGHJ": "navy"})
fig.show()
More than half of the overall population of ambassadors are in the center for gender health and justice.
There could be multiple possible factors contributing this, such as the CGHJ being a more active center, more interest/applicants for this center and/or more students being accepted into this center, etc.
Moving on, for the next figure, we’ll be accounting for the student population on each campus for each school year. We are just interested in the proportion of ambassadors in relation the how many students are on campus, we we will not be including the number of planetary health vs. center for gender health and justice ambassadors.
We will be using the enroll dataframe as well as the demographics dataframe to calculate the proportions.
Note: the enroll dataframe has enrollment data from all the UCs except UC Hastings, and it does not have the enrollment data from Charles Drew. Therefore, we’ll just be looking at these UCs. However, since the average student population at UC Hastings and Charles Drew University is less than 1000, they would likely have the largest proportions.
enroll.head()
Year | Campus | Enrollment | |
---|---|---|---|
0 | 2019-2020 | UCB | 43185 |
1 | 2019-2020 | UCLA | 44371 |
2 | 2019-2020 | UCM | 8847 |
3 | 2019-2020 | UCD | 38364 |
4 | 2019-2020 | UCSD | 38736 |
Let’s see which campuses have the largest/smallest overall student population.
enroll.groupby(['Campus'])['Enrollment'].sum().sort_values(ascending=False)
Campus
UCLA 135076
UCB 130548
UCSD 120197
UCD 117488
UCI 109716
UCR 78828
UCSB 78617
UCSC 58496
UCM 26958
UCSF 9546
Name: Enrollment, dtype: int64
Moving on, let’s create an empty dataframe where we can store the proportions. This will be used for our visualizations.
prop = pd.DataFrame(columns = ['Year', 'Campus', 'Prop'], index = range(30))
prop.head()
Year | Campus | Prop | |
---|---|---|---|
0 | NaN | NaN | NaN |
1 | NaN | NaN | NaN |
2 | NaN | NaN | NaN |
3 | NaN | NaN | NaN |
4 | NaN | NaN | NaN |
We’ll change the years in enroll to match those in df3.
for k in range(len(enroll['Year'])):
if enroll['Year'][k] == 2019:
enroll['Year'][k] = '2019-2020'
elif enroll['Year'][k] == 2020:
enroll['Year'][k] = '2020-2021'
elif enroll['Year'][k] == 2021:
enroll['Year'][k] = '2021-2022'
enroll['Year'].unique()
array(['2019-2020', '2020-2021', '2021-2022'], dtype=object)
for i in df3['Year'].unique():
for j in df3['Campus'].unique():
for k in range(len(enroll)):
if (enroll['Year'][k] == i) & (enroll['Campus'][k] == j):
prop['Year'][k] = i
prop['Campus'][k] = j
prop['Prop'][k] = len(df3[(df3['Year'] == i) & (df3['Campus'] == j)]) / enroll['Enrollment'][k]
prop.head()
Year | Campus | Prop | |
---|---|---|---|
0 | 2019-2020 | UCB | 0.000162 |
1 | 2019-2020 | UCLA | 0.000068 |
2 | 2019-2020 | UCM | 0.000565 |
3 | 2019-2020 | UCD | 0.000156 |
4 | 2019-2020 | UCSD | 0.000155 |
fig = px.histogram(prop,
x="Campus",
y='Prop',
animation_frame="Year",
title="Proportion of Student Ambassadors per Campus",
color_discrete_sequence=['skyblue'])
fig["layout"].pop("updatemenus")
fig.show()
Click through the slider to see the different proportions throughout the years.
We’ll now be creating two visualizations that will allow us to compare the original data vs. the data that takes into account the student population on each campus.
For the original data, we’ll create a dataframe with count of each campus per year.
count = pd.DataFrame(columns = ['Year', 'Campus', 'Count'], index = range(30))
for i in df3['Year'].unique():
for j in df3['Campus'].unique():
for k in range(len(count)):
if (enroll['Year'][k] == i) & (enroll['Campus'][k] == j):
count['Year'][k] = i
count['Campus'][k] = j
count['Count'][k] = len(df3[(df3['Year'] == i) & (df3['Campus'] == j)])
count.head()
Year | Campus | Count | |
---|---|---|---|
0 | 2019-2020 | UCB | 5 |
1 | 2019-2020 | UCLA | 3 |
2 | 2019-2020 | UCM | 3 |
3 | 2019-2020 | UCD | 6 |
4 | 2019-2020 | UCSD | 4 |
fig = px.histogram(count,
x='Year',
y='Count',
color='Campus',
title='Student Ambassadors per Campus from 2019-2022',
color_discrete_sequence=['navy', 'skyblue', 'blue', 'royalblue', 'deepskyblue', 'turquoise', 'cyan', 'darkturquoise', 'lightgreen', 'teal'])
fig.show()
Student ambassador demographics raw data:
2019-2020 cohort:
- UC Davis
- UC Berkeley
- UCSD
2020-2021 cohort:
- UCLA
- UCI / UC Berkeley
- UC Davis / UCSD
2021-2022 cohort:
- UCLA
- UCSB
- UCI / UCSD
Overall: UCLA
fig = px.histogram(prop,
x='Year',
y='Prop',
color='Campus',
title='Proportion of Student Ambassadors per Campus from 2019-2022',
color_discrete_sequence=['navy', 'skyblue', 'blue', 'royalblue', 'deepskyblue', 'turquoise', 'cyan', 'darkturquoise', 'lightgreen', 'teal'])
fig.show()
Student ambassador demographics data when accounting for student population:
2019-2020 cohort:
- UCSF
- UC Merced
- UC Berkeley
2020-2021 cohort:
- UCSF
- UC Merced
- UCSC
2021-2022 cohort:
- UCSF
- UCSB
- UCLA
Overall: UCSF
From the figures above, we can see that UCLA has the most student ambassadors when we don’t account for the population. There could be multiple reasons for this, such as the prescence of the Center for Gender Health and Justice, the fact that UCLA has the largest student population, etc.
When we do account for the student population, UCSF has the highest proportion of student ambassadors. However, we should note that UCSF is a graduate school and only has about 3,000-4,000 students enrolled per year.
Note: UCSF and UC Merced have the smallest overall student populations and UCLA and UC Berkeley have the largest.
Degree Type
Now we’ll be focusing on the different degrees our ambassadors are studying towards.
First, we need to correct some mispellings in the dataframe. We’ll do this then use the unique() function to make sure it worked.
df3['Degree'][df3['Degree'] == "Undrgraduate"] = "Undergraduate"
df3['Degree'][df3['Degree'] == "Undergraduate "] = "Undergraduate"
df3['Degree'][df3['Degree'] == "Graduate "] = "Graduate"
df3['Degree'].unique()
array(['Graduate', 'Undergraduate', nan, 'MD', 'PhD', 'JD'], dtype=object)
Using the data, we’ll create a pie chart to see what degrees most ambassadors are studying for.
fig = px.pie(df3, names='Degree', title='Student Ambassador Degrees 2019-2022', color_discrete_sequence=["navy", "skyblue", "darkturquoise", "teal", "royalblue", "deepskyblue"])
fig.show()
As we can see from the chart, student ambassadors the past three years have been overwhelmingly undergraduate students. Not even all the other degrees combined can surpass, or even match, the amount of undergraduates. The next largest degree is graduate, followed by PhD, MD and then JD.
Note: the 3.18% null are the students who did not have their degree type filled out in the dataframe. Since this data was manually filled out from the UCGHI website, not all the information was available.*
The chart suggests that the program either appeals more or is advertised more to undergraduate students. In the future, it may be beneficial to target more students working towards different degrees (especially M.D. and J.D.) for a more diverse cohort.
Returners
Now, let’s take a look at how many students returned to the program throughout the years.
Note: this only accounts for those who were in the 2019-2022 programs and returned the following year(s). Since 2019-2020 was the first cohort, there were no returners that year.
returners = {}
for i in range(len(df3["Name "])):
for j in df3["Name "]:
returners[j] = 0
#once we have the character in dictionary, add up occurences
for j in df3["Name "]:
returners[j] += 1
for key, val in returners.items():
if val > 1:
print(key, val)
Kelly Song 2
Claire Amabile 2
Sean Sugai 2
Shirelle Mizrahi 2
Vandana Teki 3
Donna Pham 2
Colette Kirkpatrick 2
Eniola Owoyele 2
Geremy Lowe 2
Natasha Glendening 2
Kalani Phillips 3
Catthi Ly 2
Sydney Adams 2
returner_list = []
for key, val in returners.items():
if val > 1:
returner_list.append(key)
len(returner_list)
13
There are about 13 returning members from the previous 3 years, with 2 ambassadors being present for all 3 years and 11 being present for two years.
Survey Results
For our last section, we will just be looking at the amount of ambassadors per cohort, as well as the proportion of ambassadors that responded to the post program survey.
Note: Only survey data from the 2020-2021 and 2021-2022 cohorts were available.
len(df3[df3["Year"] == "2019-2020"]), len(df3[df3["Year"] == "2020-2021"]), len(df3[df3["Year"] == "2021-2022"])
(36, 70, 51)
2019-2020: 36 Ambassadors
2020-2021: 70 Ambassadors
2021-2022: 51 Ambassadors
19/70, 16/51
(0.2714285714285714, 0.3137254901960784)
2020-2021: 27% of ambassadors participated in post-program survey
2021-2022: 31% of ambassadors participated in post-program survey
References
Student demographic data: https://ucghi.universityofcalifornia.edu/get-involved/ucghi-student-ambassador-program
Student enrollment data: https://www.universityofcalifornia.edu/about-us/information-center/fall-enrollment-glance
Campus Coordinates: https://www.google.com
Link to slides: https://docs.google.com/presentation/d/1nX_3GqWHz-3xfUui9WDKPX_qctJ5IM3yGVXae5Eg_a0/edit?usp=sharing