UCGHI Summary Report 2019-2022

Introduction

Hello Everyone!

Today we’ll be going over the UCGHI Student Ambassador Summary Report for the 2019-2022 cohorts.

Import Dataframes

We begin by importing the necessary csv files.

# import pandas for dataframe
import pandas as pd

Get dataframe with Ambassador Demographics

# read in and check ambassador demographics csv
df = pd.read_csv("2019-2022 Ambassador Demographics - Sheet1.csv")
df.head()

	Year	Major(s) and/or Minor(s)	Campus	Degree	COE	Name
0	2021-2022	Sociology	UCSB	Graduate	PH	Alex Maldonado
1	2021-2022	Human Biology and Society / Global Health	UCLA	Undergraduate	PH	Alma Rincongallardo
2	2021-2022	Global Studies	UCSB	Undergraduate	PH	Alyssa Mandujano
3	2021-2022	Urban and Regional Planning	UCLA	Graduate	PH	Amanda Caswell
4	2021-2022	Biology / Environmental Science	UCR	Undergraduate	PH	Andrew Tseng
...	...	...	...	...	...	...
153	2019-2020	Public Health - Maternal, Child & Adolescent H...	UCB	Graduate	CGHJ	Victoria Nguyen
154	2019-2020	Public Health - Maternal, Child, and Adolescen...	UCB	Graduate	CGHJ	Rebecca Astatke
155	2019-2020	Human Biology / Anthropology	UCI	Undergraduate	CGHJ	Catthi Ly
156	2019-2020	Public Health	UCM	Undergraduate	CGHJ	Ifunanya Okezie
157	2019-2020	Biochemistry and Cellular Biology / Art Histor...	UCSD	Undergraduate	CGHJ	Ikran Ibrahim

158 rows × 6 columns

Get dataframe with campus coordinates

# import campus coordinates and check
df2 = pd.read_csv("Campus coordinates - Sheet1.csv")
df2

	Campus	LATITUDE	LONGITUDE	COUNT
0	UCR	33.9737	117.3281	6
1	UCSD	32.8801	117.2340	20
2	UCSB	34.4140	119.8489	13
3	UCB	37.8719	122.2585	20
4	UCLA	34.0689	118.4452	26
5	UCD	38.5382	121.7617	18
6	UCM	37.3647	120.4241	11
7	UCSC	36.9821	122.0593	13
8	UCSF	37.7632	122.4582	7
9	UC Hastings	37.7812	122.4158	1
10	UCI	33.6405	117.8443	17
11	Charles Drew	33.9256	118.2425	5

We then merge the two dataframes so that every student row also carries the coordinates of that student’s campus.

We’ll be using this merged dataframe for most of the code below.

# merge dataframes to obtain coordinates
df3 = df.merge(df2, on="Campus")
df3

	Year	Major(s) and/or Minor(s)	Campus	Degree	COE	Name	LATITUDE	LONGITUDE	COUNT
0	2021-2022	Sociology	UCSB	Graduate	PH	Alex Maldonado	34.4140	119.8489	13
1	2021-2022	Global Studies	UCSB	Undergraduate	PH	Alyssa Mandujano	34.4140	119.8489	13
2	2021-2022	Biological Anthropology / Sociology	UCSB	Undergraduate	PH	Ashley Willis	34.4140	119.8489	13
3	2021-2022	Chemistry	UCSB	Undergraduate	PH	Isabella Perez	34.4140	119.8489	13
4	2021-2022	Psychology	UCSB	Undergraduate	CGHJ	Arianna Macias	34.4140	119.8489	13
...	...	...	...	...	...	...	...	...	...
152	2019-2020	Public Health	UCM	Undergraduate	CGHJ	Irene Guzman	37.3647	120.4241	11
153	2019-2020	Psychology / Public Health	UCM	Undergraduate	CGHJ	Jacqueline Partida	37.3647	120.4241	11
154	2019-2020	Public Health	UCM	Undergraduate	CGHJ	Sydney Adams	37.3647	120.4241	11
155	2019-2020	Public Health	UCM	Undergraduate	CGHJ	Ifunanya Okezie	37.3647	120.4241	11
156	2020-2021	Law	UC Hastings	JD	CGHJ	Salina Isaq	37.7812	122.4158	1

157 rows × 9 columns
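The demographics dataframe has 158 rows but the merged one has 157, so one row was dropped: merge defaults to an inner join, which discards any row whose Campus value has no match in the coordinates table. The following is a minimal sketch, using made-up rows rather than the real csv files, of one way to find such a row. The names students, coords, checked and unmatched are illustrative, and the trailing space is only one possible cause of a mismatch.

```python
import pandas as pd

# Toy stand-ins for the demographics and coordinates tables (hypothetical rows)
students = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Campus": ["UCSB", "UCLA", "UCLA "],  # note the stray trailing space
})
coords = pd.DataFrame({
    "Campus": ["UCSB", "UCLA"],
    "LATITUDE": [34.4140, 34.0689],
})

# A left merge keeps every student; indicator=True adds a "_merge" column
checked = students.merge(coords, on="Campus", how="left", indicator=True)

# Rows marked "left_only" found no matching campus in the coordinates table
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["Name", "Campus"]])
```

Running the same check on df and df2 would show which ambassador is missing from df3, and whether the cause is a spelling difference or a campus that is absent from the coordinates file.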

# import enrollment info for each campus

enroll = pd.read_csv('UC Enrollment.csv')
enroll

	Year	Campus	Enrollment
0	2019	UCB	43185
1	2019	UCLA	44371
2	2019	UCM	8847
3	2019	UCD	38364
4	2019	UCSD	38736
5	2019	UCSB	26314
6	2019	UCSC	19494
7	2019	UCI	36908
8	2019	UCSF	3180
9	2019	UCR	25547
10	2020	UCB	42327
11	2020	UCLA	44589
12	2020	UCM	9018
13	2020	UCD	39074
14	2020	UCSD	39576
15	2020	UCSB	26179
16	2020	UCSC	19161
17	2020	UCI	36303
18	2020	UCSF	3201
19	2020	UCR	26434
20	2021	UCB	45036
21	2021	UCLA	46116
22	2021	UCM	9093
23	2021	UCD	40050
24	2021	UCSD	41885
25	2021	UCSB	26124
26	2021	UCSC	19841
27	2021	UCI	36505
28	2021	UCSF	3165
29	2021	UCR	26847

Check Demographics Info

Next, we’ll be looking through the demographics information to better understand and study each cohort / the ambassadors as a whole.

The questions we are currently interested in include:

Areas of Study - how many unique majors are there, and how many students fall into each category?

Campus - which campuses have the most ambassadors per year? Which have the fewest?

Returner status - how many ambassadors return each year?

Degree type - how many students are undergraduate students vs. graduate students vs. doctoral students vs. other?

COE - how many students are in each COE each year?

Areas of Study

For areas of study, we will consider both majors and minors / specializations of the student ambassadors.

We begin by creating a list of all the majors and minors from the demographics dataframe.

studies = list(df3["Major(s) and/or Minor(s)"])
len(studies)

In order to create a data visualization that is readable, we’ll group the subjects into 4 categories. The following are the categories, as well as some examples of majors that fall into them:

Public Health / Global Health: Community Health Sciences, Epidemiology, Global Health, Global Studies, etc.

Computing / Mathematics / Engineering: Engineering, Statistics, Bioinformatics, Computer Science

Life / Physical Sciences: Biology, Psychology, Neurobiology, Brain Sciences, Medicine, Biomedical Sciences, Nursing, Pharmacy, Geography, Chemistry, Urban and Regional Planning

Social Sciences: Anthropology, Sociology, Gender Studies, Language, Political Science, Policy, Law, International Development / Relations, Labor Studies, Social Welfare, Legal Studies, Economics

In order to create our data visualization / see how many ambassadors fall into each category, we’ll create a dictionary with the categories as the keys.

For each category, if certain keywords appear in an element of the list of majors and minors, we’ll add 1 to that category. For example, if the words “Public Health” or “Global” are in the element, we’ll add 1 to “Public Health / Global Health.”

Note: if an ambassador has multiple majors and/or minors, they can be counted more than once. For example, if an ambassador is majoring in public health and minoring in bioinformatics, 1 will be added to both the Public Health/Global Health category and the Computing/Mathematics/Engineering category.

studies_dict = {"Public Health / Global Health": 0,
               "Computing / Mathematics / Engineering" : 0,
               "Life / Physical Sciences" : 0,
               "Social Sciences": 0}

for i in studies:
    if any(word in i for word in ["Public Health", "Global"]):
        studies_dict["Public Health / Global Health"] += 1
    if any(word in i for word in ["Engineering", "Computer", "Statistics", "Bioinformatics"]):
        studies_dict["Computing / Mathematics / Engineering"] += 1
    if any(word in i for word in ["Bio", "Psychology", "Brain Sciences", "Medicine", "Nursing", "Pharmacy", "Geo", "Chemistry", "Urban and Regional Planning", "Environment"]):
        studies_dict["Life / Physical Sciences"] += 1
    if any(word in i for word in ["Poli", "Law", "International", "Labor", "Social Welfare", "Legal", "Economics", "Anthropology", "Gender", "Sociology", "Language"]):
        studies_dict["Social Sciences"]  += 1
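The same keyword matching can also be written with the keywords stored in a dictionary, which makes the lists easier to audit and extend. This is a sketch rather than the notebook’s own code: CATEGORY_KEYWORDS and count_categories are illustrative names, the keyword lists are copied from the loop above, and the three example strings are made up.

```python
# Keyword lists copied from the loop above; the names CATEGORY_KEYWORDS and
# count_categories are illustrative and not part of the original notebook
CATEGORY_KEYWORDS = {
    "Public Health / Global Health": ["Public Health", "Global"],
    "Computing / Mathematics / Engineering": [
        "Engineering", "Computer", "Statistics", "Bioinformatics"],
    "Life / Physical Sciences": [
        "Bio", "Psychology", "Brain Sciences", "Medicine", "Nursing",
        "Pharmacy", "Geo", "Chemistry", "Urban and Regional Planning",
        "Environment"],
    "Social Sciences": [
        "Poli", "Law", "International", "Labor", "Social Welfare", "Legal",
        "Economics", "Anthropology", "Gender", "Sociology", "Language"],
}


def count_categories(studies):
    """Count how many entries mention at least one keyword of each category."""
    counts = {category: 0 for category in CATEGORY_KEYWORDS}
    for study in studies:
        for category, keywords in CATEGORY_KEYWORDS.items():
            # substring match, so "Bio" also matches "Biology" and "Bioinformatics"
            if any(word in study for word in keywords):
                counts[category] += 1
    return counts


example = ["Sociology",
           "Human Biology and Society / Global Health",
           "Bioinformatics"]
print(count_categories(example))
```

Because the match is on substrings, short keywords such as “Bio” also catch “Bioinformatics”, so that entry is counted under both Computing / Mathematics / Engineering and Life / Physical Sciences.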

Now that we have our categories with the corresponding values/count, let’s create a bar chart.

import plotly.express as px

# the counts are already aggregated, so a bar chart plots them directly
fig = px.bar(x=list(studies_dict.keys()),
             y=list(studies_dict.values()),
             title="Bar Chart of Main Areas of Study",
             color_discrete_sequence=['navy'])

fig.update_layout(xaxis_title="Area of Study", yaxis_title="Count")
fig.show()

As we can see from the chart, our largest categories are Public/Global Health and Life/Physical Sciences. The smallest category is Computing/Mathematics/Engineering.

This seems logical, as the UCGHI Student Ambassador program focuses on global health issues, which tend to attract those interested in public/global health and the life sciences.

We should, however, keep in mind that the different campuses have different majors; some campuses may have more students in life sciences because there are more options or they’re more accessible.

Based on this chart, we can see that it may be beneficial to reach out to more departments in computing/math/engineering if we want a more interdisciplinary cohort of students.

Campus / Center of Expertise

Next, we will be looking at the different campuses our student ambassadors come from, as well as the centers of expertise these students belong to.

We are interested in the number of students that come from each campus. We’ll be looking at which campuses have produced the most ambassadors and which have produced the fewest, as well as how many ambassadors are in the Planetary Health (PH) track vs. the Center for Gender Health and Justice (CGHJ) track.

Let’s begin by visualizing this geographically. The size of each dot corresponds to the number of ambassadors. Feel free to zoom in to look more closely at the map.

import plotly.express as px

# longitudes in the csv are positive degrees west, so we negate them below
fig = px.scatter_geo(df3, lat='LATITUDE', 
                        lon=df3['LONGITUDE']*-1, 
                        size="COUNT",
                        hover_name="Campus",
                        color="Campus",
                        scope="usa",
                        center=dict(lat=35.3733, lon=-119.0187))

fig.update_layout(
        title_text = '2019-2022 Student Ambassadors per Campus',
    )

fig.show()

We’ll now look at the raw count of ambassadors, and then we’ll account for the student population on each campus.

Note: we should keep in mind that these are students who were accepted into the ambassador program; there may have been more applicants or interested students from campuses that had fewer students accepted.

# Import necessary packages

import plotly.graph_objects as go
import numpy as np

# Initialize figure

fig = go.Figure()

# Add traces: first the two "All" traces, then a PH/CGHJ pair per cohort.
# The trace order must match the "visible" lists in the dropdown below.

coe_colors = {"PH": "skyblue", "CGHJ": "navy"}

for coe, color in coe_colors.items():
    fig.add_trace(
        go.Histogram(x=df3.loc[df3["COE"] == coe, "Campus"], name=coe, marker_color=color))

for year in ["2021-2022", "2020-2021", "2019-2020"]:
    for coe, color in coe_colors.items():
        mask = (df3["Year"] == year) & (df3["COE"] == coe)
        fig.add_trace(go.Histogram(x=df3.loc[mask, "Campus"], name=coe, marker_color=color))



# Add dropdown
fig.update_layout(
    updatemenus=[
        dict(active=0,
            buttons=list([
                dict(
                    label="All",
                    method="update",
                    args=[{"visible": [True, True, False, False, False, False, False, False]},
                           {"title": "All Student Ambassadors"}]),
                dict(
                    label="2021-2022",
                    method="update",
                    args=[{"visible": [False, False, True, True, False, False, False, False]},
                           {"title": "2021-2022 Student Ambassadors"}]),
                dict(
                    label="2020-2021",
                    method="update",
                    args=[{"visible": [False, False, False, False, True, True, False, False]},
                           {"title": "2020-2021 Student Ambassadors"}]
                ),
                dict(
                    label="2019-2020",
                    method="update",
                    args=[{"visible": [False, False, False, False, False, False, True, True]},
                           {"title": "2019-2020 Student Ambassadors"}]
                )
             ])
        )
    ])

# Set title and barmode
fig.update_layout(title_text="Student Ambassadors per Campus per Year", barmode='stack')

fig.show()

Click on the different options in the dropdown menu to see how many ambassadors come from each campus, as well as how many are in Planetary Health (PH) vs. the Center for Gender Health and Justice (CGHJ). To see the count, hover over each bar or bar stack in the figure.

The figure includes data from each of the 3 cohorts, as well as a combination of all of them.

Keep in mind that this figure does not account for the ratio of ambassadors to student population on each campus (we’ll be looking at that soon).
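The dropdown works by switching traces on and off: each button carries a list of booleans, one per trace, in the order the traces were added. Those lists can also be generated instead of typed by hand. The sketch below assumes the trace order used in the figure (two “All” traces, then a PH/CGHJ pair for each cohort); visibility_mask is a hypothetical helper, not a Plotly function.

```python
# Button labels in the same order as the trace groups in the figure
labels = ["All", "2021-2022", "2020-2021", "2019-2020"]
traces_per_label = 2  # one PH trace and one CGHJ trace per group


def visibility_mask(position, n_labels, group_size):
    """Return booleans that show only the traces belonging to one group."""
    mask = [False] * (n_labels * group_size)
    start = position * group_size
    mask[start:start + group_size] = [True] * group_size
    return mask


# One button per label, in the dict format that update_layout(updatemenus=...) expects
buttons = [
    dict(label=label,
         method="update",
         args=[{"visible": visibility_mask(pos, len(labels), traces_per_label)},
               {"title": f"{label} Student Ambassadors"}])
    for pos, label in enumerate(labels)
]

print(buttons[1]["args"][0]["visible"])
```

The resulting list could be passed as the buttons entry of the dropdown, which keeps the masks in step with the number of cohorts if another year is added.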

Before we move onto the ratio of ambassadors to student population on each campus, let’s first take a look at the overall percentage of ambassadors for each center of expertise.

fig = px.pie(df3, 
             names='COE', 
             title='Student Ambassador COE 2019-2022',
             color="COE",
             color_discrete_map = {"PH":'skyblue', "CGHJ": "navy"})
fig.show()

More than half of all ambassadors are in the Center for Gender Health and Justice.

Several factors could contribute to this, such as the CGHJ being a more active center, more interest in or applicants to this center, or more students being accepted into it.

For the next figure, we’ll account for the student population on each campus for each school year. Since we are only interested in the proportion of ambassadors relative to the number of students on campus, we will not break the numbers down by Planetary Health vs. Center for Gender Health and Justice.

We will be using the enroll dataframe as well as the demographics dataframe to calculate the proportions.

Note: the enroll dataframe has enrollment data for all the UC campuses in our data except UC Hastings, and it has no enrollment data for Charles Drew University. Therefore, we’ll only look at the ten campuses it covers. However, since the average student population at UC Hastings and Charles Drew University is under 1,000, they would likely have the largest proportions.

enroll.head()

	Year	Campus	Enrollment
0	2019-2020	UCB	43185
1	2019-2020	UCLA	44371
2	2019-2020	UCM	8847
3	2019-2020	UCD	38364
4	2019-2020	UCSD	38736

Let’s see which campuses have the largest and smallest student populations, summing enrollment across the three years.

enroll.groupby(['Campus'])['Enrollment'].sum().sort_values(ascending=False)

Campus
UCLA    135076
UCB     130548
UCSD    120197
UCD     117488
UCI     109716
UCR      78828
UCSB     78617
UCSC     58496
UCM      26958
UCSF      9546
Name: Enrollment, dtype: int64

Moving on, let’s create an empty dataframe where we can store the proportions. This will be used for our visualizations.

prop = pd.DataFrame(columns = ['Year', 'Campus', 'Prop'], index = range(30))
prop.head()

	Year	Campus	Prop
0	NaN	NaN	NaN
1	NaN	NaN	NaN
2	NaN	NaN	NaN
3	NaN	NaN	NaN
4	NaN	NaN	NaN

We’ll change the years in enroll to match those in df3.

# relabel calendar years as academic years, e.g. 2019 -> '2019-2020'
# (assigning the whole column avoids chained assignment)
enroll['Year'] = enroll['Year'].replace({2019: '2019-2020',
                                         2020: '2020-2021',
                                         2021: '2021-2022'})

enroll['Year'].unique()

array(['2019-2020', '2020-2021', '2021-2022'], dtype=object)

# fill prop one enrollment row at a time; .loc avoids chained assignment
for k in range(len(enroll)):
    year, campus = enroll.loc[k, 'Year'], enroll.loc[k, 'Campus']
    n_ambassadors = len(df3[(df3['Year'] == year) & (df3['Campus'] == campus)])
    prop.loc[k] = [year, campus, n_ambassadors / enroll.loc[k, 'Enrollment']]

# store the proportions as floats so they can be plotted
prop['Prop'] = prop['Prop'].astype(float)

prop.head()

	Year	Campus	Prop
0	2019-2020	UCB	0.000162
1	2019-2020	UCLA	0.000068
2	2019-2020	UCM	0.000565
3	2019-2020	UCD	0.000156
4	2019-2020	UCSD	0.000155
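The proportion for each campus and year is the number of ambassadors divided by that year’s enrollment. As an alternative to filling a dataframe row by row, the same table can be built with groupby and merge. This is a sketch on made-up numbers: ambassadors and enrollment stand in for df3 and enroll, and the values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature versions of df3 (one row per ambassador) and enroll
ambassadors = pd.DataFrame({
    "Year": ["2019-2020", "2019-2020", "2019-2020", "2020-2021"],
    "Campus": ["UCB", "UCB", "UCM", "UCB"],
})
enrollment = pd.DataFrame({
    "Year": ["2019-2020", "2019-2020", "2020-2021", "2020-2021"],
    "Campus": ["UCB", "UCM", "UCB", "UCM"],
    "Enrollment": [40000, 8000, 42000, 9000],
})

# Count ambassadors per (Year, Campus) pair
counts = (ambassadors.groupby(["Year", "Campus"])
          .size()
          .reset_index(name="Count"))

# A left merge keeps campuses with no ambassadors; their count becomes 0
table = enrollment.merge(counts, on=["Year", "Campus"], how="left")
table["Count"] = table["Count"].fillna(0).astype(int)
table["Prop"] = table["Count"] / table["Enrollment"]
print(table)
```

One table then holds both the raw counts and the proportions, so the same dataframe can feed the count chart and the proportion chart.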

fig = px.histogram(prop, 
                   x="Campus", 
                   y='Prop',
                   animation_frame="Year",
                   title="Proportion of Student Ambassadors per Campus",
                   color_discrete_sequence=['skyblue'])
  
fig["layout"].pop("updatemenus")
fig.show()

Click through the slider to see the different proportions throughout the years.

We’ll now be creating two visualizations that will allow us to compare the original data vs. the data that takes into account the student population on each campus.

For the original data, we’ll create a dataframe with the number of ambassadors from each campus per year.

count = pd.DataFrame(columns = ['Year', 'Campus', 'Count'], index = range(30))

# fill count one enrollment row at a time; .loc avoids chained assignment
for k in range(len(enroll)):
    year, campus = enroll.loc[k, 'Year'], enroll.loc[k, 'Campus']
    count.loc[k] = [year, campus, len(df3[(df3['Year'] == year) & (df3['Campus'] == campus)])]

# store the counts as integers so they can be plotted
count['Count'] = count['Count'].astype(int)
                
count.head()

	Year	Campus	Count
0	2019-2020	UCB	5
1	2019-2020	UCLA	3
2	2019-2020	UCM	3
3	2019-2020	UCD	6
4	2019-2020	UCSD	4

fig = px.histogram(count,
             x='Year',
             y='Count',
             color='Campus',
             title='Student Ambassadors per Campus from 2019-2022',
             color_discrete_sequence=['navy', 'skyblue', 'blue', 'royalblue', 'deepskyblue', 'turquoise', 'cyan', 'darkturquoise', 'lightgreen', 'teal'])

fig.show()

Top three campuses by raw ambassador count:

2019-2020 cohort:

1. UC Davis
2. UC Berkeley
3. UCSD

2020-2021 cohort:

1. UCLA
2. UCI / UC Berkeley
3. UC Davis / UCSD

2021-2022 cohort:

1. UCLA
2. UCSB
3. UCI / UCSD

Overall: UCLA

fig = px.histogram(prop,
             x='Year',
             y='Prop',
             color='Campus',
             title='Proportion of Student Ambassadors per Campus from 2019-2022',
             color_discrete_sequence=['navy', 'skyblue', 'blue', 'royalblue', 'deepskyblue', 'turquoise', 'cyan', 'darkturquoise', 'lightgreen', 'teal'])

fig.show()

Top three campuses when accounting for student population:

2019-2020 cohort:

1. UCSF
2. UC Merced
3. UC Berkeley

2020-2021 cohort:

1. UCSF
2. UC Merced
3. UCSC

2021-2022 cohort:

1. UCSF
2. UCSB
3. UCLA

Overall: UCSF

From the figures above, we can see that UCLA has the most student ambassadors when we don’t account for the population. There could be multiple reasons for this, such as the presence of the Center for Gender Health and Justice or the fact that UCLA has the largest student population.

When we do account for the student population, UCSF has the highest proportion of student ambassadors. However, we should note that UCSF is a graduate school and enrolled only about 3,200 students per year in this period.

Note: UCSF and UC Merced have the smallest overall student populations, and UCLA and UC Berkeley have the largest.

Degree Type

Now we’ll be focusing on the different degrees our ambassadors are studying towards.

First, we need to correct some misspellings in the dataframe. We’ll do this, then use the unique() function to make sure it worked.

# strip stray whitespace, then fix the one misspelled label
# (assigning the whole column avoids chained assignment)
df3['Degree'] = df3['Degree'].str.strip().replace({"Undrgraduate": "Undergraduate"})
df3['Degree'].unique()

array(['Graduate', 'Undergraduate', nan, 'MD', 'PhD', 'JD'], dtype=object)

Using the data, we’ll create a pie chart to see what degrees most ambassadors are studying for.

fig = px.pie(df3, names='Degree', title='Student Ambassador Degrees 2019-2022', color_discrete_sequence=["navy", "skyblue", "darkturquoise", "teal", "royalblue", "deepskyblue"])
fig.show()

As we can see from the chart, student ambassadors over the past three years have been overwhelmingly undergraduate students. All the other degrees combined do not match the number of undergraduates. The next largest group is graduate students, followed by PhD, MD and then JD students.

Note: the 3.18% null slice represents students whose degree type was not filled out in the dataframe. Since this data was entered manually from the UCGHI website, not all the information was available.

The chart suggests that the program either appeals more or is advertised more to undergraduate students. In the future, it may be beneficial to target more students working towards different degrees (especially M.D. and J.D.) for a more diverse cohort.

Returners

Now, let’s take a look at how many students returned to the program throughout the years.

Note: this only accounts for those who were in the 2019-2022 programs and returned the following year(s). Since 2019-2020 was the first cohort, there were no returners that year.

returners = {}

# count how many cohorts each name appears in, in a single pass
# (the name column in the csv has a trailing space)
for name in df3["Name "]:
    returners[name] = returners.get(name, 0) + 1

for key, val in returners.items():
    if val > 1:
        print(key, val)

Kelly Song 2
Claire Amabile 2
Sean Sugai 2
Shirelle Mizrahi 2
Vandana Teki 3
Donna Pham 2
Colette Kirkpatrick 2
Eniola Owoyele 2
Geremy Lowe 2
Natasha Glendening 2
Kalani Phillips 3
Catthi Ly 2
Sydney Adams 2

returner_list = []

for key, val in returners.items():
    if val > 1:
        returner_list.append(key)
        
        
len(returner_list)

There are 13 returning ambassadors over the three years: 2 were present for all 3 years and 11 were present for two years.
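The same count can be obtained with pandas’ value_counts, which tallies how often each name appears. A minimal sketch with made-up names; in the notebook, the series would be the name column of df3.

```python
import pandas as pd

# Hypothetical names; each entry is one ambassador in one cohort
names = pd.Series(["Ana", "Ben", "Ana", "Cy", "Ben", "Ana"])

# Number of cohorts each name appears in, most frequent first
appearances = names.value_counts()

# Keep only the names that appear in more than one cohort
returning = appearances[appearances > 1]
print(returning.to_dict())
print(len(returning))
```

Either approach matches on the exact spelling of the name, so two different ambassadors with the same name would be counted as one returner, and a returner whose name is spelled differently between years would be missed.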

Survey Results

For our last section, we will look at the number of ambassadors per cohort, as well as the proportion of ambassadors who responded to the post-program survey.

Note: Survey data was only available for the 2020-2021 and 2021-2022 cohorts.

len(df3[df3["Year"] == "2019-2020"]), len(df3[df3["Year"] == "2020-2021"]), len(df3[df3["Year"] == "2021-2022"])

(36, 70, 51)

2019-2020: 36 Ambassadors

2020-2021: 70 Ambassadors

2021-2022: 51 Ambassadors

# survey respondents / cohort size for 2020-2021 and 2021-2022
19/70, 16/51

(0.2714285714285714, 0.3137254901960784)

2020-2021: 27% of ambassadors participated in post-program survey

2021-2022: 31% of ambassadors participated in post-program survey
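The response rate is the number of survey respondents divided by the cohort size. A small sketch that keeps the figures labeled by cohort, using the cohort sizes and the numerators from the calculation above; the dictionary names are illustrative.

```python
# Cohort sizes from the counts above; respondent counts are the numerators used above
cohort_sizes = {"2020-2021": 70, "2021-2022": 51}
respondents = {"2020-2021": 19, "2021-2022": 16}

# Response rate per cohort = respondents / cohort size
rates = {cohort: respondents[cohort] / size
         for cohort, size in cohort_sizes.items()}

for cohort, rate in rates.items():
    print(f"{cohort}: {rate:.0%} of ambassadors responded")
```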

References

Student demographic data: https://ucghi.universityofcalifornia.edu/get-involved/ucghi-student-ambassador-program

Student enrollment data: https://www.universityofcalifornia.edu/about-us/information-center/fall-enrollment-glance

Campus Coordinates: https://www.google.com

Link to slides: https://docs.google.com/presentation/d/1nX_3GqWHz-3xfUui9WDKPX_qctJ5IM3yGVXae5Eg_a0/edit?usp=sharing

Written on August 17, 2022