Ari Hadjiyianni
Hi, my name is Ari Hadjiyianni. I’m a rising senior at UC Berkeley majoring in Statistics and minoring in Data Science. I am very interested in the ethics of data collection/cleaning, and am passionate about making data as publicly available as possible. For this project, I was mainly responsible for finding relevant sources, analyzing our data, and providing descriptive explanations for our findings.
Sean Tai
Hi, my name is Sean Tai. I’m a rising sophomore at UC Berkeley majoring in Data Science. I’m interested in the intersection of data science, culture, ethics, and society. For this project, I worked on the data side, focusing on the data critique and two sources of the annotated bibliography, and I proofread the data and written components.
Data Critique
The primary datasets in our digital humanities project are “Median Household Income 2024,” “Race and Ethnicity in Oakland,” and “Zillow House Data,” which all offer insights into social and economic issues in the Bay Area. We use them to examine ownership patterns, racial biases, gentrification effects, and movement patterns of minority groups. The “Median Household Income 2024” dataset shows median household income by region and over time, highlighting economic disparities. The “Race and Ethnicity in Oakland” dataset shows the racial and ethnic makeup of the Oakland population, which is useful for understanding population shifts and the demographics of minority groups. The “Zillow House Data” dataset shows the typical value of homes in specific regions, allowing for deeper analysis of economic variation across neighborhoods.
We aimed to understand whether and how racial biases affect ownership by analyzing incomes and demographic changes over time. Historical discrimination against minority groups has had lasting effects in Oakland and similar cities across America. One example is the education gap between children of low-income families and children in wealthy neighborhoods. Gentrification further widens the already large wealth gap between the white middle class and minorities, specifically African Americans. The datasets allow us to analyze gentrification by linking changes in racial makeup with income and property costs through visualizations, and by examining the distribution of minority groups in gentrified areas. We can also examine the relationship between income changes, property values, and gentrification to understand the economic pressures on different racial groups.
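The linking step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the area labels, the two tables, and every number in them are invented placeholders, and the helper names `join_on_zip` and `pearson` are hypothetical. It joins two per-area tables on a shared key and computes a Pearson correlation between income change and home value change.

```python
from math import sqrt

# Hypothetical per-area changes; every value below is invented for illustration.
income_change = {"ZIP_A": 0.05, "ZIP_B": 0.20, "ZIP_C": 0.15, "ZIP_D": 0.02}
home_value_change = {"ZIP_A": 0.08, "ZIP_B": 0.30, "ZIP_C": 0.22, "ZIP_D": 0.04}


def join_on_zip(left, right):
    """Keep only areas present in both tables, as (left, right) value pairs."""
    shared = sorted(set(left) & set(right))
    return [(left[z], right[z]) for z in shared]


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


pairs = join_on_zip(income_change, home_value_change)
r = pearson(pairs)
print(f"Areas matched: {len(pairs)}, Pearson r = {r:.2f}")
```

A coefficient like this only describes how closely two measures move together across areas. It says nothing about which one drives the other.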
While our data offers various insights, it also has clear limitations. The demographic dataset has granularity issues, such as broad racial categories like "Asian," which encompasses many subgroups. It also lacks demographic details like age, gender, and education level, which could help us further understand how gentrification affects different groups. The median household income dataset has data-quality issues of its own, including margins of error, outliers, and incomplete data for certain zip codes. For the crime dataset, we randomly sampled a smaller portion of the data to work with, as the entire dataset was far too large to analyze, so our results for it reflect a sample and not the full record. Zillow uses a custom metric, the typical value for homes in the 35th to 65th percentile range, which might not capture specific neighborhood variations or housing types. Correlation does not imply causation, and attributing causes to trends requires more detailed data and analysis. Many of the datasets also go back only a few years, which is too short a period to accurately determine the long-term effects of gentrification. External factors like policy changes, economic cycles, and unrelated migration may influence trends and are hard to account for. Our own assumptions and biases are also important to state, as they shape our interpretation and analysis. Focusing on gentrification might cause us to miss other factors affecting demographic and economic changes, and our interpretation of terms like "gentrification" or "racial bias" might differ from that of others.
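Random sampling of a dataset that is too large to analyze in full can be made reproducible by fixing a seed. The sketch below is an assumption about how such a step might look, not the project's actual procedure: the function name `sample_rows`, the seed, the sample size, and the stand-in list of record IDs are all hypothetical.

```python
import random


def sample_rows(rows, k, seed=0):
    """Return k rows drawn without replacement, reproducibly for a given seed."""
    if k > len(rows):
        raise ValueError("sample size exceeds number of rows")
    rng = random.Random(seed)  # local generator, so global random state is untouched
    return rng.sample(rows, k)


# Stand-in for a large table: hypothetical record IDs only, not real data.
all_rows = list(range(100_000))
subset = sample_rows(all_rows, k=1_000, seed=42)
print(len(subset), "rows sampled from", len(all_rows))
```

Recording the seed and sample size alongside the results lets others redraw the same subset and check whether conclusions change with a different draw.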
Most of our data comes from government databases, census reports, and other official sources, while the housing values come from Zillow, a private company. We aggregated data from official reports and visualizations, focusing on monitoring economic trends. While the datasets offer insights into race, economics, and urban development, we must acknowledge their limitations. Incomplete data, the difficulty of establishing causality, external factors, and our own biases all shape the outcomes of our analysis. Recognizing these limitations is essential to maintaining the accuracy and integrity of our findings.