Word2Vec (W2V) takes terms from a large corpus of text and maps them into a vector space based on the word associations found in your dataset. These word associations take into account each word's immediate context (its ten neighboring words).
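As a rough illustration of the context window described above, the sketch below pairs each word with its ten nearest neighbors. It assumes the ten neighbors are split as five words on each side, which the description does not confirm. The function name `context_windows` and the sample sentence are hypothetical and are not part of the W2V platform.

```python
def context_windows(tokens, window=5):
    """Yield (target, context) pairs for every token.

    `window` is the number of words taken on EACH side of the target,
    so window=5 gives up to ten neighbors (an assumption about how the
    "ten neighboring words" are counted).
    """
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        yield target, left + right


# Hypothetical sample text, not taken from the project's corpus.
tokens = "our mission is to provide food and shelter to families in need".split()
pairs = list(context_windows(tokens, window=5))

for target, context in pairs[:3]:
    print(target, "->", context)
```

Words near the start or end of a text simply get fewer neighbors, because the window is cut off at the text boundary.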
After modeling the data (large-scale unstructured text), the platform generates a visualization of this vector space, which lets us perform analyses such as detecting synonymous or near-synonymous words and highlighting related words. At the heart of this project is W2V's ability to identify key words that were more frequent in, and more unique to, each group, using results from two different W2V models, one for each group's application texts.
We coded these key terms into categories, then analyzed those categories for overarching themes.
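The idea of terms that are "more frequent and more unique" to one group can be sketched with a plain relative-frequency comparison. This is not Word2Vec and is not the project's actual method; it is a simplified stand-in to make the criterion concrete. The function `distinctive_terms`, its `min_count` and `ratio` thresholds, and the sample word lists are all hypothetical.

```python
from collections import Counter


def distinctive_terms(group_a, group_b, min_count=2, ratio=2.0):
    """Return words that are notably more common in group_a than in group_b.

    A word qualifies when it appears at least `min_count` times in group_a
    and its relative frequency there is at least `ratio` times its relative
    frequency in group_b. Both thresholds are illustrative assumptions.
    """
    counts_a, counts_b = Counter(group_a), Counter(group_b)
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    terms = []
    for word, n in counts_a.items():
        if n < min_count:
            continue
        rate_a = n / total_a
        rate_b = counts_b[word] / total_b if total_b else 0.0
        # Comparing by multiplication avoids dividing by zero when the
        # word never appears in group_b.
        if rate_a >= ratio * rate_b:
            terms.append(word)
    return sorted(terms)


# Hypothetical tokenized application texts for two groups.
group_a = "community art program art students community mentors art".split()
group_b = "art funding budget funding students budget report funding".split()

print(distinctive_terms(group_a, group_b))
print(distinctive_terms(group_b, group_a))
```

Term lists like these would then be the input to the manual step of coding key terms into categories.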
Empathy is essential to every facet of an educational system, and to art classrooms in particular. A lack of empathy and collaborative skills can cause problems with relationships and classroom management throughout the school year. Resources such as group lessons, community activities, and classroom reflections can foster the kind of collaboration that students, teachers, and everyone else involved in a school system need in order to succeed.
The education field has historically underrepresented teachers based on gender, race, and sexual orientation. This has led to a severe lack of diversity in the field. Through professional workshops, educators will gain greater awareness of the barriers that have caused this underrepresentation. Educators will also be given tools to reflect on how this facet of the educational system can be improved.
Each row in this dataset represents a single non-profit organization (NPO), identified by its Employer Identification Number (EIN).
Each row contains the National Taxonomy of Exempt Entities (NTEE) code assigned to the NPO by the IRS (if any) and the official Essential/Non-Essential status connected to that NTEE code.
Each row of this dataset represents a single Ohio-based non-profit organization (NPO), identified by its Employer Identification Number (EIN), together with a hand-coded determination of its 'essential' status.
This determination of essential status is guided by the official IRS definition and based strictly on the NPO's own mission statement and activities language, as supplied in its 2019 tax form.
This CSV file contains the topic distribution of each EIN as uncovered using six parallel Latent Dirichlet Allocation (LDA) Topic Models.
Each row gives a topic and the topic score associated with an Ohio NPO (identified by Employer Identification Number), generated from one model run.
For any single EIN, the topic scores summed across all of its rows therefore cannot exceed 6.0 (6 models x 100%).
Topic scores below 0.01 (1%) are not included.
Each topic from the models is further identified as Essential/Non-Essential by subject matter expert, Dr. Michael Jones, guided by the official IRS definition.
The topic models are generated from the unstructured text of the mission statements and activities descriptions taken from the 2019 tax forms of Ohio non-profit organizations.
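The score rules described above (six model runs, a 1% cutoff, and a per-EIN total of at most 6.0) can be checked with a short script. The sketch below is only an illustration: the column names `ein`, `model`, `topic`, and `score`, the placeholder EINs, and the sample values are all hypothetical and may not match the actual CSV file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file's columns and values may differ.
SAMPLE = """ein,model,topic,score
00-0000001,1,3,0.70
00-0000001,1,5,0.29
00-0000001,2,1,0.98
00-0000002,1,2,0.50
00-0000002,1,4,0.005
"""


def total_scores(rows, cutoff=0.01, n_models=6):
    """Sum topic scores per EIN, skipping scores below `cutoff`.

    Raises ValueError if any EIN's total exceeds `n_models`, since each
    model run can contribute at most 1.0 (100%) per organization.
    """
    totals = defaultdict(float)
    for row in rows:
        score = float(row["score"])
        if score < cutoff:
            continue  # scores under 1% are excluded from the dataset
        totals[row["ein"]] += score
    for ein, total in totals.items():
        if total > n_models + 1e-9:  # small tolerance for float rounding
            raise ValueError(f"EIN {ein} has total score {total:.3f} > {n_models}")
    return dict(totals)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
totals = total_scores(rows)
for ein, total in totals.items():
    print(ein, round(total, 3))
```

Because scores under the cutoff are dropped, an EIN's total will usually fall somewhat short of the number of model runs that produced topics for it.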