Index Catalog // Scholar@UC

51. Coherence_Evaluations - cas_random5k.csv

Type:: Dataset
Abstract:: CSV files containing topic coherence scores for datasets with DocumentCount = 5,000; Corpus = one of Federal Caselaw [cas], Pubmed-Abstracts [pma], or Pubmed-Central [pmc]; and SearchTerm[s] = one of Earth, Environmental, Climate, Pollution, or Random 5k documents of a specific corpus. Coherence was scored across every combination of TopicCount: 10-40; Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]; and Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]. The columns in this file are: Validation_Set (which search term this scoring pertains to), Topics (number of topics in the model), Alpha (hyperparameter alpha selection from the 6 options above), Beta (hyperparameter beta selection from the 6 options above), Coherence (the topic coherence score for the given model row), and Perplexity (the perplexity score for the given model row).
Author:: McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 08/10/2022
Date Modified:: 08/10/2022
Date Created:: 2022
License:: Open Data Commons Public Domain Dedication and License (PDDL)
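
The parameter grid described in this record can be enumerated as a quick check on how many model rows a complete file should hold. This is a minimal sketch, assuming the TopicCount range 10-40 is inclusive with a step of 1 (the record does not say); the variable names are illustrative and do not come from the dataset's own tooling.

```python
from itertools import product

# Assumption: "10-40" means every integer from 10 to 40 inclusive.
topic_counts = range(10, 41)
alphas = [0.01, 0.31, 0.61, 0.91, "symmetric", "asymmetric"]
betas = [0.01, 0.31, 0.61, 0.91, "automatic", "symmetric"]

# One tuple per (TopicCount, Alpha, Beta) combination.
grid = list(product(topic_counts, alphas, betas))
print(len(grid))  # 31 * 6 * 6 = 1116
```

Under that assumption each file would contain 1,116 scored models per validation set.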

52. Coherence_Evaluations - cas_environmental.csv

Type:: Dataset
Abstract:: CSV files containing topic coherence scores for datasets with DocumentCount = 5,000; Corpus = one of Federal Caselaw [cas], Pubmed-Abstracts [pma], or Pubmed-Central [pmc]; and SearchTerm[s] = one of Earth, Environmental, Climate, Pollution, or Random 5k documents of a specific corpus. Coherence was scored across every combination of TopicCount: 10-40; Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]; and Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]. The columns in this file are: Validation_Set (which search term this scoring pertains to), Topics (number of topics in the model), Alpha (hyperparameter alpha selection from the 6 options above), Beta (hyperparameter beta selection from the 6 options above), Coherence (the topic coherence score for the given model row), and Perplexity (the perplexity score for the given model row).
Author:: McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 08/10/2022
Date Modified:: 08/10/2022
Date Created:: 2022
License:: Open Data Commons Public Domain Dedication and License (PDDL)
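
A typical use of these columns is to find the model row with the highest coherence. The sketch below assumes the file has a header row with the six column names exactly as listed in the abstract; the sample rows are invented for illustration and are not values from the dataset, and `best_by_coherence` is a hypothetical helper name.

```python
import csv
import io

# Hypothetical sample rows: the numbers are made up for illustration only.
SAMPLE = """Validation_Set,Topics,Alpha,Beta,Coherence,Perplexity
environmental,10,0.01,0.01,0.41,-8.2
environmental,20,symmetric,0.31,0.47,-8.5
environmental,30,asymmetric,automatic,0.44,-8.9
"""

def best_by_coherence(handle):
    """Return the row (as a dict) with the highest Coherence value."""
    rows = list(csv.DictReader(handle))
    return max(rows, key=lambda row: float(row["Coherence"]))

best = best_by_coherence(io.StringIO(SAMPLE))
print(best["Topics"], best["Alpha"], best["Beta"])
```

Alpha and Beta are left as strings because each column mixes numeric values with labels such as symmetric.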

53. Coherence_Evaluations - cas_pollution.csv

Type:: Dataset
Abstract:: CSV files containing topic coherence scores for datasets with DocumentCount = 5,000; Corpus = one of Federal Caselaw [cas], Pubmed-Abstracts [pma], or Pubmed-Central [pmc]; and SearchTerm[s] = one of Earth, Environmental, Climate, Pollution, or Random 5k documents of a specific corpus. Coherence was scored across every combination of TopicCount: 10-40; Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]; and Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]. The columns in this file are: Validation_Set (which search term this scoring pertains to), Topics (number of topics in the model), Alpha (hyperparameter alpha selection from the 6 options above), Beta (hyperparameter beta selection from the 6 options above), Coherence (the topic coherence score for the given model row), and Perplexity (the perplexity score for the given model row).
Author:: McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 08/10/2022
Date Modified:: 08/10/2022
Date Created:: 2022
License:: Open Data Commons Public Domain Dedication and License (PDDL)

54. Coherence_Evaluations - cas_climate.csv

Type:: Dataset
Abstract:: CSV files containing topic coherence scores for datasets with DocumentCount = 5,000; Corpus = one of Federal Caselaw [cas], Pubmed-Abstracts [pma], Pubmed-Central [pmc], Chicago Novel Corpus [nvl], or Newspaper Corpus [nws]; and SearchTerm[s] = one of Earth, Environmental, Climate, Pollution, or Random 5k documents of a specific corpus. Coherence was scored across every combination of TopicCount: 10-40; Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]; and Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]. The columns in this file are: Validation_Set (which search term this scoring pertains to), Topics (number of topics in the model), Alpha (hyperparameter alpha selection from the 6 options above), Beta (hyperparameter beta selection from the 6 options above), Coherence (the topic coherence score for the given model row), and Perplexity (the perplexity score for the given model row).
Author:: McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 08/10/2022
Date Modified:: 11/11/2022
Date Created:: 2022
License:: Open Data Commons Attribution License (ODC-By)
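
The record titles on this page appear to encode the corpus code and search term as `<corpus>_<searchterm>.csv`. The following sketch splits a title on that basis; the naming pattern is inferred from the titles listed here rather than documented, and `parse_title` is a hypothetical helper.

```python
# Corpus codes and names as given in the abstract above.
CORPORA = {
    "cas": "Federal Caselaw",
    "pma": "Pubmed-Abstracts",
    "pmc": "Pubmed-Central",
    "nvl": "Chicago Novel Corpus",
    "nws": "Newspaper Corpus",
}

PREFIX = "Coherence_Evaluations - "
SUFFIX = ".csv"

def parse_title(title):
    """Split a record title into (corpus name, search term)."""
    if not (title.startswith(PREFIX) and title.endswith(SUFFIX)):
        raise ValueError(f"unexpected title format: {title!r}")
    stem = title[len(PREFIX):-len(SUFFIX)]
    # Assumption: the first underscore separates corpus code from search term.
    code, _, term = stem.partition("_")
    return CORPORA[code], term

print(parse_title("Coherence_Evaluations - cas_climate.csv"))
# ('Federal Caselaw', 'climate')
```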

55. Coherence_Evaluations - cas_earth.csv

Type:: Dataset
Abstract:: CSV files containing topic coherence scores for datasets with DocumentCount = 5,000; Corpus = one of Federal Caselaw [cas], Pubmed-Abstracts [pma], or Pubmed-Central [pmc]; and SearchTerm[s] = one of Earth, Environmental, Climate, Pollution, or Random 5k documents of a specific corpus. Coherence was scored across every combination of TopicCount: 10-40; Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]; and Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]. The columns in this file are: Validation_Set (which search term this scoring pertains to), Topics (number of topics in the model), Alpha (hyperparameter alpha selection from the 6 options above), Beta (hyperparameter beta selection from the 6 options above), Coherence (the topic coherence score for the given model row), and Perplexity (the perplexity score for the given model row).
Author:: McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 08/10/2022
Date Modified:: 08/10/2022
Date Created:: 2022
License:: Open Data Commons Public Domain Dedication and License (PDDL)

56. Poetry Foundation Metadata

Type:: Dataset
Abstract:: This file contains poet metadata for a dataset of poems collected from Poetry Foundation. The metadata relates to a poetry corpus available on Kaggle (https://www.kaggle.com/datasets/ultrajack/modern-renaissance-poetry). Sean Ayres collected publication metadata and generated other relevant metadata fields based on his subject matter expertise. The original Kaggle poetry dataset contains approximately X lines of poetry. The metadata for this dataset references X poets.
Author:: Ayres, Sean and McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 07/11/2022
Date Modified:: 07/11/2022
Date Created:: 2022
License:: Open Data Commons Attribution License (ODC-By)

57. Project Gutenberg Poetry Metadata

Type:: Dataset
Abstract:: This file contains book metadata for a Project Gutenberg poetry dataset. The metadata relates to a poetry corpus collected by Allison Parrish and available on GitHub (https://github.com/aparrish/gutenberg-poetry-corpus). Sean Ayres collected the publication metadata and generated other relevant metadata fields based on his subject matter expertise. The poetry dataset contains approximately three million lines of poetry. The metadata for the dataset references X hundreds of books.
Author:: Ayres, Sean; McCabe, Erin E.; and Parrish, Allison
Submitter:: Erin E. McCabe
Date Uploaded:: 07/11/2022
Date Modified:: 07/11/2022
Date Created:: 2022
License:: Open Data Commons Attribution License (ODC-By)

58. Vocabulary Count Feature Vectors for Medical School Classifier

Type:: Dataset
Abstract:: Classifier algorithms use the features (collectively known as feature vectors) of each item in a dataset to determine the class to which that item belongs. In this classifier approach, each item represents one document containing the application essay combined with unstructured language describing relevant activities of a single applicant. For privacy, the full text of this document is not provided. Instead, each document is represented only by its features. The feature vector for this classifier is based on the term frequency of each identified term. For example, Doc_A contains 0 occurrences of any terms identified as family medicine vocabulary and 10 occurrences of terms from the non-family-medicine vocabulary.
Author:: Boylan, Andrew and McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 05/14/2021
Date Modified:: 05/14/2021
License:: Open Data Commons Public Domain Dedication and License (PDDL)
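
The vocabulary-count idea in this abstract can be sketched as follows. This is an illustration only: the two term lists are hypothetical (the record does not publish the real vocabularies), and the sketch assumes one total per vocabulary, as the Doc_A example suggests, whereas the actual dataset may hold one count per term.

```python
import re
from collections import Counter

# Hypothetical vocabularies: the real term lists are not given in the record.
FAMILY_MEDICINE = {"community", "primary", "rural"}
NON_FAMILY_MEDICINE = {"surgery", "research", "laboratory"}

def feature_vector(text):
    """Return [family-medicine term count, non-family-medicine term count]."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    fm = sum(counts[term] for term in FAMILY_MEDICINE)
    non_fm = sum(counts[term] for term in NON_FAMILY_MEDICINE)
    return [fm, non_fm]

doc = "Research in the surgery laboratory shaped my research goals."
print(feature_vector(doc))  # [0, 4]
```

Only the two counts leave the function, which mirrors how the dataset represents each document by its features without exposing the full text.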

59. Vocabulary Comparison of Medical School Applications

Type:: Dataset
Abstract:: W2V (Word2Vec) takes terms from a large corpus of text and maps them onto a vector space based on word associations in the dataset. These word associations take into account each word's immediate context (its ten neighboring words). After modeling the data (large-scale unstructured text), the platform generates a visualization of this vector space, which supports analysis such as detecting synonymous or near-synonymous words and highlighting related words. At the heart of this project is W2V's ability to identify key words that were more frequent in, and more distinctive of, each group, using results from two different W2V models, one for each group's application texts. We coded these key terms into categories, then analyzed those categories for overarching themes.
Author:: McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 05/14/2021
Date Modified:: 05/14/2021
License:: Open Data Commons Public Domain Dedication and License (PDDL)
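
To make the notion of a word's "immediate context" concrete, the sketch below extracts the neighboring words around each token. It does not train a W2V model; it only shows the windowing step. It assumes "ten neighboring words" means five on each side, which the abstract does not specify, and `context_windows` is a hypothetical helper.

```python
def context_windows(tokens, neighbors=10):
    """Yield (target, context) pairs using a symmetric window.

    Assumption: `neighbors` is split evenly, half before and half after
    the target word; windows are shorter near the ends of the text.
    """
    half = neighbors // 2
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - half):i]
        right = tokens[i + 1:i + 1 + half]
        yield target, left + right

tokens = "a b c d e f g h i j k l".split()
windows = dict(context_windows(tokens))
print(windows["f"])
# ['a', 'b', 'c', 'd', 'e', 'g', 'h', 'i', 'j', 'k']
```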

60. IRS Classification of Ohio Non-Profit Organizations

Type:: Dataset
Abstract:: Each row in this dataset represents a single non-profit organization (NPO), identified by its Employer Identification Number (EIN). Each row contains the National Taxonomy of Exempt Entities (NTEE) code assigned to the NPO by the IRS (if any) and the official Essential/Non-Essential status connected to that NTEE code.
Author:: Jones, Michael and McCabe, Erin E.
Submitter:: Erin E. McCabe
Date Uploaded:: 05/06/2021
Date Modified:: 05/07/2021
Date Created:: 2020-11-01
License:: Open Data Commons Public Domain Dedication and License (PDDL)
