CSV files containing topic coherence scores for datasets with the following properties:
DocumentCount = 5,000
Corpus = (one of) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc] / Chicago Novel Corpus [nvl] / Newspaper Corpus [nws]
SearchTerm[s] = (one of) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
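As a rough illustration of the size of this search space, the combinations above can be enumerated as follows. This is a minimal sketch, not the original scoring code; it assumes that "10-40" means every integer topic count from 10 to 40 inclusive, and the variable names are hypothetical.

```python
from itertools import product

# Assumption: "10-40" covers every integer topic count, endpoints included.
topic_counts = range(10, 41)
alphas = [0.01, 0.31, 0.61, 0.91, "symmetric", "asymmetric"]
betas = [0.01, 0.31, 0.61, 0.91, "automatic", "symmetric"]

# One tuple per (TopicCount, Alpha, Beta) model configuration.
grid = list(product(topic_counts, alphas, betas))

print(len(grid))  # 31 topic counts * 6 alphas * 6 betas
```

Under that assumption, each search term would correspond to 1,116 scored model configurations.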
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row
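The column layout above lends itself to a simple lookup of the best-scoring model per search term. The sketch below uses only the Python standard library and assumes the CSV is comma-delimited with a header row matching the column names listed above. The sample rows are made up for illustration and are not real scores; `best_by_coherence` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows for illustration only; these are NOT real scores.
SAMPLE = """Validation_Set,Topics,Alpha,Beta,Coherence,Perplexity
Climate,10,0.01,0.01,0.40,-7.9
Climate,11,symmetric,automatic,0.45,-8.1
Earth,10,0.31,0.61,0.38,-7.5
Earth,12,asymmetric,symmetric,0.36,-7.7
"""

def best_by_coherence(handle):
    """Return, for each Validation_Set, the row with the highest Coherence."""
    best = {}
    for row in csv.DictReader(handle):
        key = row["Validation_Set"]
        score = float(row["Coherence"])
        if key not in best or score > float(best[key]["Coherence"]):
            best[key] = row
    return best

best = best_by_coherence(io.StringIO(SAMPLE))
for name, row in best.items():
    # Alpha and Beta stay as strings because the columns mix numbers and labels.
    print(name, row["Topics"], row["Alpha"], row["Beta"], row["Coherence"])
```

To run this against an actual file, the `io.StringIO(SAMPLE)` argument could be replaced with a handle from `open(path, newline="")`.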
CSV files containing topic coherence scores for datasets with the following properties:
DocumentCount = 5,000
Corpus = (one of) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc]
SearchTerm[s] = (one of) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row
This file contains poet metadata for a dataset of poems collected from the Poetry Foundation. The metadata relates to a poetry corpus available on Kaggle: https://www.kaggle.com/datasets/ultrajack/modern-renaissance-poetry
Sean Ayres collected publication metadata and generated other relevant metadata fields based on his subject matter expertise.
The original Kaggle poetry dataset contains approximately X lines of poetry.
The metadata for this dataset references X poets.
This file contains book metadata for a Project Gutenberg poetry dataset. The metadata relates to a poetry corpus collected by Allison Parrish, available on GitHub: https://github.com/aparrish/gutenberg-poetry-corpus
Sean Ayres collected the publication metadata and generated other relevant metadata fields based on his subject matter expertise.
The poetry dataset contains approximately three million lines of poetry.
The metadata for the dataset references X hundreds of books.
Belinda Reynolds (music), Ting Luo (piano and poetry), Charles Woodman (images)
WORDS is a multimedia piano work with spoken words and visuals. The vocal part consists of an aural collage of Miss Luo reciting, in Mandarin and English, a poem she wrote about the story of her grandfather, a composer in China during the late 20th century. The interplay between the audio collage, the video, and the piano part creates a multimedia composition that immerses the audience in a blanket of enticing, reflective sound-visual experiences.