The Dataset contains raw data that indicates the start and stop time of water flowing at fixtures in the Marian Spencer Hall Cafeteria restroom during hours of operation. The data were collected as part of an effort to develop and test a novel method of measuring flow to calculate the probability that the fixture is busy (fixture p-value). The fixture p-value is one of the parameters necessary to predict peak demand in buildings for pipe sizing purposes.
There are two .csv files, a README file and a sample of the data collection template with contact information. The dataset also contains a MATLAB code written to accept data in the suggested format and estimate the fixture probability of use.
Aurek Chattopadhyay, Reagan Maddox, Glen Horton, Nan Niu, Ganesh Malla, Tanmay Bhowmik, Jianzhang Zhang, and Juha Savolainen, Completeness of Natural Language Requirements: A Comparative Study of User Stories and Feature Descriptions (submitted to REFSQ 2023: https://2023.refsq.org)
CSV files containing the coherence scoring pertaining to datasets of:
DocumentCount = 5,000
Corpus = (one from) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc] / News [nws]
SearchTerm[s] = (one from) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row
CSV files containing the coherence scoring pertaining to datasets of:
DocumentCount = 5,000
Corpus = (one from) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc] / News [nws]
SearchTerm[s] = (one from) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row
CSV files containing the coherence scoring pertaining to datasets of:
DocumentCount = 5,000
Corpus = (one from) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc] / News [nws]
SearchTerm[s] = (one from) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row
CSV files containing the coherence scoring pertaining to datasets of:
DocumentCount = 5,000
Corpus = (one from) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc] / News [nws]
SearchTerm[s] = (one from) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row
Text and Metadata for 14,399 newspaper articles.
Transcripts collected from Internet Archive
Date Range: 2010-2022
File includes meta/data:
- Unique-id (uid)
- Title (incl. search term)
- Date
- Link (url)
- Abstract
- Text
Text matching the following terms:
- space explor*
- space mission
- space science
- spaceship
- space tour*
- space transport*
- spacecraft
- space shuttle
- outer space
- astronom*
- astrop*
- astrona*
- planet
- NASA
- star trek
- star wars
- lunar
- space flight
Text and Metadata for 9,061 newspaper articles.
Newspapers included: New York Times, Wall Street Journal, & Washington Post
Date Range: 2017-2022
File includes meta/data:
- Unique-id (uid)
- Title (incl. source paper & section name)
- Date
- Link (url)
- Author
- Text
Text matching the following terms:
- space explor*
- space mission
- space science
- spaceship
- space tour*
- space transport*
- spacecraft
- space shuttle
- outer space
- astronom*
- astrop*
- astrona*
- planet
- NASA
- star trek
- star wars
- lunar
- space flight
CSV files containing the topic coherence scoring pertaining to datasets of:
DocumentCount = 5,000
Corpus = (one of) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc]
SearchTerm[s] = (one of) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row
CSV files containing the topic coherence scoring pertaining to datasets of:
DocumentCount = 5,000
Corpus = (one of) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc]
SearchTerm[s] = (one of) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row