The dataset contains raw data recording the start and stop times of water flow at fixtures in the Marian Spencer Hall Cafeteria restroom during hours of operation. The data were collected as part of an effort to develop and test a novel method of measuring flow in order to calculate the probability that a fixture is busy (the fixture p-value). The fixture p-value is one of the parameters needed to predict peak demand in buildings for pipe sizing purposes.
The dataset comprises two .csv files, a README file, and a sample of the data collection template with contact information. It also includes MATLAB code written to accept data in the suggested format and estimate the fixture probability of use.
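As a rough illustration of how a busy probability could be derived from start and stop times, the Python sketch below estimates the p-value as the fraction of an observation window during which water was flowing. This is an assumption about the estimator, not a port of the MATLAB code in the dataset; the function name, the (start, stop) tuple layout, and the requirement that events for one fixture do not overlap are all hypothetical.

```python
from datetime import datetime


def fixture_p_value(events, window_start, window_end):
    """Estimate the busy probability as flowing time divided by window length.

    events: iterable of (start, stop) datetime pairs for ONE fixture,
            assumed not to overlap each other (hypothetical layout).
    """
    window = (window_end - window_start).total_seconds()
    if window <= 0:
        raise ValueError("observation window must have positive length")

    busy = 0.0
    for start, stop in events:
        # Clip each event to the observation window before summing.
        start = max(start, window_start)
        stop = min(stop, window_end)
        if stop > start:
            busy += (stop - start).total_seconds()
    return busy / window
```

For example, two flow events lasting 30 s and 90 s inside a one-hour window would give 120 / 3600, or about 0.033.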
Aurek Chattopadhyay, Reagan Maddox, Glen Horton, Nan Niu, Ganesh Malla, Tanmay Bhowmik, Jianzhang Zhang, and Juha Savolainen, Completeness of Natural Language Requirements: A Comparative Study of User Stories and Feature Descriptions (submitted to REFSQ 2023: https://2023.refsq.org)
CSV files containing coherence scores for datasets with the following properties:
DocumentCount = 5,000
Corpus = (one from) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc] / Chicago Novel Corpus [nvl] / Newspaper Corpus [nws]
SearchTerm[s] = (one from) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
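To make the size of this grid concrete, the following Python sketch enumerates every combination listed above. It assumes that the TopicCount range 10-40 is inclusive with a step of 1, which the description does not state explicitly; the variable names are illustrative only.

```python
from itertools import product

# Assumption: "10-40" means every integer from 10 to 40 inclusive.
topic_counts = range(10, 41)
alphas = [0.01, 0.31, 0.61, 0.91, "symmetric", "asymmetric"]
betas = [0.01, 0.31, 0.61, 0.91, "automatic", "symmetric"]

# One (topics, alpha, beta) tuple per scored model configuration.
grid = list(product(topic_counts, alphas, betas))
print(len(grid))  # 31 * 6 * 6 = 1116
```

Under that assumption, each corpus and search-term dataset would have 1,116 scored configurations.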
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row
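A file with the columns above could be read along the following lines. This Python sketch picks the highest-coherence row for each validation set; it assumes the CSV header names match the column names listed here exactly and that a higher coherence score is better. The sample rows are invented placeholders for illustration, not values from the dataset, and the function name is hypothetical.

```python
import csv
import io


def best_by_coherence(rows):
    """Return the highest-coherence row for each Validation_Set value."""
    best = {}
    for row in rows:
        key = row["Validation_Set"]
        score = float(row["Coherence"])
        if key not in best or score > float(best[key]["Coherence"]):
            best[key] = row
    return best


# Placeholder rows (made up) standing in for one of the CSV files.
sample = io.StringIO(
    "Validation_Set,Topics,Alpha,Beta,Coherence,Perplexity\n"
    "Climate,10,0.01,0.01,0.40,-8.1\n"
    "Climate,20,symmetric,0.31,0.45,-8.3\n"
    "Earth,10,0.61,automatic,0.38,-7.9\n"
)

best = best_by_coherence(csv.DictReader(sample))
for name, row in best.items():
    print(name, row["Topics"], row["Alpha"], row["Beta"], row["Coherence"])
```

With a real file, the `io.StringIO` object would be replaced by a file opened with `open(path, newline="")`. Alpha and Beta are kept as strings here because the columns mix numeric and named settings.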
CSV files containing coherence scores for datasets with the following properties:
DocumentCount = 5,000
Corpus = (one from) Federal Caselaw [cas] / Pubmed-Abstracts [pma] / Pubmed-Central [pmc] / News [nws]
SearchTerm[s] = (one from) Earth / Environmental / Climate / Pollution / Random 5k documents of a specific corpus
Coherence was scored across every combination of:
TopicCount: 10-40
Hyperparameter-Alpha: [0.01, 0.31, 0.61, 0.91, symmetric, asymmetric]
Hyperparameter-Beta: [0.01, 0.31, 0.61, 0.91, automatic, symmetric]
The columns in this file include:
Validation_Set: Which search term this scoring pertains to
Topics: Number of topics in the model
Alpha: Hyperparameter alpha selection from the 6 options above
Beta: Hyperparameter beta selection from the 6 options above
Coherence: The topic coherence score for the given model-row
Perplexity: The perplexity score for the given model-row