Prediction Results¶

As the final part of poetic’s main workflow, the post-processing of prediction results consists of diagnostics, summary, and file output. The package’s Diagnostics class provides all of this functionality through a few simple methods, which are documented below.

Predictions Class¶

Both the predict() and predict_file() methods of the Predictor class return an instance of the Predictions class:

import poetic

pred = poetic.Predictor()
score = pred.predict("Is this poetic?")

In the example above, score is a Predictions object, whose methods can then be called to run diagnostics and save results.

Inheritance¶

The Predictions class inherits from the Diagnostics class. All methods are inherited, and the only difference is the constructor. The advantage of using an inherited class instead of the Diagnostics class directly is that the preprocessing of keras predictions can occur separately. The Predictions class thus serves as an internal interface that distinguishes package-generated results from manually instantiated Diagnostics objects.

To use the toolchain and methods separately, use the Diagnostics class instead. All methods of the Predictions class are documented with the Diagnostics class unless they are overridden.

Diagnostics Class¶

As the base class for Predictions, the Diagnostics class provides a more generalized framework for working with any prediction results. In the future, more abstractions may be added to make the class more versatile for independent use.

A typical workflow will involve making predictions, running diagnostics, and saving the results to a file:

import poetic

pred = poetic.Predictor()
score = pred.predict("Is this poetic?")
score.run_diagnostics()

print(score.generate_report())
score.to_file(path="<PATH>")

Instantiation¶

To use the Diagnostics class, only the predictions argument, a list of floats, is required:

import poetic

results = poetic.Diagnostics(predictions = [1, 0, 0.5])

# OR: with sentences
sentences = ["Hi.", "I am poetic", "How about you?"]
results_sentences = poetic.Diagnostics(predictions = [1, 0, 0.5], sentences=sentences)

The sentences argument is optional. If used, the sentences corresponding to the predictions are stored as an attribute of the object; otherwise, the attribute is None. All other methods behave the same either way, apart from the contents of their output.

Diagnostic Statistics¶

As of now, the Diagnostics class supports a five-number summary of the predictions. As part of the workflow, the summary is computed automatically by the run_diagnostics() method, and the results are stored in the diagnostics attribute of the object. As an example:

import poetic

results = poetic.Diagnostics(predictions = [1, 0, 0.5])
results.run_diagnostics()
# Get the diagnostic results
print(results.diagnostics)

The diagnostics attribute is a dictionary with three keys: “Sentence_count”, “Five_num”, and “Predictions”. The corresponding values are the following:

“Sentence_count”: An int giving the number of entries.

“Five_num”: The five-number summary, stored as a dictionary.

“Predictions”: A list of floats from the predictions attribute.

To obtain the five-number summary separately, use the classmethod five_number(), which is essentially a utility function that can be used with any array-like object compatible with numpy:

import poetic

results = poetic.Diagnostics(predictions = [1, 0, 0.5])
poetic.Diagnostics.five_number(results.predictions)

# As a stand-alone method:
poetic.Diagnostics.five_number([1, 0, 0.5])
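
To see what such a summary contains without loading the package, the same statistics can be approximated with the standard library. This is only a minimal sketch, not the package's implementation: the function name five_number_sketch and the dictionary keys are hypothetical (chosen to mirror the labels in the text report shown under Plain Text File), and the use of the population standard deviation is an assumption inferred from that sample report.

```python
import statistics


def five_number_sketch(values):
    """Return summary statistics for a sequence of numbers.

    Hypothetical helper: the key names are illustrative and may not match
    the keys that poetic's five_number() actually returns.
    """
    values = list(values)
    return {
        "Minimum": min(values),
        "Mean": statistics.fmean(values),
        "Median": statistics.median(values),
        "Maximum": max(values),
        # Assumption: population standard deviation (divide by n).
        "Standard Deviation": statistics.pstdev(values),
    }


summary = five_number_sketch([2/3, 7/11])
for name, value in summary.items():
    print(f"{name}: {value}")
```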

Diagnostic Report¶

A diagnostic report is a string (or plain text) summary of the object with diagnostic statistics. To obtain the diagnostic report, the run_diagnostics() method must first be called on the object. Otherwise, a type error will be raised because the “diagnostics” attribute will be None.

An example usage of the method:

import poetic

results = poetic.Diagnostics(predictions = [1, 0, 0.5])
results.run_diagnostics()
print(results.generate_report())

The contents of the report will be identical to the text file output as documented below.

File Output¶

The results and diagnostics can be saved to either a .txt or a .csv file. The former writes the diagnostics report to a plain text file, while the latter saves the actual values separated by commas. The usage is essentially identical to using the -o option on the command line.

Plain Text File¶

To save results to a text file:

import poetic

results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_file("<PATH>")

The output format is the following:

Poetic
Version: 1.0.2
For latest updates: www.github.com/kevin931/Poetic

Diagnostics Report

Model: Lexical Model
Number of Sentences: 2

~~~Five Number Summary~~~
Minimum: 0.6363636363636364
Mean: 0.6515151515151515
Median: 0.6515151515151515
Maximum: 0.6666666666666666
Standard Deviation: 0.015151515151515138

~~~All Scores~~~
Sentence #1: 0.6666666666666666
Sentence #2: 0.6363636363636364

The to_file() method does not enforce a file extension for text files. The one exception is a path ending in .csv, in which case it automatically calls the to_csv() method. The text file output is intended as a quick summary rather than a way to store data, and its format may change with updates. If an object needs to be restored or the data will be processed further, use the csv format instead.
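
The extension rule described above can be illustrated with a small stand-alone function. This is a sketch under stated assumptions, not the package's code: the name choose_output_format is hypothetical, and because the documentation does not say whether the extension check is case-sensitive, the sketch uses an exact, case-sensitive match.

```python
def choose_output_format(path):
    """Return "csv" for paths ending in .csv and "text" for everything else.

    Hypothetical helper that mirrors the documented rule; the real
    to_file() method may differ in details such as case handling.
    """
    return "csv" if path.endswith(".csv") else "text"


for path in ["report.txt", "report", "scores.csv"]:
    print(path, "->", choose_output_format(path))
```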

.csv File Format¶

When a file path ending in .csv is encountered or the to_csv() method of the Diagnostics class is explicitly called, the results are written as three comma-separated columns, with Sentence_num, Sentence, and Score as the column names in the first row. Each sentence and its prediction occupy their own row, which follows the Tidy Data format for optimal compatibility. The Sentence_num column can be treated as the index.

For example, to save to a csv file:

import poetic

results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_csv("<PATH>")

Or, let to_file() handle it automatically:

import poetic

results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_file("<PATH>.csv")

The raw csv file looks like the following:

Sentence_num,Sentence,Score
1,Hi.,0.6666666666666666
2,This is poetic.,0.6363636363636364

If formatted as a table (for example, when opened in Excel or a similar program), this will be the result:

Sentence_num	Sentence	Score
1	Hi.	0.6666666666666666
2	This is poetic.	0.6363636363636364
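
Since the csv format is the one recommended for further processing, it may help to see how such a file can be read back. The following is a minimal sketch using only Python's standard csv module; it assumes the file has exactly the header and comma-separated layout shown above, and it parses an in-memory copy of that sample instead of a real file.

```python
import csv
import io

# The raw contents shown above; in practice this would come from an opened file.
raw = (
    "Sentence_num,Sentence,Score\n"
    "1,Hi.,0.6666666666666666\n"
    "2,This is poetic.,0.6363636363636364\n"
)

# DictReader uses the first row as the column names.
rows = list(csv.DictReader(io.StringIO(raw)))
sentences = [row["Sentence"] for row in rows]
scores = [float(row["Score"]) for row in rows]

print(sentences)
print(scores)
```

The resulting scores list could then be passed as the predictions argument, and sentences as the sentences argument, when instantiating a new Diagnostics object.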

Custom Built-in (Magic) Methods¶

String Representation¶

The str() method returns a short summary of the object. It truncates the predictions to 14 characters after the description:

'Diagnostics object for the following predictions: [0.66666666666...'

The repr() method returns a dictionary cast into a string with all the predictions, sentences, and diagnostics attributes. It does not truncate any results, which makes it more appropriate as a full representation of the object. It has the following format:

"{'Predictions': [0.6666666666666666, 0.6363636363636364], 'Sentences': None, 'Diagnostics': None}"
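
Both formats can be reproduced with plain Python for illustration. This is a sketch that assumes the two strings are built as described above (a fixed description plus the first 14 characters of the list, and a dictionary cast to a string); it does not use the package itself, and the actual implementation may differ.

```python
predictions = [2/3, 7/11]

# Short summary: a fixed description plus the first 14 characters of the list.
description = "Diagnostics object for the following predictions: "
short = description + str(predictions)[:14] + "..."

# Full representation: a dictionary of the three attributes cast to a string.
full = str({"Predictions": predictions, "Sentences": None, "Diagnostics": None})

print(short)
print(full)
```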

len()¶

The len() method returns the length of the predictions attribute of the object, which is the number of entries in the predictions list. Since the lengths of predictions and sentences are intended to match, the returned value logically represents the length of the object.

Comparison Operators¶

The Diagnostics class currently supports the following four operators: >, >=, <, and <=. They compare the mean values of the predictions attributes of the compared objects.

When the predictions are not normally distributed, for example when they are skewed, the mean values may not be meaningful. In these cases, manual comparisons are necessary.

Given that the predictions attribute is a list of floats, the == and != operators are currently not implemented.
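
The idea of ordering objects by the mean of their predictions can be sketched with a small stand-in class. The class name MeanComparable is hypothetical and this is not the package's implementation; it only illustrates how the four operators described above could behave.

```python
from statistics import fmean, median


class MeanComparable:
    """Hypothetical stand-in that orders objects by the mean of their predictions."""

    def __init__(self, predictions):
        self.predictions = list(predictions)

    def _mean(self):
        return fmean(self.predictions)

    def __lt__(self, other):
        return self._mean() < other._mean()

    def __le__(self, other):
        return self._mean() <= other._mean()

    def __gt__(self, other):
        return self._mean() > other._mean()

    def __ge__(self, other):
        return self._mean() >= other._mean()


low = MeanComparable([0.1, 0.2, 0.3])
high = MeanComparable([0.6, 0.7])
print(low < high)
print(low >= high)

# A manual comparison for skewed data, here using medians instead of means.
print(median(low.predictions) < median(high.predictions))
```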

Concatenation¶

Both + and += operators are supported to concatenate two Diagnostics objects. Mathematical addition is undefined and nonsensical for either predictions or diagnostics, so both operators concatenate instead, which is useful for making multiple predictions and subsequently analyzing them together. Concatenation is applied to each attribute where applicable: the Diagnostics.predictions attributes are concatenated directly since they are mandatory for initialization, while the Diagnostics.sentences and Diagnostics.diagnostics attributes behave differently depending on whether they are None for each object.

When Diagnostics.sentences is None for both objects, the result will also be None. When both objects have sentences as their attributes, the sentences will simply be concatenated as lists. When one object has a list of sentences and the other object’s sentences attribute is None, the resulting object will have a list as its Diagnostics.sentences with either the sentences themselves or None corresponding to each entry of Diagnostics.predictions.

The behavior for Diagnostics.diagnostics also depends on each object. When both are None, the run_diagnostics() method will not be called on the resulting object. Otherwise, the operator will automatically call run_diagnostics() to update the results. The latter strategy helps avoid a situation in which the diagnostics and predictions are mismatched, leading to unintentionally wrong diagnostics.

The + operator returns a new object, which means that it is copy-safe for existing objects. The += operator modifies the left-hand-side object as intended.
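
The rules for combining the sentences attributes can be sketched as a stand-alone function. The name merge_sentences and its signature are hypothetical and this is not the package's code; it only follows the three cases described in this section, padding a missing side with one None per prediction.

```python
def merge_sentences(preds_a, sents_a, preds_b, sents_b):
    """Combine two optional sentence lists following the rules described above."""
    if sents_a is None and sents_b is None:
        return None
    # Pad a missing side with one None per prediction so lengths stay aligned.
    left = sents_a if sents_a is not None else [None] * len(preds_a)
    right = sents_b if sents_b is not None else [None] * len(preds_b)
    # List addition builds a new list, so neither input is modified.
    return left + right


print(merge_sentences([1, 0], None, [0.5], None))
print(merge_sentences([1, 0], ["Hi.", "Bye."], [0.5], ["Hello."]))
print(merge_sentences([1, 0], ["Hi.", "Bye."], [0.5], None))
```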