Prediction Results

As the final part of poetic’s main workflow, the post-processing of prediction results consists of diagnostics, summary, and file output. The package’s Diagnostics class provides all of this functionality through a few simple methods, which are documented below.


Predictions Class

Both the predict() and predict_file() methods of the Predictor class return an instance of the Predictions class:

import poetic

pred = poetic.Predictor()
score = pred.predict("Is this poetic?")

In the example above, score is a Predictions object, whose methods can then be called to run diagnostics and save results.

Inheritance

The Predictions class inherits from the Diagnostics class and all of its methods; the only difference is the constructor. The advantage of using a subclass instead of the Diagnostics class directly is that the preprocessing of keras predictions can occur separately. The Predictions class thus serves as an internal interface that distinguishes its instances from manually instantiated Diagnostics objects.

To use the toolchain and methods separately, use the Diagnostics class instead. All methods of the Predictions class are documented with the Diagnostics class unless they are overridden.


Diagnostics Class

As the base class for Predictions, the Diagnostics class provides a more generalized framework for working with any prediction results. In the future, more abstractions may be added to make it more versatile for independent use.

A typical workflow will involve making predictions, running diagnostics, and saving the results to a file:

import poetic

pred = poetic.Predictor()
score = pred.predict("Is this poetic?")
score.run_diagnostics()

print(score.generate_report())
score.to_file(path="<PATH>")

Instantiation

To use the Diagnostics class, only the predictions argument is required as a list of floats:

import poetic

results = poetic.Diagnostics(predictions = [1, 0, 0.5])

# OR: with sentences
sentences = ["Hi.", "I am poetic", "How about you?"]
results_sentences = poetic.Diagnostics(predictions = [1, 0, 0.5], sentences=sentences)

The sentences argument is optional. If supplied, the sentences corresponding to the predictions are stored as an attribute; otherwise, the attribute is None. All other methods are largely unaffected, apart from the contents of their outputs.

Diagnostic Statistics

As of now, the Diagnostics class supports a five-number summary of the predictions. As part of the workflow, the run_diagnostics() method computes it automatically and stores the results in the diagnostics attribute of the object. As an example:

import poetic

results = poetic.Diagnostics(predictions = [1, 0, 0.5])
results.run_diagnostics()
# Get the diagnostic results
print(results.diagnostics)

The diagnostics attribute is a dictionary with three keys: “Sentence_count”, “Five_num”, and “Predictions”. The corresponding values are the following:

  • “Sentence_count”: An int giving the number of entries.

  • “Five_num”: Five number summary stored with a dictionary.

  • “Predictions”: A list of floats from the predictions attribute.

To obtain the five number summary separately, use the classmethod five_number(), which is essentially a utility function that can be used for any array-like object compatible with numpy:

import poetic

results = poetic.Diagnostics(predictions = [1, 0, 0.5])
poetic.Diagnostics.five_number(results.predictions)

# As a stand-alone method:
poetic.Diagnostics.five_number([1, 0, 0.5])
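
To make the statistics concrete, here is a minimal standard-library sketch of the five values that appear in the sample text report in this document (minimum, mean, median, maximum, standard deviation). It is not poetic's implementation: the function name five_number_sketch and the dictionary keys are assumptions made for illustration. The population standard deviation is used because it reproduces the value printed in the sample report for [2/3, 7/11].

```python
import statistics


def five_number_sketch(values):
    """Hypothetical helper: summarize a sequence of numeric scores."""
    data = [float(v) for v in values]
    return {
        "Minimum": min(data),
        "Mean": statistics.mean(data),
        "Median": statistics.median(data),
        "Maximum": max(data),
        # Population standard deviation (divides by n, not n - 1).
        "Standard Deviation": statistics.pstdev(data),
    }


summary = five_number_sketch([1, 0, 0.5])
for name, value in summary.items():
    print(f"{name}: {value}")
```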

Diagnostic Report

A diagnostic report is a string (or plain text) summary of the object with diagnostic statistics. To obtain the diagnostic report, the run_diagnostics() method must first be called on the object. Otherwise, a type error will be raised because the “diagnostics” attribute will be None.

An example usage of the method:

import poetic

results = poetic.Diagnostics(predictions = [1, 0, 0.5])
results.run_diagnostics()
print(results.generate_report())

The contents of the report will be identical to the text file output as documented below.
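
The ordering requirement can be illustrated with a small stand-in class. This is only a sketch of the failure mode described in this section, not poetic's code: ReportSketch and safe_report are hypothetical names, and the one-line report is a placeholder for the real format.

```python
class ReportSketch:
    """Hypothetical stand-in for an object whose report needs diagnostics."""

    def __init__(self, predictions):
        self.predictions = predictions
        self.diagnostics = None  # stays None until run_diagnostics() is called

    def run_diagnostics(self):
        self.diagnostics = {"Sentence_count": len(self.predictions)}

    def generate_report(self):
        # Subscripting None raises TypeError when diagnostics were never run.
        return "Number of Sentences: " + str(self.diagnostics["Sentence_count"])


def safe_report(obj):
    """Run diagnostics first if they are missing, then build the report."""
    if obj.diagnostics is None:
        obj.run_diagnostics()
    return obj.generate_report()


fresh = ReportSketch([1, 0, 0.5])
try:
    fresh.generate_report()
except TypeError as error:
    print("Report failed before diagnostics:", error)

print(safe_report(fresh))
```

A guard like safe_report is one option; calling run_diagnostics() explicitly, as in the example above, keeps the workflow more visible.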

File Output

The results and diagnostics can be saved to either a .txt or a .csv file. The former writes the diagnostics report as plain text, while the latter saves the actual values separated by commas. The usage is essentially identical to using the -o option on the command line.

Plain Text File

To save results to a text file:

import poetic

results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_file("<PATH>")

The output format is the following:

Poetic
Version: 1.0.2
For latest updates: www.github.com/kevin931/Poetic

Diagnostics Report

Model: Lexical Model
Number of Sentences: 2

~~~Five Number Summary~~~
Minimum: 0.6363636363636364
Mean: 0.6515151515151515
Median: 0.6515151515151515
Maximum: 0.6666666666666666
Standard Deviation: 0.015151515151515138

~~~All Scores~~~
Sentence #1: 0.6666666666666666
Sentence #2: 0.6363636363636364

The to_file() method does not enforce a file extension for text files, with the exception of the .csv ending. In that case, it will automatically call the to_csv() method. The text file output is more for a quick summary than a way to store data, and the format can potentially change with updates. If an object needs to be restored or data will be further processed, use the csv format instead.
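
The extension rule amounts to a simple dispatch on the end of the path. The sketch below illustrates that rule only; output_format is a hypothetical helper, and treating the match as a case-sensitive check for a literal ".csv" ending is an assumption rather than documented behavior.

```python
def output_format(path):
    """Hypothetical helper: name the writer implied by a file path."""
    # Assumption: only a literal ".csv" ending selects csv output;
    # every other path, with or without an extension, is plain text.
    return "csv" if str(path).endswith(".csv") else "text"


for candidate in ["results.csv", "results.txt", "results"]:
    print(candidate, "->", output_format(candidate))
```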

.csv File Format

When a file path ending in .csv is encountered or the to_csv() method of the Diagnostics class is explicitly called, the results will be formatted as three comma-separated columns, with Sentence_num, Sentence, and Score as the headers in the first row. Each sentence and its prediction are on a new row, which follows the Tidy Data format for optimal compatibility. The Sentence_num column can be treated as the index.
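
As an illustration of this layout, the following sketch builds the same three columns with Python's csv module. It is not the to_csv() implementation; rows_to_csv is a hypothetical function, and numbering the rows from 1 follows the sample output shown in this section.

```python
import csv
import io


def rows_to_csv(predictions, sentences):
    """Hypothetical helper: one row per sentence, header in the first row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Sentence_num", "Sentence", "Score"])
    pairs = zip(sentences, predictions)
    for number, (sentence, score) in enumerate(pairs, start=1):
        writer.writerow([number, sentence, score])
    return buffer.getvalue()


text = rows_to_csv([2 / 3, 7 / 11], ["Hi.", "This is poetic."])
print(text)
```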

To save to a csv file as an example:

import poetic

results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_csv("<PATH>")

Or, let to_file() handle it automatically:

import poetic

results = poetic.Diagnostics(predictions = [2/3, 7/11])
results.run_diagnostics()
results.to_file("<PATH>.csv")

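The raw csv file looks like the following (this sample includes the sentences "Hi." and "This is poetic.", which the examples above do not pass in):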

Sentence_num,Sentence,Score
1,Hi.,0.6666666666666666
2,This is poetic.,0.6363636363636364

If formatted as a table (for example, when opened in Excel or a similar program), this will be the result:

Sentence_num   Sentence          Score
1              Hi.               0.6666666666666666
2              This is poetic.   0.6363636363636364
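
Because the csv output is intended for further processing, it may help to see how such a file can be read back. The sketch below parses the sample content shown above from an in-memory string using the standard csv module; load_scores is a hypothetical helper and not part of poetic.

```python
import csv
import io

# Sample content copied from the raw csv output shown above.
raw = (
    "Sentence_num,Sentence,Score\n"
    "1,Hi.,0.6666666666666666\n"
    "2,This is poetic.,0.6363636363636364\n"
)


def load_scores(handle):
    """Hypothetical helper: return (number, sentence, score) tuples."""
    reader = csv.DictReader(handle)
    return [
        (int(row["Sentence_num"]), row["Sentence"], float(row["Score"]))
        for row in reader
    ]


records = load_scores(io.StringIO(raw))
for number, sentence, score in records:
    print(number, sentence, score)
```

For a file on disk, the same function can be given a handle opened with open(path, newline="").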

Custom Built-in (Magic) Methods

String Representation

The str() method returns a short summary of the object. After the description, it truncates the predictions to their first 14 characters:

'Diagnostics object for the following predictions: [0.66666666666...'

The repr() method returns a dictionary cast into a string with all the predictions, sentences, and diagnostics attributes. It does not truncate any results, which makes it more appropriate for a full representation of the object. It has the following format:

"{'Predictions': [0.6666666666666666, 0.6363636363636364], 'Sentences': None, 'Diagnostics': None}"

len()

The len() method returns the length of the predictions attribute of the object, which is the number of entries in the predictions list. Since the lengths of predictions and sentences are intended to match, the returned length logically represents the length of the object.
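
The three built-in methods described in this section can be sketched together with a small stand-in class. DiagnosticsSketch is a hypothetical name, and the method bodies are assumptions chosen to reproduce the example strings shown above; they are not taken from poetic's source.

```python
class DiagnosticsSketch:
    """Hypothetical stand-in showing the str(), repr(), and len() behavior."""

    def __init__(self, predictions, sentences=None):
        self.predictions = predictions
        self.sentences = sentences
        self.diagnostics = None

    def __str__(self):
        # Keep only the first 14 characters of the predictions list.
        shown = str(self.predictions)[:14]
        return "Diagnostics object for the following predictions: " + shown + "..."

    def __repr__(self):
        # Full, untruncated dictionary cast to a string.
        return str({
            "Predictions": self.predictions,
            "Sentences": self.sentences,
            "Diagnostics": self.diagnostics,
        })

    def __len__(self):
        return len(self.predictions)


demo = DiagnosticsSketch([2 / 3, 7 / 11])
print(str(demo))
print(repr(demo))
print(len(demo))
```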