
Chat for one

A white paper about experiences with ChatGPT

Introduction

Over the last few months, I have spent some time chatting with ChatGPT. The first chats were rather tentative and probing, aimed at getting an idea of the AI's ability to understand and generate text. As the AI successfully performed more and more tasks, I became more daring and posed tasks where I didn't really expect a meaningful response. In quite a few cases, I was surprised. In some cases, I still don't have a solid theory of how ChatGPT did it.

Naturally, many of the tests focused on the question of how AI could help technical writers, translators, project managers, service technicians and other stakeholders in the field of Technical Documentation. But likewise, many other tests touched on other areas, like music, games, cooking or retrieval of information of any kind. In software development, it proved extremely helpful, because it can not only describe how to proceed, but also generate code that tries to solve the problem - often, it does so quite intelligently. In music, it helped with songwriting by finding words that rhymed with a given ending, or by finding the chord for a given set of notes - so far, I expected it could do such things. However, I also got it to write new verses based on existing ones, tuning in to the mood of the song, and it was even aware of rhythm in relation to the lyrics. Amazing!

Finally, I asked it to lead a simple pen and paper role playing session - at first it seemed stubborn and only explained what a pen & paper RPG is, but then it picked up the task. It had some trouble remembering how many gold pieces my character owned and exactly what equipment he had in a given situation, but the adventure itself has been quite consistent so far, and the interactions with other characters are quite realistic: they are consistent, and each character has a recognizable personality.

All of the dialogs are based on the standard chat interface of ChatGPT that's available to everyone. For ironing out the glitches in the performance of the AI models, the API-based integration offers more possibilities: knowledge can be injected in a structured form instead of being spelled out verbally in a chat. While problems remain to be solved, including the question of what happens to data when you type it into the chat or upload it through the API, the powers of ChatGPT's NLP model are fascinating and impressive in many different ways.

I encourage you to run your own tests, and remember, this is only the beginning…

Structuring confusing text

Hello, I have a text here, it's a guide, but it's poorly structured. Can you please improve it?

Yes, I can definitely help you with that. Please share the text with me and I'll see what I can do.

Here it is:

"Remove the cover plate after loosening all 4 screws on the sides. Then pull the locking bar forward and fold the insulation layer back. Before starting this work, unplug the device. The device operates on 230 volts and has a power consumption of 1000 watts. To replace the fuse, you must first remove the protective cap. Once the protective cap has been removed, you can replace the fuse. Pull out the fuse, insert the new fuse, put the protective cap back on, and then fold the insulation layer back down. Finally, tighten the screws on the cover plate. Place the cover plate on before doing so."

Here's what you should do:

  • Segment it into simple and clear steps.
  • Order the steps correctly.
  • Extract technical data to a table.

Here's an improved version of the guide:

Step-by-Step Guide:

Step 1: Unplug the device
Before beginning any work, unplug the device from the power outlet to avoid electrical hazards.

Step 2: Remove the cover plate
Loosen all 4 screws on the sides of the device.
Remove the cover plate.

Step 3: Pull the locking bar and open the insulation layer
Pull the locking bar forward.
Fold the insulation layer back.

Step 4: Replace the fuse
Remove the protective cap.
Take out the old fuse.
Insert the new fuse.
Put the protective cap back on.

Step 5: Close the insulation layer
Fold the insulation layer back down.

Step 6: Secure the cover plate
Place the cover plate back on.
Tighten the screws on the sides of the device.

Technical Data:

Voltage      Power Consumption
230 volts    1000 W

So, ChatGPT seems to be able to improve the quality of instructions by ordering them, applying some rules (like writing in clear and simple language), removing redundancy and even adding an explanation of why it makes sense to unplug the device before opening it.

So let's see what else it can do.

DITA

Knowing that ChatGPT was able to extract information from an unstructured text and to transform it into a step-by-step procedure with warnings, I tested whether it was able to express this in DITA. I knew ChatGPT was able to create program code, or fix code that does not work correctly. DITA is nothing entirely different. So I tried.

I started out with simple cooking recipes. First, I retrieved instructions for pancakes. When I was satisfied with the recipe itself, I asked ChatGPT to create a DITA topic. It created a syntactically and grammatically correct DITA topic, but it was a topic, not a task. So I asked it to create a task. At first, it was invalid, but I was able to teach it how to perform the task correctly and ended up with a valid DITA task.

In the next iteration, I asked it to place the ingredients as prereq and the warnings as hazardstatement. At first, the hazardstatement was wrongly structured, but I was also able to teach it how to write correct warning messages.

To keep ChatGPT from just repeating what I had taught it, I asked it to apply what it had learned to a different task: resetting a Google Pixel 7 phone to factory settings. Here's the result:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE task PUBLIC "-//OASIS//DTD DITA Task//EN" "task.dtd">
<task id="factory-reset-pixel-7">
  <title>Performing a Factory Reset on Google Pixel 7</title>
  <taskbody>
    <prereq>Before starting, make sure your Google Pixel 7 is fully charged and backed up. Performing a factory reset will erase all data on your phone.</prereq>
    <steps>
      <step>
        <hazardstatement type="warning">
          <messagepanel>
            <typeofhazard>Data loss.</typeofhazard>
            <consequence>Performing a factory reset will erase all data on your phone.</consequence>
            <howtoavoid>Make sure you have backed up all important data before proceeding.</howtoavoid>
          </messagepanel>
        </hazardstatement>
        <cmd>Go to the Settings app on your phone.</cmd>
      </step>
      <step>
        <cmd>Scroll down and tap on System.</cmd>
      </step>
      <step>
        <cmd>Tap on Advanced.</cmd>
      </step>
      <step>
        <cmd>Tap on Reset options.</cmd>
      </step>
      <step>
        <cmd>Tap on Erase all data (factory reset).</cmd>
      </step>
      <step>
        <cmd>Follow the on-screen instructions to complete the reset process.</cmd>
      </step>
    </steps>
  </taskbody>
</task>

A similar task of explaining how to remove old paint using NaOH solution can be found here.
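The validity problems described above (a topic instead of a task, a wrongly structured hazardstatement) can be caught mechanically before a human reviews the output. The following is a minimal sketch using only Python's standard library. It is not DTD validation: the function name `check_task` and the particular checks are assumptions chosen to mirror the issues mentioned in the text, and the sample is an abbreviated copy of the task above without the DOCTYPE.

```python
import xml.etree.ElementTree as ET

# Abbreviated task in the same shape as the example above (DOCTYPE omitted).
TASK_XML = """<task id="factory-reset-pixel-7">
  <title>Performing a Factory Reset on Google Pixel 7</title>
  <taskbody>
    <steps>
      <step>
        <hazardstatement type="warning">
          <messagepanel>
            <typeofhazard>Data loss.</typeofhazard>
            <consequence>Performing a factory reset will erase all data on your phone.</consequence>
            <howtoavoid>Make sure you have backed up all important data before proceeding.</howtoavoid>
          </messagepanel>
        </hazardstatement>
        <cmd>Go to the Settings app on your phone.</cmd>
      </step>
      <step>
        <cmd>Scroll down and tap on System.</cmd>
      </step>
    </steps>
  </taskbody>
</task>"""


def check_task(xml_text):
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    if root.tag != "task":
        problems.append(f"root element is <{root.tag}>, expected <task>")
    steps = root.findall("./taskbody/steps/step")
    if not steps:
        problems.append("no <step> elements found under <taskbody>/<steps>")
    for number, step in enumerate(steps, start=1):
        # Every step needs exactly one command.
        if len(step.findall("cmd")) != 1:
            problems.append(f"step {number}: expected exactly one <cmd>")
        # A hazard statement needs a message panel naming the hazard and how to avoid it.
        for hazard in step.findall("hazardstatement"):
            panel = hazard.find("messagepanel")
            if panel is None:
                problems.append(f"step {number}: <hazardstatement> without <messagepanel>")
                continue
            for required in ("typeofhazard", "howtoavoid"):
                if panel.find(required) is None:
                    problems.append(f"step {number}: <messagepanel> lacks <{required}>")
    return problems


print(check_task(TASK_XML))
```

For the sample, the list of problems is empty. A full check would still require validating against the DITA DTD or schema with a validating parser; this sketch only covers the few structural points discussed here.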

Terminology correction and extraction

Here's the text to be corrected:
The sintering device serves to vitrify material that is transported via the conveyor belt. The sintering plant is directly connected to the conveying device so that material from the previous processing station is vitrified without delay. When loading the material into the plant, it should be noted that the strands are not inserted twisted into the sintering chamber. Twisted strands can lead to the deactivation of the sintering heating system and render the material unusable.

Here are the rules:
Term: sintering device
Forbidden: sintering plant, sintering chamber, sintering heating system

Term: conveyor belt
Forbidden: conveying device

The sintering device is used to vitrify material that is transported via the conveyor belt. The device is directly connected to the conveyor belt, allowing material from the previous processing station to be vitrified without delay. When loading the material into the device, it should be noted that the strands should not be introduced twisted into the sintering chamber. Twisted strands can cause the deactivation of the sintering heating device and render the material unusable.

No, you should write "sintering device" instead of "sintering heating device", since the former is the term to be used according to the rules.

You are correct. I apologize for my mistake. Here's the corrected text:

The sintering device is used to vitrify material that is transported via the conveyor belt. The device is directly connected to the conveyor belt, allowing material from the previous processing station to be vitrified without delay. When loading the material into the device, it should be noted that the strands should not be introduced twisted into the sintering chamber. Twisted strands can cause the deactivation of the sintering device and render the material unusable.

It performed quite well overall, but not perfectly. This is characteristic of using the chat: it tends to forget things that were mentioned a while ago, and they need to be brought back into consideration by repeating them. When using the API, rule sets can be uploaded, which makes the process more reliable. I tested the same text in German before I tried English, and it performed significantly better. I assume this is due to the way nouns can be compounded in German. Apparently, this grammatical property of the German language makes it easier for ChatGPT to recognize terms (and then replace them).
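Because the chat output cannot be fully trusted to follow the rules, a deterministic check after each AI pass is a cheap safeguard. The sketch below is a minimal illustration, not part of the original tests: the function name `find_forbidden_terms` and the rule format (preferred term mapped to its forbidden variants) are assumptions, while the terms themselves are taken from the example above.

```python
import re

# Rule set from the example above: preferred term -> forbidden variants.
RULES = {
    "sintering device": ["sintering plant", "sintering chamber", "sintering heating system"],
    "conveyor belt": ["conveying device"],
}


def find_forbidden_terms(text, rules):
    """Return (forbidden, preferred, position) tuples for every violation in text."""
    hits = []
    for preferred, forbidden_terms in rules.items():
        for forbidden in forbidden_terms:
            # Whole-word, case-insensitive match of the forbidden variant.
            pattern = re.compile(r"\b" + re.escape(forbidden) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                hits.append((forbidden, preferred, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


# Excerpt from ChatGPT's second, "corrected" answer.
corrected = (
    "When loading the material into the device, it should be noted that the "
    "strands should not be introduced twisted into the sintering chamber. "
    "Twisted strands can cause the deactivation of the sintering device."
)

for forbidden, preferred, position in find_forbidden_terms(corrected, RULES):
    print(f"offset {position}: '{forbidden}' should be '{preferred}'")
```

Run on the excerpt, the check reports the remaining "sintering chamber". Exact matching has a clear limit, though: a newly invented variant such as "sintering heating device" is not in the list and would pass unnoticed.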

Tests with the extraction of technical terms (and then listing them in alphabetical order, grouped by nouns, verbs and adjectives) were successful in the sense that ChatGPT extracted many words it considered technical terms and sorted them correctly into the categories. However, it seemed to have a rather diffuse "understanding" of what a technical term is. This is another problem that can probably be solved through the API, by giving it many examples of text together with the technical terms they contain.

Other markup

Encouraged by these DITA results, I asked it to compile an HTML document with a table of CCMS products I gave it as input. I asked for a column that shows whether the software product fully supports DITA. It did not retrieve the information correctly in all cases; it was too optimistic. But that's only natural: it uses the information sources it has, and among them are surely websites where software vendors claim their products provide full DITA support while they don't.

Again, if the system has access to solid data through the API, it should return reliable results. But I find the HTML and CSS very adequate. And if you want something changed, just ask for it.

<!DOCTYPE html>
<html>
<head>
        <meta charset="UTF-8">
        <title>CCMS DITA Support</title>
        <style>
                table {
                        border-collapse: collapse;
                        width: 100%;
                }
                th, td {
                        padding: 10px;
                        text-align: left;
                        border-bottom: 1px solid #ddd;
                }
                th {
                        background-color: #f2f2f2;
                        font-weight: bold;
                        color: #555;
                        text-transform: uppercase;
                        letter-spacing: 0.1em;
                }
                td:first-child {
                        font-weight: bold;
                        color: #333;
                }
                .yes {
                        background-color: #b3e6b3;
                        color: #008000;
                        font-weight: bold;
                }
                .limited {
                        background-color: #ffe6b3;
                        color: #ff9900;
                        font-weight: bold;
                }
                .no {
                        background-color: #ffc2b3;
                        color: #ff0000;
                        font-weight: bold;
                }
        </style>
</head>
<body>
        <h1>CCMS DITA Support</h1>
        <table>
                <tr>
                        <th>Product</th>
                        <th>DITA Support</th>
                </tr>
                <tr>
                        <td>One product</td>
                        <td class="yes">Yes</td>
                </tr>
                <tr>
                        <td>Another product</td>
                        <td class="yes">Yes</td>
                </tr>
                <tr>
                        <td>Yet another</td>
                        <td class="no">No</td>
                </tr>
                <tr>
                        <td>Cinnamon CMS</td>
                        <td class="yes">Yes</td>
                </tr>
        </table>
</body>
</html>

ChatGPT can apparently generate many types of files that can be expressed as text. Since SVG is basically XML, I tested whether ChatGPT can draw. I asked for a diagram with a few boxes containing the acronyms "ERP", "CCMS" and "PIM", linked by lines. Here's the result, after some iterations to improve the layout (centering text, adjusting box and text sizes, distances etc.). Again, this is something that can be done better with the API by providing static styling rules.

[Figure: SVG diagram created by AI]

I admit, this looks somewhat impressive, but not like a killer app. Drawing this image with Inkscape would probably have taken less time than explaining it.

A more realistic use case could be to upload an SVG from a supplier through the API, apply a set of style guide rules (for example, unify line widths, exchange logos and apply CI colors), then extract all the strings. These would then be translated the traditional way, using a Translation Memory, and finally re-injected into the SVG. Another approach for translation could be to let the AI translate while applying a defined terminology and, as a quality check, translate back to the source language. There shouldn't be any differences, as there wouldn't be in a TM-based approach. Letting the AI repeat this until the back-translation is identical, learning by trial and error, would be the approach closest to how AIs work.
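The extraction and re-injection steps do not need AI at all. The following is a minimal sketch with Python's standard library, under stated assumptions: the SVG sample is a hypothetical supplier graphic, the function names are invented for illustration, the translations are example data standing in for the Translation Memory step, and only simple `<text>` and `<tspan>` content is handled (no mixed content or tail text).

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # keep the default namespace on output

# Hypothetical supplier graphic, reduced to the parts that matter here.
SVG_SOURCE = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="60">
  <rect x="10" y="10" width="80" height="40" fill="none" stroke="black"/>
  <text x="50" y="35" text-anchor="middle">Conveyor belt</text>
  <text x="150" y="35" text-anchor="middle">Sintering device</text>
</svg>"""

TEXT_TAGS = (f"{{{SVG_NS}}}text", f"{{{SVG_NS}}}tspan")


def extract_strings(root):
    """Collect the non-empty text of every <text> and <tspan> element, in document order."""
    strings = []
    for element in root.iter():
        if element.tag in TEXT_TAGS and element.text and element.text.strip():
            strings.append(element.text.strip())
    return strings


def inject_strings(root, translations):
    """Replace each extracted string with its translation, leaving unknown strings alone."""
    for element in root.iter():
        if element.tag in TEXT_TAGS and element.text:
            source = element.text.strip()
            if source in translations:
                element.text = translations[source]


root = ET.fromstring(SVG_SOURCE)
print(extract_strings(root))

# Stand-in for the translation step; these German strings are example data.
translations = {"Conveyor belt": "Transportband", "Sintering device": "Sintervorrichtung"}
inject_strings(root, translations)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the geometry untouched and swapping only the strings means the translated graphic keeps the supplier's layout; whether the translated text still fits its box is a separate check.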

Using and developing software

ChatGPT is a very helpful companion in software development. I was recently porting a web application to .NET Core MVC. I compared solving some of the problems "by hand", using the documentation and forum entries a search engine retrieved, with asking ChatGPT for help. In many cases, the latter is much more efficient, particularly when the field you are trying to get acquainted with is so unfamiliar that it's difficult even to find the right search terms.

With AI support, it's possible to just explain the problem.

Still, writing a program without any programming skills at all won't work. Solutions from ChatGPT are often very helpful and illustrate how to do it, but they are often not quite perfect. Without any knowledge, a ChatGPT user wouldn't get the code to run, but an experienced programmer without knowledge of the target system could resolve the remaining issues on their own, or by asking the AI again.

But it's also very helpful for users. I wrote a LISP script for the DTP program Interleaf (a.k.a. Quicksilver) some 20 years ago. Recently, I needed to test it on some data, but I couldn't remember how to run the script. I asked ChatGPT and it replied:

To apply the Lisp plugin to a document, you first need to load it into Quicksilver. Here's how to do it:

  1. Click on "Tools" in the Quicksilver menu bar.
  2. Select "Load Lisp" from the dropdown menu.
  3. Choose the .lsp file that you have saved on your desktop.
  4. The plugin is now loaded into Quicksilver.

Once the plugin is loaded, you can apply it to a document by following these steps:

  1. Open the document that you want to apply the plugin to.
  2. Click on "Tools" in the Quicksilver menu bar.
  3. Select the plugin from the list of loaded Lisp plugins.
  4. The plugin is now applied to the document.

Depending on what the plugin does, you may need to take additional steps to achieve the desired result. If you need further assistance, you can also reach out to the Quicksilver community or support.

The outdated version I was using behaved a bit differently, but the task description was close enough that I got it to work. Getting help on nearly anything with extremely little prior knowledge is a great relief and saves a lot of time.

Music

Having gathered some insight into what NLP AIs can do for technical writers, software users and software developers, I widened the circle a bit. In my spare time, I write and produce songs with a Digital Audio Workstation. For a song I was working on at the time, I needed some help with the lyrics. The chat was rather long, so I will summarize the key steps.

First I needed a word that rhymed with shines. ChatGPT suggested:

  • Lines
  • Spines
  • Divines
  • Combines
  • Designs
  • Declines
  • Refines
  • Undermines
  • Outshines
  • Realigns

I found a good line that ends with lines.

Next, I wanted some help for the first two lines of the second verse of the lyrics. The music for the verse is already more or less defined, so I had a certain rhythm in mind. The first two lines of the first verse were:

I anticipated a future that’s bright and tender
And we negotiated, but it feels like a false surrender

So I asked for suggestions. The result was:

The world is spinning faster, I'm losing my grip
I'm reaching for the future, but I'm slipping
The walls are closing in, I'm suffocating
I'm praying for a miracle, but it's not forthcoming

Well - I told it that this didn't quite capture the spirit of the song. These lines sound desperate, but the song is not meant to be desperate at all. Perhaps a little serene, at best…

And what's even worse, the lines didn't match the rhythm at all. So I explained:

Also, the verse doesn't quite fit the intended rhythm of the verse…
The first two lines of my verse can be considered to be split behind "anticipated" and "negotiated", like:
I anticipated
a future that’s bright and tender
And we negotiated,
but it feels like a false surrender

Each of these lines is two bars long (4/4, BTW)
The 1st line "I-an-ti-ci-pa-ted" is sung on the 3-4-1-2-3-4 of the first two bars. 3rd line equivalent.

Can you please try again?

Sure, based on the additional information you provided, here are some revised lyrics that might work better:

Verse
I'm walking through the shadows, in search of something true
The light is always shining, just out of my view
I'm wandering in circles, searching for a sign
The road is long and winding, but I know I'll be just fine

Each line of this verse is two bars long, with a similar rhythm to the first two lines of your existing verse. The first line is sung on the 1-2-3-4 of the first two bars, the second line is sung on the 3-4-1-2-3-4 of the first two bars, and so on.

I hope this helps! Let me know if you need any further assistance.

So, isn't that impressive? I expected it to be able to remove the unwanted tristesse from the lyrics, but the rhythmic instructions were a test I expected it to fail. It might be difficult to judge if you don't know the music that goes with it, but rest assured, the rhythm works.

Games

Another hobby of mine is playing pen & paper role-playing games (RPGs). This is perhaps even more off-topic than music, but just as impressive.

For those of you who haven't heard of this yet:
it's a game category you play with a group of perhaps 3-5 people. One is the so-called Game Master or GM. The GM designs an adventure that the other players play. Each game has its own set of rules and is set in a particular world, for example, a medieval fantasy world. The GM describes the situation to the players, and the players tell the GM what they want to do.

Or, in the words of ChatGPT:

A fantasy role-playing game, also known as a pen-and-paper RPG, is a game where players create characters in a fictional world and embark on adventures together. The game is typically played with a group of people and a game master, who acts as the storyteller and referee. Players use their imaginations to describe their character's actions and interactions with the world around them. The game typically involves rolling dice to determine the outcome of actions, such as combat or skill checks. The game world can be richly detailed, with unique creatures, places, and characters that players can interact with and explore.

So, I told ChatGPT that I wanted to play AD&D (a common RPG), and that I wanted it to be the GM. It took a bit of time to convince the AI to actually play with me rather than explain the concept of an RPG. But eventually, it started playing.

I will spare you the rather technical beginning of the chat - we took some time to define the character I was playing: Aldric, a human priest of Odin. There was also a group of other characters I was playing with, all of them played by the AI:

  • A human fighter named Durin, wielding a sword and shield, and wearing plate mail armor.
  • A half-elf rogue named Lyra, skilled in lockpicking and sneaking, and carrying a shortbow and a quiver of arrows.
  • A dwarf wizard named Balfur, specializing in earth and fire spells, and carrying a staff and a spellbook.
  • A human paladin named Gareth, devoted to his deity and sworn to protect the innocent, wielding a longsword and wearing chain mail armor.

The opening words of ChatGPT were:

You find yourselves in the town of Greenhaven, a small settlement located on the edge of a vast forest. It's early morning, and the air is crisp and cool. You've been traveling for days, and you're weary from the journey. As you walk down the main road, you notice a crowd gathered near the town square. You see an old man wearing a tattered robe and a worried expression on his face. He appears to be addressing the crowd, but his words are drowned out by the murmur of the people. What do you do?

After talking to the man for a while, we were attacked by a group of thugs:

As the group of armed men approaches, you prepare yourself for combat and grip your mace tightly. The closest of the attackers charges towards you, swinging his sword wildly. You dodge his attack with ease and retaliate with a swift strike of your mace. Your weapon connects with the attacker's shoulder, causing him to stagger and cry out in pain.

The other attackers hesitate for a moment as they see their comrade fall, but then they surge forward to attack. Durin charges towards them, his sword flashing in the sunlight, while Lyra takes cover behind a nearby cart and begins firing arrows at the attackers. Balfur unleashes a blast of fiery energy towards the attackers, causing them to back off momentarily.

Meanwhile, Gareth rushes to your side, his longsword at the ready. He stands beside you and says, "Fear not, Aldric, for the light of Heironeous shines upon us. Together, we shall overcome these villains!"

What do you do next, Aldric?

After the fight, we handed the thugs over to the city guard, and Aldric suggested going to a pub in search of rumours about potential adventures.

Durin nods in agreement and says, "Aye, a good pint and some talk of treasure is always welcome. I know of a nearby tavern that serves a fine ale. Let's head there and see what we can learn."

Lyra adds, "And perhaps we can pick up some work along the way. I wouldn't mind a few extra coins in my pocket."

Balfur agrees, "Yes, and I can use some of that coin to buy more spell components. Let's go."

Gareth nods in approval and says, "It is wise to gather information before embarking on any quest. Let us proceed to the tavern."

You all make your way to the tavern and find a table to sit at. The tavern is filled with patrons, some of whom are boisterously drinking and gambling, while others are quietly nursing their drinks. The bartender greets you with a smile and asks what he can get for you.

We had a few leads in the pub, and we decided to follow the route on an old map in the library, leading to ancient temple ruins. After finding the location on the map and buying some equipment, like 100' of rope, a few sacks and a lantern, we headed for the temple. Once we arrived, we found a closed stone door that seemed too strong to overcome. But we searched for concealed passages:

As you examine the concealed door, you notice that it appears to be locked. It is made of a heavy, dark wood and is fitted with iron bands and hinges. There is a small keyhole, but no visible handle or latch.

Balfur suggests using his spell "knock" to try to open the door, but Gareth expresses concern that the noise from the spell could alert any potential enemies or monsters inside. Lyra volunteers to try picking the lock instead.

Lyra produces a set of lockpicks and begins working on the lock. After a few tense moments, you hear a satisfying click as she successfully opens the lock. The door creaks open, revealing a dark, dusty passage leading down into the depths of the structure.

… and so on. After some initial training, it consistently plays the other members of my group, and they even have a strong degree of individual character. It doesn't keep track very well of how many gold pieces and which equipment items my character owns, but this could be overcome by storing such values externally and providing them over the API.
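One way to picture this is to keep the character sheet as the authoritative record outside the chat and hand the AI a fresh summary with every request. The sketch below is a minimal illustration under stated assumptions: the class `CharacterSheet`, its methods, the starting gold and all prices are invented for the example, and no actual API call is shown.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterSheet:
    """Authoritative record of one character, kept outside the chat."""
    name: str
    gold: int = 0
    inventory: dict = field(default_factory=dict)  # item name -> quantity

    def buy(self, item, price, quantity=1):
        """Deduct the price and add the item; refuse if the character cannot pay."""
        total = price * quantity
        if total > self.gold:
            raise ValueError(f"{self.name} cannot afford {quantity} x {item}")
        self.gold -= total
        self.inventory[item] = self.inventory.get(item, 0) + quantity

    def as_context(self):
        """Render the sheet as a short text block to prepend to each request."""
        items = ", ".join(f"{count} x {item}" for item, count in sorted(self.inventory.items()))
        return f"Character: {self.name}. Gold pieces: {self.gold}. Equipment: {items or 'none'}."


# Starting gold and prices are invented for illustration.
aldric = CharacterSheet(name="Aldric", gold=50, inventory={"mace": 1})
aldric.buy("rope (100')", price=2)
aldric.buy("sack", price=1, quantity=3)
aldric.buy("lantern", price=10)
print(aldric.as_context())
```

With the bookkeeping done in code, the AI only has to narrate; it no longer needs to remember numbers across a long conversation.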

On the other hand, it knows the rules of the game quite well, including all the abilities of a priest, a paladin or a wizard. The GM is very friendly and actions are often successful, but I'm sure I can tell the AI to increase the level of difficulty.

Frankly, I remember worse games with human GMs.

Summary

This was only a representative part of what I did, what I'm doing and what I will surely continue to do. NLP AI has reached a level where it is a universal tool that can help with a broad variety of tasks, from everyday questions ("How should I serve orange/buttermilk ice cream?" - "On orange slices and with a piece of shortbread.") to complex data creation and conversion. From summarizing text to being a gaming companion.

We are witnessing a breakthrough. A disruptive change that will transform everyday life, including and particularly work, and this time not only for workers in factories, but also for technical writers, software developers, agents in insurance companies, call center employees and so on. And soon. And radically.

We need to learn a new way of communicating with machines. We are used to sending queries to databases and receiving precise responses that are as correct or incorrect as the data in the database: no creativity!

Now we can explain a problem and receive a reply that creates the impression that the machine has understood our intention and what's missing to reach our aim, and that it can also tune into the knowledge and experience we already have with the topic. When I ask "Can you summarize how a fusion reactor works?" I will receive a response at a different technical level than when I ask a question about a specific nuclear reaction.

Just like with human beings, not everything that the AI says is correct, neither "facts" nor code examples. The AI feeds on vast data from different sources, and not all of the input is trustworthy. "Facts" about software products can be based on independent tests or user experience, but also on vendors' websites with a rather optimistic view of the product. But even if the data were perfect, the inference is not: it is only as good as the feedback-based training (another complex field).

All the chats I had with ChatGPT only contained data not touching intellectual property or personal data. So, for solving general, non-customer-specific problems, it's great. But what if I want to upload a customer's terminology, or worse, internal engineering data, over the API and it ends up in the hands of a competitor? Can the AI model be equipped with barriers to intercept such data flow? Or would we have to run an AI instance per tenant? And how will legal requirements evolve to cover these aspects?

To me, this moment of technological progress seems to unite the idea "Wow, we've really come a long way!" with the contrary idea "We're at the very beginning."

I am very curious (with a certain share of concern) where we are in six months. In two years. In a decade…

By Boris Horner, April 2023
