---
title: "Translating Sumerian Texts"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Translating Sumerian Texts}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  dev = "ragg_png"
)
```

## 1. Introduction

The first vignette ("Getting Started with sumer") introduced the basic concepts: cuneiform sign representations, dictionary lookup, the type system, text analysis, and the interactive `translate()` function. This vignette describes the complete workflow for translating an entire document and building a custom dictionary from the results.

The workflow consists of the following steps:

1. Set up a project folder for the text
2. Translate line by line using `translate_line()`
3. Build a custom dictionary from the translations
4. Merge it with an existing dictionary for future use

Each translated line improves the dictionary, and the improved dictionary makes the next translation easier. This creates a virtuous cycle.

```{r}
library(sumer)
```


## 2. Setting Up a Project

A translation project consists of a folder with the following structure:

```
project/
  complete_cuneiform_text.txt    # cuneiform text
  lines/                         # translated lines
    Line_1.txt
    Line_2.txt
    ...
```

The **text file** contains the cuneiform text. If you have a transliterated text, you can convert it with `as.cuneiform()`. Each line can optionally begin with a line number (e.g. `8)\t...`). Lines starting with `#` are treated as comments and ignored during analysis.

The **`lines/`** subfolder stores one file per translated line. The file for line *n* is called `Line_n.txt`. These files are created automatically by `translate_line()` when you click "Done".
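
The file conventions above can be illustrated with a few lines of base R. This is only a sketch: the sample lines are invented placeholders, and the pattern "digits, closing parenthesis, tab" for the line label is an assumption based on the `8)\t...` example, not the package's actual parser.

```{r}
# Hypothetical sample lines; a real project file contains cuneiform signs
demo_lines <- c(
  "# a comment line, ignored during analysis",
  "1)\tfirst line of signs",
  "2)\tsecond line of signs",
  "an unnumbered line"
)

# Drop comment lines (those starting with "#")
demo_content <- demo_lines[!startsWith(demo_lines, "#")]

# Separate the optional "n)<tab>" label from the rest of the line
demo_has_number <- grepl("^[0-9]+\\)\t", demo_content)
demo_number <- ifelse(demo_has_number,
                      sub("^([0-9]+)\\).*$", "\\1", demo_content),
                      NA_character_)
demo_body <- sub("^[0-9]+\\)\t", "", demo_content)

# Name of the corresponding file in lines/ (only for numbered lines)
demo_file <- ifelse(demo_has_number,
                    paste0("Line_", demo_number, ".txt"),
                    NA_character_)

data.frame(number = demo_number, body = demo_body, file = demo_file)
```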

The package includes an example project for the Sumerian myth "Enki and the World Order". Since the project folder inside the installed package is read-only, we first copy it to a temporary directory:

```{r}
pkg_path <- system.file("extdata", "project", package = "sumer")
file.copy(from = pkg_path, to = tempdir(), recursive = TRUE)
project_dir <- file.path(tempdir(), "project")

cat(head(list.files(project_dir, recursive = TRUE)), sep = "\n")
```

Now we can set up the translation context and load the text:

```{r}
ctx <- translation_context(
  project_dir   = project_dir,
  text          = "enki_and_the_world_order.txt",
  dic           = file.path(pkg_path, "sumer-dictionary.txt"),
  mapping       = NULL,
  sentence_prob = 0.25
)

text <- readLines(ctx$text, encoding = "UTF-8")
```

The parameters of `translation_context()` are:

- **`project_dir`**: The project directory, which contains the `lines/` subfolder.
- **`text`**: The path of the file with the full cuneiform text, either as a full file path or as a file name relative to `project_dir`.
- **`dic`**: One or more dictionary files. The first has priority for automatic suggestions.
- **`mapping`**: A custom sign mapping table (data frame or file path). If `NULL`, the package's built-in mapping is used. A custom mapping is needed when working with texts that contain signs not covered by the default table.
- **`sentence_prob`**: Corrects for verb underrepresentation in the dictionary. A value of 0.25 means that an estimated 25% of the dictionary entries come from complete sentences; verb probabilities are upweighted accordingly.


## 3. Interactive Translation with `translate_line()`

To translate a specific line, call:

```{r, eval = FALSE}
translate_line(8, ctx)
```

This opens the interactive translation tool. If a file `Line_8.txt` already exists in the `lines/` folder, the previous translation is loaded so that you can continue where you left off. When you click "Done", the result is saved back to the file.

In addition, `translate_line()` builds a project-specific dictionary on the fly from all previously translated lines in the `lines/` folder and makes it available in the Shiny gadget. The current line is excluded to avoid confirmation bias. This project dictionary appears alongside the primary dictionary in the lookup panel.

The gadget displays several sections on a scrollable page. The following sections describe each of them.


### 3.1 N-gram patterns

The first section displays frequent sign combinations (n-grams) computed from the entire text that appear in the current line. Recurring patterns point to fixed terms or compound words. Combinations that also appear in neighbouring lines are marked with a checkmark in the "Theme" column -- these are thematic connections across lines.

Outside the gadget, the same analysis is available through the functions `ngram_frequencies()` and `mark_ngrams()` (see Vignette 1, Section 5.1).
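
To make the idea of n-gram counting concrete, here is a minimal base R sketch. It is not the implementation behind `ngram_frequencies()`, which may count and filter differently. The first two sign-name sequences are the two sentences of line 8 from this vignette; the third sequence and the helper `demo_ngrams()` are hypothetical and serve only to create a repeated pattern.

```{r}
demo_seqs <- c(
  "AN.EN.KI.EN.GAN.IG.LA",   # line 8, first sentence
  "AN.A.NUN.NA.KID.NE",      # line 8, second sentence
  "AN.EN.KI.LUGAL"           # hypothetical extra sequence
)

# Return all n-grams of length n in one dot-separated sign sequence
demo_ngrams <- function(seq_string, n = 2) {
  signs <- strsplit(seq_string, ".", fixed = TRUE)[[1]]
  if (length(signs) < n) return(character(0))
  starts <- seq_len(length(signs) - n + 1)
  vapply(starts,
         function(i) paste(signs[i:(i + n - 1)], collapse = "."),
         character(1))
}

# Count every bigram across all sequences, most frequent first
demo_counts <- table(unlist(lapply(demo_seqs, demo_ngrams, n = 2)))
head(sort(demo_counts, decreasing = TRUE))
```

In this toy data only the bigrams `AN.EN` and `EN.KI` occur more than once, which is the kind of recurring pattern the n-gram section highlights.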


### 3.2 Sign combination suggestions

This section lists sign combinations from the current line for which one of the dictionaries offers a translation. This helps identify multi-sign expressions that have known meanings and can guide you in setting up bracket structures.


### 3.3 Context

The neighbouring lines (up to 2 before and 2 after the current line) are shown with frequent n-grams marked in curly braces. This reveals patterns that repeat across line boundaries and helps understand the thematic flow of the text.

Outside the gadget, you can mark n-grams in any text with `mark_ngrams()` (see Vignette 1, Section 5.1).
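
The "up to 2 before and 2 after" window can be written down in a few lines of base R. The helper `demo_context()` is hypothetical and only shows how such a window is clipped at the beginning and end of a text; the gadget may select its context lines differently.

```{r}
# Indices of the neighbouring lines of line i in a text with n_lines lines
demo_context <- function(i, n_lines, width = 2) {
  window <- seq(max(1, i - width), min(n_lines, i + width))
  setdiff(window, i)   # exclude the current line itself
}

demo_context(8, 20)    # two lines on each side
demo_context(1, 20)    # clipped at the start of the text
demo_context(20, 20)   # clipped at the end of the text
```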


### 3.4 Grammar probabilities

A bar chart shows the probability of each grammatical type for each sign in the current line, based on the dictionary. This is the same visualization produced by `plot_sign_grammar()` (see Vignette 1, Section 5.2). Tall green bars suggest a noun (S), red bars suggest a verb (V), and blue bars suggest an operator producing an attribute (A).
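
As a rough illustration of where such probabilities can come from, the sketch below turns per-sign usage counts into relative frequencies. The counts and column names are invented for this example, and the calculation ignores any further weighting (such as `sentence_prob`) that `plot_sign_grammar()` may apply.

```{r}
# Hypothetical usage counts per sign and grammatical type
demo_entries <- data.frame(
  sign  = c("AN", "AN", "AN", "KI", "KI"),
  type  = c("S",  "V",  "A",  "S",  "V"),
  count = c(6, 1, 3, 9, 1)
)

# Relative frequency of each type within its sign
demo_entries$prob <- demo_entries$count /
  ave(demo_entries$count, demo_entries$sign, FUN = sum)

demo_entries
```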


### 3.5 Translation

The main interactive section is the translation area. Here you see the skeleton template with input fields for type and translation. This is where the actual translation work happens -- assigning types, looking up dictionary entries, adjusting the bracket structure, and composing translations. The basic mechanics (green lookup button, brown compose button, bracket input, "Update Skeleton") are described in Vignette 1, Section 6. Verb prefixes and suffixes are explained in Vignette 1, Section 4.3.

Let us demonstrate the translation process on **line 8**, which features verb prefixes.


#### Line 8 in detail

```{r}
i <- which(startsWith(text, "8)"))
cat(text[i], sep = "\n")
cat(as.sign_name(text[i]), sep = "\n")
```

Line 8 contains two sentences:

**First sentence: 𒀭𒂗𒆠𒂗𒃶𒅅𒆷** 

Here, 𒀭𒂗𒆠 forms the subject (Enki), 𒂗 is the object ("cultural leader"), and 𒃶𒅅𒆷 is a complex verb with two prefixes:

| Sign          |  Type  | Translation                                          |
|---------------|--------|------------------------------------------------------|
| an=AN=𒀭      | ☒S→S  | the god of heaven who is S                           |
| en=EN=𒂗      | ☒S→S  | the cultural leader of S                             |
| ki=KI=𒆠      | S      | the Earth                                            |
| en=EN=𒂗      | S      | cultural leader                                      |
| gan=GAN=𒃶    | ☒V→V  | may V                                                 |
| ig=IG=𒅅      | ☒V→V  | V with the task of establishing sustenance of human existence |
| la=LA=𒆷      | Vt     | to equip S                                           |

The verb builds up from the core outward: 𒆷 (Vt) is the core verb, 𒅅 wraps it with additional meaning, and 𒃶 adds modality. The final composed verb is: "may equip S with the task of establishing sustenance of human existence" (Vt).
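
The "core outward" composition can be mimicked with simple string substitution. The helper `demo_apply_prefix()` below is hypothetical and is not the logic behind the compose button; it only replaces the placeholder `V` in a prefix template with the verb built so far.

```{r}
# Fill the V placeholder of a prefix template with a verb phrase
demo_apply_prefix <- function(prefix_template, verb) {
  verb <- sub("^to ", "", verb)                        # use the base form
  sub("\\bV\\b", verb, prefix_template, perl = TRUE)   # first standalone V
}

demo_core  <- "to equip S"
demo_step1 <- demo_apply_prefix(
  "V with the task of establishing sustenance of human existence",
  demo_core
)
demo_step2 <- demo_apply_prefix("may V", demo_step1)
demo_step2
```

The result matches the composed verb given above; the placeholder `S` stays open until the object is filled in at sentence level.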

The bracket structure `((𒀭𒂗𒆠)𒂗(𒃶𒅅𒆷))` groups the subject `(𒀭𒂗𒆠)` and the verb `(𒃶𒅅𒆷)` as units within the sentence. This grouping guides the skeleton hierarchy and makes the compose button work correctly for each unit. Sign combinations that should later be included in a dictionary must be written in brackets.

**Second sentence: 𒀭𒀀𒉣𒈾𒆤𒉈** 

This sentence demonstrates the operator type `S☒→A`, which produces an attribute:

| Sign          |  Type  | Translation                   |
|---------------|--------|-------------------------------|
| an=AN=𒀭      | ☒S→S  | god of heaven with S          |
| a=A=𒀀        | S      | transformative power          |
| nun=NUN=𒉣    | S      | exaltedness                   |
| na=NA=𒈾      | S☒→S  | being bound to S              |
| ke4=KID=𒆤    | S☒→A  | who is defined as S           |
| ne=NE=𒉈      | V      | to be used as a resource      |

The sign sequence AN.A.NUN.NA.KID denotes the **Anunnaki**, the gods of the Sumerian pantheon. The compositional translation of this sign sequence is: "the gods of heaven with transformative power who are defined as being bound to exaltedness". Here, KID (`S☒→A`) transforms the noun phrase to its left into an attribute (A). The attribute then combines with the remaining noun phrase (S + A -> S) before meeting the verb.


#### The completed Line_8.txt

```
Structure: ((𒀭𒂗𒆠)𒂗(𒃶𒅅𒆷)). ((𒀭𒀀𒉣𒈾𒆤)𒉈).

|an-en-ki-en-gan-ig-la-an-a-nun-na-ke4-ne: SEN: Enki, the god of heaven
  who is the cultural leader of the Earth may equip cultural leaders with
  the task of establishing sustenance of human existence. The Anunnaki,
  the gods of heaven with transformative power who are defined as being
  bound to exaltedness are used as a resource.

|an-en-ki-en-gan-ig-la=AN.EN.KI.EN.GAN.IG.LA=𒀭𒂗𒆠𒂗𒃶𒅅𒆷: SEN: ...
|	an-en-ki=AN.EN.KI=𒀭𒂗𒆠: S: Enki, the god of heaven who is the
	  cultural leader of the Earth
|		an=AN=𒀭: ☒S→S: the god of heaven who is S
|		en=EN=𒂗: ☒S→S: the cultural leader of S
|		ki=KI=𒆠: S: the Earth
|	en=EN=𒂗: S: cultural leader
|	gan-ig-la=GAN.IG.LA=𒃶𒅅𒆷: Vt: may equip S with the task of
	  establishing sustenance of human existence
|		gan=GAN=𒃶: ☒V→V: may V
|		ig=IG=𒅅: ☒V→V: V with the task of establishing sustenance
		  of human existence
|		la=LA=𒆷: Vt: to equip S

|an-a-nun-na-ke4-ne=AN.A.NUN.NA.KID.NE=𒀭𒀀𒉣𒈾𒆤𒉈: SEN: ...
|	an-a-nun-na-ke4=AN.A.NUN.NA.KID=𒀭𒀀𒉣𒈾𒆤: S: The Anunnaki, the gods
	  of heaven with transformative power who are defined as being bound
	  to exaltedness
|		an=AN=𒀭: ☒S→S: god of heaven with S
|		a=A=𒀀: S: transformative power
|		nun=NUN=𒉣: S: exaltedness
|		na=NA=𒈾: S☒→S: being bound to S
|		ke4=KID=𒆤: S☒→A: who is defined as S
|	ne=NE=𒉈: V: to be used as a resource
```


## 4. Working with Dictionaries

### 4.1 Formatting rules for dictionary entries

The line files produced by `translate()` and `translate_line()` use a pipe format where each entry starting with `|` becomes a dictionary entry. When building a dictionary from these files, some automatic normalization is applied. It is helpful to understand these conventions when writing translations:

**Curly braces `{specific meaning}`** in a translation indicate a context-specific interpretation. For example, "container {country}" means that the general compositional meaning is "container" but in this context it refers to "country". When composing entries with the compose button, only the specific meaning inside the curly braces is used for substitution.

**Angle brackets `<comment>`** in a translation enclose comments or annotations. Text inside angle brackets is stripped from the translation. Use this to add explanatory notes, for example: "S &lt;the agent of the transitive verb&gt;".

**Leading articles are removed.** Nouns and noun phrases should be translated as they fit into the sentence, including articles where appropriate. When the dictionary is built, leading articles ("the", "a", "an") at the beginning of a translation string are automatically stripped. This ensures clean dictionary entries while allowing natural English in the line files.

**Verbs should be in base form.** Verb translations should be in their base form, optionally preceded by "to" (e.g. "to create" or "create"). A leading "to" is automatically removed when building the dictionary.
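
The following base R sketch shows what these conventions amount to for a few strings. The helpers `demo_normalize()` and `demo_specific()` and the `is_verb` flag are assumptions made for this illustration; the package applies its own normalization when it builds a dictionary, and the details may differ.

```{r}
# Apply the comment, article and "to" rules to one translation string
demo_normalize <- function(translation, is_verb = FALSE) {
  out <- gsub("<[^>]*>", "", translation)        # drop <comments>
  out <- gsub("[[:space:]]+", " ", trimws(out))  # tidy up whitespace
  if (is_verb) {
    out <- sub("^to[[:space:]]+", "", out)       # drop a leading "to"
  } else {
    out <- sub("^(the|a|an)[[:space:]]+", "", out, ignore.case = TRUE)
  }
  out
}

# Return the context-specific meaning in {...}, or the whole string if absent
demo_specific <- function(translation) {
  m <- regmatches(translation, regexpr("[{][^}]*[}]", translation))
  if (length(m) == 1) substr(m, 2, nchar(m) - 1) else translation
}

demo_normalize("the Earth")
demo_normalize("to equip S", is_verb = TRUE)
demo_normalize("S <the agent of the transitive verb>")
demo_specific("container {country}")
```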


### 4.2 Creating a dictionary

Once you have translated several lines, the line files in the `lines/` folder can be combined into a dictionary. The function `make_dictionary()` reads all line files and aggregates the entries:

```{r}
line_files <- list.files(ctx$line_folder, full.names = TRUE)
head(basename(line_files))

project_dic <- make_dictionary(line_files)
```

The function counts how often each combination of sign name, type, and translation occurs across all files. Signs that appear frequently with the same meaning get higher counts, making them more reliable dictionary entries.
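
Conceptually, this counting step resembles a grouped sum. The sketch below uses a small invented table with hypothetical column names; it does not reproduce the data structure that `make_dictionary()` returns.

```{r}
# Hypothetical entries collected from several line files, one row per use
demo_rows <- data.frame(
  sign_name   = c("KI", "EN", "KI", "NE"),
  type        = c("S", "S", "S", "V"),
  translation = c("Earth", "cultural leader", "Earth",
                  "be used as a resource"),
  count       = 1L
)

# Count identical combinations of sign name, type and translation
demo_dic <- aggregate(count ~ sign_name + type + translation,
                      data = demo_rows, FUN = sum)
demo_dic[order(-demo_dic$count), ]
```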

Let us inspect some entries:

```{r}
look_up("AN", project_dic)
```

The sign AN appears with multiple types and meanings, each with a count reflecting how often that particular usage was attested in the translated lines.


### 4.3 Merging dictionaries

A project dictionary built from a single text is most useful when combined with a broader dictionary. The function `merge_dictionaries()` combines two or more dictionaries:

```{r}
dic1 <- read_dictionary()

merged_dic <- merge_dictionaries(dic1, project_dic)
```

Translation entries that agree in sign name, type, and meaning are merged by summing their counts. Cuneiform and reading rows are taken from the first dictionary.
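
The rule "same sign name, type, and meaning: sum the counts" can be pictured as a full outer join. The two small tables below are invented, and their column names are hypothetical; the sketch only illustrates the merging rule, not the internals of `merge_dictionaries()`.

```{r}
demo_a <- data.frame(
  sign_name   = c("KI", "EN"),
  type        = c("S", "S"),
  translation = c("Earth", "cultural leader"),
  count       = c(5L, 2L)
)
demo_b <- data.frame(
  sign_name   = c("KI", "NE"),
  type        = c("S", "V"),
  translation = c("Earth", "be used as a resource"),
  count       = c(3L, 1L)
)

# Full outer join on the identifying columns, keeping both counts
demo_both <- merge(demo_a, demo_b,
                   by = c("sign_name", "type", "translation"),
                   all = TRUE, suffixes = c("_a", "_b"))

# Entries missing from one source contribute a count of zero
demo_both$count_a[is.na(demo_both$count_a)] <- 0L
demo_both$count_b[is.na(demo_both$count_b)] <- 0L
demo_both$count <- demo_both$count_a + demo_both$count_b
demo_both
```

Keeping the per-source columns `count_a` and `count_b` next to the total makes it easy to see which entries were attested in both sources.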

We can verify that the merged dictionary contains entries from both sources:

```{r}
look_up("LAM", dic1)
look_up("LAM", project_dic)
look_up("LAM", merged_dic)
```

**A note on combining dictionaries:** Merging is most meaningful when the underlying texts come from comparable periods and regions of Mesopotamia. The same sign can carry different meanings across time and place, so combining dictionaries from widely different epochs may produce misleading frequency counts.


### 4.4 Saving dictionaries

The completed dictionary can be saved with metadata:

```{r, eval = FALSE}
save_dictionary(
  dic     = merged_dic,
  file    = "my_dictionary.txt",
  author  = "My Name",
  year    = "2026",
  version = "1.0",
  url     = "https://example.com/dictionary"
)
```

The saved dictionary can be loaded in future sessions with `read_dictionary("my_dictionary.txt")` and used as the primary dictionary for new translation projects.


## 5. The Cycle

The workflow described in this vignette forms a self-reinforcing cycle:

1. **Translate** a line using `translate_line()`, guided by the dictionary and n-gram analysis.
2. **Save** the result. The line file is written automatically when you click "Done".
3. **Reuse** the translations. The next time you call `translate_line()`, it automatically builds a project dictionary from all saved lines (excluding the current one). This project dictionary appears alongside the primary dictionary, providing suggestions based on your own previous work.
4. **Build** a dictionary with `make_dictionary()` and merge it with an existing one using `merge_dictionaries()`.
5. **Translate another text** with the improved dictionary.

With each translated line, the dictionary grows. Frequent signs and expressions accumulate higher counts, and the automatic pre-filling of translation templates becomes increasingly accurate. Over time, you build a comprehensive dictionary grounded in your own texts.