Transverbis | Translations

Summer 2018 update


Moien! That's Luxembourgish for hello. While on holiday in Luxembourg, I found some time to update the Celex aligner launched in 2016.

The most visible change is the automatic update of the results page. You no longer have to press F5 to see the results.

However, there are many more changes:

- while still awkward, the manual split of segments in Firefox and Internet Explorer works better;

- segments with aligner errors will be highlighted in yellow;

- segments with large source/target length differences (most likely alignment errors) will be highlighted in pink after pressing the corresponding quality check button;

- EU treaties and CJEU documents can now be processed too;

- the segmentation of consolidated versions has been improved;

- the segmentation of Special Edition OJs (i.e. the RO, BG and HR pre-accession translations) has also been improved;

- large EUR-Lex files can now be processed, but note that you need a powerful computer to display such big HTML files.
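
The pink highlighting mentioned above relies on comparing the lengths of the source and target segments. The post does not say how the aligner measures length or where it draws the line, so the following is only a minimal Python sketch: it assumes character counts and a hypothetical ratio threshold (max_ratio=2.0), and suspicious_pairs is an illustrative name, not part of eunlp.

```python
def suspicious_pairs(pairs, max_ratio=2.0):
    """Return the indices of (source, target) pairs whose lengths differ a lot.

    Length is measured in characters; max_ratio is a hypothetical cutoff,
    not the aligner's actual setting.
    """
    flagged = []
    for i, (src, tgt) in enumerate(pairs):
        short, long_ = sorted((len(src), len(tgt)))
        if short == 0:
            # One side is empty: suspicious unless both sides are empty.
            if long_ > 0:
                flagged.append(i)
        elif long_ / short > max_ratio:
            flagged.append(i)
    return flagged


pairs = [
    ("Article 1", "Articolul 1"),
    ("This Regulation shall enter into force.", "Articolul 2"),
]
print(suspicious_pairs(pairs))  # [1]
```

A ratio rather than an absolute difference keeps short headings and long recitals on the same scale; a real check would probably also need to account for languages that are systematically longer or shorter than the source.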

That would be all for now!

Published by Filip, Aug. 5, 2018, 10:55 a.m.


Christmas update: better user interface & GitHub repositories

Hi there!

We have updated our manual alignment interface to make it more cross-browser compatible. Although Google Chrome and Opera are our preferred browsers, the manual alignment now also works decently in Internet Explorer and Firefox. More details about browser compatibility are available in this GitHub document.

Yes, we have published our code on GitHub! There are two main repositories. The first one, eunlp, contains the Python package that runs in the belly of this website and aligns those long EU documents. The second one, jsalign, contains the code needed to edit the alignments manually (you know, the HTML file spat out by the Celex aligner).

Happy New Year 2018!

Published by Filip, Dec. 27, 2017, 9:22 p.m.


Cut and paste paragraphs

There are cases where the paragraphs are ordered differently in the source and target segments. In such cases, you have to move them manually one by one. This can happen in particular when your directive or regulation includes an alphabetically-ordered glossary: the problem is that the ordering is specific to each language.

For such cases, we have introduced a pair of cut/paste buttons to the right of the original buttons.

How does it work? 

The black button is used to mark the segments you want to move. When marked, the icon changes into a star and the other buttons are disabled:

Now these segments are selected. To move them, click the Paste button (the rightmost button) of the segment below which you want to place them:

In the above example, I have clicked the Paste button in the "RO" segment and the two selected paragraphs have been moved right below that segment.
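Conceptually, cut and paste is a list operation on one column of the table. The Python sketch below models a column as a plain list of paragraphs; cut_and_paste and its parameters are hypothetical and not taken from the jsalign code. With placeholder labels, it mirrors the example above: two selected paragraphs end up right below the "RO" segment, in their original order.

```python
def cut_and_paste(column, selected, paste_below):
    """Move the paragraphs at the `selected` indices right below column[paste_below].

    Indices refer to the column before the move; the moved paragraphs
    keep their original relative order.
    """
    chosen = set(selected)
    if paste_below in chosen:
        raise ValueError("cannot paste below a selected paragraph")
    moved = [p for i, p in enumerate(column) if i in chosen]
    result = []
    for i, p in enumerate(column):
        if i in chosen:
            continue  # "cut": leave the selected paragraphs out for now
        result.append(p)
        if i == paste_below:
            result.extend(moved)  # "paste": put them right below the target
    return result


column = ["BG", "CS", "RO", "DE", "EN"]  # placeholder paragraph labels
print(cut_and_paste(column, selected=[0, 1], paste_below=2))
# ['RO', 'BG', 'CS', 'DE', 'EN']
```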

Enjoy the new functionalities and let us know what you think!

Published by Filip, March 1, 2016, 4:52 p.m.


Faster manual alignment

We noticed that our manual alignment interface becomes slow with large files. We have therefore made some improvements, the most important of which is that, by default, the buttons are now displayed only on the active paragraph, i.e. the one the mouse is hovering over. Here's an example:

The mouse (not visible) is over the third paragraph in the left column. If you move the mouse outside the paragraph, the buttons disappear and new buttons are generated on the fly in the newly active paragraph.

Otherwise, the capabilities of the manual alignment tool are the same. We will update the tutorial further down the line, perhaps after we implement more improvements. Here's an example of an alignment job generated with the new interface.

Published by Filip, Feb. 25, 2016, 11:59 a.m.


New drag and drop capabilities

We have added drag and drop capabilities to the manual alignment tool. Sometimes, the paragraphs are ordered differently in the source and target languages (e.g. alphabetically ordered glossaries). In such cases, it is very convenient to just drag the paragraph into the correct location. See an example below, where the fourth target language paragraph is moved into the third position:

You might need a little time to get used to the interface. The basic rule is as follows: when a dragged paragraph hovers over another paragraph, it will be inserted below that paragraph.
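The "insert below the hovered paragraph" rule can be written as a small list operation. This is a hypothetical Python sketch (drop_below is not part of jsalign), with a column modelled as a list and indices starting at 0. The one subtlety is that removing the dragged paragraph first shifts every later paragraph up by one position.

```python
def drop_below(column, dragged, hovered):
    """Move column[dragged] so that it lands right below column[hovered]."""
    result = list(column)
    if dragged == hovered:
        return result  # dropping a paragraph onto itself changes nothing
    item = result.pop(dragged)
    # After the pop, paragraphs that came after `dragged` have moved up by one.
    if dragged < hovered:
        hovered -= 1
    result.insert(hovered + 1, item)
    return result


target = ["T1", "T2", "T4", "T3"]  # the 4th paragraph belongs in 3rd place
print(drop_below(target, dragged=3, hovered=1))
# ['T1', 'T2', 'T3', 'T4']
```

Dropping the fourth paragraph onto the second one puts it in third position, matching the example described above.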

Published by Filip, Feb. 23, 2016, 7:58 p.m.


Using the Celex aligner

Our Celex aligner is finally online! You can access it from the top menu.

What is Celex?

A Celex number is a unique code given to every regulation, directive or other type of EU document. For example, the Celex number of Regulation No 1024/2013 is "32013R1024", where "3" is a domain code, "2013" is the adoption year, "R" is used for regulations, and, finally, "1024" is the number of the document. You can find further details on the EUR-Lex website.
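The structure described above can be checked with a regular expression. The following Python sketch is deliberately simplified and is an assumption, not the aligner's own validation logic: it only accepts the "domain digit + year + type letter(s) + number" shape of the example, so other Celex formats, such as those used for corrigenda, will not match.

```python
import re
from typing import NamedTuple

# Simplified pattern: one-digit domain code, four-digit year,
# one or two letters for the document type, then the document number.
CELEX_RE = re.compile(r"^(\d)(\d{4})([A-Z]{1,2})(\d+)$")


class Celex(NamedTuple):
    domain: str
    year: int
    doc_type: str
    number: int


def parse_celex(code: str) -> Celex:
    """Split a simple Celex number into its parts, or raise ValueError."""
    m = CELEX_RE.match(code.strip().upper())
    if m is None:
        raise ValueError(f"not a (simple) Celex number: {code!r}")
    domain, year, doc_type, number = m.groups()
    return Celex(domain, int(year), doc_type, int(number))


print(parse_celex("32013R1024"))
# Celex(domain='3', year=2013, doc_type='R', number=1024)
```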

What is an alignment?

When translating a document, you often have some reference translations (together with the original text) you want to follow for consistency. Sometimes the client provides reference translations, sometimes you use your own past translations. 

If you translate using translation memory software, you'll want to import the reference translations into your database. To do so, you first have to create a parallel list of original and translated sentences. That process is called alignment.
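In practice, such a parallel list is usually exchanged as TMX, the XML format that the Celex aligner also produces. The post does not describe how the aligner writes its TMX files, so this is only a minimal Python sketch using the standard library's xml.etree.ElementTree; the header values (creationtool, segtype, etc.) are placeholder assumptions. Each aligned pair becomes one translation unit (tu) with one variant (tuv) per language.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def pairs_to_tmx(pairs, src_lang, tgt_lang):
    """Build a minimal TMX 1.4 document from (source, target) sentence pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "paragraph",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per aligned pair
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


print(pairs_to_tmx([("Article 1", "Articolul 1")], "en", "ro"))
```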

A Celex aligner?

Some translators use EU documents extensively as a reference for their translations. Those EU documents are adopted in all EU languages and in many cases they are an official terminological reference for subsequent translations in certain fields.

However, manually aligning those documents is time-consuming and prone to frustrating software crashes. As EU laws are usually consistently formatted across all languages, we have found a way to align them automatically (well, most of the time).

The basic workflow

The alignment interface consists of a simple form where you enter the Celex number of the document you want to align, the source language and the target language. After pressing "Submit", you are taken to a page where you can download the TMX alignment (if the alignment is successful) and an HTML table where you can manually refine and save your alignment. If the alignment is unsuccessful, you get only the HTML file, which you can adjust and finally export into a TMX memory.

The form

The form has three fields and a submission button. The fields are as follows:

The Celex number field is where you enter the Celex number of the document. If you enter an invalid Celex number, the script will report an error on the next screen.

The source and target languages are drop-down menus, where the language codes are ordered alphabetically.

Here's an example:

There are also some optional fields, but you can safely ignore them. At the end you'll see a big blue button:

After hitting the "Submit" button, you'll get a new page that looks something like this:

There's an automatically generated title with several technical details and a green or blue button labelled "Executing" or "Waiting". Then there's an empty table and a black box (the console) with some abstruse stuff. At the end of the process, which usually takes 10 to 30 seconds, the table is populated with the resulting file(s) and the console shows the error messages, if any.

The page will NOT refresh itself with the results. You have to refresh/reload it manually using the appropriate icon or the F5 key.

After reloading the page, you'll get the final results:

There's a nice green "Success" button and a list of output files. You will usually see three files in the list. First, there's the TMX file, which you can readily import into your favorite translation memory tool. Second, there's an HTML file for refining the alignment manually. And third, there's a "log.txt" file, which is usually empty.

If you want to download all the files at once, click the small arrow next to the "Download" button and choose your preferred format, .zip or .tar.gz:

A failed alignment

Sometimes the alignment fails. In that case, the resulting web page looks different:

The console header is red and there's also a red error message telling us that the alignment failed. The list contains just two files: the HTML file and the "log.txt" file, which contains the same error message shown in red in the console.

The HTML file contains an imperfect alignment, which you can correct manually, usually in a couple of minutes. You can either open the file immediately in the browser or download it and open it later. Note however that, in order to properly use the file, you need to be connected to the Internet when you load it in the browser.

The manual alignment

Let's open the HTML file:

Basically, there's a big table with two columns, with some instructions at the beginning and a big green button at the end.

Let's look more carefully at the first part:

First, the Celex code and the source and target languages are available in the top left corner.

Below them, there's a big backup "Save and continue" button. If you get tired of aligning, you can click it to save your work in a new HTML file, which you can reopen later to complete the alignment.

On the right side of the header you will find a description of the buttons and other functionalities useful during the alignment.

Then comes the table. There are two columns, for the source and target languages. A source language paragraph is automatically connected with the target language paragraph in the same row. You do not have to connect them one by one.

(These two segments are already connected to each other.)

Every paragraph has four buttons, which can be used to correct the alignment. You can also edit the text directly by clicking in a cell:

In certain cases, the paragraph is very long and doesn't fit completely in the cell. You can use the scrollbar to the right to view the entire text.

Now, let's see what those four buttons do.

Add a new paragraph

Suppose we have the following alignment:

You can see that the paragraph with the date is missing in the target language. You can fix that by adding a new paragraph below the first paragraph in the target language column. To add a new paragraph, click the green button in the cell just above the insertion point. This creates a new cell with the editable text "<Add text here.>".

Then you edit the text as shown before and you're done.
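Because rows pair the paragraphs by position, adding a paragraph to one column shifts everything below it down by one row, which is what brings the two columns back into line. Here is a hypothetical Python sketch that models each column as a list of strings (add_paragraph is an illustrative name, not the tool's code; the paragraph texts are placeholders):

```python
PLACEHOLDER = "<Add text here.>"


def add_paragraph(column, below):
    """Return a copy of column with a placeholder inserted after index `below`."""
    return column[:below + 1] + [PLACEHOLDER] + column[below + 1:]


source = ["EN title", "EN date", "EN recital 1"]
target = ["RO title", "RO recital 1"]  # the date paragraph is missing

target = add_paragraph(target, 0)  # green button on the first target cell
target[1] = "RO date"              # then edit the placeholder text
rows = list(zip(source, target))   # each row is one aligned pair
print(rows)
# [('EN title', 'RO title'), ('EN date', 'RO date'), ('EN recital 1', 'RO recital 1')]
```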

Delete an existing paragraph

In some cases the alignment generates useless paragraphs, which you would rather delete.

To do that, click the second button (the red one). The buttons will change into a confirmation dialog:

If you press "Cancel", nothing will happen. If you press "Delete", the respective paragraph will be deleted:

Merging two paragraphs

In some cases two paragraphs should be combined into one paragraph. To do that, use the purple button in the first paragraph.

You are asked to confirm the action. 


If you click "Cancel", nothing happens. If you click "Merge", the two paragraphs will be combined.
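Merging turns two neighbouring paragraphs in the same column into one. In this hypothetical Python sketch (merge_with_next is not the jsalign implementation), the texts are joined with a single space; the post does not say what separator the tool actually inserts, so treat that as an assumption.

```python
def merge_with_next(column, index, confirm=True):
    """Combine column[index] with the paragraph right below it.

    `confirm` stands for the "Merge"/"Cancel" choice. Joining with a single
    space is an assumption, not documented behaviour of the tool.
    """
    if not confirm or index + 1 >= len(column):
        return list(column)  # "Cancel", or there is nothing below to merge with
    merged = column[index].rstrip() + " " + column[index + 1].lstrip()
    return column[:index] + [merged] + column[index + 2:]


source = ["Member States shall", "comply with this Directive.", "Article 2"]
print(merge_with_next(source, 0))
# ['Member States shall comply with this Directive.', 'Article 2']
```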

Splitting a paragraph

You may also need to split large paragraphs like in the following example.

The first source language paragraph actually contains two separate sentences and we need to split it in order to align the text with the target language text. To do that, we use the orange button. We click it once to expand the cell and see the entire text. You'll see that the button changes too.

Now use your mouse to click wherever you want to split the cell into two separate cells. In this example, you would split the paragraph between the words "alike." and "Coordination". After clicking between those words, you get the following outcome:
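Splitting takes a position inside the text and produces two cells. In the Python sketch below (hypothetical, not the jsalign implementation), the click position is modelled as a character offset, and the whitespace around it is trimmed so that neither new paragraph starts or ends with a stray space. The sample text is made up.

```python
def split_paragraph(column, index, offset):
    """Split column[index] into two paragraphs at character position `offset`."""
    text = column[index]
    head, tail = text[:offset].rstrip(), text[offset:].lstrip()
    if not head or not tail:
        return list(column)  # a split at either end would leave an empty cell
    return column[:index] + [head, tail] + column[index + 1:]


source = ["The first sentence ends here. The second one starts here.", "Article 2"]
offset = source[0].index("The second")  # stands in for the mouse click
print(split_paragraph(source, 0, offset))
# ['The first sentence ends here.', 'The second one starts here.', 'Article 2']
```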