Backpropagation of wrongly (double) encoded CSV

I have a CSV file that someone encoded wrongly. It looks like this:

movieId,title,actors
(...)
61,Eye for an Eye (1996),(a ton of other actors)|Dolores VelÌÁzquez|(more actors)
59,The Confessional (1995),(a ton of other actors)|Richard FrÌ©chette|FranÌ¤ois Papineau|Marie Gignac|Normand Daneau|Anne-Marie Cadieux|Suzanne ClÌ©ment|Lynda Beaulieu|Pascal Rollin|Billy Merasty|Paul HÌ©bert|Marthe Turgeon|Adreanne Lepage-Beaulieu|AndrÌ©e-Anne ThÌ©roux-Faille|Rodrigue Proteau|Philippe Paquin|Pierre HÌ©bert|Nathalie D'Anjou|Danielle Fichaud|Jules Philip|Jacques Laroche|Claude-Nicolas Demers|Jean-Philippe CÌ«tÌ©|Tristan Wiseman|Marc-Olivier Tremblay|Jacques Brouillet|Jean-Paul L'Allier|Denis Bernard|RenÌ©e Hudon|Serge Laflamme|Carl Mathieu
(...)

As you can see, instead of umlauts and accented letters (ÄÖÜ, É, À, Û etc.), the actor names contain combinations of other special characters. I suspect this is because the file was encoded twice in a row, so that the two bytes that belong together in UTF-8 to form one umlaut or accented letter were each treated as a separate character instead.
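To illustrate the mechanism I suspect, here is a minimal Python sketch of how one accented letter turns into two characters when its UTF-8 bytes are decoded with a single-byte codec. The codec pair used (UTF-8 read as Latin-1) is only an assumption chosen for demonstration. It produces "Ã©" rather than the "Ì©" seen in my file, so the pair actually involved here is probably a different one.

```python
# Illustration only: UTF-8 bytes misread as Latin-1 (an assumed codec pair).
original = "é"
utf8_bytes = original.encode("utf-8")      # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("latin-1")     # each byte becomes its own character
print(utf8_bytes.hex(), repr(misread))

# The damage is reversible as long as no byte was lost or altered on the way.
restored = misread.encode("latin-1").decode("utf-8")
print(restored == original)
```

If the real file went through a step like this without losing information, reversing the two steps in the opposite order should restore the text.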

My goal is to restore the correct Umlauts etc.

I have found that all broken umlauts etc. follow this scheme: the first character is an "Ì", followed by a second symbol, unless the original letter was an "Á", as in "Ángel", which appears as "Ìngel" in my CSV.
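Since the "Á" case shows only one visible character, it may be worth checking whether an invisible character follows the "Ì". The sketch below lists every character of a string with its code point and Unicode name. The helper name `describe` is hypothetical, and the sample string contains a made-up U+0081 control character purely to show what an invisible character would look like in the output; whether the file really contains one is not known.

```python
import unicodedata

def describe(text):
    """Return one (char, code point, Unicode name) tuple per character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed or control>"))
        for ch in text
    ]

# Hypothetical sample: "Ì" followed by an invented invisible character.
# In practice, pass a string read from the CSV instead.
sample = "\u00cc\u0081ngel"
for row in describe(sample):
    print(row)
```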

The broken sequences are case-sensitive: the original letters Á and á do not map to the same broken sequence in the file.

I have tried every common encoding to rule out that this is just something very similar to UTF-8, but only UTF-8 comes close to being correct (the other encodings break more characters and the Umlauts etc. are always broken).

I have tried regex replacements for broken sequences where I know the original actor name and can therefore infer which broken combination maps back to which original letter. The problem is that a broken sequence is not always two characters long. As the Á example above shows, it can be a single character, so I cannot replace it with a regex until all other replacements have been done. Along the way I also found that some replacements went wrong, so I suspect that some very special letters are represented by a combination of 3 bytes instead of 2.
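One way around the ordering problem might be to do all replacements in a single regex pass, trying the two-character sequences before the bare "Ì". The sketch below uses only the unambiguous pairs from the list of known characters in this question, and it assumes that a bare "Ì" stands for "Á" only when an ASCII letter follows it. That rule, and the names `PAIRS` and `repair`, are my own assumptions for illustration. Unknown sequences are left untouched.

```python
import re

# Unambiguous pairs from the list of known characters in this question.
# "Ì_" is left out on purpose because it may mean ü, ä or í.
PAIRS = {
    "Ì¦": "ö", "ÌÏ": "Ü", "ÌÐ": "Ö", "Ìµ": "õ", "ÌÁ": "á",
    "ÌÙ": "ß", "Ì¨": "î", "ÌÈ": "û", "Ì«": "ô",
}

# Longest alternatives first, then the bare "Ì" as a fallback.
# Assumption: a bare "Ì" means "Á" only when an ASCII letter follows.
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(PAIRS, key=len, reverse=True))
    + r"|Ì(?=[A-Za-z])"
)

def repair(text):
    """Replace every known broken sequence in one pass over the text."""
    return PATTERN.sub(lambda m: PAIRS.get(m.group(0), "Á"), text)

print(repair("Dolores VelÌÁzquez"))
print(repair("Ìngel"))
```

Because the text is scanned once, a character produced by one replacement is never matched again by another, so the order in which individual rules were discovered no longer matters.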

I think this faulty CSV has been generated in Java.

Is there any way for me to

1. find out which two encodings were applied one after the other, leading to the broken file, and
2. fix the errors programmatically?
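For the first point, one approach that could be tried is a brute-force search: take a string whose original form is known, encode it with one codec, decode it with another, and see which pairs reproduce the broken form. The sketch below is a rough attempt at that. The codec shortlist and the function name `find_codec_pairs` are assumptions, and the call at the end uses a generic textbook example rather than data from my file.

```python
from itertools import permutations

# Candidate codecs to try; this shortlist is an assumption, extend as needed.
CANDIDATES = ["utf-8", "latin-1", "cp1252", "mac_roman",
              "cp437", "cp850", "iso8859_15"]

def find_codec_pairs(original, broken, codecs=CANDIDATES):
    """Return (written_as, read_as) pairs that turn `original` into `broken`."""
    hits = []
    for written_as, read_as in permutations(codecs, 2):
        try:
            if original.encode(written_as).decode(read_as) == broken:
                hits.append((written_as, read_as))
        except UnicodeError:
            continue  # this pair cannot represent the text at all
    return hits

# Generic example: "é" written as UTF-8 and read with a single-byte codec.
print(find_codec_pairs("é", "Ã©"))
```

If no pair reproduces a known original/broken pair from the file, the damage probably involved more than two steps or a lossy conversion. The fact that one broken sequence stands for ü, ä and í at the same time may point in that direction, in which case a pure re-encoding cannot restore everything.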

Edit: Here is a list of the character sequences in my CSV, with the respective original characters where I know them:

current | original
===================
Ì_      | ü or ä or í
Ì¦      | ö
ÌÏ      | Ü
ÌÐ      | Ö
Ìµ      | õ
ÌÁ      | á
ÌÙ      | ß
åÁÌ     | ¡
å¡2     | °C or ° (I am not sure)
Ì¨      | î
ÌÈ      | û
Ì«      | ô
