[SOLVED] Find duplicates and replace them with adjacent value

Dracaryu · November 5, 2019, 7:17am

Hi everyone,
I have a problem, and I would like to ask for help.

I have two text files.

The first one contains multiple lines in this format:
characterstring : hash
characterstring1 : hash1
characterstring2: hash2

The second one is in this format:
hash: plaintext
hash2 : plaintext2
hash3 : plaintext3

I am looking for a way to replace every hash in file 1 with its plaintext version from file 2.

The result should be a third file in this format:
characterstring1:plaintext1
characterstring2:plaintext2
characterstring3:plaintext3

Can anyone help me achieve this? If possible, with a Python script.

Thanks

TheJoker · November 5, 2019, 8:58am

There are some sample Python scripts available; if you can develop your own from these, that will do the job!

https://pythontesting.net/python/regex-search-replace-examples/

Python: Find Replace by Regex

Dracaryu · November 5, 2019, 1:09pm

Sadly, I still struggle to achieve what I’m looking for

TheJoker · November 5, 2019, 2:23pm

It’s hard to scan big text files with generic tools; a coded script will do the work for you.
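
For what it's worth, here is a minimal sketch of such a script using only a plain Python dictionary, not a tested solution for the real files. It assumes every line has the form `left : right` with optional spaces around the colon, that a hash never contains a colon, and that lines whose hash is missing from file 2 can simply be dropped. The function names and the UTF-8 encoding are my own assumptions, not something from the thread.

```python
def load_lookup(lines):
    """Build a hash -> plaintext dict from 'hash : plaintext' lines."""
    lookup = {}
    for line in lines:
        # Split on the first colon so a plaintext containing ':' stays intact.
        key, sep, value = line.rstrip("\n").partition(":")
        if sep:
            lookup[key.strip()] = value.strip()
    return lookup


def replace_hashes(lines, lookup):
    """Yield 'characterstring:plaintext' for every hash found in lookup."""
    for line in lines:
        # Split on the last colon, assuming the hash itself contains no ':'.
        name, sep, hash_value = line.rstrip("\n").rpartition(":")
        hash_value = hash_value.strip()
        if sep and hash_value in lookup:
            yield f"{name.strip()}:{lookup[hash_value]}"


def convert(hash_file, plain_file, out_file):
    """Read both files and write the merged result (UTF-8 is an assumption)."""
    with open(plain_file, encoding="utf-8") as f2:
        lookup = load_lookup(f2)
    with open(hash_file, encoding="utf-8") as f1, \
            open(out_file, "w", encoding="utf-8") as out:
        for merged in replace_hashes(f1, lookup):
            out.write(merged + "\n")
```

Calling `convert("file1.txt", "file2.txt", "result.txt")` (hypothetical file names) would write the third file. Only file 2 is held in memory as a dictionary; file 1 is streamed line by line, and each hash lookup takes constant time on average.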

Shunya · November 5, 2019, 2:55pm

Hi @Dracaryu, you can use pandas for that!
It works something like a VLOOKUP operation.

E.g., say pokemon.csv is a CSV file
with the data below:

https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6

pokemon.csv

#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
6,Charizard,Fire,Flying,534,78,84,78,109,85,100,1,False
6,CharizardMega Charizard X,Fire,Dragon,634,78,130,111,130,85,100,1,False
6,CharizardMega Charizard Y,Fire,Flying,634,78,104,78,159,115,100,1,False

Now try this:

import pandas as pd

# Names to look up: the "Name" column as a Series
pokemon_names = pd.read_csv("pokemon.csv", usecols=["Name"])["Name"]

# Lookup table as a dict: Name -> Type 1
pokemon_types = pd.read_csv("pokemon.csv", index_col="Name")["Type 1"].to_dict()

# Replace each name with its type, like a VLOOKUP
pokemon_names.map(pokemon_types).head()
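
Applied to the two hash files, the same `map` idea might look roughly like the sketch below. It is only an illustration: the helper names and column names are made up, each line is assumed to split on its first colon, and rows whose hash has no match are dropped.

```python
import io

import pandas as pd


def read_pairs(source, columns):
    """Read 'left : right' lines into a two-column DataFrame."""
    rows = []
    for line in source:
        left, sep, right = line.rstrip("\n").partition(":")
        if sep:
            rows.append((left.strip(), right.strip()))
    return pd.DataFrame(rows, columns=columns)


def map_hashes(file1, file2):
    """Attach the plaintext from file2 to each row of file1."""
    names = read_pairs(file1, ["name", "hash"])
    lookup = read_pairs(file2, ["hash", "plaintext"])
    # map() needs a unique index, so keep only the first entry per hash.
    table = lookup.drop_duplicates("hash").set_index("hash")["plaintext"]
    names["plaintext"] = names["hash"].map(table)
    # Hashes without a match come back as NaN; drop those rows.
    return names.dropna(subset=["plaintext"])


# Small in-memory example standing in for the real files.
demo = map_hashes(
    io.StringIO("characterstring : hash\ncharacterstring2: hash2\n"),
    io.StringIO("hash: plaintext\nhash2 : plaintext2\n"),
)
```

Joining the `name` and `plaintext` columns with a colon then gives the lines for the third file.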

Does this help you?

Dracaryu · November 5, 2019, 3:06pm

I am not used to that, but I will try.
Thank you for your help.

Shunya · November 5, 2019, 3:07pm

How long is the text with the hash?

Dracaryu · November 5, 2019, 3:10pm

530k for the first one.
174k for the second one.

Shunya · November 5, 2019, 3:16pm

Oh, you can try pandas, but it might be less efficient.
Here, I found some more helpful links.

This might actually help you.

Dracaryu · November 5, 2019, 3:21pm

Thanks. I will check that.
