Environment: Python 3.
There are many files, some encoded in GBK and others in UTF-8. I want to extract all the jpg URLs from them with a regular expression.
Take /tmp/s.html, which is encoded in GBK. Reading it without specifying an encoding:
tree = open("/tmp/s.html", "r").read()
raises:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 135: invalid start byte
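For context, the message shows the default encoding here is UTF-8. GBK stores each Chinese character as two bytes with the high bit set, and such byte pairs are often not valid UTF-8. A small sketch reproducing this with an illustrative string (not taken from s.html):

```python
text = "中文"
gbk_bytes = text.encode("gbk")  # GBK: two bytes per Chinese character

try:
    # Roughly what text-mode open() does when the default encoding is UTF-8.
    gbk_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc)

# Decoding with the codec the bytes were written in round-trips cleanly.
print(gbk_bytes.decode("gbk"))
```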
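Specifying encoding="gbk" works:
import re

with open("/tmp/s.html", "r", encoding="gbk") as f:
    tree = f.read()
pat = r"http://.+\.jpg"  # raw string, so "\." reaches re as a literal dot
result = re.findall(pat, tree)
print(result)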
['http://somesite/2017/06/0_56.jpg']
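One thing to watch in the pattern http://.+\.jpg: ".+" is greedy, so when two jpg URLs sit on the same line it returns one long match running from the first "http://" to the last ".jpg". A sketch using a hypothetical HTML line (example.com is a placeholder), compared against a tighter, lazy pattern that also stops at whitespace, quotes and angle brackets:

```python
import re

# Hypothetical HTML line containing two image URLs.
line = '<img src="http://example.com/1.jpg"><img src="http://example.com/2.jpg">'

# Pattern from the question: greedy, so it runs to the LAST ".jpg" on the line.
greedy = re.findall(r"http://.+\.jpg", line)

# Tighter pattern: no whitespace, quotes or angle brackets inside a URL,
# and a lazy quantifier so each match ends at the first ".jpg".
tight = re.findall(r'http://[^\s"\'<>]+?\.jpg', line)

print(greedy)
print(tight)
```

The character class is an assumption about how URLs appear in these files (inside quoted HTML attributes); adjust it if URLs are embedded differently.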
Opening every file with its specific encoding by hand is a huge job. Is there a smart way to extract the jpg URLs from all the files, whatever their encoding?
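One common approach is to read each file as bytes and try a short list of candidate encodings in order. The sketch below assumes every file is either UTF-8 or GBK, as described above; the helper names read_text and extract_jpg_urls, the tighter URL pattern, and the demo files are all hypothetical, and the final errors="replace" fallback is a design choice rather than a requirement.

```python
import re
import tempfile
from pathlib import Path

# Assumption: every file is either UTF-8 or GBK.
# Order matters: UTF-8 is strict, so try it first and fall back to GBK.
ENCODINGS = ("utf-8", "gbk")

# Tighter than the question's pattern: stops at whitespace, quotes, < and >.
JPG_PATTERN = re.compile(r'http://[^\s"\'<>]+?\.jpg')


def read_text(path, encodings=ENCODINGS):
    """Return the file's text using the first encoding that decodes it."""
    data = Path(path).read_bytes()
    for enc in encodings:
        try:
            return data.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: replace undecodable bytes instead of crashing.
    return data.decode("utf-8", errors="replace")


def extract_jpg_urls(paths):
    """Map each path (as a string) to the jpg URLs found in it."""
    return {str(p): JPG_PATTERN.findall(read_text(p)) for p in paths}


# Demo with two throwaway files holding the same page in different encodings.
with tempfile.TemporaryDirectory() as tmp:
    page = '<p>中文</p><img src="http://somesite/2017/06/0_56.jpg">'
    Path(tmp, "a.html").write_bytes(page.encode("gbk"))
    Path(tmp, "b.html").write_bytes(page.encode("utf-8"))
    results = extract_jpg_urls(sorted(Path(tmp).glob("*.html")))

for path, urls in results.items():
    print(Path(path).name, urls)
```

Trying UTF-8 first is deliberate: UTF-8 rejects most byte sequences that are not UTF-8, while GBK accepts a much wider range of byte pairs, so a UTF-8 file read as GBK can come back as garbled text instead of raising an error. Since the URLs themselves are ASCII, another option is to skip decoding entirely: open the files in "rb" mode and search with a bytes pattern (rb"..."). That is somewhat less robust in principle, because GBK trail bytes can fall in the ASCII range.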