Channel: Active questions tagged utf-8 - Stack Overflow

Converting a binary to a string variable in Polars (Python Library) with non-UTF-8 characters


I'm having trouble manipulating a dataset in Python that contains non-UTF-8 characters. The strings are imported as binary, but I can't convert the binary columns to strings when a cell contains bytes that are not valid UTF-8.

A minimal working example of my issue is

    import polars as pl
    import pandas as pd

    pd_df = pd.DataFrame([[b"bob", b"value 2", 3], [b"jane", b"\xc4", 6]], columns=["a", "b", "c"])
    df = pl.from_pandas(pd_df)
    column_names = df.columns

    # Loop through the column names
    for col_name in column_names:
        # Check if the column has binary values
        if df[col_name].dtype == pl.Binary:
            # Convert the binary column to string format
            print(col_name)
            df = df.with_columns(pl.col(col_name).cast(pl.String))

This throws an error when converting column b, because the byte \xc4 on its own is not valid UTF-8. For a solution, I'm fine with converting any non-UTF-8 characters to blanks.
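To make the desired result concrete, here is a plain-Python sketch of the per-value behaviour I'm after. The helper name `decode_lossy` is made up for illustration and is not part of Polars or pandas. It relies only on the standard `bytes.decode` with `errors="ignore"`, which drops bytes that are not valid UTF-8. How best to apply this to a Polars column is the open part of the question; the commented line at the end is an untested assumption, not a confirmed solution.

```python
def decode_lossy(value):
    """Decode bytes as UTF-8, silently dropping any invalid bytes.

    Hypothetical helper for illustration only.
    """
    if value is None:
        # Keep missing values as missing
        return None
    # errors="ignore" removes undecodable bytes instead of raising
    return value.decode("utf-8", errors="ignore")


# The two values from column b in the example above
cleaned = [decode_lossy(v) for v in [b"value 2", b"\xc4"]]

# Assumption (untested here): in Polars this could perhaps be applied per cell with
#   pl.col(col_name).map_elements(decode_lossy, return_dtype=pl.String)
```

With this, `b"value 2"` would stay as `"value 2"` and the cell containing only `b"\xc4"` would become an empty string. Using `errors="replace"` instead would insert the U+FFFD replacement character rather than dropping the byte.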

I have tried many of the conversion suggestions I found online, but I can't get any of them to work.

