Channel: Active questions tagged utf-8 - Stack Overflow

Getting "stream did not contain valid UTF-8" while trying to pull data into a pandas DataFrame in Azure Machine Learning


I have some data stored inside a storage account in Azure.

I have created a datastore linking this storage account to the Azure Machine Learning workspace. I have created two data assets in the Azure ML workspace:

  1. One for the individual parquet file containing the data
  2. Another for the folder that holds the file.

I want to pull this data into a pandas DataFrame in an Azure ML notebook. The folder will contain multiple files, and I want to build a single DataFrame from all of them, so I need something that points to the folder and pulls in all the data it holds.

When I pull in the data for the individual file, I am able to populate the DataFrame without any issue. However, when I try to do the same for the entire folder, I get errors.

This is the code I am using. It is generated by Azure itself when we go to the 'Consume' tab of the data asset.

    import mltable
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    ml_client = MLClient.from_config(credential=DefaultAzureCredential())
    data_asset = ml_client.data.get("folder_name", version="1")

    path = {'folder': data_asset.path}
    tbl = mltable.from_delimited_files(paths=[path])
    df = tbl.to_pandas_dataframe()
    df

When I run this code, I get this error:

    UserErrorException:
    Error Code: ScriptExecution.StreamAccess.Unexpected
    Native Error: Dataflow visit error: ExecutionError(StreamError(Unknown("stream did not contain valid UTF-8", Some(Error { kind: InvalidData, message: "stream did not contain valid UTF-8" }))))
    VisitError(ExecutionError(StreamError(Unknown("stream did not contain valid UTF-8", Some(Error { kind: InvalidData, message: "stream did not contain valid UTF-8" })))))
    => Failed with execution error: error in streaming from input data sources
    ExecutionError(StreamError(Unknown("stream did not contain valid UTF-8", Some(Error { kind: InvalidData, message: "stream did not contain valid UTF-8" }))))
    Error Message: Got unexpected error: stream did not contain valid UTF-8. Error { kind: InvalidData, message: "stream did not contain valid UTF-8" }|

The data contains text in several scripts, such as Chinese, Japanese, and Hindi, but that does not cause any issue when I pull in the data from the single file.
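To sanity-check the point about the scripts, here is a small standard-library sketch. It shows that Chinese, Japanese, and Hindi text encodes to valid UTF-8, while raw binary content does not necessarily decode as UTF-8. The helper names `is_valid_utf8` and `looks_like_parquet` and the `fake_parquet` bytes are made up for illustration; they are not my real data and not part of `mltable`. The only outside fact assumed is that parquet files begin and end with the 4-byte marker `PAR1`. Whether this explains the error above is a guess; the only connection to my setup is that the data is parquet while the generated code reads it with `from_delimited_files`.

```python
# Parquet files start and end with this 4-byte marker.
PARQUET_MAGIC = b"PAR1"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def looks_like_parquet(data: bytes) -> bool:
    """Hypothetical heuristic: check for the PAR1 marker at both ends."""
    return (
        len(data) >= 8
        and data[:4] == PARQUET_MAGIC
        and data[-4:] == PARQUET_MAGIC
    )


# Text in several scripts, encoded as UTF-8.
multilingual = "中文 日本語 हिन्दी".encode("utf-8")

# Made-up stand-in for a parquet file: marker + arbitrary binary + marker.
# The byte 0xFF can never appear in valid UTF-8.
fake_parquet = PARQUET_MAGIC + bytes([0x15, 0x00, 0xFF, 0xFE, 0x80]) + PARQUET_MAGIC

print("multilingual text is valid UTF-8:", is_valid_utf8(multilingual))
print("binary stand-in is valid UTF-8:", is_valid_utf8(fake_parquet))
print("binary stand-in looks like parquet:", looks_like_parquet(fake_parquet))
```

Running the same two checks on the first and last bytes of each file in the folder would show whether any of them is binary rather than delimited text.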

