BUS5CA Assessment data file issue
Issue: Non-English characters in text columns, causing issues in loading the file into SAS
The data file provided for the assignments is generated randomly, based on the last digit of your student ID. The generated file may contain records with non-English characters in the text fields, which cause problems when loading the dataset into SAS.
SOLUTION: It is recommended that you clean the dataset before loading it into SAS and exclude those records from your analysis.
If you have only a few such records, you can remove them manually. Alternatively, open the file in Excel and use one of the formulas below (there are many other ways to do this) to identify those records, sort by the formula column, and then remove the records with non-English characters before loading the data file into SAS.
Formula 1: =MAX(UNICODE(MID(C15,SEQUENCE(LEN(C15)),1)))>127
Alternatively,
Formula 2: =MAX(UNICODE(MID(C15,ROW(INDIRECT("1:"&LEN(C15))),1)))>127
(If you don't have SEQUENCE in your Excel, then try this instead.)
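Both formulas above should work; use either one. Each returns TRUE if the referenced cell (C15 in the examples) contains non-English characters and FALSE otherwise. Strictly speaking, the test is for any character with a Unicode code point above 127, so accented letters and symbols such as curly quotes are flagged as well.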
Apply the same formula to the ReviewBody column separately, then sort on the results and remove the flagged records.
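If you prefer a script to Excel, the same check can be sketched in Python. This is only an illustration of the idea behind the formulas (flag any text with a character code above 127), not a required method. The function names and the ReviewID and ReviewTitle columns are hypothetical; only ReviewBody comes from the notes above, so adjust the column names to match your own file.

```python
import csv
import io


def has_non_ascii(text):
    """Return True if any character has a code point above 127 (same test as the Excel formulas)."""
    return any(ord(ch) > 127 for ch in text)


def split_rows(rows, text_columns):
    """Split rows (dicts) into (clean, flagged) by checking the given text columns."""
    clean, flagged = [], []
    for row in rows:
        if any(has_non_ascii(row.get(col) or "") for col in text_columns):
            flagged.append(row)
        else:
            clean.append(row)
    return clean, flagged


# Tiny in-memory sample standing in for the real assignment file.
# ReviewID and ReviewTitle are hypothetical column names used for illustration.
sample = io.StringIO(
    "ReviewID,ReviewTitle,ReviewBody\n"
    "1,Great phone,Works well\n"
    "2,Tr\u00e8s bien,Good value\n"          # accented letter in the title
    "3,Okay,\u4ef7\u683c\u4e0d\u9519\n"      # Chinese text in the body
)
rows = list(csv.DictReader(sample))
clean, flagged = split_rows(rows, ["ReviewTitle", "ReviewBody"])
print(len(clean), len(flagged))
```

To use this on a real file, replace the in-memory sample with the opened data file (for example with encoding="utf-8") and write the clean rows out with csv.DictWriter before loading them into SAS.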
Reference:
https://www.reddit.com/r/excel/comments/qbswix/how_to_filter_out_cells_that_are_not_in_english/