Excel to CSV with UTF8 encoding
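I have an Excel file with some Spanish characters (tildes, etc.) that I need to convert to a CSV file to use as an import file. However, when I do Save As CSV, it mangles the "special" Spanish characters that aren't ASCII. It also seems to do this with left and right quotes and long dashes, which appear to come from the original user, who created the Excel file on a Mac.
Since CSV is just a text file, I'm sure it can handle UTF8 encoding, so I'm guessing it is an Excel limitation, but I'm looking for a way to get from Excel to CSV and keep the non-ASCII characters intact.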
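To make the symptom concrete, here is a minimal Python sketch of what happens when bytes written in one encoding are read back in another. The function name `mojibake` is hypothetical, and Windows-1252 and UTF-8 are illustrative choices; the code page a given Excel installation actually uses depends on the system locale.

```python
def mojibake(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one codec, then decode the bytes with another.

    Undecodable bytes become U+FFFD instead of raising, which mimics what
    many programs show when they guess the wrong encoding.
    """
    return text.encode(written_as).decode(read_as, errors="replace")
```

For example, `mojibake("año", "utf-8", "cp1252")` gives `"aÃ±o"` (the two UTF-8 bytes of ñ are shown as two Windows-1252 characters), and `mojibake("it’s", "cp1252", "utf-8")` gives `"it\ufffds"`, because the curly quote's single Windows-1252 byte (0x92) is not valid UTF-8 on its own.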
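A simple workaround is to use Google Spreadsheet. Paste (values only, if you have complex formulas) or import the sheet, then download it as CSV. I just tried a few characters and it works rather well.
NOTE: Google Sheets does have limitations when importing. See here.
NOTE: Be careful with sensitive data in Google Sheets.
EDIT: Another alternative: basically, use a VB macro or an add-in to force saving as UTF8. I have not tried any of these solutions, but they sound reasonable.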
I've found that OpenOffice's spreadsheet application, Calc, is really good at handling CSV data.
In the "Save As..." dialog, click "Format Options" to get different encodings for CSV. LibreOffice works the same way, AFAIK.
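1. Save the Excel sheet as "Unicode Text (.txt)". The good news is that all the international characters are in UTF16 (note: not in UTF8). However, the new "*.txt" file is TAB delimited, not comma delimited, and therefore is not a true CSV.
2. (optional) Unless you can use a TAB delimited file for import, use your favourite text editor and replace the TAB characters with commas ",".
3. Import your *.txt file in the target application. Make sure it can accept the UTF16 format.
If UTF-16 has been properly implemented with support for non-BMP code points, you can convert a UTF-16 file to UTF-8 without losing information. I leave it to you to find your favourite method of doing so.
I use this procedure to import data from Excel to Moodle.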
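The remark about non-BMP code points can be illustrated with a short Python sketch. `naive_ucs2_decode` is a hypothetical, deliberately broken converter that treats every two bytes as one character (UCS-2 style); `utf16_bytes_to_utf8` uses Python's standard `utf-16` codec, which reads the BOM and joins surrogate pairs. The emoji is just an example of a character outside the BMP.

```python
def naive_ucs2_decode(data: bytes) -> str:
    """Hypothetical broken converter: one character per 2 bytes of UTF-16LE.

    Surrogate pairs are NOT joined, so a non-BMP character turns into two
    lone surrogates that cannot be encoded as UTF-8.
    """
    return "".join(
        chr(int.from_bytes(data[i:i + 2], "little"))
        for i in range(0, len(data), 2)
    )


def utf16_bytes_to_utf8(data: bytes) -> bytes:
    """Correct conversion: the "utf-16" codec honours the BOM and decodes
    surrogate pairs, so characters outside the BMP survive."""
    return data.decode("utf-16").encode("utf-8")
```

With the correct codec, `"Peña 😀"` survives the UTF-16 to UTF-8 round trip; the naive decoder turns `😀` into `"\ud83d\ude00"`, and encoding that string as UTF-8 raises `UnicodeEncodeError`.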
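I know this is an old question, but I happened to come across it while struggling with the same issue as the OP.
Not having found any of the offered solutions a viable option, I set out to discover whether there was a way to do this with Excel itself.
Fortunately, I found that the lost-character issue only happens when saving from xlsx format to csv format. I tried saving the xlsx file as xls first, then as csv. It actually worked.
Please give it a try and see if it works for you. Good luck.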
You can use the iconv command under Unix (it is also available on Windows as libiconv).
After saving as CSV from Excel, enter the following at the command line:
iconv -f cp1250 -t utf-8 file-encoded-cp1250.csv > file-encoded-utf8.csv
(remember to replace cp1250 with your encoding).
Works fast and great for big files like a post-code database, which cannot be imported into Google Docs (400,000-cell limit).
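If iconv is not available, the same streaming conversion can be sketched in Python. This is an illustrative equivalent, not part of the original answer: `transcode` is a hypothetical name, and the default `cp1250` source encoding only mirrors the command above, so replace it with the code page your Excel actually uses. Decoding is strict, so an undecodable byte raises an error instead of being silently dropped.

```python
import shutil


def transcode(src_path: str, dst_path: str, src_encoding: str = "cp1250",
              dst_encoding: str = "utf-8", chunk_size: int = 1 << 20) -> None:
    """Rough Python counterpart of `iconv -f cp1250 -t utf-8 src > dst`."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        # copyfileobj moves chunk_size characters at a time, so very large
        # files never have to be held in memory all at once.
        shutil.copyfileobj(src, dst, chunk_size)
```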
The only "easy way" of doing this is as follows. First, realize that there is a difference between what is displayed and what is kept hidden in the Excel .csv file.
- Open an Excel file where you have the info (.xls, .xlsx)
- In Excel, choose "CSV (Comma Delimited) (*.csv)" as the file type and save as that type.
- Open the saved .csv file in Notepad (found under "Programs" and then "Accessories" in the Start menu).
- Then choose Save As... At the bottom of the "Save As" box there is a select box labelled "Encoding". Select UTF-8 (do NOT use ANSI, or you lose all accents etc.). After selecting UTF-8, save the file under a slightly different file name from the original.
This file is in UTF-8 and retains all characters and accents and can be imported, for example, into MySQL and other database programs.
This answer is taken from this forum.
You can do this on a modern Windows machine without third party software. This method is reliable and it will handle data that includes quoted commas, quoted tab characters, CJK characters, etc.
1. Save from Excel
In Excel, save the data to file.txt using the type Unicode Text (*.txt).
2. Start PowerShell
Run powershell from the Start menu.
3. Load the file in PowerShell
$data = Import-Csv C:\path\to\file.txt -Delimiter "`t" -Encoding Unicode
4. Save the data as CSV
$data | Export-Csv file.csv -Encoding UTF8 -NoTypeInformation
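For comparison, a rough Python counterpart of steps 3 and 4, assuming (as the answer does) that Excel's Unicode Text export is tab-delimited UTF-16 with a byte-order mark. `excel_unicode_text_to_csv` is a hypothetical name. `csv.QUOTE_ALL` imitates Windows PowerShell's `Export-Csv`, which quotes every field; unlike `Export-Csv -Encoding UTF8` in Windows PowerShell, this writes UTF-8 without a BOM (pass `encoding="utf-8-sig"` instead if the consumer needs one).

```python
import csv


def excel_unicode_text_to_csv(src_path: str, dst_path: str) -> None:
    """Convert Excel "Unicode Text" (UTF-16, tab-delimited) to UTF-8 CSV."""
    # The "utf-16" codec uses the BOM to pick the byte order.
    with open(src_path, encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # csv.reader understands the double quotes Excel puts around cells
        # that contain tabs, quotes or line breaks.
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
        writer.writerows(reader)
```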
Another one I've found useful: "Numbers" allows encoding-settings when saving as CSV.
"nevets1219" is right about Google docs, however if you simply "import" the file it often does not convert it to UTF-8.
But if you import the CSV into an existing Google spreadsheet it does convert to UTF-8.
Here's a recipe:
- On the main Docs (or Drive) screen click the "Create" button and choose "Spreadsheet"
- From the "File" menu choose "Import"
- Click "Choose File"
- Choose "Replace spreadsheet"
- Choose whichever character you are using as a Separator
- Click "Import"
- From the "File" menu choose "Download as" -> CSV (current sheet)
The resulting file will be in UTF-8
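Whichever tool produced the file, a quick way to check that the result really is UTF-8 is to try decoding it. A minimal Python sketch (the function name is hypothetical); note that a pure-ASCII file also passes, so `True` means "valid UTF-8", not "the accents survived".

```python
import codecs


def is_valid_utf8(path: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the whole file decodes as UTF-8 (a leading BOM is fine)."""
    # An incremental decoder copes with multi-byte sequences that are split
    # across chunk boundaries, so the file is never loaded all at once.
    decoder = codecs.getincrementaldecoder("utf-8")()
    with open(path, "rb") as f:
        try:
            while chunk := f.read(chunk_size):
                decoder.decode(chunk)
            # final=True flags a truncated sequence at the very end of the file.
            decoder.decode(b"", final=True)
        except UnicodeDecodeError:
            return False
    return True
```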
Using Notepad++
This will fix the corrupted CSV file saved by Excel and re-save it in the proper encoding.
- Export CSV from Excel
- Load into Notepad++
- Fix encoding
- Save
Excel saves in CP-1252 / Windows-1252. Open the CSV file in Notepad++ and select Encoding > Character Sets > Western European > Windows-1252. Then choose Encoding > Convert to UTF-8, and finally File > Save.
First tell Notepad++ the encoding, then convert. Some of the other answers convert without setting the proper encoding first, mangling the file even more: they would turn what should be ’ into 達. If your character does not fit into CP-1252, it was already lost when the file was saved as CSV; use another answer for that.
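Why the order matters, as a small Python sketch: converting means decoding the bytes with the encoding they were really written in, then encoding as UTF-8. `convert_to_utf8` is a hypothetical helper, and using `latin-1` as the wrong guess is an illustrative assumption standing in for "converting without setting the encoding first"; the exact garbage depends on which encoding the editor actually guesses.

```python
def convert_to_utf8(raw: bytes, assumed_encoding: str) -> bytes:
    """Decode bytes with the encoding they are *assumed* to be in, then
    re-encode as UTF-8. The result is only correct if the assumption is."""
    return raw.decode(assumed_encoding).encode("utf-8")
```

Excel's Windows-1252 export stores ’ as the single byte 0x92. `convert_to_utf8(b"it\x92s", "cp1252")` yields the proper UTF-8 bytes `b"it\xe2\x80\x99s"`, while `convert_to_utf8(b"it\x92s", "latin-1")` yields `b"it\xc2\x92s"`, a valid UTF-8 file that now contains an invisible control character instead of the quote.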
For those looking for an entirely programmatic (or at least server-side) solution, I've had great success using catdoc's xls2csv tool.
Install catdoc:
apt-get install catdoc
Do the conversion:
xls2csv -d utf-8 file.xls > file-utf-8.csv
This is blazing fast.
Note that it's important to include the -d utf-8 flag; otherwise it will encode the output in the default cp1252 encoding, and you run the risk of losing information.
Note also that xls2csv only works with .xls files; it does not work with .xlsx files.
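Since xls2csv only accepts the legacy binary format, it can help to check which container a file really is before calling it, because the extension alone can be misleading. A Python sketch using the well-known file signatures: legacy .xls files are OLE2 compound documents (`D0 CF 11 E0 A1 B1 1A E1`), while .xlsx files are ZIP archives (`PK\x03\x04`). The function name is hypothetical, and both signatures are shared with other formats (.doc, .docx, .ods and others), so a match is a hint, not proof.

```python
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")  # legacy .xls (OLE2 compound document)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx (Office Open XML is a ZIP)


def spreadsheet_container(path: str) -> str:
    """Return "ole2", "zip" or "unknown" based on the file's first bytes.

    Only "ole2" files are candidates for xls2csv.
    """
    with open(path, "rb") as f:
        head = f.read(8)
    if head == OLE2_MAGIC:
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"
```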
What about using PowerShell?
Get-Content 'C:\my.csv' | Out-File 'C:\my_utf8.csv' -Encoding UTF8
Easiest way: no need for OpenOffice or Google Docs
- Save your file as "Unicode Text";
- now you have a Unicode text file
- open it with Notepad and "Save As" with "UTF-8" (or another code page that you want) selected
- rename the file extension from "txt" to "csv". This will result in a tab-delimited UTF-8 csv file.
- If you want a comma-delimited file, open the csv file you just renamed and replace all tabs with commas. To do this in Notepad on Windows 10, simply select one tab character, then press Ctrl+H. In the window that opens, type a comma (,) in the "Replace with" field, then click "Replace All". Save your file. The result will be a comma-delimited UTF-8 csv file.
Don't open it with MS Office in any case! Now you have a tab-delimited CSV file, or a comma-delimited one if you applied step number 5.
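One caveat on the find-and-replace step: if any cell itself contains a comma, blindly turning tabs into commas splits that cell into two columns. A Python sketch of the safer approach with the standard `csv` module, which quotes such fields. The function name is hypothetical, and the input is assumed to be text that has already been decoded.

```python
import csv
import io


def tabs_to_commas(tsv_text: str) -> str:
    """Re-delimit tab-separated text as comma-separated CSV.

    Unlike a plain find-and-replace, csv.writer quotes any field that
    contains a comma, a double quote or a line break.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        writer.writerow(row)
    return out.getvalue()
```

For the input `"name\tcity\nPeña\tSão Paulo, SP\n"`, a plain replace produces three fields on the second line, while `tabs_to_commas` returns `'name,city\nPeña,"São Paulo, SP"\n'`.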
As funny as it may seem, the easiest way I found to save my 180MB spreadsheet as a UTF8 CSV file was to select the cells in Excel, copy them, and paste the contents of the clipboard into SublimeText.
Under Excel 2016, we have a CSV export option dedicated to UTF-8 format.
I was not able to find a VBA solution for this problem on Mac Excel. There simply seemed to be no way to output UTF-8 text.
So I finally had to give up on VBA, bite the bullet, and learn AppleScript. It wasn't nearly as bad as I had thought.
Solution is described here: http://talesoftech.blogspot.com/2011/05/excel-on-mac-goodbye-vba-hello.html
Assuming a Windows environment, save and work with the file as usual in Excel, but then open the saved Excel file in Gnome Gnumeric (free). Save Gnumeric's spreadsheet as CSV, which (for me, anyway) saves it as UTF-8 CSV.
Easy way to do it: download OpenOffice (here), load the spreadsheet application, and open the Excel file (.xls or .xlsx). Then just save it as a Text CSV file; a window opens asking whether to keep the current format or to save in .ODF format. Select "Keep current format", and in the new window select the option that works best for you, according to the language your file is written in. For Spanish, select Western Europe (Windows-1252/WinLatin 1) and the file works just fine. If you select Unicode (UTF-8), it is not going to work with the Spanish characters.
Save the xls file (Excel file) as Unicode Text; the file will be saved in text format (.txt).
Change the format from .txt to .csv (rename the file from XYX.txt to XYX.csv).
I also came across the same problem, but there is an easy solution for it.
- Open your xlsx file in Excel 2016 or higher.
- In "Save As", choose this option: "CSV UTF-8 (Comma delimited) (*.csv)"
It works perfectly, and the generated csv file can be imported into any software. I imported this csv file into my SQLite database and it works perfectly, with all Unicode characters intact.
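The SQLite import can be sketched with Python's standard `csv` and `sqlite3` modules. Assumptions: the CSV has a header row, the function name is hypothetical, and `table` is a trusted identifier (it is interpolated into SQL). Excel's "CSV UTF-8" export starts the file with a byte-order mark; reading with `utf-8-sig` removes it, otherwise the first column name would begin with an invisible `\ufeff`.

```python
import csv
import sqlite3


def load_csv_into_sqlite(csv_path: str, conn: sqlite3.Connection, table: str) -> None:
    """Create `table` from the CSV header row and insert every data row."""
    # utf-8-sig strips a leading BOM if present and behaves like utf-8 otherwise.
    with open(csv_path, encoding="utf-8-sig", newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # Quote column names so spaces or accents in headers are allowed.
        cols = ", ".join('"' + h.replace('"', '""') + '"' for h in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()
```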
I came across the same problem and found this post by googling. None of the above worked for me. In the end I converted my Unicode .xls to .xml (choose Save As... XML Spreadsheet 2003), and it produced the correct characters. Then I wrote code to parse the XML and extract the content for my use.
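A minimal sketch of the "parse the XML" step using Python's `xml.etree.ElementTree`, assuming the standard SpreadsheetML 2003 namespace `urn:schemas-microsoft-com:office:spreadsheet`. The function name is hypothetical. It returns each worksheet as a list of rows of cell text; it honours a cell's `ss:Index` (used when empty cells are skipped) but ignores row-level `ss:Index`, merged cells and formatting.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_spreadsheetml(path: str) -> dict:
    """Return {worksheet name: [[cell text, ...], ...]} for an XML Spreadsheet 2003 file."""
    root = ET.parse(path).getroot()
    sheets = {}
    for ws in root.findall("ss:Worksheet", NS):
        rows = []
        for row in ws.findall("ss:Table/ss:Row", NS):
            cells = []
            for cell in row.findall("ss:Cell", NS):
                # ss:Index is 1-based; pad with empty strings for skipped cells.
                idx = cell.get(f"{{{SS}}}Index")
                if idx is not None:
                    cells.extend([""] * (int(idx) - 1 - len(cells)))
                data = cell.find("ss:Data", NS)
                # itertext() also collects text from rich-text child elements.
                cells.append("".join(data.itertext()) if data is not None else "")
            rows.append(cells)
        sheets[ws.get(f"{{{SS}}}Name")] = rows
    return sheets
```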
I have written a small Python script that can export worksheets in UTF-8.
You just have to provide the Excel file as the first parameter, followed by the sheets you would like to export (as a comma-separated list). If you do not provide any sheets, the script will export all worksheets present in the Excel file.
#!/usr/bin/env python3
# export data sheets from xlsx to UTF-8 encoded csv (requires openpyxl)
import csv
import sys

from openpyxl import load_workbook


def get_all_sheets(excel_file):
    # read_only=True replaces the old use_iterators=True option
    workbook = load_workbook(excel_file, read_only=True, data_only=True)
    sheets = list(workbook.sheetnames)
    workbook.close()
    return sheets


def csv_from_excel(excel_file, sheets):
    workbook = load_workbook(excel_file, read_only=True, data_only=True)
    for worksheet_name in sheets:
        print("Export " + worksheet_name + " ...")
        try:
            worksheet = workbook[worksheet_name]
        except KeyError:
            print("Could not find " + worksheet_name)
            sys.exit(1)
        # Python 3: open in text mode with an explicit UTF-8 encoding instead
        # of the old reload(sys)/setdefaultencoding hack; newline="" lets the
        # csv module write its own line endings.
        with open(worksheet_name + ".csv", "w", newline="", encoding="utf-8") as your_csv_file:
            wr = csv.writer(your_csv_file, quoting=csv.QUOTE_ALL)
            # values_only=True yields plain cell values instead of Cell objects
            for row in worksheet.iter_rows(values_only=True):
                wr.writerow(row)
        print(" ... done")
    workbook.close()


if not 2 <= len(sys.argv) <= 3:
    print("Call with " + sys.argv[0] + " <xlsx file> [comma separated list of sheets to export]")
    sys.exit(1)
else:
    if len(sys.argv) == 3:
        sheets = sys.argv[2].split(',')
    else:
        sheets = get_all_sheets(sys.argv[1])
    assert sheets, "no worksheets to export"
    csv_from_excel(sys.argv[1], sheets)
Excel typically saves a csv file in ANSI encoding instead of UTF-8.
One option to correct the file is to use Notepad or Notepad++:
- Open the .csv with Notepad or Notepad++.
- Copy the contents to your computer clipboard.
- Delete the contents from the file.
- Change the encoding of the file to UTF-8.
- Paste the contents back from the clipboard.
- Save the file.
A second option to "nevets1219"'s answer is to open your CSV file in Notepad++ and do a conversion to ANSI.
Choose in the top menu: Encoding -> Convert to ANSI
Encoding -> Convert to ANSI will encode it in ANSI, not UTF-8; UTF-8 is one encoding of Unicode. Perhaps it will be encoded correctly in ANSI, but here we are talking about UTF-8, @SequenceDigitale.
There are faster ways, like exporting as CSV (comma delimited), then opening that CSV with Notepad++ (free) and choosing Encoding > Convert to UTF-8. But that only makes sense if you have to do this once per file. If you need to change and export frequently, then the LibreOffice or Google Docs solution is best.
Microsoft Excel has an option to export a spreadsheet using Unicode encoding. See the following screenshot.
Open the .csv file with Notepad++. If your encoding looks good (you see all characters as they should be), choose Encoding, then Convert to ANSI; otherwise, find out what your current encoding is.
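Another solution is to open the file in Word (winword), save it as txt, and then reopen it in Excel, and it will work, ISA.
Save dialog > Tools button > Web Options > Encoding tab
I also came across the same problem and found this add-in; it works perfectly fine in Excel 2013, besides Excel 2007 and 2010, for which it is mentioned.
Source URL: https://stackoverflow.com/questions/4221176/excel-to-csv-with-utf8-encoding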