Saturday, 17 August 2013

Html Agility Pack not reading non-ASCII text correctly

When I parse an HTML document, I get something like this instead of Japanese text:
�͂��߂܂��āB���̓C�t�T�[���ł��A21�΂ł��A�����b�R�ɂ���ł��܂��A���͓��{�̕��������������A�N�������ɓ��{�������邱�Ƃ��ł��܂����A����3�N�ԓ��{���׋����܂����A���̓t�����X���p���A���r�A�������邱�Ƃɂ��������������邱�Ƃł��傤
^^���͓��{�l�̗F�B�ɉ�����A���������ɂ��闝�R�ł��A�ł́A�܂��B�C�t�T�[��
(^^)\r\n\t\t\t
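A minimal sketch for reproducing the symptom, assuming (this is an assumption, not something the post confirms) that the page bytes are really Shift_JIS and are being decoded as UTF-8. The byte array below is the hand-written Shift_JIS encoding of 「はじめまして。」 ("Nice to meet you."); it is illustrative data, not taken from the original page. Decoding it with `Encoding.UTF8` yields the same ten characters that open the garbled output above.

```csharp
using System;
using System.Text;

// Hand-written Shift_JIS bytes for "はじめまして。" (illustrative, not from the original page).
byte[] shiftJisBytes =
{
    0x82, 0xCD, 0x82, 0xB6, 0x82, 0xDF, 0x82, 0xDC,
    0x82, 0xB5, 0x82, 0xC4, 0x81, 0x42,
};

// Reading them as UTF-8: stray continuation bytes (0x80-0xBF) are replaced with
// U+FFFD, while pairs such as 0xCD 0x82 happen to form valid two-byte UTF-8
// sequences (0xCD 0x82 -> U+0342), which produces the odd combining marks.
string misread = Encoding.UTF8.GetString(shiftJisBytes);
Console.WriteLine(misread);
```

If the real page prints the same prefix, the problem is the charset used for decoding rather than the parser itself.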
The encoding of the HtmlDocument is set to iso-2022-jp, which seems correct. I
also tried forcing UTF-8:
HtmlWeb web = new HtmlWeb();
web.OverrideEncoding = Encoding.UTF8;
Any ideas?
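A hedged sketch of one thing worth trying, again assuming the page is actually Shift_JIS (the post reports iso-2022-jp, so this hypothesis should be checked against the raw bytes first): decode with the Shift_JIS code page instead of UTF-8. It assumes .NET Core 3.0 or later, where legacy code pages must be registered through `CodePagesEncodingProvider`; on .NET Framework, Shift_JIS is built in and the registration line should be dropped.

```csharp
using System;
using System.Text;

// Assumes .NET Core 3.0+ / .NET 5+: legacy code pages such as Shift_JIS
// are only available after registering this provider.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Same illustrative Shift_JIS bytes for "はじめまして。".
byte[] shiftJisBytes =
{
    0x82, 0xCD, 0x82, 0xB6, 0x82, 0xDF, 0x82, 0xDC,
    0x82, 0xB5, 0x82, 0xC4, 0x81, 0x42,
};

// Decoding with the matching charset recovers the Japanese text.
Encoding shiftJis = Encoding.GetEncoding("shift_jis");
string text = shiftJis.GetString(shiftJisBytes);
Console.WriteLine(text);
```

For the HtmlWeb snippet, the analogous change would be assigning `Encoding.GetEncoding("shift_jis")` to `web.OverrideEncoding` instead of `Encoding.UTF8`; this is untested against the original page.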
