wordpress hit counter
Re: Excel dates are wrong - SpreadsheetML - Formats - OpenXML Developer

Re: Excel dates are wrong

Formats

Discussions about working with different Open XML Formats

Excel dates are wrong

  • rated by 0 users
  • This post has 11 Replies |
  • 0 Followers
  • Excel believes that 1900/2/29 is a valid date - which is wrong. Since dates are stored as an integer, all dates after this date are wrong. Is this intentional? The ECMA specification does not mention if this is the case.

    Could anyone from Microsoft clarify this issue?

     

  • More details:
    If you enter
    2/28/1900
    3/1/1900

    in Excel, the dates will be stored as 59 and 61 (see below [1])

    However, a simple C# program proves this is wrong:
                Console.WriteLine((new DateTime(1900, 2, 28) - new DateTime(1900, 1, 1)).Days + 1);
                Console.WriteLine((new DateTime(1900, 3, 1) - new DateTime(1900, 1, 1)).Days + 1);
    outputs
    59 and 60.


    [1]
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/5/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
      <dimension ref="A1:A2"/>
      <sheetViews>
        <sheetView tabSelected="1" workbookViewId="0">
          <selection activeCell="A3" sqref="A3"/>
        </sheetView>
      </sheetViews>
      <sheetFormatPr defaultRowHeight="15"/>
      <cols>
        <col min="1" max="1" width="9.7109375" bestFit="1" customWidth="1"/>
      </cols>
      <sheetData>
        <row r="1" spans="1:1">
          <c r="A1" s="1">
            <v>59</v>
          </c>
        </row>
        <row r="2" spans="1:1">
          <c r="A2" s="1">
            <v>61</v>
          </c>
        </row>
      </sheetData>
      <printOptions/>
      <pageMargins left="0.7" right="0.7" top="0.75" bottom="0.75" header="0.3" footer="0.3"/>
      <headerFooter/>
    </worksheet>


  • Anyone here?
  • That was actually a behavior introduced many years ago in order to have compabitility with Lotus. Lotus had that bug so for compatibility reasons, we decided to calculate workbooks the same way they did. The ecma documentation will be really clear about the expected behavior, and what the results/values in the file mean.

     

    -Brian

  • Brian,

    Thanks for your reply.
    Which way will the ecma documentation go?
    compatibility or correctness?

  • Hey John, one of the top goals for the Ecma TC is for the format to be 100% compatible with the existing set of Microsoft Office binary documents. If we broke people's spreadsheets when they move them into the new formats, then no one would use it.

    It's a very important point for folks to understand and I want to be really clear around that. The goal of our new formats from the start (even before we went to Ecma) was that we wanted to move the old formats into XML, and we wanted to do that with as little negative impact to the end user (since the average user doesn't really care about XML, they only care about the contents and appearance of their files).

    The documentation still has a ways to go, and we're working really hard in Ecma to fully document everything. You may notice that the WordprocessingML section is the furthest along. The Excel sections still have a ways to go (I'm actually in Sapporo right now for an Ecma meeting hosted by Toshiba)...

    -Brian

  • Brian,

    I did not question your hard work.
    However, since you criticized the open xml standard, it would be only fair that you would accept criticism for your own standard.

     

    I understand you intend to fully document all aspects of the format but I am skeptical so far. It looks like Microsoft approached the problem the wrong way.

    The “right” way would have been to define a clean standard and rewrite Office to recognize it. Instead, it looks like the standard is simply an “XMLized” version of your proprietary legacy format. Of course, it’s a step in the right direction but I am not convinced it will allow third party developers to be really comfortable when generating office files. Not only is the schema awkward in some areas but I fear there won’t be any reliable way to generate Office files.

     

    Here are some examples:

     

    The row element uses "ht" for the height and "customHeight" to specify that it's a custom height.
    1) Why use "ht" instead of "height" since you use "customHeight"
    2) Why use "customHeight" in the first place? If one sets the height attribute, presumably it's a custom height.

    Why encode dates as double? Encoding the actual date would take about the same amount of space and should not impact loading performance. The fact that Excel uses doubles internally should not pull the standard in the wrong direction.

     

    It looks Excel needs a fill with patternType=none in the style sheet. Otherwise, it fails to load the workbook. Not only do I not expect such a behavior to be documented but I fear that in the end the real standard will be whether Office opens the file correctly, not whether the file is "correct".

     

    Why use indexed colors? Why not use only rgb colors instead? Indexed colors are a legacy from the old time when palettes where actually used.

     

    When converting from legacy format, Excel creates parts with legacy binary format. For example, it still uses .emf images and oleObject.bin when it’s not needed. Why not just toss these formats unless absolutely necessary?

     

    For all the issues I mentioned, I don’t think there is a performance impact either in file size, memory or cpu usage.

     

    In general, Microsoft does a great job in forums. For example, I see many Microsoft employees posting on WinFX. However, it’s not the case for openxmldeveloper.org. Why not hire someone to answer questions in the forums?

     

    Don’t get me wrong. I really like the whole concept of using XML for Office files. I believe it’s the right thing to do. However, I am not convinced Microsoft is putting enough resources behind it. In fact, I would not be surprised if the “old guard” of Office was mostly against it. Pure speculation of course.

    In any case, I hope my criticism will prove constructive and look forward to your reply.

     





  • Oh, and I forgot:
    CT_Col's width's documentation:
    "Column width measured as the number of characters of the
    average digit width of the normal style's font.
    The possible values for this attribute are defined by the double
    type in the http//www.w3.org/2001/XMLSchema namespace.
    "
    Are you serious?
    OK, I want to generate an excel file with a column with width = 20 points?
    What is the value of width? Why not use points or pixels?

    Oh and Word uses "w" for the width while Excel uses "width".
    Why?

  • Have a read of this http://www.joelonsoftware.com/items/2006/06/16.html for some details on the joy of date formats in Excel - might put some colour into the background of this...
    www.alignment-systems.com
  • Thanks for the link.
    Very informative.
    I guess Brian Jones did not  get his BillG interview yet.....

  • Hmm, I guess Bill is a bit too busy trying to work out what to do with Warren Buffetts money...

    www.alignment-systems.com
  • Bill may be too busy. But what about other MS developers? It seems they are too busy to answer posts on this group.

Page 1 of 1 (12 items)