The documentation for the w14:textId and w14:paraId attributes at http://msdn.microsoft.com/en-us/library/dd773080(v=office.12) says:
I haven't been able to find any information about how these values are actually used, or whether it's safe to remove them from a document without changing anything critical. Does anybody have any thoughts on this?
Thanks,
Wayne
It's safe to remove.
These attributes are in the schemas.microsoft.com/.../wordml namespace, which is a Word 2010 extension and not part of the spec. You'll see an mc:Ignorable attribute at the top of the document, and its value includes the w14 prefix associated with this namespace.
Have a look at Part 3 of the ISO spec (Markup Compatibility and Extensibility). It explains a mechanism by which the spec can be extended (as Word 2010 does) in a way that maintains backwards compatibility with apps that do not implement the extensions. The idea is that when loading a document, you go through a preprocessing stage where you remove any elements or attributes that are both in a namespace listed as Ignorable and in a namespace your app doesn't understand, which for a baseline consumer means any namespace that isn't part of the spec. (There are also ProcessContent and MustUnderstand attributes, which involve different processing.) After that you have a "clean" XML file which only contains elements and attributes that are part of the spec.
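For what it's worth, here is a rough sketch of that preprocessing step in Python, using only the standard library. It is a simplification, not a full implementation of Part 3: it only reads mc:Ignorable from the root element, and it does not handle ProcessContent, MustUnderstand, or AlternateContent. The function name strip_ignorable, the sample document, the w14:extra element, and the urn:example:w14-placeholder URI are all made up for illustration; a real document declares the actual Word 2010 namespace URI for w14.

```python
import io
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def strip_ignorable(xml_text, understood=(W_NS,)):
    """Remove attributes and elements whose namespace is listed in the
    root's mc:Ignorable and is not in `understood` (simplified sketch)."""
    prefixes = {}
    root = None
    # ElementTree discards prefix declarations on a normal parse, so use
    # iterparse with "start-ns" events to recover prefix -> URI mappings.
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = payload
            prefixes[prefix] = uri
        elif root is None:
            root = payload  # first "start" event is the document element

    # mc:Ignorable holds a whitespace-separated list of prefixes.
    listed = root.get(f"{{{MC_NS}}}Ignorable", "").split()
    ignorable = {prefixes[p] for p in listed if p in prefixes} - set(understood)

    def in_ignorable(name):
        # Qualified names look like "{namespace-uri}local".
        return name.startswith("{") and name[1:].split("}", 1)[0] in ignorable

    for parent in list(root.iter()):
        for attr in [a for a in parent.attrib if in_ignorable(a)]:
            del parent.attrib[attr]
        for child in [c for c in parent if in_ignorable(c.tag)]:
            parent.remove(child)
    return root


# Hypothetical sample; the w14 URI below is a placeholder, not the real one.
SAMPLE = """<w:document
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:w14="urn:example:w14-placeholder"
    mc:Ignorable="w14">
  <w:body>
    <w:p w14:paraId="0A1B2C3D" w14:textId="77777777"><w:r><w:t>Hello</w:t></w:r></w:p>
    <w14:extra/>
  </w:body>
</w:document>"""

cleaned = strip_ignorable(SAMPLE)
para = cleaned.find(f".//{{{W_NS}}}p")
print(para.attrib)  # the w14:* attributes have been removed
```

Passing the extension namespace in `understood` leaves its attributes and elements alone, which is roughly what an app that does implement the extension would do.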
From my reading of the descriptions of paraId, it's not safe to have two different paragraphs with the same id, so you should strip these out.